Functionathon: a manual data mining workflow to generate functional hypotheses for uncharacterized human proteins and its application by undergraduate students
