Abstract

The Fungal Secretome KnowledgeBase (FunSecKB) provides a resource of secreted fungal proteins, i.e. secretomes, identified from all available fungal protein data in the NCBI RefSeq database. The secreted proteins were identified using a well-evaluated computational protocol that includes SignalP, WolfPsort and Phobius for signal peptide or subcellular location prediction, TMHMM for identifying membrane proteins, and PS-Scan for identifying endoplasmic reticulum (ER) target proteins. The entries were mapped to the UniProt database, and any annotations of subcellular locations that were either manually curated or computationally predicted were included in FunSecKB. Through a web-based user interface, the database can be searched, browsed and downloaded by NCBI RefSeq accession or gi number, UniProt accession number, keyword or species. A BLAST utility was integrated to allow users to query the database by sequence similarity. A user submission tool was implemented to support community annotation of subcellular locations of fungal proteins. With the complete fungal data from RefSeq and associated web-based tools, FunSecKB will be a valuable resource for exploring the potential applications of fungal secreted proteins.

Database URL: http://proteomics.ysu.edu/secretomes/fungi.php

Introduction

Fungi play an important role in carbon cycling: they use secreted enzymes to break down lignocelluloses and other biopolymers and then transport the resulting products into their cells as food. The secreted proteins of plant-associated fungi play important roles in plant-fungus symbiosis or fungal pathogenicity (1). Fungal secreted proteins also play important roles in the development of fungal diseases in humans (2,3). Secreted fungal enzymes have found a wide range of applications in the food, feed, pulp and paper, bioethanol and textile industries (4).

Signal-peptide dependent secreted proteins contain a signal peptide (SP) at the N-terminus that directs the ribosomes to the rough endoplasmic reticulum (ER) for completing polypeptide synthesis (5,6). The signal peptide, typically 15–30 amino acids long and consisting of 15–20 hydrophobic amino acid residues, is cleaved off during translocation across the membrane. While some proteins without an N-terminal signal peptide can be found in the ER and the Golgi, over 90% of human secreted proteins (7) and ∼90% of the Aspergillus niger extracellular proteins identified by mass spectrometry contain classical N-terminal signal peptides (8). There are also examples of non-classically secreted proteins in fungi, including the Saccharomyces cerevisiae mating pheromone a-factor (9) and two galectins from Coprinus cinereus (10), but it is generally believed that the vast majority of secreted fungal proteins are processed by the classical secretory pathway (8).

The term secretome is often used to refer to the complete set of secreted proteins in an organism (2,11,12). However, the term has also been used to include the set of proteins involved in the secretory pathway (13,14). In the work described here, the secretome includes only the secreted proteins of an organism. As more species have their genomes completely sequenced, the number of publications on fungal secretome identification and analysis using both computational and experimental approaches has increased (15). For example, secretomes have been reported for fungi including A. niger (8), Candida albicans (16), Phanerochaete chrysosporium (17), Sclerotinia sclerotiorum (18), Fusarium graminearum (19) and Ustilago maydis (20). Considering the biological importance of secreted proteins and their potential industrial applications, we developed a knowledgebase of fungal secretomes for identification, annotation and curation of both computationally predicted and experimentally identified fungal secreted proteins. This knowledgebase is designed to serve as a central portal for providing as well as collecting information on fungal secretomes.

Data collection and database implementation

The fungal protein sequences were retrieved from the NCBI Reference Sequence collection (RefSeq) database (release April, 2010) (http://www.ncbi.nlm.nih.gov/RefSeq/). The rationale for choosing the RefSeq protein data set was that RefSeq provides a comprehensive, integrated, non-redundant, well-annotated set of proteins, and that the corresponding nucleotide sequences are linked to these protein sequences in the database (21). The data in the fungal secretome knowledgebase (FunSecKB) were obtained from the following three sources: (i) the features predicted using computational approaches; (ii) subcellular locations annotated in UniProtKB; and (iii) our manual curation with experimental evidence obtained from recent literature.

Computational methods for prediction of secreted proteins

The fungal protein sequences downloaded from the NCBI RefSeq database were processed with SignalP (version 3.0, http://www.cbs.dtu.dk/services/SignalP/) (22), Phobius (http://phobius.binf.ku.dk/) (23,24), WolfPsort (http://wolfpsort.org/) (25,26) and TargetP (http://www.cbs.dtu.dk/services/TargetP/) (27) for signal peptide and subcellular location prediction. We chose these four predictors because they had previously been evaluated favorably and are widely used by the fungal secretome research community (8,16,28). TMHMM (http://www.cbs.dtu.dk/services/TMHMM) was used to identify proteins having transmembrane domains (29) and PS-Scan (http://www.expasy.org/tools/scanprosite/) was used to scan for the ER targeting sequence (Prosite: PS00014) (30). For each program, the default parameters for eukaryotes or fungi were used. For SignalP prediction, which was based on the N-terminal 70 amino acids, only entries predicted to have a ‘most likely cleavage site’ by the SignalP-NN algorithm and a ‘signal peptide’ by the SignalP-HMM algorithm were considered true signal peptide ‘positives’ (22). For TMHMM, entries having transmembrane domains located outside the N-terminus (the first 70 amino acids) were treated as true membrane proteins. Protein sequences predicted to have a signal peptide by SignalP were further processed using FragAnchor to identify glycosylphosphatidylinositol (GPI) anchors (http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html) (31). Proteins predicted to have a GPI anchor may be attached to the outside of the plasma membrane or may be secreted and targeted to the cell wall (32).
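The two filtering rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline actually used: the function names, the boolean inputs standing in for parsed SignalP-NN and SignalP-HMM results, and the reading of 'within the N-terminus' as 'the helix ends at or before residue 70' are all assumptions.

```python
N_TERMINAL_WINDOW = 70  # residues inspected at the N-terminus, per the text


def signalp_positive(nn_has_cleavage_site: bool, hmm_calls_signal_peptide: bool) -> bool:
    """Call a signal peptide only when both SignalP algorithms agree."""
    return nn_has_cleavage_site and hmm_calls_signal_peptide


def is_membrane_protein(tm_helices):
    """Return True if any predicted helix extends beyond the N-terminal window.

    tm_helices is a hypothetical list of (start, end) residue positions
    parsed from TMHMM output. A helix lying entirely inside the first
    70 residues is assumed here to be a signal peptide, not a true
    transmembrane domain.
    """
    return any(end > N_TERMINAL_WINDOW for _start, end in tm_helices)
```

Under these assumptions, a protein whose only predicted helix spans residues 5 to 27 would not be flagged as a membrane protein, whereas one with an additional helix at residues 120 to 142 would be.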

We recently evaluated the accuracy of the computational methods using 241 experimentally identified secreted proteins and 5992 non-secreted proteins in fungi retrieved from the UniProt/Swiss-Prot data set, and found that the highest prediction accuracy (92.1% sensitivity and 98.9% specificity) was achieved by combining SignalP, WolfPsort and Phobius for signal peptide prediction, TMHMM for eliminating membrane proteins, and PS-Scan for removing ER targeting proteins (28). Thus, the secretomes defined in this study include the manually curated secreted proteins along with the proteins predicted to have a signal peptide at their N-terminus by SignalP and Phobius and a subcellular location predicted as extracellular by WolfPsort, but not a transmembrane domain or an ER targeting signal. The information provided by TargetP and FragAnchor was also included in the annotation, which may be useful for identifying mitochondrial targeted proteins or GPI-anchored membrane or cell wall proteins. An overview of the database’s features is shown in Figure 1.
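The secretome definition above amounts to a simple boolean rule over the individual predictions. The sketch below expresses that rule in Python; the `Predictions` record and its field names are hypothetical and only stand in for whatever parsed tool output is available.

```python
from dataclasses import dataclass


@dataclass
class Predictions:
    """Hypothetical per-protein summary of the tool outputs."""
    signalp_sp: bool               # signal peptide called by SignalP
    phobius_sp: bool               # signal peptide called by Phobius
    wolfpsort_extracellular: bool  # WolfPsort location is extracellular
    tmhmm_membrane: bool           # transmembrane domain outside the N-terminus
    er_targeting_signal: bool      # PS-Scan hit for Prosite PS00014
    manually_curated_secreted: bool = False


def in_secretome(p: Predictions) -> bool:
    """Apply the rule stated in the text: curated, or predicted secreted."""
    if p.manually_curated_secreted:
        return True
    has_signal_peptide = p.signalp_sp and p.phobius_sp
    return (
        has_signal_peptide
        and p.wolfpsort_extracellular
        and not p.tmhmm_membrane
        and not p.er_targeting_signal
    )
```

Keeping the curated flag as a separate early return mirrors the text, in which manually curated secreted proteins are included regardless of the computational predictions.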

Figure 1. Overview of FunSecKB. To search the database, users can enter an NCBI RefSeq gi or accession number, UniProt accession number, keywords or species. The database consists of information generated using seven prediction tools, subcellular locations annotated in UniProtKB and our own manual curation. Users can browse through the results using the web user interface. Links to external databases and resources are also provided for further exploration. Whole secretome sequences can be downloaded and the BLAST utility can be accessed from the database interface.


Linking RefSeq proteins to UniProtKB annotation

The fungal protein entries in FunSecKB are linked to UniProtKB using the mapping information generated by UniProtKB (ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/) (33). We also integrated the subcellular location information of fungal proteins annotated in UniProtKB, both curated (reviewed, from the UniProtKB/Swiss-Prot data set) and predicted (unreviewed, from the UniProtKB/TrEMBL data set). In addition, we included manually curated protein entries in the UniProtKB/Swiss-Prot data set that could not be mapped to entries in the RefSeq database.
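As an illustration of this mapping step, the sketch below builds a RefSeq-to-UniProt lookup from an identifier mapping file. It assumes a three-column, tab-separated layout (UniProtKB accession, identifier type, external identifier) with `RefSeq` as the identifier type of interest; the accessions in the sample are placeholders, not real entries, and the function name is hypothetical.

```python
import io
from collections import defaultdict


def refseq_to_uniprot(lines):
    """Map each RefSeq protein accession to the UniProtKB accessions citing it."""
    mapping = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed or empty lines
        uniprot_ac, id_type, external_id = fields
        if id_type == "RefSeq":
            mapping[external_id].add(uniprot_ac)
    return mapping


# Placeholder records in the assumed layout (not real accessions).
sample = io.StringIO(
    "UNIPROT_AC_1\tRefSeq\tREFSEQ_ACC_1\n"
    "UNIPROT_AC_1\tGI\t0000001\n"
    "UNIPROT_AC_2\tRefSeq\tREFSEQ_ACC_2\n"
)
mapping = refseq_to_uniprot(sample)
```

A set is used per RefSeq accession because more than one UniProtKB entry may point to the same RefSeq record.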

Manual curation and community annotation

FunSecKB supports community curation of subcellular locations of fungal proteins based on published experimental evidence. A submission form was developed for users to provide a subcellular location annotation together with the literature source that supports it. After validation by our curators, these data will be incorporated into the database. Currently we have manually curated more than two hundred secreted proteins from A. niger (8). Manual curation is an ongoing process, so additional secreted proteins will be curated and integrated into the database over time.

The information from the above three sources is integrated in the annotation (Figure 1). The annotated entries are linked to the RefSeq database at NCBI and to UniProtKB, as well as to the related literature for entries manually curated by our curators or the community. The data will be updated when a new RefSeq data set is released by NCBI (http://www.ncbi.nlm.nih.gov/RefSeq/).

Data access

FunSecKB can be accessed through the database web interface at http://proteomics.ysu.edu/secretomes/fungi.php. The data can be accessed in three ways: (i) search individual proteins using an NCBI RefSeq gi or accession number, UniProt accession number, keyword or species; (ii) search or download the whole secretome, or the subset of manually curated secreted proteins, of a species; and (iii) search all fungal proteins or fungal secreted proteins using BLAST.

The annotation page contains the summary and the details of subcellular locations predicted by the tools mentioned above and the annotation retrieved from UniProtKB. Each entry is linked to both RefSeq and UniProtKB. The secretome of a particular species, including predicted and curated secreted proteins, can be searched and downloaded by selecting the species from the list of species with complete genomes, or by entering a species name for those without a complete genome. The protein sequences of the secretome of a species can be downloaded as a FASTA file. Manually curated secreted proteins consist of proteins retrieved from UniProtKB/Swiss-Prot with subcellular locations labeled as ‘reviewed’ and proteins curated by our curators and the users. The proteins curated by us and by the community are supported by experimental evidence for their subcellular location annotation, and the related literature can be found on the same page. The annotation page also contains the primary protein sequence (Figure 1). The database interface provides a link to the BLAST input interface to search through the proteins retrieved from RefSeq: either all fungal proteins or just the fungal secretomes.

Preliminary data analysis

Currently FunSecKB contains a total of 478 073 fungal protein sequences, including 23 878 predicted and/or curated secreted proteins, from a total of 118 fungal species. Of these, 52 fungal species, one of which is represented by two different varieties, have a complete predicted proteome set. We performed a preliminary analysis of the 53 complete secretomes of these 52 fungal species, which include 43 Ascomycetes, 7 Basidiomycetes (with Cryptococcus neoformans having two varieties) and 2 Microsporidia (Table 1). Overall, fungal species with larger genomes encode more proteins in their predicted proteomes (r = 0.75) (Figure 2a). Ajellomyces dermatitidis and Postia placenta are two outliers. For the 69 Mb P. placenta genome, RefSeq has only 9083 predicted proteins, whereas Martinez et al. (2009) reported 17 173 proteins predicted from the P. placenta genome (34). Thus the discrepancy may reflect a lag in database updates. The reason A. dermatitidis is an outlier is not known.
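For readers who wish to repeat this kind of comparison, the Pearson correlation coefficient can be computed directly from paired genome and proteome sizes. The sketch below is a minimal implementation applied to six rows taken from Table 1 (A. capsulatus, A. gossypii, A. niger, S. cerevisiae, L. bicolor and E. cuniculi); because it uses only a subset, it is not expected to reproduce the reported r = 0.75. The function name is our own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (genome size in Mb, predicted proteome size) for six species in Table 1
rows = [(31, 9313), (8, 4725), (34, 14102), (12, 5885), (59, 18215), (3, 1996)]
genome_mb = [g for g, _ in rows]
proteome = [p for _, p in rows]
r = pearson_r(genome_mb, proteome)
print(f"r = {r:.2f}")
```

The same function can be applied to the proteome and secretome columns to examine the relationship shown in Figure 2b.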

Figure 2. Relationship between genome size, proteome size and secretome size in fungi. (a) genome size and proteome size; (b) proteome size and secretome size; (c) proteome size and GPI-anchored secreted proteins and (d) proteome size and soluble secreted proteins.


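Table 1. Summary of genome size, proteome size and secretome size in different fungi

| Species | Phylum | Genome (Mb) | Predicted Proteome | Predicted Secretome | Curated Secretome | GPI-anchored Secretome | Soluble Secretome | Secretome (%) | GPI-anchored Portion (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ajellomyces capsulatus | Ascomycota | 31 | 9313 | 224 | 0 | 25 | 199 | 2.4 | 11.2 |
| Ajellomyces dermatitidis | Ascomycota | 74 | 9587 | 335 | 0 | 51 | 284 | 3.5 | 15.2 |
| Ashbya gossypii | Ascomycota | 8 | 4725 | 93 | 2 | 21 | 72 | 2.0 | 22.6 |
| Aspergillus clavatus | Ascomycota | 28 | 9121 | 571 | 1 | 71 | 500 | 6.3 | 12.4 |
| Aspergillus flavus | Ascomycota | 36 | 13 487 | 951 | 25 | 100 | 851 | 7.1 | 10.5 |
| Aspergillus fumigatus | Ascomycota | 29 | 9630 | 624 | 58 | 74 | 550 | 6.5 | 11.9 |
| Aspergillus nidulans | Ascomycota | 30 | 9541 | 704 | 29 | 76 | 628 | 7.4 | 10.8 |
| Aspergillus niger | Ascomycota | 34 | 14 102 | 832 | 253 | 82 | 750 | 5.9 | 9.9 |
| Aspergillus oryzae | Ascomycota | 37 | 12 074 | 843 | 28 | 85 | 758 | 7.0 | 10.1 |
| Aspergillus terreus | Ascomycota | 29 | 10 401 | 774 | 23 | 70 | 704 | 7.4 | 9.0 |
| Botryotinia fuckeliana | Ascomycota | 39 | 16 389 | 755 | 4 | 92 | 663 | 4.6 | 12.2 |
| Candida albicans | Ascomycota | 28 | 14 633 | 449 | 41 | 117 | 332 | 3.1 | 26.1 |
| Candida dubliniensis | Ascomycota | 16 | 5860 | 184 | 0 | 55 | 129 | 3.1 | 29.9 |
| Candida glabrata | Ascomycota | 12 | 5192 | 121 | 7 | 48 | 73 | 2.3 | 39.7 |
| Candida tropicalis | Ascomycota | 15 | 6254 | 212 | 1 | 64 | 148 | 3.4 | 30.2 |
| Chaetomium globosum | Ascomycota | 34 | 11 048 | 862 | 1 | 67 | 795 | 7.8 | 7.8 |
| Clavispora lusitaniae | Ascomycota | 16 | 5936 | 169 | 0 | 40 | 129 | 2.8 | 23.7 |
| Coccidioides immitis | Ascomycota | 29 | 10 440 | 263 | 2 | 41 | 222 | 2.5 | 15.6 |
| Debaryomyces hansenii | Ascomycota | 12 | 6335 | 148 | 1 | 38 | 110 | 2.3 | 25.7 |
| Gibberella zeae | Ascomycota | 36 | 11 690 | 900 | 1 | 102 | 798 | 7.7 | 11.3 |
| Kluyveromyces lactis | Ascomycota | 11 | 5357 | 113 | 5 | 37 | 76 | 2.1 | 32.7 |
| Lachancea thermotolerans | Ascomycota | 10 | 5091 | 128 | 0 | 29 | 99 | 2.5 | 22.7 |
| Lodderomyces elongisporus | Ascomycota | 16 | 5799 | 139 | 0 | 34 | 105 | 2.4 | 24.5 |
| Magnaporthe grisea | Ascomycota | 40 | 14 010 | 1471 | 3 | 127 | 1344 | 10.5 | 8.6 |
| Neosartorya fischeri | Ascomycota | 33 | 10 406 | 751 | 21 | 78 | 673 | 7.2 | 10.4 |
| Neurospora crassa | Ascomycota | 39 | 9844 | 592 | 10 | 76 | 516 | 6.0 | 12.8 |
| Penicillium chrysogenum | Ascomycota | 32 | 12 791 | 703 | 5 | 102 | 601 | 5.5 | 14.5 |
| Penicillium marneffei | Ascomycota | 29 | 10 663 | 538 | 0 | 79 | 459 | 5.0 | 14.7 |
| Phaeosphaeria nodorum | Ascomycota | 37 | 16 002 | 1103 | 1 | 101 | 1002 | 6.9 | 9.2 |
| Pichia guilliermondii | Ascomycota | 11 | 5920 | 159 | 0 | 33 | 126 | 2.7 | 20.8 |
| Pichia pastoris | Ascomycota | 9 | 5040 | 105 | 0 | 31 | 74 | 2.1 | 29.5 |
| Pichia stipitis | Ascomycota | 15 | 5816 | 144 | 0 | 35 | 109 | 2.5 | 24.3 |
| Podospora anserina | Ascomycota | 33 | 10 272 | 789 | 1 | 89 | 700 | 7.7 | 11.3 |
| Pyrenophora tritici-repentis | Ascomycota | 37 | 12 169 | 942 | 0 | 93 | 849 | 7.7 | 9.9 |
| Saccharomyces cerevisiae | Ascomycota | 12 | 5885 | 156 | 101 | 41 | 115 | 2.7 | 26.3 |
| Schizosaccharomyces japonicus | Ascomycota | 11 | 4824 | 109 | 0 | 7 | 102 | 2.3 | 6.4 |
| Schizosaccharomyces pombe | Ascomycota | 13 | 5001 | 112 | 43 | 7 | 105 | 2.2 | 6.3 |
| Sclerotinia sclerotiorum | Ascomycota | 38 | 14 446 | 623 | 1 | 88 | 535 | 4.3 | 14.1 |
| Talaromyces stipitatus | Ascomycota | 36 | 13 252 | 580 | 0 | 65 | 515 | 4.4 | 11.2 |
| Uncinocarpus reesii | Ascomycota | 22 | 7760 | 312 | 0 | 45 | 267 | 4.0 | 14.4 |
| Vanderwaltozyma polyspora | Ascomycota | 15 | 5376 | 116 | 0 | 28 | 88 | 2.2 | 24.1 |
| Yarrowia lipolytica | Ascomycota | 22 | 6472 | 299 | 5 | 78 | 221 | 4.6 | 26.1 |
| Zygosaccharomyces rouxii | Ascomycota | 12 | 4994 | 120 | 0 | 33 | 87 | 2.4 | 27.5 |
| Coprinopsis cinerea | Basidiomycota | 36 | 13 546 | 917 | 8 | 106 | 811 | 6.8 | 11.6 |
| Cryptococcus neoformans (neoformans B-3501A) | Basidiomycota | 19 | 6578 | 186 | 0 | 34 | 152 | 2.8 | 18.3 |
| Cryptococcus neoformans (neoformans JEC21) | Basidiomycota | 21 | 6594 | 181 | 0 | 30 | 151 | 2.7 | 16.6 |
| Laccaria bicolor | Basidiomycota | 59 | 18 215 | 650 | 0 | 99 | 551 | 3.6 | 15.2 |
| Malassezia globosa | Basidiomycota | 9 | 4286 | 134 | 0 | 8 | 126 | 3.1 | 6.0 |
| Moniliophthora perniciosa | Basidiomycota | 27 | 13 649 | 465 | 0 | 39 | 426 | 3.4 | 8.4 |
| Postia placenta | Basidiomycota | 69 | 9083 | 391 | 0 | 22 | 369 | 4.3 | 5.6 |
| Ustilago maydis | Basidiomycota | 20 | 6548 | 431 | 2 | 21 | 410 | 6.6 | 4.9 |
| Encephalitozoon cuniculi | Microsporidia | 3 | 1996 | 17 | 2 | 0 | 17 | 0.9 | 0.0 |
| Enterocytozoon bieneusi | Microsporidia | 4 | 3632 | 21 | 0 | 0 | 21 | 0.6 | 0.0 |
| Other species | | | 998 | 367 | 366 | | | | |
| Total | | | 478 073 | 23 878 | 1067 | 3014 | | | |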

The proportion of the proteome represented by the secretome varies considerably among species, from <1% in Encephalitozoon cuniculi and Enterocytozoon bieneusi, two Microsporidia species (unicellular parasites), to >10% in Magnaporthe grisea, a rice pathogenic fungus (Table 1). Overall, predicted secretome size increases with proteome size in fungal species (r = 0.83) (Figure 2b). We further identified GPI-anchored proteins in the predicted secretome, which represent the insoluble portion of secreted proteins that are components of the cell wall or are attached to the outside of the cell membrane. Both the insoluble and the soluble portions increase with proteome size across fungal species (Figure 2c and 2d).
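The two percentage columns of Table 1 appear to follow directly from the count columns: the secretome percentage is the predicted secretome divided by the predicted proteome, and the GPI-anchored portion is the GPI-anchored count divided by the predicted secretome. The sketch below states that arithmetic explicitly, using the Aspergillus niger row as a worked example; the formulas are inferred from the table values rather than stated in the text, and the function name is our own.

```python
def secretome_percentages(proteome, secretome, gpi_anchored):
    """Derive the percentage columns and the soluble count from raw counts."""
    secretome_pct = round(100 * secretome / proteome, 1)
    gpi_pct = round(100 * gpi_anchored / secretome, 1)
    soluble = secretome - gpi_anchored
    return secretome_pct, gpi_pct, soluble


# Aspergillus niger: 14 102 proteins, 832 predicted secreted, 82 GPI-anchored
print(secretome_percentages(14102, 832, 82))  # (5.9, 9.9, 750)
```

The result matches the 5.9% and 9.9% listed for A. niger, with 750 soluble secreted proteins.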

The functional categorization of predicted secretomes was analyzed using the rpsBLAST tool in the NCBI BLAST package to search the conserved domain database (35). The most abundant secreted protein families, those with more than 50 members in the whole database, are listed in Table 2. Preliminary functional analysis revealed that the fungal secretomes largely consist of enzymes, particularly hydrolases, which fungi use to break down carbohydrates, lipids, proteins and other types of organic materials (Table 2). Furthermore, a total of 10 397 secreted proteins have GO annotations in UniProtKB. Among them, molecular function classification using GOSlimViewer (http://agbase.msstate.edu/cgi-bin/tools/goslimviewer_select.pl) showed that 43% were hydrolases, including peptidases (Figure 3) (36). These enzymes have potential applications in biofuel production. The database user interface features an easy-to-use option to download predicted secretomes from completely sequenced fungal species. This provides a resource for further detailed species-specific or interspecies comparative analysis.
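The family counts in Table 2 can be thought of as a tally of domain assignments followed by a threshold. The sketch below shows one way to produce such a tally from (protein, domain) pairs; the input format, the function name and the choice to count each protein once per domain are assumptions made for illustration, not a description of how the published counts were derived.

```python
from collections import Counter


def abundant_families(hits, min_members=50):
    """Return (domain, count) pairs with more than min_members proteins.

    hits is a hypothetical iterable of (protein_id, domain_id) pairs,
    for example parsed from rpsBLAST tabular output.
    """
    unique_hits = set(hits)  # count each protein at most once per domain
    counts = Counter(domain for _protein, domain in unique_hits)
    return [(domain, n) for domain, n in counts.most_common() if n > min_members]
```

For example, `abundant_families(hits, min_members=1)` on a small list of hits returns only those domains assigned to at least two distinct proteins, sorted from most to least frequent.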

Figure 3. Molecular functional classification of fungal secreted proteins using GOSlimViewer.


Table 2. Highly encoded secreted protein families in fungi

| CDD functional domains | Numbers |
| --- | --- |
| pfam00135, COesterase, Carboxylesterase | 314 |
| pfam03443, Glyco hydro 61, Glycosyl hydrolase family 61 | 301 |
| COG0277, GlcD, FAD/FMN-containing dehydrogenases | 287 |
| cd04077, Peptidases S8 PCSK9 ProteinaseK like: Peptidase S8 family domain in ProteinaseK-like proteins | 223 |
| pfam00450, Peptidase S10, Serine carboxypeptidase | 215 |
| pfam00295, Glyco hydro 28, Glycosyl hydrolases family 28 | 207 |
| pfam00067, p450, Cytochrome P450 | 160 |
| pfam00933, Glyco hydro 3, Glycosyl hydrolase family 3 N terminal domain | 156 |
| cd05474, pepsin-like proteinases secreted from pathogens to degrade host proteins | 154 |
| COG2303, BetA, Choline dehydrogenase and related flavoproteins | 152 |
| pfam01083, Cutinase | 139 |
| pfam09362, DUF1996, Domain of unknown function (DUF1996) | 136 |
| pfam00264, Tyrosinase, Common central domain of tyrosinase | 130 |
| TIGR03388, ascorbase, L-ascorbate oxidase, plant type | 128 |
| cd04056, Peptidases S53, Peptidase domain in the S53 family | 124 |
| pfam04389, Peptidase M28, Peptidase family M28 | 122 |
| COG5309, COG5309, Exo-beta-1,3-glucanase | 121 |
| pfam04616, Glyco hydro 43, Glycosyl hydrolases family 43 | 114 |
| cd00519, Lipase 3, Lipase (class 3) | 106 |
| PRK02106, PRK02106, choline dehydrogenase | 100 |
| COG2730, BglC, Endoglucanase | 99 |
| pfam00328, Acid phosphat A, Histidine acid phosphatase | 98 |
| pfam03856, SUN, Beta-glucosidase (SUN family) | 97 |
| pfam07519, Tannase, Tannase and feruloyl esterase | 97 |
| smart00656, Amb all, Amb all domain | 94 |
| pfam00457, Glyco hydro 11, Glycosyl hydrolases family 11 | 92 |
| cd06097, Aspergillopepsin like: Aspergillopepsin like, aspartic proteases of fungal origin | 91 |
| cd02877, GH18 hevamine XipI class III | 88 |
| pfam00331, Glyco hydro 10, Glycosyl hydrolase family 10 | 88 |
| pfam01565, FAD binding 4, FAD binding domain | 87 |
| pfam03583, LIP, Secretory lipase | 87 |
| pfam03659, Glyco hydro 71, Glycosyl hydrolase family 71 | 87 |
| pfam01185, Hydrophobin, Fungal hydrophobin | 85 |
| pfam01532, Glyco hydro 47, Glycosyl hydrolase family 47 | 79 |
| cd02181, GH16 MLG1 glucanase | 78 |
| cd05471, Pepsin-like aspartic proteases, bilobal enzymes that cleave bonds in peptides at acidic pH | 77 |
| cd05384, SCP PRY1 like, SCP-like extracellular protein domain, PRY1-like sub-family restricted to fungi | 75 |
| cd07203, Fungal Phospholipase B-like; cPLA2 GrpIVA homologs; catalytic domain | 71 |
| pfam00840, Glyco hydro 7, Glycosyl hydrolase family 7 | 71 |
| pfam00150, Cellulase, Cellulase (glycosyl hydrolase family 5) | 70 |
| pfam11790, Glyco hydro cc, Glycosyl hydrolase catalytic core | 70 |
| pfam01522, Polysacc deac 1, Polysaccharide deacetylase | 69 |
| pfam07971, Glyco hydro 92, Glycosyl hydrolase family 92 | 68 |
| smart00636, Glyco 18, Glycosyl hydrolase family 18 | 68 |
| cd00842, MPP ASMase, acid sphingomyelinase and related proteins | 67 |
| cd03457, intradiol dioxygenase like, Intradiol dioxygenase supgroup | 67 |
| pfam03663, Glyco hydro 76, Glycosyl hydrolase family 76 | 67 |
| pfam05577, Peptidase S28, Serine carboxypeptidase S28 | 67 |
| pfam12296, HsbA, Hydrophobic surface binding protein A | 65 |
| cd02183, GH16 GPI glucanosyltransferase | 64 |
| COG0654, 2-polyprenyl-6-methoxyphenol hydroxylase and related FAD-dependent oxidoreductases | 63 |
| pfam01055, Glyco hydro 31, Glycosyl hydrolases family 31 | 62 |
| cd06248, Peptidase M14 Carboxypeptidase A/B-like subfamily | 61 |
| pfam02128, Peptidase M36, Fungalysin metallopeptidase (M36) | 61 |
| pfam04185, Phosphoesterase, Phosphoesterase family | 61 |
| pfam11765, Hyphal reg CWP, Hyphally regulated cell wall protein | 60 |
| pfam01328, Peroxidase 2, Peroxidase, family 2 | 59 |
| pfam01828, Peptidase A4, Peptidase A4 family | 58 |
| pfam03198, Glyco hydro 72, Glycolipid anchored surface protein | 57 |
| cd01846, Fatty acyltransferase-like subfamily of the SGNH hydrolases, a diverse family of lipases and esterases | 56 |
| pfam02102, Peptidase M35, Deuterolysin metalloprotease (M35) | 56 |
| pfam00723, Glyco hydro 15, Glycosyl hydrolases family 15 | 54 |
| pfam00128, Alpha-amylase, Alpha amylase, catalytic domain | 53 |
| cd08588, Catalytic domain of Arabidopsis thaliana PI-PLC X domain-containing protein | 52 |
| PHA03247, PHA03247, large tegument protein UL36; Provisional | 52 |
| pfam01301, Glyco hydro 35, Glycosyl hydrolases family 35 | 51 |
| pfam11937, DUF3455, Protein of unknown function (DUF3455) | 51 |

Discussion

While we were constructing our database, a similar fungal secretome database (FSD, http://fsd.snu.ac.kr/) was published by Choi et al. (37). However, there are several important differences between the two databases (Table 3). We used RefSeq data, whereas the FSD used only completely sequenced fungal genome data, including some ‘work in progress’ genomes (37). The prediction methods used to identify secreted proteins also differed. The FSD used a three-layer hierarchical identification rule based on nine different programs and considered an entry to be a secreted protein as long as any one of the tools predicted it to be secreted; thus, the number of secreted proteins was much higher than the number predicted in our database. For example, in A. niger, we predicted 832 secreted proteins in the strain CBS 513.88, whereas Choi et al. (37) predicted 1831 secreted proteins in the same strain and 2616 secreted proteins in the ATCC1015 strain. However, Tsang et al. (8) predicted only 691 to 881 secreted proteins in the ATCC1015 strain, 160 of which were confirmed experimentally. Thus, we believe that the methods used in the FSD significantly over-estimated the number of secreted proteins in fungi. In addition, the FSD can be searched only by sequence locus name and cannot be searched with an NCBI gi or accession number, UniProt accession number or keywords. There is also no curation tool available for community annotation in the FSD (37).
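
The effect of the decision rule on secretome size can be illustrated with a minimal Python sketch. It is not the FSD or the FunSecKB pipeline: the `Prediction` record, its field names, the four toy proteins and the strict rule (all three location predictors agree, no TMHMM helices, no PS-Scan ER retention signal) are assumptions made for illustration only, and the permissive rule reduces the FSD's three-layer scheme to "any positive call".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-protein tool calls; field names are illustrative."""
    signalp: bool       # signal peptide called by SignalP
    phobius: bool       # signal peptide called by Phobius
    wolfpsort: bool     # extracellular location called by WolfPsort
    tm_helices: int     # number of transmembrane helices from TMHMM
    er_retention: bool  # ER retention motif reported by PS-Scan


def secreted_any(p: Prediction) -> bool:
    """Permissive rule: a single positive call is enough."""
    return p.signalp or p.phobius or p.wolfpsort


def secreted_consensus(p: Prediction) -> bool:
    """Strict rule (assumed): all calls agree, no membrane or ER signal."""
    return (p.signalp and p.phobius and p.wolfpsort
            and p.tm_helices == 0 and not p.er_retention)


# Toy data, invented for this example only.
proteins = {
    "protA": Prediction(True, True, True, 0, False),     # clear secreted call
    "protB": Prediction(True, False, False, 0, False),   # one tool only
    "protC": Prediction(True, True, True, 2, False),     # membrane protein
    "protD": Prediction(False, False, False, 0, False),  # no signal at all
}

n_any = sum(secreted_any(p) for p in proteins.values())
n_consensus = sum(secreted_consensus(p) for p in proteins.values())
print(n_any, n_consensus)  # prints "3 1"
```

In this toy set, the permissive rule counts three of four proteins as secreted while the strict rule counts one. This is the same direction as the difference between the FSD and FunSecKB counts for A. niger, although the real protocols and proportions differ.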

Table 3.

Comparison of the two independently developed fungal secretome databases

Feature | FSD | FunSecKB
Data source | Fungal genomes | Fungal proteins in RefSeq
Prediction tools | SignalP3.0; SigCleave; SigPred; RPSP; TMHMM2.0c; TargetP1.1b; PsortII; PredictNLS; SecretomeP1.0f | SignalP 3.0; Phobius1.01; WolfPsort0.2; TargetP1.1b, TMHMM2.0c; PS-Scan
Data access | Sequence locus name; BLAST | Keywords, RefSeq gi or accession, UniProt accession; BLAST
Community curation tool | Not available | Available

In addition to the signal-peptide dependent secreted proteins that use the classical ER-Golgi secretory pathway, there are non-classical, signal-peptide independent secretory pathways in all domains of life. Mammalian and bacterial leaderless secreted proteins have been collected and used to develop the prediction software SecretomeP (http://www.cbs.dtu.dk/services/SecretomeP/) (38,39). The tool has not been trained with fungal-specific data, and its accuracy in predicting fungal non-classical secreted proteins could not be evaluated; thus, we did not include this tool in our data processing. Although the FSD used SecretomeP to predict non-classical secreted proteins, these predicted proteins were not included in its secretome analysis, because including them would make the putative secretome >40% of the whole proteome (37). Nevertheless, FunSecKB and the FSD could complement each other, as they implement different data sources, prediction tools and data access utilities.

In summary, we constructed FunSecKB to identify, annotate and curate the secreted proteins in fungi. The data can be searched using protein identifiers or keywords, and by species. Most of the secreted proteins are currently predicted by computational tools. However, the community can use the curation module implemented in our site to manually curate the subcellular locations of fungal proteins that have experimental evidence. The resource described in this work is expected to provide a query and curation system that will help the community to further understand secretome biology and explore potential applications of fungal secreted proteins in the bio-processing or environmental remediation industries.

Acknowledgements

We thank Gary Walker at YSU and the anonymous reviewers for providing helpful comments on improving the article.

Funding

Youngstown State University (YSU) Research Council grant (2009-2010 #04-10 to X.J.M.); YSU research professorship (to X.J.M.); College of Science, Technology, Engineering, and Mathematics Dean’s reassigned time (to X.J.M.). Funding for open access charge: the School of Graduate Studies and Research, Youngstown State University, Ohio, USA.

Conflict of interest. None declared.

References

1. Kamoun S, Deising H. The secretome of plant-associated fungi and oomycetes. The Mycota V–Plant Relationships. 2009, 2nd edn. Berlin, Heidelberg: Springer (pg. 173-180).
2. Cooper KG, Woods JP. Secreted dipeptidyl peptidase IV activity in the dimorphic fungal pathogen Histoplasma capsulatum. Infect. Immun. 2009, vol. 77 (pg. 2447-2454).
3. Osherov N. The virulence of Aspergillus fumigatus. New Insights in Medical Mycology. 2007. Netherlands: Springer (pg. 185-212).
4. O’Toole N, Min XJ, Storms R, Butler G, Tsang A. Sequence-based analysis of fungal secretomes. Appl. Mycol. Biotechnol. Bioinform. 2006, vol. 6 (pg. 277-296).
5. Blobel G, Dobberstein B. Transfer of proteins across membranes. I. Presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. J. Cell. Biol. 1975, vol. 67 (pg. 835-851).
6. von Heijne G. The signal peptide. J. Membr. Biol. 1990, vol. 115 (pg. 195-201).
7. Scott M, Lu G, Hallett M, et al. The Hera database and its use in the characterization of endoplasmic reticulum proteins. Bioinformatics 2004, vol. 20 (pg. 937-944).
8. Tsang A, Butler G, Powlowski J, et al. Analytical and computational approaches to define the Aspergillus niger secretome. Fungal Genet. Biol. 2009, vol. 46 (pg. S153-S160).
9. Chen P, Sapperstein SK, Choi JD, et al. Biogenesis of the Saccharomyces cerevisiae mating pheromone a-factor. J. Cell. Biol. 1997, vol. 136 (pg. 251-269).
10. Boulianne RP, Liu Y, Aebi M, et al. Fruiting body development in Coprinus cinereus: regulated expression of two galectins secreted by a non-classical pathway. Microbiology 2000, vol. 146 (pg. 1841-1853).
11. Greenbaum D, Luscombe NM, Jansen R, et al. Interrelating different types of genomic data, from proteome to secretome: ‘oming in on function. Genome Res. 2001, vol. 11 (pg. 1463-1468).
12. Hathout Y. Approaches to the study of the cell secretome. Expert Rev. Proteomics 2007, vol. 4 (pg. 239-248).
13. Tjalsma H, Bolhuis A, Jongbloed JD, et al. Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome. Microbiol. Mol. Biol. Rev. 2000, vol. 64 (pg. 515-547).
14. Simpson JC, Mateos A, Pepperkok R. Maturation of the mammalian secretome. Genome Biol. 2007, vol. 8 pg. 211.
15. Bouws H, Wattenberg A, Zorn H. Fungal secretomes-nature's toolbox for white biotechnology. Appl. Microbiol. Biotechnol. 2008, vol. 80 (pg. 381-388).
16. Lee SA, Wormsley S, Kamoun S, et al. An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms. Yeast 2003, vol. 20 (pg. 595-610).
17. Wymelenberg AV, Sabat G, Martinez D, et al. The Phanerochaete chrysosporium secretome: database predictions and initial mass spectrometry peptide identifications in cellulose-grown medium. J. Biotechnol. 2005, vol. 118 (pg. 17-34).
18. Yajima W, Kav NN. The proteome of the phytopathogenic fungus Sclerotinia sclerotiorum. Proteomics 2006, vol. 6 (pg. 5995-6007).
19. Paper JM, Scott-Craig JS, Adhikari ND, et al. Comparative proteomics of extracellular proteins in vitro and in planta from the pathogenic fungus Fusarium graminearum. Proteomics 2007, vol. 7 (pg. 3171-3183).
20. Mueller O, Kahmann R, Aguilar G, et al. The secretome of the maize pathogen Ustilago maydis. Fungal Genet. Biol. 2008, vol. 1 (pg. S63-S70).
21. Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007, vol. 35 Database issue (pg. D61-D65).
22. Bendtsen JD, Nielsen H, von Heijne G, et al. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 2004, vol. 340 (pg. 783-795).
23. Käll L, Krogh A, Sonnhammer EL. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 2004, vol. 338 (pg. 1027-1036).
24. Käll L, Krogh A, Sonnhammer EL. Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server. Nucleic Acids Res. 2007, vol. 35 Web Server issue (pg. W429-W432).
25. Horton P, Park KJ, Obayashi T, et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 2007, vol. 35 Web Server issue (pg. W585-W587).
26. Sprenger J, Fink JL, Teasdale RD. Evaluation and comparison of mammalian subcellular localization prediction methods. BMC Bioinformatics 2006, vol. 7 Suppl. 5 pg. S3.
27. Emanuelsson O, Nielsen H, Brunak S, et al. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 2000, vol. 300 (pg. 1005-1016).
28. Min XJ. Development of computational protocols for secreted protein prediction in different eukaryotes. J. Proteomics Bioinform. 2010, vol. 4 (pg. 143-147).
29. Emanuelsson O, Brunak S, von Heijne G, et al. Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protoc. 2007, vol. 2 (pg. 953-971).
30. de Castro E, Sigrist CJ, Gattiker A, et al. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006, vol. 34 Web Server issue (pg. W362-W365).
31. Poisson G, Chauve C, Chen X, et al. FragAnchor a large scale all Eukaryota predictor of Glycosylphosphatidylinositol-anchor in protein sequences by qualitative scoring. Genomics, Proteomics Bioinform. 2007, vol. 5 (pg. 121-130).
32. de Groot PW, Ram AF, Klis FM. Features and functions of covalently linked proteins in fungal cell walls. Fungal Genet. Biol. 2005, vol. 42 (pg. 657-675).
33. Wu CH, Apweiler R, Bairoch A, et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006, vol. 34 Database issue (pg. D187-D191).
34. Martinez D, Challacombe J, Morgenstern I, et al. Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion. Proc. Natl Acad. Sci. USA 2009, vol. 106 (pg. 1954-1959).
35. Marchler-Bauer A, Anderson JB, Chitsaz F, et al. CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res. 2009, vol. 37 Database issue (pg. D205-D210).
36. McCarthy FM, Wang N, Magee GB, et al. AgBase: a functional genomics resource for agriculture. BMC Genomics 2006, vol. 7 pg. 229.
37. Choi J, Park J, Kim D, et al. Fungal secretome database: integrated platform for annotation of fungal secretomes. BMC Genomics 2010, vol. 11 pg. 105.
38. Bendtsen JD, Jensen LJ, Blom N, et al. Feature based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. 2004, vol. 17 (pg. 349-356).
39. Bendtsen JD, Kiemer L, Fausbøll A, et al. Non-classical protein secretion in bacteria. BMC Microbiol. 2005, vol. 5 pg. 58.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.