Abstract

Trichophyton rubrum is the most common etiological agent of dermatophytoses worldwide, which is able to degrade keratinized tissues. The sequencing of the genome of different dermatophyte species has provided a large amount of data, including tandem repeats that may play a role in genetic variability and in the pathogenesis of these fungi. Tandem repeats are adjacent DNA sequences of 2–200 nucleotides in length, which exert regulatory and adaptive functions. These repetitive DNA sequences are found in different classes of fungal proteins, especially those involved in cell adhesion, a determinant factor for the establishment of fungal infection. The objective of this study was to develop a Dermatophyte Tandem Repeat Database (DTRDB) for the storage and identification of tandem repeats in T. rubrum and six other dermatophyte species. The current version of the database contains 35 577 tandem repeats detected in 16 173 coding sequences. The repeats can be searched using entry parameters such as repeat unit length (nt—nucleotide), repeat number, variability score, and repeat sequence motif. These data were used to study the relative frequency and distribution of repeats in the sequences, as well as their possible functions in dermatophytes. A search of the database revealed that these repeats occur in 22–33% of genes transcribed in dermatophytes where they could be involved in the success of adaptation to the host tissue and establishment of infection. The repeats were detected in transcripts that are mainly related to three biological processes: regulation, adhesion, and metabolism. The database developed enables users to identify and analyse tandem repeat regions in target genes related to pathogenicity and fungal–host interactions in dermatophytes and may contribute to the discovery of new targets for the development of antifungal agents.

Database URL:http://comp.mch.ifsuldeminas.edu.br/dtrdb/

Introduction

Dermatophytes are a group of filamentous fungi that can invade and colonize keratinized tissues in humans and animals. Infections caused by these fungi are the most common in the world (1). Dermatophytes are specialized in infecting keratinized tissues such as nails, skin and hair and can be classified according to their preferred habitat as geophilic, zoophilic and anthropophilic (2). Trichophyton rubrum is an anthropophilic dermatophyte that is responsible for ∼70% of dermatophytoses in humans (3). An aggravating factor of infection with this dermatophyte is the fact that T. rubrum can cause invasive infections in immunocompromised patients, which can become deep and generalized infections (4). Because of their clinical importance, the genomes of T. rubrum and of six other species have been sequenced and are available at http://www.broadinstitute.org/annotation/genome/dermatophyte_comparative (5), recently upgraded in ENSEMBL FUNGI: http://fungi.ensembl.org. These data are important to increase our knowledge about key aspects of the virulence of dermatophytes, their ability to colonize specific niches, and host interactions. The availability of the genomes of these dermatophytes opens the possibility for different types of analysis, including the search for tandem repeat regions which are associated with virulence and environmental adaptation in some organisms (6).

Tandem repeats are hypervariable, sequentially repeated sequences that can be classified into microsatellites (1–9 bp) or minisatellites (≥10 bp) according to the length of the repeat unit (7). Tandem repeats play an important role in the regulation of gene expression and phenotypic variation and have been associated with pathogenicity in different microorganisms, particularly yeasts such as Candida albicans (6). In Aspergillus fumigatus, Levdansky et al. (8) showed that genes with tandem repeats play a key role in the pathogen–host interaction. The role of these repeats in dermatophyte fungi is still not well understood. However, it is believed that tandem repeats increase cell–cell aggregation, especially when they are found in regions that encode cell surface proteins such as adhesins. Minisatellites (>9 bp) present in these proteins can trigger recombination events and the formation of new adhesins, providing the fungus with a rich repertoire of properties, conferring phenotypic plasticity and permitting rapid adaptation to stressful environments (9). For example, in Saccharomyces cerevisiae, variations in repeat number were positively associated with the ability to increase cell adhesion (10). Richard and Dujon (11), studying minisatellite repeats, reported that 50–60% of the genes encoding cell wall and cell adhesion proteins in fungi contained this type of tandem repeat.

It should be noted that, because of their conservation in evolution, tandem repeats are not found in all genes, but rather tend to be present in genes that respond to changes in environmental conditions. Consequently, some of these tandem repeats can serve as a mechanism of adaptation to the environment by mediating phenotypic alterations and favoring pathogen–host interactions (7).

In dermatophytes, adhesins are the determinants of infection of the host cell and are therefore key factors for the virulence of these fungi (12). During the early stage of infection with dermatophytes, the conidia must overcome the innate defense mechanisms of the host and adhere to the epidermis, followed by germination of the arthroconidia and hyphal penetration of the stratum corneum. During the adhesion of arthroconidia to the surface of the stratum corneum, long fibrillar structures are formed, which seem to anchor and connect the arthroconidium to the tissue surface, preventing their removal from the host tissue (13).

Recently, microarray gene expression data of T. rubrum grown in culture medium with keratin have shown strong induction of a gene that encodes a hypothetical protein. In silico analysis of this sequence revealed an adhesin-like protein rich in tandem repeat sequences of glycine, glutamine and proline, which is characterized by the presence of mucin, flocculin and collagen domains. The similarity of the sequence of this protein with other cell surface proteins of pathogenic fungi such as Aspergillus fumigatus and Metarhizium anisopliae, which are potentially related to virulence, adhesion and germination, support the role of this putative adhesin in pathogen–host interactions. These data were further evaluated by gene expression analysis using quantitative PCR during the interaction of T. rubrum conidia with human keratinocytes. The results showed expressive induction of the gene encoding the putative adhesion at 6 and 24 h of fungal infection, suggesting its importance for virulence-related processes and fungus–host interactions (14).

Within this context, the objective of this study was to develop a Dermatophyte Tandem Repeat Database (DTRDB) and a pipeline for automation of the processes of identification and storage of these repeats using different technologies. This database was used to identify and analyse tandem repeat regions in target coding genes related to pathogenicity and parasite–host interactions in dermatophyte species, particularly T. rubrum.

Materials and methods

Construction of the database

The MySQL relational database management system was used for storage of the data. A front-end web interface was developed using web technologies such as HTML, CSS, JQuery and ASP.NET Web Forms (C# language) for communication with the database. The database was constructed using a 3-tier architecture, including the user interface, the code and the database. In addition to the tables responsible for storing the data, the database possesses SQL queries for manipulation of the data in stored procedures. The Entity Relationship Diagram is available as supplementary data (Supplementary Figure S1). DTRDB runs on a Windows Server 2012 operation system with the Microsoft IIS web server. The tools used for identification of tandem repeats in the pipeline run on an Ubuntu Linux server.

Identification of repeats

The analysis was limited to tandem repeat arrangements in coding sequences. The Tandem Repeat Finder algorithm was used for the identification of intragenic repeats using sequences of transcribed genes present in public databases (15). The following parameters defined based on the studies of Legendre et al. (16) and Vinces et al. (17) were used: matching weight 2, mismatching penalty 5, indel penalty 5, match probability 0.8, indel probability 0.1, score ≥40, and maximum period 500. These parameters can be used to identify perfect and degenerate repeats. For analysis of repeat variability, a variability score was calculated for each repeat using the SERV algorithm (16). The repeats were divided into variability groups in which repeats with a score of 1 or higher (VARScore ≥ 1) are classified as highly mutable and repeats with a score between 0 and 1 as variable (18).

Conservation of repeats

Conservation of the repeats between species was analysed by local alignment with the Blast tool using an e-value of 1e−05 (19). Repeats showing identity to at least one species were defined as conserved. The percentage of conservation was calculated by dividing the number of identity repeats by the total number of repeats in the organism.

Sequences of transcribed genes

The fungal transcriptome of Trichophyton rubrum CBS 118892, Trichophyton tonsurans CBS 112818, Trichophyton equinum CBS 127.97, Microsporum gypseum CBS 118893, Microsporum canis CBS 113480, Arthroderma benhamiae CBS 112371, and Trichophyton verrucosum HKI 0517 analysed in this study were obtained from the Broad Institute internet site at http://www.broadinstitute.org/annotation/genome/dermatophyte_comparative in May 2014. These data are also available in public databases such as NCBI.

Functional annotation

Functional annotations were generated for all transcripts of T. rubrum with variable tandem repeats using the Blast2Go tool (20) and stringent parameters (e-value of 1e−05). In addition, fungal adhesins were predicted using the FaaPred tool (12), with a threshold ≥0.5.

Results and discussion

Using a web browser, the DTRDB database provides interactive access not only to the stored data, but also to a pipeline that automates the identification and storage of tandem repeats in submitted sequences available through an intranet (Figure 1). The database currently contains 35 577 tandem repeats identified in 16 173 sequences of coding genes of seven dermatophyte species. A web-based user interface divided into two main modules was developed: ‘Submit Sequences’ (intranet) and ‘Browse’ (open).

Schematic representation of the architecture of the Dermatophyte Tandem Repeat Database.
Figure 1.

Schematic representation of the architecture of the Dermatophyte Tandem Repeat Database.

The ‘Submit Sequences’ module enables to send sequences through the intranet for the identification and storage of tandem repeats (Figure 2A). The ‘Browse’ module provides three types of queries for the stored repeats: (i) ‘Profile Repeats’ enables visualization of the profile of stored tandem repeats by selecting a species. This profile contains information such as the number of repeats identified, genes with the most variable repeats and distribution of repeats per unit, and enables users to download the dataset of the stored data (Figure 2B). (ii) ‘Query Repeats’ permits to search genes containing repeats that meet entry parameters such as repeat unit, exponent (repeat unit copy number) and variability score. Once a gene has been selected, the repeats it contains are shown. A repeat can then be selected and it is verified whether this motif is found in any other gene stored in the database. Additionally, it is possible to access information of the selected gene through integration with the NCBI website (Figure 2C). (iii) ‘Search Gene Repeats’ enables to search repeats based on the gene identifier (Broad Institute pattern) or keyword present in its annotation (Figure 2D). In the case of T. rubrum, the stored functional categories according to the Gene Ontology (21), PFAM (22) and MIPS PEDANT Funcat (23) terms are also shown.

Screens of the web pipeline. (A) Submission form of the fasta file for the identification and storage of repetitive sequences. (B) Query of repeats and functional information of the dermatophyte Trichophyton rubrum. (C) Profile of repeats existing in a certain organism. (D) Query of repeats in an organism using filters.
Figure 2.

Screens of the web pipeline. (A) Submission form of the fasta file for the identification and storage of repetitive sequences. (B) Query of repeats and functional information of the dermatophyte Trichophyton rubrum. (C) Profile of repeats existing in a certain organism. (D) Query of repeats in an organism using filters.

Pipeline

The DTRDB allows to perform the following basic tasks: (i) identification of tandem repeats using a fasta file submitted via the web interface (intranet); (ii) storage of the repeats in a relational database; (iii) search of repeat patterns using filters such as unit size, length, and conservation; (iv) visualization of the repeat profile in a certain stored organism, and (v) search of functional information about genes of the dermatophyte T. rubrum. The pipeline (Figure 2) is available (the submission of files is only possible via an intranet) at http://comp.mch.ifsuldeminas.edu.br/dtrdb.

Profile of tandem repeats in dermatophytes

The pipeline developed enabled us to identify, store and query tandem repeats in T. rubrum and related dermatophytes (Trichophyton tonsurans, Trichophyton equinum, Microsporum gypseum, Microsporum canis, Arthroderma benhamiae, and Trichophyton verrucosum) obtained from the Broad Institute internet site (2014).

As can be seen in Table 1, the number of repeats identified ranged from 3724 in M. canis to 6720 in A. benhamiae. No correlation was observed between the size or number of sequences and the number of tandem repeats. T. rubrum exhibited 4616 repeats in 10 416 transcribed genes (13.54 Mb), while 6720 repeats were identified in 7980 transcribed gene sequences (11.83 Mb) of A. benhamiae. Similar results have been reported by Mayer; Leese and Tollrian (24). A total of 4616 tandem repeats were identified in T. rubrum genome. However, the genome assembly is still incomplete and may suffer alterations. Furthermore, it should be taken into consideration that the quantifications and percentages presented are not accurate. The DTRDB database showed that these repeat are distributed across 2348 sequences of a total of 10 418 transcribed genes, corresponding to a repeat density of 22.53% in the sequences of transcribed genes. Of these 4616 repeats, 4191 were identified in 2075 hypothetical genes, while the remaining 425 repeats were identified in 273 previously annotated sequences. Thus, the tandem repeats were predominantly concentrated in hypothetical transcribed genes.

Table 1.

Profile of tandem repeats in transcribed genes of dermatophytes

T. rubrumT. tonsuransT. equinumM. gypseumM. canisT. verrucosumA. benhamiaeSize of transcribed genes (Mba)13.5412.0011.9012.7913.0011.7811.83Number of transcribed genesb10 418852386798907891580247980Repeats4616451846344829372465366720Conservationc19.5%43.25%42.25%2.96%2.44%22.77%22.02%Largest repeat unit228405309378296220233
a

One million base pairs or megabase pair.

b

Number of sequences of transcribed genes obtained from the Broad Institute site in October 2014.

c

Percentage of repeat conservation in relation to the other species.

Table 1.

Profile of tandem repeats in transcribed genes of dermatophytes

T. rubrumT. tonsuransT. equinumM. gypseumM. canisT. verrucosumA. benhamiaeSize of transcribed genes (Mba)13.5412.0011.9012.7913.0011.7811.83Number of transcribed genesb10 418852386798907891580247980Repeats4616451846344829372465366720Conservationc19.5%43.25%42.25%2.96%2.44%22.77%22.02%Largest repeat unit228405309378296220233
a

One million base pairs or megabase pair.

b

Number of sequences of transcribed genes obtained from the Broad Institute site in October 2014.

c

Percentage of repeat conservation in relation to the other species.

The pipeline enabled us to obtain the distribution of repeats according to repeat unit. Table 2 shows the number of repeat units that occurs at least 10 times in the coding gene sequence. The relative abundance in megabase was calculated by dividing the number of repeats by the size of the transcribed genes in megabase (Mb).

Table 2.

Occurrence of tandem repeat units and relative abundancea

UnitT. rubrumT. tonsuransT. equinumM. gypseumM. canisT. verrucosumA. benhamiae211 (0.8)74 (5.5)91 (6.7)3471 (34.8)536 (39.6)527 (38.9)392 (29)259 (19.1)848 (62.6)902 (66.6)417 (1.3)23 (1.7)19 (1.4)65 (4.9)67 (4.9)514 (1.3)10 (0.7)20 (1.5)25 (1.8)70 (5.2)59 (4.4)6459 (33.9)498 (36.8)513 (37.9)554 (40.9)271 (20.1)684 (50.5)710 (52.4)714 (1.3)27 (2)29 (2.1)34 (2.5)12 (0.9)68 (5.2)67 (4.9)824 (1.8)42 (3.1)45 (3.3)39 (2.9)23 (1.7)136 (10.4)122 (9.1)9578 (42.7)561 (41.4)561 (41.4)584 (43.1)420 (31.2)728 (53.8)713 (52.7)1054 (4)62 (4.6)77 (5.7)71 (5.2)50 (3.7)139 (10.3)127 (9.4)11101 (7.5)108 (8)115 (8.5)122 (9.1)124 (9.2)221 (16.3)241 (17.8)12927 (68.5)874 (64.5)881 (65.7)919 (67.9)757 (56)1050 (77.5)1087 (80.3)13153 (11.3)119 (8.8)128 (9.5)147 (10.9)130 (9.6)192 (14.2)203 (15)14129 (9.5)119 (8.8)127 (9.4)136 (10.4)109 (8.5)186 (13.7)189 (14)15497 (36.8)460 (34)464 (34.3)501 (37.1)423 (31.2)576 (42.5)584 (43.1)1692 (6.8)74 (5.5)75 (5.5)71 (5.2)67 (4.9)116 (8.6)132 (9.7)1744 (3.2)46 (3.4)54 (4)56 (4.1)53 (3.9)89 (6.6)89 (6.6)18359 (26.5)299 (22.8)316 (23.3)352 (26)298 (22.9)395 (29.2)402 (29.7)1933 (2.4)27 (2)31 (2.3)26 (1.9)23 (1.7)54 (4)75 (5.5)2030 (2.2)25 (1.8)28 (2.7)43 (3.2)21 (1.6)57 (4.3)61 (4.6)21211 (15.6)190 (14.3)199 (14.7)191 (14.2)191 (14.2)220 (16.2)212 (15.7)2220 (1.5)29 (2.1)26 (1.9)40 (3)22 (1.6)43 (3.2)45 (3.3)2323 (1.7)18 (1.3)16 (1.2)24 (1.8)20 (1.5)37 (2.7)36 (2.7)24121 (8.9)105 (7.8)99 (7.3)120 (8.9)129 (9.5)144 (10.6)148 (10.9)2512 (0.9)21 (1.6)18 (1.3)2611 (0.8)20 (1.5)19 (1.4)2753 (3.9)46 (3.4)45 (3.3)66 (4.9)45 (3.3)64 (4.7)55 (4.6)2811 (0.8)20 (1.5)2911 (0.8)3043 (3.2)53 (3.9)43 (3.2)38 (2.9)35 (2.6)48 (3.5)53 (3.9)3110 (0.7)3323 (1.7)17 (1.3)22 (1.6)30 (2.2)20 (1.5)17 (1.3)18 (1.3)3620 (1.5)20 (1.5)22 (1.6)17 (1.3)21 (1.6)31 (2.3)22 (1.6)3918 (1.3)12 (0.9)11 (0.8)17 (1.3)4217 (1.3)14 (1.3)12 (0.9)13 (1)14 (1.3)19 (1.4)4515 (1.2)11 (0.8)13 (1)4814 (1.3)5110 (0.7)
a

Relative abundance (in parentheses) is the total number of repeats per megabase of the sequence analysed. The table shows only tandem repeats where the repeat unit occurs at least 10 times (complete data is available in DTRDB).

Table 2.

Occurrence of tandem repeat units and relative abundancea

UnitT. rubrumT. tonsuransT. equinumM. gypseumM. canisT. verrucosumA. benhamiae211 (0.8)74 (5.5)91 (6.7)3471 (34.8)536 (39.6)527 (38.9)392 (29)259 (19.1)848 (62.6)902 (66.6)417 (1.3)23 (1.7)19 (1.4)65 (4.9)67 (4.9)514 (1.3)10 (0.7)20 (1.5)25 (1.8)70 (5.2)59 (4.4)6459 (33.9)498 (36.8)513 (37.9)554 (40.9)271 (20.1)684 (50.5)710 (52.4)714 (1.3)27 (2)29 (2.1)34 (2.5)12 (0.9)68 (5.2)67 (4.9)824 (1.8)42 (3.1)45 (3.3)39 (2.9)23 (1.7)136 (10.4)122 (9.1)9578 (42.7)561 (41.4)561 (41.4)584 (43.1)420 (31.2)728 (53.8)713 (52.7)1054 (4)62 (4.6)77 (5.7)71 (5.2)50 (3.7)139 (10.3)127 (9.4)11101 (7.5)108 (8)115 (8.5)122 (9.1)124 (9.2)221 (16.3)241 (17.8)12927 (68.5)874 (64.5)881 (65.7)919 (67.9)757 (56)1050 (77.5)1087 (80.3)13153 (11.3)119 (8.8)128 (9.5)147 (10.9)130 (9.6)192 (14.2)203 (15)14129 (9.5)119 (8.8)127 (9.4)136 (10.4)109 (8.5)186 (13.7)189 (14)15497 (36.8)460 (34)464 (34.3)501 (37.1)423 (31.2)576 (42.5)584 (43.1)1692 (6.8)74 (5.5)75 (5.5)71 (5.2)67 (4.9)116 (8.6)132 (9.7)1744 (3.2)46 (3.4)54 (4)56 (4.1)53 (3.9)89 (6.6)89 (6.6)18359 (26.5)299 (22.8)316 (23.3)352 (26)298 (22.9)395 (29.2)402 (29.7)1933 (2.4)27 (2)31 (2.3)26 (1.9)23 (1.7)54 (4)75 (5.5)2030 (2.2)25 (1.8)28 (2.7)43 (3.2)21 (1.6)57 (4.3)61 (4.6)21211 (15.6)190 (14.3)199 (14.7)191 (14.2)191 (14.2)220 (16.2)212 (15.7)2220 (1.5)29 (2.1)26 (1.9)40 (3)22 (1.6)43 (3.2)45 (3.3)2323 (1.7)18 (1.3)16 (1.2)24 (1.8)20 (1.5)37 (2.7)36 (2.7)24121 (8.9)105 (7.8)99 (7.3)120 (8.9)129 (9.5)144 (10.6)148 (10.9)2512 (0.9)21 (1.6)18 (1.3)2611 (0.8)20 (1.5)19 (1.4)2753 (3.9)46 (3.4)45 (3.3)66 (4.9)45 (3.3)64 (4.7)55 (4.6)2811 (0.8)20 (1.5)2911 (0.8)3043 (3.2)53 (3.9)43 (3.2)38 (2.9)35 (2.6)48 (3.5)53 (3.9)3110 (0.7)3323 (1.7)17 (1.3)22 (1.6)30 (2.2)20 (1.5)17 (1.3)18 (1.3)3620 (1.5)20 (1.5)22 (1.6)17 (1.3)21 (1.6)31 (2.3)22 (1.6)3918 (1.3)12 (0.9)11 (0.8)17 (1.3)4217 (1.3)14 (1.3)12 (0.9)13 (1)14 (1.3)19 (1.4)4515 (1.2)11 (0.8)13 (1)4814 (1.3)5110 (0.7)
a

Relative abundance (in parentheses) is the total number of repeats per megabase of the sequence analysed. The table shows only tandem repeats where the repeat unit occurs at least 10 times (complete data is available in DTRDB).

It can be observed that the largest number of tandem repeats in transcribed genes of dermatophytes are found in repeat units that are divisible by three. Consequently, the most prevalent repeats do not alter the reading frame, suggesting that they generate proteins with repetitive patterns (25). Indeed, Figure 3 shows that the repeats are mainly found in repeat units that are divisible by three, especially 3–21 bp, which account for ∼70% of all repeats in dermatophytes.

Distribution of tandem repeats according to repeat unit length (nt).
Figure 3.

Distribution of tandem repeats according to repeat unit length (nt).

Different parameters have been used in studies investigating tandem repeats in different fungal species (26), but no studies are available for dermatophytes. Karaoglu and Meyer (27) conducted a survey of perfect short tandem repeats (1–6 bp per repeat unit) with a minimum length of 10 bp in the genome of nine fungal species using a Python-based algorithm specifically developed for their study. The authors identified 14 319 repeats in the genome of Neurospora crassa (38 Mb), with a relative abundance of 377 repeats per megabase. In contrast, another study identified 13 292 short repeats (1–6 bp per repeat unit) in the genome of Neurospora crassa using the Phobos tool developed by the authors; however, imperfect repeats were also considered (24).

The patterns of the most abundant tandem repeats in transcribed genes are similar in all dermatophytes. The CAG repeat is the most frequent in all dermatophyte species. The same was observed by Singh et al. (28) in the genome of Puccinia triticina. Huntley and Clark (29), who analysed the genome of 12 different organisms, found the CAG repeat to be the most prevalent in coding regions of the genome of Drosophila. Table 3 shows the most prevalent repeats (>20 occurrences) in transcribed genes of seven dermatophyte species.

Table 3.

Most prevalent patterns of repeats

T. rubrumRepeatUnit lengthNumber CAG3125 GCA351 CAA335 CAGCAGCAA928 AGC325T. tonsurans CAG398 GCA348 CAA346 AGA331 AGC327 GAA325T. equinum CAG396 CAA346 GCA343 GAA331 AGA328 AGC326M. gypseum CAG381 GAA338 GCA334 AGA329 CAA322M. canis CAG371 GAA325T. verrucosum CAG3109 GAA394 AGA373 AAG360 TCT348 GCA345 CAA344 CTT336 TTC334 AGC329 CTG325 ACA322 AG222 CAGCAA621A. benhamiae CAG3123 GAA3105 AGA373 AAG356 TCT347 CTT346 CAA342 GCA342 TTC333 CTG330 AG229 TC226 CAGCAA621
Table 3.

Most prevalent patterns of repeats

T. rubrumRepeatUnit lengthNumber CAG3125 GCA351 CAA335 CAGCAGCAA928 AGC325T. tonsurans CAG398 GCA348 CAA346 AGA331 AGC327 GAA325T. equinum CAG396 CAA346 GCA343 GAA331 AGA328 AGC326M. gypseum CAG381 GAA338 GCA334 AGA329 CAA322M. canis CAG371 GAA325T. verrucosum CAG3109 GAA394 AGA373 AAG360 TCT348 GCA345 CAA344 CTT336 TTC334 AGC329 CTG325 ACA322 AG222 CAGCAA621A. benhamiae CAG3123 GAA3105 AGA373 AAG356 TCT347 CTT346 CAA342 GCA342 TTC333 CTG330 AG229 TC226 CAGCAA621

Figure 4 shows the results grouped according to repeat unit lengths of 1–10 bp, 11–100 bp, and >100 bp. There was a predominance of minisatellites, especially considering repeats with <40 bp per unit. In addition, the number of repeats decreases with increasing unit length. This finding has also been reported by Gibbons and Rokas (30) who analysed tandem repeats in intragenic regions of 10 Aspergillus genomes.

Relative abundance of grouped repeats.
Figure 4.

Relative abundance of grouped repeats.

Variable number of tandem repeats in Trichophyton rubrum

Genome studies on the sources of phenotypic variation have mainly focused on single nucleotide polymorphisms (SNPs) (31). In this study, we intended to identify and describe variable tandem repeats in T. rubrum. We hypothesized that these repeats can influence phenotypes by causing instability in important genes of this organism. Among 10 418 transcribed genes, 453 contain variable repeats (VARScore between 0 and 1) and 68 contain highly variable repeats (VARScore ≥ 1). Supplementary Table S1 (Supplemental Material) lists annotated (tentative) genes containing variable repeats and their respective functional categories. Table 4 shows the variation in tandem repeats between some genes of dermatophytes involved in different processes. The genes rich in variable repeats are related to different biological functions such as transcription factors, cell wall biosynthesis, and cell adhesion as shown in Figure 5.

Table 4.

Variable repeats

T. rubrumT. tonsuransT. equinumM. gypseumM. canisT. verrucosumA. benhamiaeGene nameRepeatVARScoreTERG_0877145×23.44.9745×545×9NoNo45×4.5NoTERG_0076812×6.80.33No12×3.8NoNoNo12×5.8TERG_037366×54.71.776×39.76×44.76×48.7No6×28.7NoTERG_051896×43.51.436×42.56×42.56×23.56×12.5NoNoTERG_010423×271.113×153×15NoNo3×203×12
Table 4.

Variable repeats

T. rubrumT. tonsuransT. equinumM. gypseumM. canisT. verrucosumA. benhamiaeGene nameRepeatVARScoreTERG_0877145×23.44.9745×545×9NoNo45×4.5NoTERG_0076812×6.80.33No12×3.8NoNoNo12×5.8TERG_037366×54.71.776×39.76×44.76×48.7No6×28.7NoTERG_051896×43.51.436×42.56×42.56×23.56×12.5NoNoTERG_010423×271.113×153×15NoNo3×203×12

Functional categories of genes containing variable tandem repeats.
Figure 5.

Functional categories of genes containing variable tandem repeats.

Tandem repeats in adhesins

Approximately 10% of all coding sequences of T. rubrum that contain tandem repeats were classified as adhesins by the FaaPred tool and these repeats are strongly related to the adhesion capacity of these proteins (10). Different known fungal adhesins are rich in variable tandem repeats and have been extensively studied in Candida albicans. In the ALS family of C. albicans, Hoyer et al. (32) found the number of copies of the tandem repeat in the central domain of each ALS gene to vary between isolates. Oh et al. (33) showed that adhesins with more repeat units have a greater adhesion capacity than those with fewer repeat units. In Aspergillus fumigatus, Levdansky et al. (34) demonstrated that genes containing tandem repeats play an important role in the pathogen–host interaction. The authors disrupted the Afu3g08990 gene, which contains an 18-bp tandem repeat unit that repeats itself 32 times. Suppression of the protein previously characterized as hypothetical resulted in a phenotype with lower adhesion capacity.

Conclusion

The results of the present study enabled the identification and categorization of different genes containing variable repetitive regions in T. rubrum. The genes rich in variable tandem repeats are related to different biological functions such as transcription factors, cell wall biosynthesis, and cell adhesion. The database for analysis of tandem repeats in dermatophytes allowed access to these repetitive patterns in coding regions of the genome of recently sequenced dermatophytes, permitting a better understanding of the nature and functional role of genes containing tandem repeats. The different tandem repeat patterns identified may reveal new molecular targets for the discovery of antifungal drugs and should increase our understanding of the role of these repetitive sequences in the pathogenicity of dermatophytes.

Supplementary data

Supplementary data are available at Database Online.

Acknowledgements

This study was supported by grants from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, Grants 2014/23841-3 and 2012/03845-9), a doctoral fellowship from FAPESP granted to T.A.B. (Grant 2012/02920-7), and a doctoral fellowship from CAPES granted to M.E.F. We thank the staff of the Biotechnology Unit, UNAERP, and of Federal Institute of Education, Science and Technology of South of Minas Gerais - IFSULDEMINAS, for their general support.

Conflict of interest. None declared.

References

1

Garber

G.
(2001) An overview of fungal infections.
Drugs
, 61(Suppl 1), 1–12.

2

Weitzman

I.
, Summerbell
R.C.
(1995) The dermatophytes.
Clin. Microbiol. Rev
., 8, 240–259.

3

Leng

W.
, Liu
T.
, Li
R.
(2008) Proteomic profile of dormant Trichophyton rubrum conidia.
BMC Genomics
, 9, 303.

4

Marconi

V.C.
, Kradin
R.
, Marty
F.M.
(2010) Disseminated dermatophytosis in a patient with hereditary hemochromatosis and hepatic cirrhosis: case report and review of the literature.
Med. Mycol
., 48, 518–527.

5

Martinez

D.A.
, Oliver
B.G.
, Graser
Y.
(2012) Comparative genome analysis of Trichophyton rubrum and related dermatophytes reveals candidate genes involved in infection.
mBio
, 3, e00259–e00212.

6

Levdansky

E.
, Kashi
O.
, Sharon
H.
(2010) The Aspergillus fumigatus cspA gene encoding a repeat-rich cell wall protein is important for normal conidial cell wall architecture and interaction with host cells.
Eukaryot. Cell
, 9, 1403–1415.

7

Gemayel

R.
, Cho
J.
, Boeynaems
S.
, Verstrepen
K.J.
(2012) Beyond junk-variable tandem repeats as facilitators of rapid evolution of regulatory and coding sequences.
Genes
, 3, 461–480.

8

Levdansky

E.
, Sharon
H.
, Osherov
N.
(2008) Coding fungal tandem repeats as generators of fungal diversity.
Fungal Biol. Rev
., 22, 85–96.

9

Verstrepen

K.J.
, Klis
F.M.
(2006) Flocculation, adhesion and biofilm formation in yeasts.
Mol. Microbiol
., 60, 5–15.

10

Verstrepen

K.J.
, Jansen
A.
, Lewitter
F.
, Fink
G.R.
(2005) Intragenic tandem repeats generate functional variability.
Nat. Genet
., 37, 986–990.

11

Richard

G.F.
, Dujon
B.
(2006) Molecular evolution of minisatellites in hemiascomycetous yeasts.
Mol. Biol. Evol
., 23, 189–202.

12

Ramana

J.
, Gupta
D.
(2010) FaaPred: a SVM-based prediction method for fungal adhesins and adhesin-like proteins.
PloS One
, 5, e9695.

13

Nenoff

P.
, Kruger
C.
, Ginter-Hanselmayer
G.
, Tietz
H.J.
(2014) Mycology – an update. Part 1: dermatomycoses: causative agents, epidemiology and pathogenesis.
J. Dtsch. Dermatol. Ges
., 12, 188–209. quiz 210, 188–211; quiz 212.

14

Bitencourt

T.A.
, Macedo
C.
, Franco
M.E.
(2016) Transcription profile of Trichophyton rubrum conidia grown on keratin reveals the induction of an adhesin-like protein gene with a tandem repeat pattern.
BMC Genomics
, 17, 249.

15

Benson

G.
(1999) Tandem repeats finder: a program to analyze DNA sequences.
Nucleic Acids Res
., 27, 573–580.

16

Legendre

M.
, Pochet
N.
, Pak
T.
, Verstrepen
K.J.
(2007) Sequence-based estimation of minisatellite and microsatellite repeat variability.
Genome Res
., 17, 1787–1796.

17

Vinces

M.D.
, Legendre
M.
, Caldara
M.
(2009) Unstable tandem repeats in promoters confer transcriptional evolvability.
Science
, 324, 1213–1216.

18

Duitama

J.
, Zablotskaya
A.
, Gemayel
R.
(2014) Large-scale analysis of tandem repeat variability in the human genome.
Nucleic Acids Res
., 42, 5728–5741.

19

Sharon

I.
, Birkland
A.
, Chang
K.
(2005) Correcting BLAST e-values for low-complexity segments.
J. Comput. Biol
., 12, 980–1003.

20

Conesa

A.
, Götz
S.
, Garcia-Gomez
J.M.
(2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research.
Bioinformatics
, 21, 3674–3676.

21

Ashburner

M.
, Ball
C.A.
, Blake
J.A.
(2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.
Nat. Genet
., 25, 25–29.

22

Punta

M.
, Coggill
P.C.
, Eberhardt
R.Y.
(2012) The Pfam protein families database.
Nucleic Acids Res
., 40, D290–D301.

23

Ruepp

A.
, Zollner
A.
, Maier
D.
(2004) The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes.
Nucleic Acids Res
., 32, 5539–5545.

24

Mayer

C.
, Leese
F.
, Tollrian
R.
(2010) Genome-wide analysis of tandem repeats in Daphnia pulex – a comparative approach.
BMC Genomics
, 11, 277.

25

Metzgar

D.
, Bytof
J.
, Wills
C.
(2000) Selection against frameshift mutations limits microsatellite expansion in coding DNA.
Genome Res
., 10, 72–80.

26

Leclercq

S.
, Rivals
E.
, Jarne
P.
(2007) Detecting microsatellites within genomes: significant variation among algorithms.
BMC Bioinformatics
, 8, 125.

27

Karaoglu

H.
, Lee
C.M.
, Meyer
W.
(2005) Survey of simple sequence repeats in completed fungal genomes.
Mol. Biol. Evol
., 22, 639–649.

28

Singh

R.
, Pandey
B.
, Danishuddin
M.
(2011) Mining and survey of simple sequence repeats in wheat rust Puccinia sp.
Bioinformation
, 7, 291–295.

29

Huntley

M.A.
, Clark
A.G.
(2007) Evolutionary analysis of amino acid repeats across the genomes of 12 Drosophila species.
Mol. Biol. Evol
., 24, 2598–2609.

30

Gibbons

J.G.
, Rokas
A.
(2009) Comparative and functional characterization of intragenic tandem repeats in 10 Aspergillus genomes.
Mol. Biol. Evol
., 26, 591–602.

31

Rando

O.J.
, Verstrepen
K.J.
(2007) Timescales of genetic and epigenetic inheritance.
Cell
, 128, 655–668.

32

Hoyer

L.L.
, Green
C.B.
, Oh
S.H.
, Zhao
X.
(2008) Discovering the secrets of the Candida albicans agglutinin-like sequence (ALS) gene family – a sticky pursuit.
Med. Mycol
., 46, 1–15.

33

Oh

S.H.
, Cheng
G.
, Nuessen
J.A.
(2005) Functional specificity of Candida albicans Als3p proteins and clade specificity of ALS3 alleles discriminated by the number of copies of the tandem repeat sequence in the central domain.
Microbiology
, 151, 673–681.

34

Levdansky

E.
, Romano
J.
, Shadkchan
Y.
(2007) Coding tandem repeats generate diversity in Aspergillus fumigatus genes.
Eukaryot. Cell
, 6, 1380–1391.

Author notes

Citation details: Franco, M.E., Bitencourt, T.A., Marins, M. et al.In silico characterization of tandem repeats in Trichophyton rubrum and related dermatophytes provides new insights into their role in pathogenesis. Database (2017) Vol. 2017: article ID bax035; doi:10.1093/database/bax035

© The Author(s) 2017. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data