LeishMicrosatDB: open source database of repeat sequences detected in six fully sequenced Leishmania genomes

Table 1.

Details of the sequenced Leishmania strains: genome version, annotation status and number of chromosomes

Serial number	Parasite name	Strain	RefSeq assembly ID	Number of chromosomes
1	L. donovani	MHOM/NP/2003/BPK282A1	GCF_000227135.1	36
2	L. infantum	MCAN/ES/98/LLM-724(JPCM5)	GCF_000002875.2	36
3	L. braziliensis	MHOM/BR/75/M2904	GCF_000002845.1	35
4	L. major	MHOM/IL/1980/Friedlin	GCF_000002725.1	36
5	L. tarentolae	Parrot-TarII	2011-06-22	36
6	L. mexicana	MHOM/GT/2001/U1103	2013-01-16	34

Results and Discussion

Construction and content of LeishMicrosatDB

MySQL, a relational database management system, was used to build the database. The front-end web interface was developed with HTML, CSS, JavaScript and Perl, using the DBI (Database Interface), GD (graphics library) and CGI (Common Gateway Interface) modules to communicate with the relational tables for data retrieval. The database has a ‘three-tier architecture’ comprising a client/presentation tier, a middle/application tier and a database tier (Figure 1). In the database tier, tables were designed and the relationships among them were defined using unique, primary and foreign keys. The SSRs identified with MISA in the different Leishmania species were stored in separate tables. Each species-specific table contains fields such as chromosome, SSR_type, SSR_motif, Rep_no, Length, Start, End, Left_flank_seq, Right_flank_seq, Gene_id and PSSR_ID (Table 2). A PSSR_ID is assigned only to repeats that are polymorphic. The unique PSSR_ID in the ‘PSSR’ table acts as a bridge between the individual SSR tables. The Gene table stores the genomic coordinates of each gene from each species together with its orthologous gene ID. The overall schema, designed for efficient data storage and retrieval, is shown in Figure 2.
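As an illustration of how records with the fields listed above could be derived from a chromosome sequence, the following minimal Python sketch finds perfect mono- to hexanucleotide repeats with a regular expression. It is not MISA and not the pipeline used for LeishMicrosatDB; the minimum repeat thresholds, the function names and the dictionary keys (borrowed from the field names in Table 2) are assumptions made for the example.

```python
import re

# Hypothetical minimum repeat counts per motif length (1 = mono ... 6 = hexa).
# These thresholds are assumptions, not the settings used for LeishMicrosatDB.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_perfect_ssrs(seq, flank=250):
    """Return perfect SSR records with 1-based inclusive coordinates and flanks."""
    seq = seq.upper()
    records = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # e.g. 'AA' is reported as a mononucleotide repeat instead
            start, end = m.start(), m.end()
            records.append({
                "Type": unit_len,
                "Ssr": motif,
                "RepNo": (end - start) // unit_len,
                "Length": end - start,
                "Start": start + 1,
                "End": end,
                "Upstream": seq[max(0, start - flank):start],
                "Downstream": seq[end:end + flank],
            })
    return sorted(records, key=lambda r: r["Start"])


if __name__ == "__main__":
    # Invented toy sequence: an (AC)8 repeat between two short flanks.
    for rec in find_perfect_ssrs("GGTCTG" + "AC" * 8 + "TGGTCA"):
        print(rec)
```

The flank length defaults to 250 bp to mirror the flanking sequence length reported by the database; near chromosome ends the sketch simply returns a shorter flank.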

Figure 1.

Three tier architecture of LeishMicrosatDB.

Figure 2.

Architecture and data flow representation in LeishMicrosatDB.

Table 2.

Structure of the tables used in the construction of LeishMicrosatDB

Field information	Field name	Data type	Key	Example
Serial number	Sn	Int(20)	PRI	203
Chromosome number	Chromosome	Varchar(2)		11
Repeat type	Type	Varchar(1)		1, 2, 3, 4, 5, 6
SSR motif	Ssr	Varchar(15)		ACG, GA, AGGCTGA
Repeat number	RepNo	Int(11)		12, 10
Total length of the repeated sequence	Length	Int(11)		30, 22
Start coordinate of the SSR	Start	Int(10)		10 223, 331 201
End coordinate of the SSR	End	Int(10)		208 871, 345 129
Left flanking sequence	Upstream	Varchar(250)		AGGCTAG … AGGTAGC
Right flanking sequence	Downstream	Varchar(250)		AGCTTAG … AGTAGCAA
Gene information, if the SSR lies within a gene	CodingStatus	Varchar(15)		LTR1234.2, nonCoding
Serial number in the polymorphic SSR table (if polymorphic)	PSSR_ID	Int(20)		102, 203
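The table layout can be sketched in a few lines of Python. The example below uses SQLite rather than MySQL so that it runs without a server, and it assumes hypothetical table names (ssr_species_a, ssr_species_b), a PSSR table holding only the identifier, and invented rows; only the column names come from Table 2. It shows how a shared PSSR_ID could link the same polymorphic repeat in two species-specific tables.

```python
import sqlite3

# Column names follow Table 2. "Start" and "End" are quoted because END is an
# SQL keyword. Types are SQLite equivalents of the MySQL types, not the originals.
SSR_COLUMNS = '''
    Sn INTEGER PRIMARY KEY, Chromosome TEXT, Type TEXT, Ssr TEXT,
    RepNo INTEGER, Length INTEGER, "Start" INTEGER, "End" INTEGER,
    Upstream TEXT, Downstream TEXT, CodingStatus TEXT,
    PSSR_ID INTEGER REFERENCES pssr(PSSR_ID)
'''


def build_demo_db():
    """Create an in-memory database with two hypothetical species tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pssr (PSSR_ID INTEGER PRIMARY KEY)")
    for table in ("ssr_species_a", "ssr_species_b"):  # hypothetical names
        conn.execute(f"CREATE TABLE {table} ({SSR_COLUMNS})")
    conn.execute("INSERT INTO pssr VALUES (102)")
    placeholders = ",".join("?" * 12)
    # Invented rows for illustration only.
    conn.execute(
        f"INSERT INTO ssr_species_a VALUES ({placeholders})",
        (1, "11", "2", "GA", 12, 24, 10223, 10246, "ACGT", "TGCA", "nonCoding", 102),
    )
    conn.executemany(
        f"INSERT INTO ssr_species_b VALUES ({placeholders})",
        [
            (7, "11", "2", "GA", 10, 20, 10301, 10320, "ACGT", "TGCA", "nonCoding", 102),
            (8, "11", "3", "ACG", 5, 15, 20000, 20014, "ACGT", "TGCA", "nonCoding", None),
        ],
    )
    return conn


def length_variation(conn):
    """Pair repeats that share a PSSR_ID and report the repeat number in each species."""
    query = """
        SELECT a.PSSR_ID, a.Ssr, a.RepNo, b.RepNo
        FROM ssr_species_a AS a
        JOIN ssr_species_b AS b ON a.PSSR_ID = b.PSSR_ID
        ORDER BY a.PSSR_ID
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(length_variation(build_demo_db()))
```

Rows without a PSSR_ID (non-polymorphic repeats) are left out of the join, which matches the statement that the identifier is available only for polymorphic repeats.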

Web visualization of LeishMicrosatDB

LeishMicrosatDB is likely to be accessed by biologists with broad objectives, primarily to develop molecular markers but also to understand the role of microsatellites in regulating gene expression and genome evolution. The database allows mining of different microsatellites, along with their physical locations on the chromosomes, in six fully sequenced Leishmania species. At present, LeishMicrosatDB holds over 1.73 million repeats covering the six Leishmania genomes. More related genomes will be considered when their whole-genome sequences and .ptt files become available in the public domain.

The web interface of LeishMicrosatDB provides a brief description and links to the page where the user selects the genome and repeat class of interest. The database can be queried for perfect repeats, compound repeats, repeat clusters and polymorphic repeats. Perfect repeats can be searched within a chromosome using input parameters such as repeat type (mono- to hexanucleotide), coding status, repeat unit length and repeat sequence motif. A specific region of the chromosome can be searched by providing its start and end positions. Once the species and chromosome are selected, the remaining fields are set to ‘ALL’ by default. The output is primarily a list of microsatellites annotated for all options of the query form, generated as a hierarchical pre-sorted list. Each repeat carries its genomic location and corresponding indices. The result page gives complete information on the SSR motif together with 250 bp of left and right flanking sequence, which allows the user to design locus-specific primers. This is facilitated by automatic uploading of the repeat and flanking sequences of the selected microsatellite into the Primer3 query form (Figure 3). At the bottom of the result page, a repeat density map shows the distribution of repeats along the chromosome. Apart from simple sequence repeats (perfect repeats), the database can be queried for compound microsatellites (two or more microsatellites found in close proximity) and microsatellite clusters (compound microsatellites interrupted by a few nucleotides). Compound repeats can be sought using the user’s own combination of repeats. For example, to screen chromosome 36 of L. donovani for compound microsatellites that combine di- and trinucleotide repeats with a repeat number greater than three units, the search can be made using the parameters shown in Figure 4. Similarly, repeat clusters can be retrieved by specifying the interruption value.
The polymorphic tab contains a drop-down menu listing all six species. After the target species is selected, the remaining species are automatically filled into the ‘species to consider’ field. A separate option is provided to screen for polymorphic repeats in genic and intergenic regions. The result page reports the number of polymorphic repeats found in the selected species and gives detailed information on the repeat motif, repeat unit, chromosome number, coding status and genomic location of each (Figure 5). Each listed polymorphic repeat is also hyperlinked to Primer3 for primer design. The search methods for perfect repeats, compound repeats, repeat clusters and polymorphic repeats are described in detail in the database tutorial.
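The distinction between compound repeats and repeat clusters can be illustrated with a short Python sketch that groups neighbouring SSRs by the number of intervening bases. The function name, the tuple layout and the reading of the interruption value as a maximum gap in base pairs are assumptions for illustration; the database's own implementation is not described here.

```python
def group_neighbouring_ssrs(ssrs, max_interruption=0):
    """Group SSRs on one chromosome separated by at most max_interruption bases.

    ssrs: iterable of (start, end, motif) tuples with 1-based inclusive coordinates.
    Returns only the groups that contain two or more SSRs.
    """
    groups, current = [], []
    for ssr in sorted(ssrs):
        # Bases lying strictly between the previous SSR's end and this SSR's start.
        if current and ssr[0] - current[-1][1] - 1 <= max_interruption:
            current.append(ssr)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [ssr]
    if len(current) > 1:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Invented coordinates for illustration only.
    ssrs = [(100, 111, "AC"), (112, 126, "GAT"), (130, 141, "TG"), (500, 515, "CT")]
    print(group_neighbouring_ssrs(ssrs))                      # directly adjacent SSRs
    print(group_neighbouring_ssrs(ssrs, max_interruption=5))  # up to 5 bases between SSRs
```

With a gap of zero only directly adjacent repeats are joined; raising the allowed gap merges repeats separated by a few nucleotides into a single group.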

Figure 3.

Results page displaying repeat information along with the left and right flanking sequences and the Primer3Plus primer generation tool.

Figure 4.

Results page displaying compound repeats of any dinucleotide and trinucleotide repeat combination on chromosome 36 of L. donovani.

Figure 5.

Overview of the retrieval of polymorphic repeats, illustrated with screenshots of the relevant pages. (A) Main page, where the species can be selected; (B) overall information on the polymorphic repeats; (C) detailed information on the polymorphic repeats.

Leishmania genomes vary greatly in microsatellite repeat composition, diversity and distribution. To show the frequency and composition of the different types of repeat motif in the database, a dedicated ‘statistics’ section has been incorporated, comprising (i) overall statistics, (ii) polymorphic SSR statistics and (iii) comparative statistics, each accessible through a separate tab. The overall statistics tab displays chromosome-wise repeat statistics for each genome, whereas the polymorphic SSR statistics tab displays only the distribution of polymorphic repeats. The comparative statistics tab leads to a repeat summary page giving a detailed illustration of the repeat distribution. The repeat occurrence graph and table are generated dynamically from the repeat information using the GD module (Figure 6). Several microsatellite databases for various organisms (Table 3) have appeared in recent years and provide important data for the comparative analysis of microsatellite distribution in eukaryotic genomes; however, none of them reports length variation of SSRs across genomes. LeishMicrosatDB provides both comparative statistics and length variation across genomes. The identification of polymorphic repeats and their comparative study may have a range of potential applications.
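A comparative summary of this kind amounts to counting repeats per species and repeat type. The sketch below shows one way to build such a table in Python; the species labels and records are invented, and the function name is hypothetical rather than part of LeishMicrosatDB.

```python
from collections import Counter, defaultdict

TYPE_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def comparative_counts(records):
    """Count repeats per species and repeat type.

    records: iterable of (species, repeat_type) pairs, repeat_type in 1..6.
    Returns {species: {type_name: count}} with every type present (0 if absent).
    """
    counts = defaultdict(Counter)
    for species, repeat_type in records:
        counts[species][TYPE_NAMES[repeat_type]] += 1
    return {
        species: {name: per_type[name] for name in TYPE_NAMES.values()}
        for species, per_type in counts.items()
    }


if __name__ == "__main__":
    # Invented records for illustration only.
    demo = [("species_a", 2), ("species_a", 2), ("species_a", 3),
            ("species_b", 1), ("species_b", 2)]
    for species, row in comparative_counts(demo).items():
        print(species, *(f"{name}={n}" for name, n in row.items()), sep="\t")
```

Filling in zero counts for absent repeat types keeps the rows aligned, which is convenient when the same table feeds both a tabular and a graphical view.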

Figure 6.

Tabular and graphical representation of microsatellite repeats comparison.

Table 3.

Comparison of eukaryotic microsatellite databases available in the public domain

Database	Simple repeats	Compound repeats	Clustering information	Flanking sequences	Polymorphic information	Genomic repeats	Primer design	Comparative statistics	Coverage
MMDBJ (17)	Y	N	N	N	Y	Y	N	N	Mouse
InsatDB (18)	Y	Y	N	Y	N	Y	Y	N	5 insect genomes
MRD (19)	Y	N	N	Y	N	N	N	N	8 eukaryotic genomes
SSRD (20)	Y	N	N	Y	N	N	N	N	Human
EuMicrosat db (21)	Y	Y	Y	Y	N	Y	Y	N	31 eukaryotic genomes
FishMicrosat (22)	Y	Y	Y	N	N	Y	Y	N	36 fish genomes
LeishMicrosatDB	Y	Y	Y	Y	Y	Y	Y	Y	6 Leishmania genomes

Conclusion

LeishMicrosatDB has been developed as a complete, curated, web-oriented relational database of perfect, compound, cluster and polymorphic repeats in six sequenced Leishmania genomes. The database offers parasitologists a platform for understanding the diseases by exploiting the utility of these repeats. Various input parameters can be used for a comprehensive search of simple, compound, polymorphic and clustered repeats. The database may also serve as a tool for studying the relative occurrence and distribution of microsatellites across the parasite genomes. Repeats in the coding regions of genes may prove more useful for gene tagging and for studying their functional role in evolutionary analyses, and all of this information may serve as important input for designing experiments in new directions and for elucidating novel roles and functions of different kinds of repeats. We anticipate that the main application of this database will be the development of mapped markers for specific applications such as association studies and the search for recombination within chromosomes.

Availability

LeishMicrosatDB can be accessed freely at http://biomedinformri.com/leishmicrosat

Acknowledgements

The authors wish to express their gratitude to Farheen Wazri, Dr Sindhuprava Rana and Md. Yusuf Ansari for their support in developing the database and revising the manuscript. The authors thank Dr Harpreet Singh, Scientist D, ICMR, New Delhi, for helping us set up our biomedical informatics department at RMRIMS, Patna, India.

Funding

The work is funded by the Indian Council of Medical Research (ICMR), India. Funding for open access charge: Indian Council of Medical Research (ICMR).

Conflict of interest. None declared.

References

Katti M.V., Ranjekar P.K., Gupta V.S. (2001) Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol. Biol. Evol. 1161–1167

Sharma P.C., Grover, Kahl (2007) Mining microsatellites in eukaryotic genomes. Trends Biotechnol. 490–498

Mukherjee, Langston L.D., Ouellette (2011) Intra chromosomal tandem duplication and repeat expansion during attempts to inactivate the subtelomeric essential gene GSH1 in Leishmania. Nucleic Acids Res. 7499–7511

Ubeda J.M., Légaré, Raymond et al. (2008) Modulation of gene expression in drug resistant Leishmania is associated with gene amplification, gene deletion and chromosome aneuploidy. Genome Biol. R115

Zhang, Shen et al. (2004) Preliminary study on applicability of microsatellite DNA primers from parasite protozoa Trypanosoma cruzi in free-living protozoa. J. Ocean Univ. China –

Duhagon M.A., Smircich, Forteza et al. (2011) Comparative genomic analysis of dinucleotide repeats in Tritryps. Gene 487 –

Ochsenreither, Kuhls, Schaar et al. (2006) Multilocus microsatellite typing as a new tool for discrimination of Leishmania infantum MON-1 strains. J. Clin. Microbiol. 495–503

Kebede, Oghumu, Worku et al. (2013) Multilocus microsatellite signature and identification of specific molecular markers for Leishmania aethiopica. Parasit. Vectors 160

Gouzelou, Haralambous, Amro et al. (2012) Multilocus microsatellite typing (MLMT) of strains from Turkey and Cyprus reveals a novel monophyletic L. donovani sensu lato group. PLoS Negl. Trop. Dis. e1507

Kuhls, Alam M.Z., Cupolillo et al. (2011) Comparative microsatellite typing of new world Leishmania infantum reveals low heterogeneity among populations and its recent old world origin. PLoS Negl. Trop. Dis. e1155

Kuhls, Cupolillo, Silva S.O. et al. (2013) Population structure and evidence for both clonality and recombination among Brazilian strains of the subgenus Leishmania (Viannia). PLoS Negl. Trop. Dis. e2490

Bulle, Millon, Bart J.M. et al. (2002) Practical approach for typing strains of Leishmania infantum by microsatellite analysis. J. Clin. Microbiol. 3391–3397

Seridi, Amro, Kuhls et al. (2008) Genetic polymorphism of Algerian Leishmania infantum strains revealed by multilocus microsatellite analysis. Microbes Infect. 1309–1315

Mauricio I.L., Howard M.K., Stothard, Miles M.A. (1999) Genomic diversity in the Leishmania donovani complex. Parasitology 119, 237–246

Hide, Banuls A.L., Tibayrenc (2001) Genetic heterogeneity and phylogenetic status of Leishmania (Leishmania) infantum zymodeme MON-1: epidemiological implications. Parasitology 123, 425–432

Ishikawa E.A., Silveira F.T., Magalhães A.L. et al. (2002) Genetic variation in populations of Leishmania species in Brazil. Trans. R. Soc. Trop. Med. Hyg. 111–121

Cupolillo, Brahim, Toaldo C.B. et al. (2003) Genetic polymorphism and molecular epidemiology of Leishmania (Viannia) braziliensis from different hosts and geographic areas in Brazil. J. Clin. Microbiol. 3126–3132

Jamjoom M.B., Ashford R.W.B., Bates P.A. et al. (2002) Polymorphic microsatellite repeats are not conserved between Leishmania donovani and Leishmania major. Mol. Ecol. Notes 104–106

Schwenkenbecher J.M., Fröhlich, Gehre et al. (2004) Evolution and conservation of microsatellite markers for Leishmania tropica. Infect. Genet. Evol. –105

Jamjoom M.B., Ashford R.W., Bates P.A. et al. (2002) Towards a standard battery of microsatellite markers for the analysis of the Leishmania donovani complex. Ann. Trop. Med. Parasitol. 265–

Rossi, Wincker, Ravel et al. (1994) Structural organization of microsatellite families in the Leishmania genome and polymorphisms at two (CA)n loci. Mol. Biochem. Parasitol. 271–282

Rodriguez, De Lima, Rodriguez et al. (1997) Genomic DNA repeat from Leishmania (Viannia) braziliensis (Venezuelan strain) containing simple repeats and microsatellites. Parasitology 115, 349–358

Russell, Iribar M.P., Lambson et al. (1999) Intra and interspecific microsatellite variation in the Leishmania subgenus Viannia. Mol. Biochem. Parasitol. 103 –

Rougeron, Waleckx, Hide et al. (2008) A set of 12 microsatellite loci for genetic studies of Leishmania braziliensis. Mol. Ecol. Resources 351–353

Fakhar, Motazedian M.H., Daly et al. (2008) An integrated pipeline for the development of novel panels of mapped microsatellite markers for Leishmania donovani complex, Leishmania braziliensis and Leishmania major. Parasitology 135, 567–574

Mouse Microsatellite Data Base of Japan (MMDBJ) http://www.shigen.nig.ac.jp/mouse/mmdbj

Archak, Meduri, Kumar P.S., Nagaraju (2007) InSatDb: a microsatellite database of fully sequenced insect genomes. Nucleic Acids Res. D36–D39

Subramanian, Madgula V.M., George et al. (2002) MRD: a microsatellite repeats database for prokaryotic and eukaryotic genomes. Genome Biol. PREPRINT0011

Subramanian, Madgula V.M., George et al. (2003) SSRD: simple sequence repeats database of the human genome. Comp. Funct. Genomics 342–345

Aishwarya, Grover, Sharma P.C. (2007) EuMicroSatdb: a database for microsatellites in the sequenced genomes of eukaryotes. BMC Genomics 225