CyanoSat: A Database of Cyanobacterial Perfect and Imperfect Microsatellites


Simple Sequence Repeats (SSRs; Jacob et al. 1991), also known as microsatellites (Litt and Luty 1989), are stretches of DNA sequence consisting of short tandem repeats. Based on the length of the repeating unit, SSRs can be categorized into mono- (A)n, di- (GT)n, tri- (CTC)n, tetra- (GATA)n, penta- (ATCGC)n and hexa- (ATTGCC)n nucleotide repeats, where n is the number of times the motif is repeated within the SSR locus. An SSR can be further classified as perfect [without interruptions; (GTG)15], imperfect [interrupted by non-repeat nucleotides; (GTG)7G(GTG)8] or compound [two or more SSRs adjacent to one another; (GTG)8(AT)16] (Bachmann and Bare 2004). Additionally, a compound SSR can be categorized as perfect compound [(GT)n(AG)n] or overlapping compound [the last few bases of one SSR overlap with the start of the next; (ACC)n(CT)n].
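The bracket notation used above can be read mechanically. The following Python sketch is only an illustration of the classification scheme and is not part of CyanoSat; the function names `expand` and `classify` are hypothetical. It assumes numeric repeat counts (not the symbolic n) and treats any two different adjacent motifs as compound, without distinguishing perfect from overlapping compound SSRs.

```python
import re

# Names of the six repeat classes, keyed by motif length in nucleotides.
UNIT_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def expand(notation):
    """Expand a notation such as '(GTG)7G(GTG)8' into the plain DNA sequence."""
    parts = []
    # Each match is either a '(motif)count' block or a run of literal bases.
    for motif, count, literal in re.findall(r"\(([ACGT]+)\)(\d+)|([ACGT]+)", notation):
        parts.append(motif * int(count) if motif else literal)
    return "".join(parts)


def classify(notation):
    """Return (kind, repeat classes) for an SSR written in bracket notation."""
    blocks = re.findall(r"\(([ACGT]+)\)(\d+)", notation)
    # Whatever remains after removing the repeat blocks is an interruption.
    leftovers = re.sub(r"\([ACGT]+\)\d+", "", notation)
    motifs = list(dict.fromkeys(m for m, _ in blocks))  # unique, order kept
    if len(motifs) > 1:
        kind = "compound"
    elif leftovers or len(blocks) > 1:
        kind = "imperfect"
    else:
        kind = "perfect"
    return kind, [UNIT_NAMES[len(m)] for m in motifs]


if __name__ == "__main__":
    for example in ("(GTG)15", "(GTG)7G(GTG)8", "(GTG)8(AT)16"):
        print(example, classify(example), len(expand(example)))
```

Working on the notation rather than on raw sequence keeps the example short; detecting these classes in a genome requires a scanner such as the tools described below.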

SSRs are found in both prokaryotic and eukaryotic organisms and are widely distributed throughout the genome, in coding as well as non-coding regions. They are highly polymorphic and can be used as genetic markers. Conventional laboratory methods for SSR mining are laborious and costly. However, computational mining of sequences available in biological databases allows quick and inexpensive SSR extraction (Shanker et al. 2007).

Cyanobacteria, also known as blue-green algae, are photosynthetic bacteria. They are quite small and usually unicellular, but often grow in colonies. Cyanobacteria play a very important role in nitrogen fixation and are also responsible for oxygen evolution (Frank et al. 2003). The chloroplasts of plants are considered to have originated from cyanobacteria through endosymbiosis (Martin et al. 2002). Cyanobacteria have applications in various fields, including aquaculture, wastewater treatment, food and fertilizers. The availability of complete cyanobacterial genome sequences in biological databases provides an opportunity to determine the frequency and distribution of SSRs in these organisms.

CyanoSat is an attempt to provide information on SSRs present in completely sequenced genomes of cyanobacteria. Perfect and compound SSRs were mined using the Microsatellite Identification Tool (MISA), and Imperfect Microsatellite Extractor (IMEx; Mudunuri and Nagarajaram 2007) was used to extract imperfect SSRs. The minimum length criteria for the different repeat types were >=12 for mono-, >=6 for di-, >=4 for tri- and >=3 for tetra-, penta- and hexa-nucleotide repeats. The maximum difference between two SSRs in a compound SSR was taken as 0. The mismatches allowed for imperfect SSRs were 1 nucleotide for mono-, di- and tri-, 2 for tetra- and penta-, and 3 for hexa-nucleotide repeats, with 10% imperfection. Primer3, with its default parameters, was used to design PCR primers for the identified SSRs, considering 200 base pairs of flanking region.

The parsed data were used to develop CyanoSat, an easy-to-use, interactive relational database of cyanobacterial SSRs. Users can retrieve SSR frequency according to repeat type (mono- to hexa-) and region [coding, non-coding or coding-non-coding (an SSR with some bases in a coding region and the rest in a non-coding region, or vice versa)], along with the average length of SSRs in an organism, their density, primers etc. For coding and coding-non-coding SSRs, the corresponding gene id, protein id and product are also provided. The displayed data can be downloaded from the link given at the top right of the web pages. The workflow of the database is shown in Fig 1.
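To make the minimum length criteria concrete, the following Python sketch scans a sequence for perfect SSRs and extracts flanking regions. It is not the MISA, IMEx or Primer3 code and does not reproduce their behaviour in detail. It assumes the thresholds above are repeat counts per motif length, skips motifs that are themselves repeats of a shorter unit (so an A run is not also reported as an AA repeat), and does not handle compound or imperfect SSRs. The names `find_perfect_ssrs`, `flanks`, `SSR` and `MIN_REPEATS` are hypothetical.

```python
from typing import NamedTuple

# Minimum repeat counts per motif length, taken from the criteria in the text
# (interpreting them as repeat counts is an assumption of this sketch).
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}
FLANK = 200  # flanking bases on each side passed to primer design


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_ssrs(seq):
    """Return perfect SSRs in seq that meet the MIN_REPEATS thresholds."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k * min_rep <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                j = i + k
                while seq.startswith(motif, j):  # extend one unit at a time
                    j += k
                repeats = (j - i) // k
                if repeats >= min_rep:
                    hits.append(SSR(motif, repeats, i, j))
                    i = j  # resume the scan after this SSR
                    continue
            i += 1
    return sorted(hits, key=lambda h: h.start)


def flanks(seq, ssr, width=FLANK):
    """Return the (left, right) flanking sequences of an SSR."""
    return seq[max(0, ssr.start - width):ssr.start], seq[ssr.end:ssr.end + width]


if __name__ == "__main__":
    demo = "CCG" + "A" * 12 + "GGC" + "GT" * 6 + "TTC" + "GATA" * 3 + "CC"
    for hit in find_perfect_ssrs(demo):
        print(hit, flanks(demo, hit, width=3))
```

In a real pipeline the two flanks, together with the SSR itself, would be written to a Primer3 input record; that step is omitted here because it depends on the Primer3 input format.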

We hope that CyanoSat will prove to be a useful resource for cyanobacterial research.

Fig 1. Workflow of CyanoSat.


Bachmann L, Bare PTJ (2004) Allelic variation, fragment length analyses and population genetic model: A case study on Drosophila microsatellites. Zool Syst Evol Research 42: 215-222.
Frank IB, Lundgren P, Falkowski P (2003) Nitrogen fixation and photosynthetic oxygen evolution in cyanobacteria. Res in Microbiol 154: 157-164.
Jacob HJ, Lindpaintner K, Lincoln SE, Kusumi K, Bunker RK, Mao YP, Ganten D, Dzau VJ, Lander ES (1991) Genetic mapping of a gene causing hypertension in the stroke-prone spontaneously hypertensive rat. Cell 67: 213-224.
Litt M, Luty JA (1989) A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am J Hum Genet 44: 397-401.
Martin W, Rujan T, Richly E, Hansen A, Cornelson S, Lins T, Leister D, Stoebe B, Hasegawa M, Penny D (2002) Evolutionary analysis of Arabidopsis, cyanobacterial and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. PNAS 99: 12246-12251.
Mudunuri SB, Nagarajaram HA (2007) IMEx: Imperfect Microsatellite Extractor. Bioinformatics 23: 1181-1187.
Shanker A, Bhargava A, Bajpai R, Singh S, Srivastava S, Sharma V (2007) Bioinformatically mined simple sequence repeats in UniGene of Citrus sinensis. Sci Hort 113: 353-361.

Kabra, R., Kapil, A., Attarwala, K., Rai, P.K. and Shanker, A. World J Microbiol Biotechnol (2016) 32:71