Abstract

A bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modifications in the same region, and is proposed to play a pivotal role in pluripotency in embryonic stem (ES) cells. Identifying bivalent genes and understanding their functions are important for further research on lineage specification and embryo development. Many genome-wide histone modification data sets have been generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repository or analysis tool is currently available for them. In this work, we develop BGDB, the database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, manually collected from the scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of BGDB were implemented with PHP + MySQL + JavaScript and provide diverse query functions.

Database URL: http://dailab.sysu.edu.cn/bgdb/

Introduction

Embryonic stem (ES) cells have the potential to differentiate into every tissue type of the body, and offer an important model for examining transitions of cellular identity in animals (1). It has been suggested that this potential is related to specific histone modifications or characteristic chromatin structure (2–4). Epigenetic regulation of gene expression is thought to be mediated partly by post-translational modifications of histones, which in turn establish different domains of active and inactive chromatin structure. The core histones carry dozens of different modifications, including acetylation, methylation, phosphorylation and ubiquitylation. Methylations of histone H3 at lysine 4 (K4) and lysine 27 (K27) have been shown to be associated with active and repressed states, respectively (5). These methylations are catalyzed by Trithorax- and Polycomb-group proteins and play key roles in lineage-specific developmental functions (6). Trithorax-associated H3K4 trimethylation (H3K4me3) positively regulates transcription by recruiting nucleosome remodeling enzymes and histone acetylases (7–9), whereas Polycomb-associated H3K27 trimethylation (H3K27me3) negatively regulates transcription by promoting a compact chromatin structure (10, 11). The colocalization of these H3K4 and H3K27 histone methylations, termed ‘bivalent domains’, was found in ES cells by mapping the mouse genome (12, 13). This modification pattern is observed in clusters of homeobox genes and other genes related to early embryonic development (12). The bivalent domains are proposed to silence key developmental genes in ES cells while keeping them poised for later activation, and the developmental genes marked by bivalent modifications are termed bivalent genes (14).
Whole-genome mapping found that H3K4me3 peaks were enriched in the region within 2 kb of the TSS of RefSeq annotations, and that H3K27me3 peaks were also enriched in a wider band centered on the TSS; moreover, most H3K27me3 peaks localized to promoters that were already marked with H3K4me3, suggesting that bivalent modification of the same promoter is the rule in ES cells rather than the exception (15).

Genome-wide analyses of H3K4me3 and H3K27me3 in human and mouse ES cells identified several thousand genes marked with both trimethylations (15–20). These studies used diverse experimental approaches, such as hybridization to whole-genome microarrays (15), ChIP coupled with paired-end ditag sequencing (16) and single-molecule sequencing (18). Despite the different ES cell lines and experimental methods used, these studies show remarkable consistency in the genes marked with both H3K4me3 and H3K27me3. This high degree of consistency indicates that the data are reliable, especially for genes with bivalent domains identified by at least two independent experiments.

Because recent advances in high-throughput techniques such as genomic tiling microarrays and deep sequencing have uncovered a vast number of bivalent genes, there is a pressing need to collect the experimental data and provide an up-to-date, comprehensive resource for the community. Given these considerations, we have developed a novel database called ‘Bivalent Genes Database’ (BGDB) to store the sequences of bivalent genes and associated information from all studies published to date. In BGDB, we manually curated 3913 bivalent genes in human ES cells and 2984 genes in mouse ES cells (Table 1), including the primary references and other annotations of these genes. Furthermore, we found that 1604 genes have the same gene name in human and mouse ES cells (Table 1). Additionally, based on the gene ontology (GO) annotations, we analyzed the functional diversity and regulatory roles of bivalent genes. Taken together, BGDB offers an integrated resource for bivalent genes and provides valuable information not only to stem cell biologists but also to researchers generally interested in gene expression regulation.

Table 1.

Data statistics of the BGDB

Organism | Gene number | Percentage (%)
Homo sapiens | 3913 | 56.7
Mus musculus | 2984 | 43.3
Total | 6897 | 100
Both (a) | 1604 | 23.3

(a) Genes with the same name in both Homo sapiens and Mus musculus ES cells.

Database construction and content

The primary motivation of BGDB is to collect and maintain a high-quality database of bivalent genes that serves as an integrated, classified and well-annotated resource. The data generation flow of BGDB is briefly illustrated in Figure 1. It is composed of three primary components: data processing, integration of external databases, and storage of structural and functional annotations in the database. To ensure the quality of BGDB, we first searched PubMed with the major keywords ‘bivalent gene’ and ‘bivalent domain’. To avoid missing data, we next searched PubMed with the keywords ‘H3K4 H3K27’ and ‘H3K4me3 H3K27me3’. Taking these four queries together, we collected and downloaded bivalent domain data for further manual review and curation. The search results are shown in Table 2.
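
The four keyword searches could be reproduced programmatically through the ESearch endpoint of NCBI's E-utilities. The sketch below only builds the request URLs and does not contact the server. How the authors actually entered and quoted their keywords is not stated, so passing each phrase as a plain `term` is an assumption, and counts returned today would differ from those in Table 2.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The four keyword queries named in the text.
KEYWORDS = [
    "bivalent gene",
    "bivalent domain",
    "H3K4 H3K27",
    "H3K4me3 H3K27me3",
]

def esearch_url(term: str) -> str:
    """Build a PubMed ESearch URL; retmax=0 asks for the hit count without ID lists."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    return f"{ESEARCH}?{urlencode(params)}"

urls = {kw: esearch_url(kw) for kw in KEYWORDS}
for kw, url in urls.items():
    print(f"{kw!r}: {url}")
```

Fetching each URL would return a JSON document whose hit count can be compared across the four queries before the manual review step.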

Figure 1. The data generation flow of the BGDB database.

Table 2.

Search results in PubMed

Keywords | Number of articles
Bivalent gene | 820
Bivalent domain | 405
H3K4 H3K27 | 142
H3K4me3 H3K27me3 | 204

The 10 articles that contain the most bivalent genes are shown in Supplementary Table S1.

To curate bivalent gene data from the literature, we manually collected genes with bivalent domains and mapped the gene names to Entrez gene IDs. We then used the Entrez gene IDs as the initial key to cross-link the same genes across different external databases. To avoid gene symbol ambiguity caused by synonyms, we obtained up-to-date official gene symbols from HGNC (21) and MGI (22) for human and mouse genes, respectively. To better characterize the function and structure of these bivalent genes, we collected extensive functional information as follows: basic gene information such as gene name, sequence and summary from the Entrez gene database (23); gene product characteristics from GO (24); and protein information related to each gene from UniProtKB (25).
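
The synonym-resolution step can be sketched as a lookup from every reported name to one official symbol and Entrez gene ID. This is a minimal illustration, not the authors' curation pipeline: the two records are a hand-written excerpt, the identifiers and synonyms are illustrative and should be checked against HGNC and Entrez Gene, and the function names are hypothetical.

```python
# Hand-written illustrative records: official symbol -> Entrez ID and synonyms.
# These values are examples only; verify them against HGNC / Entrez Gene.
OFFICIAL = {
    "GRK4": {"entrez_id": 2868, "synonyms": ["GPRK4", "GPRK2L"]},
    "HOXA1": {"entrez_id": 3198, "synonyms": ["HOX1F"]},
}

def build_alias_index(official):
    """Map each upper-cased symbol or synonym to the official symbols it may denote."""
    index = {}
    for symbol, record in official.items():
        for name in [symbol, *record["synonyms"]]:
            index.setdefault(name.upper(), set()).add(symbol)
    return index

def resolve(name, official, index):
    """Return (official_symbol, entrez_id), or None if the name is unknown or ambiguous."""
    hits = index.get(name.strip().upper(), set())
    if len(hits) != 1:
        return None  # unknown name, or a synonym shared by several genes
    symbol = next(iter(hits))
    return symbol, official[symbol]["entrez_id"]

index = build_alias_index(OFFICIAL)
reported = ["Gprk4", "HOXA1", "hox1f", "NOT_A_GENE"]
resolved = {name: resolve(name, OFFICIAL, index) for name in reported}
print(resolved)
```

Names that resolve to `None` are the ones that would need manual review, which is where ambiguous synonyms are caught rather than silently merged.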

In BGDB, we manually curated 6897 bivalent genes from the scientific literature in PubMed. Not surprisingly, many bivalent genes were experimentally identified in at least two independent articles. There are 3205 (∼46.5%) genes that are cross-validated in two distinct studies and 1165 (∼16.9%) genes in more than two studies (Figure 2). That ∼63% of the genes are supported by at least two studies indicates the reliability of the database.
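
The percentages quoted above follow directly from the counts. The short check below recomputes them; the single-study count is derived here on the assumption that the three categories of Figure 2 (1, 2 and >2 references) partition all 6897 genes, and it is not a number stated in the text.

```python
TOTAL = 6897             # curated bivalent genes
IN_TWO = 3205            # genes reported in exactly two studies
IN_MORE_THAN_TWO = 1165  # genes reported in more than two studies

pct_two = 100 * IN_TWO / TOTAL
pct_more = 100 * IN_MORE_THAN_TWO / TOTAL
pct_cross_validated = pct_two + pct_more

# Derived, assuming the three categories cover every gene exactly once.
in_one = TOTAL - IN_TWO - IN_MORE_THAN_TWO

print(f"two studies:        {pct_two:.1f}%")
print(f"more than two:      {pct_more:.1f}%")
print(f"at least two:       {pct_cross_validated:.1f}%")
print(f"single study only:  {in_one} genes")
```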

Figure 2. Number of bivalent genes found in 1, 2 and >2 references.

The annotations of each bivalent gene are described by the fields shown in Table 3. We built a MySQL relational database with two tables to store all the gene information. GO information, including GO ID, GO term and GO category, is stored in the ‘Genes_go’ table. The ‘Genes’ table, which is defined as the parent table, contains the other information. To enhance database normalization, we made the ‘Entrez ID’ field in ‘Genes_go’ a foreign key that references the ‘Genes’ table. To provide a fast BLAST sequence alignment service, we also set up a local BLAST database and integrated the local BLAST application into the web service. The web interface for searching and browsing was implemented in PHP and JavaScript.
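
The parent/child layout described above can be sketched as follows. This is an illustration under stated assumptions, not the BGDB schema: SQLite (from the Python standard library) stands in for MySQL, only the table names ‘Genes’ and ‘Genes_go’ come from the text, the column names are hypothetical, and the example gene and its GO association are placeholders.

```python
import sqlite3

# SQLite stands in for MySQL; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE Genes (
        entrez_id   INTEGER PRIMARY KEY,
        gene_symbol TEXT NOT NULL,
        organism    TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE Genes_go (
        entrez_id   INTEGER NOT NULL REFERENCES Genes(entrez_id),
        go_id       TEXT NOT NULL,
        go_term     TEXT NOT NULL,
        go_category TEXT NOT NULL,
        PRIMARY KEY (entrez_id, go_id)
    )""")

# Placeholder gene; its link to this GO term is invented for illustration.
conn.execute("INSERT INTO Genes VALUES (1, 'GENE1', 'Homo sapiens')")
conn.execute(
    "INSERT INTO Genes_go VALUES (1, 'GO:0009952', "
    "'anterior/posterior pattern specification', 'biological process')"
)

# Join the child table back to its parent through the foreign key.
rows = conn.execute("""
    SELECT g.gene_symbol, a.go_id
    FROM Genes AS g JOIN Genes_go AS a ON a.entrez_id = g.entrez_id
""").fetchall()

# A GO row that points at a gene missing from the parent table is rejected.
try:
    conn.execute(
        "INSERT INTO Genes_go VALUES (999, 'GO:0030182', "
        "'neuron differentiation', 'biological process')"
    )
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(rows, fk_rejected)
```

Keeping GO annotations in a separate table lets one gene carry any number of GO terms without repeating the gene-level fields, which is the normalization benefit the text refers to.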

Table 3.

Description of fields used to annotate bivalent gene

Field name | Description
ID | Unique database identifier for the bivalent gene
Gene symbol | Approved symbol for the bivalent gene
Gene full name | Approved full gene symbol
Gene type | Biotype of the bivalent gene
Organism | Organism of the bivalent gene
Gene synonym | Other gene names used for the bivalent gene
Summary | Descriptive text about the gene
Reference | Articles that reported the bivalent gene
HGNC/MGI ID | HGNC ID for human bivalent gene, and MGI ID for mouse
Entrez ID | External link to Entrez gene
Ensembl ID | External link to Ensembl
UniProtKB ID | External link to UniProtKB
UCSC link | External link to UCSC
Gene ontology | The specific GO terms are listed by source of the information, category and term. Each GO term supports a link to the AmiGO browser
Genomic location | Genomic location of the bivalent gene
RefSeq ID | Reference sequence ID
Nucleotide sequence | Nucleotide sequence of the bivalent gene
Protein sequence | Protein sequence of the bivalent gene

Usage

To facilitate the use of the BGDB resource, we developed a user-friendly web interface for users to search and browse its content. The search page (http://dailab.sysu.edu.cn/bgdb/database.php) provides an interface for searching BGDB by keywords such as gene symbol, gene alias, reference sequence ID or UniProt ID. For example, if the keyword ‘GRK4’ is entered (Figure 3A), the query result is shown in tabular format, with the columns BGDB ID, gene symbol, gene full name, organism and gene alias (Figure 3B). Clicking the BGDB ID link (BGNO_002517) displays the detailed information for GRK4 (Figure 3C). The gene information, including gene symbol, full name, summary and relevant references, is provided. The gene sequence, protein sequence, GO annotation, genomic location and some useful external links are also presented. All output columns are described in Table 3.

Figure 3. Representative screenshots of BGDB. (A) Users could input ‘GRK4’ for querying. (B) The results will be shown in a tabular format. Users could click on the BGDB ID (BGDB-002517) to view the detailed information. (C) The detailed information of bivalent gene GRK4. The nucleotide and protein sequence are also presented.

Furthermore, the BGDB web interface provides three advanced options: (i) batch search, (ii) BLAST search and (iii) browse (Supplementary Figure S1). (i) Batch search: users can query gene data for a batch of keywords at once, with the results shown on one screen (Supplementary Figure S1A). (ii) BLAST search: users can use an online BLAST interface to input a sequence of interest in FASTA format and search it against all nucleotide or protein sequences in our database (Supplementary Figure S1B). (iii) Browse: instead of searching for specific genes, users can list all BGDB entries by organism name (Supplementary Figure S1C).

For advanced bioinformatics users, all search results with related annotation, including nucleotide and protein sequences, GO and literature, can be exported in Excel format. Additionally, users can download the whole BGDB database in MySQL format (Supplementary Figure S1D).

Discussion

Recent genome-wide analyses of H3K4me3 and H3K27me3 in human and mouse ES cells have revealed several thousand bivalent genes, but mapping chromatin modifications across the genome is only the first step toward understanding the mechanism of gene regulation in pluripotent stem cells. Because a database provides a high-quality benchmark for further experimental and computational designs, we focused on data collection and manually curated 6897 bivalent genes in this work. With this large amount of bivalent gene information, we had the opportunity to analyze the abundance and functional diversity of bivalent genes.

To gain insight into the functional distribution of GO terms, we conducted enrichment tests on the bivalent genes in BGDB. First, the GO annotations in GAF 2.0 file format were downloaded from UniProt-GOA (24, 26); second, the columns of gene symbol, GO ID, GO term and GO category were extracted and stored in the database. Then, considering each GO term together with the genes directly associated with it, we mapped the terms to bivalent genes through the gene symbol column provided in the GO annotation. Using the human genome as background, we identified overrepresented biological processes, molecular functions and cellular components among the bivalent genes of BGDB with the hypergeometric distribution (P < 0.001, calculated by Fisher's exact test). The five most enriched GO terms in each category are shown in Table 4. This analysis revealed several interesting results. For example, four of the most overrepresented biological processes (anterior/posterior pattern specification, neuron differentiation, neuron migration and central nervous system development) indicate that bivalent genes are enriched for genes involved in system development and cell differentiation (Table 4), which is in accordance with the role of bivalent genes in ES cells. This enrichment result is consistent with a previous study (13). Also, four of the most enriched cellular components (axon, dendrite, neuronal cell body and postsynaptic membrane) suggest that bivalent genes are enriched in neuronal compartments (Table 4). One possible explanation is that the neuron is an important cell type during ES cell differentiation. In addition, the statistical analysis of molecular functions suggests that bivalent genes modulate enzyme activity and protein interaction ability (Table 4). For mouse bivalent genes in BGDB, a similar conclusion can be drawn. The five most overrepresented GO terms of mouse bivalent genes are detailed in Supplementary Table S2.
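
The enrichment calculation can be sketched with the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test. The counts below are those reported in Table 4 for anterior/posterior pattern specification. The background size `N` is not stated in the text; the value used here is an assumption back-calculated from the percentages in Table 4, so the resulting P-value is illustrative and is not expected to reproduce the tabulated one.

```python
from math import comb

def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    denominator = comb(N, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return numerator / denominator  # exact integers, converted to float once

def enrichment_ratio(k: int, n: int, K: int, N: int) -> float:
    """Annotated fraction among bivalent genes divided by the fraction in the background."""
    return (k / n) / (K / N)

# Anterior/posterior pattern specification (GO:0009952), counts from Table 4.
k = 70      # bivalent genes annotated with the term
K = 102     # background genes annotated with the term
n = 3913    # human bivalent genes in BGDB
N = 18850   # ASSUMPTION: background size, inferred from Table 4 percentages

e_ratio = enrichment_ratio(k, n, K, N)
p_value = hypergeom_upper_tail(k, K, n, N)

print(f"E-ratio = {e_ratio:.2f}")
print(f"P(X >= {k}) = {p_value:.3g}")
print("enriched at P < 0.001:", p_value < 0.001)
```

Working with exact binomial coefficients avoids floating-point underflow in the tail sum; for many terms at once, a log-space implementation would be faster.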

Table 4.

The top five most enriched GO terms of biological processes, molecular functions and cellular components in human bivalent genes

Description of GO term | Bivalent gene, n (%) (a,b) | Genome, n (%) | E-ratio (c) | P-value
The top five most enriched biological processes
    Anterior/posterior pattern specification (GO:0009952) | 70 (1.79) | 102 (0.54) | 3.31 | 2.72E-13
    Neuron differentiation (GO:0030182) | 46 (1.18) | 81 (0.43) | 2.74 | 3.30E-07
    Negative regulation of canonical Wnt receptor signaling pathway (GO:0090090) | 46 (1.18) | 79 (0.42) | 2.81 | 1.39E-07
    Neuron migration (GO:0001764) | 56 (1.43) | 100 (0.53) | 2.70 | 2.25E-08
    Central nervous system development (GO:0007417) | 58 (1.48) | 108 (0.57) | 2.59 | 3.57E-08
The top five most enriched molecular functions
    RNA polymerase II distal enhancer sequence-specific DNA binding transcription factor activity (GO:0003705) | 62 (1.58) | 108 (0.57) | 2.77 | 2.00E-09
    Metal ion binding (GO:0046872) | 552 (14.11) | 1050 (5.57) | 2.53 | 8.35E-68
    Sequence-specific DNA binding (GO:0043565) | 260 (6.64) | 536 (2.84) | 2.34 | 2.70E-27
    Transcription factor binding (GO:0008134) | 116 (2.96) | 280 (1.49) | 2.00 | 2.18E-09
    Protein dimerization activity (GO:0046983) | 64 (1.64) | 159 (0.84) | 1.94 | 2.25E-05
The top five most enriched cellular components
    Voltage-gated potassium channel complex (GO:0008076) | 48 (1.23) | 69 (0.37) | 3.35 | 1.26E-09
    Axon (GO:0030424) | 77 (1.97) | 149 (0.79) | 2.49 | 7.13E-10
    Dendrite (GO:0030425) | 82 (2.10) | 178 (0.94) | 2.22 | 1.18E-08
    Neuronal cell body (GO:0043025) | 104 (2.66) | 223 (1.18) | 2.25 | 1.05E-10
    Postsynaptic membrane (GO:0045211) | 85 (2.17) | 187 (0.99) | 2.19 | 1.43E-08

(a) Num., number of proteins annotated.

(b) Per., percentage of proteins annotated.

(c) E-ratio, enrichment ratio of bivalent genes.

Next, we calculated the distribution of bivalent genes across human ESC chromosomes and found that ∼10–25% of the protein-coding genes (23) on each chromosome are bivalent genes, except on the Y chromosome (Table 5). This distribution suggests that every chromosome may play a specific role related to pluripotency in ES cells. The Y chromosome is rich in junk DNA (27) and has only 56 protein-coding genes (28), which may explain why just one bivalent gene was found on the Y chromosome. A similar distribution is observed for bivalent genes in mouse ESC chromosomes (Supplementary Table S3).
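
The per-chromosome proportions can be recomputed from the two count columns of Table 5. The snippet below is a small consistency check using only the tabulated values; the variable names are illustrative.

```python
# Chromosome -> (bivalent genes, protein-coding genes), copied from Table 5.
TABLE5 = {
    "1": (366, 2080), "2": (303, 1333), "3": (221, 1079), "4": (195, 769),
    "5": (200, 898), "6": (200, 1054), "7": (182, 983), "8": (165, 702),
    "9": (193, 829), "10": (185, 774), "11": (228, 1317), "12": (191, 1070),
    "13": (83, 332), "14": (123, 866), "15": (121, 619), "16": (142, 886),
    "17": (223, 1217), "18": (58, 290), "19": (169, 1496), "20": (128, 562),
    "21": (47, 247), "22": (103, 511), "X": (86, 836), "Y": (1, 56),
}

# Percentage of protein-coding genes on each chromosome that are bivalent.
pct = {chrom: round(100 * biv / coding, 2) for chrom, (biv, coding) in TABLE5.items()}

total_bivalent = sum(biv for biv, _ in TABLE5.values())
non_y = {chrom: value for chrom, value in pct.items() if chrom != "Y"}

print("total bivalent genes:", total_bivalent)
print("range excluding Y: %.2f%% to %.2f%%" % (min(non_y.values()), max(non_y.values())))
print("Y chromosome: %.2f%%" % pct["Y"])
```

The bivalent counts sum to the 3913 human genes reported in Table 1, and every chromosome other than Y falls inside the ∼10–25% band quoted in the text.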

Table 5.

Distribution for bivalent genes in human ESC chromosomes

Chromosome | Bivalent gene number | Protein-coding gene number | Percentage (%)
1 | 366 | 2080 | 17.60
2 | 303 | 1333 | 22.73
3 | 221 | 1079 | 20.48
4 | 195 | 769 | 25.36
5 | 200 | 898 | 22.27
6 | 200 | 1054 | 18.98
7 | 182 | 983 | 18.51
8 | 165 | 702 | 23.50
9 | 193 | 829 | 23.28
10 | 185 | 774 | 23.90
11 | 228 | 1317 | 17.31
12 | 191 | 1070 | 17.85
13 | 83 | 332 | 25.00
14 | 123 | 866 | 14.20
15 | 121 | 619 | 19.55
16 | 142 | 886 | 16.03
17 | 223 | 1217 | 18.32
18 | 58 | 290 | 20.00
19 | 169 | 1496 | 11.30
20 | 128 | 562 | 22.78
21 | 47 | 247 | 19.03
22 | 103 | 511 | 20.16
X | 86 | 836 | 10.29
Y | 1 | 56 | 1.79

Conclusion and Future Perspective

BGDB is the first attempt to establish a literature-based resource of bivalent genes by integrating genomic data, sequences, GO and other useful information. It is a valuable resource for better understanding the mechanism of gene expression regulation in pluripotent stem cells. Furthermore, the statistical analyses revealed functional diversity and enrichment of bivalent genes.

We will continuously maintain and update the database as new bivalent gene data are reported. Additionally, our next goal is to collect and curate genes marked by H3K4me3 only, by H3K27me3 only and by neither H3K4me3 nor H3K27me3 in ES cells. This will make BGDB a more comprehensive resource for further study of ES cell epigenetics.

Acknowledgements

Qingyan Li and X. H. Dai designed and implemented the BGDB web service and wrote the article; Z. M. Dai, Q. Xiang and S. B. Lian participated in discussions during preparation of the manuscript.

Funding

National Natural Science Foundation of China (NSFC) [61174163].

Conflict of interest. None declared.

References

1. Thomson JA, Itskovitz-Eldor J, Shapiro SS, et al. Embryonic stem cell lines derived from human blastocysts. Science 1998, vol. 282 (pg. 1145-1147)
2. Azuara V, Perry P, Sauer S, et al. Chromatin signatures of pluripotent cell lines. Nat. Cell Biol. 2006, vol. 8 (pg. 532-538)
3. Lee JH, Hart SR, Skalnik DG. Histone deacetylase activity is required for embryonic stem cell differentiation. Genesis 2003, vol. 38 (pg. 32-38)
4. Martens JH, O’Sullivan RJ, Braunschweig U, et al. The profile of repeat-associated histone lysine methylation states in the mouse epigenome. EMBO J. 2005, vol. 24 (pg. 800-812)
5. Bracken AP, Dietrich N, Pasini D, et al. Genome-wide mapping of Polycomb target genes unravels their roles in cell fate transitions. Genes Dev. 2006, vol. 20 (pg. 1123-1136)
6. Ringrose L, Paro R. Epigenetic regulation of cellular memory by the Polycomb and Trithorax group proteins. Annu. Rev. Genet. 2004, vol. 38 (pg. 413-443)
7. Santos-Rosa H, Schneider R, Bernstein BE, et al. Methylation of histone H3 K4 mediates association of the Isw1p ATPase with chromatin. Mol. Cell 2003, vol. 12 (pg. 1325-1332)
8. Pray-Grant MG, Daniel JA, Schieltz D, et al. Chd1 chromodomain links histone H3 methylation with SAGA- and SLIK-dependent acetylation. Nature 2005, vol. 433 (pg. 434-438)
9. Sims RJ III, Chen CF, Santos-Rosa H, et al. Human but not yeast CHD1 binds directly and selectively to histone H3 methylated at lysine 4 via its tandem chromodomains. J. Biol. Chem. 2005, vol. 280 (pg. 41789-41792)
10. Francis NJ, Kingston RE, Woodcock CL. Chromatin compaction by a polycomb group protein complex. Science 2004, vol. 306 (pg. 1574-1577)
11. Ringrose L, Ehret H, Paro R. Distinct contributions of histone H3 lysine 9 and 27 methylation to locus-specific stability of polycomb complexes. Mol. Cell 2004, vol. 16 (pg. 641-653)
12. Bernstein BE, Mikkelsen TS, Xie X, et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 2006, vol. 125 (pg. 315-326)
13. Azuara V, Perry P, Sauer S, et al. Chromatin signatures of pluripotent cell lines. Nat. Cell Biol. 2006, vol. 8 (pg. 532-538)
14. Stock JK, Giadrossi S, Casanova M, et al. Ring1-mediated ubiquitination of H2A restrains poised RNA polymerase II at bivalent genes in mouse ES cells. Nat. Cell Biol. 2007, vol. 9 (pg. 1428-1435)
15. Pan G, Tian S, Nie J, et al. Whole-genome analysis of histone H3 lysine 4 and lysine 27 methylation in human embryonic stem cells. Cell Stem Cell 2007, vol. 1 (pg. 299-312)
16. Zhao XD, Han X, Chew JL, et al. Whole-genome mapping of histone H3 Lys4 and 27 trimethylations reveals distinct genomic compartments in human embryonic stem cells. Cell Stem Cell 2007, vol. 1 (pg. 286-298)
17. Jia J, Zheng X, Hu G, et al. Regulation of pluripotency and self-renewal of ESCs through epigenetic-threshold modulation and mRNA pruning. Cell 2012, vol. 151 (pg. 576-589)
18. Mikkelsen TS, Ku M, Jaffe DB, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 2007, vol. 448 (pg. 553-560)
19. Ku M, Koche RP, Rheinbay E, et al. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 2008, vol. 4 (pg. e1000242)
20. Young MD, Willson TA, Wakefield MJ, et al. ChIP-seq analysis reveals distinct H3K27me3 profiles that correlate with transcriptional activity. Nucleic Acids Res. 2011, vol. 39 (pg. 7415-7427)
21. Seal RL, Gordon SM, Lush MJ, et al. genenames.org: the HGNC resources in 2011. Nucleic Acids Res. 2011, vol. 39 (pg. D514-D519)
22. Bult CJ, Eppig JT, Blake JA, et al. The Mouse Genome Database: genotypes, phenotypes, and models of human disease. Nucleic Acids Res. 2012, vol. 41 (pg. D885-D891)
23. Maglott D, Ostell J, Pruitt KD, et al. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2011, vol. 39 (pg. D52-D57)
24. Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 2000, vol. 25 (pg. 25-29)
25. The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2011, vol. 40 (pg. D71-D75)
26. Dimmer EC, Huntley RP, Alam-Faruque Y, et al. The UniProt-GO Annotation database in 2011. Nucleic Acids Res. 2011, vol. 40 (pg. D565-D570)
27. Jobling MA, Tyler-Smith C. The human Y chromosome: an evolutionary marker comes of age. Nat. Rev. Genet. 2003, vol. 4 (pg. 598-612)
28. Flicek P, Amode MR, Barrell D, et al. Ensembl 2012. Nucleic Acids Res. 2011, vol. 40 (pg. D84-D90)

Author notes

Citation details: Li,Q., Lian,S., Dai,Z., et al. BGDB: a database of bivalent genes. Database (2013) Vol. 2013: article ID bat057; doi:10.1093/database/bat057.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
