- Split View
-
Views
-
Cite
Cite
Hans Ienasescu, Kang Li, Robin Andersson, Morana Vitezic, Sarah Rennie, Yun Chen, Kristoffer Vitting-Seerup, Emil Lagoni, Mette Boyd, Jette Bornholdt, Michiel J. L. de Hoon, Hideya Kawaji, Timo Lassmann, The FANTOM Consortium, Yoshihide Hayashizaki, Alistair R. R. Forrest, Piero Carninci, Albin Sandelin, On-the-fly selection of cell-specific enhancers, genes, miRNAs and proteins across the human body using SlideBase, Database, Volume 2016, 2016, baw144, https://doi.org/10.1093/database/baw144
- Share Icon Share
Genomics consortia have produced large datasets profiling the expression of genes, micro-RNAs, enhancers and more across human tissues or cells. There is a need for intuitive tools to select subsets of such data that is the most relevant for specific studies. To this end, we present SlideBase, a web tool which offers a new way of selecting genes, promoters, enhancers and microRNAs that are preferentially expressed/used in a specified set of cells/tissues, based on the use of interactive sliders. With the help of sliders, SlideBase enables users to define custom expression thresholds for individual cell types/tissues, producing sets of genes, enhancers etc. which satisfy these constraints. Changes in slider settings result in simultaneous changes in the selected sets, updated in real time. SlideBase is linked to major databases from genomics consortia, including FANTOM, GTEx, The Human Protein Atlas and BioGPS.
Database URL:http://slidebase.binf.ku.dk
Background
Large consortia such as ENCODE (1), The Human Protein Atlas (2), Functional Annotation of Mammals (FANTOM) (3) and Genotype-Tissue Expression (GTEx) (4) have measured the molecular states of cells and tissues across the human body on different functional levels, including the expression of genes, miRNAs, enhancer regions and the abundance of proteins.
While these atlases represent powerful resources for understanding of human biology and disease, the navigation of such vast datasets is daunting for the majority of potential users. While several resources are available for showing the expression of a selected gene across multiple experiments of states [e.g. Bloodspot (5), Expression Atlas (6), BioGPS(7)], there is a lack of intuitive methods for selecting sets of genes (or, depending on data, proteins, miRNAs, enhancers etc.) that are specifically used within tissues or cells of interest. This presents a bottleneck for the broader usage of the data by the wider community.
Although pre-defining feature sets (e.g. ‘liver-specific genes’) is possible, usage requirements typically vary from case to case, and there are many ways of defining such sets. Complicating the selection process, tissue-exclusive expression for a given gene is rare, and related to this, many users require joint expression in two or more tissues/cells (e.g. ‘expression must come from mostly blood and brain’) or the selection of more complex sets where expression of some tissues are allowed but others not (e.g. ‘expression in epithelia cells but not immune cells’). A related issue which makes computational/statistical methods for selection challenging is that samples or conditions in most datasets are not equidistant from each other [e.g. brain tissues are often distinct but yet more similar to each other than other organs (8)].
Therefore, thresholds for cell- or tissue specificity are best defined by the users themselves in an iterative and exploratory fashion. User-friendly tools for this process are currently lacking.
To this end, we present SlideBase (http://slidebase.binf.ku.dk), a web-based tool interfacing leading genomics resources, which enables users to select genes, enhancers, promoters, miRNAs with user-customisable expression thresholds for each sample, through the usage of interactive sliders.
Results
Selecting genomic features by expression constraints defined by sliders
In SlideBase we aimed to create a simple and intuitive interface for users to search for genomic features (e.g. genes, enhancers) based on their expression across different samples (e.g. heart, brain, liver) from a given experimental resource [such as FANTOM (3), GTEx (4) or The Human Protein Atlas (2)].
In order to select genes based on these values, we define a ‘slider’ with values ranging from a minimum of 0% to a maximum of 100% for each sample. Each such slider works as an interactive expression contribution threshold setting device, effectively defining how much of expression that is required to come from e.g. brain. This is done by moving two percentage handles, which is equivalent to setting expression contribution thresholds for minimal or maximal expression.
For example, moving the ‘min’ threshold slider to 80% in the brain slider will select only genes for which brain contributes at least 80%—which in this case is only true for gene 1 (Figure 1B). In addition, slider constraints across many samples can be combined, so that we can request a specific contribution from e.g. blood and heart (Figure 1C). Moreover, sliders offer the option to exclude genes expressed in specified samples by setting both slider handles to 0%; this can be combined with other constraints to further refine the search (Figure 1D).
Once a selection has been made, the expression levels of selected enhancers are shown across all FANTOM samples (Figure 2B, left panel), along with overlaps with SNPs and likely target genes predicted by co-expression (9) (Figure 2B, right panel). Selected enhancers can be visualized in genome browsers with a simple click (Figure 2C). Selected enhancer regions may also be downloaded as BED and FASTA files, or analyzed for DNA motifs. For a detailed discussion on selection features, see below.
SlideBase datasets
We have built SlideBase tools to navigate a range of different genomic entities measured across human tissues or cell types based on different technological platforms: micro-array-based gene expression data from BioGPS (7), miRNA expression based on small RNA sequencing (deHoon et al., under review), transcription start site and enhancer RNA expression by CAGE (3,9), RNA-seq-based gene expression from GTEx (4) and joint RNA-seq expression and protein abundance using antibody-based immunohistochemical staining from the Human Protein Atlas (2) (Table 1). While the large majority of the web features showcased in Figure 2 for enhancer regions are available for all datasets, each selection tool is customized to fit the underlying data, and to link back to the original resource once the features of interest have been selected. Also, depending on the source data, specific selection modes are available, outlined below. For instance, for the Human Protein Atlas data, it is possible to place constraints on both protein abundance and RNA expression levels simultaneously.
Expression features . | # Features for selection . | # Cell or tissue types . | Experimental technique . | Underlying database/resource . |
---|---|---|---|---|
Gene expression | 24 602 genes | 84 tissues and primary cells | Microarray | BioGPS (7) |
Gene expression | 19 692 genes | 32 tissues | RNA-seq | Human protein atlas (2) |
Gene expression | 41 991 genes | 53 tissues | RNA-seq (median expression across large human cohorts) | GTEx (4) (v6) |
Transcription start site expression | 184 476 TSSs | 69 primary cell groups, 41 tissue groups | CAGE | FANTOM promoter atlas (3) |
Enhancer RNA expression | 32 693 enhancers | 69 primary cell groups, 41 tissue groups | CAGE | Human enhancer atlas (9) |
miRNA expression | 1857 miRNAs | 67 primary cells | sRNA-seq | Data described in de Rie et al. |
Protein expression | 14 578 proteins | 45 tissues | Immunohisto-chemical staining | The human protein atlas (2) (v15) |
Expression features . | # Features for selection . | # Cell or tissue types . | Experimental technique . | Underlying database/resource . |
---|---|---|---|---|
Gene expression | 24 602 genes | 84 tissues and primary cells | Microarray | BioGPS (7) |
Gene expression | 19 692 genes | 32 tissues | RNA-seq | Human protein atlas (2) |
Gene expression | 41 991 genes | 53 tissues | RNA-seq (median expression across large human cohorts) | GTEx (4) (v6) |
Transcription start site expression | 184 476 TSSs | 69 primary cell groups, 41 tissue groups | CAGE | FANTOM promoter atlas (3) |
Enhancer RNA expression | 32 693 enhancers | 69 primary cell groups, 41 tissue groups | CAGE | Human enhancer atlas (9) |
miRNA expression | 1857 miRNAs | 67 primary cells | sRNA-seq | Data described in de Rie et al. |
Protein expression | 14 578 proteins | 45 tissues | Immunohisto-chemical staining | The human protein atlas (2) (v15) |
Expression features . | # Features for selection . | # Cell or tissue types . | Experimental technique . | Underlying database/resource . |
---|---|---|---|---|
Gene expression | 24 602 genes | 84 tissues and primary cells | Microarray | BioGPS (7) |
Gene expression | 19 692 genes | 32 tissues | RNA-seq | Human protein atlas (2) |
Gene expression | 41 991 genes | 53 tissues | RNA-seq (median expression across large human cohorts) | GTEx (4) (v6) |
Transcription start site expression | 184 476 TSSs | 69 primary cell groups, 41 tissue groups | CAGE | FANTOM promoter atlas (3) |
Enhancer RNA expression | 32 693 enhancers | 69 primary cell groups, 41 tissue groups | CAGE | Human enhancer atlas (9) |
miRNA expression | 1857 miRNAs | 67 primary cells | sRNA-seq | Data described in de Rie et al. |
Protein expression | 14 578 proteins | 45 tissues | Immunohisto-chemical staining | The human protein atlas (2) (v15) |
Expression features . | # Features for selection . | # Cell or tissue types . | Experimental technique . | Underlying database/resource . |
---|---|---|---|---|
Gene expression | 24 602 genes | 84 tissues and primary cells | Microarray | BioGPS (7) |
Gene expression | 19 692 genes | 32 tissues | RNA-seq | Human protein atlas (2) |
Gene expression | 41 991 genes | 53 tissues | RNA-seq (median expression across large human cohorts) | GTEx (4) (v6) |
Transcription start site expression | 184 476 TSSs | 69 primary cell groups, 41 tissue groups | CAGE | FANTOM promoter atlas (3) |
Enhancer RNA expression | 32 693 enhancers | 69 primary cell groups, 41 tissue groups | CAGE | Human enhancer atlas (9) |
miRNA expression | 1857 miRNAs | 67 primary cells | sRNA-seq | Data described in de Rie et al. |
Protein expression | 14 578 proteins | 45 tissues | Immunohisto-chemical staining | The human protein atlas (2) (v15) |
SlideBase selector tools
For each dataset, we have created and adapted a specific selector tool. The selector tools for the five datasets are very similar in their functionality—therefore, we have chosen to describe one tool (the enhancer selector) in detail and then highlight the differences for the other datasets and corresponding selector tools. The main components for a selector tool are the ‘search’ (or ‘selector’) page and the ‘viewer’ page. The search page contains sliders for the samples that underlie the specific dataset, as well as custom search options that fit respective data sets, such as searching for specific gene names or genomic regions, as applicable to respective set. The viewer page displays an overview of the returned results, which includes expression on detailed level, along with other information and options. Complementing the below description of respective sets, tutorial videos, and the more detailed documentation pages are available at the homepage.
Human enhancer selector. The human enhancer selector (Figure 2A) enables searching for human enhancers (9) based on enhancer RNA expression in primary cells and/or tissues. Two sets of sliders are available corresponding to 69 primary cells groups and 41 tissue groups. Each such slider works as in the example in Figures 1 and 2. The two slider sets can be used independently or in a combined fashion, corresponding to either an ‘OR’ or ‘AND’ statement in selection. After adjusting the percentage values, the number of enhancers that satisfy the selected constraints are shown in the left upper corner or in a popup window. The number of resulting enhancers is updated in real time based on how constraints are changed. After a satisfying set of thresholds is selected, a detailed view of resulting enhancers can be accessed by pressing the ‘See Detailed Results’ button.
The primary cell and organ percentage constraint values can be disabled by clicking the ‘Disable’ button located in the header of both slider set containers. Disabling the percentage values for either or both cells and tissues implies that their percentage numbers will not be taken into account when searching for enhancers.
The ‘Viewer page’ (Figure 2B) displays detailed information about the resulting enhancers and allows the user to perform actions on the resulting set such as downloading and viewing in genome browsers. Clicking on an enhancer shows information about that individual enhancer as three different panels (tabs) ‘CAGE Details’ for expression, ‘Browse’ for genome browser views and ‘View Overlaps’ for basal overlaps and correlations with proximal genomic entities.
The CAGE detail panel shows the enhancer expression. Specifically, percentage and expression tables for primary cells/organ groups [called facets in the original article (9)], and libraries (actual sample level expression for all the samples that make up each facet) are shown (Figure 2B, left panel). Red bars indicate that a particular sample is statistically over-represented as defined in Ref. (9). The expression table contains links to the FANTOM SSTAR resource (3,10) which holds detailed information about each sample.
In the browser panel, the user can create a custom track of selected enhancers in the UCSC genome browser (Figure 2C) (11) or look them up in the ZENBU genome browser (12).
The ‘View overlaps’ panel contains predicted target promoter(s) of selected enhancers [predicted by expression correlations as in Ref. (9)], and SNP overlaps from dbSNP (13) (Figure 2B, right panel). If an enhancer–promoter overlap exists, the TSS regions can be further inspected by clicking on the promoter annotation, which will take the user to the corresponding SlideBase promoter viewer page. An enhancer-SNP overlap is reported when the distance between the enhancer midpoint and the SNP is 200 bp or less. Overlapping SNPs are linked to dbSNP (13).
In addition to selecting a single enhancer, a number of options are available for the whole set of resulting enhancers. The options are displayed as buttons shown above the enhancer list, and enable (i) downloading the coordinates of selected enhancers, (ii) uploading all selected enhancers as a track in UCSC Genome Browser (11) (similar to above, but the whole set of selected enhancers will create a single UCSC track), (iii) obtaining the DNA of selected enhancers (with optional flanks) as a FASTA file and (iv) motif discovery of these DNA sequences using MEME (14).
Similar, the corresponding viewer page is almost identical to the enhancer viewer page, but shows additional annotation details for promoters. In particular, a ‘promoter annotation’ tab is available and specifies if the promoter is associated with known genes or is novel. If the promoter is associated with a known gene [as defined in Ref. (3)], links to NCBI Gene (16), UniProt (17) and FANTOM SSTAR (10) databases are available.
Human miRNA selector. The miRNA selector allows for searching for miRNAs expressed across 67 human cell types, based on small RNA-seq (de Rie et al., submitted). Therefore, only a single panel of sliders is available within this SlideBase selector.
The miRNA viewer page is almost identical to the enhancer viewer page offering options for viewing miRNAs expression, downloading as BED and FASTA files, and browsing miRNAs locations in the UCSC Genome Browser.
GTEx selector. The GTEx project is a large-scale effort to profile gene expression in a wide range of tissues and individuals using RNA-seq in combination with genotyping of the same individuals, enabling the detection of quantitative trait loci (4). In SlideBase, we were interested in overall expression patterns across tissues rather than genetics. Therefore, we created sliders corresponding to the median expression across individuals for 53 tissues in GTEx. The viewer page displays expression levels across different samples for the resulting genes, displays linked single tissue eQTLs, and allows the user to perform actions on the resulting set such as viewing genes in the GTEx Gene Page and GTEx eQTL Browser along with viewing linked eQTLs in the GTEx SNP Page.
BioGPS GeneAtlas selector. BioGPS is a gene information portal, which can be used to make custom reports on genes and expression (7). One of the most popular features of BioGPS is the expression atlas, based on systematic microarray analysis of cells and tissues using U133A Affymetrix arrays.
Based on the expression of individual genes across different samples, there are a total of 84 sliders for the dataset samples which include both tissues and primary cells, allowing the selection of Affymetrix probe sets (typically corresponding to annotated genes) with custom expression thresholds.
The viewer page displays information about the resulting probe sets, their expression across samples, linked genes and allows the user to perform actions on the resulting set such as viewing genes in the BioGPS Gene Report page on the BioGPS website.
The viewer page displays information about the resulting genes, their expression and protein abundance (Figure 4B), and allows the user to perform actions on the resulting set such as viewing genes in The Human Protein Atlas web tool.
Implementation
SlideBase was developed as a web user interface using HTML for displaying pages, CSS/Bootstrap (http://getbootstrap.com) for styling and JavaScript along with JavaScript-based libraries to support interactive page components such as sliders. The server side component was implemented using the Laravel PHP framework (https://laravel.com/) for application logic along with a MySQL database (https://www.mysql.com) for data handling. The server also employed an Opal web service client (18) for MEME jobs.
Discussion
While the current implementation of SlideBase demonstrates the clear utility of the slider selection concept for diverse data sets, SlideBase selectors will also be enabled for other data types in collaboration with the community. This could go beyond expression-based data. For instance, it is easy to imagine slider-based selection on data split by disease- or trait-specificity rather than cell or tissue, or selection based on other features than expression, such as the density of sequence patterns. In the future, we will also develop a system in which custom slider sets can be created to fit a new dataset submitted by users. In such a setting, SlideBase would be used as an exploratory tool in parallel with data production rather than being based on existing resources.
Dataset links
FANTOM5 Promoters: http://fantom.gsc.riken.jp/data/ FANTOM5 Enhancers: http://fantom.gsc.riken.jp/data/ FANTOM5 miRNAs: http://fantom.gsc.riken.jp/data/ BioGPS Dataset: GeneAtlas U133A, gcrma: http://biogps.org/dataset/GSE1133/geneatlas-u133a-gcrma/
Human Protein Atlas: http://v15.proteinatlas.org/about/download, http://v15.proteinatlas.org/download/normal_tissue.csv.zip, http://v15.proteinatlas.org/download/rna_tissue.csv.zip, http://v15.proteinatlas.org/download/proteinatlas.xml.gz
GTEx V6: Register a free account at gtexportal.org to download these files. For RNA-seq, the GTEx Analysis V6 gene median dataset was used (GTEx_Analysis_v6_RNA-seq_RNA-SeQCv1.1.8_gene_median_rpkm.gct.gz). For eQTL SNP annotation, we used (GTEx_ Analysis _V6_ eQTLs.tar.gz).
Acknowledgements
This work is part of the FANTOM5 project. Data, tools and co-published manuscripts are summarized at http://fantom.gsc.riken.jp/5/. FANTOM5 was made possible by a Research Grant for RIKEN OSC from MEXT to YH, Grant from MEXT for the RIKEN PMI to YH, Grant of the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT to YH and Grant from MEXT to the RIKEN CLST.
Funding
ELIXIR, the Novo Nordisk Foundation and the Lundbeck Foundation (to A.S.).
Conflict of interest. None declared.
Documentation
Web tutorials are available on the website http://slidebase.binf.ku.dk and also as videos here:
SlideBase Usage Tutorial: https://www.youtube.com/watch?v=R2kagw-c8A4
SlideBase FANTOM5 Background Research on Promoters and Enhancers: https://www.youtube.com/watch?v=vWg2nm-NYMs
The SlideBase documentation page is available at: http://slidebase.binf.ku.dk/docs/.
References
Author notes
Citation details: Ienasescu H., Li K., Andersson R., Vitezic M., et al. On-the-fly selection of cell-specific enhancers, genes, miRNAs and proteins across the human body using SlideBase. Database (2016) Vol. 2016: article ID baw144; doi:10.1093/database/baw144