Bin Tan, Saige Xin, Yanshi Hu, Cong Feng, Ming Chen, LBD: a manually curated database of experimentally validated lymphoma biomarkers, Database, Volume 2022, 2022, baac051, https://doi.org/10.1093/database/baac051
Abstract
Lymphoma is a heterogeneous disease caused by malignant proliferation of lymphocytes, resulting in significant mortality worldwide. While more and more lymphoma biomarkers have been identified with the advent and development of precision medicine, there are currently no databases dedicated to systematically gathering these scattered treasures. Therefore, we developed a lymphoma biomarker database (LBD) to curate experimentally validated lymphoma biomarkers in this study. LBD consists of 793 biomarkers extracted from 978 articles covering diverse subtypes of lymphomas, including 715 single and 78 combined biomarkers. These biomarkers can be categorized into molecular, cellular, image, histopathological, physiological and other biomarkers with various functions such as prognosis, diagnosis and treatment. As a manually curated database that provides comprehensive information about lymphoma biomarkers, LBD is helpful for personalized diagnosis and treatment of lymphoma.
Database URL: http://bis.zju.edu.cn/LBD
Introduction
Lymphomas are clonal neoplasms of immune cells, i.e. B cells, T cells and natural killer (NK) cells (1, 2). According to the latest lymphoma classification revised by the World Health Organization (WHO), lymphomas can be divided into three major groups: B-cell lymphomas, T-cell and NK-cell lymphomas and Hodgkin lymphomas (3, 4). Lymphoma was estimated to cause over 620 000 new cases and over 280 000 deaths in 2020 (5), making it one of the most common and deadly cancers in the world. The mechanism of lymphoma pathogenesis converges on some common pathways such as cell growth and proliferation, differentiation block, apoptosis inhibition and genomic instability (6). New insights such as proteolytic deregulation of epigenetic modifiers have emerged (7–9) and provide an alternative pathway for the clinical management of lymphoma. However, owing to the heterogeneity of lymphoma (10), it is important to tailor clinical therapy to each patient.
Biomarkers, or biological markers, are measurable indicators of biological states during normal and pathological processes, or of biological responses to a therapeutic intervention (11, 12). Depending on their application, biomarkers play a pivotal role in clinical practice and can be used as tools for disease prognosis, diagnosis and treatment (13). To date, studies have identified a number of lymphoma biomarkers that have been experimentally validated. For example, the level of Cluster of Differentiation 20 (CD20) has been shown to correlate significantly with the prognosis of patients with B-cell lymphoma (14). Rituximab, a monoclonal antibody targeting the CD20 receptor, has proven effective in patients with diffuse large B-cell lymphoma (15). Adding rituximab to cyclophosphamide, doxorubicin, vincristine and prednisone (CHOP) chemotherapy yields better treatment results than CHOP alone.
In the big data era, a growing number of databases and tools for different disease biomarkers have been developed. For instance, the database of prognostic biomarkers and models for hepatocellular carcinoma (dbPHCC) collected 567 prognostic biomarkers for hepatocellular carcinoma, including 323 proteins, 154 genes and 90 microRNAs (16). The Colon Rectal Cancer Gene Database (CoReCG) compiled 268 biomarkers of gene mutation or expression alteration in colorectal cancer (17). The Infectious Disease Biomarker Database (IDBD) gathered 611 diagnostic or therapeutic biomarkers with structural or expression changes in infectious diseases (18). The Urinary Protein Biomarker database (UPB), a protein biomarker database for urological diseases, aggregated data on the differential expression of 553 human proteins between disease and controls or between different disease stages (19). However, to our knowledge, there are no databases dedicated to lymphoma biomarkers. Thus, we created a distributed and multifaceted database of lymphoma biomarkers. To date, the database has integrated 793 lymphoma biomarkers from 978 articles in PubMed. Users can browse and search information on specific biomarkers and download the biomarker data of interest. All biomarker information is available at http://bis.zju.edu.cn/LBD.
Materials and Methods
The main procedures of lymphoma biomarker database (LBD) construction are illustrated in Figure 1, including data collection, data annotation and database construction. All steps were carried out by experienced postgraduates majoring in biomedicine and bioinformatics.
Data collection
To extract literature related to lymphoma biomarkers from PubMed, we used the following term: (lymphoma[Title] OR (lympho*[Title] AND (cancer*[Title] OR carcinoma*[Title] OR tumor*[Title] OR tumour*[Title] OR neoplasm*[Title] OR malignanc*[Title]))) AND (biomarker*[Title] OR marker*[Title] OR predict*[Title] OR indicat*[Title]). Based on these search keywords, 4245 articles published until January 2022 were collected. The gathered articles were further reviewed in line with the following criteria:
Reviews, meta-analyses, case reports, comments, letters and editorials were excluded.
Studies based on species other than humans were excluded.
Articles without full text and non-English articles were excluded.
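The search term and the three exclusion criteria above can be sketched in Python as follows. This is only an illustration, not the authors' pipeline: the query string is the one quoted in the text, the URL is the public NCBI E-utilities ESearch endpoint (the request is built but not sent), and the article record layout (`type`, `species`, `has_full_text`, `language`) is a hypothetical structure assumed for this example.

```python
from urllib.parse import urlencode

# Title-restricted PubMed query, copied from the text.
DISEASE = (
    "(lymphoma[Title] OR (lympho*[Title] AND (cancer*[Title] OR carcinoma*[Title] "
    "OR tumor*[Title] OR tumour*[Title] OR neoplasm*[Title] OR malignanc*[Title])))"
)
MARKER = "(biomarker*[Title] OR marker*[Title] OR predict*[Title] OR indicat*[Title])"
QUERY = f"{DISEASE} AND {MARKER}"

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=100):
    """Build (but do not send) an ESearch request against PubMed."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# Article types removed by the first criterion.
EXCLUDED_TYPES = {"review", "meta-analysis", "case report", "comment", "letter", "editorial"}


def passes_screening(article):
    """Apply the three exclusion criteria to one record (hypothetical dict layout)."""
    return bool(
        article["type"].lower() not in EXCLUDED_TYPES
        and article["species"].lower() == "human"
        and article["has_full_text"]
        and article["language"].lower() == "english"
    )


url = esearch_url(QUERY)
```

In practice the article type, species and language would have to be derived from PubMed metadata or from manual review, which this sketch does not attempt.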
In total, 1633 articles remained after filtering. By manually reading the abstracts of these articles, we further excluded articles that were not directly relevant to lymphoma biomarkers. If the biomarker name, lymphoma name and biomarker application curated from one article were the same as those curated from another article, the entry was recorded only once in our database. Ultimately, 978 articles were selected as the source of our database, from which 793 biomarkers (715 single and 78 combined biomarkers) were extracted. The biomarkers were then categorized into prognostic, diagnostic, therapeutic or combined biomarkers in accordance with their applications.
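The deduplication rule described here (one entry per combination of biomarker name, lymphoma name and application) can be sketched as below. The field names and the sample records are hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption of this sketch rather than something the text specifies.

```python
def deduplicate(records):
    """Keep the first record for each (biomarker, lymphoma, application) triple."""
    seen = set()
    unique = []
    for rec in records:
        # Normalisation (trim + lowercase) is an assumption of this sketch.
        key = tuple(
            rec[field].strip().lower()
            for field in ("biomarker", "lymphoma", "application")
        )
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical curated records; "source" stands in for an article identifier.
records = [
    {"biomarker": "CD20", "lymphoma": "B-cell lymphoma", "application": "prognosis", "source": "article A"},
    {"biomarker": "cd20 ", "lymphoma": "B-cell lymphoma", "application": "Prognosis", "source": "article B"},
    {"biomarker": "CD20", "lymphoma": "B-cell lymphoma", "application": "treatment", "source": "article C"},
]

unique_records = deduplicate(records)
```

Here the second record collapses into the first, while the third is kept because its application differs.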
Data annotation
To provide ontology information for lymphoma biomarkers, especially molecular biomarkers, and to unify the biomarker names curated from the literature, other databases and tools were also used. For example, we searched the National Center for Biotechnology Information (NCBI) Gene database (https://www.ncbi.nlm.nih.gov/gene) for the annotation of DNA and RNA and UniProt (https://www.uniprot.org/uniprot) for the annotation of proteins. Chemicals were annotated based on information from the Human Metabolome Database (HMDB) (https://hmdb.ca) (20). Google searches were used when necessary.
Database construction
The database was built on the LAMP (Linux, Apache, MySQL and PHP) stack. MySQL was used to store biomarker information. The front end of the website was written in HTML, CSS and JavaScript, while PHP was used to handle interaction with the web pages. The database runs in a Linux and Apache environment.
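As a rough illustration of how biomarker records could be stored relationally, the sketch below creates a single table and inserts one row. It uses Python's built-in `sqlite3` module purely as a stand-in for MySQL, and the table name, column names and example row are hypothetical; the actual LBD schema is not described in the text.

```python
import sqlite3

# Hypothetical single-table layout; the real LBD schema is not given in the text.
SCHEMA = """
CREATE TABLE biomarker (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL,   -- e.g. molecular, cellular, image
    application TEXT NOT NULL,   -- e.g. prognosis, diagnosis, treatment
    lymphoma    TEXT NOT NULL,
    sample      TEXT,
    reference   TEXT
);
"""


def create_database(path=":memory:"):
    """Open a SQLite database (stand-in for MySQL) and create the table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
conn.execute(
    "INSERT INTO biomarker (name, category, application, lymphoma, sample, reference) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("CD20", "molecular", "prognosis", "B-cell lymphoma", None, None),
)
row_count = conn.execute("SELECT COUNT(*) FROM biomarker").fetchone()[0]
```

A production schema would more likely normalise lymphoma types, samples and references into separate tables; the flat layout is chosen here only for brevity.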
Results
Database framework
LBD consists of six pages, i.e. Home, Browse, Search, Statistics, Download and Help, as illustrated in Figure 2A. The ‘Home’ page provides an overall introduction to LBD. Users can browse and search biomarker information on the ‘Browse’ and ‘Search’ pages, respectively. The ‘Statistics’ page presents statistical charts of the biomarkers. On the ‘Download’ page, users can download all the biomarker information or only the subset they need, selected by biomarker category, application or lymphoma type. More information about how to use the database can be found on the ‘Help’ page.
Data retrieval
LBD provides browse and search functions for users to retrieve data. To browse, users can either click the dropdown menu of the ‘Browse’ button in the navigation bar or click the ‘Go to Browse’ link on the homepage. Users can browse by lymphoma type, biomarker type or biomarker application (Figure 2B). Two search modes are provided: keyword search and advanced search. In keyword search, users can search biomarker information by any keyword such as a biomarker name (BCL-2), lymphoma name (diffuse large B-cell lymphoma) or biomarker application (diagnosis) (Figure 2C). In advanced search, one input box (biomarker name) and three search options (biomarker category, biomarker application and lymphoma type) are provided for users to filter specific biomarkers (Figure 2D). The search results are displayed as a table. Taking interleukin-18 (IL-18) as an example, clicking the ‘detail’ button gives users access to detailed information on IL-18 (Figure 2E). The detail page contains five sections of information: biomarker information, lymphoma information, sample information, reference information and additional links.
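The advanced search described above (one free-text name box plus three optional filters) maps naturally onto a parameterized SQL query. The following is a minimal sketch under stated assumptions: the table and column names are hypothetical, the name box is assumed to do a substring match, and the real site implements its search in PHP rather than Python.

```python
def build_advanced_search(name=None, category=None, application=None, lymphoma=None):
    """Return (sql, params) using only the filters the user filled in."""
    clauses, params = [], []
    if name:
        # Free-text box: substring match on the biomarker name (assumed behaviour).
        clauses.append("name LIKE ?")
        params.append(f"%{name}%")
    # Column names come from this fixed tuple, never from user input.
    for column, value in (
        ("category", category),
        ("application", application),
        ("lymphoma", lymphoma),
    ):
        if value:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT * FROM biomarker"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Arbitrary filter values chosen for illustration.
sql, params = build_advanced_search(name="IL-18", application="prognosis")
```

Passing user values as bound parameters rather than interpolating them into the SQL string avoids injection problems regardless of the server-side language.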
Statistics of biomarkers
LBD consists of 793 biomarkers at molecular (DNAs, RNAs, proteins and chemicals), cellular (cells or cell counts), image (imaging technologies or parameters), histopathological (characteristics of individuals’ histopathological functioning), physiological (characteristics of individuals’ normal functioning) and other levels, including 715 single and 78 combined biomarkers. Figure 3A shows the distribution of biomarkers by type, in which molecular biomarkers top the list, followed by cellular and image biomarkers. Among single molecular biomarkers, proteins account for more than two-thirds, exceeding DNA, RNA and chemical biomarkers combined. Figure 3B shows the distribution of biomarkers by clinical application; most of the biomarkers are used for prognosis. The three most frequently used sample types are tissue, blood and cerebrospinal fluid (Figure 3C). Figure 3D and E show, respectively, the trend in the number of articles published over the last 10 years and the 11 countries with the most articles in the field of lymphoma biomarkers.
Biomarkers in LBD are highly correlated with lymphoma pathogenesis
Since protein biomarkers account for nearly one-half of all biomarkers in LBD, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis using these protein biomarkers with Metascape (http://metascape.org) (21). The results are summarized in Figure 4A. Apart from some common pathways shared by immune system diseases, pathways that are correlated with lymphoma pathogenesis have been identified. For example, cytokine IL-2 can induce tyrosine phosphorylation of JAK1 and JAK3 (22, 23), which contributes to the pathogenesis of cutaneous T-cell lymphoma. The activation of signaling pathways, such as the nuclear factor-kappa B (NF-κB) signaling pathway and the apoptotic signaling pathway, may also lead to the occurrence of lymphoma (24). Novel concepts of lymphoma pathogenesis, such as ubiquitin-mediated proteolysis and phosphorylation-mediated kinase signaling, have been revealed with the development of mass spectrometry-based proteomics (25, 26).
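For readers unfamiliar with enrichment analysis, over-representation of a GO term or KEGG pathway in a gene list is commonly scored with a hypergeometric upper-tail test. The sketch below is a generic illustration of that test, not Metascape's implementation; the function name and the toy numbers are assumptions made for this example.

```python
from math import comb


def enrichment_p_value(background, in_term, selected, overlap):
    """Hypergeometric upper tail: P(X >= overlap).

    background: number of genes in the background set
    in_term:    background genes annotated to the term or pathway
    selected:   size of the query gene list
    overlap:    query genes annotated to the term or pathway
    """
    upper = min(in_term, selected)
    total = comb(background, selected)
    tail = sum(
        comb(in_term, k) * comb(background - in_term, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 5 in the term, 5 selected, all 5 overlap.
p_toy = enrichment_p_value(background=10, in_term=5, selected=5, overlap=5)
```

With these toy numbers only one of the 252 possible gene lists has full overlap, so the P-value is 1/252. Real analyses additionally correct for multiple testing across terms.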
Moreover, among these protein biomarkers, three densely connected protein–protein interaction network components have been identified with the Molecular Complex Detection (MCODE) algorithm (27), as shown in Figure 4B. Pathway enrichment analysis has been applied to each MCODE component independently, and the three best-scoring terms ranked by P-value (negative log-transformed) have been retained in Table 1. For each MCODE network, the enrichment result displays a strong connection between protein biomarkers and lymphoma pathogenesis. For instance, the activation of the PI3K-Akt pathway in diffuse large B-cell lymphoma may result in gene mutations (28). The oncogenic function of sirtuin 6 (SIRT6) has been identified in diffuse large B-cell lymphomas by accelerating the cell cycle out of the G2/M phase and reducing rates of apoptosis (29). The concentration of cytosolic calcium ion and its sensor molecules (stromal interaction molecule 1 (STIM1) and stromal interaction molecule 2 (STIM2)) are crucial for cellular functions in B cells, including adhesion, differentiation, proliferation, effector functions and gene expression (30). Thus, the functional enrichment analysis results provide valuable hints for deciphering lymphoma pathogenesis.
MCODE | Pathway ID | Description | −Log10(P) |
---|---|---|---|
MCODE1 | hsa05200 | Pathways in cancer | 16.6 |
MCODE1 | hsa04151 | PI3K-Akt signaling pathway | 15.0 |
MCODE1 | hsa05169 | Epstein–Barr virus infection | 13.6 |
MCODE2 | GO:2001233 | Regulation of apoptotic signaling pathway | 8.6 |
MCODE2 | hsa05163 | Human cytomegalovirus infection | 8.2 |
MCODE2 | GO:0030162 | Regulation of proteolysis | 7.9 |
MCODE3 | GO:0007204 | Positive regulation of cytosolic calcium ion concentration | 10.2 |
MCODE3 | GO:0051480 | Regulation of cytosolic calcium ion concentration | 9.9 |
MCODE3 | GO:0006874 | Cellular calcium ion homeostasis | 9.3 |
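To make the −Log10(P) column easier to interpret, the sketch below converts a score back to a P-value and picks the best-scoring term for each MCODE component. The rows are copied from Table 1; the function and variable names are hypothetical.

```python
# Rows copied from Table 1: (component, term ID, description, -log10(P)).
TABLE_1 = [
    ("MCODE1", "hsa05200", "Pathways in cancer", 16.6),
    ("MCODE1", "hsa04151", "PI3K-Akt signaling pathway", 15.0),
    ("MCODE1", "hsa05169", "Epstein–Barr virus infection", 13.6),
    ("MCODE2", "GO:2001233", "Regulation of apoptotic signaling pathway", 8.6),
    ("MCODE2", "hsa05163", "Human cytomegalovirus infection", 8.2),
    ("MCODE2", "GO:0030162", "Regulation of proteolysis", 7.9),
    ("MCODE3", "GO:0007204", "Positive regulation of cytosolic calcium ion concentration", 10.2),
    ("MCODE3", "GO:0051480", "Regulation of cytosolic calcium ion concentration", 9.9),
    ("MCODE3", "GO:0006874", "Cellular calcium ion homeostasis", 9.3),
]


def p_from_neg_log10(score):
    """Invert the -log10 transform: P = 10 ** (-score)."""
    return 10.0 ** (-score)


def top_term_per_component(rows):
    """Return {component: description} for the highest-scoring term in each component."""
    best = {}
    for component, _term_id, description, score in rows:
        if component not in best or score > best[component][1]:
            best[component] = (description, score)
    return {component: desc for component, (desc, _score) in best.items()}


top_terms = top_term_per_component(TABLE_1)
p_cancer = p_from_neg_log10(16.6)  # P-value for "Pathways in cancer"
```

A score of 16.6 therefore corresponds to a P-value of roughly 2.5 × 10⁻¹⁷; higher scores mean stronger enrichment.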
Discussion
Over the past 30 years, tremendous advances have been made in the pathological, biological and molecular characterization of lymphomas (31). The use of advanced diagnostic tools that combine morphological, immunophenotypic and genetic analysis has led to the more precise histological classification of different types of lymphomas, which has indirectly accelerated the development of lymphoma-targeted therapy (32, 33). Precision oncology seeks a specific targeted therapy (or combination of treatments) to match the unique genetic and molecular composition of each individual cancer (34). Biomarkers, which can be measured as an indicator of biological processes or response to a specific therapy, have provided excellent functional insights into disease pathogenesis (35–37). Novel biomarker-directed clinical trial designs for lymphoma may expedite the process and reduce the cost of drug development. For instance, the inhibition of biomarker Syk prompted the clinical trial of fostamatinib disodium for patients with B-cell non-Hodgkin lymphoma (38).
Today, many databases and tools have been developed for specific categories of disease. To our knowledge, however, no databases or knowledgebases for lymphomas have been constructed until now. Thus, in this study, a versatile and comprehensive database (LBD) integrating lymphoma biomarkers has been developed. LBD consists of 793 lymphoma biomarkers that can be categorized into molecular, cellular, image, histopathological, physiological and other biomarkers. Compared with other similar databases, LBD contains not only molecular biomarkers, such as DNA, RNA and protein, but also biomarkers at image, cellular, physiological and pathological levels. Biomarkers at different levels can be combined and modeled to improve the accuracy of predicting lymphoma occurrence and progression. At the same time, biomarkers in LBD are classified by prognostic, diagnostic and therapeutic application, which is beneficial for lymphoma precision medicine. In addition, we revised and standardized biomarker information according to the NCBI Gene and UniProt databases to make it more accurate. Moreover, links to four other databases are provided: KEGG, University of California Santa Cruz (UCSC) Genome Browser, Ensembl and Online Mendelian Inheritance in Man (OMIM). Information in LBD can be retrieved and compared like a dictionary in support of lymphoma research.
Nevertheless, lymphoma is a heterogeneous disease that can vary among different lymphoma types such as indolent B-cell lymphomas and aggressive lymphomas (10, 39–41). Also, new research studies concerning lymphoma biomarkers are emerging. Therefore, it is impossible to gather all information about lymphoma biomarkers at one time. In the future, LBD will be updated and optimized continuously to fulfill more functions.
Conclusion
In summary, LBD is the first lymphoma biomarker database and provides a practical platform for researchers to obtain the information of interest to them. Through this database, clinicians and researchers can find various types of biomarkers of different lymphoma phenotypes quickly and easily. We hope that LBD will be of value in precision medicine for lymphoma.
Acknowledgements
The authors thank the anonymous reviewers for their constructive comments and the members of Ming Chen’s laboratory for helpful discussions and valuable comments.
Funding
National Natural Sciences Foundation of China (32070677); 151 Talent Project of Zhejiang Province (first level); Jiangsu Collaborative Innovation Center for Modern Crop Production and Collaborative Innovation Center for Modern Crop Production cosponsored by province and ministry.
Conflict of interest
None declared.