Biniam Haile, Adnan Cinar, Sayeda F Banini, Katrin Zhivkova, Finn J Gallagher, Christopher J Richardson, Frances M G Pearl, MOKCa-3D database: functional and structural analysis of missense mutations in cancer, Database, Volume 2026, 2026, baag001, https://doi.org/10.1093/database/baag001
Abstract
Determining the functional consequence of missense mutations acquired in the development of cancer is critical to the understanding of the evolution and the therapeutic vulnerabilities of an individual tumour. Several million missense mutations associated with cancer have been reported across different databases with little functional annotation accompanying each mutation. We have designed the MOKCa-3D database (https://bioinformaticslab.sussex.ac.uk/MOKCa-3D/) to enable the contextualization and interpretation of cancer somatic missense mutations, including the impact of each mutation on the protein’s 3D structure and whether the mutation results in a gain or loss of the protein’s function. For each protein, a sequence feature viewer enables interactive visualization of the amino acid sequence, missense mutations, post-translational modification sites, protein domains, active sites, binding sites, protein–protein interaction sites, and mutational frequency. The mutation-level page concisely presents functional insights for each individual mutation, and an interactive Mol* viewer highlights the mutated residue on an AlphaFold protein structural model. The SAAP structural impact analysis pipeline was used to identify the structural impact of the mutation. MOKCa-3D concisely presents functional insights and structural impacts of cancer somatic missense mutations, enabling users to interpret their functional consequences. It is freely accessible and easy to navigate, making it usable by the widest range of researchers.
Introduction
Cancer is typically characterized by uncontrolled cell proliferation, which is caused by the accumulation of mutations in critical sites of genes that are involved in cell cycle regulation, DNA damage repair, and apoptosis [1, 2]. These mutations can be inherited or acquired during the lifetime of an individual through exposure to environmental hazards such as radiation, chemicals, and viral infections [3], as well as through inherent cellular processes. For instance, mutations in the BRCA1 and BRCA2 genes are associated with an increased risk of developing breast and ovarian cancers. Understanding these mutations has led to targeted cancer therapies [4].
Alterations of the nucleotide sequence of a gene can have a variety of effects on the encoded protein, including frameshift and missense mutations that impact the protein’s structure and function. The most common alteration is a single base change, which usually leads to a missense mutation of a single amino acid or, more rarely, the introduction of a stop codon.
Cancer-associated somatic missense mutations are identified from analysis of tumour DNA sequences. For example, over three million distinct somatic missense mutations have been identified in COSMIC v98 [5]. However, only a small fraction of these somatic missense mutations are likely to be ‘driver’ mutations that contribute to tumorigenesis. It is important to distinguish those that initiate and drive tumour progression from ‘passenger’ mutations, which do not contribute to the cancer phenotype but result from the increased mutational rate that occurs in most cancers.
Understanding which mutations are passengers and which are likely to be drivers requires a highly detailed analysis of each mutation in terms of the structural and functional impact it has on the encoded protein, using statistical tools and machine learning methods.
Most existing mutation databases and web tools for annotating and storing cancer-related mutations lack critical insights into the structural impact of mutations. Furthermore, they often fail to determine whether a mutation is a driver and, if so, whether it results in a gain of function (GOF) or loss of function (LOF) of the protein [6, 7]. The MOKCa-3D database presents the structural impact and functional annotation of missense mutations in a more comprehensive way, including GOF and LOF assessment for known driver mutations curated from the literature.
Building the MOKCa-3D database
Mutation data collection and processing
Data collection and processing are crucial to standardizing data and ensuring its quality. The overall data processing and analysis workflow is shown in Fig. 1. Mutation data were obtained from The Cancer Genome Atlas (TCGA) Research Network using the Genomic Data Commons Data Portal (GDC portal) [8] and from the UniProt database [9]. Each mutation was mapped to a protein sequence using its UniProt accession ID or its Ensembl transcript ID with the biomaRt tool [10]. UniProt FASTA sequences were cross-checked against the mutations to verify the wild-type residues [9].
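The wild-type cross-check described above can be illustrated with a minimal sketch. It assumes mutations are written in one-letter notation (e.g. G12V) and that the protein sequence has already been read from a FASTA file; the function names and the toy sequence are hypothetical and are not taken from the MOKCa-3D code base.

```python
import re

# One-letter missense notation such as "G12V":
# wild-type residue, 1-based position, mutant residue.
AA = "ACDEFGHIKLMNPQRSTVWY"
MISSENSE = re.compile(rf"^([{AA}])(\d+)([{AA}])$")


def parse_missense(label):
    """Split a label like 'G12V' into (wild_type, position, mutant)."""
    match = MISSENSE.match(label)
    if match is None:
        raise ValueError(f"not a one-letter missense label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def wild_type_matches(sequence, label):
    """Return True if the sequence carries the stated wild-type residue."""
    wild_type, position, _ = parse_missense(label)
    if not 1 <= position <= len(sequence):
        return False  # position falls outside the sequence
    return sequence[position - 1] == wild_type


# Toy 20-residue sequence used only for illustration.
toy_sequence = "MTEYKLVVVGAGGVGKSALT"
print(wild_type_matches(toy_sequence, "G12V"))  # residue 12 is G
print(wild_type_matches(toy_sequence, "A12V"))  # residue 12 is not A
```

A mutation that fails this check was presumably called against a different isoform or sequence version, which is why a check of this kind is useful before any residue-level annotation is attached.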

Figure 1. MOKCa-3D data processing workflow. This figure shows the overall workflow, which consists of four major components. (A) Mutation data processing: collection, filtering, normalization, and mapping of cancer-associated mutations from TCGA and UniProt. (B) Structural impact analysis: mapping mutations onto 3D protein structures and assessing those located on AlphaFold models with pLDDT > 60 using SAAP to evaluate structural consequences. (C) Mutation annotation data: collecting and processing functional annotation, including GOF and LOF classification from curated literature, post-translational modifications, Pfam domains, and protein interface residues. (D) MySQL database: all processed mutation data and annotations were stored in a relational database for efficient retrieval.
For each sample, if a missense mutation was mapped to multiple transcripts for the same gene, we selected the mutation mapped to the UniProt canonical sequence to be included in MOKCa-3D. When a specific mutation was identified in more than one sample (e.g. BRAF V600E), only one instance of the mutation is recorded, and the frequency of the mutation in MOKCa-3D is calculated. This resulted in 1,836,979 unique missense mutations (aggregates) to include in MOKCa-3D (Fig. 1).
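The aggregation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the input tuple layout, the transcript identifiers, and the rule of discarding calls that are not on the canonical transcript are all hypothetical simplifications.

```python
from collections import Counter


def aggregate_mutations(calls, canonical_transcript):
    """Collapse per-sample calls into unique mutations with a sample count.

    calls: iterable of (sample_id, gene, transcript_id, label) tuples.
    canonical_transcript: dict mapping gene -> canonical transcript ID.
    """
    seen = set()        # (sample, gene, label) combinations already counted
    frequency = Counter()
    for sample, gene, transcript, label in calls:
        # Assumption: only the call on the canonical transcript is kept.
        if canonical_transcript.get(gene) != transcript:
            continue
        key = (sample, gene, label)
        if key in seen:
            continue    # do not count the same sample twice
        seen.add(key)
        frequency[(gene, label)] += 1
    return frequency


# Toy input: the same BRAF call reported on two transcripts in sample S1.
calls = [
    ("S1", "BRAF", "tx_canonical", "V600E"),
    ("S1", "BRAF", "tx_alternative", "V600E"),
    ("S2", "BRAF", "tx_canonical", "V600E"),
    ("S2", "KRAS", "tx_kras", "G12V"),
]
canonical = {"BRAF": "tx_canonical", "KRAS": "tx_kras"}
print(aggregate_mutations(calls, canonical))
```

Each key of the returned counter corresponds to one aggregate record, and its value is the number of samples in which that mutation was observed.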
Annotation data overview
The mutation data processed in this study were enriched with annotations to provide detailed insights into their structural, functional, and pathogenic impacts. MOKCa-3D identifies five gene categories with particular importance in cancer, which are assigned using a variety of online resources. Tumour suppressor genes [5, 7, 11], oncogenes [5, 7, 11], DNA damage response genes [12], protein kinases (KinaseMD [13]), and current drug targets [7, 14, 15] are all highlighted.
Gene-level annotations providing details on the molecular function, biological process, and cellular location of each protein were retrieved from AmiGO 2 [16]. For each canonical sequence, residue-level annotations were also included to identify whether mutations occurred at or near a functional site. Pfam protein domain assignments were extracted from InterPro [17], and binding site and active site positions were retrieved from UniProt. PTM sites were annotated using PhosphoSitePlus [18], and protein–protein interaction residues were annotated using data from PIONEER [19]. Mutations at or near functional sites have the potential to impair protein function by disrupting the regulation of the protein’s role in essential biological processes.
For each missense mutation, pathogenicity was assigned using AlphaMissense, which categorizes mutations as pathogenic, benign, or uncertain [20]. Highly recurring mutations were identified. Lastly, gain-of-function and loss-of-function mutations were annotated using assessments curated from the literature and from an in-house prediction algorithm (manuscript in preparation). These mutations either enhance (GOF) or reduce (LOF) protein activity and can be influential in the progression of cancer and in understanding therapeutic outcomes.
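As an illustration of the three-way pathogenicity call, the sketch below maps an AlphaMissense score to a class. The cutoffs of 0.34 and 0.564 are those reported with the AlphaMissense release; whether MOKCa-3D applies exactly these cutoffs, or takes the class labels directly from the AlphaMissense files, is an assumption here, and the function name is hypothetical.

```python
# Cutoffs reported with the AlphaMissense release (assumed, not confirmed,
# to be the ones behind the classes shown in MOKCa-3D).
BENIGN_BELOW = 0.34
PATHOGENIC_ABOVE = 0.564


def alphamissense_class(score):
    """Map a score in [0, 1] to 'pathogenic', 'benign', or 'uncertain'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score > PATHOGENIC_ABOVE:
        return "pathogenic"
    if score < BENIGN_BELOW:
        return "benign"
    return "uncertain"


print(alphamissense_class(0.994))  # a high score falls in the pathogenic class
```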
Structural impact analysis (SAAP)
AlphaFold2 protein structure models of the human proteome were downloaded from EBI [21] and were used to calculate the structural impact of missense mutations using the SAAP structural impact analysis pipeline [22]. To ensure a high level of reliability in these calculations, only mutations where the wild-type residue in the AlphaFold2 model had a pLDDT confidence score of 60 or higher were analysed [21]. Although residues with a pLDDT of 70 or over are generally seen as reliably modelled, in highly structured regions residues with a pLDDT of 60 can also be reliable (Alessia David, Personal Communication, ISMB 2025). Lowering the pLDDT threshold to 60 allowed an extra 85,384 mutations to be structurally assessed. This resulted in the structural analysis of 1,244,744 missense mutations.
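The confidence filter can be sketched in a few lines. AlphaFold model files in PDB format store the per-residue pLDDT in the B-factor column, so the sketch reads that column from the C-alpha atom of each residue and applies the threshold of 60. The function names are hypothetical, only single-chain PDB-format text is handled, and this is not the authors' implementation.

```python
def plddt_by_residue(pdb_text):
    """Return {residue_number: pLDDT} read from the CA atoms of a PDB file.

    Relies on the fixed-column PDB layout: atom name in columns 13-16,
    residue number in columns 23-26, B-factor (pLDDT) in columns 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA":
            continue  # one value per residue is enough
        scores[int(line[22:26])] = float(line[60:66])
    return scores


def passes_threshold(scores, position, threshold=60.0):
    """True if the residue is modelled and its pLDDT meets the threshold."""
    return position in scores and scores[position] >= threshold
```

A mutation would then be forwarded to the structural analysis only if `passes_threshold(scores, position)` holds for its wild-type residue; residues missing from the model are rejected as well.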
The SAAP pipeline checks if a mutation is disruptive to the protein structure by assessing whether a mutation disrupts the native hydrogen bonding in the protein; disrupts a disulfide bond; involves a mutation to a proline residue; is a mutation from a glycine residue; causes a steric clash; introduces a void into the core of the protein; is a mutation to a cis-proline; introduces a charge shift in the core of the protein; introduces a hydrophobic residue on the protein’s surface; or introduces a hydrophilic residue in the protein core [22].
This multi-dimensional annotation strategy provides a comprehensive framework for assessing mutations’ structural and functional effects, offering valuable insights into the consequence of the mutations.
Database design and backend
MOKCa-3D’s overall architectural design is shown in Fig. 2. All mutation and annotation data are stored in a relational database using MySQL v5.7.24. A RESTful API backend, implemented using Django v5.0.2, exposes four endpoints that serve data from the database to the frontend over HTTP. This enables scalability, portability, flexibility, and easy integration of the MOKCa-3D server with a web interface that facilitates data presentation, visualization, and interaction. The design was chosen to improve query speed, support future expansion, and preserve data integrity. Currently, the database schema includes eleven tables (shown in Fig. S3).
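To make the relational layout concrete, the following sketch builds a two-table gene/mutation schema and runs a gene-level query. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and every table name, column name, and frequency value is hypothetical; the actual eleven-table schema is the one shown in Fig. S3.

```python
import sqlite3

# Hypothetical two-table excerpt; the real schema has eleven tables.
SCHEMA = """
CREATE TABLE gene (
    uniprot_acc TEXT PRIMARY KEY,
    hgnc_symbol TEXT NOT NULL
);
CREATE TABLE mutation (
    id          INTEGER PRIMARY KEY,
    uniprot_acc TEXT NOT NULL REFERENCES gene(uniprot_acc),
    position    INTEGER NOT NULL,
    wild_type   TEXT NOT NULL,
    mutant      TEXT NOT NULL,
    frequency   INTEGER NOT NULL,
    UNIQUE (uniprot_acc, position, mutant)
);
-- Index supporting the common "all mutations of one protein" lookup.
CREATE INDEX idx_mutation_protein ON mutation (uniprot_acc, position);
"""


def build_demo_db():
    """Create an in-memory database with a few toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene VALUES (?, ?)", ("P01116", "KRAS"))
    conn.executemany(
        "INSERT INTO mutation (uniprot_acc, position, wild_type, mutant, frequency) "
        "VALUES (?, ?, ?, ?, ?)",
        # Frequencies are invented for the example.
        [("P01116", 12, "G", "V", 3), ("P01116", 12, "G", "D", 5), ("P01116", 13, "G", "D", 2)],
    )
    return conn


def mutations_for_gene(conn, symbol):
    """Return (label, frequency) rows for one gene, ordered along the sequence."""
    return conn.execute(
        "SELECT m.wild_type || m.position || m.mutant, m.frequency "
        "FROM mutation AS m JOIN gene AS g ON g.uniprot_acc = m.uniprot_acc "
        "WHERE g.hgnc_symbol = ? ORDER BY m.position, m.mutant",
        (symbol,),
    ).fetchall()


print(mutations_for_gene(build_demo_db(), "KRAS"))
```

Keeping one row per unique mutation, with a uniqueness constraint on protein, position, and mutant residue, is one way to support data integrity in such a design; an API endpoint would wrap a query of this kind and return the rows as JSON.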

Figure 2. MOKCa-3D overall architectural design. This figure shows an overview of the MOKCa-3D platform architecture, consisting of a MySQL relational database for mutation and annotation data, a Django backend for data processing and API endpoints to deliver data, and a ReactJS front-end for interactive visualization of the structural and functional effects of mutations.
The MOKCa-3D web interface
The MOKCa-3D web interface (Fig. 3) was implemented using the ReactJS v18.2.0 web framework, with Tailwind CSS v3.4.3 for styling. The interface was designed for responsiveness and ease of use. It includes gene-level annotation pages and individual mutation-level annotation pages. A Mol* viewer is integrated to visualize the protein 3D structure interactively, with the mutated residue highlighted [23].

Figure 3. MOKCa-3D web interface. This figure shows the user interface for gene-based queries, which can be used to explore cancer-associated mutations or to browse sets of genes (oncogenes, tumour suppressors, protein kinases, drug targets, and DNA damage response proteins).
Gene level annotation
Users can search for specific genes using the HGNC symbol, UniProt accession, or Ensembl ENSG ID, or they can browse the five gene class datasets using the dropdown list. The gene-level page provides gene-level information and links to external databases, including UniProtKB [9], Ensembl [24], canSAR.ai [25], PhosphoSitePlus [18], AmiGO 2 [16], and InterPro [17]. An integrated feature viewer presents the encoded protein sequence, mutations, PTM sites, protein interface residues, active sites, binding sites, and protein domain annotation. The interactive graph allows users to inspect individual mutations and see the surrounding functional site annotations (Fig. S1). Individual mutation pages are accessed from the table below the feature viewer by clicking on the position of the mutation. This table also includes the AlphaMissense pathogenicity of the mutation, a GOF/LOF assessment, and the tumour types in which the mutation has been observed (Fig. S1).
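A search box that accepts three identifier types has to decide which type it was given. The sketch below shows one possible way to do this with regular expressions: the Ensembl pattern is ENSG followed by eleven digits, and the UniProt pattern is the accession format published by UniProt. The function name and the fall-through to "HGNC symbol" are assumptions; how the MOKCa-3D search resolves identifiers is not described in the text.

```python
import re

# Ensembl human gene IDs: "ENSG" followed by eleven digits.
ENSG = re.compile(r"^ENSG\d{11}$")
# UniProt accession format as published in the UniProt documentation.
UNIPROT = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def classify_query(query):
    """Guess the identifier type of a search term."""
    term = query.strip().upper()
    if ENSG.match(term):
        return "ensembl_gene"
    if UNIPROT.match(term):
        return "uniprot_accession"
    # Anything else is treated as a gene symbol (a simplification:
    # a symbol could in principle also match the accession pattern).
    return "hgnc_symbol"


for term in ("KRAS", "P01116", "ENSG00000133703"):
    print(term, classify_query(term))
```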
Mutation level annotation
The user can select individual mutations from the gene-level page (Fig. 4). This loads the mutation-level page, which provides relevant functional annotation extracted from external databases (Fig. S2). This includes the predicted pathogenicity of the mutation and, if available, a GOF/LOF assessment curated from the literature with PubMed ID links to the relevant articles. Residue modifications, protein-interface residues, active sites, and binding sites that occur at the position of the mutation or within three residues of it are presented in a list.
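The "at the position or within three residues" rule amounts to a simple window query along the sequence. A minimal sketch, with a hypothetical function name and invented site annotations, might look like this:

```python
def nearby_sites(position, sites, window=3):
    """Return annotated sites within `window` residues of a mutated position.

    sites: iterable of (site_position, kind) tuples.
    Result: list of (site_position, kind, distance), nearest first.
    """
    hits = [
        (abs(site_pos - position), site_pos, kind)
        for site_pos, kind in sites
        if abs(site_pos - position) <= window
    ]
    return [(site_pos, kind, dist) for dist, site_pos, kind in sorted(hits)]


# Invented annotations for illustration only.
toy_sites = [(9, "interface"), (12, "binding site"), (15, "PTM"), (16, "PTM")]
print(nearby_sites(12, toy_sites))
```

With these toy annotations, the sites at positions 9, 12, and 15 are reported for a mutation at position 12, while the site at position 16 lies four residues away and is excluded.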

Figure 4. Searching MOKCa-3D. This figure shows an example of use, analysing mutations in KRAS. (A) Gene search page: users begin by querying a gene of interest. (B) Gene-level view (example: KRAS): displays functional annotations (GO terms) and a filterable mutation list. (C) Mutation-level view (example: G12V): presents detailed functional and structural annotations along with 3D molecular visualization.
An interactive session loads the AlphaFold model and highlights the mutated residue on the structure for further 3D exploration. The structural impact analysis of the mutation is presented in a highly summarized manner, highlighting ten features from SAAP that detail how the mutation affects the structure and potentially impairs protein functionality.
Example of use
MOKCa-3D allows users to explore specific gene mutations of interest. In this example, we focus on the KRAS G12V mutation, which is associated with various types of tumours, including pancreatic ductal carcinoma, colorectal cancer, and lung adenocarcinoma [26].
To start, users can search for a gene by its name (KRAS), and MOKCa-3D will return it in the search results. By clicking on the gene name, the user is directed to the gene-level annotation page. Gene Ontology shows that KRAS is involved in the MAPK cascade, is found in the cytoplasm and the Golgi apparatus, and has GTPase activity. The Pfam link [17] shows that it has a Ras domain, and the link to canSAR.ai [25] shows an assessment of KRAS as a drug target.
The feature viewer [27] displays an interactive visualization showing the protein sequence along with all mutations present in the protein. The mutational frequency shows peaks at positions 12, 13, 117, and 146 in the amino acid sequence. Post-translational modification (PTM) sites, interface residues, functional sites, and protein domains are documented along the length of the protein. This comprehensive visualization allows users to better understand the entire protein and its specific mutations.
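The mutational frequency track in the feature viewer can be thought of as a per-position sum over all mutations at that residue. The sketch below shows that calculation under the assumption that mutations are available as one-letter labels with a sample count; the function name and the counts are invented for illustration.

```python
import re
from collections import Counter


def positional_frequency(mutation_counts):
    """Sum per-mutation counts into a per-position frequency track.

    mutation_counts: dict mapping labels such as 'G12V' to sample counts.
    """
    track = Counter()
    for label, count in mutation_counts.items():
        position = int(re.search(r"\d+", label).group())
        track[position] += count
    return track


# Invented counts for illustration only.
toy_counts = {"G12V": 3, "G12D": 5, "G13D": 2, "A146T": 1}
track = positional_frequency(toy_counts)
print(track.most_common(2))  # the two most frequently mutated positions
```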
In the section below, documented somatic cancer mutations in KRAS are displayed in a table. By clicking on a specific mutation (e.g. G12V) in the table, the user is taken to a detailed mutation page.
The KRAS G12V mutation page shows that this mutation is a documented GOF mutation and provides links to the publications in PubMed. It has a high AlphaMissense pathogenicity score of 0.994. MOKCa-3D further shows that the mutation is located within the Ras domain of the protein, is found in the GTP binding site, and is adjacent to interface residues that are crucial in protein–protein interactions.
The SAAP structural impact analysis indicates that the mutation causes moderate sidechain clashes. This means that the sidechain atoms of the mutated residue clash with surrounding atoms, which could disrupt the protein’s folding and stability. In fact, the KRAS G12V mutation creates a steric block, locking KRAS in its active state and leading to the constitutive activation of its downstream pathways [28].
In the Mol* visualizer, the mutation is highlighted in yellow for further examination (Fig. S2).
Conclusion and discussion
MOKCa-3D provides comprehensive annotation of somatic missense mutations implicated in different cancer types. The annotations include functional (pathogenicity, GOF, and LOF) and structural (how the mutation disrupts protein structure) information. Together they give users a clear picture of the consequence of each mutation and help in understanding the molecular mechanisms by which mutations contribute to tumour growth and progression.
The MOKCa-3D web interface is highly interactive and responsive, and the underlying infrastructure is highly modularized to facilitate future expansion. Planned improvements to the database include the addition of extra annotation data and of new mutation data.
Acknowledgements
We would like to thank Andrew CR Martin for allowing us to use the SAAP pipeline. We are grateful to Laurence Pearl for critical reading of the manuscript.
Conflicts of interest
None declared.
Funding
University of Sussex.
Data availability
SAAP annotations and GOF/LOF status can be accessed from MOKCa-3D directly. Data derived from sources in the public domain: missense mutations can be sourced from UniProt: https://www.uniprot.org/. Domain boundaries are derived from InterPro: https://www.ebi.ac.uk/interpro/. PPI data are derived from PIONEER: https://pioneer.yulab.org/. GO terms are derived from the Gene Ontology: https://geneontology.org/. Data owned by a third party: tissue type and frequency data are derived from TCGA, which can be accessed through the Genomic Data Commons: https://gdc.cancer.gov/. PTMs are derived from PhosphoSitePlus: https://www.phosphosite.org/.