Abstract

TIMBAL is a database holding molecules of molecular weight <1200 Daltons that modulate protein–protein interactions. Since its first release, the database has been extended to cover 50 known protein–protein interaction drug targets, including protein complexes that can be stabilized by small molecules with therapeutic effect. The resource contains 14 890 data points for 6896 distinct small molecules. UniProt codes and Protein Data Bank entries are also included.

Database URL: http://www-cryst.bioc.cam.ac.uk/timbal

Introduction

The idea of modulating protein–protein interactions (PPI) with small molecules has been intentionally pursued for more than a decade. The concept is attractive, but many challenges remain. In the UK, a network was recently created to bring PPI scientists closer together and to facilitate collaboration in overcoming these hurdles (http://ppi-net.org). One contribution to these efforts has been the creation of TIMBAL, a resource that holds known small molecules modulating protein–protein complexes. The first release of the TIMBAL database in 2009 (1) included an analysis of 104 small molecules, 27 of which were structurally characterized with their targets in the Protein Data Bank (PDB) (2). A year later, Bourgeas et al. (3) released the 2P2I database, a hand-curated database of the structures of protein–protein complexes with known inhibitors. Several updates (4, 5) have refined 2P2I into a structural database dedicated to orthosteric modulation of PPI, containing 14 protein–protein complexes, 60 protein–inhibitor complexes, 16 free proteins and 55 small molecule modulators.

To our knowledge, there are no other resources for PPI modulators. The growth of data in recent years makes maintaining hand-curated databases extremely time-consuming. TIMBAL is now maintained through automated searches of the ChEMBL database (6) (currently using ChEMBL_15), and this report is a brief description of the update and its current contents.

Methods

The ChEMBL database (6) (https://www.ebi.ac.uk/chembldb) holds bioactivity data for molecules manually extracted from a selection of peer-reviewed journals relevant to drug discovery. Chemical structures are checked and standardized to ensure consistency across the resource before deposition in the database. Assays are classified as ‘binding’ when there is direct interaction between the compound and the target, ‘functional’ when the interaction is indirect or against the whole organism or cell and ‘ADMET’ when there are pharmacokinetic data. Target assignment is checked by curators and flagged with a confidence score. A further sub-classification depends on whether the assay is against an isolated in vitro target, a multi-protein complex (or nucleic acids), or not assigned because the assay is cell or tissue based. The database also contains a target dictionary that allows users to browse target components by standard identifiers such as UniProt accession code and NCBI taxonomy. In addition to a rich interactive web-based interface, ChEMBL can be downloaded in full in a variety of formats, which has allowed us to use a local copy to derive the TIMBAL update.

Target list

The initial list of 17 known PPI targets has been extended to 50 targets by PPI-Net members and TIMBAL users, and from conference talks and ChEMBL classification. For each target, we have generated a list of reviewed UniProt (7) codes for its orthologs. The codes are used in ChEMBL for searching small molecule data related to these proteins in binding assays where there is confidence that the assay is directly assigned to either a single protein or its homolog (e.g. binding affinity to Bcl-XL by isothermal titration calorimetric assay) or to a protein complex or its homologs (such as p53/MDM2 complex).
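The search described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical two-table layout (`assay`, `activity`) held in an in-memory SQLite database; this is not the real ChEMBL schema, and the accession placeholders, column names and confidence threshold are illustrative only.

```python
import sqlite3


def binding_activities(conn, accessions, min_confidence):
    """Return binding-assay rows for the given accession codes.

    Hypothetical schema: only assays typed 'binding' whose target
    assignment confidence is at least `min_confidence` are kept.
    """
    if not accessions:
        return []
    placeholders = ",".join("?" for _ in accessions)
    sql = (
        "SELECT a.molecule_id, a.accession, a.value_nm "
        "FROM activity a JOIN assay s ON a.assay_id = s.assay_id "
        "WHERE s.assay_type = 'binding' AND s.confidence >= ? "
        "AND a.accession IN (" + placeholders + ") "
        "ORDER BY a.molecule_id"
    )
    return conn.execute(sql, [min_confidence, *accessions]).fetchall()


def demo_connection():
    """Build a tiny in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assay (assay_id INTEGER PRIMARY KEY,
                            assay_type TEXT, confidence INTEGER);
        CREATE TABLE activity (molecule_id TEXT, assay_id INTEGER,
                               accession TEXT, value_nm REAL);
        INSERT INTO assay VALUES (1, 'binding', 9),
                                 (2, 'functional', 9),
                                 (3, 'binding', 3);
        INSERT INTO activity VALUES ('mol-1', 1, 'ACC_A', 12.0),
                                    ('mol-2', 2, 'ACC_A', 50.0),
                                    ('mol-3', 3, 'ACC_A', 7.0),
                                    ('mol-4', 1, 'ACC_B', 90.0);
        """
    )
    return conn
```

In this toy data set, the functional assay and the low-confidence binding assay are both excluded, which mirrors the restriction to binding assays with a confident target assignment.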

Automated update and manual curation

We maintain a small table for manually curated entries that are not available from ChEMBL, e.g. the newly described Mixed Lineage Leukemia (MLL) inhibitors (8), which are reported in a journal not fully screened by the ChEMBL curators. A fully automated script updates the database, merging the manually curated entries with the data extracted from the local copy of ChEMBL. Searches against the PDB retrieve the experimental structures for these targets, including protein–small molecule complexes, protein–protein complexes and unbound proteins. Links to the CREDO database (9) allow the user to explore in detail the atomic interactions of these complexes. These links are made by matching the chemical structure of the small molecule and the UniProt identifier of the appropriate target in the PDB entry.
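The merge step can be sketched as follows. This is only an illustration under stated assumptions, not the actual update script: the record fields (`smiles`, `target`, `reference`) and the rule that a ChEMBL record wins over a manual record with the same key are hypothetical choices.

```python
def merge_entries(chembl_rows, manual_rows):
    """Merge extracted and manually curated records without duplicates.

    Each record is a dict; the key (smiles, target, reference) is a
    hypothetical identity for one data point. ChEMBL rows are processed
    first, so a manual row is added only when ChEMBL lacks that key.
    """
    merged = {}
    for source, rows in (("chembl", chembl_rows), ("manual", manual_rows)):
        for row in rows:
            key = (row["smiles"], row["target"], row["reference"])
            if key not in merged:
                # Record where the data point came from.
                merged[key] = {**row, "source": source}
    return list(merged.values())
```

Tagging each record with its source keeps the manually curated table small and makes it straightforward to retire a manual entry once the same data point appears in a later ChEMBL release.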

The final step is a check of the contents of the database to ensure that the data reported are binding of small molecules to protein interfaces. Any discrepancy found is reported to the ChEMBL curators and removed from the TIMBAL database.

Thus, TIMBAL is no longer a manually curated database; there is a trade-off between automation and curation. Although every effort has been put in place to avoid noise in the data, it is clear that >9000 data points for the integrins cannot be fully curated. Researchers using TIMBAL are encouraged to report mistakes, comments or improvements.

Allosteric modulators that do not bind to interfacial residues have not been included, as their identification requires dedicated curation, which is beyond the scope of this update. Researchers interested in allosteric modulation are referred to the AlloSteric Database (ASD) (10), a manually curated resource with announced updates every 6 months.

Owing to the characteristics of PPI targets, the term ‘small molecule’ is used generically to refer to synthetic molecules and small peptides that bind to these interfaces. For example, subnanomolar synthetic inhibitors for Bcl-2/Bcl-XL have been reported with molecular weight >1100 Daltons (11). Small peptides (up to 10 peptide bonds) are also kept, as they might be useful to researchers as tool compounds. TIMBAL molecules therefore have a molecular weight below 1200 Daltons and no more than 10 peptide bonds.
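The two inclusion limits can be written as a simple predicate. The sketch below assumes that molecular weight and the number of peptide bonds have already been computed for each molecule (for instance with a cheminformatics toolkit); the function and constant names are hypothetical.

```python
MAX_MOL_WEIGHT = 1200.0   # Daltons; molecules must be below this value
MAX_PEPTIDE_BONDS = 10    # small peptides with up to 10 peptide bonds are kept


def within_timbal_limits(mol_weight, n_peptide_bonds):
    """Return True if precomputed descriptors satisfy both size limits."""
    return mol_weight < MAX_MOL_WEIGHT and n_peptide_bonds <= MAX_PEPTIDE_BONDS
```

For example, a synthetic inhibitor of 1100 Daltons with no peptide bonds passes, whereas a peptide with 11 peptide bonds is excluded regardless of its weight.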

Web resource

Data extracted from ChEMBL and manually curated entries are stored in a PostgreSQL (http://www.postgresql.org) database. We use SQLAlchemy (http://www.sqlalchemy.org) to generate Python objects from the database tables and Flask (http://flask.pocoo.org) to create web pages from these objects. User requests are handled on the fly using Flask generators and direct responses. Bootstrap (http://twitter.github.com/bootstrap) provides the Cascading Style Sheets framework and JavaScript functionality to create an efficient resource with minimal coding.
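The idea of answering requests on the fly with generators can be sketched with the standard library alone. The function below is a hypothetical illustration, not code from the TIMBAL site: it yields a CSV download one line at a time from any iterable of rows, such as a database cursor. In Flask, a generator of this kind can be passed to a `Response` object so that the download is streamed rather than built in memory.

```python
import csv
import io


def stream_csv(rows, header):
    """Yield a CSV document one line at a time.

    `rows` can be any iterable of tuples (for example a database cursor),
    so the full result set never has to be held in memory.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    yield buffer.getvalue()
    for row in rows:
        # Reuse the same buffer: rewind and clear before writing each row.
        buffer.seek(0)
        buffer.truncate(0)
        writer.writerow(row)
        yield buffer.getvalue()


# In a Flask view this could be returned as (not executed here):
#   return Response(stream_csv(cursor, header), mimetype="text/csv")
```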

Results and Discussion

TIMBAL can be publicly accessed and downloaded at http://www-cryst.bioc.cam.ac.uk/timbal. The schema of the database is presented in Figure 1.

Figure 1.

Schema of the database showing all tables and fields.

TIMBAL contains >14 000 data points for ∼7000 small molecules with 50 PPI targets. More than 9000 data entries are for integrins, the cell surface receptors that have been pursued as therapeutic targets for almost two decades (12).
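To put the integrin share in perspective, the short calculation below uses the total reported in the abstract (14 890 data points) together with the integrin counts from Table 1 (9730 data points, 498 of them inactive). It is a back-of-the-envelope sketch that assumes both figures are counted on the same basis.

```python
TOTAL_DATA_POINTS = 14890      # total reported in the abstract
INTEGRIN_DATA_POINTS = 9730    # integrin row of Table 1
INTEGRIN_INACTIVE = 498        # inactive readings in the integrin row

# Fraction of all data points that belong to integrins.
integrin_share = INTEGRIN_DATA_POINTS / TOTAL_DATA_POINTS

# Data points left for the other 49 targets.
non_integrin_points = TOTAL_DATA_POINTS - INTEGRIN_DATA_POINTS

# Fraction of integrin data points that are inactive readings.
integrin_inactive_share = INTEGRIN_INACTIVE / INTEGRIN_DATA_POINTS

print(f"integrin share: {integrin_share:.1%}")
print(f"non-integrin data points: {non_integrin_points}")
print(f"inactive among integrin data: {integrin_inactive_share:.1%}")
```

Roughly two thirds of the database therefore comes from a single target family, which is why full manual curation of the integrin data is not practical.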

Table 1 summarizes the contents of the database. Because ChEMBL stores all reported data, including non-active readings, TIMBAL also holds molecules that are inactive against PPI targets (7% of the total content).

Table 1.

Summary of the TIMBAL contents

Target name | Protein complex | N data points | N unique SM | N papers | N prot-sm PDB | N total PDB | N unique SM in v1
14-3-3a14-3-3/PMA33238
Adenylyl CyclaseaAdenylyl Cyclase dimer C1-C2 domains7 (2)3 (1)3217
Annexin A2Annexin A2/S100-A10164 (22)54 (10)109
ARF1aARF1/SEC7422119
AuxinIAAaAuxinIAA-TIR111118
Bcl-XL and Bcl-2Bcl-2 and Bcl-XL with BAX; BAK and BID1256 (77)645 (71)65167826
Beta-cateninBetaCatenin/Tcf4 and Tcf312 (7)12 (7)40264
BIIIBIII/X11a000013
BRD2BRD2/Ack93 (5)44 (4)71221
BRD4BRD4/NUT109 (2)52 (2)8435
BRDTBRDT/H429 (2)28 (2)414
CD154CD40/CD1541 (1)1 (1)108
CD74CD74/MIF000049
CD80 (B7-1)CD80/CD28 (or CTLA-4)4430104
ClathrinClathrin/adaptor and accessory proteins221218
c-Mycc-Myc/Max1110101
CRM1CRM1/Rev182 (144)59 (51)40232
CyclophilinsCyclophilins261 (37)194 (33)11069
E2E1/E250 (1)44 (1)61304
HIF-1aHIF-1a/p300274 (43)182 (36)20012
IL-2IL-2/IL-2Ra52 (2)48 (2)54196
Immunophilin FKBP1AaFKBP1A/FK506571 (9)540 (9)301044
IntegrinsIntegrins9730 (498)3685 (307)210283
K-RasK-Ras/SOS155159
Keap1Nrf2/Keap1000031
LMO2LMO2/LDB1 or TAL100005
MDM2p53/MDM2320 (52)236 (47)2383416
MDMXp53/MDMX44 (16)40 (16)4115
MaxaMax dimer00008
MLLMLL/Menin221222
Neuropilin-1Neuropilin-1/VEGF-A177 (11)157 (11)6137
PPAR-gammaPPAR-gamma/NRCoA10000235
Plk1(PBD)Plk1(PBD)/PBD substrate221235
Rac1Rac1/GEFs118 (11)76 (11)3028
Rad51Rad51/BRCA234 (4)10 (2)2833
RGS4RGS4/Galpha-o protein111031
RRTF1RRTF1/CBFb000015
S100BS100B/p53191845327
SOD1aSOD1 dimer28 (17)16 (11)52109
STAT3STAT3 dimer42 (7)33 (6)302
STAT5STAT5 dimer195201
Sur-2ESX/Sur-2 (DRIP130)29 (8)9 (4)2011
Tak1Tak1/Tab111107
TNFaTNFa trimer or TNFa/TNFR8731132
TransthyretinaTransthyretin tetramer592 (71)350 (69)1824180
ToxTToxT dimer111011
TubulinaTubulin dimer75 (36)64 (36)9118
UL42UL30(Pol)/UL42 subunits of HSV type 1 DNA polymerase441013
XIAPXIAP/Caspase9 or SMAC (BIR3 domain)538 (23)312 (18)308385
ZipAZipA/FtsZ242364821

N data points, number of data points for each target; N unique SM, number of distinct small molecules for each target; N papers, number of distinct publications per target; N prot-sm PDB, number of protein–small molecule complexes in the PDB for each target; N total PDB, number of PDB entries for each target, including protein–protein, protein–small molecule and apo protein structures; N unique SM in v1, number of unique small molecules per target in the previous version of the database, given for comparison.

Numbers in parentheses for data points and unique small molecules refer to inactive molecules.

a: SM for these targets are stabilizers of PPI.

TIMBAL also holds small molecules that stabilize protein complexes with possible therapeutic effect (13), such as stabilizers of transthyretin oligomer that inhibit harmful amyloid fibril formation.

The resource will be updated with each new release of the ChEMBL database. Since its first release, TIMBAL has grown not only in number of entries but also in scope, now including stabilizers and inactive molecules. We hope that this database will help in the quest to identify small molecules that bind to protein interfaces.

Acknowledgements

The authors thank Colin Groom for his encouragement to include stabilisers of protein complexes as well as their inhibitors; John Overington and Yvonne Light for their support with the ChEMBL database; Adrian Schreyer for maintenance of the local copy of ChEMBL; Will Pitt, Colin Groom, Adrian Schreyer, Bernardo Ochoa, Richard Bickerton, Marko Hyvonen, Mike Hann, Mark Searcey, Terry Rabbitts, Ian Hardcastle, Alexander Metz, John Karanicolas, Nikolay Todorov, David Bettinson, Hsin-Wei Wang, Alvaro Olivera-Nappa and Alessandro Contini for their contribution to the target list and testing the resource.

Funding

This work was supported by the University of Cambridge, BBSRC and UCB. Funding for open access charge: University of Cambridge.

Conflict of interest. None declared.

References

1. Higueruelo AP, Schreyer A, Bickerton GRJ, et al. Atomic interactions and profile of small molecules disrupting protein-protein interfaces: the TIMBAL database. Chem. Biol. Drug Des. 2009, vol. 74 (pg. 457-467)
2. Berman HM, Westbrook J, Feng Z, et al. The protein data bank. Nucleic Acids Res. 2000, vol. 28 (pg. 235-242)
3. Bourgeas R, Basse MJ, Morelli X, et al. Atomic analysis of protein-protein interfaces with known inhibitors: the 2P2I database. PLoS One 2010, vol. 5 pg. e9598
4. Morelli X, Bourgeas R, Roche P. Chemical and structural lessons from recent successes in protein-protein interaction inhibition (2P2I). Curr. Opin. Chem. Biol. 2011, vol. 15 (pg. 475-481)
5. Basse MJ, Betzi S, Bourgeas R, et al. 2P2Idb: a structural database dedicated to orthosteric modulation of protein-protein interactions. Nucleic Acids Res. 2013, vol. 41 (pg. D824-D827)
6. Gaulton A, Bellis LJ, Bento AP, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2011, vol. 40 (pg. D1100-D1107)
7. The UniProt Consortium. Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 2011, vol. 39 (pg. D214-D219)
8. Shi A, Murai MJ, He S, et al. Structural insights into inhibition of the bivalent menin-MLL interaction by small molecules in leukemia. Blood 2012, vol. 120 (pg. 4461-4469)
9. Schreyer A, Blundell T. CREDO: a protein-ligand interaction database for drug discovery. Chem. Biol. Drug Des. 2009, vol. 73 (pg. 157-167)
10. Huang Z, Zhu L, Cao Y, et al. ASD: a comprehensive database of allosteric proteins and modulators. Nucleic Acids Res. 2011, vol. 39 (pg. D663-D669)
11. Zhou H, Chen J, Meagher JL, et al. Design of Bcl-2 and Bcl-xL inhibitors with subnanomolar binding affinities based upon a new scaffold. J. Med. Chem. 2012, vol. 55 (pg. 4664-4682)
12. Fry DC. Protein-protein interactions as targets for small molecule drug discovery. Biopolymers 2006, vol. 84 (pg. 535-552)
13. Thiel P, Kaiser M, Ottmann C. Small-molecule stabilization of protein-protein interactions: an underestimated concept in drug discovery? Angew. Chem. Int. Ed. Engl. 2012, vol. 51 (pg. 2012-2028)

Author notes

Citation details: Higueruelo,A.P., Jubb,H. and Blundell,T.L. TIMBAL v2: update of a database holding small molecules modulating protein–protein interactions. Database (2013) Vol. 2013: article ID bat039; doi:10.1093/database/bat039

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.