A large and accurate collection of peptidase cleavages in the MEROPS database

Table 1.

Peptidase preference by amino acid type

Amino acid type	S4	S3	S2	S1	S1′	S2′	S3′	S4′
Acidic	5	7	5	24	5	4	2	5
Basic	11	8	13	67	11	9	5	2
Aliphatic	22	24	32	18	56	36	23	7
Aromatic	2	2	8	34	23	7	1	0
Small	35	34	31	58	65	22	26	20
Total	75	75	89	201	160	78	57	34

The number of peptidases with a preference for a particular amino acid type for each binding pocket S4–S4′ is shown, where 40% or more of substrates have an amino acid of that type at that position. Only those 312 peptidases with at least 10 known cleavages are included. There are 276 peptidases that show a preference, of which 18 show a preference at all eight sites, 16 for seven sites, 12 for six sites, 17 for five sites, 33 for four sites, 46 for three sites, 64 for two sites and 70 for one site.
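The totals in Table 1 and the breakdown given in its legend can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of MEROPS; the variable names are illustrative, and the numbers are transcribed from the table and legend.

```python
# Counts transcribed from Table 1 (rows: amino acid type; columns: S4..S4').
POCKETS = ["S4", "S3", "S2", "S1", "S1'", "S2'", "S3'", "S4'"]
TABLE_1 = {
    "Acidic":    [5, 7, 5, 24, 5, 4, 2, 5],
    "Basic":     [11, 8, 13, 67, 11, 9, 5, 2],
    "Aliphatic": [22, 24, 32, 18, 56, 36, 23, 7],
    "Aromatic":  [2, 2, 8, 34, 23, 7, 1, 0],
    "Small":     [35, 34, 31, 58, 65, 22, 26, 20],
}
REPORTED_TOTALS = [75, 75, 89, 201, 160, 78, 57, 34]

# From the legend: number of peptidases showing a preference at
# eight sites, seven sites, ... down to one site.
SITES_BREAKDOWN = {8: 18, 7: 16, 6: 12, 5: 17, 4: 33, 3: 46, 2: 64, 1: 70}

# Sum each column across the five amino acid types.
column_totals = [sum(column) for column in zip(*TABLE_1.values())]

# Total number of peptidases that show any preference.
peptidases_with_preference = sum(SITES_BREAKDOWN.values())

# Pocket with the largest number of recorded preferences.
busiest_pocket = POCKETS[column_totals.index(max(column_totals))]

print(column_totals == REPORTED_TOTALS, peptidases_with_preference, busiest_pocket)
```

Run as written, the column sums reproduce the Total row, the breakdown sums to the 276 peptidases quoted in the legend, and S1 is the pocket with the most recorded preferences (201).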

The preference for individual amino acids is shown in Table 2. It is clear from the table that cysteine is disfavoured near a cleavage site: only the peroxisomal transit peptide peptidase shows a preference for cysteine, in both the S2 and S1 pockets. Tryptophan is also rare around cleavage sites, with only tryptophanyl aminopeptidase (M9A.008; S1 pocket) and mast cell peptidase 4 (Rattus) (S01.005; S2 pocket) showing a preference; however, this may have more to do with the fact that tryptophan is the rarest of the amino acids. Asparagine is also very rare in the proximity of a cleavage site, one of the few examples being the specialist peptidase legumain (C13.004), which cleaves only asparaginyl bonds (33). Histidine is also a rare preference, with only three peptidases showing any preference for it, namely chymosin (A01.006; S4), carnosine dipeptidase I (M20.006; S1′) and Xaa-methyl-His dipeptidase (M20.013; S1′). Methionine is also not preferred by most peptidases. The exceptions are the methionyl aminopeptidases (M24.001, M24.002), where the preference is, as expected, for methionine binding in the S1 pocket, some members of the peptidase Clp family (S14) and the unsequenced Met-Xaa dipeptidase (M9B.004). The gpr peptidase (A25.001) shows a preference for Met binding to S4. The most common preference is for arginine binding to the S1 pocket, which occurs in 52 peptidases (Table 2). However, arginine is relatively rare in the other binding pockets. There are peptidases that show a preference for Gly, Pro and Val for every binding pocket in the range S4–S4′. Peptidases showing unique preferences are listed in Table 3.

Table 2.

Number of peptidases with an amino acid preference

Amino acid	S4	S3	S2	S1	S1′	S2′	S3′	S4′
Ala	6	8	5	10	8	5	1
Cys			1	1
Asp	3			16	2			3
Glu	1	7	5	8	1		1	2
Phe	2	1	5	12	10	2
Gly	3	1	11	17	12	2	6	5
His	1				2
Ile	2				1	8	1
Lys	2	4	8	6	6	2	4
Leu	11	4	9	12	24	4	7
Met	1			6
Asn			9	1
Pro	2	8	5	9	9	4	1	4
Gln		9	1	5	1			10
Arg	8	1	2	52	5	3		1
Ser	8		1		8	3	2	1
Thr			3			1	1	1
Val	1	6	1	2	5	6	11	5
Trp			1	1
Tyr				11	1	5

The number of peptidases showing a preference for an amino acid in a binding site is shown. Only those 312 peptidases with 10 or more known substrate cleavages are included. An amino acid must occur at that position in 40% or more of substrates. Therefore, it is possible for two amino acids to be preferred in any one binding pocket, as is the case for trypsin 1 where there is a preference for either Lys (59% of substrates) or Arg (41%) in S1. There are 202 peptidases that show a preference, of which 13 show a preference at all eight sites, 13 for seven sites, five for six sites, three for five sites, eight for four sites, 23 for three sites, 49 for two sites and 88 for one site.
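The rule stated in this legend (at least 10 known cleavages, and an amino acid present in 40% or more of substrates at a position) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MEROPS implementation: the input format (one eight-character P4–P4′ string per cleavage, with `-` where no residue exists), the function name `preferred_residues` and the toy data are all hypothetical. The toy set mimics the trypsin 1 situation with a 60%/40% Lys/Arg split in S1.

```python
from collections import Counter

POCKETS = ["S4", "S3", "S2", "S1", "S1'", "S2'", "S3'", "S4'"]
MIN_CLEAVAGES = 10        # from the legend: 10 or more known cleavages
THRESHOLD_PERCENT = 40    # from the legend: 40% or more of substrates


def preferred_residues(cleavages):
    """Map each pocket to the residues that meet the 40% threshold.

    `cleavages` is a list of 8-character P4-P4' strings (assumed format);
    '-' marks a position with no residue and is never counted.
    """
    total = len(cleavages)
    if total < MIN_CLEAVAGES:
        return {}  # too few cleavages to call any preference
    preferences = {}
    for index, pocket in enumerate(POCKETS):
        counts = Counter(seq[index] for seq in cleavages if seq[index] != "-")
        # Integer comparison avoids floating-point edge cases at exactly 40%.
        hits = sorted(aa for aa, n in counts.items()
                      if n * 100 >= THRESHOLD_PERCENT * total)
        if hits:
            preferences[pocket] = hits
    return preferences


# Hypothetical toy data: S1 is Lys in six cleavages and Arg in four;
# every other column cycles through ten different residues.
alphabet = "ADEFGILSTV"
toy = [
    "".join(("K" if k < 6 else "R") if i == 3 else alphabet[(k + i) % 10]
            for i in range(8))
    for k in range(10)
]

print(preferred_residues(toy))  # {'S1': ['K', 'R']}
```

Because the threshold is 40% rather than a majority, two residues can both qualify in one pocket, which is why S1 reports both Lys and Arg here.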

Table 3.

Peptidases showing unique preferences

Peptidase name	MEROPS ID	Total substrate cleavages	S4	S3	S2	S1	S1′	S2′	S3′	S4′
Chymosin	A01.006	15	His		Ser				Ile
Feline immunodeficiency virus retropepsin	A02.007	28			Val
Walleye dermal sarcoma virus retropepsin	A02.063	27			Gln
PibD peptidase	A24.017	10						Thr
gpr peptidase	A25.001	32	Met				Ile			Glu
Cruzipain	C01.075	49								Arg
Coxsackievirus-type picornain 3C	C03.011	10							Pro
Ubiquitinyl hydrolase-L3	C12.003	14		Arg
Legumain	C13.004	30				Asn
Sapovirus 3C-like peptidase	C24.003	10							Thr
Separase (yeast-type)	C50.001	12	Glu
Peptidyl-dipeptidase Acer	M02.002	10		Phe
Bacterial collagenase H	M09.003	18							Ala
PrtA peptidase (Photorhabdus-type)	M10.063	23							Glu
ADAM8 peptidase	M12.208	22					Gln
Tryptophanyl aminopeptidase (Trichosporon cutaneum)	M9A.008	15				Trp
Carboxypeptidase G3	M9E.007	12					Glu
Mast cell peptidase 4 (Rattus)	S01.005	33			Trp
Kumamolisin	S53.004	10	Val	Gly			Tyr
Peroxisomal transit peptide peptidase	U9G.062	14			Cys	Cys

Despite the large number of substrates collected, the specificity of some peptidases cannot be explained in terms of S4–S4′ preferences. These peptidases include (MEROPS identifier and number of substrate cleavages in parentheses): cathepsin D (A01.009; 145), cathepsin E (A01.010; 64), nemepsin-2 (A01.068; 127), papain (C01.001; 40), cathepsin X (C01.013; 24), cathepsin L (C01.032; 85), cathepsin B (C01.060; 82), aspergilloglutamic peptidase (G01.002; 37), mirabilysin (M10.057; 32), neprilysin (M13.001; 83), endothelin-converting enzyme 1 (M13.002; 27), MEP peptidase (M13.011; 43), pitrilysin (M16.001; 23), insulysin (M16.002; 31), eupitrilysin (M16.009; 54), aminopeptidase Ap1 (M28.002; 66), plasma glutamate carboxypeptidase (M28.014; 33), penicillolysin (M35.001; 20), deuterolysin (M35.002; 22), FtsH peptidase (M41.001; 24), dipeptidyl-peptidase III (M49.001; 24), glycyl aminopeptidase (M61.001; 26), chymotrypsin C (S01.157; 20), kallikrein 1 (S01.160; 25), subtilisin Carlsberg (S08.001; 33), high alkaline protease (Alkaliphilus transvaalensis) (S08.028; 28), peptidase K (S08.054; 43) and signalase (animal) 21 kDa component (S26.010; 363).

Displays on the MEROPS website

Specificity logos and frequency matrices present the user with a visual representation of peptidase specificity. An example specificity logo is shown in Figure 2. From the logo and the cleavage pattern string it is clear that caspase-3 has a near-absolute requirement for Asp in the S1 pocket (position 4 in the logo; only one cleavage after Glu is known) and a preference for Asp in S4. There are minor preferences for Glu in S3 and for Gly or Ser in S1′.

Figure 2.

The specificity logo and frequency matrix showing the substrate specificity of caspase-3. The figure is taken from a page in the MEROPS database. The logo is shown at the top with the frequency matrix below. The cleavage pattern is a textual representation of the logo, where the scissile bond is shown as a red cross, and the binding pockets separated by forward slashes. The preferred residue is shown in uppercase if the preference is strong. The number of cleavages on which these data are based is given in parentheses. For the logo, the binding pockets S4–S4′ are shown along the x-axis, where 1 is S4, 2 is S3, etc. The bit score is shown on the y-axis. The height of the letter is proportional to the bit score. The letters are coloured to indicate amino acid properties: blue for basic, red for acidic, black for hydrophobic and green for any other. In the frequency matrix below the logo, each cell shows the number of substrates with an amino acid occupying one of the positions P4–P4′. Cells in the matrix are highlighted in shades of green where the greater the preference, i.e. the more often an amino acid occurs at that position, the brighter the shade. Cells are highlighted in black if the amino acid is unknown at that position for any substrate.

While the logo indicates which amino acids are acceptable in each position, it does not indicate which amino acids are unobserved. These are shown in the frequency matrix, and an example is also shown in Figure 2 for caspase-3. In this example Asp occurs in the P1 position in all 413 substrates, Asp occurs in P4 in almost half the substrates, while Glu occurs in P3 in 27% of substrates. Note that in this frequency matrix every amino acid occurs in positions P4–P2 and P1′–P4′, but tryptophan is observed only once in P4, P2, P1′ and P4′. This gives an indication of the minimum number of substrate cleavages that has to be collected for a peptidase before definite conclusions about specificity in all binding pockets can be drawn.
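The relationship between the frequency matrix and the logo can be illustrated with a short Python sketch. The text states only that letter height is proportional to a bit score; the formula below (log2(20) minus the Shannon entropy of a position, as used in conventional sequence logos, without any small-sample correction) is an assumption, not a documented description of how MEROPS computes its logos. The function names and the four toy cleavage strings are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(cleavages):
    """Count residues at each position of equal-length P4-P4' strings."""
    width = len(cleavages[0])
    return [Counter(seq[i] for seq in cleavages if seq[i] in AMINO_ACIDS)
            for i in range(width)]


def information_bits(counts):
    """Information content of one position: log2(20) minus Shannon entropy."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in counts.values())
    return math.log2(len(AMINO_ACIDS)) - entropy


def unobserved(counts):
    """Amino acids never seen at this position (invisible in a logo)."""
    return [aa for aa in AMINO_ACIDS if counts[aa] == 0]


# Hypothetical toy cleavages with Asp fixed in P1 (index 3).
cleavages = ["DEVDGSAL", "DQTDSGPK", "LEHDGIRA", "DEVDSWNT"]
matrix = frequency_matrix(cleavages)

print(round(information_bits(matrix[3]), 2))  # 4.32, the maximum for 20 residues
print(len(unobserved(matrix[3])))             # 19
```

With only four toy cleavages, 19 of the 20 amino acids are unobserved in P1. This illustrates why a sufficiently large number of cleavages is needed before absence from the matrix can be read as genuine exclusion.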

A substrate alignment is shown in Figure 3. The density of residues highlighted in black is high, implying that this cleavage position is very poorly conserved and thus may not be physiologically relevant.

Figure 3.

Alignment of the protein sequences of orthologues of the mouse BID protein showing known peptidase cleavages. The alignment is highlighted to show conservation of residues around the cleavage of BID by cathepsin H (C01.040) at residue 12. The sequence where the cleavage is known is highlighted in green and residues are numbered according to this sequence (inserts are indicated by letters). The rows beneath the residue numbers show the MEROPS identifier of each peptidase known to cleave this substrate. Arrows indicate the residue range of the fragment used in the experiment, and cleavage positions are indicated by the ‘+’ symbol. Clicking on a MEROPS identifier takes the user to the relevant summary page. Clicking on a ‘+’ symbol causes the alignment to be redrawn with residues P4–P4′ highlighted for that particular cleavage. Residues either side of the cleavage site are highlighted in pink if conserved with the equivalent residue in the sequence where the cleavage is known. A residue is highlighted in orange if it is not conserved but is known to occur in the same binding pocket in another cathepsin H substrate. A residue is shown as white on black if it is not conserved and is not known to occur in the same peptidase substrate binding site in any other substrate.
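The three-way highlighting described in this caption amounts to a simple per-residue classification, sketched below in Python. The function names, the input layout (one set of previously observed residues per binding pocket) and the example values are hypothetical; the handling of alignment gaps is not described in the caption and is left out.

```python
def classify_residue(residue, reference_residue, observed_in_pocket):
    """Return the highlight colour for one aligned residue.

    pink   - identical to the residue in the sequence with the known cleavage
    orange - not conserved, but seen in this pocket in another substrate
    black  - not conserved and never seen in this pocket (white on black)
    """
    if residue == reference_residue:
        return "pink"
    if residue in observed_in_pocket:
        return "orange"
    return "black"


def highlight_row(homologue, reference, observed_by_pocket):
    """Classify every residue of a homologue across the aligned window."""
    return [classify_residue(res, ref, observed)
            for res, ref, observed in zip(homologue, reference, observed_by_pocket)]


# Hypothetical four-residue window and per-pocket sets of observed residues.
reference = "AGEL"
observed_by_pocket = [{"A"}, {"G"}, {"E", "D"}, {"L"}]

print(highlight_row("AGDL", reference, observed_by_pocket))
# ['pink', 'pink', 'orange', 'pink']
print(highlight_row("AGWL", reference, observed_by_pocket))
# ['pink', 'pink', 'black', 'pink']
```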

Substrate cleavages that are not evolutionarily conserved

Protein sequence alignments were constructed for every substrate where the cleavage had been assumed in the literature to be of physiological significance. The total number of alignments generated was 3141. A selection of cleavage sites which were not conserved in all homologues included in the same UniRef50 database entry are listed in Table 4. Only those cleavages by peptidases with at least 20 known substrates are included.

Table 4.

Assumed physiological cleavages that are not conserved in terms of peptidase substrate binding

Substrate	UniProt accession	P1	Peptidase [MEROPS ID] (total substrates)	Replacements	Possible cause	Ref.
Serine protease HTRA2, mitochondrial	O43464	211	HtrA2 peptidase [S01.278] (56)	VRLLSGDT (5)		(37)
				---P---- (4)	g
Cytochrome C	P00022	1	mitochondrial methionyl aminopeptidase [M24.028] (131)	MGDVE (35)		(38)
				-C--- (1)	g
Coagulation factor XIII A chain	P00488	38	thrombin [S01.217] (169)	VVPRGVNL (22)		(39)
				---L---- (1)	g
Insulin-1	P01325	87	proprotein convertase 2 [S08.073] (59)	RQKRGIVD (36)		(40)
				--WH-W-W (1)	a
				--A-X--R (1)	d
Collagen alpha-2(I) chain	P02465	870	cathepsin D [A01.009] (145)	APGFLGLP (15)		(41)
				---I---- (21)	h
Collagen alpha-2(I) chain	P02465	863	matrix metallopeptidase-1 [M10.001] (70)	GPQGLLGA (28)		(42)
				-T------ (8)	h
Collagen alpha-2(I) chain	P02465	863	matrix metallopeptidase-8 [M10.002] (87)	GPQGLLGA (28)		(42)
				-T--P--- (8)	h
Platelet-derived growth factor subunit A	P04085	86	Furin [S08.071] (116)	RRKRSIEE (38)		(43)
				G-LT---- (1)	b
				L--X---- (1)	d
Collagen alpha-2(IV) chain	P08572	1077	kallikrein-related peptidase 14 [S01.029] (49)	APGRAGLY (6)		(44)
				---S---- (7)	a
				---L---- (5)	a
				---A---- (2)	a
				---I---- (1)	a, g
				---V---- (3)
Collagen alpha-2(IV) chain	P08572	1109	kallikrein-related peptidase 14 [S01.029] (49)	KGERGTTG (12)		(44)
				--QP-E-- (7)	a
				-----E-- (2)	a
				---L---- (2)	a
				---V---- (1)	a
Insulin-like growth factor-binding protein 1	P08833	165	Matriptase [S01.302] (26)	KALHVTNI (2)		(45)
				--D-N--- (1)	b
				-SXXXDD- (1)	d
				--V----- (2)	h
				---E--D- (3)	h
Acyl-CoA thioesterase I	P0ADA1	26	Signal peptidase I [S26.001] (294)	RAAAADTL (19)		(46)
				X------- (1)	d
Protein ygiW	P0ADU5	20	Signal peptidase I [S26.001] (294)	PVMAAEQG (10)		(47)
				-----X-- (1)	d
Chymotrypsin inhibitor 3	P10822	24	Signalase (animal) 21 kDa component [S26.010] (363)	SSTADDDL (4)		(48)
				X------- (7)	d
				---M---- (1)	h
Plastocyanin minor isoform, chloroplastic	P11490	72	Thylakoidal processing peptidase [S26.008] (52)	NAMAMEVL (20)		(49)
				----Q--- (2)	h
				---D---- (1)	g
50S ribosomal protein L7Ae	P12743	1	Methionyl aminopeptidase 2 [M24.002] (130)	MPVYV (2)		(50)
				KKMA---- (1)	d
				SKDK---- (8)	d
				MAR--- (1)	d
Beta-crystallin B3	P19141	4	Calpain-1 [C02.001] (101)	MAEQHSTP (10)		(51)
				XXXXXXXX (23)	a, d
Beta-crystallin B3	P19141	10	Calpain-1 [C02.001] (101)	TPEQAAAG (10)		(51)
				XXXXXXXX (23)	a, d
1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase gamma-1	P19174	770	Caspase-7 [C14.004] (112)	AEPDYGAL (20)		(52)
				T------- (1)	g
Mimecan	P20774	219	ADAMTS4 peptidase [M12.221] (57)	TFLYLDHN (26)		(53)
				-H------ (3)	h
Mimecan	P20774	234	ADAMTS4 peptidase [M12.221] (57)	NLPESLRV (23)		(53)
				X------- (6)	d
Trypsin inhibitor 2	P26780	30	Signalase (animal) 21 kDa component [S26.010] (363)	IKAQDSEC (7)		(54)
				---H---- (2)	a
60S ribosomal protein L10	P27635	180	Granzyme B (Homo sapiens-type) [S01.010] (348)	NADEFEDM (36)		(55)
				R------- (2)
Chitinase 2	P29027	22	Signalase (animal) 21 kDa component [S26.010] (363)	GVQAAWSS (2)		(56)
				XX------ (1)	a, d
Alpha-synuclein	P37840	122	Calpain-1 [C02.001] (101)	DPDNEAYE (34)		(57)
				---D---- (1)	g
				--X----- (2)	b, d
Cathepsin E	P43159	53	Cathepsin E [A01.010] (64)	KVDMVQYT (14)		(35)
				-Y------ (11)	a
				-F------ (2)	h
				-H------ (3)	a
				---G---- (2)	a
				---T---- (2)	h
				----H--- (5)	a
				------H- (1)	g
40S ribosomal protein S25	P62852	51	Granzyme B, rodent-type [S01.136] (231)	LFDKATYD (9)		(55)
				--XXXXX- (2)	b, d
Hemoglobin subunit alpha	P69905	37	Cathepsin D [A01.009] (145)	FLSFPTTK (42)		(34)
				-----W-- (1)	h
Hemoglobin subunit alpha	P69905	109	Cathepsin D [A01.009] (145)	LLVTLAAH (36)		(34)
				------C- (6)	h
Hemoglobin subunit alpha	P69905	110	Cathepsin D [A01.009] (145)	LVTLAAHL (36)		(34)
				-----C-- (6)	h
ABC transporter periplasmic-binding protein yphF	P77269	26	Signal peptidase I [S26.001] (294)	FARAAEKE (22)		(47)
				---T---- (1)	g
Tyrosine-protein phosphatase non-receptor type 18	Q61152	424	Caspase-1 [C14.001] (60)	EVTDGAQT (4)		(58)
				---G---- (4)	h
				----R--- (1)	g
Cartilage intermediate layer protein 2	Q8IUL8	810	ADAMTS5 peptidase [M12.225] (38)	ALVTATLG (14)		(53)
				-H---N-- (1)	h
				-----N-- (2)	h
				------M- (2)	h
Cartilage intermediate layer protein 2	Q8IUL8	811	ADAMTS5 peptidase [M12.225] (38)	LVTATLGG (14)		(53)
				H------- (1)	g
				Y------- (5)	h
				Y---IM-- (2)	h
Cartilage intermediate layer protein 2	Q8IUL8	813	ADAMTS5 peptidase [M12.225] (38)	TATLGGEE (12)		(53)
				M------- (3)	h
				--S----- (5)	h
Cartilage intermediate layer protein 2	Q8IUL8	830	ADAMTS5 peptidase [M12.225] (38)	PLPATVGV (16)		(53)
				I------- (1)	h
				M---I--- (2)	h
				-H------ (1)	g
Cartilage intermediate layer protein 2	Q8IUL8	832	ADAMTS5 peptidase [M12.225] (38)	PATVGVTQ (13)		(53)
				---I---- (6)	h
				XX------ (1)	d
Probable FKBP-type peptidyl-prolyl cis-trans isomerase 1, chloroplastic	Q9LM71	71	Thylakoidal processing peptidase [S26.008] (52)	SSEARERR (4)		(49)
				---G---- (1)	g
				XXXXXXXX (15)	a, d

For each entry, the table gives the substrate name, the UniProt accession, the number of the residue occupying the P1 position in the known cleavage, the peptidase performing the cleavage (with the MEROPS identifier in square brackets and the total number of known substrates for the peptidase in parentheses), the sequence occupying P4–P4′ in the known cleavage followed by the replacements that are unobserved in other substrates, the possible cause (a–h, see text for details) and the reference describing the cleavage. The number in parentheses after the known sequence is the number of homologues in which the cleavage site is conserved (those identical to the known cleavage plus those with acceptable replacements); the number after each replacement pattern is the number of sequences in which a replacement has occurred that has not been observed in any substrate for the peptidase. A hyphen indicates a conserved amino acid or an acceptable replacement; an ‘X’ indicates a gap character inserted in the alignment. A space indicates where no amino acid is possible (e.g. in P4, P3 and P2 for an aminopeptidase cleavage). Data are arranged by UniProt accession and the P1 position.
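The notation described in this legend can be reproduced with a small Python sketch that turns an aligned homologue into a replacement pattern and tallies the counts shown in parentheses. It is an illustration only: the function names, the use of `-` as the gap character in the input alignment, the per-pocket sets of observed residues and the homologue sequences are all hypothetical. Only the known sequence VRLLSGDT is taken from the table.

```python
from collections import Counter


def replacement_pattern(known, homologue, observed_by_pocket, gap="-"):
    """Encode one homologue relative to the known P4-P4' sequence.

    '-' = conserved residue or acceptable (previously observed) replacement
    'X' = gap inserted in the alignment
    letter = replacement not observed in any substrate of the peptidase
    """
    encoded = []
    for ref, res, observed in zip(known, homologue, observed_by_pocket):
        if res == gap:
            encoded.append("X")
        elif res == ref or res in observed:
            encoded.append("-")
        else:
            encoded.append(res)
    return "".join(encoded)


def summarise(known, homologues, observed_by_pocket):
    """Return (number of conserved homologues, counts per replacement pattern)."""
    patterns = Counter(replacement_pattern(known, h, observed_by_pocket)
                       for h in homologues)
    conserved = patterns.pop("-" * len(known), 0)
    return conserved, dict(patterns)


known = "VRLLSGDT"
# Hypothetical: Ile has been seen in P4 in another substrate; nothing else varies.
observed_by_pocket = [set("VI"), set("R"), set("L"), set("L"),
                      set("S"), set("G"), set("D"), set("T")]
homologues = ["VRLLSGDT", "IRLLSGDT", "VRLPSGDT", "VRLLSG-T"]

print(summarise(known, homologues, observed_by_pocket))
# (2, {'---P----': 1, '------X-': 1})
```

Here two homologues count as conserved (one identical, one with an acceptable replacement), while the Pro in P1 and the alignment gap each produce their own pattern row, mirroring the layout of the Replacements column.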

Table 4.

Assumed physiological cleavages that are not conserved in terms of peptidase substrate binding

Substrate	UniProt accession	P1	Peptidase [MEROPS ID] (total substrates)	Replacements	Possible cause	Ref.
Serine protease HTRA2, mitochondrial	O43464	211	HtrA2 peptidase [S01.278] (56)	VRLLSGDT (5)		(37)
				–-P–– (4)	g
Cytochrome C	P00022	1	mitochondrial methionyl aminopeptidase [M24.028] (131)	MGDVE (35)		(38)
				-C–- (1)	g
Coagulation factor XIII A chain	P00488	38	thrombin [S01.217] (169)	VVPRGVNL (22)		(39)
	–-L–– (1)	g
Insulin-1	P01325	87	proprotein convertase 2 [S08.073] (59)	RQKRGIVD (36)		(40)
				–WH-W-W (1)	a
				–A-X–R (1)	d
Collagen alpha-2(I) chain	P02465	870	cathepsin D [A01.009] (145)	APGFLGLP (15)		(41)
				–-I–– (21)	h
Collagen alpha-2(I) chain	P02465	863	matrix metallopeptidase−1 [M10.001] (70)	GPQGLLGA (28)		(42)
				-T––– (8)	h
Collagen alpha-2(I) chain	P02465	863	matrix metallopeptidase−8 [M10.002] (87)	GPQGLLGA (28)		(42)
				-T–P–- (8)	h
Platelet-derived growth factor subunit A	P04085	86	Furin [S08.071] (116)	RRKRSIEE (38)		(43)
G-LT–– (1)	b
				L–X–– (1)	d
Collagen alpha-2(IV) chain	P08572	1077	kallikrein-related peptidase 14 [S01.029] (49)	APGRAGLY (6)		(44)
				–-S–– (7)	a
				–-L–– (5)	a
				–-A–– (2)	a
				–-I–– (1)	a, g
				–-V–– (3)
Collagen alpha-2(IV) chain	P08572	1109	kallikrein-related peptidase 14 [S01.029] (49)	KGERGTTG (12)		(44)
				–QP-E– (7)	a
				––-E– (2)	a
				–-L–– (2)	a
				–-V–– (1)	a
Insulin-like growth factor-binding protein 1	P08833	165	Matriptase [S01.302] (26)	KALHVTNI (2)		(45)
	–D-N–- (1)	b
				-SXXXDD- (1)	d
				–V––- (2)	h
				–-E–D- (3)	h
Acyl-CoA thioesterase I	P0ADA1	26	Signal peptidase I [S26.001] (294)	RAAAADTL (19)		(46)
				X–––- (1)	d
Protein ygiW	P0ADU5	20	Signal peptidase I [S26.001] (294)	PVMAAEQG (10)		(47)
				––-X– (1)	d
Chymotrypsin inhibitor 3	P10822	24	Signalase (animal) 21 kDa component [S26.010) (363)	SSTADDDL (4)		(48)
				X–––- (7)	d
				–-M–– (1)	h
Plastocyanin minor isoform, chloroplastic	P11490	72	Thylakoidal processing peptidase [S26.008] (52)	NAMAMEVL (20)		(49)
	––Q–- (2)	h
				–-D–– (1)	g
50S ribosomal protein L7Ae	P12743	1	Methionyl aminopeptidase 2 [M24.002] (130)	MPVYV (2)		(50)
				KKMA–– (1)	d
				SKDK–– (8)	d
				MAR–- (1)	d
Beta-crystallin B3	P19141	4	Calpain-1 [C02.001] (101)	MAEQHSTP (10)		(51)
				XXXXXXXX (23)	a, d
Beta-crystallin B3	P19141	10	Calpain-1 [C02.001] (101)	TPEQAAAG (10)		(51)
				XXXXXXXX (23)	a, d
1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase gamma-1	P19174	770	Caspase-7 [C14.004] (112)	AEPDYGAL (20)		(52)
	T–––- (1)	g
Mimecan	P20774	219	ADAMTS4 peptidase [M12.221] (57)	TFLYLDHN (26)		(53)
				-H––– (3)	h
Mimecan	P20774	234	ADAMTS4 peptidase [M12.221] (57)	NLPESLRV (23)		(53)
				X–––- (6)	d
Trypsin inhibitor 2	P26780	30	Signalase (animal) 21 kDa component [S26.010] (363)	IKAQDSEC (7)		(54)
				–-H–– (2)	a
60S ribosomal protein L10	P27635	180	Granzyme B (Homo sapiens)-type) [S01.010] (348)	NADEFEDM (36)		(55)
				R–––- (2)
Chitinase 2	P29027	22	Signalase (animal) 21 kDa component [S26.010] (363)	GVQAAWSS (2)		(56)
				XX––– (1)	a, d
Alpha-synuclein	P37840	122	Calpain-1	DPDNEAYE (34)		(57)
			[C02.001] (101)	–-D–– (1)	g
				–X––- (2)	b, d
Cathepsin E	P43159	53	Cathepsin E [A01.010] (64)	KVDMVQYT (14)		(35)
			-Y––– (11)	a
				-F––– (2)	h
				-H––– (3)	a
				–-G–– (2)	a
				–-T–– (2)	h
				––H–- (5)	a
				–––H- (1)	g
40S ribosomal protein S25	P62852	51	Granzyme B,	LFDKATYD (9)		(55)
			rodent-type [S01.136] (231)	–XXXXX- (2)	b, d
Hemoglobin subunit alpha	P69905	37	Cathepsin D [A01.009] (145)	FLSFPTTK (42)		(34)
				––-W– (1)	h
Hemoglobin subunit alpha	P69905	109	Cathepsin D [A01.009] (145)	LLVTLAAH (36)		(34)
				–––C- (6)	h
Hemoglobin subunit alpha	P69905	110	Cathepsin D [A01.009] (145)	LVTLAAHL (36)		(34)
				––-C– (6)	h
ABC transporter periplasmic-binding protein yphF	P77269	26	Signal peptidase I [S26.001] (294)	FARAAEKE (22)		(47)
	–-T–– (1)	g
Tyrosine-protein phosphatase non-receptor type 18	Q61152	424	Caspase-1 [C14.001] (60)	EVTDGAQT (4)		(58)
	–-G–– (4)	h
				––R–- (1)	g
Cartilage intermediate layer protein 2	Q8IUL8	810	ADAMTS5 peptidase [M12.225] (38)	ALVTATLG (14)		(53)
	-H–-N– (1)	h
				––-N– (2)	h
				–––M- (2)	h
Cartilage intermediate layer protein 2	Q8IUL8	811	ADAMTS5 peptidase [M12.225] (38)	LVTATLGG (14)		(53)
	H–––- (1)	g
				Y–––- (5)	h
				Y–-IM– (2)	h
Cartilage intermediate layer protein 2	Q8IUL8	813	ADAMTS5 peptidase [M12.225] (38)	TATLGGEE (12)		(53)
	M–––- (3)	h
				–S––- (5)	h
Cartilage intermediate layer protein 2	Q8IUL8	830	ADAMTS5 peptidase [M12.225] (38)	PLPATVGV (16)		(53)
	I–––- (1)	h
				M–-I–- (2)	h
				-H––– (1)	g
Cartilage intermediate layer protein 2	Q8IUL8	832	ADAMTS5 peptidase [M12.225] (38)	PATVGVTQ (13)		(53)
	–-I–– (6)	h
				XX––– (1)	d
Probable FKBP-type peptidyl-prolyl cis-trans isomerase 1, chloroplastic	Q9LM71	71	Thylakoidal processing peptidase [S26.008] (52)	SSEARERR (4)		(49)
	–-G–– (1)	g
	XXXXXXXX (15)	a, d

There are a number of possible causes for a cleavage site not to be conserved, which are listed below.

(a) The UniRef50 entry might include paralogous sequences which, although at least 50% identical to the sequence with the known cleavage, might be processed or degraded differently, so that there is no evolutionary pressure to maintain the known cleavage site. Where a cleavage site was not conserved, a paralogue was identified in an alignment as a second protein from the same species that was clearly not a splice variant.
(b) UniRef50 entries contain many translated genes from genome sequencing projects; gene finding in eukaryote genomes is notoriously difficult and it is possible that erroneous gene building has resulted, for example, in the loss of the exon encoding the cleavage site or the inclusion of part of an intron in its place.
(c) It is also probable that for some peptidases there are not enough substrates known to be sure that any amino acid is really excluded from a particular binding site. The number of substrates known for each peptidase is included in Table 4, because the greater the number of substrates, the more likely it is that an amino acid is really atypical and not just unobserved.
(d) The alignment is incorrect. This is unlikely given the close relationship between the sequences, which are all 50% or more identical; however, there are situations where an insertion or deletion occurs within the range P4–P4′.
(e) Some endogenous cleavages (for example removal of signal and transit peptides) may be the result of more than one cleavage, because aminopeptidases nibble away the N-terminus (1), and may thus be incorrectly mapped to the specificity of the leader peptidase.
(f) It is theoretically possible that if the substrate and peptidase are from the same organism, both will have evolved to accommodate a change in the cleavage position.
(g) A single residue mismatch may also be due to a single-base sequencing error. Potential errors of this kind can be identified using a codon dictionary, provided that the atypical residue could be the result of a single base change and that it is the only residue not conserved, regardless of the number of sequences in the alignment.
(h) Some cleavages regarded as ‘physiological’ are actually fortuitous. If a cleavage site is extremely poorly conserved it is unlikely to be physiologically relevant.
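
The codon-dictionary check for single-base sequencing errors described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for MEROPS: it assumes the standard genetic code, and the function names `single_base_apart` and `possible_sequencing_error` are hypothetical. The example site is the thrombin cleavage in coagulation factor XIII A chain listed in Table 4, where Arg in P1 is replaced by Leu.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (assumed), codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_apart(aa1, aa2):
    """True if some codon for aa1 differs from some codon for aa2 at exactly one base."""
    codons1 = [c for c, aa in CODON_TABLE.items() if aa == aa1]
    codons2 = [c for c, aa in CODON_TABLE.items() if aa == aa2]
    return any(
        sum(x != y for x, y in zip(c1, c2)) == 1
        for c1 in codons1
        for c2 in codons2
    )


def possible_sequencing_error(known_site, homologous_site):
    """Flag a homologous P4-P4' site whose only mismatch could be a single-base change."""
    mismatches = [(a, b) for a, b in zip(known_site, homologous_site) if a != b]
    return len(mismatches) == 1 and single_base_apart(*mismatches[0])


# Arg -> Leu at P1 (e.g. CGT -> CTT) is the only mismatch, so the site is flagged.
print(possible_sequencing_error("VVPRGVNL", "VVPLGVNL"))
```

A site with two or more mismatches is never flagged by this sketch, which mirrors the condition that the atypical residue must be the only one not conserved.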

Where it is possible to suggest why a cleavage site is not conserved, this is indicated in Table 4 by the letters a–h. Included in category d, where insertions and/or deletions occur in the homologous cleavage sites, is 50S ribosomal protein L7Ae (UniProt accession P12743). There are N-terminal extensions to most homologues, so that the known methionyl aminopeptidase 2 cleavage site is not aligned. Five of these sequences may be derived from erroneous gene builds (point b). The UniRef50 database entry for 60S ribosomal protein L10 (P27635) includes a wide range of species (the cleavage is known in the human protein) and the peptidase performing the cleavage (granzyme B) is not present in Paracoccidioides brasiliensis, where the substrate cleavage is also not conserved. The replacements that are reported as atypical in hemoglobin subunit alpha (P69905) by Schistosoma cathepsin D (A01.009) (34) are the rarest naturally occurring amino acids, tryptophan and cysteine, and despite there being 109 known cleavages for this peptidase, this may still not be enough to properly exclude these rare amino acids. On the other hand, this is the cleavage of a host protein by a parasite peptidase and the specificity may have adapted to limit the availability of hosts.

None of the cleavages listed in Table 4 has been assigned to cause f above, namely where changes in the substrate cleavage site may be mirrored by changes in peptidase specificity. Without modelling the substrate binding sites, if that were possible, detecting this situation is difficult. However, the autolytic processing of cathepsin E (P43159) may be such an example (35).

In some cases, a poorly conserved cleavage site may represent a pathological condition in the species where the cleavage was first identified. For example, despite there being few cleavages for cathepsin H, the reported cleavages in the BID protein (36) are in particularly poorly conserved regions (see Figure 3). Cleavage of the BID protein leads to the induction of apoptosis. That the cleavage sites are not well conserved amongst mammalian orthologues is not surprising given that the cytoplasmic substrate and the lysosomal peptidase should not meet under normal circumstances. The mouse protein in which the cleavage was identified may therefore be unusually susceptible to cleavage should the lysosomal membrane be ruptured.

The specificity logos and frequency matrices for all peptidases with 10 or more known substrate cleavages are already available in the MEROPS database. Alignments are also available for all protein substrates that have a corresponding UniRef50 entry, showing conservation of both physiological and non-physiological cleavages. The next release of the database will include tables showing comparative peptidase specificity in terms of preference for both amino acid and amino acid type.
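
As a rough illustration of what such a frequency matrix contains, the sketch below counts residues at each position P4–P4′ and reports residues that reach a preference cutoff. It is not the MEROPS implementation. The 40% cutoff and the minimum of 10 cleavages follow the criteria given in the legend to Table 1; the function names and the ten cleavage sites in `TOY_SITES` are invented purely for the example.

```python
from collections import Counter

POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

# Invented octapeptides (P4-P4') for a hypothetical peptidase; not real data.
TOY_SITES = [
    "AAPRSLEE", "GVLRGTDK", "LLSRAVNP", "TEGRIVGG", "PQAKSLTD",
    "VDGRSAEL", "KLPRGMQT", "SSFRAIDE", "NAVKGLPS", "ELTRTGYH",
]


def frequency_matrix(sites):
    """Count the residues seen at each of the eight positions P4-P4'."""
    matrix = [Counter() for _ in POSITIONS]
    for site in sites:
        if len(site) != len(POSITIONS):
            raise ValueError(f"expected 8 residues, got {site!r}")
        for counter, residue in zip(matrix, site):
            if residue not in "-X":  # ignore gaps and unknown residues
                counter[residue] += 1
    return matrix


def preferences(sites, threshold=0.4, min_sites=10):
    """Return {position: [residues]} for residues present in >= threshold of sites."""
    if len(sites) < min_sites:
        return {}  # too few cleavages to call a preference
    result = {}
    for position, counter in zip(POSITIONS, frequency_matrix(sites)):
        preferred = sorted(aa for aa, n in counter.items() if n / len(sites) >= threshold)
        if preferred:
            result[position] = preferred
    return result


print(preferences(TOY_SITES))
```

With these toy sites only arginine at P1 (8 of 10 sites) passes the cutoff; grouping residues into amino acid types, as in Table 1, would need an additional mapping that is not shown here.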

Conclusions

The MEROPS database includes over 39 000 cleavages in substrates (synthetic and naturally occurring) which have been collected from the literature. These are classified as physiological or non-physiological, depending on whether the substrate is naturally occurring and whether it is in its native conformation. At least one substrate is known for 45% of the different peptidases identified in the MEROPS database. Displays in the database give insights into peptidase specificity and into the conservation of cleavage sites amongst orthologous proteins. The data provide a substantial training set for algorithms to predict peptidase substrates and cleavage positions in those substrates. The data may also be useful for the design of inhibitors and for engineering novel specificities into peptidases.

By examining the conservation of cleavage sites in protein substrates in terms of peptidase substrate binding sites, it is clear that there are a number of cleavages where atypical replacements occur. Many of these can be explained by gene build or sequencing errors, by insertions or deletions in the region around the cleavage site, or by alignments that contain one or more paralogues in which cleavage may be absent or different. In a few cases it is possible that more than one peptidase is involved in processing, or there may not be enough known substrates for some peptidases to be sure that an atypical replacement is really unacceptable. A number of substrate cleavages that may be fortuitous and not of any physiological relevance have been identified.

This cleavage set is freely available and can be downloaded from the MEROPS FTP site (ftp://ftp.sanger.ac.uk/pub/MEROPS/current_release/database_files/Substrate_search.txt).

Funding

Wellcome Trust [grant number WT077044/Z/05/Z]. Funding for open access charge: Wellcome Trust.

Conflict of interest statement. None declared.

Acknowledgements

The author would like to thank Dr Alan Barrett, Chai Yin Kok, Jun Kong, Matias Piipari, Matthew Jenner and Olivia Harris for helping to collect and/or enter cleavage data, Jun Kong for devising the specificity logo software, Dr Penelope Coggill for reading the manuscript and Dr Alex Bateman for guidance and useful discussions.

References

1. Emanuelsson, Brunak, von Heijne, et al. Locating proteins in the cell using TargetP, SignalP and related tools, Nat. Protoc., 2007 (pg. 953–971)
2. Weissman A.M. Regulating protein degradation by ubiquitination, Immunol. Today, 1997 (pg. 189–198)
3. Drag, Salvesen G.S. DeSUMOylating enzymes—SENPs, IUBMB Life, 2008 (pg. 734–742)
4. Shen L.N., Liu, Dong, et al. Structural basis of NEDD8 ubiquitin discrimination by the deNEDDylating enzyme NEDP1, EMBO J., 2005 (pg. 1341–1351)
5. Rholam, Fahy. Processing of peptide and hormone precursors at the dibasic cleavage sites, Cell Mol. Life Sci., 2009 (pg. 2075–2091)
6. Kitabgi. Inactivation of neurotensin and neuromedin N by Zn metallopeptidases, Peptides, 2006 (pg. 2515–2522)
7. Ghosh A.K., Gemma, Tang. beta-Secretase as a therapeutic target for Alzheimer's disease, Neurotherapeutics, 2008 (pg. 399–408)
8. Murphy, Nagase. Reappraising metalloproteinases in rheumatoid arthritis and osteoarthritis: destruction or repair? Nat. Clin. Pract. Rheumatol., 2008 (pg. 128–135)
9. Churg, Wright J.L. Proteases and emphysema, Curr. Opin. Pulm. Med., 2005 (pg. 153–159)
10. Seiki. Membrane-type 1 matrix metalloproteinase: a key enzyme for tumor invasion, Cancer Lett., 2003, vol. 194
11. O'Reilly D.A., Yang B.M., Creighton J.E., et al. Mutations of the cationic trypsinogen gene in hereditary and non-hereditary pancreatitis, Digestion, 2001
12. Ersmark, Samuelsson, Hallberg. Plasmepsins as potential targets for new antimalarial therapy, Med. Res. Rev., 2006 (pg. 626–666)
13. Johansen J.T., Ottesen. The proteolytic degradation of the B-chain of oxidized insulin by papain, chymopapain and papaya peptidase, CR Trav. Lab. Carlsberg, 1968 (pg. 265–283)
14. Schechter, Berger. On the active site of proteases. 3. Mapping the active site of papain; specific peptide inhibitors of papain, Biochem. Biophys. Res. Commun., 1968 (pg. 898–902)
15. Rodriguez, Gupta, Smith R.D., et al. Does trypsin cut before proline? J. Proteome Res., 2008 (pg. 300–305)
16. Thornberry N.A., Chapman K.T., Nicholson D.W. Determination of caspase specificities using a peptide combinatorial library, Methods Enzymol., 2000, vol. 322 (pg. 100–110)
17. Fontana, De Laureto P.P., Spolaore, et al. Probing protein structure by limited proteolysis, Acta Biochim. Pol., 2004 (pg. 299–321)
18. Ruzza, Quintieri, Osler, et al. Fluorescent, internally quenched, peptides for exploring the pH-dependent substrate specificity of cathepsin B, J. Pept. Sci., 2006 (pg. 455–461)
19. Schilling, Overall C.M. Proteome-derived, database-searchable peptide libraries for identifying protease cleavage sites, Nat. Biotechnol., 2008 (pg. 685–694)
20. Rawlings N.D., Barrett A.J. Evolutionary families of peptidases, Biochem. J., 1993, vol. 290 (pg. 205–218)
21. Rawlings N.D., Morton F.R., Kok C.Y., et al. MEROPS: the peptidase database, Nucleic Acids Res., 2008 (pg. D320–D325)
22. Barrett A.J., Rawlings N.D., Woessner J.F. Handbook of Proteolytic Enzymes, 1998, 1st edn, London, Academic Press
23. Igarashi, Eroshkin, Gramatikova, et al. CutDB: a proteolytic event database, Nucleic Acids Res., 2007 (pg. D546–D549)
24. Luthi A.U., Martin S.J. The CASBAH: a searchable database of caspase substrates, Cell Death Differ., 2007 (pg. 641–650)
25. Wheeler D.L., Barrett, Benson D.A., et al. Database resources of the National Center for Biotechnology Information, Nucleic Acids Res., 2008 (pg. D13–D21)
26. Apweiler, Bairoch, C.H., et al. UniProt: the Universal Protein knowledgebase, Nucleic Acids Res., 2004 (pg. D115–D119)
27. Crooks G.E., Hon, Chandonia J.M., et al. WebLogo: a sequence logo generator, Genome Res., 2004 (pg. 1188–1190)
28. Suzek B.E., Huang, McGarvey, et al. UniRef: comprehensive and non-redundant UniProt reference clusters, Bioinformatics, 2007 (pg. 1282–1288)
29. Edgar R.C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity, BMC Bioinformatics, 2004 (pg. 113)
30. Chumanevich A.A., Agniswamy, et al. Structural basis for executioner caspase recognition of P5 position in substrates, Apoptosis, 2008 (pg. 1291–1302)
31. Isaya, Kalousek, Rosenberg L.E. Amino-terminal octapeptides function as recognition signals for the mitochondrial intermediate peptidase, J. Biol. Chem., 1992, vol. 267 (pg. 7904–7910)
32. Livingstone C.D., Barton G.J. Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation, Comput. Appl. Biosci., 1993 (pg. 745–756)
33. Dando P.M., Fortunato, Smith, et al. Pig kidney legumain: an asparaginyl endopeptidase with restricted specificity, Biochem. J., 1999, vol. 339 (pg. 743–749)
34. Brindley P.J., Kalinna B.H., Wong J.Y.M., et al. Proteolysis of human hemoglobin by schistosome cathepsin D, Mol. Biochem. Parasitol., 2001, vol. 112 (pg. 103–112)
35. Kay, Tatnell P.J. Cathepsin, in Barrett A.J., Rawlings N.D., Woessner J.F. (eds), Handbook of Proteolytic Enzymes, 2004, London, Elsevier
36. Cirman, Oresic, Mazovec G.D., et al. Selective disruption of lysosomes in HeLa cells triggers apoptosis mediated by cleavage of Bid by multiple papain-like lysosomal cathepsins, J. Biol. Chem., 2004, vol. 279 (pg. 3578–3587)
37. Walle L.V., Damme P.V., Lamkanfi, et al. Proteome-wide identification of HtrA2/Omi substrates, J. Proteome Res., 2007 (pg. 1006–1015)
38. Chan S.K., Tulloss, Margoliash. Primary structure of the cytochrome c from the snapping turtle, Chelydra serpentina, Biochemistry, 1966 (pg. 2586–2597)
39. Lorand, Jeong J.M., Radek J.T., et al. Human plasma factor XIII: subunit interactions and activation of zymogen, Methods Enzymol., 1993, vol. 222
40. Furuta, Carroll, Martin, et al. Incomplete processing of proinsulin to insulin accompanied by elevation of Des-31,32 proinsulin intermediates in islets of mice lacking active PC2, J. Biol. Chem., 1998, vol. 273 (pg. 3431–3437)
41. Scott P.G., Pearson. Cathepsin D: specificity of peptide-bond cleavage in type-I collagen and effects on type-III collagen and procollagen, Eur. J. Biochem., 1981, vol. 114
42. Aimes R.T., Quigley J.P. Matrix metalloproteinase-2 is an interstitial collagenase. Inhibitor-free enzyme catalyzes the cleavage of collagen fibrils and soluble native type I collagen generating the specific 3/4- and 1/4-length fragments, J. Biol. Chem., 1995, vol. 270 (pg. 5872–5876)
43. Siegfried, Khatib A.M., Benjannet, et al. The proteolytic processing of pro-platelet-derived growth factor-A at RRKR(86) by members of the proprotein convertase family is functionally correlated to platelet-derived growth factor-A-induced functions and tumorigenicity, Cancer Res., 2003 (pg. 1458–1463)
44. Borgono C.A., Michael I.P., Shaw J.L., et al. Expression and functional characterization of the cancer-related serine protease, human tissue kallikrein 14, J. Biol. Chem., 2007, vol. 282 (pg. 2405–2422)
45. Uhland. Matriptase and its putative role in cancer, Cell Mol. Life Sci., 2006 (pg. 2968–2978)
46. Karasawa, Kudo, Kobayashi, et al. Lysophospholipase L1 from Escherichia coli K-12 overproducer, J. Biochem., 1991, vol. 109 (pg. 288–293)
47. Link A.J., Robison, Church G.M. Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12, Electrophoresis, 1997 (pg. 1259–1313)
48. Shibata, Hara, Ikenaka. Amino acid sequence of winged bean (Psophocarpus tetragonolobus (L.) DC.) chymotrypsin inhibitor, WCI-3, J. Biochem., 1988, vol. 104 (pg. 537–543)
49. Zybailov, Rutschow, Friso, et al. Sorting signals, N-terminal modifications and abundance of the chloroplast proteome, PLoS ONE, 2008 (pg. e1994)
50. Kimura, Arndt, Kimura. Primary structures of three highly acidic ribosomal proteins S6, S12 and S15 from the archaebacterium Halobacterium marismortui, FEBS Lett., 1987, vol. 224
51. Shih, Lampi K.J., Shearer T.R., et al. Cleavage of beta crystallins during maturation of bovine lens, Mol. Vis., 1998
52. Bae S.S., Perry D.K., Y.S., et al. Proteolytic cleavage of phospholipase C-gamma1 during apoptosis in Molt-4 cells, FASEB J., 2000 (pg. 1083–1092)
53. Zhen E.Y., Brittain I.J., Laska D.A., et al. Characterization of metalloprotease cleavage products of human articular cartilage, Arthritis Rheum., 2008 (pg. 2420–2431)
54. Menegatti, Tedeschi, Ronchi, et al. Purification, inhibitory properties and amino acid sequence of a new serine proteinase inhibitor from white mustard (Sinapis alba L.) seed, FEBS Lett., 1992, vol. 301
55. Van Damme, Maurer-Stroh, Plasman, et al. Analysis of protein processing by N-terminal proteomics reveals novel species-specific substrate determinants of granzyme B orthologs, Mol. Cell Proteomics, 2009 (pg. 258–272)
56. Yanai, Takaya, Kojima, et al. Purification of two chitinases from Rhizopus oligosporus and isolation and sequencing of the encoding genes, J. Bacteriol., 1992, vol. 174 (pg. 7398–7406)