Overview and analysis of the polyprotein cleavage sites in the family Potyviridae

INTRODUCTION

Virus‐encoded proteases are involved in the replication cycle of many different viruses. Members of the large plant virus family Potyviridae have a polyadenylated ssRNA genome, encapsidated in slightly flexuous filamentous particles. The genomic RNA is translated into a large polyprotein which encodes three different proteases that cleave the polyprotein into a total of ten mature peptides. Each protease cleaves at specific sites that have been determined experimentally for a few (up to five) species, but which are often deduced for others from sequence alignments.

Most species in the family have a monopartite genome and these are allocated to genera on the basis of their mode of transmission, virion size and genome relatedness (Berger et al., 2000). The genus Potyvirus is the largest, containing over 100 species which are transmitted by aphids in a non‐persistent manner and have particles > 700 nm long. The other monopartite genera are: Ipomovirus (three species transmitted by whiteflies), Macluravirus (three species transmitted by aphids but with particles < 700 nm long), Rymovirus (three species restricted to the Gramineae and transmitted by Abacarus mites) and Tritimovirus (three species restricted to the Gramineae and transmitted by Aceria mites). Figure 1A shows the genome organization with the ten mature proteins. In the genus Bymovirus (six species restricted to the Gramineae and transmitted by root‐infecting parasites known as plasmodiophorids), the genome is bipartite and RNA1 corresponds to the 3′‐section (eight mature proteins) of the genome of other members (Fig. 1B). The smaller RNA (RNA2) of the bymoviruses has no homology with the other members of the family except that the C‐terminal part of its P2‐1 gene resembles the C‐terminal region of the HC‐Pro of the monopartite viruses (Shukla et al., 1998).
The proteases and cleavage sites that have been identified are also shown in Fig. 1. Cleavage probably occurs co‐translationally as there is no evidence that the whole polyprotein is produced in vivo, but the different sites are not all processed at the same rate and some intermediate products can be detected (Merits et al., 2002). This, and other earlier evidence, indicates that the NIa‐Pro, which cuts at seven of the nine sites, acts in both cis and trans, whereas the P1 and HC‐Pro probably operate only in cis to cleave themselves from the polyprotein as it is translated.

Figure 1. Genome organization in the family Potyviridae. The upper diagram shows the ten mature proteins produced by proteolytic cleavage (arrows) of the polyprotein in most members of the family. The two genomic RNAs of the bymoviruses are shown below, aligned to show the homology between RNA1 and the 3′‐section of the monopartite viruses. The positions of the serine protease (S30) and of the two different cysteine proteases (C6 and C4) are indicated.

Most of the mature protein products have multiple functions. P1 also has a role in host range determination and the HC‐Pro of potyviruses is required for aphid transmission. The cylindrical inclusion (CI) protein has RNA helicase and NTPase activity and may be involved in cell‐to‐cell movement. The P3 and 6K2 proteins are possibly membrane anchors for the replication complex. The VPg is covalently bound to the 5′‐terminus of the RNA and is a determinant of virulence. The nuclear inclusion protein b (NIb) is an RNA‐dependent RNA‐polymerase and the coat protein (CP) is the major structural protein of virions and often has roles in symptom production and transmission.

In this review, cleavage site patterns are compiled for all sequenced species within the Potyviridae and the presence of unusual sites within the sequences of some members is explained through homology modelling.
These sites are sometimes different from those suggested in the original literature or contained within the header information of sequence files. Some patterns and general conclusions are discussed. A list of the 49 fully sequenced species in the family, with their abbreviations, is given in Table 1. The classification of proteases here is that used in the MEROPS database (Rawlings et al., 2004) and the well‐established labelling convention for enzyme binding pockets and substrate residues (Fig. 2; Schechter and Berger, 1967) has been adopted. Sequences were obtained from the international databases and the polyprotein amino acid sequences were then aligned using ClustalX (Thompson et al., 1997) and manually adjusted where necessary. Where motifs or putative cleavage sites appeared unusual, these were checked with other sequences of the same species, if available.

Table 1. Virus species in the family Potyviridae for which complete sequence data are available.

Species                                 Abbreviation

Genus Potyvirus
Bean common mosaic necrosis virus       BCMNV
Bean common mosaic virus                BCMV
Bean yellow mosaic virus                BYMV
Beet mosaic virus                       BtMV
Chilli veinal mottle virus              ChiVMV
Clover yellow vein virus                ClYVV
Cocksfoot streak virus                  CSV
Cowpea aphid‐borne mosaic virus         CABMV
Dasheen mosaic virus                    DsMV
Japanese yam mosaic virus               JYMV
Johnsongrass mosaic virus               JGMV
Leek yellow stripe virus                LYSV
Lettuce mosaic virus                    LMV
Lily mottle virus                       LMoV
Maize dwarf mosaic virus                MDMV
Onion yellow dwarf virus                OYDV
Papaya leaf distortion mosaic virus     PLDMV
Papaya ringspot virus                   PRSV
Pea seed‐borne mosaic virus             PSbMV
Peanut mottle virus                     PeMoV
Pepper mottle virus                     PepMoV
Peru tomato mosaic virus                PTV
Plum pox virus                          PPV
Potato virus A                          PVA
Potato virus V                          PVV
Potato virus Y                          PVY
Scallion mosaic virus                   ScaMV
Sorghum mosaic virus                    SrMV
Soybean mosaic virus                    SMV
Sugarcane mosaic virus                  SCMV
Sweet potato feathery mottle virus      SPFMV
Tobacco etch virus                      TEV
Tobacco vein mottling virus             TVMV
Turnip mosaic virus                     TuMV
Watermelon mosaic virus                 WMV
Wild potato mosaic virus                WPMV
Yam mosaic virus                        YMV
Zucchini yellow mosaic virus            ZYMV

Genus Bymovirus
Barley mild mosaic virus                BaMMV
Barley yellow mosaic virus              BaYMV
Oat mosaic virus                        OMV
Wheat yellow mosaic virus               WYMV

Genus Ipomovirus
Sweet potato mild mottle virus          SPMMV

Genus Rymovirus
Agropyron mosaic virus                  AgMV
Hordeum mosaic virus                    HoMV
Ryegrass mosaic virus                   RGMV

Genus Tritimovirus
Brome streak mosaic virus               BStMV
Oat necrotic mottle virus               ONMV
Wheat streak mosaic virus               WSMV

Figure 2. Diagrammatic representation of a protease and its substrate showing the nomenclature of Schechter and Berger (1967). The substrate residues are labelled progressively from P1 N‐terminal of the cleavage site and P1′, etc., C‐terminal of the cleavage site. The corresponding binding pockets of the protease are labelled from S1 and S1′.

THE P1 PROTEASE

The first (N‐terminal) mature protein of all the monopartite viruses is the most variable and least conserved region of the genome (Adams et al., 2005). However, the C‐terminus of this protein is relatively conserved and was first identified as a serine protease by Verchot et al. (1991, 1992). These viral proteins are now classified as MEROPS Clan SA, family S30 serine proteases. The catalytic triad His‐(X7−11)‐Asp‐(X30−36)‐Ser [but Glu rather than Asp for viruses of the Bean common mosaic virus (BCMV) subgroup] with Gly‐X‐Ser‐Gly around the active site serine is strictly conserved (Fig. 3). There is no protein of this type in the bymoviruses. The protein cleaves itself from the polyprotein (i.e. at the P1/HC‐Pro junction) but the site has been definitively determined by experiments with only three viruses. In experiments using in vitro translation of the TVMV RNA in a wheat germ system, determination of the N‐terminal amino acid of the HC‐Pro and modifications by site‐directed mutagenesis of the transcription template identified the cleavage site as Phe274/Ser275 (Mavankal and Rhoads, 1991).
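The spacing of the P1 catalytic residues given above lends itself to a simple pattern search. The following is a minimal Python sketch, not the procedure used for the alignments in this review: the function name `find_p1_triad` and the toy sequence are hypothetical, and counting the Gly‐X pair as part of the 30–36 residue spacer is an assumption about how the motif is measured.

```python
import re

# His-(X7-11)-Asp/Glu-(X30-36)-Ser, with the active-site Ser inside Gly-X-Ser-Gly.
# Assumption: the Gly-X pair belongs to the 30-36 residue spacer, so only
# 28-34 free residues are allowed between Asp/Glu and the Gly.
P1_TRIAD = re.compile(r"H(.{7,11})([DE])(.{28,34})G.(S)G")


def find_p1_triad(seq):
    """Return 1-based positions of (His, Asp/Glu, Ser), or None if absent."""
    match = P1_TRIAD.search(seq)
    if match is None:
        return None
    return (match.start() + 1, match.start(2) + 1, match.start(4) + 1)


# Toy sequence (synthetic, not a real P1 protein): His, 8 residues, Asp,
# 30 residues, then Gly-Ala-Ser-Gly.
toy = "AAA" + "H" + "A" * 8 + "D" + "A" * 30 + "GASG" + "AA"
print(find_p1_triad(toy))
```

A hit from such a pattern only flags a candidate; the text above makes clear that the cleavage site itself still has to be placed by alignment or experiment.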
In similar experiments with TEV, but using modified P1‐GUS constructs, Tyr304/Ser305 was confirmed as the cleavage site (Verchot et al., 1992). Work with infectious clones of PVY in protoplasts demonstrated that the cleavage site was Phe284/Ser285 and that the Phe was essential while the Ser was optimal for this virus (Yang et al., 1998). The only other virus for which there is experimental information is WSMV (genus Tritimovirus). Its cleavage site was mapped using a series of modified templates for in vitro translation to the region between amino acids 348 and 353; alignments suggested that the actual site was probably Tyr352/Gly353 (Choi et al., 2002).

Figure 3. Sequence alignment of some amino acids at the C‐terminus of the P1 protein of all sequenced members of the Potyviridae showing the active site of the serine protease and the P1/HC‐Pro cleavage site (vertical arrow). The numbers of amino acids between the conserved residues and the amino acid number of the P1′ residue are shown and the three catalytic residues (His, Asp/Glu, Ser) are marked (*). The conserved Gly mentioned in the text is marked as @. Positions where all amino acids are identical (or highly similar) are shaded black and the two levels of grey show positions where 80% or 60% of the residues are identical. Virus abbreviations are as in Table 1. Viruses for which the cleavage site has been verified experimentally are marked †.

The P1/HC‐Pro junction is the one that has most often been wrongly predicted by authors (the PVY site in particular is wrongly defined in most sequence accessions) but alignments produced for this paper show that a consistent pattern can be identified around the cleavage site (Fig. 3). Site P1 is always Phe or Tyr and site P1′ is nearly always Ser.
Site P4 is nearly always (42/45) one of the related highly hydrophobic amino acids Ile, Leu, Val or Met and there is also conservation at P2 with either an aromatic residue (25 His, 1 Phe, 2 Trp, 2 Tyr) or a Gln/Glu (12 and 3, respectively). The cleavage site also occurs at a fairly constant distance (22–28 residues) downstream of a conserved Gly residue (marked @ in Fig. 3).

HC‐Pro AND SIMILAR PROTEASES

The C‐terminal region of the potyvirus HC‐Pro was first identified as a cysteine protease in TEV by Carrington et al. (1989). These viral proteins are now classified as MEROPS Clan CA, family C6. The sequence Gly‐Tyr‐Cys‐Tyr usually surrounds the active site Cys, and a conserved His associated with catalytic activity has also been identified (Oh and Carrington, 1989). The cleavage site was shown to be Gly/Gly (Carrington et al., 1989) and subsequent work using pulse chase labelling to follow the processing of in vitro translation products of modified transcripts showed that the pattern Tyr‐X‐Val‐Gly/Gly was strictly necessary, but that substitutions could be tolerated at positions P5, P3 and P2′ (Carrington and Herndon, 1992). Other potyviruses proved to be extremely similar at this cleavage site and no further experimental work on the cleavage site has been reported. Alignments produced for this paper show very little variation either at the catalytic and cleavage sites, or in the length of the intervening region (Fig. 4).

Figure 4. Sequence alignment of some amino acids at the C‐terminus of the HC‐Pro protein of all sequenced members of the Potyviridae showing the active site of the C6 cysteine protease and the HC‐Pro/P3 cleavage site (vertical arrow). This site has been determined experimentally only for TEV. The lower part of the figure shows the corresponding residues in the P2‐1 protein of bymoviruses and the putative P2‐1/P2‐2 cleavage site. The numbers of amino acids between the conserved residues are shown and the two catalytic residues (Cys, His) are marked (*).
Positions where all amino acids are identical (or highly similar) are shaded black and the two levels of grey show positions where 80% or 60% of the residues are identical. Virus abbreviations are as in Table 1.

The first mature protein of the RNA2 of bymoviruses is also closely related to the potyvirus HC‐Pro in its C‐terminal region (and is now included in the same protease family). There is a Gly‐Tyr‐Cys‐Tyr (or similar) and a conserved His downstream. This aligns well with the HC‐Pro over this region and it was suggested that the cleavage site in BaYMV is at Val‐Gly/Ser (Davidson et al., 1991; Kashiwazaki et al., 1991). There has been no experimental work on the cleavage sites of bymoviruses but the alignment with sequences now available (Fig. 4) confirms the conservation of this region. Bymovirus cleavage sites show more variation than those of the HC‐Pro; several different amino acids occur at the putative P1′ position, and only the Gly at position P1 is strictly conserved.

THE NIa‐Pro

These are the best‐studied potyviral proteases. They have a cysteine residue in the active site but structural motifs shared with eukaryotic cellular serine proteases (Dougherty and Semler, 1993). They are now classified as cysteine proteases [MEROPS Clan PA(C), family C4] and are related to the 3C proteases of picornaviruses (Ryan and Flint, 1997; Ziebuhr et al., 2003). The four active site residues of His, Asp, Cys and His, with Gly‐X‐Cys‐Gly around the active cysteine, are conserved throughout the family (Fig. 5). Distances between these residues and the C‐terminal cleavage site are also very similar, except that the C‐terminus of bymoviruses is shorter.

Figure 5. Sequence alignment of some amino acids towards the C‐terminus of the NIa‐Pro protein of all sequenced members of the Potyviridae showing the active site of the C4 cysteine protease and the NIa‐Pro/NIb cleavage site (vertical arrow).
The numbers of amino acids between the conserved residues are shown and the four catalytic residues (His, Asp, Cys, His) are marked (*). Positions where all amino acids are identical (or highly similar) are shaded black and the two levels of grey show positions where 80% or 60% of the residues are identical. Virus abbreviations are as in Table 1. Viruses for which the cleavage site has been verified experimentally are marked †.

Published experimental evidence

The NIa‐Pro of TEV has been particularly well studied. In a series of elegant experiments, the 49‐kDa protein (the VPg and NIa‐Pro in Fig. 1) was identified as a protease and shown to cleave itself from the NIb and also to cleave the NIb/CP junction at a Gln/Ser dipeptide. Sequence alignments suggested that there was a conserved heptapeptide sequence [Glu‐X‐X‐Tyr‐X‐Gln/Ser (or Gly)] at these junctions and also at the 6K1/CI, CI/6K2 and 6K2/VPg boundaries. If the Gln at P1 was changed to Pro or Leu, no cleavage occurred (Carrington and Dougherty, 1987; Carrington et al., 1988). Introduction of the NIb/CP site (Glu‐Asn‐Leu‐Tyr‐Phe‐Gln/Ser) into a cleavage cassette confirmed the cleavage site as Gln/Ser and showed cleavage of the heptapeptide, but not the dipeptide alone (Carrington and Dougherty, 1988). Amino acid substitutions at the four conserved sites (P6, P3, P1, P1′) showed preferential processing of the wild‐type sequence. At P1, substitution of the Gln with Cys, Gly, Phe or Asn reduced processing and Thr, Arg, Lys, Tyr, His, Ile, Leu, Met or Pro eliminated it. At P3, substitution of the Tyr with Val/Lys, Cys/Met, Ala/Gln/Leu/Phe, Ser, Asn/Glu or Gly progressively decreased processing. At P6, processing was slightly reduced by substitution of the Glu with Pro, Gln, Ser or Lys, rather more by Met, Gly, Val or Ala, and almost eliminated by Leu or Thr.
At P1′, substitution of the Ser by Ile, Arg or Asn had little effect (or even increased processing), Thr, Phe or Cys reduced processing and Asp or Gly eliminated it (Dougherty et al., 1988). Amino acid substitutions at the P7 and P2′ positions had minimal effects on cleavage, while substitutions at the non‐conserved positions P5, P4 and P2 often reduced processing but did not eliminate cleavage. It was proposed that there is strong conservation of P6, P3, P1 and P1′ and that variations in P4, and particularly P2, may regulate the rate of cleavage (Dougherty et al., 1989). It was then demonstrated that the 49‐kDa protein had an internal cleavage site between the VPg and proteinase domains at a suboptimal cleavage site Glu/Gly (Dougherty and Parks, 1991) and that mutation of Glu to Leu prevented cleavage (Carrington et al., 1993). If substitutions were made at this site (Glu‐Asp‐Leu‐Thr‐Phe‐Glu/Gly) to come closer to the optimal consensus sequence (if P3 was changed to Tyr and/or P1 changed to Gln), the site was cleaved more efficiently but this had a detrimental effect on genome amplification, showing that there was a requirement for suboptimal cleavage at this site, presumably because of some role for the uncleaved protein (Schaad et al., 1996). Studies with recombinant TEV NIa‐Pro also showed that 24 amino acids were spontaneously lost from the C‐terminus of the protein and that this substantially decreased proteolytic activity (Parks et al., 1995). The highly specific nature of the protease/substrate recognition was demonstrated in experiments in which the TEV NIa‐Pro was shown to cleave its own NIb/CP site (Glu‐Asn‐Leu‐Tyr‐Phe‐Gln/Ser) but not that predicted for TVMV (Glu‐Thr‐Val‐Arg‐Phe‐Gln/Ser) and vice versa. After construction of hybrid proteases, specificity was shown to reside in three domains of the protein (Parks and Dougherty, 1991).
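The TEV consensus heptapeptide and the Schechter and Berger labelling can be illustrated with a short Python sketch. It is only a coarse filter written for illustration: the names `label_site` and `matches_tev_consensus` are hypothetical, the pattern encodes just the four conserved positions (P6 Glu, P3 Tyr, P1 Gln, P1′ Ser or Gly), and it says nothing about cleavage efficiency. The three test peptides are the sites quoted in the text above.

```python
import re

# Glu-X-X-Tyr-X-Gln/(Ser or Gly): P6, P3, P1 and P1' fixed, other positions free.
TEV_CONSENSUS = re.compile(r"E..Y.Q[SG]")

# Substrate labels from P6 to P1' (Schechter and Berger nomenclature).
LABELS = ["P6", "P5", "P4", "P3", "P2", "P1", "P1'"]


def label_site(heptapeptide):
    """Map each residue of a 7-residue site (P6..P1') to its position label."""
    if len(heptapeptide) != len(LABELS):
        raise ValueError("expected seven residues, P6 to P1'")
    return dict(zip(LABELS, heptapeptide))


def matches_tev_consensus(heptapeptide):
    """True if the site fits the conserved TEV pattern described in the text."""
    return TEV_CONSENSUS.fullmatch(heptapeptide) is not None


sites = {
    "TEV NIb/CP": "ENLYFQS",            # Glu-Asn-Leu-Tyr-Phe-Gln/Ser
    "TVMV NIb/CP (predicted)": "ETVRFQS",  # Glu-Thr-Val-Arg-Phe-Gln/Ser
    "TEV VPg/NIa-Pro": "EDLTFEG",       # Glu-Asp-Leu-Thr-Phe-Glu/Gly (suboptimal)
}
for name, peptide in sites.items():
    print(name, label_site(peptide), matches_tev_consensus(peptide))
```

The suboptimal internal site fails the pattern even though it is cleaved in TEV, which is a reminder that a strict consensus match is neither necessary nor sufficient for processing.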
In recent years, the high specificity of the TEV NIa‐Pro has been exploited for protein engineering and there has been renewed interest in the specificity of its cleavage. Under digestion conditions probably more realistic than those used in earlier experiments (Dougherty et al., 1988) many different amino acids were tolerated at the P1′ position, although some residues (e.g. Glu, Leu and Lys) gave less than optimal cleavage (Kapust et al., 2002). The subsequent resolution of the crystal structure of the TEV NIa‐Pro in association with its substrate has now provided detailed information on the positions of the substrate binding pockets (Phan et al., 2002). It was shown that P6, P4, P3, P2, P1 and P1′ all make contact with the enzyme active site, and the specific amino acid interactions involved have been identified. Because the NIa‐Pro is such a highly conserved protein, these data should have predictive value for other members of the Potyviridae.

In experiments with recombinant NIa‐Pro of TuMV, an internal self‐cleavage site (Gln‐Ala‐Ser‐Gln‐Pro‐Ser223/Gly224) near the C‐terminus of the NIa‐Pro was demonstrated (removing the last 20 amino acids). In contrast to results with TEV (Parks et al., 1995), the processing speed at the 6K1/CI junction was little affected (Kim et al., 1995) but it is not known whether this cleavage occurs in vivo. Examination of the sequence alignment shows that the TuMV and TEV internal cleavage sites occur on different sides of a Pro residue that is conserved throughout the genus Potyvirus (results not shown), so these effects may not be strictly comparable. A second spontaneous internal cleavage site (Thr207/Ser208) that removed the C‐terminal 36 amino acids abolished processing activity (Kim et al., 1996).
Mutation of the P4 Val to Asp or of the P1 Gln to Ala in the NIb/CP junction sequence (Thr238‐Ala‐Val‐Tyr‐Ala‐Gln/Thr) greatly decreased processing but other changes in the region 218–237, including deletions (Lys230, Ser229‐Lys230, Ser229‐Leu231, or Ser229‐Asp234), insertions (5 × Gly at Ser229/Lys230 or Ser220/Gln221) and mutations (Phe226, Ile232 or Leu235 to Asp; Val228 or Lys230 to Glu), had little effect on the processing rate (Kim et al., 1998). A genetic assay was used to examine the substrate specificity of the 6K1/CI junction (Pro‐Thr‐Val‐Tyr‐His‐Gln/Thr). Random amino acid substitutions were created at sites P6–P4, P3–P2 and P1–P1′ and the sequences of those that produced products cleaved by the NIa‐Pro were determined. At P4, most of the sequences were Val (57%) or Cys (30%), all sequences contained His at P2, 73% had Gln at P1 and 50% had Ser at P1′. Although other amino acids could be tolerated at many of the positions, there was a strong bias towards the TuMV consensus cleavage sequence (Kang et al., 2001).

Work with PPV showed that the NIa‐Pro cleaved the putative NIb/CP site at Gln/Ala and that Pro/Ala was not cleaved (García et al., 1989). Later work provided the first identification of a P3/6K1 site; mutation of the Gln at P1 to His abolished cleavage and changes at P3′ also affected processing efficiency, but it appeared that this site was only partially processed (García et al., 1992); processing was not essential for infectivity but did affect symptom production (Riechmann et al., 1995).

With TVMV, mutation of any of the consensus P4 to P1 residues (Val‐Arg‐Phe‐Gln) to Gly abolished activity. Reduction of the target substrate sequence from ten to six amino acids before the cleavage site had little effect on processing, but reduction to four decreased activity substantially even though P6 is not conserved in this virus (Yoon et al., 2000).
Mutation of the TVMV P4 Val to Ala or Leu prevented cleavage, while substitution of the P3 Arg with either Phe or Tyr greatly decreased activity (Tözsér et al., 2005).

With PVA, mutation of the P4 Val to Ala at the CI/6K2 cleavage site (Glu‐Ala‐Val‐Gln‐Phe‐Gln/Ser) completely abolished cleavage of the site in a baculovirus expression system, while mutation of the P6 Glu to Gly at the 6K2/VPg site (Glu‐Val‐Val‐Ala‐Phe‐Gln/Ser) slowed down, but did not prevent, cleavage (Merits et al., 2002). The P3/6K1, CI/6K2 and VPg/NIa‐Pro junctions were processed more slowly than the other sites.

Although there is experimental evidence for only five viruses, there is a sufficient pattern in their amino acid sequences for the cleavage sites to be predicted with confidence for most members of the family. The most difficulty arises with SPMMV, the only fully sequenced member of the genus Ipomovirus, but re‐examination of the data has suggested that it may have a novel cleavage site, as discussed below. A summary of the frequency of each amino acid in each position around the cleavage site for the 49 fully sequenced species in the family is shown in Table 2. Some major patterns shown by the data are discussed below and related where possible to the active site–substrate interactions deduced from resolution of the crystal structure of TEV NIa‐Pro (Phan et al., 2002).

Table 2. Occurrence of each amino acid at each position around the NIa‐Pro cleavage site for fully sequenced species in the family Potyviridae (n = 343 from 49 sequences × 7 cleavage sites). The final column shows the numbers of amino acids expected at each position if they were used at random in proportion to their total numbers in the polyproteins analysed. Numbers underlined differ significantly (P < 0.01) from random using the Genstat procedure BNTEST to test differences between two proportions (Payne and Arnold, 2003).
Amino acid    P6   P5   P4   P3   P2   P1  P1′  P2′  P3′  Random
Ala  A        15   25    5   15    8    0   96   17   12      23
Cys  C         2    8    7    5    2    0    1    1    1       6
Asp  D        44   30    0   11    0    0    2   15   64      19
Glu  E       114   56    0   52   14   83    0    9   30      22
Phe  F         6   12    8    6   51    0    0    3    0      15
Gly  G         8   15    2    9    0    0   81   37   12      20
His  H         3    7    0   20  138    6    2    1   10      10
Ile  I         9   14   22   13    3    0    0    4    2      20
Lys  K        12   12    0   24    0    0    1  136   29      24
Leu  L         8   19    7    4   44    0    0   43    2      30
Met  M         4    4    4    5    7    0    3    3    1      10
Asn  N        23   16    0    4    2    0    5    9   25      17
Pro  P        11   24    1    0    2    0    0    0    8      13
Gln  Q        17   11    0   21    4  253    0    5   21      13
Arg  R         5    8    1   40    0    1    3   20   16      18
Ser  S        21   22    3   29    5    0  138   28   75      23
Thr  T        22   31    2   25   18    0    9    5   33      21
Val  V        13   18  274   29   12    0    2    5    2      23
Trp  W         0    1    1    1    2    0    0    0    0       4
Tyr  Y         6   10    6   30   31    0    0    2    0      12

Cleavage sites of SPMMV

The authors of the full sequence of SPMMV were unable to suggest exact cleavage sites for the NIa‐Pro/NIb and NIb/CP sites because of the absence of the usual (Gln or Glu)‐(Ala, Ser or Gly) dipeptides (Colinet et al., 1998). Two possible positions for the SPMMV NIb/CP site were then suggested in a study of some partial sequences giving cleavage at either a Gln/Arg (Thr‐Ile‐Thr‐Val‐Val‐Gln/Arg) or a Glu/Pro (Phe‐Asp‐Val‐Tyr‐Val‐Glu/Pro) dipeptide (Mukasa et al., 2003). However, a partial sequence of a second ipomovirus (CVYV; Lecoq et al., 2000) can be aligned in the region around the NIb/CP site and this suggests that SPMMV might be cleaved at an His/Ala dipeptide (Fig. 6A). Eleven partial sequences of SPMMV have identical amino acids in this region (data not shown). Re‐examination of other cleavage sites in the full SPMMV sequence then shows that at most, if not all, of the sites cleaved by the NIa‐Pro, a revised cleavage site can be identified (Fig. 6B). These sites are all in positions consistent with the alignment for other viruses in the family and, apart from the novel P1 amino acid, provide a better consensus than those previously suggested.
There is conservation at P4 that is typical of most viruses in the family (see below), Glu (or the similar Asp) at P3, while the conserved His at P1 can be explained by a difference in the amino acids that form the enzyme S1 pocket (see below).

Figure 6. Alignment of amino acids around the proposed sites cleaved by the NIa‐Pro in SPMMV: (A) the NIb/CP site aligned with that for CVYV; (B) the other probable SPMMV cleavage sites. The numbers are the positions of the proposed cleavage sites from the start of the polyprotein.

Position P6

The importance of P6 was first suggested by the conservation of Glu at this position in TEV. This was probably fortunate for the progress of research as there is strict conservation of P6 only in TEV, LMoV and PLDMV (all as Glu). In other species it is impossible to detect any pattern, although Glu was generally the most frequent residue (Table 2).

Position P5

There is no recognizable pattern in the amino acids at this position and, of all the sites examined, this is the one where amino acid use is closest to that of the complete polyproteins. These results are consistent with the experimental evidence that this site plays little or no role in recognition by the enzyme and that there is no S5 pocket in the TEV protease (Phan et al., 2002).

Position P4

The importance of P4 is evident from the experimental work discussed above. Val occurs at almost 80% of the sites (274/343; Table 2) and other highly hydrophobic residues account for most of the remainder. Most members of the genus Potyvirus have a strictly conserved Val at P4 and TEV is unusual with its mixture of Val, Ile and Leu. The other exceptions within the genus are BYMV and ClYVV (two closely related viruses) which have Tyr, Phe, Leu and Cys while LYSV has Met, Phe or Trp. Ile is frequent at P4 in the bymoviruses. Pocket S4 of TEV particularly involves the residues Ala169, Asn171, Tyr178 and His214 (Phan et al., 2002).
While most members of the family have an aromatic Tyr or Phe in the position corresponding to 178 on TEV, the three potyviruses BYMV, ClYVV and LYSV have Val, Ile and Met, respectively (Fig. 7). The effect of such substitutions in BYMV was examined by protein modelling and this confirmed the complementary nature of the changes at the P4 position and in the S4 pocket amino acids (Fig. 8). Using the X‐ray diffraction structure for the TEV NIa protease (Phan et al., 2002) as a template, we constructed a homology model of the BYMV protease and analysed the effect of a P4 Leu→Tyr substitution in both proteins. Inspection of the TEV structure revealed that the S4 pocket is shaped as a bent finger. It is lined by hydrophilic surfaces due to His214 and Tyr178 on the outside, and by hydrophobic surfaces due to Val216 and Ala169 on the inside and to Phe139 at the bottom. Although in the TEV structure the pocket contains a Leu residue (Fig. 8A), in silico mutagenesis showed that it may also accommodate Ile and Val residues (data not shown). However, our results also suggested that it is too shallow to accommodate residues with aromatic side chains such as Phe or Tyr (Fig. 8B). In contrast, in the BYMV protease His214→Ile and Tyr178→Val substitutions create a deeper, more linear and more hydrophobic S4 pocket that may accommodate P4 Leu and Trp residues (Fig. 8C,D). Interestingly, our study also suggested that due to its narrower and more linear shape the S4 pocket in our BYMV model is less likely to accommodate residues with side chains branched at the C‐beta position such as Ile and Val (data not shown).

Figure 7. Alignment of some amino acids towards the C‐terminus of the NIa‐Pro protein of selected members of the Potyviridae to show residues involved in substrate pockets S1, S2 and S4 as discussed in the text. The upper line shows the TEV sequence to which the crystal structure refers (Phan et al., 2002).
Selected members of the genus Potyvirus and consensus sequences for the genera Potyvirus, Rymovirus, Tritimovirus and Bymovirus are shown as well as the sequences for SPMMV, genus Ipomovirus, and Cardamom mosaic virus (CdMV), genus Macluravirus. Residues discussed in the text are shaded. Virus abbreviations are as in Table 1.

Figure 8. Molecular models showing substrate molecules (stick and ball) within the S4 substrate binding pocket of potyviral NIa‐Pro proteases using an X‐ray diffraction structure available for the TEV NIa protease (PDB accession 1LVB; Phan et al., 2002) as a template. This structure (A) is a complex between the catalytically inactive C151A TEV mutant and the substrate peptide TENLYFQSGT, allowing the detailed inspection of its interactions with the enzyme binding sites. Variant substrate peptides were generated by replacing side chains in the substrate peptide from the crystal structure. Hydrogen atoms were added consistent with pH 7 and side chain clashes were avoided by choosing low‐energy rotamers. The models were surrounded by a 5‐Å layer of water molecules and the energy of the structure minimized for 1000 cycles of conjugate gradient minimization using the CHARMM forcefield and CHARMm v.29b1 (Accelrys Inc., San Diego, CA). The stereochemical quality of the models was assessed with the Biotech validation suite for protein structures and Procheck v.3.5 (Laskowski et al., 1993) available at http://biotech.ebi.ac.uk:8400/. Protein structures were visualized and manipulated using INSIGHT II (Accelrys Inc.) on an SGI O2 workstation. Hydrophobic residues lining the pocket are shown in red. In the TEV structure the S4 pocket contains a Leu P4 residue (A) but seems to be too shallow to accommodate a Tyr residue (B). In a model of the BYMV protease a wider and more hydrophobic S4 pocket may accommodate both Leu (C) and Tyr (D) P4 residues.
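As a rough numerical check on the conservation at P4, a count from Table 2 can be compared with the number expected from random usage. The Python sketch below is an assumption‐laden stand‐in, not the Genstat BNTEST procedure cited in the Table 2 caption: it treats the "Random" column as a fixed null proportion and applies a one‐sample normal approximation, whereas the original analysis tested the difference between two proportions. The function name `proportion_z_test` is hypothetical.

```python
import math

N_SITES = 343  # 49 sequences x 7 cleavage sites (Table 2)


def proportion_z_test(observed, expected, n=N_SITES):
    """One-sample z-test of an observed count against an expected count.

    Assumption: 'expected / n' is treated as a known null proportion.
    Returns (z, two-sided p-value) under the normal approximation.
    """
    p0 = expected / n
    p_hat = observed / n
    se = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Counts taken from Table 2: (observed, expected if random).
examples = {
    "Val at P4": (274, 23),
    "Gln at P1": (253, 13),
    "Ala at P5": (25, 23),
}
for label, (obs, exp) in examples.items():
    z, p = proportion_z_test(obs, exp)
    print(f"{label}: z = {z:.2f}, significant at 0.01: {p < 0.01}")
```

With these inputs the strongly conserved positions fall far outside the random expectation, while a P5 count such as Ala does not, in line with the statement that amino acid use at P5 is close to that of the complete polyproteins.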
In a recent study using a similar modelling approach, the TVMV S4 pocket was shown to be shallower than that of TEV because TVMV has Leu at the position corresponding to TEV Ala169 (Tözsér et al., 2005). This appears to explain why TVMV accommodates only Val at P4, whereas TEV uses Val, Ile and Leu residues. Our alignments show that nearly all species in the genus Potyvirus resemble TVMV in having a Leu residue in their S4 pocket (Fig. 7), which would explain why TEV is unusual.

Position P3

It is difficult to detect any consistent patterns within or between species at this position.

Position P2

Experimental evidence showed that TEV could tolerate different amino acids at this position (Dougherty et al., 1989) but suggested that His was a strict requirement for TuMV (Kang et al., 2001). This is reflected in some distinctive patterns amongst species at this position (Table 3). A substantial number of species (20/37 members of the genus Potyvirus) have a conserved His at most sites but are different at the Pro/NIb site. It is possible that the Pro/NIb site is different because it is cut in cis, while the others are cut in trans, although in the genus Rymovirus His is conserved except at the 6K2/VPg site. Other potyviruses (PeMoV, BtMV, BYMV, ClYVV, TVMV, PVA, LMoV, LYSV, OYDV) or tritimoviruses show a strong preference for the aromatic residues Tyr and/or Phe and the bymoviruses for Leu. In contrast, TEV, PSbMV and the viruses of the BCMV subgroup show no strong pattern at P2 although hydrophobic residues are used almost exclusively. It has been pointed out earlier that these patterns often correspond to phylogenetic groupings (Chen et al., 2001). Pocket S2 of TEV particularly involves the four hydrophobic residues Val209, Trp211, Val216 and Met218 (Phan et al., 2002).
This region is also fairly well conserved within the family but bymoviruses are notably different with a Cys corresponding to position 209 and a shorter C‐terminus to the protein that disrupts alignment after position 211 (a strongly conserved Trp or Tyr) (Fig. 7). This might provide an explanation for the use of Leu by bymoviruses rather than the more usual aromatic residue at P2. We have examined the other patterns of residue preference at P2 by modelling but have been unable to discover any obvious structural basis for these patterns.

Table 3. Amino acid at the P2 position of each NIa‐Pro cleavage site for each fully sequenced species in the family Potyviridae.

Virus    P3/6K1  6K1/CI  CI/6K2  6K2/VPg  VPg/NIa‐Pro  NIa‐Pro/NIb  NIb/CP
Potyvirus
ZYMV     T       L       L       V        L            T            L
BCMV     V       M       L       T        V            T            L
BCMNV    T       P       L       T        L            V            T
SMV      A       V       L       T        M            V            L
WMV      A       V       L       T        V            V            L
CABMV    H       V       L       T        V            T            L
DsMV     V       I       F       A        M            L            L
PeMoV    Y       Y       Y       Y        T            Y            Y
BtMV     Y       Y       Y       Y        E            E            Y
PSbMV    I       C       C       L        T            T            L
PRSV     H       H       H       H        H            E            H
BYMV     F       F       F       F        F            F            F
ClYVV    M       F       F       F        F            F            F
PVY      H       H       H       H        H            E            H
PepMoV   H       H       H       H        H            E            H
PTV      H       H       H       H        H            E            H
PVV      H       H       H       H        H            E            H
WPMV     H       H       H       H        H            E            H
TEV      E       T       L       F        F            S            F
LMV      H       H       H       H        H            F            H
TVMV     F       F       F       F        F            T            F
PVA      F       F       F       F        F            T            F
JYMV     H       H       H       H        H            Q            H
TuMV     H       H       H       H        H            A            H
ScaMV    H       H       H       H        H            A            H
PPV      H       H       H       H        H            T            H
LMoV     F       F       F       F        F            L            F
SPFMV    H       H       H       H        H            A            H
YMV      H       H       H       H        H            A            H
PLDMV    H       H       H       H        F            A            H
ChiVMV   H       H       H       H        H            E            H
LYSV     F       H       Y       Y        F            F            Y
SCMV     H       Q       H       H        H            E            H
SrMV     H       H       H       H        H            E            H
MDMV     H       H       H       H        H            E            H
JGMV     H       H       H       H        H            N            H
CSV      H       H       H       H        H            M            H
OYDV     Y       Y       Y       Y        F            F            Y
Rymovirus
RGMV     H       H       H       L        H            H            H
AgMV     H       H       H       F        H            H            H
HoMV     H       H       H       F        H            H            H
Tritimovirus
BStMV    F       Y       Y       F        F            F            F
ONMV     Y       Y       Y       Y        Y            W            Y
WSMV     Y       Y       Y       F        F            W            Y
Ipomovirus
SPMMV    Q       S       S       Q        S            S            P
Bymovirus
BaMMV    L       L       L       I        L            M            L
BaYMV    L       L       L       L        L            M            L
WYMV     L       L       L       L        L            N            L
OMV      L       L       L       L        L            E            L

Position P1

The strong conservation of Gln at this site is well documented and, for many species, the only exception is the suboptimal Glu at the more slowly processed VPg/NIa‐Pro junction.
Tritimo‐ and rymoviruses also have Glu at the 6K2/VPg and NIb/CP junctions, while JGMV has Glu at all positions except the NIb/CP. It could be significant that Glu appears to be more favoured in those viruses with graminaceous hosts. The residues Thr 146 and His 167 were shown to be involved in the formation of TEV pocket S1 and thus interactions with P1 (Phan et al., 2002), confirming earlier predictions for Pepper vein‐banding virus (= ChiVMV) (Joseph and Savithri, 2000). These residues occur five residues upstream and 16 residues downstream of the active site Cys and are conserved in these exact positions virtually throughout the family (Fig. 7), explaining the almost universal preference for Gln at P1. The only exception to His is the Asn in SPMMV and this substantial change to a different type of amino acid lends strong support to the suggestion (above) that this virus has a different preference (i.e. for His) at P1. We further substantiated this suggestion by homology modelling, using the TEV NIa protease (Phan et al., 2002) as a template (Fig. 9). Two substrate peptides were docked in this model: ENLYFQ/SGT, which was crystallized with TEV NIa‐Pro, and LYVEQH/GKK, which corresponds to the probable P3/6K1 cleavage site in SPMMV and which was modelled using the structure of ENLYFQ/SGT as template. In the TEV protease S1 pocket, the four possible hydrogen bonds for the side chain of the Gln P1 residue are satisfied by interaction with Thr 146, Asp 148 and His 167, resulting in a very favourable rotamer energy (Fig. 9A). In the same structure the side chain of the P3 Tyr residue is also hydrogen bonded by Asp 148 and Asn 174. After substitution of the ENLYFQ/SGT substrate with LYVEQH/GKK (Fig. 9B,C) modelling suggested that in TEV the S1 pocket would be unable to accommodate the P1 His residue present in the SPMMV cleavage sites. The presence of Asp 148 also created an unfavourable electrostatic interaction with P3 Glu.
In contrast, after energy minimization the P1 His of the same substrate peptide fitted well in the S1 pocket of the SPMMV NIa‐Pro model, with the most favourable rotamer energy of 1.31 kJ/mol (Fig. 9D). This low but positive energy resulted from suboptimal electrostatic interactions with surrounding residues (including one unsatisfied hydrogen bond) but was more than compensated for by the presence of Arg 170 (instead of Ser 170 in TEV), which strongly interacts with the P3 glutamate, resulting in a best rotamer energy of −18 kJ/mol for this residue and −70 kJ/mol for Arg 170. The negative interaction observed in TEV between P3 Glu and Asp 148 did not occur as SPMMV contains a Leu residue at this position.

Fig. 9. Molecular models showing substrate molecules (stick and ball) within the S1 substrate binding pocket of potyviral NIa‐Pro proteases using the TEV structure (A) as a template (Phan et al., 2002). See legend to Fig. 8 for experimental details. Hydrophobic residues lining the pocket are shown in red. (A) ENLYFQ/SGT (the TEV substrate peptide) in the TEV binding pocket. The side chain of the P1 glutamine residue fits in the S1 pocket with the most favourable rotamer energy of −30 kJ/mol. (B,C) LYVEQH/GKK (the putative SPMMV substrate peptide) in the TEV binding pocket viewed from two different angles. The side chain of the P1 His residue does not fit well and the two lowest energy rotamers were +1546 kJ/mol and +647 kJ/mol because of van der Waals clashes with His 167 (B) and Ser 170 (C), respectively. (D) LYVEQH/GKK substrate peptide in a homology model of the proposed SPMMV binding pocket. After energy minimization the P1 His fitted in the S1 pocket with the most favourable rotamer energy of 1.31 kJ/mol.

Position P1′

Although experimental work has sometimes shown that many amino acids can be tolerated at this position, in practice > 90% of the positions have one of the small amino acids, i.e. Ala, Gly or Ser (Table 2).
However, there are distinctive patterns amongst the different cleavage sites with a majority of Ala at P3/6K1, of Gly at 6K2/VPg (although rarely at NIb/CP) and of Ser at 6K1/CI and CI/6K2 (Table 4). The reason for these preferences is not known.

Table 4. Occurrence of the most frequent amino acids in the P1′, P2′ and P3′ positions at each NIa‐Pro cleavage site for the 49 fully sequenced species in the family Potyviridae.

Position  Site      Ala  Asp  Glu  Gly  Lys  Leu  Asn  Arg  Ser  Thr  Other
                    A    D    E    G    K    L    N    R    S    T
P1′       P3/6K1    32   2    0    6    0    0    0    3    6    0    0
          6K1/CI    7    0    0    3    0    0    0    0    37   1    1
          CI/6K2    5    0    0    5    1    0    3    0    30   5    0
          6K2/VPg   7    0    0    41   0    0    0    0    1    0    0
          VPg/Pro   12   0    0    12   0    0    1    0    22   1    1
          Pro/NIb   7    0    0    13   0    0    1    0    21   2    5
          NIb/CP    26   0    0    1    0    0    0    0    21   0    1
P2′       P3/6K1    0    0    0    0    38   0    1    3    7    0    0
          6K1/CI    0    0    0    7    0    32   1    0    2    0    7
          CI/6K2    0    3    0    0    35   0    0    6    0    0    5
          6K2/VPg   6    0    1    0    32   2    1    1    2    3    1
          VPg/Pro   4    0    4    6    9    4    1    6    6    0    9
          Pro/NIb   4    11   0    19   3    0    4    2    6    0    0
          NIb/CP    3    1    4    5    19   5    1    2    5    2    2
P3′       P3/6K1    1    3    0    2    8    0    3    8    10   9    5
          6K1/CI    1    37   3    0    0    0    2    0    1    2    3
          CI/6K2    1    9    6    0    1    0    5    0    9    4    14
          6K2/VPg   3    0    1    9    10   0    10   3    11   1    1
          VPg/Pro   2    0    0    0    0    1    1    2    36   7    0
          Pro/NIb   1    7    5    0    6    0    3    3    5    2    17
          NIb/CP    3    8    15   1    4    1    1    0    3    8    5

Positions P2′ and P3′

There is little experimental evidence that these positions are significant for the efficiency of cleavage, but there are some distinctive patterns in the amino acids that occur (Table 4). In particular, there is a high frequency of Lys at P2′ for four of the cleavage sites, but of Leu at the 6K1/CI site. At P3′, Asp predominates at the 6K1/CI site and Ser at the VPg/NIa‐Pro site. As the experimental work has used peptide substrates that do not necessarily have the same three‐dimensional conformation as the native polyprotein, these residues may yet be shown to play a role in enzyme‐substrate recognition. Alternatively, they may relate more to the function of the N‐terminus of the protein downstream of the cleavage site.
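The site-by-site preferences at P1′ can be checked directly from the counts in Table 4. The following is a minimal Python sketch, not part of the original analysis: the counts are transcribed from the P1′ rows of Table 4, while the names `RESIDUES`, `P1_PRIME`, `most_frequent` and `small_residue_share` are illustrative choices made here.

```python
# P1' counts per NIa-Pro cleavage site, transcribed from Table 4 (49 species).
# Column order follows the table: A D E G K L N R S T Other.
RESIDUES = ["A", "D", "E", "G", "K", "L", "N", "R", "S", "T", "Other"]

P1_PRIME = {
    "P3/6K1":  [32, 2, 0, 6, 0, 0, 0, 3, 6, 0, 0],
    "6K1/CI":  [7, 0, 0, 3, 0, 0, 0, 0, 37, 1, 1],
    "CI/6K2":  [5, 0, 0, 5, 1, 0, 3, 0, 30, 5, 0],
    "6K2/VPg": [7, 0, 0, 41, 0, 0, 0, 0, 1, 0, 0],
    "VPg/Pro": [12, 0, 0, 12, 0, 0, 1, 0, 22, 1, 1],
    "Pro/NIb": [7, 0, 0, 13, 0, 0, 1, 0, 21, 2, 5],
    "NIb/CP":  [26, 0, 0, 1, 0, 0, 0, 0, 21, 0, 1],
}


def most_frequent(counts):
    """Return (residue, share) for the most frequent residue at one site."""
    total = sum(counts)
    best = max(range(len(counts)), key=counts.__getitem__)
    return RESIDUES[best], counts[best] / total


def small_residue_share(table):
    """Fraction of all tabulated P1' residues that are Ala, Gly or Ser."""
    idx = [RESIDUES.index(r) for r in ("A", "G", "S")]
    small = sum(row[i] for row in table.values() for i in idx)
    total = sum(sum(row) for row in table.values())
    return small / total


if __name__ == "__main__":
    for site, counts in P1_PRIME.items():
        residue, share = most_frequent(counts)
        print(f"{site:8s} {residue} {share:.0%}")
    print(f"Ala/Gly/Ser overall: {small_residue_share(P1_PRIME):.1%}")
```

Run as a script, this lists the dominant P1′ residue for each junction and the overall share of Ala, Gly and Ser, which can be compared with the "> 90%" figure quoted in the text.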
The NIb/CP cleavage site

For many viruses in the family Potyviridae there are no complete genome sequences, but there are often partial sequences of the 3′‐end of the genome that frequently include the NIb/CP cleavage site. In total, there are data for 113 species at this cleavage site and these are summarized in Table 5. Most of these sequences (91) are from the genus Potyvirus and some of the less frequent amino acids occur because of different patterns in other genera. The most distinctive are Glu/Ala for the genus Rymovirus, Cys‐X‐X‐Glu/Ser for the genus Tritimovirus, Phe (or Leu)‐Gln/Met‐Asp for the genus Macluravirus and Ile (or Thr)‐X‐Leu‐Gln/Ala for the genus Bymovirus. Amino acids do not appear to be used at random at any of the positions examined.

Table 5. Occurrence of each amino acid at each position around the NIb/CP cleavage site for 113 species sequenced in this region in the family Potyviridae. The final column shows the numbers of amino acids expected at each position if they were used at random in proportion to their total numbers in the coat proteins of all sequences analysed. Numbers underlined differ significantly (P < 0.01) from random using the Genstat procedure BNTEST to test differences between two proportions (Payne and Arnold, 2003).
Amino acid  P6   P5   P4   P3   P2   P1   P1′  P2′  P3′  Random
Ala  A      2    1    0    2    0    0    45   6    4    9
Cys  C      3    3    5    1    0    0    0    0    0    1
Asp  D      19   15   0    2    0    0    0    38   15   7
Glu  E      32   31   1    8    0    8    2    0    29   8
Phe  F      3    3    3    9    20   0    1    0    0    3
Gly  G      2    0    0    0    1    0    12   39   1    7
His  H      0    1    0    13   51   1    0    0    1    3
Ile  I      6    4    7    1    0    0    0    1    1    5
Lys  K      2    3    0    3    0    0    0    7    13   7
Leu  L      8    3    4    1    28   0    0    1    4    7
Met  M      1    4    0    3    2    0    5    0    0    5
Asn  N      3    4    0    0    0    0    0    7    4    7
Pro  P      4    2    0    0    1    0    0    0    2    5
Gln  Q      3    3    3    2    0    103  0    2    5    5
Arg  R      1    2    0    10   0    1    0    1    4    7
Ser  S      6    12   0    15   1    0    44   9    8    7
Thr  T      5    8    2    4    1    0    1    2    17   8
Val  V      5    6    88   14   0    0    3    0    5    7
Trp  W      0    1    0    1    0    0    0    0    0    1
Tyr  Y      8    7    0    24   8    0    0    0    0    3

80% consensus  Val His Gln Ala Ile Leu Ser Phe Gly
50% consensus  Glu Glu Val Tyr His Gln Ala Asp Glu Asp Asp Val Leu Ser Gly Thr Leu Ser Ser Lys Tyr Arg

CONCLUSIONS

The paper has provided a rigorous and holistic examination of the patterns around the different polyprotein cleavage sites in the family Potyviridae. Explanations for these patterns have been suggested and some have been supported by detailed molecular modelling. The data also provide a standard for the prediction of these sites as further virus sequences become available. Further analyses have been done to provide the probable cleavage sites on all published sequences within the family. The results have been used to inform and improve a comprehensive database of plant virus sequences (DPVweb at http://www.dpvweb.net; Adams and Antoniw, 2004) and the cleavage sites identified have also been placed on a dedicated public website (http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/potycleavage/index.html), which is updated regularly.

ACKNOWLEDGEMENTS

We thank Alan Todd for statistical analyses and Kim Hammond‐Kosack for helpful comments on the manuscript. Rothamsted Research receives grant‐aided support from the Biotechnology and Biological Sciences Research Council of the UK.

Molecular Plant Pathology, Wiley



Publisher
Wiley
Copyright
Copyright © 2005 Wiley Subscription Services, Inc., A Wiley Company
ISSN
1464-6722
eISSN
1364-3703
DOI
10.1111/j.1364-3703.2005.00296.x
pmid
20565672

Abstract

INTRODUCTION Virus‐encoded proteases are involved in the replication cycle of many different viruses. Members of the large plant virus family Potyviridae have a polyadenylated ssRNA genome, encapsidated in slightly flexuous filamentous particles. The genomic RNA is translated into a large polyprotein which encodes three different proteases that cleave the polyprotein into a total of ten mature peptides. Each protease cleaves at specific sites that have been determined experimentally for a few (up to five) species, but which are often deduced for others from sequence alignments. Most species in the family have a monopartite genome and these are allocated to genera on the basis of their mode of transmission, virion size and genome relatedness ( Berger ., 2000 ). The genus Potyvirus is the largest, containing over 100 species which are transmitted by aphids in a non‐persistent manner and have particles > 700 nm long. The other monopartite genera are: Ipomovirus (three species transmitted by whiteflies), Macluravirus (three species transmitted by aphids but with particles < 700 nm long), Rymovirus (three species restricted to the Gramineae and transmitted by Abacarus mites) and Tritimovirus (three species restricted to the Gramineae and transmitted by Aceria mites). Figure 1A shows the genome organization with the ten mature proteins. In the genus Bymovirus (six species restricted to the Gramineae and transmitted by root‐infecting parasites known as plasmodiophorids), the genome is bipartite and RNA1 corresponds to the 3′‐section (eight mature proteins) of the genome of other members ( Fig. 1B ). The smaller RNA (RNA2) of the bymoviruses has no homology with the other members of the family except that the C‐terminal part of its P2‐1 gene resembles the C‐terminal region of the HC‐Pro of the monopartite viruses ( Shukla ., 1998 ). The proteases and cleavage sites that have been identified are also shown in Fig. 1 . 
Cleavage probably occurs co‐translationally as there is no evidence that the whole polyprotein is produced in vivo, but the different sites are not all processed at the same rate and some intermediate products can be detected (Merits et al., 2002). This, and other earlier evidence, indicates that the NIa‐Pro, which cuts at seven of the nine sites, acts in both cis and trans, whereas the P1 and HC‐Pro probably operate only in cis to cleave themselves from the polyprotein as it is translated.

Fig. 1. Genome organization in the family Potyviridae. The upper diagram shows the ten mature proteins produced by proteolytic cleavage (arrows) of the polyprotein in most members of the family. The two genomic RNAs of the bymoviruses are shown below, aligned to show the homology between RNA1 and the 3′‐section of the monopartite viruses. The positions of the serine protease (S30) and of the two different cysteine proteases (C6 and C4) are indicated.

Most of the mature protein products have multiple functions. P1 also has a role in host range determination and the HC‐Pro of potyviruses is required for aphid transmission. The cylindrical inclusion (CI) protein has RNA helicase and NTPase activity and may be involved in cell‐to‐cell movement. The P3 and 6K2 proteins are possibly membrane anchors for the replication complex. The VPg is covalently bound to the 5′‐terminus of the RNA and is a determinant of virulence. The nuclear inclusion protein b (NIb) is an RNA‐dependent RNA polymerase and the coat protein (CP) is the major structural protein of virions and often has roles in symptom production and transmission. In this review, cleavage site patterns are compiled for all sequenced species within the Potyviridae and the presence of unusual sites within the sequences of some members is explained through homology modelling. These sites are sometimes different from those suggested in the original literature or contained within the header information of sequence files.
Some patterns and general conclusions are discussed. A list of the 49 fully sequenced species in the family, with their abbreviations, is given in Table 1. The classification of proteases here is that used in the MEROPS database (Rawlings et al., 2004) and the well‐established labelling convention for enzyme binding pockets and substrate residues (Fig. 2; Schechter and Berger, 1967) has been adopted. Sequences were obtained from the international databases and the polyprotein amino acid sequences were then aligned using ClustalX (Thompson et al., 1997) and manually adjusted where necessary. Where motifs or putative cleavage sites appeared unusual, these were checked with other sequences of the same species, if available.

Table 1. Virus species in the family Potyviridae for which complete sequence data are available.

Species                              Abbreviation
Genus Potyvirus
Bean common mosaic necrosis virus    BCMNV
Bean common mosaic virus             BCMV
Bean yellow mosaic virus             BYMV
Beet mosaic virus                    BtMV
Chilli veinal mottle virus           ChiVMV
Clover yellow vein virus             ClYVV
Cocksfoot streak virus               CSV
Cowpea aphid‐borne mosaic virus      CABMV
Dasheen mosaic virus                 DsMV
Japanese yam mosaic virus            JYMV
Johnsongrass mosaic virus            JGMV
Leek yellow stripe virus             LYSV
Lettuce mosaic virus                 LMV
Lily mottle virus                    LMoV
Maize dwarf mosaic virus             MDMV
Onion yellow dwarf virus             OYDV
Papaya leaf distortion mosaic virus  PLDMV
Papaya ringspot virus                PRSV
Pea seed‐borne mosaic virus          PSbMV
Peanut mottle virus                  PeMoV
Pepper mottle virus                  PepMoV
Peru tomato mosaic virus             PTV
Plum pox virus                       PPV
Potato virus A                       PVA
Potato virus V                       PVV
Potato virus Y                       PVY
Scallion mosaic virus                ScaMV
Sorghum mosaic virus                 SrMV
Soybean mosaic virus                 SMV
Sugarcane mosaic virus               SCMV
Sweet potato feathery mottle virus   SPFMV
Tobacco etch virus                   TEV
Tobacco vein mottling virus          TVMV
Turnip mosaic virus                  TuMV
Watermelon mosaic virus              WMV
Wild potato mosaic virus             WPMV
Yam mosaic virus                     YMV
Zucchini yellow mosaic virus         ZYMV
Genus Bymovirus
Barley mild mosaic virus             BaMMV
Barley yellow mosaic virus           BaYMV
Oat mosaic virus                     OMV
Wheat yellow mosaic virus            WYMV
Genus Ipomovirus
Sweet potato mild mottle virus       SPMMV
Genus Rymovirus
Agropyron mosaic virus               AgMV
Hordeum mosaic virus                 HoMV
Ryegrass mosaic virus                RGMV
Genus Tritimovirus
Brome streak mosaic virus            BStMV
Oat necrotic mottle virus            ONMV
Wheat streak mosaic virus            WSMV

Fig. 2. Diagrammatic representation of a protease and its substrate showing the nomenclature of Schechter and Berger (1967). The substrate residues are labelled progressively from P1 N‐terminal of the cleavage site and P1′, etc., C‐terminal of the cleavage site. The corresponding binding pockets of the protease are labelled from S1 and S1′.

THE P1 PROTEASE

The first (N‐terminal) mature protein of all the monopartite viruses is the most variable and least conserved region of the genome (Adams et al., 2005). However, the C‐terminus of this protein is relatively conserved and was first identified as a serine protease by Verchot et al. (1991, 1992). These viral proteins are now classified as MEROPS Clan SA, family S30 serine proteases. The catalytic triad His‐(X7−11)‐Asp‐(X30−36)‐Ser [but Glu rather than Asp for viruses of the Bean common mosaic virus (BCMV) subgroup] with Gly‐X‐Ser‐Gly around the active site serine is strictly conserved (Fig. 3). There is no protein of this type in the bymoviruses. The protein cleaves itself from the polyprotein (i.e. at the P1/HC‐Pro junction) but the site has been definitively determined by experiments with only three viruses. In experiments using in vitro translation of the TVMV RNA in a wheat germ system, determination of the N‐terminal amino acid of the HC‐Pro and modifications by site‐directed mutagenesis of the transcription template identified the cleavage site as Phe 274/Ser 275 (Mavankal and Rhoads, 1991). In similar experiments with TEV, but using modified P1‐GUS constructs, Tyr 304/Ser 305 was confirmed as the cleavage site (Verchot et al., 1992).
Work with infectious clones of PVY in protoplasts demonstrated that the cleavage site was Phe 284/Ser 285 and that the Phe was essential while the Ser was optimal for this virus (Yang et al., 1998). The only other virus for which there is experimental information is WSMV (genus Tritimovirus). Its cleavage site was mapped using a series of modified templates for in vitro translation to the region between amino acids 348 and 353; alignments suggested that the actual site was probably Tyr 352/Gly 353 (Choi et al., 2002).

Fig. 3. Sequence alignment of some amino acids at the C‐terminus of the P1 protein of all sequenced members of the Potyviridae showing the active site of the serine protease and the P1/HC‐Pro cleavage site (vertical arrow). The numbers of amino acids between the conserved residues and the amino acid number of the P1′ residue are shown and the three catalytic residues (His, Asp/Glu, Ser) are marked (*). The conserved Gly mentioned in the text is marked as @. Positions where all amino acids are identical (or highly similar) are shaded black and the two levels of grey show positions where 80% or 60% of the residues are identical. Virus abbreviations are as in Table 1. Viruses for which the cleavage site has been verified experimentally are marked †.

The P1/HC‐Pro junction is the one that has most often been wrongly predicted by authors (the PVY site in particular is wrongly defined in most sequence accessions) but alignments produced for this paper show that a consistent pattern can be identified around the cleavage site (Fig. 3). Site P1 is always Phe or Tyr and site P1′ is nearly always Ser. Site P4 is nearly always (42/45) one of the related highly hydrophobic amino acids Ile, Leu, Val or Met and there is also conservation at P2 with either an aromatic residue (25 His, 1 Phe, 2 Trp, 2 Tyr) or a Gln/Glu (12 and 3, respectively).
The cleavage site also occurs at a fairly constant distance (22–28 residues) downstream of a conserved Gly residue (marked @ in Fig. 3).

HC‐Pro AND SIMILAR PROTEASES

The C‐terminal region of the potyvirus HC‐Pro was first identified as a cysteine protease in TEV by Carrington et al. (1989). These viral proteins are now classified as MEROPS Clan CA, family C6. The sequence Gly‐Tyr‐Cys‐Tyr usually surrounds the active site Cys, and a conserved His associated with catalytic activity has also been identified (Oh and Carrington, 1989). The cleavage site was shown to be Gly/Gly (Carrington et al., 1989) and subsequent work using pulse chase labelling to follow the processing of in vitro translation products of modified transcripts showed that the pattern Tyr‐X‐Val‐Gly/Gly was strictly necessary, but that substitutions could be tolerated at positions P5, P3 and P2′ (Carrington and Herndon, 1992). Other potyviruses proved to be extremely similar at this cleavage site and no further experimental work on the cleavage site has been reported. Alignments produced for this paper show very little variation either at the catalytic and cleavage sites, or in the length of the intervening region (Fig. 4).

Fig. 4. Sequence alignment of some amino acids at the C‐terminus of the HC‐Pro protein of all sequenced members of the Potyviridae showing the active site of the C6 cysteine protease and the HC‐Pro/P3 cleavage site (vertical arrow). This site has been determined experimentally only for TEV. The lower part of the figure shows the corresponding residues in the P2‐1 protein of bymoviruses and the putative P2‐1/P2‐2 cleavage site. The numbers of amino acids between the conserved residues are shown and the two catalytic residues (Cys, His) are marked (*). Positions where all amino acids are identical (or highly similar) are shaded black and the two levels of grey show positions where 80% or 60% of the residues are identical. Virus abbreviations are as in Table 1.
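The residue patterns described for the two self‐cleaving proteases (P4 Ile/Leu/Val/Met, P2 His/Phe/Trp/Tyr/Gln/Glu, P1 Phe/Tyr and P1′ Ser at P1/HC‐Pro; Tyr‐X‐Val‐Gly/Gly at HC‐Pro/P3) lend themselves to a simple motif scan. The following Python sketch is an illustration only and not the procedure used for the alignments: the regular expressions encode just the residues named in the text, the function and dictionary names are hypothetical, and the two toy sequences in the usage note are invented for demonstration. The additional constraint that the P1/HC‐Pro site lies 22–28 residues downstream of a conserved Gly is not modelled.

```python
import re

# Lookahead patterns so that overlapping candidates are all reported.
# Each pattern covers P4-P1 (four residues) followed by the P1' residue.
PATTERNS = {
    # P4 [ILVM], P3 any, P2 [HFWYQE], P1 [FY], P1' S
    "P1/HC-Pro": re.compile(r"(?=[ILVM].[HFWYQE][FY]S)"),
    # Tyr-X-Val-Gly / Gly
    "HC-Pro/P3": re.compile(r"(?=Y.VGG)"),
}

UPSTREAM_LENGTH = 4  # residues P4..P1 precede the scissile bond


def candidate_sites(sequence, junction):
    """Return 0-based indices of candidate P1' residues for one junction.

    `sequence` is a one-letter amino acid string; `junction` is a key of
    PATTERNS. A hit only means the local pattern is satisfied; it does not
    by itself establish that the site is cleaved.
    """
    pattern = PATTERNS[junction]
    return [m.start() + UPSTREAM_LENGTH for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Invented toy sequences, used only to exercise the patterns.
    print(candidate_sites("AAKIEHYSGGA", "P1/HC-Pro"))
    print(candidate_sites("KKYRVGGTT", "HC-Pro/P3"))
```

In practice a scan of this kind would produce several candidates in a full polyprotein, so hits would still need to be judged against an alignment such as those in Figs 3 and 4.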
The first mature protein of the RNA2 of bymoviruses is also closely related to the potyvirus HC‐Pro in its C‐terminal region (and is now included in the same protease family). There is a Gly‐Tyr‐Cys‐Tyr (or similar) and a conserved His downstream. This aligns well with the HC‐Pro over this region and it was suggested that the cleavage site in BaYMV is at Val‐Gly/Ser (Davidson et al., 1991; Kashiwazaki et al., 1991). There has been no experimental work on the cleavage sites of bymoviruses but the alignment with sequences now available (Fig. 4) confirms the conservation of this region. Bymovirus cleavage sites show more variation than those of the HC‐Pro; several different amino acids occur at the putative P1′ position, and only the Gly at position P1 is strictly conserved.

THE NIa‐Pro

These are the best‐studied potyviral proteases. They have a cysteine residue in the active site but structural motifs shared with eukaryotic cellular serine proteases (Dougherty and Semler, 1993). They are now classified as cysteine proteases [MEROPS Clan PA(C), family C4] and are related to the 3C proteases of picornaviruses (Ryan and Flint, 1997; Ziebuhr et al., 2003). The four active site residues of His, Asp, Cys and His, with Gly‐X‐Cys‐Gly around the active cysteine, are conserved throughout the family (Fig. 5). Distances between these residues and the C‐terminal cleavage site are also very similar, except that the C‐terminus of bymoviruses is shorter.

Fig. 5. Sequence alignment of some amino acids towards the C‐terminus of the NIa‐Pro protein of all sequenced members of the Potyviridae showing the active site of the C4 cysteine protease and the NIa‐Pro/NIb cleavage site (vertical arrow). The numbers of amino acids between the conserved residues are shown and the four catalytic residues (His, Asp, Cys, His) are marked (*).
Positions where all amino acids are identical (or highly similar) are shaded black and the two levels of grey show positions where 80% or 60% of the residues are identical. Virus abbreviations are as in Table 1. Viruses for which the cleavage site has been verified experimentally are marked †.

Published experimental evidence

The NIa‐Pro of TEV has been particularly well studied. In a series of elegant experiments, the 49‐kDa protein (the VPg and NIa‐Pro in Fig. 1) was identified as a protease and shown to cleave itself from the NIb and also to cleave the NIb/CP junction at a Gln/Ser dipeptide. Sequence alignments suggested that there was a conserved heptapeptide sequence [Glu‐X‐X‐Tyr‐X‐Gln/Ser (or Gly)] at these junctions and also at the 6K1/CI, CI/6K2 and 6K2/VPg boundaries. If the Gln at P1 was changed to Pro or Leu, no cleavage occurred (Carrington and Dougherty, 1987; Carrington et al., 1988). Introduction of the NIb/CP site (Glu‐Asn‐Leu‐Tyr‐Phe‐Gln/Ser) into a cleavage cassette confirmed the cleavage site as Gln/Ser and showed cleavage of the heptapeptide, but not the dipeptide alone (Carrington and Dougherty, 1988). Amino acid substitutions at the four conserved sites (P6, P3, P1, P1′) showed preferential processing of the wild‐type sequence. At P1, substitution of the Gln with Cys, Gly, Phe or Asn reduced processing and Thr, Arg, Lys, Tyr, His, Ile, Leu, Met or Pro eliminated it. At P3, substitution of the Tyr with Val/Lys, Cys/Met, Ala/Gln/Leu/Phe, Ser, Asn/Glu or Gly progressively decreased processing. At P6, processing was slightly reduced by substitution of the Glu for Pro, Gln, Ser or Lys, rather more with Met, Gly, Val or Ala and almost entirely by Leu or Thr. At P1′, substitution of the Ser by Ile, Arg or Asn had little effect (or even increased processing), Thr, Phe or Cys reduced processing and Asp or Gly eliminated it (Dougherty et al., 1988).
Amino acid substitutions at the P7 and P2′ positions had minimal effects on cleavage, while substitutions at the non‐conserved positions P5, P4 and P2 often reduced processing but did not eliminate cleavage. It was proposed that there is strong conservation of P6, P3, P1 and P1′ and that variations in P4, and particularly P2, may regulate the rate of cleavage (Dougherty et al., 1989). It was then demonstrated that the 49‐kDa protein had an internal cleavage site between the VPg and proteinase domains at a suboptimal cleavage site Glu/Gly (Dougherty and Parks, 1991) and that mutation of Glu to Leu prevented cleavage (Carrington et al., 1993). If substitutions were made at this site (Glu‐Asp‐Leu‐Thr‐Phe‐Glu/Gly) to come closer to the optimal consensus sequence (if P3 was changed to Tyr and/or P1 changed to Gln), the site was cleaved more efficiently but this had a detrimental effect on genome amplification, showing that there was a requirement for suboptimal cleavage at this site, presumably because of some role for the uncleaved protein (Schaad et al., 1996). Studies with recombinant TEV NIa‐Pro also showed that 24 amino acids were spontaneously lost from the C‐terminus of the protein and that this substantially decreased proteolytic activity (Parks et al., 1995). The highly specific nature of the protease/substrate recognition was demonstrated in experiments in which the TEV NIa‐Pro was shown to cleave its own NIb/CP site (Glu‐Asn‐Leu‐Tyr‐Phe‐Gln/Ser) but not that predicted for TVMV (Glu‐Thr‐Val‐Arg‐Phe‐Gln/Ser) and vice versa. After construction of hybrid proteases, specificity was shown to reside in three domains of the protein (Parks and Dougherty, 1991). In recent years, the high specificity of the TEV NIa‐Pro has been exploited for protein engineering and there has been renewed interest in the specificity of its cleavage.
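The conserved TEV heptapeptide Glu‐X‐X‐Tyr‐X‐Gln/(Ser or Gly) can be written as a one‐letter pattern and searched for in a sequence. The sketch below is a hedged Python illustration rather than a predictive tool: it encodes only the four conserved positions (P6, P3, P1, P1′) named in the text, the function name is hypothetical, and the flanking residues in the usage example are arbitrary padding. It deliberately ignores the graded effects of substitutions reported experimentally, so a suboptimal but genuine site such as the internal Glu/Gly site would not be reported.

```python
import re

# P6 Glu, P5 any, P4 any, P3 Tyr, P2 any, P1 Gln, then P1' Ser or Gly.
# A lookahead with capture groups lets overlapping candidates be found.
TEV_CONSENSUS = re.compile(r"(?=(E..Y.Q)([SG]))")


def find_tev_like_sites(sequence):
    """Return (start, 'P6..P1/P1′') tuples for matches to the TEV consensus.

    `start` is the 0-based index of the P6 residue in `sequence`.
    """
    return [
        (m.start(), f"{m.group(1)}/{m.group(2)}")
        for m in TEV_CONSENSUS.finditer(sequence)
    ]


if __name__ == "__main__":
    # TEV NIb/CP site (ENLYFQ/S) with arbitrary flanking residues.
    print(find_tev_like_sites("GGENLYFQSGT"))
    # Site predicted for TVMV NIb/CP (ETVRFQ/S): P3 is Arg, so no match.
    print(find_tev_like_sites("ETVRFQS"))
```

The TVMV sequence failing the TEV pattern mirrors, in a very crude way, the observation that each protease cleaves its own NIb/CP site but not the other's.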
Under digestion conditions probably more realistic than those used in earlier experiments (Dougherty et al., 1988) many different amino acids were tolerated at the P1′ position, although some residues (e.g. Glu, Leu and Lys) gave less than optimal cleavage (Kapust et al., 2002). The subsequent resolution of the crystal structure of the TEV NIa‐Pro in association with its substrate has now provided detailed information on the positions of the substrate binding pockets (Phan et al., 2002). It was shown that P6, P4, P3, P2, P1 and P1′ all make contact with the enzyme active site, and the specific amino acid interactions involved have been identified. Because the NIa‐Pro is such a highly conserved protein, these data should have predictive value for other members of the Potyviridae. In experiments with recombinant NIa‐Pro of TuMV, an internal self‐cleavage site (Gln‐Ala‐Ser‐Gln‐Pro‐Ser 223/Gly 224) near the C‐terminus of the NIa‐Pro was demonstrated (removing the last 20 amino acids). In contrast to results with TEV (Parks et al., 1995), the processing speed at the 6K1/CI junction was little affected (Kim et al., 1995) but it is not known whether this cleavage occurs in vivo. Examination of the sequence alignment shows that the TuMV and TEV internal cleavage sites occur on different sides of a Pro residue that is conserved throughout the genus Potyvirus (results not shown), so these effects may not be strictly comparable. A second spontaneous internal cleavage site (Thr 207/Ser 208) that removed the C‐terminal 36 amino acids abolished processing activity (Kim et al., 1996).
Mutation of the P4 Val to Asp or of the P1 Gln to Ala in the NIb/CP junction sequence (Thr 238‐Ala‐Val‐Tyr‐Ala‐Gln/Thr) greatly decreased processing but other changes in the region 218–237, including deletions (Lys 230, Ser 229‐Lys 230, Ser 229‐Leu 231, or Ser 229‐Asp 234), insertions (5 × Gly at Ser 229/Lys 230 or Ser 220/Gln 221) and mutations (Phe 226, Ile 232 or Leu 235 to Asp; Val 228 or Lys 230 to Glu), had little effect on the processing rate (Kim et al., 1998). A genetic assay was used to examine the substrate specificity of the 6K1/CI junction (Pro‐Thr‐Val‐Tyr‐His‐Gln/Thr). Random amino acid substitutions were created at sites P6–P4, P3–P2 and P1–P1′ and the sequences of those that produced products cleaved by the NIa‐Pro were determined. At P4, most of the sequences were Val (57%) or Cys (30%), all sequences contained His at P2, 73% had Gln at P1 and 50% had Ser at P1′. Although other amino acids could be tolerated at many of the positions, there was a strong bias towards the TuMV consensus cleavage sequence (Kang et al., 2001). Work with PPV showed that the NIa‐Pro cleaved the putative NIb/CP site at Gln/Ala and that Pro/Ala was not cleaved (García et al., 1989). Later work provided the first identification of a P3/6K1 site; mutation of the Gln at P1 to His abolished cleavage and changes at P3′ also affected processing efficiency, but it appeared that this site was only partially processed (García et al., 1992); processing was not essential for infectivity but did affect symptom production (Riechmann et al., 1995). With TVMV, mutation of any of the consensus P4 to P1 residues (Val‐Arg‐Phe‐Gln) to Gly abolished activity. Reduction of the target substrate sequence from ten to six amino acids before the cleavage site had little effect on processing, but reduction to four decreased activity substantially even though P6 is not conserved in this virus (Yoon et al., 2000).
Mutation of the TVMV P4 Val to Ala or Leu prevented cleavage, while substitution of the P3 Arg with either Phe or Tyr greatly decreased activity (Tözsér et al., 2005).
With PVA, mutation of the P4 Val to Ala at the CI/6K2 cleavage site (Glu‐Ala‐Val‐Gln‐Phe‐Gln/Ser) completely abolished cleavage of the site in a baculovirus expression system, while mutation of the P6 Glu to Gly at the 6K2/VPg site (Glu‐Val‐Val‐Ala‐Phe‐Gln/Ser) slowed down, but did not prevent, cleavage (Merits et al., 2002). The P3/6K1, CI/6K2 and VPg/NIa‐Pro junctions were processed more slowly than the other sites.
Although there is experimental evidence for only five viruses, there is a sufficient pattern in their amino acid sequences for the cleavage sites to be predicted with confidence for most members of the family. The most difficulty arises with SPMMV, the only fully sequenced member of the genus Ipomovirus, but re‐examination of the data has suggested that it may have a novel cleavage site, as discussed below. A summary of the frequency of each amino acid in each position around the cleavage site for the 49 fully sequenced species in the family is shown in Table 2. Some major patterns shown by the data are discussed below and related where possible to the active site–substrate interactions deduced from resolution of the crystal structure of TEV NIa‐Pro (Phan et al., 2002).

Table 2 Occurrence of each amino acid at each position around the NIa‐Pro cleavage site for fully sequenced species in the family Potyviridae (n = 343 from 49 sequences × 7 cleavage sites). The final column shows the numbers of amino acids expected at each position if they were used at random in proportion to their total numbers in the polyproteins analysed. Numbers underlined differ significantly (P < 0.01) from random using the Genstat procedure BNTEST to test differences between two proportions (Payne and Arnold, 2003).
Amino acid   P6   P5   P4   P3   P2   P1  P1′  P2′  P3′  Random
Ala A        15   25    5   15    8    0   96   17   12      23
Cys C         2    8    7    5    2    0    1    1    1       6
Asp D        44   30    0   11    0    0    2   15   64      19
Glu E       114   56    0   52   14   83    0    9   30      22
Phe F         6   12    8    6   51    0    0    3    0      15
Gly G         8   15    2    9    0    0   81   37   12      20
His H         3    7    0   20  138    6    2    1   10      10
Ile I         9   14   22   13    3    0    0    4    2      20
Lys K        12   12    0   24    0    0    1  136   29      24
Leu L         8   19    7    4   44    0    0   43    2      30
Met M         4    4    4    5    7    0    3    3    1      10
Asn N        23   16    0    4    2    0    5    9   25      17
Pro P        11   24    1    0    2    0    0    0    8      13
Gln Q        17   11    0   21    4  253    0    5   21      13
Arg R         5    8    1   40    0    1    3   20   16      18
Ser S        21   22    3   29    5    0  138   28   75      23
Thr T        22   31    2   25   18    0    9    5   33      21
Val V        13   18  274   29   12    0    2    5    2      23
Trp W         0    1    1    1    2    0    0    0    0       4
Tyr Y         6   10    6   30   31    0    0    2    0      12

Cleavage sites of SPMMV
The authors of the full sequence of SPMMV were unable to suggest exact cleavage sites for the NIa‐Pro/NIb and NIb/CP sites because of the absence of the usual (Gln or Glu)‐(Ala, Ser or Gly) dipeptides (Colinet et al., 1998). Two possible positions for the SPMMV NIb/CP site were then suggested in a study of some partial sequences, giving cleavage at either a Gln/Arg (Thr‐Ile‐Thr‐Val‐Val‐Gln/Arg) or a Glu/Pro (Phe‐Asp‐Val‐Tyr‐Val‐Glu/Pro) dipeptide (Mukasa et al., 2003). However, a partial sequence of a second ipomovirus (CVYV; Lecoq et al., 2000) can be aligned in the region around the NIb/CP site and this suggests that SPMMV might be cleaved at a His/Ala dipeptide (Fig. 6A). Eleven partial sequences of SPMMV have identical amino acids in this region (data not shown). Re‐examination of other cleavage sites in the full SPMMV sequence then shows that at most, if not all, of the sites cleaved by the NIa‐Pro, a revised cleavage site can be identified (Fig. 6B). These sites are all in positions consistent with the alignment for other viruses in the family and, apart from the novel P1 amino acid, provide a better consensus than those previously suggested.
There is conservation at P4 that is typical of most viruses in the family (see below) and Glu (or the similar Asp) at P3, while the conserved His at P1 can be explained by a difference in the amino acids that form the enzyme S1 pocket (see below).

Fig. 6 Alignment of amino acids around the proposed sites cleaved by the NIa‐Pro in SPMMV: (A) the NIb/CP site aligned with that for CVYV; (B) the other probable SPMMV cleavage sites. The numbers are the positions of the proposed cleavage sites from the start of the polyprotein.

Position P6
The importance of P6 was first suggested by the conservation of Glu at this position in TEV. This was probably fortunate for the progress of research as there is strict conservation of P6 only in TEV, LMoV and PLDMV (all as Glu). In other species it is impossible to detect any pattern, although Glu was generally the most frequent residue (Table 2).

Position P5
There is no recognizable pattern in the amino acids at this position and, of all the sites examined, this is the one where amino acid use is closest to that of the complete polyproteins. These results are consistent with the experimental evidence that this site plays little or no role in recognition by the enzyme and that there is no S5 pocket in the TEV protease (Phan et al., 2002).

Position P4
The importance of P4 is evident from the experimental work discussed above. Val occurs at almost 80% of the sites and other highly hydrophobic residues account for most of the remainder. Most members of the genus Potyvirus have a strictly conserved Val at P4 and TEV is unusual with its mixture of Val, Ile and Leu. The other exceptions within the genus are BYMV and ClYVV (two closely related viruses), which have Tyr, Phe, Leu and Cys, while LYSV has Met, Phe or Trp. Ile is frequent at P4 in the bymoviruses. Pocket S4 of TEV particularly involves the residues Ala169, Asn171, Tyr178 and His214 (Phan et al., 2002).
While most members of the family have an aromatic Tyr or Phe in the position corresponding to 178 on TEV, the three potyviruses BYMV, ClYVV and LYSV have Val, Ile and Met, respectively (Fig. 7). The effect of such substitutions in BYMV was examined by protein modelling and this confirmed the complementary nature of the changes at the P4 position and in the S4 pocket amino acids (Fig. 8). Using the X‐ray diffraction structure for the TEV NIa protease (Phan et al., 2002) as a template, we constructed a homology model of the BYMV protease and analysed the effect of a P4 Leu→Tyr substitution in both proteins. Inspection of the TEV structure revealed that the S4 pocket is shaped as a bent finger. It is lined by hydrophilic surfaces due to His214 and Tyr178 on the outside, and by hydrophobic surfaces due to Val216 and Ala169 on the inside and to Phe139 at the bottom. Although in the TEV structure the pocket contains a Leu residue (Fig. 8A), in silico mutagenesis showed that it may also accommodate Ile and Val residues (data not shown). However, our results also suggested that it is too shallow to accommodate residues with aromatic side chains such as Phe or Tyr (Fig. 8B). In contrast, in the BYMV protease His214→Ile and Tyr178→Val substitutions create a deeper, more linear and more hydrophobic S4 pocket that may accommodate P4 Leu and Trp residues (Fig. 8C,D). Interestingly, our study also suggested that, due to its narrower and more linear shape, the S4 pocket in our BYMV model is less likely to accommodate residues with side chains branched at the C‐beta position such as Ile and Val (data not shown).

Fig. 7 Alignment of some amino acids towards the C‐terminus of the NIa‐Pro protein of selected members of the Potyviridae to show residues involved in substrate pockets S1, S2 and S4 as discussed in the text. The upper line shows the TEV sequence to which the crystal structure refers (Phan et al., 2002).
Selected members of the genus Potyvirus and consensus sequences for the genera Potyvirus, Rymovirus, Tritimovirus and Bymovirus are shown as well as the sequences for SPMMV, genus Ipomovirus, and Cardamom mosaic virus (CdMV), genus Macluravirus. Residues discussed in the text are shaded. Virus abbreviations are as in Table 1.

Fig. 8 Molecular models showing substrate molecules (stick and ball) within the S4 substrate binding pocket of potyviral NIa‐Pro proteases using an X‐ray diffraction structure available for the TEV NIa protease (PDB accession 1LVB; Phan et al., 2002) as a template. This structure (A) is a complex between the catalytically inactive C151A TEV mutant and the substrate peptide TENLYFQSGT, allowing the detailed inspection of its interactions with the enzyme binding sites. Variant substrate peptides were generated by replacing side chains in the substrate peptide from the crystal structure. Hydrogen atoms were added consistent with pH 7 and side chain clashes were avoided by choosing low‐energy rotamers. The models were surrounded by a 5‐Å layer of water molecules and the energy of the structure minimized for 1000 cycles of conjugate gradient minimization using the CHARMM forcefield and CHARMm v.29b1 (Accelrys Inc., San Diego, CA). The stereochemical quality of the models was assessed with the Biotech validation suite for protein structures and Procheck v.3.5 (Laskowski et al., 1993) available at http://biotech.ebi.ac.uk:8400/. Protein structures were visualized and manipulated using INSIGHT II (Accelrys Inc.) on an SGI O2 workstation. Hydrophobic residues lining the pocket are shown in red. In the TEV structure the S4 pocket contains a Leu P4 residue (A) but seems to be too shallow to accommodate a Tyr residue (B). In a model of the BYMV protease a wider and more hydrophobic S4 pocket may accommodate both Leu (C) and Tyr (D) P4 residues.
In a recent study using a similar modelling approach, the TVMV S4 pocket was shown to be shallower than that of TEV because TVMV has Leu at the position corresponding to TEV Ala169 (Tözsér et al., 2005). This appears to explain why TVMV accommodates only Val at P4, whereas TEV uses Val, Ile and Leu residues. Our alignments show that nearly all species in the genus Potyvirus resemble TVMV in having a Leu residue in their S4 pocket (Fig. 7), which would explain why TEV is unusual.

Position P3
It is difficult to detect any consistent patterns within or between species at this position.

Position P2
Experimental evidence showed that TEV could tolerate different amino acids at this position (Dougherty et al., 1989) but suggested that His was a strict requirement for TuMV (Kang et al., 2001). This is reflected in some distinctive patterns amongst species at this position (Table 3). A substantial number of species (20/37 members of the genus Potyvirus) have a conserved His at most sites but are different at the Pro/NIb site. It is possible that the Pro/NIb site is different because it is cut in cis, while the others are cut in trans, although in the genus Rymovirus His is conserved except at the 6K2/VPg site. Other potyviruses (PeMoV, BtMV, BYMV, ClYVV, TVMV, PVA, LMoV, LYSV, OYDV) or tritimoviruses show a strong preference for the aromatic residues Tyr and/or Phe and the bymoviruses for Leu. In contrast, TEV, PSbMV and the viruses of the BCMV subgroup show no strong pattern at P2 although hydrophobic residues are used almost exclusively. It has been pointed out earlier that these patterns often correspond to phylogenetic groupings (Chen et al., 2001). Pocket S2 of TEV particularly involves the four hydrophobic residues Val209, Trp211, Val216 and Met218 (Phan et al., 2002).
This region is also fairly well conserved within the family but bymoviruses are notably different, with a Cys corresponding to position 209 and a shorter C‐terminus to the protein that disrupts alignment after position 211 (a strongly conserved Trp or Tyr) (Fig. 7). This might provide an explanation for the use of Leu by bymoviruses rather than the more usual aromatic residue at P2. We have examined the other patterns of residue preference at P2 by modelling but have been unable to discover any obvious structural basis for these patterns.

Table 3 Amino acid at the P2 position of each NIa‐Pro cleavage site for each fully sequenced species in the family Potyviridae.

Virus    P3/6K1       6K1/CI       CI/6K2       6K2/VPg      VPg/NIa‐Pro  NIa‐Pro/NIb  NIb/CP
Potyvirus
ZYMV     T            L            L            V            L            T            L
BCMV     V            M            L            T            V            T            L
BCMNV    T            P            L            T            L            V            T
SMV      A            V            L            T            M            V            L
WMV      A            V            L            T            V            V            L
CABMV    H            V            L            T            V            T            L
DsMV     V            I            F            A            M            L            L
PeMoV    Y            Y            Y            Y            T            Y            Y
BtMV     Y            Y            Y            Y            E            E            Y
PSbMV    I            C            C            L            T            T            L
PRSV     H            H            H            H            H            E            H
BYMV     F            F            F            F            F            F            F
ClYVV    M            F            F            F            F            F            F
PVY      H            H            H            H            H            E            H
PepMoV   H            H            H            H            H            E            H
PTV      H            H            H            H            H            E            H
PVV      H            H            H            H            H            E            H
WPMV     H            H            H            H            H            E            H
TEV      E            T            L            F            F            S            F
LMV      H            H            H            H            H            F            H
TVMV     F            F            F            F            F            T            F
PVA      F            F            F            F            F            T            F
JYMV     H            H            H            H            H            Q            H
TuMV     H            H            H            H            H            A            H
ScaMV    H            H            H            H            H            A            H
PPV      H            H            H            H            H            T            H
LMoV     F            F            F            F            F            L            F
SPFMV    H            H            H            H            H            A            H
YMV      H            H            H            H            H            A            H
PLDMV    H            H            H            H            F            A            H
ChiVMV   H            H            H            H            H            E            H
LYSV     F            H            Y            Y            F            F            Y
SCMV     H            Q            H            H            H            E            H
SrMV     H            H            H            H            H            E            H
MDMV     H            H            H            H            H            E            H
JGMV     H            H            H            H            H            N            H
CSV      H            H            H            H            H            M            H
OYDV     Y            Y            Y            Y            F            F            Y
Rymovirus
RGMV     H            H            H            L            H            H            H
AgMV     H            H            H            F            H            H            H
HoMV     H            H            H            F            H            H            H
Tritimovirus
BStMV    F            Y            Y            F            F            F            F
ONMV     Y            Y            Y            Y            Y            W            Y
WSMV     Y            Y            Y            F            F            W            Y
Ipomovirus
SPMMV    Q            S            S            Q            S            S            P
Bymovirus
BaMMV    L            L            L            I            L            M            L
BaYMV    L            L            L            L            L            M            L
WYMV     L            L            L            L            L            N            L
OMV      L            L            L            L            L            E            L

Position P1
The strong conservation of Gln at this site is well documented and, for many species, the only exception is the suboptimal Glu at the more slowly processed VPg/NIa‐Pro junction.
Tritimoviruses and rymoviruses also have Glu at the 6K2/VPg and NIb/CP junctions, while JGMV has Glu at all positions except the NIb/CP. It could be significant that Glu appears to be more favoured in those viruses with graminaceous hosts. The residues Thr146 and His167 were shown to be involved in the formation of TEV pocket S1 and thus interactions with P1 (Phan et al., 2002), confirming earlier predictions for Pepper vein‐banding virus (= ChiVMV) (Joseph and Savithri, 2000). These residues occur five residues upstream and 16 residues downstream of the active site Cys and are conserved in these exact positions virtually throughout the family (Fig. 7), explaining the almost universal preference for Gln at P1. The only exception to His is the Asn in SPMMV and this substantial change to a different type of amino acid lends strong support to the suggestion (above) that this virus has a different preference (i.e. for His) at P1. We further substantiated this suggestion by homology modelling, using the TEV NIa protease (Phan et al., 2002) as a template (Fig. 9). Two substrate peptides were docked in this model: ENLYFQ/SGT, which was crystallized with TEV NIa‐Pro, and LYVEQH/GKK, which corresponds to the probable P3/6K1 cleavage site in SPMMV and which was modelled using the structure of ENLYFQ/SGT as template. In the TEV protease S1 pocket, the four possible hydrogen bonds for the side chain of the Gln P1 residue are satisfied by interaction with Thr146, Asp148 and His167, resulting in a very favourable rotamer energy (Fig. 9A). In the same structure the side chain of the P3 Tyr residue is also hydrogen bonded by Asp148 and Asn174. After substitution of the ENLYFQ/SGT substrate with LYVEQH/GKK (Fig. 9B,C), modelling suggested that in TEV the S1 pocket would be unable to accommodate the P1 His residue present in the SPMMV cleavage sites. The presence of Asp148 also created an unfavourable electrostatic interaction with P3 Glu.
In contrast, after energy minimization the P1 His of the same substrate peptide fitted well in the S1 pocket of the SPMMV NIa‐Pro model, with the most favourable rotamer energy of 1.31 kJ/mol (Fig. 9D). This low but positive energy resulted from suboptimal electrostatic interactions with surrounding residues (including one unsatisfied hydrogen bond) but was more than compensated for by the presence of Arg170 (instead of Ser170 in TEV), which strongly interacts with the P3 glutamate, resulting in a best rotamer energy of −18 kJ/mol for this residue and −70 kJ/mol for Arg170. The negative interaction observed in TEV between P3 Glu and Asp148 did not occur as SPMMV contains a Leu residue at this position.

Fig. 9 Molecular models showing substrate molecules (stick and ball) within the S1 substrate binding pocket of potyviral NIa‐Pro proteases using the TEV structure (A) as a template (Phan et al., 2002). See legend to Fig. 8 for experimental details. Hydrophobic residues lining the pocket are shown in red. (A) ENLYFQ/SGT (the TEV substrate peptide) in the TEV binding pocket. The side chain of the P1 glutamine residue fits in the S1 pocket with the most favourable rotamer energy of −30 kJ/mol. (B,C) LYVEQH/GKK (the putative SPMMV substrate peptide) in the TEV binding pocket viewed from two different angles. The side chain of the P1 His residue does not fit well and the two lowest energy rotamers were +1546 kJ/mol and +647 kJ/mol because of van der Waals clashes with His167 (B) and Ser170 (C), respectively. (D) LYVEQH/GKK substrate peptide in a homology model of the proposed SPMMV binding pocket. After energy minimization the P1 His fitted in the S1 pocket with the most favourable rotamer energy of 1.31 kJ/mol.

Position P1′
Although experimental work has sometimes shown that many amino acids can be tolerated at this position, in practice > 90% of the positions have one of the small amino acids, i.e. Ala, Gly or Ser (Table 2).
However, there are distinctive patterns amongst the different cleavage sites with a majority of Ala at P3/6K1, of Gly at 6K2/VPg (although rarely at NIb/CP) and of Ser at 6K1/CI and CI/6K2 (Table 4). The reason for these preferences is not known.

Table 4 Occurrence of the most frequent amino acids in the P1′, P2′ and P3′ positions at each NIa‐Pro cleavage site for the 49 fully sequenced species in the family Potyviridae.

Position Site       Ala  Asp  Glu  Gly  Lys  Leu  Asn  Arg  Ser  Thr  Other
                      A    D    E    G    K    L    N    R    S    T
P1′      P3/6K1      32    2    0    6    0    0    0    3    6    0      0
         6K1/CI       7    0    0    3    0    0    0    0   37    1      1
         CI/6K2       5    0    0    5    1    0    3    0   30    5      0
         6K2/VPg      7    0    0   41    0    0    0    0    1    0      0
         VPg/Pro     12    0    0   12    0    0    1    0   22    1      1
         Pro/NIb      7    0    0   13    0    0    1    0   21    2      5
         NIb/CP      26    0    0    1    0    0    0    0   21    0      1
P2′      P3/6K1       0    0    0    0   38    0    1    3    7    0      0
         6K1/CI       0    0    0    7    0   32    1    0    2    0      7
         CI/6K2       0    3    0    0   35    0    0    6    0    0      5
         6K2/VPg      6    0    1    0   32    2    1    1    2    3      1
         VPg/Pro      4    0    4    6    9    4    1    6    6    0      9
         Pro/NIb      4   11    0   19    3    0    4    2    6    0      0
         NIb/CP       3    1    4    5   19    5    1    2    5    2      2
P3′      P3/6K1       1    3    0    2    8    0    3    8   10    9      5
         6K1/CI       1   37    3    0    0    0    2    0    1    2      3
         CI/6K2       1    9    6    0    1    0    5    0    9    4     14
         6K2/VPg      3    0    1    9   10    0   10    3   11    1      1
         VPg/Pro      2    0    0    0    0    1    1    2   36    7      0
         Pro/NIb      1    7    5    0    6    0    3    3    5    2     17
         NIb/CP       3    8   15    1    4    1    1    0    3    8      5

Positions P2′ and P3′
There is little experimental evidence that these positions are significant for the efficiency of cleavage, but there are some distinctive patterns in the amino acids that occur (Table 4). In particular, there is a high frequency of Lys at P2′ for four of the cleavage sites, but of Leu at the 6K1/CI site. At P3′, Asp predominates at the 6K1/CI site and Ser at the VPg/NIa‐Pro site. As the experimental work has used peptide substrates that do not necessarily have the same three‐dimensional conformation as the native polyprotein, these residues may yet be shown to play a role in enzyme‐substrate recognition. Alternatively, they may relate more to the function of the N‐terminus of the protein downstream of the cleavage site.
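The site-by-site preferences at P1′ described above can be recomputed directly from the counts in Table 4. The following Python sketch is only an illustration of that tally and is not the procedure used by the authors; the names RESIDUES, P1_PRIME_COUNTS, most_frequent and small_residue_fraction are hypothetical, and the numbers are transcribed from the P1′ rows of Table 4.

```python
# Column order of Table 4 (one-letter codes, then the pooled "Other" column).
RESIDUES = ["A", "D", "E", "G", "K", "L", "N", "R", "S", "T", "Other"]

# P1' counts for the 49 fully sequenced species, transcribed from Table 4.
P1_PRIME_COUNTS = {
    "P3/6K1":  [32, 2, 0, 6, 0, 0, 0, 3, 6, 0, 0],
    "6K1/CI":  [7, 0, 0, 3, 0, 0, 0, 0, 37, 1, 1],
    "CI/6K2":  [5, 0, 0, 5, 1, 0, 3, 0, 30, 5, 0],
    "6K2/VPg": [7, 0, 0, 41, 0, 0, 0, 0, 1, 0, 0],
    "VPg/Pro": [12, 0, 0, 12, 0, 0, 1, 0, 22, 1, 1],
    "Pro/NIb": [7, 0, 0, 13, 0, 0, 1, 0, 21, 2, 5],
    "NIb/CP":  [26, 0, 0, 1, 0, 0, 0, 0, 21, 0, 1],
}


def most_frequent(counts):
    """Return (residue, count) for the largest entry in one table row."""
    best = max(range(len(counts)), key=lambda i: counts[i])
    return RESIDUES[best], counts[best]


def small_residue_fraction(table, small=("A", "G", "S")):
    """Fraction of all tabulated sites whose residue is Ala, Gly or Ser."""
    idx = [RESIDUES.index(r) for r in small]
    hits = sum(row[i] for row in table.values() for i in idx)
    total = sum(sum(row) for row in table.values())
    return hits / total


if __name__ == "__main__":
    for site, row in P1_PRIME_COUNTS.items():
        residue, n = most_frequent(row)
        print(f"{site:8s} {residue} ({n}/{sum(row)})")
    frac = small_residue_fraction(P1_PRIME_COUNTS)
    print(f"Ala+Gly+Ser at P1': {frac:.1%}")
```

With these counts the most frequent P1′ residue is Ala at P3/6K1 and NIb/CP, Gly at 6K2/VPg and Ser at the remaining four sites, and Ala, Gly and Ser together account for 315 of the 343 sites, in line with the > 90% figure quoted from Table 2. The same two helpers could be pointed at the P2′ or P3′ rows of Table 4 without modification.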
The NIb/CP cleavage site
For many viruses in the family Potyviridae there are no complete genome sequences, but there are often partial sequences of the 3′‐end of the genome that frequently include the NIb/CP cleavage site. In total, there are data for 113 species at this cleavage site and these are summarized in Table 5. Most of these sequences (91) are from the genus Potyvirus and some of the less frequent amino acids occur because of different patterns in other genera. The most distinctive are Glu/Ala for the genus Rymovirus, Cys‐X‐X‐Glu/Ser for the genus Tritimovirus, Phe (or Leu)‐Gln/Met‐Asp for the genus Macluravirus and Ile (or Thr)‐X‐Leu‐Gln/Ala for the genus Bymovirus. Amino acids do not appear to be used at random at any of the positions examined.

Table 5 Occurrence of each amino acid at each position around the NIb/CP cleavage site for 113 species sequenced in this region in the family Potyviridae. The final column shows the numbers of amino acids expected at each position if they were used at random in proportion to their total numbers in the coat proteins of all sequences analysed. Numbers underlined differ significantly (P < 0.01) from random using the Genstat procedure BNTEST to test differences between two proportions (Payne and Arnold, 2003).
Amino acid   P6   P5   P4   P3   P2   P1  P1′  P2′  P3′  Random
Ala A         2    1    0    2    0    0   45    6    4       9
Cys C         3    3    5    1    0    0    0    0    0       1
Asp D        19   15    0    2    0    0    0   38   15       7
Glu E        32   31    1    8    0    8    2    0   29       8
Phe F         3    3    3    9   20    0    1    0    0       3
Gly G         2    0    0    0    1    0   12   39    1       7
His H         0    1    0   13   51    1    0    0    1       3
Ile I         6    4    7    1    0    0    0    1    1       5
Lys K         2    3    0    3    0    0    0    7   13       7
Leu L         8    3    4    1   28    0    0    1    4       7
Met M         1    4    0    3    2    0    5    0    0       5
Asn N         3    4    0    0    0    0    0    7    4       7
Pro P         4    2    0    0    1    0    0    0    2       5
Gln Q         3    3    3    2    0  103    0    2    5       5
Arg R         1    2    0   10    0    1    0    1    4       7
Ser S         6   12    0   15    1    0   44    9    8       7
Thr T         5    8    2    4    1    0    1    2   17       8
Val V         5    6   88   14    0    0    3    0    5       7
Trp W         0    1    0    1    0    0    0    0    0       1
Tyr Y         8    7    0   24    8    0    0    0    0       3

80% consensus
              -    -  Val    -  His  Gln  Ala    -    -
              -    -  Ile    -  Leu    -  Ser    -    -
              -    -    -    -  Phe    -  Gly    -    -
50% consensus
            Glu  Glu  Val  Tyr  His  Gln  Ala  Asp  Glu
            Asp  Asp    -  Val  Leu    -  Ser  Gly  Thr
            Leu  Ser    -  Ser    -    -    -    -  Lys
            Tyr    -    -  Arg    -    -    -    -    -

CONCLUSIONS
The paper has provided a rigorous and holistic examination of the patterns around the different polyprotein cleavage sites in the family Potyviridae. Explanations for these patterns have been suggested and some have been supported by detailed molecular modelling. The data also provide a standard for the prediction of these sites as further virus sequences become available. Further analyses have been done to provide the probable cleavage sites on all published sequences within the family. The results have been used to inform and improve a comprehensive database of plant virus sequences (DPVweb at http://www.dpvweb.net; Adams and Antoniw, 2004) and the cleavage sites identified have also been placed on a dedicated public website (http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/potycleavage/index.html), which is updated regularly.

ACKNOWLEDGEMENTS
We thank Alan Todd for statistical analyses and Kim Hammond‐Kosack for helpful comments on the manuscript. Rothamsted Research receives grant‐aided support from the Biotechnology and Biological Sciences Research Council of the UK.

Journal

Molecular Plant Pathology, Wiley

Published: Jul 1, 2005
