(19) United States
(12) Patent Application Publication (10) Pub. No.: US 2009/0215172 A1
Schmidt et al. (43) Pub. Date: Aug. 27, 2009
(54) METHODS AND COMPOSITIONS RELATED TO CYCLIC PEPTIDE SYNTHESIS
Eric W. Schmidt, Salt Lake City, UT (US); Brian Hathaway, Salt Lake City, UT (US); James T. Nelson, Cleveland Heights, OH
Correspondence Address: Ballard Spahr Andrews & Ingersoll, LLP, SUITE 1000, 999 PEACHTREE STREET, ATLANTA, GA 30309-3915 (US)
Related U.S. Application Data
(60) Provisional application No. 60/777,954, filed on Mar. 1, 2006.
Publication Classification
(51) Int. Cl.
METHODS AND COMPOSITIONS RELATED TO CYCLIC PEPTIDE SYNTHESIS
I. CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] 1. This application claims benefit of U.S. Provisional Application No. 60/777,954, filed Mar. 1, 2006, which is hereby incorporated herein by reference in its entirety.
II. ACKNOWLEDGEMENTS
[0002] 2. This invention was made with government support under federal grant NIH RO1 GMO7142501A1 awarded by the NIH and NSF EF-0412226 subcontract from the Institute for Genomic Research. The Government has certain rights to this invention.
III. BACKGROUND
[0003] 3. Prochloron spp. are obligate cyanobacterial symbionts of many didemnid family ascidians. It has been proposed that the cyclic peptides of the patellamide class found in didemnid extracts are synthesized by Prochloron sp., but studies in which host and symbiont cells are separated and chemically analyzed to identify the biosynthetic source have yielded inconclusive results. As part of the Prochloron didemni sequencing project, patellamide biosynthetic genes were identified, and their function confirmed by heterologous expression of the whole pathway in Escherichia coli. The primary sequence of patellamides A and C is encoded on a single open reading frame that resembles a precursor peptide. This pre-patellamide is heterocyclized to form thiazole and oxazoline rings, and the peptide is cleaved to yield the two cyclic patellamides, A and C.
[0004] 4. Marine invertebrates, particularly sponges and ascidians, are well known for their production of bioactive natural products (Newman et al. (2005) Mol. Cancer. Ther. 4, 333-342). A major hurdle in the development of many of these agents into drugs has been their supply, since collection or aquaculture of marine invertebrates poses many difficulties and may not be environmentally acceptable. Because marine invertebrate compounds often resemble molecules isolated from bacteria, it has been speculated that many compounds are synthesized by symbiotic bacteria and not by the animals themselves (Faulkner et al. (1993) Gazz. Chim. Ital. 123, 301-307; Kobayashi et al. (1993) Chem. Rev. 93, 1753-1770; Sings et al. (1996) J. Ind. Microbiol. Biot. 17, 385-396; Haygood et al. (1999) J. Mol. Microbiol. Biot. 1, 33-34). Recently, these early speculations have been borne out in the cloning and sequencing of genes from two symbiotic natural product pathways (Piel et al. (2004) Proc. Natl. Acad. Sci. USA 101, 16222-16227; Hildebrand et al. (2004) Chem. Biol. 11, 1543-1552), opening a new era in marine natural products discovery and development.
[0005] 5. Ascidians in the family Didemnidae contain numerous structural classes of cyclic peptides and harbor symbiotic cyanobacteria, Prochloron spp. (FIG. 15) (Withers et al. (1978) Phycologia 17, 167-171; Lewin, R. A. & Cheng, L. (1989) (Chapman and Hall, New York)). Despite nearly 30 years of attempts, Prochloron spp. have eluded cultivation and are thus considered to be obligate symbionts. Prochloron spp., unlike the vast majority of cyanobacteria but like plants, use both chlorophylls a and b for photosynthesis, lack phycobilins, and have plant-like thylakoids (Withers et al. (1978) Proc. Natl. Acad. Sci. USA 75, 2301-2305). The cells are relatively large for bacteria (10-20 µm in diameter). Prochloron has also been implicated in the biosynthesis of cyclic peptides isolated from whole didemnid ascidians. In early cell-separation studies, it was reported that the peptides were localized in Prochloron cells (Degnan et al. (1989) J. Med. Chem. 32, 1349-1354; Biard et al. (1990) J. Mar. Biol. Assoc. UK 70, 741-746), but a later investigation found the molecules distributed throughout the ascidian tunic, as well as in the cyanobacteria (Salomon, C. E. & Faulkner, D. J. (2002) J. Nat. Prod. 65, 689-692). Because of the unique biological and chemical features of the Prochloron-ascidian symbiosis, a project was initiated to sequence the genome of Prochloron didemni, isolated from the ascidian Lissoclinum patella.
[0006] 6. The patellamides and trunkamide (another didemnid product) are peptides that exemplify both the unique structural features and potent bioactivities of didemnid ascidian natural products (FIG. 15). Both groups have clinical usefulness, since patellamides are typically moderately cytotoxic, and patellamides B, C, and D reportedly reverse multidrug resistance (Williams et al. (1993) Cancer Lett. 71, 97-102; Fu et al. (1998) J. Nat. Prod. 61, 1547-1551), while trunkamide was initially isolated because of specific and unusual activity against the multidrug resistant UO-31 renal cell line (Carroll, A. et al. (1996) Aust. J. Chem. 49, 659-667). Patellamides are characteristically composed of pseudo-symmetrical, cyclic dimers, with each substructure having the sequence thiazole-nonpolar amino acid-oxazoline-nonpolar amino acid. Trunkamide and related molecules often contain proline, thiazolines, and prenylated serine and threonine derivatives. These features can result from either a ribosomal or a nonribosomal peptide biosynthetic pathway, since precedents exist for heterocyclization and cyclization in both cases (Gehring et al. (1998) Biochemistry 37, 11637-11650; et al. (2000) Nature 407, 215-218; Li et al. (1996) Science 274, 1188–1193; Solbiati et al. (1999) J. Bacteriol. 181, 2659-2662). The nonribosomal hypothesis of patellamide biosynthesis was investigated using a homology-based approach (Schmidt et al. (2004) J. Nat. Prod. 67, 1341-1345). Only a single nonribosomal peptide synthetase (NRPS) gene was identified in fosmid clones, but the gene was found in only a few strains, and its presence did not correlate with patellamide production.
[0007] 7. Bacterial secondary metabolites are bioactive small molecules that often find use as pharmaceuticals (Newman et al. J. Nat. Prod. 66, 1022-1037 (2003)). Numerous studies of secondary metabolite biosynthetic genes have led to an increasing ability to synthesize new small molecules through rational pathway engineering (Floss J. Biotechnol. epub (2006); Walsh, C. T. ChemBioChem, 124-134 (2002)). Much of this capability comes from gene sequence comparison, in which the observation of evolution of these pathways has enabled engineering. Despite the advances, a weakness of this approach is that most described pathways are relatively distantly related, making single evolutionary events difficult to discern. This difficulty is compounded by the large number of dedicated enzymatic steps (up to approximately 60 or so) commonly required to synthesize individual secondary metabolites.
[0008] 8. Small, cyclic peptides are valuable pharmaceuticals, biotechnological products, and tools for scientific research (Davies, J. S. Amino Acids, Peptides and Proteins 2003, 34, 149-217). Cyclic peptides in general have advantages over their linear relatives in that they sample a more constricted conformational and configurational space (Payne
et al. Curr. Org. Chem. 2002, 6, 1221-1246). Stemming from this basic property, cyclic peptides often have stronger binding constants and favorable pharmacological properties such as resistance to proteases (Fairlie, D. P.; Tyndall, J. D. A.; Reid, R. C.; Wong, A. K.; Abbenante, G.; Scanlon, M. J.; March, D. R.; Bergman, D. A.; Chai, C. L. L.; Burkett, B. A. J. Med. Chem. 2000, 43, 1271-1281). Because of this, numerous investigators have developed means to produce arrays of small, cyclic peptides. Synthetic and enzymatic systems, as well as combinations of the two, have been used successfully on small and medium scale (Davies et al. J. Peptide Sci. 2003, 9, 471-501; Hahn et al. Proc. Nat. Acad. Sci. USA 2004, 101, 15585-15590). At the large scale, peptides in phage-display libraries have been cyclized via disulfide bonds or via semisynthesis from the same libraries (Kehoe, J. W.; Kay, B. K. Chem. Rev. 2005, 105, 4056-4072; Ho, K. L.; Yusoff, K.; Seow, H. F.; Tan, W. S. J. Med. Virol. 2003, 69, 27–32).
[0009] 9. There is a great need for new methods for making cyclic peptides, particularly for the manufacture of synthetic cyclic peptides for clinical investigations and therapeutic use, and for the production of cyclic peptide libraries that can be screened to identify cyclic peptides with a desired activity. What is needed in the art are methods for the in vivo construction of cyclic peptide libraries that are enzymatically cyclized at the C–N terminus.
IV. SUMMARY
[0010] 10. Disclosed are methods and compositions related to cyclization of polymers such as peptides.
V. BRIEF DESCRIPTION OF THE DRAWINGS
[0011] 11. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments and together with the description illustrate the disclosed compositions and methods.
[0012] 12. FIG. 1 shows PatE2 (SEQ ID NO: 43) encodes patellamide C (yellow) and ulithiacyclamide (green). Mutation of the sequence to PatEdm (SEQ ID NO: 44) leads to production of eptidemnamide (blue). Bold: proposed recognition sequences for heterocyclization and C–N terminal cyclization. Eptifibatide (bottom left) is shown for comparison.
[0013] 13. FIG. 2 shows the pat pathway. Genes for required enzymes are shown in blue and the precursor peptide gene is red. patB (white) increases peptide yield, while patC (black) is apparently not required for biosynthesis.
[0014] 14. FIG. 3 shows HPLC-MS traces for gene combinations ADEG (top), ABDEFG (middle) and ABDEdmFG (bottom), monitored on mass 763 (upper) and 853 (lower).
[0015] 15. FIG. 4 shows the tri gene cluster. Arrows denote ORFs and their direction; black ORFs are tRNA synthetases, white ORFs are conserved hypotheticals without homolog in the pat cluster, green ORFs are pat homologs, and the precursor peptide gene is in orange.
[0016] 16. FIG. 5 shows alignment of the precursor peptides PatE and TriG. The sequences encoding patellamide C, patellamide A and trichamide (top to bottom) are underlined; the proposed cyclization signal is in bold.
[0017] 17. FIG. 6A shows the structure of trichamide. Stereochemistry is inferred, not determined experimentally, as described in the text. FIG. 6B shows assignment of CID-MS fragments from Table 5 to the trichamide structure. FIG. 6C shows assignment of IRMPD-MS fragments.
[0018] 18. FIG. 7 shows a biosynthetic pathway to trichamide.
[0019] 19. FIG. 8 shows FT-MS of a crude Trichodesmium extract. Peaks are present for the trichamide parent ion (I), the 34S isotope (II) and the 13C2 isotope (III).
[0020] 20. FIG. 9 shows MS fragmentation patterns of ion 550.2 with two different dissociation techniques. CID = collision-induced dissociation, IRMPD = infrared multiphoton dissociation. Peaks labeled “x” are artifacts of the instrument, and all other ions can be accounted for as in Table 5 and FIG. 6.
[0021] 21. FIG. 10 shows the pat pathway. The PatE protein, now renamed PatE1, directly encodes the production of highly modified peptides, patellamides A and C. Putative recognition sequences flank the coding regions and are shown in bold.
[0022] 22. FIG. 11 shows diverse ascidians collected from Palau and Papua New Guinea. Top: map of collection sites (red arrows). Middle: Didemnum molle. Bottom: Lissoclinum patella.
[0023] 23. FIG. 12 shows patE diversity. Although pat pathway variants are >99% identical at the DNA level, patE is hypervariable in the region encoding patellamides. Top: Schematic view of patE. Bottom: Sequence differences between patE1-E6. Dashes indicate residues that are identical to those in PatE1, and all residues N are identical between
[0024] 24. FIG. 13 shows sequences and structures predicted from patE sequence variants. All of the known compounds (blue) have been identified in the requisite ascidian samples.
[0025] 25. FIG. 14 shows quantitative PCR of Prochloron samples. Relative amounts of patE1-E3 genes present in samples 05-019 and 03-005, normalized to the patE3 concentration.
[0026] 26. FIG. 15 shows, top: a single cell of P. didemni (right) isolated from the ascidian L. patella. The green pockets near the surface of L. patella are monocultures of P. didemni. Bottom: Patellamides A and C.
[0027] 27. FIG. 16 shows a PatE sequence. In italics, the conserved leader sequence; in bold, the proposed start and stop cyclization sequences; underlined, product-coding sequences. Sequences corresponding to patellamide C (top) and A (bottom) are aligned for clarity.
[0028] 28. FIG. 17 shows the pat gene cluster (A) and GC skew (B). Colored genes represent those that can have a function assigned. White genes are those that have no significant homolog. Blue genes contain protease activity. The G+C % skew below is altered where a coding region is present, as is common in many species, and suggests that the gene predictions are correct. Additionally, the increase of the G+C % in this area shows that this region could have been transferred into this species via horizontal gene transfer.
[0029] 29. FIG. 18 shows proof of function of the pat cluster. (A) Standard from 25 mL culture broth containing 20 µg patellamides, under SRM conditions observing m/z = 725 (patellamide A daughter ion). (B) 2 L sample pCR2.1-pat #9, under SRM conditions for m/z = 725. (C) Blind control: SRM using a sample identical to (B), except that empty pCR2.1 vector was used.
[0030] 30. FIG. 19 shows a proposed pathway to patellamides, showing the route to patellamide C.
[0031] 31. FIG. 20 shows marine symbionts and filamentous fungi.
[0032] 32. FIG. 21 shows a family of compounds and various amino acid positions.
[0033] 33. FIG. 22 shows the origin of various samples, the organism from which each was derived, its chemistry, the source of the 16S rRNA, and whether or not it was positive for pmA.
[0034] 34. FIG. 23 shows the pat cluster, and the coding region of PatE.
[0035] 35. FIG. 24 shows biogenesis, and heterocyclization/oxidation for PatD and PatG.
[0036] 36. FIG. 25 shows biogenesis, and cyclization/cleavage for PatG and PatA. Also shown are recognition sequences.
[0037] 37. FIG. 26 shows that Trichodesmium erythraeum contains a pathway similar to pat.
[0038] 38. FIG. 27 shows the structure prediction of PatE and TriG. TriG is a PatE homolog, as the coding sequence is different but the recognition sequences are closely related.
[0039] 39. FIG. 28 shows the predicted product trichamide based on the mass using MALDI-TOF, and structure elucidation by FT-MS.
[0040] 40. FIG. 29 shows the methodology of structure elucidation, using mass spectrometry and NMR confirmation.
[0041] 41. FIG. 30 shows PatE evolution. The DNA is identical except in coding regions. Only the patellamide A region is changed (compared to ulithiacyclamide).
[0042] 42. FIG. 31 shows 6 PatE variants. They are 99% identical, except in the exact coding region.
[0043] 43. FIG. 32 shows PatE evolution. Various compounds and coding sequences are compared, and shown along with their structures. There is an unprecedented type of np evolution.
[0044] 44. FIG. 33 shows the biochemistry of pat. Importantly, it is shown that the required proteins include PatF.
[0045] 45. FIG. 34 shows eptidemnamide synthesis.
[0046] 46. FIG. 35 shows the recognition sequence, and that a single mutation can abolish synthesis.
[0047] 47. FIG. 36 shows the gene cluster for trunkamide. The first four coding sequences are very similar to those for PatA, PatB, PatC, and PatD. The homolog of PatE, which directly encodes trunkamide, is identical to PatE until about midway through the coding sequence, where there is a clear insertion event leading to the new trunkamide-like sequences. The following 2 kbp of DNA sequence is not similar to that found in the previously reported patellamide sequence. Following this insertion, the latter half of PatG is present. This contains the protease domain found in patellamides, but it lacks the oxidase found in the patellamide pathway. This was expected, since trunkamide and relatives are not oxidized. However, the remainder of PatG is >95% identical to that of the patellamides. Within this insertion, in addition to the latter half of the new PatE homolog, there are encoded two new proteins. These are both 40-50% identical to the previously described PatF. It appears that at least one of these performs the prenyltransfer reaction important to formation of trunkamide; this is the major difference between these two classes of metabolites. These comprise a unique class of proteins with two functions: heterocyclization of Thr/Ser (in the case of patellamides); and prenylation of Thr/Ser (in the trunkamide family).
[0048] 48. FIG. 37 shows patellamides versus patellin. The bottom cluster (pat pn) was sequenced, which directly encodes patellins 2 and 3. The pathway is very similar (>90% identical) to the previously reported pat pathway, with 2 major differences: 1) patG is missing the oxidase domain; 2) there are 2 copies of patF, both of which are only about 40% identical to the patF from the patellamide cluster.
[0049] 49. FIG. 38 shows that a new family of enzymes has been identified. In one case, heterocyclization occurs, and in the other, prenylation. Prenylation is extremely important, since cyclic peptide libraries can be prenylated.
[0050] 50. FIG. 39 shows a proof of function of patellin synthesis. The whole gene cluster out of the Prochloron bacteria was amplified by PCR and put into the pCR2.1 TOPO vector (Invitrogen). Expression and chemical analysis was carried out in E. coli. Methodology overall was similar to that used for patellamides.
[0051] 51. FIG. 40 shows an expression design. By LC-MS, the TOPO clone could make patellins 2 and 3, proving that the identified cluster is necessary and sufficient for patellin synthesis.
[0052] 52. FIG. 41 shows heterologous expression of patellins 2 and 3. Shown is the LC-MS run (y-axis: % abundance; x-axis: time (min)). The top panel is an extract of E. coli containing the patellin cluster. The bottom is an extract of whole ascidian containing patellins (positive control).
[0053] 53. FIG. 42 shows heterologous expression of patellin 2. Mass analysis of this peak clearly shows that patellin 2 is synthesized in E. coli when the patellin gene cluster is present.
[0054] 54. FIG. 43 also shows heterologous expression of patellin 3. Mass analysis of this peak clearly shows that patellin 3 is synthesized in E. coli when the patellin gene cluster is present.
[0055] 55. FIG. 44 shows that patellin 3 is clearly synthesized when the identified gene cluster is used.
[0056] 56. FIG. 45 shows trunkamide. A gene cluster that produces trunkamide (the clinically important molecule) and patellin 6 was cloned. The pathways are nearly identical, except that they make different molecules.
[0057] 57. FIG. 46 shows trunkamide cluster verification. To address the orientation of the cluster, PCR with primers from the patellin cluster covering the whole cluster in pieces was used. This clearly indicates that these clusters are nearly identical, with the exception of the products synthesized.
VI. DETAILED DESCRIPTION
[0058] 58. Before the present compounds, compositions, articles, devices, and/or methods are disclosed and described, it is to be understood that they are not limited to specific synthetic methods or specific recombinant biotechnology methods unless otherwise specified, or to particular reagents unless otherwise specified, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
A. DEFINITIONS
[0059] 59. As used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a pharmaceutical carrier” includes mixtures of two or more such carriers, and the like.
[0060] 60. Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. For example, if the value “10” is disclosed, then “about 10” is also disclosed. It is also understood that when a value is disclosed, “less than or equal to” the value, “greater than or equal to” the value, and possible ranges between values are also disclosed, as appropriately understood by the skilled artisan. For example, if the value “10” is disclosed, then “less than or equal to 10” as well as “greater than or equal to 10” is also disclosed. It is also understood that throughout the application, data is provided in a number of different formats, and that this data represents endpoints and starting points, and ranges for any combination of the data points. For example, if a particular data point “10” and a particular data point “15” are disclosed, it is understood that greater than, greater than or equal to, less than, less than or equal to, and equal to 10 and 15 are considered disclosed as well as between 10 and 15. It is also understood that each unit between two particular units is also disclosed. For example, if 10 and 15 are disclosed, then 11, 12, 13, and 14 are also disclosed.
[0061] 61. In this specification and in the claims which follow, reference will be made to a number of terms which shall be defined to have the following meanings:
[0062] 62. “Optional” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where said event or circumstance occurs and instances where it does not.
[0063] 63. A “cyclic polypeptide” is a type of conformationally restrained polypeptide that, as its name suggests, contains a cyclic polymer of amino acids. The term “cyclic polypeptide” is used to describe a polypeptide (including a cyclic peptide) that is circularized via a peptide bond between the N and C terminal amino acids of a linear polypeptide (as described in U.S. published patent application 20040014100, for example).
[0064] 64. The term “randomized amino acid sequence” refers to a polypeptide having an amino acid sequence that is at least partially randomized, including fully randomized. When made recombinantly, a library of polypeptides having randomized amino acid sequences usually contains polypeptides having any of the naturally occurring amino acids, or any subset thereof, present in at least one or all positions (e.g., at least 1, 2, 3, 4, 5, about 8, about 10, about 15, about 20, usually up to at least 100 or more positions) of the polypeptide. Polypeptides having a randomized amino acid sequence are usually produced using synthetic nucleic acids that contain any of the four nucleotides, or a subset thereof, in at least one or all positions of the polynucleotide.
[0065] 65. A “library” of cells is a plurality of cells. Such a library may be a mixture of different cells, or may contain cells that are separated from each other (e.g., in the wells of a multi-well plate).
[0066] 66. The terms “pool” or “mixture”, as used herein, refer to a combination of elements, e.g., cells or polypeptides, that are interspersed in two or three dimensions and not in any particular order. A mixture is homogeneous and not spatially separable into its different constituents. Examples of mixtures of elements include a number of different polypeptides that are present in the same solution (e.g., an aqueous solution). In other words, a mixture is not addressable. To be specific, an arrayed library of polypeptides, as is commonly known in the art, is not a mixture of polypeptides because the elements of the library are spatially distinct and the array is addressable.
[0067] 67. The terms “treatment”, “treating”, “treat”, and the like, refer to obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms of a partial or complete cure for a disease and/or adverse effect attributable to the disease. “Treatment”, as used herein, covers any treatment of a disease in a mammal, particularly in a human, and includes: (a) preventing the disease from occurring in a subject which may be predisposed to the disease but has not yet been diagnosed as having it; (b) inhibiting the disease, i.e., arresting its development; and (c) relieving the disease, i.e., causing regression of the disease and/or relieving one or more disease symptoms. “Treatment” is also meant to encompass delivery of an agent in order to provide for a pharmacologic effect, even in the absence of a disease or condition. For example, “treatment” encompasses delivery of a receptor modulator that can provide for enhanced or desirable effects in the subject (e.g., reduction of pathogen load, beneficial increase in a physiological parameter of the subject, reduction of disease symptoms, etc.).
[0068] 68. Throughout this application, various publications are referenced. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art to which this pertains. The references disclosed are also individually and specifically incorporated by reference herein for the material contained in them that is discussed in the sentence in which the reference is relied upon.
B. General
[0069] 1. Patellamides
[0070] 69. Patellamides are a family of N–C terminally cyclized peptide natural products isolated from marine ascidians (Ireland, C. M.; Durso, Jr., A. R.; Newman, R. A.; Hacker, M. P. J. Org. Chem. 1982, 47, 1807-1811) (FIG. 1). These peptides and their relatives often contain thiazole, thiazoline, and oxazoline heterocycles derived from Cys, Thr, and Ser. They form a large family of molecules, some of which are relatively unrelated to the parent patellamide structure (Davidson, B. S. Chem. Rev. 1993, 93, 1771-1791; Sings et al. J. Ind. Microbiol. 1996, 17, 385-396; Schmidt et al. J. Nat. Prod. 2004, 67, 1341-1345). To investigate the biosynthesis and biotechnological utility of this family, the patellamide A/C biosynthetic gene cluster, pat, was cloned and synthesized from an uncultivated bacterial symbiont of ascidians (FIG. 1). When expressed in E. coli, pat led to the production of very small amounts of patellamides (Long et al. ChemBioChem, 2005, 6, 1-7). This represented the first fully validated natural product pathway from uncultured symbionts.
[0071] 70. pat is composed of seven coding sequences, patA-G, which had little to no similarity with other characterized gene clusters. PatE encoded the cyclic peptides, patellamides A and C, directly on a single prepeptide (FIG. 1). Putative start- and stop-cyclization recognition sequences were found, leading to the speculation that the coding sequences themselves could be modified to produce new, cyclic peptides.
[0072] 71. pat was originally cloned from an environmental (uncultured) bacterial sample, and the intact pathway produced low levels of patellamides. Therefore, patA-G were cloned and expressed in compatible DUET vectors in E. coli. On the basis of sequence analysis, it was predicted that PatA, PatD, PatE, and PatG would be required for patellamide biosynthesis. PatE, as the direct patellamide prepeptide, is obviously a required precursor. PatD has low sequence similarity to a series of enzymes involved in thiazole formation in a group of microcins (Roy et al. Nat. Prod. Rep. 1999, 16, 249-263; Milne et al. Biochemistry 1999, 38, 4768-4781; Kelleher et al. Biochemistry 1999, 38, 15623-15630), indicating that it is likely required for the same function in pat. PatA and PatG both contained serine protease domains that were predicted to be involved in maturation (Chatterjee et al. Chem. Rev. 2005, 105, 633-683) and cyclization of patellamides. In addition, PatG harbored an N-terminal domain with homology to FAD-dependent oxidases, indicating that it would likely be required to synthesize thiazole from thiazoline. The other three predicted coding sequences, PatB, PatC, and PatF, had no significant similarity to any protein with known function.
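The prepeptide layout described above, in which a leader is followed by product-coding sequences that are each flanked by start- and stop-cyclization recognition sequences, can be illustrated with a minimal Python sketch. The motif strings, the toy sequence, and the function names `split_prepeptide` and `swap_cassette` are hypothetical placeholders chosen for illustration; they are not the recognition sequences or sequences of PatE, which are shown in the figures.

```python
import re
from typing import List, NamedTuple

# Hypothetical placeholder motifs (NOT the real PatE recognition sequences).
START_MOTIF = "QQQ"  # stands in for the start-cyclization recognition sequence
STOP_MOTIF = "WWW"   # stands in for the stop-cyclization recognition sequence


class Prepeptide(NamedTuple):
    leader: str           # everything before the first start motif
    cassettes: List[str]  # product-coding sequences, in order


def _flanked(start: str, stop: str) -> "re.Pattern[str]":
    # Non-greedy capture of whatever lies between a start and a stop motif.
    return re.compile(re.escape(start) + r"(.+?)" + re.escape(stop))


def split_prepeptide(seq: str, start: str = START_MOTIF,
                     stop: str = STOP_MOTIF) -> Prepeptide:
    """Split a prepeptide into its leader and its flanked coding cassettes."""
    matches = list(_flanked(start, stop).finditer(seq))
    if not matches:
        raise ValueError("no flanked coding sequence found")
    return Prepeptide(seq[: matches[0].start()],
                      [m.group(1) for m in matches])


def swap_cassette(seq: str, index: int, new_cassette: str,
                  start: str = START_MOTIF, stop: str = STOP_MOTIF) -> str:
    """Replace one cassette, leaving the leader and both motifs untouched."""
    m = list(_flanked(start, stop).finditer(seq))[index]
    return seq[: m.start(1)] + new_cassette + seq[m.end(1):]


# Toy prepeptide: leader + two flanked cassettes (all residues are made up).
toy = "MLEADER" + "QQQ" + "AAACCC" + "WWW" + "QQQ" + "DDDEEE" + "WWW"
parsed = split_prepeptide(toy)
mutant = swap_cassette(toy, 1, "GGGHHH")
print(parsed.leader, parsed.cassettes)
print(split_prepeptide(mutant).cassettes)
```

The sketch only models the bookkeeping: the recognition sequences stay fixed while the cassette between them is exchanged. It says nothing about which replacement sequences the enzymes would actually accept.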
72. A variant, patE2, was discovered that was identical to patE except that the nucleotides encoding patellamide A were neatly replaced with those encoding the known compound, ulithiacyclamide. patE2 was used for the studies described, in part because ulithiacyclamide was much more readily detected in comparison to patellamide A or C. In order to achieve better production with patE2, all pat genes were removed from their native context and placed under control of individual T7 promoters in E. coli. Production of patellamides and ulithiacyclamide was monitored by HPLC-ESI-MS, using an authentic standard of ulithiacyclamide as a positive control.
73. Co-expression of the full gene set patA-G followed by subtraction of genes one at a time led to the discovery that PatADE2G were required, and that PatF was also required for patellamide C/ulithiacyclamide production. PatB and PatC, by contrast, were not necessary for the production of the patellamides, although PatB increased the detected yield. Strains that lacked any of the proteins PatADE2FG did not make patellamides. On this basis, the minimal gene set was defined as patADEFG (FIG. 2).
74. A series of pat relatives encoding both new and known products were identified. Only the patellamide-like coding sequences were mutated, while other sequences remained identical. However, most of the mutations were relatively conservative, in that aliphatic amino acids could be swapped, and Thr and Ser were interchangeable. Thus, it was asked whether less conservative mutations could be tolerated by the pat system.
75. A mutant, patEdm, was synthesized in which the entire ulithiacyclamide sequence was swapped with a sequence encoding “eptidemnamide”. This new peptide sequence has no biosynthetic precedent in the literature, is not related in any way to known patellamide relatives, and was meant to be an amide-cyclized relative of the clinically used disulfide-bridged anticoagulant, eptifibatide (Curran, M. P.; Keating, G. M. Drugs 2005, 65, 2009-2035). In contrast to patellamides, eptidemnamide contains charged and polar residues and new hydrophobic amino acids Trp and Gly. This new peptide was designed in order to define the sequence tolerance of PatADFG in one step.
76. patEdm was synthesized in a single round of mutational PCR (Kunkel, T. A. Proc. Nat. Acad. Sci. USA 1985, 82, 488-492), and its identity was verified by sequencing. In addition, a mutant, patEdm", was discovered in a clone library that was very similar to patEdm but contained a P→Q mutation in the recognition sequence immediately upstream of eptidemnamide. Both patEdm and patEdm" were cloned into pRSF-DUET vector and co-expressed with patABDFG. By HPLC-ESI-MS analysis, the strain containing patEdm produced eptidemnamide, while the patEdm" strain did not produce any detectable new compound. From the patEdm-expressing strain, eptidemnamide was isolated, and its structure verified by NMR and ESI-FTMS. These experiments demonstrate the crucial nature of the recognition region in controlling peptide cyclization, while also showing that the coding sequences of these peptides can be varied greatly.
77. The absolute configuration of the new compound can be all L, based upon the following consideration: In all cases, patellamides and relatives contain L-amino acids except adjacent to thiazole, in which case D- or L-amino acids are present. As noted by numerous synthetic and natural products chemists, this position is notoriously labile, undergoing racemization under many different conditions (Milne et al. Org. Biomol. Chem. 2006; Wipf et al. J. Am. Chem. Soc. 1998, 120, 4105-4112).
[0079] 78. The experiments described above are useful in the enzymatic synthesis of cyclic peptide libraries by allowing the rapid construction of C–N terminally amide-linked cyclic peptides on a reasonable scale. In addition, the biosynthetic gene set has been defined, facilitating a complete biochemical analysis of the unique steps involved in the synthesis of this family of compounds. Finally, numerous compounds have been isolated from marine invertebrates, many with novel architectures and functional groups (Blunt et al. Nat. Prod. Rep. 2006, 23, 26-78; Newman et al. J. Nat. Prod. 2004, 67, 1216–1238; et al. Mol. Cancer. Ther. 2005, 4, 333-342).
79. Also disclosed is the enzymatic synthesis of prenylated peptide libraries using those peptides disclosed herein.
2. Trichamide
80. A gene cluster for the biosynthesis of a new small cyclic peptide, dubbed trichamide, was discovered in the genome of the global, bloom-forming marine cyanobacterium Trichodesmium erythraeum IMS101 because of striking similarities to the previously characterized patellamide biosynthesis cluster. The tri cluster consists of a precursor peptide gene containing the amino acid sequence for mature trichamide, a putative heterocyclization gene, an oxidase, two proteases and hypothetical genes. Based upon detailed sequence analysis, a structure was predicted for trichamide and confirmed by Fourier-transform mass spectrometry. Trichamide consists of 11 amino acids, including two cysteine-derived thiazole groups, and is cyclized by an N–C terminal amide bond.
[0083] 81. Trichodesmium is a genus of marine diazotrophic, non-heterocystous cyanobacteria. It occurs throughout the open waters of oligotrophic tropical and subtropical oceans and forms filaments (trichomes) of 20-200 cells that can further aggregate into colonies several millimeters across. Trichodesmium can form enormous blooms in excess of 100,000 km² (Karl et al. 2002. Dinitrogen fixation in the world’s oceans. Biogeochemistry 57/58: 47-98), which are most commonly composed of T. erythraeum and T. thiebautii. Trichodesmium spp. have been the subject of intense research mainly for two reasons. First, they contribute a significant portion (40% or more) to global oceanic nitrogen fixation, thereby directly affecting the biogeochemical carbon flux in tropical oceans with implications for the world’s climate. Second, massive coastal Trichodesmium blooms have been reported to have toxic effects, both directly on invertebrates (Guo C., P. A. Tester. 1994. Toxic effect of the bloom-forming Trichodesmium sp. (Cyanophyta) to the copepod Acartia tonsa. Nat. Toxins 2: 222-227; Hawser S. P., J. M. O'Neil, M. R. Roman, G. A. Codd. 1992. Toxicity of blooms of the cyanobacterium Trichodesmium to zooplankton. J. Appl Phycol 4: 79–86) and on humans (“Trichodesmium or Tamandare fever”; Sato et al. Trab do Instit. Oceanogr. Univ. Fed de Pernambuco Recife 5/6: 7–50), as well as indirectly by inducing blooms of other organisms (Devassy et al. 1979. Indian J. Mar. Sci. 8: 88-93; Lenes et al. 2001. Limnol. Oceanogr. 46: 1261-1277) that can be potentially harmful. While cyanobacteria are a prolific source of diverse natural products and toxins (Carmichael W. W. 1992. 72: 445-459; Gerwick et al. 2001. Alkaloids Chem. Biol. 57: 75-184; Namikoshi et al. 1996. Bioactive compounds produced by cyanobacteria. J. Ind. Microbiol. 17: 373-384), a toxic compound (or any natural product) has not been isolated from a Trichodesmium species despite some efforts (Hawser et al. 1991. Toxicon 29: 277–278).
82. BLAST searches in GenBank with the pat genes revealed homologs in T. erythraeum IMS101. This led to the investigation of a potential patellamide-like biosynthesis cluster as well as its product, a small cyclic peptide, dubbed trichamide, in T. erythraeum.
3. Prenylation
[0086] 83. Prenylated peptides can also be formed using the peptides disclosed herein. Prenylation can be useful for a variety of reasons. For example, it can be useful in the synthesis of peptide libraries with an unprecedented modification. This can be used in drug discovery, for example. Prenylation can also be useful in the synthesis of peptide libraries with other prenyl modifications, including farnesylation and geranylation. Such modifications are important in cell signaling, especially as related to cancers.
84. Prenylation provides a unique handle for chemical modification of peptides, either individually or in library format. For example, this modification is useful in fluorescent labeling of peptides, for surface labeling, or for addition of specific functional groups. In the case of fluorescent labeling, modified peptides are used to determine a drug’s mechanism of action, to probe cellular events by microscopy, as reagents or components in fluorescent detection kits (for metals, drug interactions, etc.), or as clinical diagnostic agents. Surface-labeled peptides can also find use as arrayed libraries for drug discovery. Surfaces are labeled via metathesis or by other well-known reactions involving terminal olefins. For the addition of specific functional groups, terminal olefins provide a robust chemical platform.
Examples of functionalization include fluorescent labeling, surface labeling, addition of hydrophobic or hydrophilic groups, addition of drugs or other small molecules, addition of specific functional groups to increase drug interactions via avidity effects, and many others which are known to those of skill in the art and herein con
Aug. 27, 2009
85. Prenylation was an ancestral function, and the enzymes gradually evolved to catalyze the other function (heterocyclization). Prenylation is a new type of posttranslational modification, and the regioselectivity of prenylation is a useful aspect. Posttranslational modifications include phosphorylation, acetylation, glycosidation, and other extremely important events in cell biology.
[0089] 4. Evolution of Biosynthetic Pathways
86. Biosynthetic pathways to bacterial secondary metabolites are extremely complex, and an understanding of their evolution allows for the engineering of new pharmaceuticals. Symbiotic bacteria offer an ideal model to follow this evolution because relationships can be precisely defined. The evolution of the pat pathway was examined, from Prochloron spp. cyanobacterial symbionts of ascidians collected in the tropical Pacific. Six variants of the 70-amino acid patellamide precursor protein, PatE, were discovered from tropical Pacific Prochloron samples. In all cases, amino acid and DNA sequences were virtually identical except in the 16-amino acid regions encoding the actual patellamides, which had highly diverse DNA and amino acid sequences. By contrast, Prochloron spp. were found to be >99% identical by molecular methods. Thus, the coding sequences for patellamide biosynthesis have rapidly diversified by recombination that is unprecedented in bacterial metabolic pathways.
87. Bacteria living symbiotically with higher organisms provide a potential mechanism to more readily discern important events in the evolution of complex secondary metabolites. Often, bacteria-host relationships can be rigorously defined because of vertical transmission of symbionts (Baumann, P. Annu. Rev. Microbiol. 59, 155-189 (2005)), simplifying evolutionary scenarios. In addition, the common relationship of microscopic organisms with macroscopic, chemically defined animals or plants provides a platform for the study of pathway evolution.
88. Prochloron spp. are common symbiotic cyanobacteria that are intimately associated with marine animals, especially ascidians of the Family Didemnidae (Withers et al. Phycologia 17, 167-171 (1978); Lewin et al. Prochloron: A Microbial Enigma (Chapman and Hall, New York, 1989)). They are also found associated with stromatolites (bacterial mat structures), but they have not yet been found outside of these structured, metabolically active environments. Numerous cyclic peptides, especially those of the patellamide class, have been isolated from didemnid ascidians, forming overlapping families of evolutionarily related metabolites (Sings et al. Journal of Industrial Microbiology & Biotechnology 17, 385–396 (1996); Schmidt et al. J. Nat. Prod. 67, 1341-1345 (2004)). The first gene cluster, pat, is described herein. pat is responsible for patellamide biosynthesis, demonstrating that Prochloron symbiotic bacteria are responsible for patellamide production (FIG. 10).
89. The pat cluster is composed of seven coding sequences, patA-G, five of which are essential for patellamide biosynthesis. The patellamides are produced by a microcin-like pathway, in which a precursor peptide, PatE, directly encodes the amino acid sequences of the patellamide products. PatE is modified by heterocyclization of Cys, Ser, and Thr residues, followed by N–C terminal cyclization to afford the final patellamides. It was proposed that start/stop recognition sequences are responsible for the modification to the PatE precursor peptide, while the actual coding sequences between the start/stop have little or no effect on modification.
[0094] 90. A large family of patellamides and related compounds has been isolated from didemnid ascidians, leading to the proposal that the pat pathway has rapidly diversified to produce a natural combinatorial library of cyclic peptides. To test this idea, 46 Prochloron-containing ascidians were collected in Palau and Papua New Guinea in the tropical Pacific. Ascidian species included Lissoclinum spp., L. patella, Didemnum spp., D. molle, and others. DNA and cyclic peptides were readily purified from these organisms and analyzed by PCR/sequencing, mass spectrometry, and ¹H NMR.
91. patE PCR primers were applied to Prochloron DNA samples, and the products were directly sequenced. Overlapping sequences were deconvoluted, leading to the discovery of six patE variants (E1-E6), encoding a total of 9 different patellamide-like products. The existence of these putative variants was confirmed by PCR with specific primers for the variants. While most encoded known compounds, some encoded potentially new structures. These patE variants were virtually identical to each other, except that the nucleotide sequences encoding the amino acids forming the patellamides were highly mutated, exhibiting identities down to 46%. Some patE variants encoded eight-amino acid products, while others encoded seven-amino acid compounds. The variability in DNA led to highly varied predicted peptide products, although trends could be readily observed. All patE variants encode two patellamide-like molecules, and the recognition sequence regions flanking the coding regions are highly conserved at the DNA and protein levels. This indicates that the second recognition/coding region in patE arose via a duplication.
92. Both ribosomal RNA and primary metabolic genes were examined to determine whether there was a similar high level of mutation across the Prochloron genomes. All 16S rRNA clones sequenced were virtually identical. Unlike the majority of cyanobacteria, Prochloron spp. contain chlorophyll b as well as chlorophyll a. Chlorophyll a oxidase (cao) is thus a relatively specific primary metabolic gene that can be used to identify Prochloron. cao was amplified from a series of samples with different patE sequences, and it was highly conserved at the DNA sequence level. cao was >99% identical in all strains tested except for two, which exhibited 98% and 97% identity. The presence of patE1-E6 did not correlate with either host or symbiont taxonomy, implicating horizontal transfer as the source of variability.
93.
Specific primers were designed for patE1-E6, and primers from different locations in the known pat were used to determine the presence of whole pathways. Intact pathways contained continuous sequence between patD-patE and patE-patF, while some variants appeared to be non-contiguous with other pat genes. In these cases of isolation, no patellamide-like products could be detected as major compounds in extracts, showing that intact pat pathways are required to produce these compounds. Sequence analysis of numerous patA-G pathway genes, including those clustered with new patE variants, showed that these genes were essentially identical (>99% identical) with each other.
[0098] 94. Often, one, two, or three different patE variants were discovered in the same sample. There are two possible explanations: either there are multiple sequences in single strains, or there are multiple strains in the same ascidian. The difference is highly pertinent to the mode of pathway evolution, since pilin genes in bacteria evolve by recombination from up to six copies in a single genome. Two genes, patE1 and patE2, were present in a sample from Palau that was the subject of genome sequence analysis. In this sample, the sequencing reads for the two genes were present in a 1:2 ratio. This ratio was also reinforced by quantitative PCR analysis, which gave the same 1:2 patE1:patE2 ratio.
95. Quantitative PCR analysis was applied to several other samples from Papua New Guinea. By PCR analysis, Papua New Guinea L. patella samples 05-019 from the Milne Bay region and 03-005 from Madang contained patE1, patE2, and patE3. Q-PCR showed that these genes were present in a 1:15:70 ratio in sample 05-019 and in a 1:4.5:9 ratio in 03-005. In summary, samples from three different locations showed three different ratios of patE genes, indicating that multiple strains are indeed present in the same organism. Thus, the recombination event leading to PatE variants does not follow the pilin-like mechanism.
96. As mentioned above, intact pat pathways were required for patellamide-like products to be synthesized. patE3 contained sequences encoding lissoclinamides, compounds composed of seven amino acids for which no biosynthetic machinery had been previously described. An ascidian, Lissoclinum patella from Papua New Guinea, contained patE3 and was selected for detailed chemical analysis. From this sample, lissoclinamides 2-4 and the related ulicyclamide were purified to homogeneity and characterized by ¹H NMR and mass spectrometry. Lissoclinamides 2 and 4 are directly encoded by patE3. Ulicyclamide and lissoclinamide 3 are encoded by the same sequence as for lissoclinamide 2, but they differ in their posttranslational modifications. Ulicyclamide, for example, contains two thiazoles, while the others contain one thiazole and one thiazoline. The molecules also differ in their stereochemistry adjacent to thiazole/thiazoline, although this process may be spontaneous. Samples containing other patE variants, such as those encoding patellamide C and ulithiacyclamide, also were shown to contain their predicted chemical products. Samples from which patE variants could not be amplified did not contain related products at a detectable level.
97. Thus, it has been shown that evolution of quite different patellamide-like products has only required a switch in small cassettes encoding 7-8 amino acids, while the remainder of the pathways were intact. Examination of 16S and selected ITS regions indicated that these Prochloron strains from numerous animals of different species were quite closely related (>99% ITS identity). Thus, within very closely related Prochloron strains, the patellamide pathway has diverged by rapid recombination. The observation of natural variation in pat has allowed for specific, testable predictions regarding the engineering of the pathway to make new compounds. An entire patellamide-coding region was mutated to a wholly unnatural pathway and a new, cyclic peptide was obtained (described below). These results reinforce the power of studying symbiosis to understand evolution and engineering in natural products pathways.
C. Compositions
98. Disclosed are the components to be used to prepare the disclosed compositions as well as the compositions themselves to be used within the methods disclosed herein. These and other materials are disclosed herein, and it is understood that when combinations, subsets, interactions, groups, etc. of these materials are disclosed that while specific reference of each various individual and collective combinations and permutation of these compounds may not be explicitly disclosed, each is specifically contemplated and
described herein. For example, if a particular patellamide is disclosed and discussed and a number of modifications that can be made to a number of molecules of the patellamide are discussed, specifically contemplated is each and every combination and permutation of those and the modifications that are possible unless specifically indicated to the contrary. Thus, if a class of molecules A, B, and C are disclosed as well as a class of molecules D, E, and F and an example of a combination molecule, A-D is disclosed, then even if each is not individually recited each is individually and collectively contemplated, meaning combinations A-E, A-F, B-D, B-E, B-F, C-D, C-E, and C-F are considered disclosed. Likewise, any subset or combination of these is also disclosed. Thus, for example, the sub-group of A-E, B-F, and C-E would be considered disclosed. This concept applies to all aspects of this application including, but not limited to, steps in methods of making and using the disclosed compositions. Thus, if there are a variety of additional steps that can be performed it is understood that each of these additional steps can be performed with any specific embodiment or combination of embodiments of the disclosed methods.
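The combinatorial reading of the preceding paragraph can be illustrated with a short sketch. It assumes only the two classes of placeholder labels used in the text (A, B, C and D, E, F); the function name `enumerate_combinations` is hypothetical and introduced here purely for illustration.

```python
from itertools import product

def enumerate_combinations(class_one, class_two):
    """Return every pairing of one member from each class, written as 'X-Y'."""
    return [f"{x}-{y}" for x, y in product(class_one, class_two)]

# Placeholder labels taken from the text; they do not name real molecules.
combos = enumerate_combinations(["A", "B", "C"], ["D", "E", "F"])
print(combos)

# The sub-group named in the text is a subset of the enumerated pairings.
sub_group = {"A-E", "B-F", "C-E"}
print(sub_group.issubset(combos))
```

The nine pairings printed include the exemplified A-D together with the eight further combinations listed in the paragraph.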
[0103] 99. Disclosed herein are sets of recombinant proteins that catalyze the N–C terminal cyclization of peptides via amide bonds. This cyclization event does not depend upon the sequence of the cyclized peptide; rather, recognition sequences in a prepeptide surrounding the peptide of interest dictate the cyclization. Disclosed herein are various prepeptides (also referred to as recognition sequences). While the polymer, such as a peptide, to be cyclized (also referred to as the coding sequence) can vary greatly and still be cyclized, and can, in fact, be any peptide capable of being cyclized, the recognition sequence is much more specific.
100. As discussed above, any type of polymer, including peptides, can be cyclized using the recognition sequences disclosed herein, including organic polymers such as biopolymers that contain amino acid or nucleotide monomers, or a mixture of different types of monomers. Accordingly, polypeptides, polynucleotides, or a polymer containing both amino acid and nucleotide monomers, for example, may be cyclized using the subject methods. In many embodiments of the invention, the polymer used is a biopolymer containing amino acids, i.e., a polypeptide. Polymers that may be employed herein may not contain any peptide bonds. However, in certain embodiments, the polymers may contain peptide bonds in between the first and second monomers of one or both ends of the polymer to be cyclized.
101. For example, below, the sequences in bold are the recognition sequences, and the intervening underlined sequences are cyclized by the described enzymes. The combination of coding sequence and recognition sequence is referred to throughout the application as a “fusion polypeptide.” For example, this sequence was modified to the completely unnatural variant PatE:
(SEQ ID NO: 45) MNKKNILPQQGQPVIRTAGQLSSQLAELSEEALGDAGLEASVTACITFCAYDGVEPSITVCISVCAYDGE
(SEQ ID NO: 46) MNKKNILPQQGQPVIRTAGQLSSQLAELSEEALGDAGLEASVTACITFCAYDGVEPSQGGRGDWPAYDGE
where the second underlined sequence has been changed. This compound was isolated from E. coli broth cultures. This modification proves that the enzymes only rely on the bold recognition sequences, not on the underlined “coding sequences”. Further evidence in favor of this is that the peptide PatEBS2 was synthesized:
(SEQ ID NO: 47) MNKKNILPQQGQPVIRTAGQLSSQLAELSEEALGDAGLEASVTACITFCA
where the middle bold sequence, AYDGVEPS, has been mutated to AYDGVEQS. With a modification in the recognition sequence, cyclic peptides were no longer produced.
103. The advantage of amide-cyclized peptides is two-fold. First, conformational freedom is greatly restricted, leading to much better binding constants (more potent drugs or biomolecules). Second, amide-cyclized peptides have favorable pharmacological properties, such as resistance to proteases and advantages in delivery.
104. This cyclization may take place either in vitro with purified enzymes or in Escherichia coli expression constructs, or in other vectors and systems as described herein. The cyclization can also take place in in vivo systems, as described below.
105. Disclosed herein are isolated peptides that can act as “recognition sequences”, and function as prepeptides to allow for the formation of cyclized peptides. For example, disclosed herein is an isolated peptide comprising an amino acid segment comprising the amino acid sequence of SEQ ID NO: 1 (GLEASNAYDGVEPSNAYDGE), where N is the coding sequence and can be any length, as discussed above. For example, the coding sequence can be 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, or 100 amino acids in length, or any amount in between, for example. There are numerous examples given throughout of various peptides that can be cyclized by the recognition sequences disclosed herein.
106. The isolated peptide can also comprise an amino acid segment comprising the amino acid sequence of SEQ ID NO: 2 (GLEASNAYDGVEPS, where N is the coding sequence and can be any length). Also disclosed is an isolated peptide comprising an amino acid segment comprising the amino acid sequence of SEQ ID NO: 3 (AYDGVEPSNAYDGE, where N is the coding sequence and can be any length).
107. As discussed in greater detail below, the isolated peptide can comprise an amino acid sequence at least about 90% identical to the amino acid sequence of SEQ ID NO: 1, or the amino acid sequence of SEQ ID NO: 1 can have
one or more conservative amino acid substitutions. For example, recognition sequences are more highly conserved, but can contain modifications such as LEAS/VEPS/PGPS in the first position of patE1, patE2, and triCº, respectively.
102. The sequences in bold are the recognition sequences, and the intervening underlined sequences are cyclized by the described enzymes. This sequence was modified to the completely unnatural variant PatRBS:
108. Examples of recognition sequences can be found in SEQ ID NO: 4 (GLEASVTACITFCAYDGVEPSCTLCCTLCAYDGE), which encodes both Patellamide C and ulithiacyclamide, and in SEQ ID NO: 5 (GLEASVTACITFCAYDGVEPSQGGRGDWPAYDGE), which encodes Patellamide C and eptidemnamide.
109. A further example can be found in SEQ ID NO: 6 (GLEASVTACITFCAYDGVEPSITVCISVCAYDGE),
which encodes Patellamide A and Patellamide C.
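The relationship between the recognition sequences and the intervening coding sequences in SEQ ID NOS: 4-6 can be sketched as a simple pattern match. This is only a string-level illustration, assuming the flanking motifs GLEAS, AYDGVEPS, and AYDGE as written in SEQ ID NO: 1; the function name `extract_coding_sequences` and the `mutated` example are hypothetical, and a text match says nothing about whether the enzymes would process a given peptide.

```python
import re

# Recognition motifs as written in SEQ ID NO: 1, with two coding cassettes between them.
PAT_PATTERN = re.compile(
    r"GLEAS(?P<first>[A-Z]+?)AYDGVEPS(?P<second>[A-Z]+?)AYDGE"
)

def extract_coding_sequences(precursor):
    """Return the two coding cassettes, or None if the motifs are not all present."""
    match = PAT_PATTERN.search(precursor)
    if match is None:
        return None
    return match.group("first"), match.group("second")

SEQ_ID_4 = "GLEASVTACITFCAYDGVEPSCTLCCTLCAYDGE"
SEQ_ID_5 = "GLEASVTACITFCAYDGVEPSQGGRGDWPAYDGE"
SEQ_ID_6 = "GLEASVTACITFCAYDGVEPSITVCISVCAYDGE"

for name, seq in [("4", SEQ_ID_4), ("5", SEQ_ID_5), ("6", SEQ_ID_6)]:
    print(name, extract_coding_sequences(seq))

# Illustrative only: altering the middle motif (P to Q) removes the match.
mutated = SEQ_ID_5.replace("AYDGVEPS", "AYDGVEQS")
print(extract_coding_sequences(mutated))
```

Under these assumptions each sequence yields two eight-residue cassettes, and the altered middle motif yields no match, which loosely mirrors the observation that a change in the recognition sequence abolished production of cyclic peptides.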
110. As discussed above, recognition sequences can also be found in the Trichodesmium species. For example, disclosed is SEQ ID NO: 7 (MGKKNIQPNSSQPVFRSLVARPALEELREENLTEGNQGHGPLANGPGPSGDGLHPRLCSCSYDGDDE), which encodes the cyclic peptide trichamide. This sequence can be further shortened and still produce trichamide, for example, using SEQ ID NO: 8 (GPGPSGDGLHPRLCSCSYDGDDE).
111. Also disclosed is the amino acid sequence of SEQ ID NO: 9 (GPGPSNSYDGDDE), wherein N can be any length, and the remaining sequence is a recognition sequence which allows for the cyclization of whichever peptide is placed in the “N” position.
112. Also disclosed herein is an isolated peptide comprising an amino acid segment comprising the amino acid sequence (GVDASNSYDGVDASNSYDD), where N is the coding sequence and can be any length, as discussed above. For example, the coding sequence can be 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, or 100 amino acids in length, or any amount in between, for example. There are numerous examples given throughout of various peptides that can be cyclized by the recognition sequences disclosed herein.
113. The isolated peptide can also comprise an amino acid segment comprising the amino acid sequence of SEQ ID NO: 52 (GVDASNSYDGVDAS, where N is the coding sequence and can be any length). Also disclosed is an isolated peptide comprising an amino acid segment comprising the amino acid sequence of SEQ ID NO: 53 (SYDGVDASNSYDD, where N is the coding sequence and
can be any length).
1. Sequence Similarities
[0119] 114. It is understood that as discussed herein the use of the terms homology and identity mean the same thing as similarity. Thus, for example, if the word homology is used between two non-natural sequences it is understood that this is not necessarily indicating an evolutionary relationship between these two sequences, but rather is looking at the similarity or relatedness between their nucleic acid sequences. Many of the methods for determining homology between two evolutionarily related molecules are routinely applied to any two or more nucleic acids or proteins for the purpose of measuring sequence similarity regardless of whether they are evolutionarily related or not.
[0120] 115. In general, it is understood that one way to define any known variants and derivatives or those that might arise, of the disclosed genes and proteins herein, is through defining the variants and derivatives in terms of homology to specific known sequences. This identity of particular sequences disclosed herein is also discussed elsewhere herein. In general, variants of genes and proteins herein disclosed typically have at least about 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99 percent homology to the stated sequence or the native sequence. Those of skill in the art readily understand how to determine the homology of two proteins or nucleic acids, such as genes. For example, the homology can be calculated after aligning the two sequences so that the homology is at its highest level.
[0121] 116. Another way of calculating homology can be performed by published algorithms. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85: 2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by inspection.
[0122] 117. The same types of homology can be obtained for nucleic acids by, for example, the algorithms disclosed in Zuker, M. Science 244: 48-52, 1989, Jaeger et al. Proc. Natl. Acad. Sci. USA 86: 7706-7710, 1989, Jaeger et al. Methods Enzymol. 183: 281-306, 1989, which are herein incorporated by reference for at least material related to nucleic acid alignment. It is understood that any of the methods typically can be used and that in certain instances the results of these various methods may differ, but the skilled artisan understands if identity is found with at least one of these methods, the sequences would be said to have the stated identity, and be disclosed herein.
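As a rough illustration of calculating homology after alignment, the following sketch implements a basic Needleman and Wunsch style global alignment and reports percent identity. The scoring values (match 1, mismatch -1, gap -1), the choice of alignment length as the denominator, and the function names are assumptions made for this example; they are not the parameters of GAP, BESTFIT, or any other named program. The helper `has_stated_identity` expresses the convention that a stated identity is met if at least one method reports it.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(a, b):
    """Percent of aligned columns holding identical residues (assumed definition)."""
    al_a, al_b = needleman_wunsch(a, b)
    if not al_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * matches / len(al_a)

def has_stated_identity(results, threshold):
    """True if at least one method's value meets the stated threshold."""
    return any(value >= threshold for value in results.values())

# Recognition sequence and its P-to-Q variant, both taken from the text.
print(percent_identity("AYDGVEPS", "AYDGVEQS"))
# Hypothetical per-method values, for illustration only.
print(has_stated_identity({"method_1": 79.0, "method_2": 80.0}, 80.0))
```

Different scoring schemes or denominators can give different percentages for the same pair of sequences, which is consistent with the note above that the results of the various methods may differ.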
118. For example, as used herein, a sequence recited as having a particular percent homology to another sequence refers to sequences that have the recited homology as calculated by any one or more of the calculation methods described above. For example, a first sequence has 80 percent homology, as defined herein, to a second sequence if the first sequence is calculated to have 80 percent homology to the second sequence using the Zuker calculation method even if the first sequence does not have 80 percent homology to the second sequence as calculated by any of the other calculation methods. As another example, a first sequence has 80 percent homology, as defined herein, to a second sequence if the first sequence is calculated to have 80 percent homology to the second sequence using both the Zuker calculation method and the Pearson and Lipman calculation method even if the first sequence does not have 80 percent homology to the second sequence as calculated by the Smith and Waterman calculation method, the Needleman and Wunsch calculation method, the Jaeger calculation methods, or any of the other calculation methods. As yet another example, a first sequence has 80 percent homology, as defined herein, to a second sequence if the first sequence is calculated to have 80 percent homology to the second sequence using each of the calculation methods (although, in practice, the different calculation methods will often result in different calculated homology percentages).
[0124] 2. Hybridization/Selective Hybridization
119. The term hybridization typically means a sequence driven interaction between at least two nucleic acid molecules, such as a primer or a probe and a gene. Sequence driven interaction means an interaction that occurs between two nucleotides or nucleotide analogs or nucleotide derivatives in a nucleotide specific manner. For example, G interacting with C or A interacting with T are sequence driven interactions. Typically sequence driven interactions occur on the Watson-Crick face or Hoogsteen face of the nucleotide. The hybridization of two nucleic acids is affected by a number of conditions and parameters known to those of skill in the
Aug. 27, 2009
US 2009/0215172 A1
art. For example, the salt concentrations, pH, and temperature of the reaction all affect whether two nucleic acid molecules will hybridize.
120. Parameters for selective hybridization between two nucleic acid molecules are well known to those of skill in the art. For example, in some embodiments selective hybridization conditions can be defined as stringent hybridization conditions. Stringency of hybridization is controlled by both temperature and salt concentration of either or both of the hybridization and washing steps. For example, the conditions of hybridization to achieve selective hybridization may involve hybridization in high ionic strength solution (6×SSC or 6×SSPE) at a temperature that is about 12-25° C. below the Tm (the melting temperature at which half of the molecules dissociate from their hybridization partners) followed by washing at a combination of temperature and salt concentration chosen so that the washing temperature is about 5° C. to 20° C. below the Tm. The temperature and salt conditions are readily determined empirically in preliminary experiments in which samples of reference DNA immobilized on filters are hybridized to a labeled nucleic acid of interest and then washed under conditions of different stringencies. Hybridization temperatures are typically higher for DNA-RNA and RNA-RNA hybridizations. The conditions can be used as described above to achieve stringency, or as is known in the art. (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989; Kunkel et al., Methods Enzymol. 154:367, 1987, which is herein incorporated by reference for material at least related to hybridization of nucleic acids). A preferable stringent hybridization condition for a DNA:DNA hybridization can be at about 68° C. (in aqueous solution) in 6×SSC or 6×SSPE followed by washing at 68° C. Stringency of hybridization and washing, if desired, can be reduced accordingly as the degree of complementarity desired is decreased, and further, depending upon the G-C or A-T richness of any area wherein variability is searched for. Likewise, stringency of hybridization and washing, if desired, can be increased accordingly as homology desired is increased, and further, depending upon the G-C or A-T richness of any area wherein high homology is desired, all as known in the art.
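As a rough illustration of the temperature offsets described above, the following sketch derives hybridization and wash temperature windows from a melting temperature. The function name and the example Tm of 80° C. are assumptions made for illustration only; the Tm itself would still be determined empirically or by a method known in the art.

```python
def stringency_windows(tm_celsius):
    """Return approximate (low, high) temperature windows in degrees C.

    Hybridization: about 12-25 degrees C below the Tm.
    Washing: about 5-20 degrees C below the Tm.
    """
    hybridization = (tm_celsius - 25.0, tm_celsius - 12.0)
    wash = (tm_celsius - 20.0, tm_celsius - 5.0)
    return {"hybridization": hybridization, "wash": wash}


# Hypothetical duplex with a Tm of 80 degrees C (illustrative value only).
windows = stringency_windows(80.0)
print(windows["hybridization"])  # (55.0, 68.0)
print(windows["wash"])           # (60.0, 75.0)
```

The windows are only starting points; salt concentration and duplex type (DNA-DNA, DNA-RNA, RNA-RNA) shift the appropriate temperatures, as noted above.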
121. Another way to define selective hybridization is by looking at the amount (percentage) of one of the nucleic acids bound to the other nucleic acid. For example, in some embodiments selective hybridization conditions would be when at least about 60, 65, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 percent of the limiting nucleic acid is bound to the non-limiting nucleic acid. Typically, the non-limiting primer is in, for example, 10 or 100 or 1000 fold excess. This type of assay can be performed under conditions where both the limiting and non-limiting primer are, for example, 10 fold or 100 fold or 1000 fold below their k_d, or where only one of the nucleic acid molecules is 10 fold or 100 fold or 1000 fold or where one or both nucleic acid molecules are above their k_d.
122. Another way to define selective hybridization is by looking at the percentage of primer that gets enzymatically manipulated under conditions where hybridization is required to promote the desired enzymatic manipulation. For example, in some embodiments selective hybridization conditions would be when at least about 60, 65, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 percent of the primer is enzymatically manipulated under conditions which promote the enzymatic manipulation. For example, if the enzymatic manipulation is DNA extension, then selective hybridization conditions would be when at least about 60, 65, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 percent of the primer molecules are extended. Preferred conditions also include those suggested by the manufacturer or indicated in the art as being appropriate for the enzyme performing the manipulation.
123. Just as with homology, it is understood that there are a variety of methods herein disclosed for determining the level of hybridization between two nucleic acid molecules. It is understood that these methods and conditions may provide different percentages of hybridization between two nucleic acid molecules, but unless otherwise indicated meeting the parameters of any of the methods would be sufficient. For example, if 80% hybridization was required, then as long as hybridization occurs within the required parameters in any one of these methods it is considered disclosed herein.
[0130] 124. It is understood that those of skill in the art understand that if a composition or method meets any one of these criteria for determining hybridization, either collectively or singly, it is a composition or method that is disclosed herein.
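The "any one method suffices" rule for hybridization (and, analogously, for homology) can be sketched as a simple check. The function name, the method labels, and the percentages below are hypothetical and serve only to illustrate the logic.

```python
def meets_requirement(percent_by_method, required_percent):
    """Return True if at least one method reports the required percentage.

    percent_by_method maps a method label to the percentage it yields.
    """
    return any(p >= required_percent for p in percent_by_method.values())


# Hypothetical measurements from two of the approaches described above.
measured = {
    "percent_limiting_nucleic_acid_bound": 72.0,
    "percent_primer_extended": 85.0,
}
print(meets_requirement(measured, 80.0))  # True: one method reaches 80 percent
```

Under this reading, a result that falls short by one method is still within the definition whenever another disclosed method meets the stated parameter.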
3. Nucleic Acids
[0132] 125. There are a variety of molecules disclosed herein that are nucleic acid based, including for example the nucleic acids that encode, for example, patellamides and trichamide as well as any other proteins disclosed herein, as well as various functional nucleic acids. The disclosed nucleic acids are made up of, for example, nucleotides, nucleotide analogs, or nucleotide substitutes. Non-limiting examples of these and other molecules are discussed herein. It is understood that, for example, when a vector is expressed in a cell, the expressed mRNA will typically be made up of A, C, G, and U. Likewise, it is understood that if, for example, an antisense molecule is introduced into a cell or cell environment through, for example, exogenous delivery, it is advantageous that the antisense molecule be made up of nucleotide analogs that reduce the degradation of the antisense molecule in the cellular environment.
[0133] a) Nucleotides and Related Molecules
126. A nucleotide is a molecule that contains a base moiety, a sugar moiety and a phosphate moiety. Nucleotides can be linked together through their phosphate moieties and sugar moieties creating an internucleoside linkage. The base moiety of a nucleotide can be adenin-9-yl (A), cytosin-1-yl (C), guanin-9-yl (G), uracil-1-yl (U), and thymin-1-yl (T). The sugar moiety of a nucleotide is a ribose or a deoxyribose. The phosphate moiety of a nucleotide is pentavalent phosphate. A non-limiting example of a nucleotide would be 3'-AMP (3'-adenosine monophosphate) or 5'-GMP (5'-guanosine monophosphate).
127. A nucleotide analog is a nucleotide which contains some type of modification to either the base, sugar, or phosphate moieties. Modifications to nucleotides are well known in the art and would include, for example, 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, and 2-aminoadenine as well as modifications at the sugar or phosphate moieties.
analog, or nucleotide substitute. The Watson-Crick face of a nucleotide, nucleotide analog, or nucleotide substitute includes the C2, N1, and C6 positions of a purine based nucleotide, nucleotide analog, or nucleotide substitute and the C2, N3, C4 positions of a pyrimidine based nucleotide, nucleotide analog, or nucleotide substitute.
131. A Hoogsteen interaction is the interaction that takes place on the Hoogsteen face of a nucleotide or nucleotide analog, which is exposed in the major groove of duplex DNA. The Hoogsteen face includes the N7 position and reactive groups (NH2 or O) at the C6 position of purine nucleotides.
ments the primers can also be extended using non-enzymatic techniques, where for example, the nucleotides or oligonucleotides used to extend the primer are modified such that they will chemically react to extend the primer in a sequence specific manner. Typically the disclosed primers hybridize with the nucleic acid or region of the nucleic acid or they hybridize with the complement of the nucleic acid or complement of a region of the nucleic acid.
d) Functional Nucleic Acids
135. Functional nucleic acids are nucleic acid molecules that have a specific function, such as binding a target molecule or catalyzing a specific reaction. Functional nucleic acid molecules can be divided into the following categories, which are not meant to be limiting. For example, functional nucleic acids include antisense molecules, aptamers, ribozymes, triplex forming molecules, and external guide sequences. The functional nucleic acid molecules can act as affectors, inhibitors, modulators, and stimulators of a specific activity possessed by a target molecule, or the functional nucleic acid molecules can possess a de novo activity independent of any other molecules.
[0147] 136. Functional nucleic acid molecules can interact with any macromolecule, such as DNA, RNA, polypeptides, or carbohydrate chains. The cyclic peptides disclosed herein can be encoded by functional nucleic acids, and indeed can be expressed in vivo. These functional nucleic acids encoding cyclic peptides and the necessary recognition sequences to cyclize them can have a wide variety of applications, as discussed elsewhere herein.
[0140] b) Sequences
132. There are a variety of sequences related to, for example, patellamides and trichamides as well as any other protein disclosed herein that are disclosed on Genbank, and these sequences and others are herein incorporated by reference in their entireties as well as for individual subsequences
137. Often functional nucleic acids are designed to interact with other nucleic acids based on sequence homology between the target molecule and the functional nucleic acid molecule. In other situations, the specific recognition between the functional nucleic acid molecule and the target molecule is not based on sequence homology between the functional nucleic acid molecule and the target molecule, but rather is based on the formation of tertiary structure that allows specific recognition to take place.
138. Antisense molecules are designed to interact with a target nucleic acid molecule through either canonical or non-canonical base pairing. The interaction of the antisense molecule and the target molecule is designed to promote the destruction of the target molecule through, for example, RNAse H mediated RNA-DNA hybrid degradation. Alternatively the antisense molecule is designed to interrupt a processing function that normally would take place on the target molecule, such as transcription or replication. Antisense molecules can be designed based on the sequence of the target molecule. Numerous methods for optimization of antisense efficiency by finding the most accessible regions of the target molecule exist. Exemplary methods would be in vitro selection experiments and DNA modification studies using DMS and DEPC. It is preferred that antisense molecules bind the target molecule with a dissociation constant (k_d) less than
128. Nucleotide substitutes are molecules having similar functional properties to nucleotides, but which do not contain a phosphate moiety, such as peptide nucleic acid (PNA). Nucleotide substitutes are molecules that will recognize nucleic acids in a Watson-Crick or Hoogsteen manner, but which are linked together through a moiety other than a phosphate moiety. Nucleotide substitutes are able to conform to a double helix type structure when interacting with the appropriate target nucleic acid.
129. It is also possible to link other types of molecules (conjugates) to nucleotides or nucleotide analogs to enhance, for example, cellular uptake. Conjugates can be chemically linked to the nucleotide or nucleotide analogs. Such conjugates include but are not limited to lipid moieties such as a cholesterol moiety (Letsinger et al., Proc. Natl. Acad. Sci. USA, 1989, 86, 6553-6556).
130. A Watson-Crick interaction is at least one interaction with the Watson-Crick face of a nucleotide, nucleotide
133. A variety of sequences are provided herein and these and others can be found in Genbank, at www.pubmed.gov. Those of skill in the art understand how to resolve sequence discrepancies and differences and to adjust the compositions and methods relating to a particular sequence to other related sequences. Primers and/or probes can be designed for any sequence given the information disclosed herein and known in the art.
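Because hybridization relies on G pairing with C and A pairing with T, a probe or primer that anneals to a given DNA strand can be derived from the reverse complement of that strand. The following is a minimal sketch under that assumption; the function name and the example sequence are illustrative only and are not taken from any sequence disclosed herein.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


# Hypothetical target region, written 5' to 3'.
target = "ATGGCGTACCTGAAC"
probe = reverse_complement(target)
print(probe)
```

A practical design would also weigh length, melting temperature, and secondary structure, none of which this sketch addresses.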
c) Primers and Probes
134. Disclosed are compositions including primers and probes, which are capable of interacting with the genes disclosed herein. In certain embodiments the primers are used to support DNA amplification reactions. Typically the primers will be capable of being extended in a sequence specific manner. Extension of a primer in a sequence specific manner includes any methods wherein the sequence and/or composition of the nucleic acid molecule to which the primer is hybridized or otherwise associated directs or influences the composition or sequence of the product produced by the extension of the primer. Extension of the primer in a sequence specific manner therefore includes, but is not limited to, PCR, DNA sequencing, DNA extension, DNA polymerization, RNA transcription, or reverse transcription. Techniques and conditions that amplify the primer in a sequence specific manner are preferred. In certain embodiments the primers are used for the DNA amplification reactions, such as PCR or direct sequencing. It is understood that in certain embodi
or equal to 10^-6, 10^-8, 10^-10, or 10^-12. A representative sample of methods and techniques which aid in the design and use of antisense molecules can be found in the following non-limiting list of U.S. Pat. Nos. 5,135,917, 5,294,533, 5,627,158, 5,641,754, 5,691,317, 5,780,607, 5,786,138, 5,849,903, 5,856,103, 5,919,772, 5,955,590, 5,990,088, 5,994,320, 5,998,602, 6,005,095, 6,007,995, 6,013,522, 6,017,898, 6,018,042, 6,025,198, 6,033,910, 6,040,296, 6,046,004, 6,046,319, and 6,057,437.
139. Aptamers are molecules that interact with a target molecule, preferably in a specific way. Typically aptamers are small nucleic acids ranging from 15-50 bases in length that fold into defined secondary and tertiary structures, such as stem-loops or G-quartets. Aptamers can bind small molecules, such as ATP (U.S. Pat. No. 5,631,146) and theophylline (U.S. Pat. No. 5,580,737), as well as large molecules, such as reverse transcriptase (U.S. Pat. No. 5,786,462) and thrombin (U.S. Pat. No. 5,543,293). Aptamers can bind very tightly with k_d values from the target molecule of less than 10^-12 M. It is preferred that the aptamers bind the target molecule with a k_d less than 10^-6, 10^-8, 10^-10, or 10^-12. Aptamers can bind the target molecule with a very high degree of specificity. For example, aptamers have been isolated that have greater than a 10,000 fold difference in binding affinities between the target molecule and another molecule that differ at only a single position on the molecule (U.S. Pat. No. 5,543,293). It is preferred that the aptamer have a k_d with the target molecule at least 10, 100, 1000, 10,000, or 100,000 fold lower than the k_d with a background binding molecule. It is preferred when doing the comparison for a polypeptide, for example, that the background molecule be a different polypeptide. Representative examples of how to make and use aptamers to bind a variety of different target molecules can be found in the following non-limiting list of U.S. Pat. Nos. 5,476,766, 5,503,978, 5,631,146, 5,731,424, 5,780,228, 5,792,613, 5,795,721, 5,846,713, 5,858,660, 5,861,254, 5,864,026, 5,869,641, 5,958,691, 6,001,988, 6,011,020, 6,013,443, 6,020,130, 6,028,186, 6,030,776, and 6,051,698.
140. Ribozymes are nucleic acid molecules that are capable of catalyzing a chemical reaction, either intramolecularly or intermolecularly. Ribozymes are thus catalytic nucleic acids. It is preferred that the ribozymes catalyze intermolecular reactions. There are a number of different types of ribozymes that catalyze nuclease or nucleic acid polymerase type reactions which are based on ribozymes found in natural systems, such as hammerhead ribozymes (for example, but not limited to the following U.S. Pat. Nos. 5,334,711, 5,436,330, 5,616,466, 5,633,133, 5,646,020, 5,652,094, 5,712,384, 5,770,715, 5,856,463, 5,861,288, 5,891,683, 5,891,684, 5,985,621, 5,989,908, 5,998,193, 5,998,203, WO 98/58058 by Ludwig and Sproat, WO 98/58057 by Ludwig and Sproat, and WO 97/18312 by Ludwig and Sproat), hairpin ribozymes (for example, but not limited to the following U.S. Pat. Nos. 5,631,115, 5,646,031, 5,683,902, 5,712,384, 5,856,188, 5,866,701, 5,869,339, and 6,022,962), and tetrahymena ribozymes (for example, but not limited to the following U.S. Pat. Nos. 5,595,873 and 5,652,107). There are also a number of ribozymes that are not found in natural systems, but which have been engineered to catalyze specific reactions de novo (for example, but not limited to the following U.S. Pat. Nos. 5,580,967, 5,688,670, 5,807,718, and 5,910,408). Preferred ribozymes cleave RNA or DNA substrates, and more preferably cleave RNA substrates. Ribozymes typically cleave nucleic acid substrates through recognition and binding of the target substrate with subsequent cleavage. This recognition is often based mostly on canonical or non-canonical base pair interactions. This property makes ribozymes particularly good candidates for target specific cleavage of nucleic acids because recognition of the target substrate is based on the target substrate's sequence. Representative examples of how to make and use ribozymes to catalyze a variety of different reactions can be found in the following non-limiting list of U.S. Pat. Nos. 5,646,042, 5,693,535, 5,731,295, 5,811,300, 5,837,855, 5,869,253, 5,877,021, 5,877,022, 5,972,699, 5,972,704, 5,989,906, and 6,017,756.
141. Triplex forming functional nucleic acid molecules are molecules that can interact with either double-stranded or single-stranded nucleic acid. When triplex molecules interact with a target region, a structure called a triplex is formed, in which there are three strands of DNA forming a complex dependent on both Watson-Crick and Hoogsteen base-pairing. Triplex molecules are preferred because they can bind target regions with high affinity and specificity. It is preferred that the triplex forming molecules bind the target molecule with a k_d less than 10^-6, 10^-8, 10^-10, or 10^-12. Representative examples of how to make and use triplex forming molecules to bind a variety of different target molecules can be found in the following non-limiting list of U.S. Pat. Nos. 5,176,996, 5,645,985, 5,650,316, 5,683,874, 5,693,773, 5,834,185, 5,869,246, 5,874,566, and 5,962,426.
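To give a feel for what the preferred dissociation constants imply, the sketch below applies the standard single-site binding relationship, fraction bound = [L] / ([L] + k_d), which holds when the binding molecule is in large excess over its target. This relationship is a general textbook approximation rather than something stated in this disclosure, and the 10 nM concentration, the function name, and the reading of the thresholds as 1e-6 through 1e-12 M are assumptions for illustration.

```python
def fraction_bound(free_ligand_molar, kd_molar):
    """Fraction of target occupied for simple 1:1 binding at equilibrium.

    Assumes the binding molecule is in large excess, so its free
    concentration is approximately its total concentration.
    """
    return free_ligand_molar / (free_ligand_molar + kd_molar)


# Occupancy at a hypothetical 10 nM concentration for a range of k_d values.
for kd in (1e-6, 1e-8, 1e-10, 1e-12):
    print(f"k_d = {kd:.0e} M -> fraction bound = {fraction_bound(1e-8, kd):.4f}")
```

When the concentration equals the k_d, half of the target is occupied; each 100-fold decrease in k_d at a fixed concentration moves occupancy closer to saturation.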
142. External guide sequences (EGSs) are molecules that bind a target nucleic acid molecule forming a complex, and this complex is recognized by RNase P, which cleaves the target molecule. EGSs can be designed to specifically target an RNA molecule of choice. RNase P aids in processing transfer RNA (tRNA) within a cell. Bacterial RNase P can be recruited to cleave virtually any RNA sequence by using an EGS that causes the target RNA:EGS complex to mimic the natural tRNA substrate. (WO 92/03566 by Yale, and Forster and Altman, Science 238:407-409 (1990)).
[0154] 143. Similarly, eukaryotic EGS/RNase P-directed cleavage of RNA can be utilized to cleave desired targets within eukaryotic cells. (Yuan et al., Proc. Natl. Acad. Sci. USA 89:8006-8010 (1992); WO 93/22434 by Yale; WO 95/24489 by Yale; Yuan and Altman, EMBO J. 14:159-168 (1995), and Carrara et al., Proc. Natl. Acad. Sci. (USA) 92:2627-2631 (1995)). Representative examples of how to make and use EGS molecules to facilitate cleavage of a variety of different target molecules can be found in the following non-limiting list of U.S. Pat. Nos. 5,168,053, 5,624,824, 5,683,873, 5,728,521, 5,869,248, and 5,877,162.
4. Vectors and Fusion Polypeptides
144. Disclosed herein are vectors comprising a nucleotide sequence encoding a fusion polypeptide. These vectors can be used to produce a cyclized peptide of interest, are useful with libraries and combinatorial chemistry techniques (discussed below), and are useful with in vivo systems.
145. For example, disclosed herein is a vector comprising, from N-terminus to C-terminus: a) a C-terminal domain comprising SEQ ID NO: 10 (GLEAS); b) a peptide; c) an N-terminal domain comprising SEQ ID NO: 11 (AYDGVEPS); wherein the fusion polypeptide is able to cyclize the peptide to produce a cyclic peptide in a mammalian cell.
146. Also disclosed is a vector comprising a nucleotide sequence encoding a fusion polypeptide comprising, from N-terminus to C-terminus: a) a C-terminal domain comprising SEQ ID NO: 11 (AYDGVEPS); b) a peptide; c) an N-terminal domain comprising SEQ ID NO: 12 (AYDGE); wherein the fusion polypeptide is able to cyclize the peptide to produce a cyclic peptide in a cell. This cell can be prokaryotic, such as E. coli, or eukaryotic, such as a mammalian cell.
147. Also disclosed herein is a vector comprising a nucleotide sequence encoding a fusion polypeptide comprising, from N-terminus to C-terminus: a) a C-terminal domain comprising SEQ ID NO: 10 (GLEAS); b) a peptide; c) an
N-terminal domain comprising SEQ ID NO: 12 (AYDGE); wherein the fusion polypeptide is able to cyclize the peptide to produce a cyclic peptide in a cell. This cell can be prokaryotic, such as E. coli, or eukaryotic, such as a mammalian cell.
148. Also disclosed herein is a vector comprising a nucleotide sequence encoding a fusion polypeptide comprising, from N-terminus to C-terminus: a) a C-terminal domain comprising SEQ ID NO: 13 (GPGPS); b) a peptide; c) an N-terminal domain comprising SEQ ID NO: 14 (SYDGDDE); wherein the fusion polypeptide is able to cyclize the peptide to produce a cyclic peptide in a cell. This cell can be prokaryotic, such as E. coli, or eukaryotic, such as a mammalian cell.
149. The vectors disclosed above can comprise a random peptide, as discussed in greater detail below. The peptide of interest (the coding sequence) can be derived from a cDNA library. For example, each vector in the library can encode a different fusion polypeptide. In a further example, the peptide of interest of each different fusion polypeptide can be different. The peptide of interest can be a random peptide at least 3 amino acids in length, as discussed below.
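The idea of a library in which each member carries a different random peptide between two fixed recognition domains can be sketched at the amino acid level as follows. This is an illustration only: the function names, the library size, the peptide length of 8, and the fixed random seed are assumptions, the elements are simply joined in the order a), b), c) in which they are listed above, and an actual vector would carry a nucleotide sequence encoding such a polypeptide rather than the polypeptide itself.

```python
import random

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length, rng):
    """Return a random peptide; the text calls for at least 3 amino acids."""
    if length < 3:
        raise ValueError("peptide of interest should be at least 3 amino acids")
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def fusion_polypeptide(first_domain, peptide, second_domain):
    """Join the three listed elements in the order a), b), c)."""
    return first_domain + peptide + second_domain


rng = random.Random(0)  # fixed seed so the illustration is repeatable
library = [
    fusion_polypeptide("GLEAS", random_peptide(8, rng), "AYDGVEPS")
    for _ in range(5)
]
for member in library:
    print(member)
```

Each printed member shares the SEQ ID NO: 10 (GLEAS) and SEQ ID NO: 11 (AYDGVEPS) domains and differs only in the central peptide, which mirrors the statement that the peptide of interest of each fusion polypeptide can be different.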
150. Also disclosed is a cell comprising the vectors discussed above, or progeny thereof. This cell can be prokaryotic, or eukaryotic, such as a mammalian cell. Examples of such cells include a tumor cell, a liver cell, a hepatocyte, a mast cell and a lymphocyte cell. The cell can also be a human cell.
151. There are a number of compositions and methods which can be used to deliver nucleic acids, such as those encoding the cyclic peptides disclosed herein, to cells, either in vitro or in vivo. These methods and compositions can largely be broken down into two classes: viral based delivery systems and non-viral based delivery systems. For example, the nucleic acids can be delivered through a number of direct delivery systems such as electroporation, lipofection, calcium phosphate precipitation, plasmids, viral vectors, viral nucleic acids, phage nucleic acids, phages, cosmids, or via transfer of genetic material in cells or carriers such as cationic liposomes. Appropriate means for transfection, including viral vectors, chemical transfectants, or physico-mechanical methods such as electroporation and direct diffusion of DNA, are described by, for example, Wolff, J. A., et al., Science, 247, 1465-1468, (1990); and Wolff, J. A., Nature, 352, 815-818, (1991). Such methods are well known in the art and readily adaptable for use with the compositions and methods described herein. In certain cases, the methods will be modified to specifically function with large DNA molecules. Further, these methods can be used to target certain diseases and cell populations by using the targeting characteristics of the carrier.
a) Nucleic Acid Based Delivery Systems
152. Transfer vectors can be any nucleotide construction used to deliver genes into cells (e.g., a plasmid), or as part of a general strategy to deliver genes, e.g., as part of recombinant retrovirus or adenovirus (Ram et al. Cancer Res. 53:83-88, (1993)).
153. As used herein, plasmid or viral vectors are agents that transport the disclosed nucleic acids, such as those encoding cyclic peptides, into the cell without degradation and include a promoter yielding expression of the gene in the cells into which it is delivered. In some embodiments the peptides are derived from either a virus or a retrovirus. Viral vectors are, for example, Adenovirus, Adeno-associated virus, Herpes virus, Vaccinia virus, Polio virus, AIDS virus, neuronal trophic virus, Sindbis and other RNA viruses, including these viruses with the HIV backbone. Also preferred are any viral families which share the properties of these viruses which make them suitable for use as vectors. Retroviruses include Murine Moloney Leukemia virus, MMLV, and retroviruses that express the desirable properties of MMLV as a vector. Retroviral vectors are able to carry a larger genetic payload, i.e., a transgene or marker gene, than other viral vectors, and for this reason are a commonly used vector. However, they are not as useful in non-proliferating cells. Adenovirus vectors are relatively stable and easy to work with, have high titers, can be delivered in aerosol formulation, and can transfect non-dividing cells. Pox viral vectors are large and have several sites for inserting genes; they are thermostable and can be stored at room temperature. A preferred embodiment is a viral vector which has been engineered so as to suppress the immune response of the host organism, elicited by the viral antigens. Preferred vectors of this type will carry coding regions for Interleukin 8 or 10.
154. Viral vectors can have higher transduction abilities (ability to introduce genes) than chemical or physical methods to introduce genes into cells. Typically, viral vectors contain nonstructural early genes, structural late genes, an RNA polymerase III transcript, inverted terminal repeats necessary for replication and encapsidation, and promoters to control the transcription and replication of the viral genome. When engineered as vectors, viruses typically have one or more of the early genes removed and a gene or gene/promoter cassette is inserted into the viral genome in place of the removed viral DNA. Constructs of this type can carry up to about 8 kb of foreign genetic material. The necessary functions of the removed early genes are typically supplied by cell lines which have been engineered to express the gene products of the early genes in trans.
[0168] (1) Retroviral Vectors
155. A retrovirus is an animal virus belonging to the virus family of Retroviridae, including any types, subfamilies, genus, or tropisms.
Retroviral vectors, in general, are described by Verma, I. M., Retroviral vectors for gene transfer. In Microbiology-1985, American Society for Microbiology, pp. 229-232, Washington, (1985), which is incorporated by reference herein. Examples of methods for using retroviral vectors for gene therapy are described in U.S. Pat. Nos. 4,868,116 and 4,980,286; PCT applications WO 90/02806 and WO 89/07136; and Mulligan, (Science 260:926-932 (1993)); the teachings of which are incorporated herein by reference.
156. A retrovirus is essentially a package which has packed into it nucleic acid cargo. The nucleic acid cargo carries with it a packaging signal, which ensures that the replicated daughter molecules will be efficiently packaged within the package coat. In addition to the package signal, there are a number of molecules which are needed in cis, for the replication and packaging of the replicated virus. Typically a retroviral genome contains the gag, pol, and env genes which are involved in the making of the protein coat. It is the gag, pol, and env genes which are typically replaced by the foreign DNA that is to be transferred to the target cell. Retrovirus vectors typically contain a packaging signal for incorporation into the package coat, a sequence which signals the start of the gag transcription unit, elements necessary for reverse transcription, including a primer binding site to bind the tRNA primer of reverse transcription, terminal repeat sequences that guide the switch of RNA strands during DNA
synthesis, a purine rich sequence 5' to the 3' LTR that serves as the priming site for the synthesis of the second strand of DNA synthesis, and specific sequences near the ends of the LTRs that enable the insertion of the DNA state of the retrovirus into the host genome. The removal of the gag, pol, and env genes allows for about 8 kb of foreign sequence to be inserted into the viral genome, become reverse transcribed, and upon replication be packaged into a new retroviral particle. This amount of nucleic acid is sufficient for the delivery of one to many genes depending on the size of each transcript. It is preferable to include either positive or negative selectable markers along with other genes in the insert.
157. Since the replication machinery and packaging proteins in most retroviral vectors have been removed (gag, pol, and env), the vectors are typically generated by placing them into a packaging cell line. A packaging cell line is a cell line which has been transfected or transformed with a retrovirus that contains the replication and packaging machinery, but lacks any packaging signal. When the vector carrying the DNA of choice is transfected into these cell lines, the vector containing the gene of interest is replicated and packaged into new retroviral particles, by the machinery provided in cis by the helper cell. The genomes for the machinery are not packaged because they lack the necessary signals.
(2) Adenoviral Vectors
158. The construction of replication-defective adenoviruses has been described (Berkner et al., J. Virology 61:1213-1220 (1987); Massie et al., Mol. Cell. Biol. 6:2872-2883 (1986); Haj-Ahmad et al., J. Virology 57:267-274 (1986); Davidson et al., J. Virology 61:1226-1239 (1987); Zhang "Generation and identification of recombinant adenovirus by liposome-mediated transfection and PCR analysis" BioTechniques 15:868-872 (1993)). The benefit of the use of these viruses as vectors is that they are limited in the extent to which they can spread to other cell types, since they can replicate within an initial infected cell, but are unable to form new infectious viral particles. Recombinant adenoviruses have been shown to achieve high efficiency gene transfer after direct, in vivo delivery to airway epithelium, hepatocytes, vascular endothelium, CNS parenchyma and a number of other tissue sites (Morsy, J. Clin. Invest. 92:1580-1586 (1993); Kirshenbaum, J. Clin. Invest. 92:381-387 (1993); Roessler, J. Clin. Invest. 92:1085-1092 (1993); Moullier, Nature Genetics 4:154-159 (1993); La Salle, Science 259:988-990 (1993); Gomez-Foix, J. Biol. Chem. 267:25129-25134 (1992); Rich, Human Gene Therapy 4:461-476 (1993); Zabner, Nature Genetics 6:75-83 (1994); Guzman, Circulation Research 73:1201-1207 (1993); Bout, Human Gene Therapy 5:3-10 (1994); Zabner, Cell 75:207-216 (1993); Caillaud, Eur. J. Neuroscience 5:1287-1291 (1993); and Ragot, J. Gen. Virology 74:501-507 (1993)).
Recombi nant adenoviruses achieve gene transduction by binding to specific cell surface receptors, after which the virus is inter malized by receptor-mediated endocytosis, in the same man ner as wild type or replication-defective adenovirus (Char donnet and Dales, Virology 40:462-477 (1970); Brown and Burlingham, J. Virology 12:386-396 (1973); Svensson and Persson, J. Virology 55:442-449 (1985); Seth, et al., J. Virol. 51:650-655 (1984); Seth, et al., Mol. Cell. Biol. 4:1528-1533 (1984); Varga et al., J. Virology 65:6061-6070 (1991); Wick ham et al., Cell 73:309-319 (1993)).  159. A viral vector can be one based on an adenovi rus which has had the E1 gene removed and these virons are generated in a cell line such as the human 293 cell line. In
another preferred embodiment both the E1 and E3 genes are removed from the adenovirus genome.

[0175] (3) Adeno-Associated Viral Vectors

160. Another type of viral vector is based on an adeno-associated virus (AAV). This defective parvovirus is a preferred vector because it can infect many cell types and is nonpathogenic to humans. AAV type vectors can transport about 4 to 5 kb, and wild type AAV is known to stably insert into chromosome 19. Vectors which contain this site-specific integration property are preferred. An especially preferred embodiment of this type of vector is the P4.1 C vector produced by Avigen, San Francisco, Calif., which can contain the herpes simplex virus thymidine kinase gene, HSV-tk, and/or a marker gene, such as the gene encoding the green fluorescent protein, GFP.

161. In another type of AAV virus, the AAV contains a pair of inverted terminal repeats (ITRs) which flank at least one cassette containing a promoter which directs cell-specific expression operably linked to a heterologous gene. Heterologous in this context refers to any nucleotide sequence or gene which is not native to the AAV or B19 parvovirus.

162. Typically the AAV and B19 coding regions have been deleted, resulting in a safe, noncytotoxic vector. The AAV ITRs, or modifications thereof, confer infectivity and site-specific integration, but not cytotoxicity, and the promoter directs cell-specific expression. United States Patent No. 6,261,834 is herein incorporated by reference for material related to the AAV vector.
163. The disclosed vectors thus provide DNA molecules which are capable of integration into a mammalian chromosome without substantial toxicity.

[0180] 164. The inserted genes in viral and retroviral systems usually contain promoters and/or enhancers to help control the expression of the desired gene product. A promoter is generally a sequence or sequences of DNA that function when in a relatively fixed location in regard to the transcription start site. A promoter contains core elements required for basic interaction of RNA polymerase and transcription factors, and may contain upstream elements and response elements.

(4) Large Payload Viral Vectors

165. Molecular genetic experiments with large human herpesviruses have provided a means whereby large heterologous DNA fragments can be cloned, propagated and established in cells permissive for infection with herpesviruses (Sun et al., Nature Genetics 8:33-41, 1994; Cotter and Robertson, Curr Opin Mol Ther 5:633-644, 1999). These large DNA viruses (herpes simplex virus (HSV) and Epstein-Barr virus (EBV)) have the potential to deliver fragments of human heterologous DNA >150 kb to specific cells. EBV recombinants can maintain large pieces of DNA in the infected B-cells as episomal DNA. Individual clones carrying human genomic inserts up to 330 kb appeared genetically stable. The maintenance of these episomes requires a specific EBV nuclear protein, EBNA1, constitutively expressed during infection with EBV. Additionally, these vectors can be used for transfection, where large amounts of protein can be generated transiently in vitro. Herpesvirus amplicon systems are also being used to package pieces of DNA >220 kb and to infect cells that can stably maintain DNA as episomes.

[0183] 166. Other useful systems include, for example, replicating and host-restricted non-replicating vaccinia virus vectors.
Aug. 27, 2009
US 2009/0215172 A1
b) Non-Nucleic Acid Based Systems

167. The disclosed compositions can be delivered to the target cells in a variety of ways. For example, the compositions can be delivered through electroporation, or through lipofection, or through calcium phosphate precipitation. The delivery mechanism chosen will depend in part on the type of cell targeted and whether the delivery is occurring, for example, in vivo or in vitro.

168. Thus, the compositions can comprise, in addition to the disclosed vectors, for example, lipids such as liposomes, such as cationic liposomes (e.g., DOTMA, DOPE, DC-cholesterol) or anionic liposomes. Liposomes can further comprise proteins to facilitate targeting a particular cell, if desired. A composition comprising a compound and a cationic liposome can be administered to the blood afferent to a target organ or inhaled into the respiratory tract to target cells of the respiratory tract. Regarding liposomes, see, e.g., Brigham et al. Am. J. Resp. Cell. Mol. Biol. 1:95-100 (1989); Felgner et al. Proc. Natl. Acad. Sci. USA 84:7413-7417 (1987); U.S. Pat. No. 4,897,355. Furthermore, the compound can be administered as a component of a microcapsule that can be targeted to specific cell types, such as macrophages, or where the diffusion of the compound or delivery of the compound from the microcapsule is designed for a specific rate or dosage.

169. In the methods described above which include the administration and uptake of exogenous DNA into the cells of a subject (i.e., gene transduction or transfection), delivery of the compositions to cells can be via a variety of mechanisms. As one example, delivery can be via a liposome, using commercially available liposome preparations such as LIPOFECTIN, LIPOFECTAMINE (GIBCO-BRL, Inc., Gaithersburg, Md.), SUPERFECT (Qiagen, Inc., Hilden, Germany) and TRANSFECTAM (Promega Biotec, Inc., Madison, Wis.), as well as other liposomes developed according to procedures standard in the art.
In addition, the disclosed nucleic acid or vector can be delivered in vivo by electroporation, the technology for which is available from Genetronics, Inc. (San Diego, Calif.), as well as by means of a SONOPORATION machine (ImaRx Pharmaceutical Corp., Tucson, Ariz.).

170. The materials may be in solution or suspension (for example, incorporated into microparticles, liposomes, or cells). These may be targeted to a particular cell type via antibodies, receptors, or receptor ligands. The following references are examples of the use of this technology to target specific proteins to tumor tissue (Senter, et al., Bioconjugate Chem., 2:447-451, (1991); Bagshawe, K. D., Br. J. Cancer, 60:275-281, (1989); Bagshawe, et al., Br. J. Cancer, 58:700-703, (1988); Senter, et al., Bioconjugate Chem., 4:3-9, (1993); Battelli, et al., Cancer Immunol. Immunother., 35:421-425, (1992); Pietersz and McKenzie, Immunolog. Reviews, 129:57-80, (1992); and Roffler, et al., Biochem. Pharmacol., 42:2062-2065, (1991)). These techniques can be used for a variety of other specific cell types. Examples include vehicles such as “stealth” and other antibody conjugated liposomes (including lipid mediated drug targeting to colonic carcinoma), receptor mediated targeting of DNA through cell specific ligands, lymphocyte directed tumor targeting, and highly specific therapeutic retroviral targeting of murine glioma cells in vivo. The following references are examples of the use of this technology to target specific proteins to tumor tissue (Hughes et al., Cancer Research, 49:6214-6220, (1989); and Litzinger and Huang, Biochimica et Biophysica Acta, 1104:179-187,
(1992)). In general, receptors are involved in pathways of endocytosis, either constitutive or ligand induced. These receptors cluster in clathrin-coated pits, enter the cell via clathrin-coated vesicles, pass through an acidified endosome in which the receptors are sorted, and then either recycle to the cell surface, become stored intracellularly, or are degraded in lysosomes. The internalization pathways serve a variety of functions, such as nutrient uptake, removal of activated proteins, clearance of macromolecules, opportunistic entry of viruses and toxins, dissociation and degradation of ligand, and receptor-level regulation. Many receptors follow more than one intracellular pathway, depending on the cell type, receptor concentration, type of ligand, ligand valency, and ligand concentration. Molecular and cellular mechanisms of receptor-mediated endocytosis have been reviewed (Brown and Greene, DNA and Cell Biology 10:6, 399-409 (1991)).

[0189] 171. Nucleic acids that are delivered to cells and are to be integrated into the host cell genome typically contain integration sequences. These sequences are often viral related sequences, particularly when viral based systems are used. These viral integration systems can also be incorporated into nucleic acids which are to be delivered using a non-nucleic acid based system of delivery, such as a liposome, so that the nucleic acid contained in the delivery system can become integrated into the host genome.

172. Other general techniques for integration into the host genome include, for example, systems designed to promote homologous recombination with the host genome. These systems typically rely on sequence flanking the nucleic acid to be expressed that has enough homology with a target sequence within the host cell genome that recombination between the vector nucleic acid and the target nucleic acid takes place, causing the delivered nucleic acid to be integrated into the host genome.
These systems and the methods necessary to promote homologous recombination are known to those of skill in the art.
[0191] c) In Vivo/Ex Vivo

173. As described above, the compositions can be administered in a pharmaceutically acceptable carrier and can be delivered to the subject's cells in vivo and/or ex vivo by a variety of mechanisms well known in the art (e.g., uptake of naked DNA, liposome fusion, intramuscular injection of DNA via a gene gun, endocytosis and the like).

174. If ex vivo methods are employed, cells or tissues can be removed and maintained outside the body according to standard protocols well known in the art. The compositions can be introduced into the cells via any gene transfer mechanism, such as, for example, calcium phosphate mediated gene delivery, electroporation, microinjection or proteoliposomes. The transduced cells can then be infused (e.g., in a pharmaceutically acceptable carrier) or homotopically transplanted back into the subject per standard methods for the cell or tissue type. Standard methods are known for transplantation or infusion of various cells into a subject.

5. Expression Systems

[0195] 175. The nucleic acids that are delivered to cells typically contain expression controlling systems. For example, the inserted genes in viral and retroviral systems usually contain promoters and/or enhancers to help control the expression of the desired gene product. A promoter is generally a sequence or sequences of DNA that function when in a relatively fixed location in regard to the transcription start site. A promoter contains core elements required for
basic interaction of RNA polymerase and transcription factors, and may contain upstream elements and response elements.
[0196] a) Viral Promoters and Enhancers

176. Preferred promoters controlling transcription from vectors in mammalian host cells may be obtained from various sources, for example, the genomes of viruses such as polyoma, Simian Virus 40 (SV40), adenovirus, retroviruses, hepatitis-B virus and most preferably cytomegalovirus, or from heterologous mammalian promoters, e.g. the beta actin promoter. The early and late promoters of the SV40 virus are conveniently obtained as an SV40 restriction fragment which also contains the SV40 viral origin of replication (Fiers et al., Nature, 273:113 (1978)). The immediate early promoter of the human cytomegalovirus is conveniently obtained as a HindIII E restriction fragment (Greenway, P. J. et al., Gene 18:355-360 (1982)). Of course, promoters from the host cell or related species also are useful herein.

177. Enhancer generally refers to a sequence of DNA that functions at no fixed distance from the transcription start site and can be either 5' (Laimins, L. et al., Proc. Natl. Acad. Sci. 78:993 (1981)) or 3' (Lusky, M. L., et al., Mol. Cell. Bio. 3:1108 (1983)) to the transcription unit. Furthermore, enhancers can be within an intron (Banerji, J. L. et al., Cell 33:729 (1983)) as well as within the coding sequence itself (Osborne, T. F., et al., Mol. Cell. Bio. 4:1293 (1984)). They are usually between 10 and 300 bp in length, and they function in cis. Enhancers function to increase transcription from nearby promoters. Enhancers also often contain response elements that mediate the regulation of transcription. Promoters can also contain response elements that mediate the regulation of transcription. Enhancers often determine the regulation of expression of a gene. While many enhancer sequences are now known from mammalian genes (globin, elastase, albumin, α-fetoprotein and insulin), typically one will use an enhancer from a eukaryotic cell virus for general expression.
Preferred examples are the SV40 enhancer on the late side of the replication origin (bp 100-270), the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.

178. The promoter and/or enhancer may be specifically activated either by light or by specific chemical events which trigger their function. Systems can be regulated by reagents such as tetracycline and dexamethasone. There are also ways to enhance viral vector gene expression by exposure to irradiation, such as gamma irradiation, or to alkylating chemotherapy drugs.

[0200] 179. In certain embodiments the promoter and/or enhancer region can act as a constitutive promoter and/or enhancer to maximize expression of the region of the transcription unit to be transcribed. In certain constructs the promoter and/or enhancer region can be active in all eukaryotic cell types, even if it is only expressed in a particular type of cell at a particular time. A preferred promoter of this type is the CMV promoter (650 bases). Other preferred promoters are SV40 promoters, cytomegalovirus (full length promoter), and retroviral vector LTR.
180. It has been shown that all specific regulatory elements can be cloned and used to construct expression vectors that are selectively expressed in specific cell types such as melanoma cells. The glial fibrillary acidic protein (GFAP) promoter has been used to selectively express genes in cells of glial origin.
181. Expression vectors used in eukaryotic host cells (yeast, fungi, insect, plant, animal, human or nucleated cells) may also contain sequences necessary for the termination of transcription which may affect mRNA expression. These regions are transcribed as polyadenylated segments in the untranslated portion of the mRNA encoding tissue factor protein. The 3' untranslated regions also include transcription termination sites. It is preferred that the transcription unit also contain a polyadenylation region. One benefit of this region is that it increases the likelihood that the transcribed unit will be
processed and transported like mRNA. The identification and use of polyadenylation signals in expression constructs is well established. It is preferred that homologous polyadenylation signals be used in the transgene constructs. In certain transcription units, the polyadenylation region is derived from the SV40 early polyadenylation signal and consists of about 400 bases. It is also preferred that the transcribed units contain other standard sequences, alone or in combination with the above sequences, to improve expression from, or stability of, the construct.

b) Markers

182. The viral vectors can include nucleic acid sequence encoding a marker product. This marker product is used to determine if the gene has been delivered to the cell and, once delivered, is being expressed. Preferred marker genes are the E. coli lacZ gene, which encodes β-galactosidase, and green fluorescent protein.

183. In some embodiments the marker may be a selectable marker. Examples of suitable selectable markers for mammalian cells are dihydrofolate reductase (DHFR), thymidine kinase, neomycin, neomycin analog G418, hygromycin, and puromycin. When such selectable markers are successfully transferred into a mammalian host cell, the transformed mammalian host cell can survive if placed under selective pressure. There are two widely used distinct categories of selective regimes. The first category is based on a cell’s metabolism and the use of a mutant cell line which lacks the
ability to grow independent of a supplemented media. Two examples are CHO DHFR- cells and mouse LTK- cells. These cells lack the ability to grow without the addition of such nutrients as thymidine or hypoxanthine. Because these cells lack certain genes necessary for a complete nucleotide synthesis pathway, they cannot survive unless the missing nucleotides are provided in a supplemented media. An alternative to supplementing the media is to introduce an intact DHFR or TK gene into cells lacking the respective genes, thus altering their growth requirements. Individual cells which were not transformed with the DHFR or TK gene will not be capable of survival in non-supplemented media.

184. The second category is dominant selection, which refers to a selection scheme used in any cell type and does not require the use of a mutant cell line. These schemes typically use a drug to arrest growth of a host cell. Those cells which have a novel gene would express a protein conveying drug resistance and would survive the selection. Examples of such dominant selection use the drugs neomycin (Southern, P. and Berg, P., J. Molec. Appl. Genet. 1:327 (1982)), mycophenolic acid (Mulligan, R. C. and Berg, P., Science 209:1422 (1980)) or hygromycin (Sugden, B. et al., Mol. Cell. Biol. 5:410-413 (1985)). The three examples employ bacterial genes under eukaryotic control to convey resistance to the appropriate drug G418 or neomycin (geneticin), xgpt (mycophenolic acid) or hygromycin, respectively. Others include the neomycin analog G418 and puromycin.
6. Peptides

[0208] a) Protein Variants

185. As discussed herein, the coding sequence of the peptides disclosed herein can vary widely and still be cyclized. Furthermore, the recognition sequences, which must have more specificity but which can still have some degree of variance and remain functional, are also disclosed herein. For example, there are numerous variants of the coding sequences that are known and herein contemplated. In addition to the known functional strain variants, there are derivatives of these proteins which also function in the disclosed methods and compositions. Protein variants and derivatives are well understood by those of skill in the art and can involve amino acid sequence modifications. For example, amino acid sequence modifications typically fall into one or more of three classes: substitutional, insertional or
deletional variants. Insertions include amino and/or carboxyl terminal fusions as well as intrasequence insertions of single or multiple amino acid residues. Insertions ordinarily will be smaller insertions than those of amino or carboxyl terminal fusions, for example, on the order of one to four residues. Immunogenic fusion protein derivatives, such as those described in the examples, are made by fusing a polypeptide sufficiently large to confer immunogenicity to the target sequence by cross-linking in vitro or by recombinant cell culture transformed with DNA encoding the fusion. Deletions are characterized by the removal of one or more amino acid residues from the protein sequence. Typically, no more than about from 2 to 6 residues are deleted at any one site within the protein molecule. These variants ordinarily are prepared by site specific mutagenesis of nucleotides in the DNA encoding the protein, thereby producing DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known, for example M13 primer mutagenesis and PCR mutagenesis. Amino acid substitutions are typically of single residues, but can occur at a number of different locations at once; insertions usually will be on the order of about from 1 to 10 amino acid residues; and deletions will range about from 1 to 30 residues. Deletions or insertions preferably are made in adjacent pairs, i.e. a deletion of 2 residues or insertion of 2 residues. Substitutions, deletions, insertions or
any combination thereof may be combined to arrive at a final construct. The mutations must not place the sequence out of reading frame and preferably will not create complementary regions that could produce secondary mRNA structure. Substitutional variants are those in which at least one residue has been removed and a different residue inserted in its place. Such substitutions generally are made in accordance with the following Tables 1 and 2 and are referred to as conservative substitutions.

TABLE 1
Amino Acid Abbreviations
TABLE 2
Exemplary Conservative Substitutions (others are known in the art)

Original Residue    Exemplary Conservative Substitutions
Arg                 lys; gln
Asn                 gln; his
Asp                 glu
Cys                 ser
Gln                 asn; lys
Glu                 asp
Gly                 pro
His                 asn; gln
Ile                 leu; val
Leu                 ile; val
Lys                 arg; gln
Met                 leu; ile
Phe                 met; leu; tyr
Ser                 thr
Thr                 ser
Trp                 tyr
Tyr                 trp; phe
Val                 ile; leu
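The substitution table above can be read as a simple lookup. The following is a minimal sketch, assuming the table is directional (a row lists acceptable replacements for that original residue only) and assuming three-letter residue codes; the names `CONSERVATIVE` and `is_conservative` are hypothetical and are not part of the disclosure.

```python
# Rows of the substitution table, keyed by original residue.
# Only the rows that appear in the table are included; a residue with no
# row is treated as having no listed conservative replacement.
CONSERVATIVE = {
    "Arg": {"Lys", "Gln"},
    "Asn": {"Gln", "His"},
    "Asp": {"Glu"},
    "Cys": {"Ser"},
    "Gln": {"Asn", "Lys"},
    "Glu": {"Asp"},
    "Gly": {"Pro"},
    "His": {"Asn", "Gln"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln"},
    "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr"},
    "Thr": {"Ser"},
    "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if `replacement` is listed for `original` in the table."""
    # Normalise case so "arg", "ARG" and "Arg" are treated alike.
    orig = original.strip().capitalize()
    repl = replacement.strip().capitalize()
    return repl in CONSERVATIVE.get(orig, set())


if __name__ == "__main__":
    print(is_conservative("Arg", "lys"))  # listed in the table
    print(is_conservative("Gly", "Trp"))  # not listed
```

Because the table notes that other conservative substitutions are known in the art, a `False` result here means only "not listed in this table", not "non-conservative".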
186. Substantial changes in function or immunological identity are made by selecting substitutions that are less conservative than those in Table 2, i.e., selecting residues that differ more significantly in their effect on maintaining (a) the structure of the polypeptide backbone in the area of the substitution, for example as a sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the target site or (c) the bulk of the side chain. The substitutions which in general are expected to produce the greatest changes in the protein properties will be those in which (a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine, in this case, (e) by increasing the number of sites for sulfation and/or glycosylation.
187. For example, the replacement of one amino acid residue with another that is biologically and/or chemically similar is known to those skilled in the art as a conservative substitution. For example, a conservative substitution would be replacing one hydrophobic residue with another, or one polar residue with another. The substitutions include combinations such as, for example, Gly, Ala; Val, Ile, Leu; Asp, Glu; Asn, Gln; Ser, Thr; Lys, Arg; and Phe, Tyr. Such conservatively substituted variations of each explicitly disclosed sequence are included within the mosaic polypeptides provided herein.
188. Substitutional or deletional mutagenesis can be employed to insert sites for N-glycosylation (Asn-X-Thr/Ser) or O-glycosylation (Ser or Thr). Deletions of cysteine or other labile residues also may be desirable. Deletions or substitutions of potential proteolysis sites, e.g. Arg, are accomplished, for example, by deleting one of the basic residues or substituting one with glutaminyl or histidyl residues.

189. Certain post-translational derivatizations are the result of the action of recombinant host cells on the expressed polypeptide. Glutaminyl and asparaginyl residues are frequently post-translationally deamidated to the corresponding glutamyl and aspartyl residues. Alternatively, these residues are deamidated under mildly acidic conditions. Other post-translational modifications include hydroxylation of proline and lysine, phosphorylation of hydroxyl groups of seryl or threonyl residues, methylation of the α-amino groups of lysine, arginine, and histidine side chains (T. E. Creighton, Proteins: Structure and Molecular Properties, W. H. Freeman & Co., San Francisco, pp. 79-86), acetylation of the N-terminal amine and, in some instances, amidation of the C-terminal carboxyl.

190. It is understood that one way to define the variants and derivatives of the disclosed proteins herein is through defining the variants and derivatives in terms of homology/identity to specific known sequences.

[0216] 191. Specifically disclosed are variants of these and other proteins herein disclosed which have at least 70%, 75%, 80%, 85%, 90% or 95% homology to the stated sequence. Those of skill in the art readily understand how to determine the homology of two proteins. For example, the homology can be calculated after aligning the two sequences so that the homology is at its highest level.

192. Another way of calculating homology is by published algorithms. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. U.S.A. 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by inspection.

193. The same types of homology can be obtained for nucleic acids by, for example, the algorithms disclosed in Zuker, M., Science 244:48-52, 1989; Jaeger et al., Proc. Natl. Acad. Sci. USA 86:7706-7710, 1989; and Jaeger et al., Methods Enzymol. 183:281-306, 1989, which are herein incorporated by reference for at least material related to nucleic acid alignment.
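The idea of aligning two sequences so that the match level is highest and then reporting a percentage can be illustrated with a small global alignment in the style of Needleman and Wunsch. This is a sketch under stated assumptions, not the scoring used by GAP, BESTFIT or FASTA: the match, mismatch and gap scores (+1, -1, -1) are arbitrary illustrative values, "homology" is approximated as percent identity, and the denominator is the alignment length (other conventions divide by the shorter or longer sequence). The function names are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of alignment columns holding identical residues."""
    al_a, al_b = global_align(a, b)
    if not al_a:
        return 0.0
    same = sum(1 for x, y in zip(al_a, al_b) if x == y and x != "-")
    return 100.0 * same / len(al_a)


def meets_threshold(a, b, threshold=70.0):
    """True if the two sequences reach the given percent identity."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))
    print(percent_identity("AAAA", "AAAT"))
```

For real work a maintained alignment tool with a substitution matrix and affine gap penalties would normally be preferred; the sketch only shows how a threshold such as 70% can be applied once an alignment exists.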
[0219] 194. It is understood that the description of conservative mutations and homology can be combined together in any combination, such as embodiments that have at least 70% homology to a particular sequence wherein the variants are conservative mutations.
[0220] 195. As this specification discusses various proteins and protein sequences, it is understood that the nucleic acids that can encode those protein sequences are also disclosed. This would include all degenerate sequences related to a specific protein sequence, i.e. all nucleic acids having a sequence that encodes one particular protein sequence, as well as all nucleic acids, including degenerate nucleic acids, encoding the disclosed variants and derivatives of the protein sequences. Thus, while each particular nucleic acid sequence may not be written out herein, it is understood that each and every sequence is in fact disclosed and described herein through the disclosed protein sequence. It is understood that while no amino acid sequence indicates what particular DNA sequence encodes that protein within an organism, where particular variants of a disclosed protein are disclosed herein, the known nucleic acid sequence that encodes that protein in the particular sequence from which that protein arises is also known and herein disclosed and described.
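To give a sense of how many degenerate nucleic acid sequences can encode one protein sequence, the short sketch below multiplies the number of synonymous codons at each position. It assumes the standard genetic code and one-letter amino acid codes, and it ignores the stop codon; `CODON_COUNT` and `count_encodings` are hypothetical names used only for illustration.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total; the 3 stop codons are not counted here).
CODON_COUNT = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "H": 2, "K": 2, "F": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_encodings(peptide: str) -> int:
    """Number of distinct coding sequences for a peptide (one-letter codes)."""
    # Raises KeyError for characters that are not standard amino acids.
    return math.prod(CODON_COUNT[residue] for residue in peptide.upper())


if __name__ == "__main__":
    # Met and Trp each have a single codon, so "MW" has exactly one encoding.
    print(count_encodings("MW"))
    # Leu, Ser and Arg each have six codons: 6 * 6 * 6 encodings.
    print(count_encodings("LSR"))
```

The count grows multiplicatively with peptide length, which is why the specification treats the degenerate sequences as disclosed through the protein sequence rather than writing each one out.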
[0221] 196. It is understood that there are numerous amino acid and peptide analogs which can be incorporated into the disclosed compositions. For example, there are numerous D amino acids or amino acids which have a different functional substituent than the amino acids shown in Table 1 and Table 2. The opposite stereoisomers of naturally occurring peptides are disclosed, as well as the stereoisomers of peptide analogs. These amino acids can readily be incorporated into polypeptide chains by charging tRNA molecules with the amino acid of choice and engineering genetic constructs that utilize, for example, amber codons, to insert the analog amino acid into a peptide chain in a site specific way (Thorson et al., Methods in Molec. Biol. 77:43-73 (1991); Zoller, Current Opinion in Biotechnology, 3:348-354 (1992); Ibba, Biotechnology & Genetic Engineering Reviews 13:197-216 (1995); Cahill et al., TIBS, 14(10):400-403 (1989); Benner, TIBTech, 12:158-163 (1994); Ibba and Hennecke, Bio/technology, 12:678-682 (1994), all of which are herein incorporated by reference at least for material related to amino acid analogs).

197. Molecules can be produced that resemble peptides, but which are not connected via a natural peptide linkage. For example, linkages for amino acids or amino acid analogs can include -CH2NH-, -CH2S-, -CH2-CH2-, -CH=CH- (cis and trans), -COCH2-, -CH(OH)CH2-, and -CH2SO- (these and others can be found in Spatola, A. F. in Chemistry and Biochemistry of Amino Acids, Peptides, and Proteins, B. Weinstein, eds., Marcel Dekker, New York, p. 267 (1983); Spatola, A. F., Vega Data (March 1983), Vol. 1, Issue 3, Peptide Backbone Modifications (general review); Morley, Trends Pharm Sci (1980) pp. 463-468; Hudson, D. et al., Int J Pept Prot Res 14:177-185 (1979) (-CH2NH-, -CH2CH2-); Spatola et al., Life Sci 38:1243-1249 (1986) (-CH2-S-); Hann, J. Chem. Soc. Perkin Trans. I 307-314 (1982) (-CH=CH-, cis and trans); Almquist et al., J. Med. Chem. 23:1392-1398 (1980) (-COCH2-); Jennings-White et al., Tetrahedron Lett 23:2533 (1982) (-COCH2-); Szelke et al., European Appln, EP 45665 CA (1982): 97:39405 (1982) (-CH(OH)CH2-); Holladay et al., Tetrahedron Lett 24:4401-4404 (1983) (-C(OH)CH2-); and Hruby, Life Sci 31:189-199 (1982) (-CH2-S-); each of which is incorporated herein by reference). A particularly preferred non-peptide linkage is -CH2NH-. It is understood that peptide analogs can have
more than one atom between the bond atoms, such as β-alanine, γ-aminobutyric acid, and the like.

[0223] 198. Amino acid analogs and peptide analogs often have enhanced or desirable properties, such as more economical production, greater chemical stability, enhanced pharmacological properties (half-life, absorption, potency, efficacy, etc.), altered specificity (e.g., a broad spectrum of biological activities), reduced antigenicity, and others.
199. D-amino acids can be used to generate more stable peptides, because D-amino acids are not recognized by peptidases and such. Systematic substitution of one or more amino acids of a consensus sequence with a D-amino acid of the same type (e.g., D-lysine in place of L-lysine) can be used to generate more stable peptides. Cysteine residues can be used to cyclize or attach two or more peptides together. This can be beneficial to constrain peptides into particular conformations (Rizo and Gierasch, Ann. Rev. Biochem. 61:387 (1992), incorporated herein by reference).

7. Antibodies

(1) Antibodies Generally

200. The term “antibodies” is used herein in a broad sense and includes both polyclonal and monoclonal antibodies. In addition to intact immunoglobulin molecules, also included in the term “antibodies” are fragments or polymers of those immunoglobulin molecules, and human or humanized versions of immunoglobulin molecules or fragments thereof, as long as they are chosen for their ability to interact with the cyclized peptide. The antibodies can be tested for their desired activity using the in vitro assays described herein, or by analogous methods, after which their in vivo therapeutic and/or prophylactic activities are tested according to known clinical testing methods.

201. The term “monoclonal antibody” as used herein refers to an antibody obtained from a substantially homogeneous population of antibodies, i.e., the individual antibodies within the population are identical except for possible naturally occurring mutations that may be present in a small subset of the antibody molecules.
The monoclonal antibodies herein specifically include “chimeric” antibodies in which a portion of the heavy and/or light chain is identical with or homologous to corresponding sequences in antibodies derived from a particular species or belonging to a particular antibody class or subclass, while the remainder of the chain(s) is identical with or homologous to corresponding sequences in antibodies derived from another species or belonging to another antibody class or subclass, as well as fragments of such antibodies, as long as they exhibit the desired antagonistic activity (see U.S. Pat. No. 4,816,567 and Morrison et al., Proc. Natl. Acad. Sci. USA, 81:6851-6855 (1984)).

[0229] 202. The disclosed monoclonal antibodies can be made using any procedure which produces monoclonal antibodies. For example, disclosed monoclonal antibodies can be prepared using hybridoma methods, such as those described by Kohler and Milstein, Nature, 256:495 (1975). In a hybridoma method, a mouse or other appropriate host animal is typically immunized with an immunizing agent to elicit lymphocytes that produce or are capable of producing antibodies that will specifically bind to the immunizing agent. Alternatively, the lymphocytes may be immunized in vitro, e.g., using the HIV Env CD4-co-receptor complexes described herein.
203. The monoclonal antibodies may also be made by recombinant DNA methods, such as those described in U.S. Pat. No. 4,816,567 (Cabilly et al.). DNA encoding the disclosed monoclonal antibodies can be readily isolated and sequenced using conventional procedures (e.g., by using oligonucleotide probes that are capable of binding specifically to genes encoding the heavy and light chains of murine antibodies). Libraries of antibodies or active antibody fragments can also be generated and screened using phage display techniques, e.g., as described in U.S. Pat. No. 5,804,440 to Burton et al. and U.S. Pat. No. 6,096,441 to Barbas et al.
204. In vitro methods are also suitable for preparing monovalent antibodies. Digestion of antibodies to produce fragments thereof, particularly Fab fragments, can be accomplished using routine techniques known in the art. For instance, digestion can be performed using papain. Examples of papain digestion are described in WO 94/29348, published Dec. 22, 1994, and U.S. Pat. No. 4,342,566. Papain digestion of antibodies typically produces two identical antigen binding fragments, called Fab fragments, each with a single antigen binding site, and a residual Fc fragment. Pepsin treatment yields a fragment that has two antigen combining sites and is still capable of cross-linking antigen.
205. The fragments, whether attached to other sequences or not, can also include insertions, deletions, substitutions, or other selected modifications of particular regions or specific amino acid residues, provided the activity of the antibody or antibody fragment is not significantly altered or impaired compared to the non-modified antibody or antibody fragment. These modifications can provide for some additional property, such as to remove/add amino acids capable of disulfide bonding, to increase its bio-longevity, to alter its secretory characteristics, etc. In any case, the antibody or antibody fragment must possess a bioactive property, such as specific binding to its cognate antigen. Functional or active regions of the antibody or antibody fragment may be identified by mutagenesis of a specific region of the protein, followed by expression and testing of the expressed polypeptide. Such methods are readily apparent to a skilled practitioner in the art and can include site-specific mutagenesis of the nucleic acid encoding the antibody or antibody fragment (Zoller, M. J. Curr. Opin. Biotechnol. 3:348-354, 1992).
206. As used herein, the term “antibody” or “antibodies” can also refer to a human antibody and/or a humanized antibody.
Many non-human antibodies (e.g., those derived from mice, rats, or rabbits) are naturally antigenic in humans, and thus can give rise to undesirable immune responses when administered to humans. Therefore, the use of human or humanized antibodies in the methods serves to
lessen the chance that an antibody administered to a human will evoke an undesirable immune response.
(2) Human Antibodies
207. The disclosed human antibodies can be prepared using any technique. Examples of techniques for human monoclonal antibody production include those described by Cole et al. (Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77, 1985) and by Boerner et al. (J. Immunol., 147(1):86-95, 1991). Human antibodies (and fragments thereof) can also be produced using phage display libraries (Hoogenboom et al., J. Mol. Biol., 227:381, 1991; Marks et al., J. Mol. Biol., 222:581, 1991).
208. The disclosed human antibodies can also be obtained from transgenic animals. For example, transgenic, mutant mice that are capable of producing a full repertoire of
bodies, e.g., Handbook of Monoclonal Antibodies, Ferrone et al., eds., Noyes Publications, Park Ridge, N.J., (1985) ch. 22 and pp. 303-357; Smith et al., Antibodies in Human Diagnosis and Therapy, Haber et al., eds., Raven Press, New York (1977) pp. 365-389. A typical daily dosage of the antibody used alone might range from about 1 μg/kg to up to 100 mg/kg of body weight or more per day, depending on the factors
227. The disclosed compositions and methods can also be used, for example, as tools to isolate and test new drug candidates for a variety of diseases.
9. Chips and Micro Arrays
228. Disclosed are chips where at least one address is the sequences or part of the sequences set forth in any of the nucleic acid sequences disclosed herein. Also disclosed are chips where at least one address is the sequences or portion of sequences set forth in any of the peptide sequences disclosed herein.
229. Also disclosed are chips where at least one address is a variant of the sequences or part of the sequences set forth in any of the nucleic acid sequences disclosed herein. Also disclosed are chips where at least one address is a variant of the sequences or portion of sequences set forth in any of the peptide sequences disclosed herein.
10. Computer Readable Mediums
230. It is understood that the disclosed nucleic acids and proteins can be represented as a sequence consisting of the nucleotides or amino acids. There are a variety of ways to display these sequences; for example, the nucleotide guanosine can be represented by G or g. Likewise, the amino acid valine can be represented by Val or V. Those of skill in the art understand how to display and express any nucleic acid or protein sequence in any of the variety of ways that exist, each of which is considered herein disclosed. Specifically contemplated herein is the display of these sequences on computer readable mediums, such as commercially available floppy disks, tapes, chips, hard drives, compact disks, and video disks, or other computer readable mediums. Also disclosed are the binary code representations of the disclosed sequences. Those of skill in the art understand what computer readable mediums are. Thus, also disclosed are computer readable mediums on which the nucleic acids or protein sequences are recorded, stored, or saved.
231. Disclosed are computer readable mediums comprising the sequences and information regarding the sequences set forth herein.
11. Compositions Identified by Screening with Disclosed Compositions/Combinatorial Chemistry
a) Combinatorial Chemistry/Libraries
232. The fusion polypeptides of the invention can comprise random peptides. By “random peptides” herein is meant that each peptide consists of essentially random amino acids. Since generally these random peptides (or nucleic acids, discussed below) are chemically synthesized, they may incorporate any amino acid at any position. The synthetic process can be designed to generate randomized proteins to allow the formation of all or most of the possible combinations over the length of the sequence, thus forming a library of randomized peptides.
233. This invention provides libraries of fusion polypeptides. By “library” herein is meant a sufficiently structurally diverse population of randomized expression products to effect a probabilistically sufficient range of cellular responses to provide one or more cells exhibiting a desired response. Accordingly, an interaction library must be large enough so that at least one of its members will have a structure that gives it affinity for some molecule, protein, or other factor whose activity is of interest. Although it is difficult to gauge the required absolute size of an interaction library, nature provides a hint with the immune response: a diversity of 10^7-10^8 different antibodies provides at least one combination with sufficient affinity to interact with most potential antigens faced by an organism. Published in vitro selection techniques have also shown that a library size of 10^7-10^8 is sufficient to find structures with affinity for the target. A library of all combinations of a peptide 7 to 20 amino acids in length, such as proposed here for expression in retroviruses, has the potential to code for 20^7 (10^9) to 20^20. Thus, with libraries of 10^7-10^8 per ml of retroviral particles the present methods allow a “working” subset of a theoretically complete interaction library for 7 amino acids, and a subset of shapes for the 20^20 library. Thus, in a preferred embodiment, at least 10^6, preferably at least 10^7, more preferably at least 10^8 and most preferably at least 10^9 different expression products are simultaneously analyzed in the subject methods. Preferred methods maximize library size and diversity.
234. In a preferred embodiment, libraries of all combinations of a peptide 3 to 30 amino acids in length are synthesized and analyzed as outlined herein. Libraries of smaller cyclic peptides, i.e., 3 to 4 amino acids in length, are advantageous because they are more constrained and thus there is a better chance that these libraries possess desirable pharmacokinetic properties as a consequence of their smaller size. Accordingly, the libraries of the present invention may be one of any of the following lengths: 3 amino acids, 4 amino acids, 5 amino acids, 6 amino acids, 7 amino acids, 8 amino acids, 9 amino acids, 10 amino acids, 11 amino acids, 12 amino acids, 13 amino acids, 14 amino acids, 15 amino acids, 16 amino acids, 17 amino acids, 18 amino acids, 19 amino acids, 20 amino acids, 21 amino acids, 22 amino acids, 23 amino acids, 24 amino acids, 25 amino acids, 26 amino acids, 27 amino acids, 28 amino acids, 29 amino acids and 30 amino acids in
length.
235. The invention further provides fusion nucleic acids encoding the fusion polypeptides of the invention. As will be appreciated by those in the art, due to the degeneracy of the genetic code, an extremely large number of nucleic acids may be made, all of which encode the fusion proteins of the present invention. Thus, having identified a particular amino acid sequence, those skilled in the art could make any number of different nucleic acids by simply modifying the sequence of one or more codons in a way which does not change the amino acid sequence of the fusion protein.
236. Using the nucleic acids of the present invention which encode a fusion protein, a variety of expression vectors are made. The expression vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a host genome. Generally, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the fusion protein. The term “control sequences” refers to DNA sequences necessary for the expression of an operably linked coding sequence in a particular host organism. The control sequences that are suitable for prokaryotes, for example, include a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are known to utilize promoters, polyadenylation signals, and enhancers.
237. The fusion nucleic acids are introduced into cells to screen for cyclic peptides capable of altering the phenotype of a cell. By “introduced into” or grammatical equivalents herein is meant that the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by the targeted cell type, discussed below. Exemplary methods include liposome fusion, Lipofectin®, electroporation, viral infection, etc. The fusion nucleic acids may stably integrate into the genome of the host cell, or may exist either transiently or stably in the cytoplasm (i.e., through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.). As many pharmaceutically important screens require human or model mammalian cell targets, retroviral vectors capable of transfecting such targets are preferred.
238. The fusion nucleic acids can be part of a retroviral particle which infects the cells, as described above. Generally, infection of the cells is straightforward with the application of the infection-enhancing reagent polybrene, which is a polycation that facilitates viral binding to the target cell. Infection can be optimized such that each cell generally expresses a single construct, using the ratio of virus particles to number of cells. Infection follows a Poisson distribution.
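The Poisson statement can be made concrete with a short calculation. The following Python sketch is illustrative only and is not part of the disclosed methods; it assumes that the ratio of infectious virus particles to cells (here called `moi`, a name introduced for this example) is the Poisson mean, and it reports the fraction of infected cells expected to carry exactly one construct.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k viral integrations.

    Assumes integrations per cell are Poisson distributed with mean `moi`
    (virus particles per cell); this is the modeling assumption, not a
    measured property of any particular vector.
    """
    return math.exp(-moi) * moi ** k / math.factorial(k)


def infection_summary(moi: float) -> dict:
    """Fractions of cells that are uninfected, singly and multiply infected.

    `moi` must be greater than zero.
    """
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    infected = 1.0 - p0
    return {
        "uninfected": p0,
        "single": p1,
        "multiple": infected - p1,
        # Of the cells that were infected at all, the share with one construct.
        "single_among_infected": p1 / infected,
    }


if __name__ == "__main__":
    for moi in (0.1, 0.3, 1.0):
        s = infection_summary(moi)
        print(f"ratio {moi:>4}: infected {1 - s['uninfected']:.3f}, "
              f"single among infected {s['single_among_infected']:.3f}")
```

Under this assumption, lowering the virus-to-cell ratio raises the share of infected cells that carry a single construct, at the cost of infecting fewer cells overall.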
239. The fusion nucleic acids can be introduced into cells using retroviral vectors. This is described in more detail above but is briefly reviewed again here. Currently, the most efficient gene transfer methodologies harness the capacity of engineered viruses, such as retroviruses, to bypass natural cellular barriers to exogenous nucleic acid uptake. The use of recombinant retroviruses was pioneered by Richard Mulligan and David Baltimore with the Psi-2 lines and analogous retrovirus packaging systems, based on NIH 3T3 cells (see Mann et al., Cell 33:153-159 (1983), hereby incorporated by reference). Such helper-defective packaging lines are capable of producing all the necessary trans proteins (gag, pol, and env) that are required for packaging, processing, reverse transcription, and integration of recombinant genomes. Those RNA molecules that have in cis the psi packaging signal are packaged into maturing virions. Retroviruses are preferred for a number of reasons. First, their derivation is easy. Second, unlike adenovirus-mediated gene delivery, expression from retroviruses is long-term (adenoviruses do not integrate). Adeno-associated viruses have limited space for genes and regulatory units and there is some controversy as to their ability to integrate. Retroviruses therefore offer the best current compromise in terms of long-term expression, genomic flexibility, and stable integration, among other features. The main advantage of retroviruses is that their integration into the host genome allows for their stable transmission through cell division. This ensures that in cell types which undergo multiple independent maturation steps, such as hematopoietic cell progression, the retrovirus construct will remain resident and continue to express.
240. A particularly well suited retroviral transfection system is described in Mann et al., supra; Pear et al., PNAS USA 90(18):8392-6 (1993); Kitamura et al., PNAS USA 92:9146-9150 (1995); Kinsella et al., Human Gene Therapy 7:1405-1413; Hofmann et al., PNAS USA 93:5185-5190; Choate et al., Human Gene Therapy 7:2247 (1996); and WO 94/19478; and references cited therein, all of which are incorporated by reference.
241. The disclosed compositions can be used as targets for any combinatorial technique to identify molecules or macromolecular molecules that interact with the disclosed compositions in a desired way. Also disclosed are the compositions that are identified through combinatorial techniques or screening techniques in which the compositions disclosed in SEQ ID NOS: 1-13, or portions thereof, are used as the target in a combinatorial or screening protocol.
242. It is understood that when using the disclosed compositions in combinatorial techniques or screening methods, molecules, such as macromolecular molecules, will be identified that have particular desired properties such as inhibition or stimulation of the target molecule's function.
243. It is understood that the disclosed methods for identifying molecules can be performed using high throughput means. For example, putative inhibitors can be identified using Fluorescence Resonance Energy Transfer (FRET) to quickly identify interactions. The underlying theory of the techniques is that when two molecules are close in space, i.e., interacting at a level beyond background, a signal is produced or a signal can be quenched. Then, a variety of experiments can be performed, including, for example, adding in a putative inhibitor. If the inhibitor competes with the interaction between the two signaling molecules, the signals will be removed from each other in space, and this will cause a decrease or an increase in the signal, depending on the type of signal used. This decrease or increase in signal can be correlated to the presence or absence of the putative inhibitor. Any signaling means can be used. For example, disclosed are methods of identifying an inhibitor of the interaction between any two of the disclosed molecules comprising contacting a first molecule and a second molecule together in the presence of a putative inhibitor, wherein the first molecule or second molecule comprises a fluorescence donor, wherein the first or second molecule, typically the molecule not comprising the donor, comprises a fluorescence acceptor; and measuring Fluorescence Resonance Energy Transfer (FRET) in the presence of the putative inhibitor and in the absence of the putative inhibitor, wherein a decrease in FRET in the presence of the putative inhibitor as compared to the FRET measurement in its absence indicates the putative inhibitor inhibits binding between the two molecules. This type of method can be performed with a cell system as well.
244. Combinatorial chemistry includes but is not limited to all methods for isolating small molecules or macromolecules that are capable of binding either a small molecule or another macromolecule, typically in an iterative process. Proteins, oligonucleotides, and sugars are examples of macromolecules. For example, oligonucleotide molecules with a given function, catalytic or ligand-binding, can be isolated from a complex mixture of random oligonucleotides in what has been referred to as “in vitro genetics” (Szostak, TIBS 19:89, 1992). One synthesizes a large pool of molecules bearing random and defined sequences and subjects that complex mixture, for example, approximately 10^15 individual sequences in 100 μg of a 100 nucleotide RNA, to some selection and enrichment process. Through repeated cycles of affinity chromatography and PCR amplification of the molecules bound to the ligand on the column, Ellington and Szostak (1990) estimated that 1 in 10^10 RNA molecules folded in such a way as to bind small molecule dyes. DNA molecules with such ligand-binding behavior have been isolated as well (Ellington and Szostak, 1992; Bock et al., 1992). Techniques aimed at similar goals exist for small organic molecules, proteins, antibodies and other macromolecules known to those of skill in the art. Screening sets of molecules for a desired activity, whether based on small organic libraries, oligonucleotides, or antibodies, is broadly referred to as combinatorial chemistry. Combinatorial techniques are particularly suited for defining binding interactions between molecules and for isolating molecules that have a specific binding activity, often called aptamers when the macromolecules are nucleic acids.
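The pool size quoted for 100 μg of a 100 nucleotide RNA can be checked with a back-of-envelope calculation. The Python sketch below is illustrative only: the average mass of about 330 g/mol per nucleotide is an assumed round figure, the function and constant names are introduced here for the example, and the binder frequency of 1 in 10^10 is taken from the Ellington and Szostak (1990) estimate.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
AVG_NT_MASS_G_PER_MOL = 330.0  # assumed average mass of one RNA nucleotide


def molecules_in_pool(mass_ug: float, length_nt: int,
                      avg_nt_mass: float = AVG_NT_MASS_G_PER_MOL) -> float:
    """Approximate number of RNA molecules in a pool of the given mass."""
    molar_mass = length_nt * avg_nt_mass   # g/mol for the whole strand
    moles = mass_ug * 1e-6 / molar_mass    # micrograms -> grams -> moles
    return moles * AVOGADRO


if __name__ == "__main__":
    pool = molecules_in_pool(mass_ug=100.0, length_nt=100)
    sequence_space = 4 ** 100               # all possible 100-mers
    expected_binders = pool * 1e-10          # at a frequency of 1 in 10^10

    print(f"molecules in pool:     {pool:.2e}")
    print(f"possible 100-mers:     {float(sequence_space):.2e}")
    print(f"fraction sampled:      {pool / sequence_space:.2e}")
    print(f"expected binders:      {expected_binders:.2e}")
```

Under these assumptions the pool holds on the order of 10^15 molecules, which is a vanishingly small sample of the roughly 10^60 possible 100-mers yet still large enough to contain many candidate binders at the quoted frequency.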
245. There are a number of methods for isolating proteins which either have de novo activity or a modified activity. For example, phage display libraries have been used to isolate numerous peptides that interact with a specific target. (See, for example, U.S. Pat. Nos. 6,031,071; 5,824,520; 5,596,079; and 5,565,332, which are herein incorporated by reference at least for their material related to phage display and methods related to combinatorial chemistry.)
246. A preferred method for isolating proteins that have a given function is described by Roberts and Szostak (Roberts R. W. and Szostak J. W., Proc. Natl. Acad. Sci. USA, 94(23):12997-302 (1997)). This combinatorial chemistry method couples the functional power of proteins and the genetic power of nucleic acids. An RNA molecule is generated in which a puromycin molecule is covalently attached to the 3'-end of the RNA molecule. An in vitro translation of this modified RNA molecule causes the correct protein, encoded by the RNA, to be translated. In addition, because of the attachment of the puromycin, a peptidyl acceptor which cannot be extended, the growing peptide chain is attached to the puromycin which is attached to the RNA. Thus, the protein molecule is attached to the genetic material that encodes it. Normal in vitro selection procedures can now be done to isolate functional peptides. Once the selection procedure for peptide function is complete, traditional nucleic acid manipulation procedures are performed to amplify the nucleic acid that codes for the selected functional peptides. After amplification of the genetic material, new RNA is transcribed with puromycin at the 3'-end, new peptide is translated, and another functional round of selection is performed. Thus, protein selection can be performed in an iterative manner just like nucleic acid selection techniques. The peptide which is translated is controlled by the sequence of the RNA attached to the puromycin. This sequence can be either a random sequence engineered for optimum translation (i.e., no stop codons, etc.) or a degenerate sequence of a known RNA molecule, used to look for improved or altered function of a known peptide. The conditions for nucleic acid amplification and in vitro translation are well known to those of ordinary skill in the art and are preferably performed as in Roberts and Szostak (Roberts R. W. and Szostak J. W., Proc. Natl. Acad. Sci. USA, 94(23):12997-302 (1997)).
247. Another preferred method for combinatorial methods designed to isolate peptides is described in Cohen et al. (Cohen B. A., et al., Proc. Natl. Acad. Sci. USA 95(24):14272-7 (1998)). This method utilizes and modifies two-hybrid technology. Yeast two-hybrid systems are useful for the detection and analysis of protein:protein interactions. The two-hybrid system, initially described in the yeast Saccharomyces cerevisiae, is a powerful molecular genetic technique for identifying new regulatory molecules specific to the protein of interest (Fields and Song, Nature 340:245-6 (1989)). Cohen et al. modified this technology so that novel interactions between synthetic or engineered peptide sequences could be identified which bind a molecule of choice. The benefit of this type of technology is that the selection is done in an intracellular environment. The method utilizes a library of peptide molecules that are attached to an acidic activation domain. A peptide of choice, for example an extracellular portion, is attached to a DNA binding domain of a transcriptional activation protein, such as Gal4. By performing the two-hybrid technique on this type of system, molecules that bind the extracellular portion can be identified.
248. Using methodology well known to those of skill in the art, in combination with various combinatorial libraries, one can isolate and characterize those small molecules or macromolecules which bind to or interact with the desired target. The relative binding affinity of these compounds can be compared and optimum compounds identified using competitive binding studies, which are well known to those of skill in the art.
249. Techniques for making combinatorial libraries and screening combinatorial libraries to isolate molecules which bind a desired target are well known to those of skill in the art. Representative techniques and methods can be found in but are not limited to U.S. Pat. Nos. 5,084,824, 5,288,514, 5,449,754, 5,506,337, 5,539,083, 5,545,568, 5,556,762, 5,565,324, 5,565,332, 5,573,905, 5,618,825, 5,619,680, 5,627,210, 5,646,285, 5,663,046, 5,670,326, 5,677,195, 5,683,899, 5,688,696, 5,688,997, 5,698,685, 5,712,146, 5,721,099, 5,723,598, 5,741,713, 5,792,431, 5,807,683, 5,807,754, 5,821,130, 5,831,014, 5,834,195, 5,834,318, 5,834,588, 5,840,500, 5,847,150, 5,856,107, 5,856,496, 5,859,190, 5,864,010, 5,874,443, 5,877,214, 5,880,972, 5,886,126, 5,886,127, 5,891,737, 5,916,899, 5,919,955, 5,925,527, 5,939,268, 5,942,387, 5,945,070, 5,948,696, 5,958,702, 5,958,792, 5,962,337, 5,965,719, 5,972,719, 5,976,894, 5,980,704, 5,985,356, 5,999,086, 6,001,579, 6,004,617, 6,008,321, 6,017,768, 6,025,371, 6,030,917, 6,040,193, 6,045,671, 6,045,755, 6,060,596, and 6,061,636.
250. Combinatorial libraries can be made from a
wide array of molecules using a number of different synthetic techniques. For example, libraries containing fused 2,4-pyrimidinediones (U.S. Pat. No. 6,025,371), dihydrobenzopyrans (U.S. Pat. Nos. 6,017,768 and 5,821,130), amide alcohols (U.S. Pat. No. 5,976,894), hydroxy-amino acid amides (U.S. Pat. No. 5,972,719), carbohydrates (U.S. Pat. No. 5,965,719), 1,4-benzodiazepin-2,5-diones (U.S. Pat. No. 5,962,337), cyclics (U.S. Pat. No. 5,958,792), biaryl amino acid amides (U.S. Pat. No. 5,948,696), thiophenes (U.S. Pat. No. 5,942,387), tricyclic tetrahydroquinolines (U.S. Pat. No. 5,925,527), benzofurans (U.S. Pat. No. 5,919,955), isoquinolines (U.S. Pat. No. 5,916,899), hydantoin and thiohydantoin (U.S. Pat. No. 5,859,190), indoles (U.S. Pat. No. 5,856,496), imidazol-pyrido-indole and imidazol-pyrido-benzothiophenes (U.S. Pat. No. 5,856,107), substituted 2-methylene-2,3-dihydrothiazoles (U.S. Pat. No. 5,847,150), quinolines (U.S. Pat. No. 5,840,500), PNA (U.S. Pat. No. 5,831,014), containing tags (U.S. Pat. No. 5,721,099), polyketides (U.S. Pat. No. 5,712,146), morpholino-subunits (U.S. Pat. Nos. 5,698,685 and 5,506,337), sulfamides (U.S. Pat. No. 5,618,825), and benzodiazepines (U.S. Pat. No. 5,288,514).
251. As used herein, combinatorial methods and libraries include traditional screening methods and libraries as well as methods and libraries used in iterative processes.
b) Computer Assisted Drug Design
252. The disclosed compositions can be used as targets for any molecular modeling technique to identify either the structure of the disclosed compositions or to identify potential or actual molecules, such as small molecules, which interact in a desired way with the disclosed compositions. The nucleic acids, peptides, and related molecules disclosed herein can be used as targets in any molecular modeling program or approach.
253. It is understood that when using the disclosed compositions in modeling techniques, molecules, such as macromolecular molecules, will be identified that have particular desired properties such as inhibition or stimulation of the target molecule’s function. The molecules identified and isolated when using the disclosed compositions are also disclosed. Thus, the products produced using the molecular modeling approaches that involve the disclosed compositions are also considered herein disclosed.
254. Thus, one way to isolate molecules that bind a molecule of choice is through rational design. This is achieved through structural information and computer modeling. Computer modeling technology allows visualization of the three-dimensional atomic structure of a selected molecule and the rational design of new compounds that will interact with the molecule. The three-dimensional construct typically depends on data from x-ray crystallographic analyses or NMR imaging of the selected molecule. The molecular dynamics require force field data. The computer graphics systems enable prediction of how a new compound will link to the target molecule and allow experimental manipulation of the structures of the compound and target molecule to perfect binding specificity. Prediction of what the molecule-compound interaction will be when small changes are made in one or both requires molecular mechanics software and computationally intensive computers, usually coupled with user-friendly, menu-driven interfaces between the molecular design program and the user.
255. Examples of molecular modeling systems are the CHARMm and QUANTA programs, Polygen Corporation, Waltham, Mass. CHARMm performs the energy minimization and molecular dynamics functions. QUANTA performs the construction, graphic modeling and analysis of molecular structure. QUANTA allows interactive construction, modification, visualization, and analysis of the behavior of molecules with each other.
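As a toy illustration of what energy minimization against a force field means, the Python sketch below performs steepest descent on a single Lennard-Jones pair interaction. It is not the CHARMm algorithm or its interface; the potential, the reduced units (epsilon = sigma = 1), the step size, and all function names are assumptions chosen for this example.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy U(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative dU/dr of the Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0: float, step: float = 0.01,
                      tol: float = 1e-10, max_iter: int = 10000) -> float:
    """Steepest descent on the pair distance, starting from r0.

    Moves against the gradient until it falls below `tol` or the
    iteration limit is reached.
    """
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


if __name__ == "__main__":
    r_min = minimize_distance(1.5)
    print(f"minimized distance: {r_min:.6f} (analytic 2^(1/6) = {2 ** (1 / 6):.6f})")
    print(f"energy at minimum:  {lj_energy(r_min):.6f}")
```

Real modeling packages minimize over thousands of atomic coordinates with many energy terms, but the principle is the same: follow the negative gradient of the force field energy until the forces vanish.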
256. A number of articles review computer modeling of drugs interactive with specific proteins, such as Rotivinen, et al., 1988 Acta Pharmaceutica Fennica 97, 159-166; Ripka, New Scientist 54-57 (Jun. 16, 1988); McKinlay and Rossmann, 1989 Annu. Rev. Pharmacol. Toxicol. 29, 111-122; Perry and Davies, QSAR: Quantitative Structure-Activity Relationships in Drug Design pp. 189-193 (Alan R. Liss, Inc. 1989); Lewis and Dean, 1989 Proc. R. Soc. Lond. 236, 125-140 and 141-162; and, with respect to a model enzyme for nucleic acid components, Askew, et al., 1989 J. Am. Chem. Soc. 111, 1082-1090. Other computer programs that screen and graphically depict chemicals are available from companies such as BioDesign, Inc., Pasadena, Calif., Allelix, Inc., Mississauga, Ontario, Canada, and Hypercube, Inc., Cambridge, Ontario. Although these are primarily designed for application to drugs specific to particular proteins, they can be adapted to design of molecules specifically interacting with specific regions of DNA or RNA, once that region is identified.
257. Although described above with reference to design and generation of compounds which could alter binding, one could also screen libraries of known compounds, including natural products or synthetic chemicals, and biologically active materials, including proteins, for compounds which alter substrate binding or enzymatic activity.
12. Kits
258. Disclosed herein are kits that are drawn to reagents that can be used in practicing the methods disclosed herein. For example, the kits can comprise reagents for generating libraries of cyclic peptides. The kits can include any reagent or combination of reagents discussed herein or that would be understood to be required or beneficial in the practice of the disclosed methods. For example, the kits could include the recognition sequences, such as those found in SEQ ID NOS: 1-13, as well as the buffers and enzymes required to use the sequences as intended.
13. Compositions with Similar Functions
259. It is understood that the compositions disclosed herein have certain functions, such as cyclizing peptides. Disclosed herein are certain structural requirements for performing the disclosed functions, and it is understood that there are a variety of structures which can perform the same function which are related to the disclosed structures, and that
these structures will ultimately achieve the same result, for example cyclization. These compositions are also contemplated herein.
D. Methods of Making the Compositions
260. The compositions disclosed herein and the compositions necessary to perform the disclosed methods can be made using any method known to those of skill in the art for that particular reagent or compound unless otherwise specifically noted.
1. Nucleic Acid Synthesis
261. For example, the nucleic acids, such as the oligonucleotides to be used in vectors, can be made using standard chemical synthesis methods or can be produced using enzymatic methods or any other known method. Such methods can range from standard enzymatic digestion followed by nucleotide fragment isolation (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989) Chapters 5, 6) to purely synthetic methods, for example, by the cyanoethyl phosphoramidite method using a Milligen or Beckman System 1Plus DNA synthesizer (for example, Model 8700 automated synthesizer of Milligen-Biosearch, Burlington, Mass. or ABI Model 380B). Synthetic methods useful for making oligonucleotides are also described by Ikuta et al., Ann. Rev. Biochem. 53:323-356 (1984) (phosphotriester and phosphite-triester methods), and Narang et al., Methods Enzymol., 65:610-620 (1980) (phosphotriester method). Protein nucleic acid molecules can be made using known methods such as those described by Nielsen et al., Bioconjug. Chem. 5:3-7 (1994).
2. Peptide Synthesis
262. One method of producing the disclosed peptides, such as SEQ ID NO: 1, is to link two or more peptides or polypeptides together by protein chemistry techniques. For example, peptides or polypeptides can be chemically synthesized using currently available laboratory equipment using either Fmoc (9-fluorenylmethyloxycarbonyl) or Boc (tert-butyloxycarbonyl) chemistry (Applied Biosystems, Inc., Foster City, Calif.). One skilled in the art can readily appreciate that a peptide or polypeptide corresponding to the disclosed proteins, for example, can be synthesized by standard chemical reactions. For example, a peptide or polypeptide can be synthesized and not cleaved from its synthesis resin
ods. In many embodiments of the invention, the polymer used is a biopolymer containing amino acids, i.e., a polypeptide. Polymers that may be employed in the subject methods may not contain any peptide bonds. However, in certain embodiments, the polymers may contain peptide bonds in between
 283. Compounds disclosed herein may also be used for the treatment of precancer conditions such as cervical and anal dysplasias, other dysplasias, severe dysplasias, hyper plasias, atypical hyperplasias, and neoplasias.
the first and second monomers of one or both ends of the
polymer to be cyclized.  278. A polymer of interest may be at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 monomers, or more than 12 monomers
in length, usually up to about 20, 30, 40, 50 or 100 or 1000 or more monomers in length. Accordingly, a peptide employed in the subject methods may contain at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 amino acids, or more than 12 amino acids,
usually up to about 20, 30, 40 or 50 amino acids (e.g., non-naturally occurring amino acids, naturally occurring amino acids or a mixture thereof). Polymers of particular interest are 2-50, 3-40, 4-30, 3-8, 5-20 or 6-10 monomers in length, and typically range from 500-5000 Da, 600-4000 Da, or 700-2000 Da in molecular weight.
279. The compositions can be used for example as targets in combinatorial chemistry protocols or other screening protocols to isolate molecules that possess desired functional properties, as discussed above.
280. The disclosed compositions can also be used as diagnostic tools related to diseases. The disclosed compositions can be used as discussed herein as either reagents in microarrays or as reagents to probe or analyze existing microarrays. The disclosed compositions can be used in any known method for isolating or identifying single nucleotide polymorphisms. The compositions can also be used in any known method of screening assays related to chip/microarrays. The compositions can also be used in any known way of using the computer readable embodiments of the disclosed compositions, for example, to study relatedness or to perform molecular modeling analysis related to the disclosed compositions.
 2. Method of Treating Cancer  281. The disclosed compositions can be used to treat any disease where uncontrolled cellular proliferation occurs such as cancers. A non-limiting list of different types of cancers is as follows: lymphomas (Hodgkins and non Hodgkins), leukemias, carcinomas, carcinomas of solid tis sues, squamous cell carcinomas, adenocarcinomas, sarco mas, gliomas, high grade gliomas, blastomas, neuroblastomas, plasmacytomas, histiocytomas, melanomas, adenomas, hypoxic tumours, myelomas, AIDS-related lym phomas or sarcomas, metastatic cancers, or cancers in gen
284. The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how the compounds, compositions, articles, devices and/or methods claimed herein are made and evaluated, and are intended to be purely exemplary and are not intended to limit the disclosure. Efforts have been made to ensure accuracy with respect to numbers (e.g., amounts, temperature, etc.), but some errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, temperature is in °C or is at ambient temperature, and pressure is at or near atmospheric.
1. Example 1
A Minimal Gene Set for In Vivo Production of Cyclic Peptide Libraries
a) Bacterial Strains, Plasmids, Materials and Instrumentation
285. Chemically competent TOP10, DH5α and BL21(DE3) E. coli were available from Invitrogen. DUET vectors pCDF, pRSF, pET, and pACYC were purchased from Novagen. Restriction endonucleases were obtained from NEB. DNA ligase was from Takara. Synthetic oligonucleotides were obtained from the DNA/Peptide Core Facility at the University of Utah and used without additional purification. PCR was performed using Platinum Taq HiFi DNA polymerase from Invitrogen. Isolation of plasmid DNA was by the QIAprep Spin Miniprep Kit Protocol from Qiagen. Extraction of plasmid DNA from agarose gels was done using the QIAquick Gel Extraction Kit, also available from Qiagen.
b) Preparation of Pat A-G Overexpression Constructs
a. Source DNA. A Palau reef sample of the ascidian, Lissoclinum patella, was used to amplify the whole pat cluster, as previously described. pat was cloned into the pCR2.1-TOPO vector (Invitrogen) to create TOPO-pat (Schmidt, E.
W.; Nelson, J. T.; Rasko, D. A.; Sudek, S.; Eisen, J. A.;
 282. A representative but non-limiting list of can cers that the disclosed compositions can be used to treat is the following: lymphoma, B cell lymphoma, T cell lymphoma, mycosis fungoides, Hodgkin’s Disease, myeloid leukemia, bladder cancer, brain cancer, nervous system cancer, head and neck cancer, squamous cell carcinoma of head and neck, kidney cancer, lung cancers such as small cell lung cancer and non-small cell lung cancer, neuroblastoma/glioblastoma, ovarian cancer, pancreatic cancer, prostate cancer, skin can cer, liver cancer, melanoma, squamous cell carcinomas of the mouth, throat, larynx, and lung, colon cancer, cervical cancer, cervical carcinoma, breast cancer, and epithelial cancer, renal cancer, genitourinary cancer, pulmonary cancer, esophageal carcinoma, head and neck carcinoma, large bowel cancer, hematopoietic cancers; testicular cancer; colon and rectal cancers, prostatic cancer, or pancreatic cancer.
Haygood, M. G.; Ravel, J. Proc. Natl. Acad. Sci. USA 2005, 102, 7315-7320).
286. Cloning of genes. The following general strategy was used. PCR primers were designed to contain BspHI/NotI or BspHI/EagI restriction sites for ligation into the DUET vector NcoI/NotI sites (Table 3). For cloning into the second DUET multiple cloning site, KpnI/NdeI sites were included in primers. PCR products were obtained from TOPO-pat using standard conditions, then ligated directly into pCR2.1-TOPO vector and transformed into TOP10 E. coli cells according to the manufacturer's protocol. These cells were grown in LB media with ampicillin (50 µg/mL). Products were subsequently subcloned into DUET vectors using suitable restriction endonucleases and transformed into DH5α. All cloned products were completely sequenced to verify the integrity of inserts.
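The primer design described above depends on each primer carrying its intended restriction site. As a minimal sketch, and not part of the disclosed method, the following Python snippet scans a primer for the recognition sequences of the enzymes named in this section. The recognition sequences are the standard published ones; `find_sites` is a hypothetical helper name, and the example primer is the one sequence printed in Table 3.

```python
# Recognition sequences (5'->3') of the enzymes named in the cloning strategy.
# All six sites are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "BspHI": "TCATGA",
    "NcoI": "CCATGG",
    "NotI": "GCGGCCGC",
    "EagI": "CGGCCG",
    "KpnI": "GGTACC",
    "NdeI": "CATATG",
}


def find_sites(primer: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition site found in the primer."""
    seq = primer.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = [
            i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)
        ]
        if positions:
            hits[enzyme] = positions
    return hits


# Primer sequence as printed in Table 3.
primer = "CCAACCAACATATGAACAAGAAGAACATTCTACCCC"
print(find_sites(primer))
```

For this primer the scan reports a single NdeI site, which is consistent with the KpnI/NdeI strategy for the second multiple cloning site. Because the EagI site lies inside the NotI site, any NotI site is also reported as an EagI site.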
TABLE 3 (cloning primers; only partially legible)
Column headings: Gene; Insert Size (kb); Forward Primer; Destination Vector; Rest. Site.
Genes listed: Pat A, Pat B, Pat C, Pat D, Pat E, Pat F, Pat G.
Primer sequences listed: SEQ ID NOS: 21-34.
Legible entries: Pat A, insert size 2.1 kb; restriction site BspHI; a 0.2 kb insert with primer CCAACCAACATATGAACAAGAAGAACATTCTACCCC (SEQ ID NO: 25), destination vector pRSF-Duet.
c) Expression and Purification of Secondary Metabolites
287. Expression plasmids were transformed into E. coli BL21(DE3) and grown in minimal auto-inducing media using the method of Studier (Studier, F. W. Protein Expr. Purif. 2005, 41, 207-34). Antibiotics were present at the following concentrations for the corresponding plasmids: pACYC-DUET, chloramphenicol, 12.5 µg/mL; pCDF-DUET, streptomycin, 50 µg/mL; pRSF-DUET, kanamycin, 30 µg/mL; pET-DUET, ampicillin, 50 µg/mL. After 18 hours, cultures were harvested by centrifugation for 20 min at 5,000 g.
288. Purification of recombinant cyclic peptides was achieved essentially as described above. Briefly, HP20SS resin (~10 g) was added to each 50 mL of culture supernatant and shaken vigorously for 1 h. The resin was filtered to remove media and then washed with 25% methanol in water (100 mL). Crude compound fractions were eluted by sequential washes of methanol (2×50 mL) and acetone (2×50 mL). The organic fractions were concentrated by rotary evaporation and re-suspended in a minimal volume of ethyl acetate. The organic fraction was washed with ddH2O (3×10 mL). The organic layer was then concentrated once again and the resulting crude extract subjected to analysis.
d) Analysis of Secondary Metabolite Production by HPLC-MS and NMR
289. HPLC-electrospray ionization-MS analysis was performed on a ThermoFinnigan LCQ Classic ion-trap mass spectrometer. For HPLC, an analytical C18 column (Gemini, Phenomenex) was used with a methanol-water gradient. An initial 50:50 mixture of methanol and water (each containing 0.1% formic acid) was subjected to a gradient to 95% methanol over 15 minutes, followed by 10 minutes at 95% methanol. Electrospray ionization MS was performed in the positive mode. For positive controls, authentic standards of ulithiacyclamide were injected or co-injected on the HPLC-ESI-MS instrument. Negative controls consisted either of blank runs or runs from fermentations lacking the patE gene.
e) Synthesis of patEdm
290. A cloning strategy was designed in which all eight amino acids were simultaneously swapped for new
amino acids, while the "recognition" sequences flanking the patellamide coding sequences were maintained intact. pRSF-patE was used as a template for the QuikChange Multi Site-Directed Mutagenesis kit (Stratagene) following the manufacturer's protocols. The following primers were used to effect mutation:
EBSBf: (SEQ ID NO: 15) 5′-GCATCACTTTTTGCGCTTATGATGGTGTGGAGCCATCTGAGGGCGGAC
(SEQ ID NO: 16) 5′-TTATTCACCATCGTAAGCAGGCCAGTCACCGCGTCCGCCCTGAGATGGCTCCACACCATCATAAGCGCAAAAAGTGATGC.
291. Clones were sequenced to find plasmids containing intact patEdm with no mutations. In addition, a mutant, designated patEdm", was found in which a key recognition sequence amino acid, Pº, was mutated to Q. pRSF-patEdm and pRSF-patEdm" were cloned into a strain of E. coli BL21(DE3) containing the minimal cyclization gene set (pACYC-patA, pCDF-patD-patG, and pET-patE-patF). This strain was cultivated and extracted as described above.
2. Example 2
Trichamide, a Cyclic Peptide from the Bloom-Forming Cyanobacterium Trichodesmium erythraeum Predicted from the Genome Sequence
a) Materials and Methods
(1) Bioinformatics
292. Most of the T. erythraeum IMS101 genome was shotgun sequenced by the Joint Genome Institute (JGI) and is available in GenBank (www.ncbi.nlm.nih.gov). The contig with accession number NZ_AABK04000003 contains the pat homologs listed before. Nucleotides 785,500 to 803,500 of this contig were downloaded and manually annotated in Artemis (Sanger Institute). Predicted ORFs were compared to the JGI auto-annotation and putative functions assigned by BLASTP on GenBank.
(2) Culturing
293. T. erythraeum IMS101 [Prufert-Bebout et al. 1993. Appl. Environ. Microbiol. 59: 1367-1375] was obtained. The culture is non-axenic, i.e. it contains other heterotrophic bacteria. Cultures were grown in R medium at 25 °C under a 12 hour light-dark photocycle with slow stirring as well as daily inversion of the culture flasks. R medium: 25% ddH2O and 75% natural sea water from Scripps pier are mixed and amended with 8 µM KH2PO4, 2.5 µM EDTA, 0.1 µM ferric citrate, 0.1 µM MnCl2, 10 nM Na2MoO4, 10 nM ZnSO4, 0.1 nM CoCl2, 0.1 nM NiCl2, and 0.1 nM Na2SeO3. All components are 0.2 µm filter-sterilized. T. erythraeum requires a 10% inoculum to start cultures; accordingly, 800 ml of culture were used in 8 liters of R medium. After 12-14 days, the culture was vacuum filtered through a 5 µm polycarbonate filter. T. erythraeum colonies remain on the filter, while most other bacteria do not. The cell material was rinsed off the filters into a 50 ml Falcon tube with ddH2O, immediately frozen at -80 °C and later lyophilized. The average yield was ~10 mg dried cells per liter culture volume.
(3) Extraction and Purification
294. Lyophilized cyanobacterial pellets were extracted 3x with a ~100-fold excess of methanol. The methanolic extract was dried, yielding a crude extract that was used for initial electrospray ionization mass spectrometry (ESI-MS). For Fourier-transform MS (FT-MS), the crude extract was purified with a C18 ZipTip (Millipore).
295. A portion of the crude methanolic extract (23 mg) was further purified by partitioning between ethyl acetate and water. The aqueous part was fractionated over a HP20SS column with 25, 50, 75 and 100% acetone. As determined by ESI-MS, the 25 and 50% acetone (aq.) fractions contained the 1099 peak and were combined. This combined fraction was run on a Phenomenex C18 analytical column with the following protocol (all solvents contained 0.01% trifluoroacetic acid): 5 min of water, 5-35 min gradient from 0-100% acetonitrile, 10 min of 100% acetonitrile. Fractions were collected in minute intervals. Only fractions eluting at 16-17 and 17-18 minutes contained a 1099 peak as determined by ESI-MS. These fractions did not contain a single compound, since additional peaks beside 1099 were present in the MS. The amount of material in the two HPLC fractions was too low to measure.
296. In an improved procedure, a methanolic extract (57 mg) was partially purified by step gradient on a column containing 7 g C18, using solvents containing 0.01% trifluoroacetic acid. Fractions were eluted with water, followed by 25%, 50% and 100% acetonitrile (aq). The 100% elution fraction was further purified on a Phenomenex C18 column as described before. A single peak with the correct diode array profile cleanly eluted at 16.6 min. By ESI-MS analysis, this HPLC peak contained the 1099 ion. The concentration of trichamide was below a measurable limit and was thus estimated by comparison of the diode array absorbance at 240 nm with those for standards of ulithiacyclamide at varying concentrations. This intensity depends mainly upon the concentration of thiazole, since both ulithiacyclamide and trichamide have no other chromophores at this wavelength. By this method, the total amount of trichamide isolated was estimated to be 25-50 µg.
(4) Mass Spectrometry
297. Crude extracts and partially purified fractions were monitored by ESI-MS and by FT-MS on a ThermoFinnigan LTQ-FT at 100,000 resolution (at mass 400). FT-MS/MS experiments were run with collision-induced dissociation (CID) and infrared multiphoton dissociation (IRMPD) techniques. Predicted masses were calculated using the following values: C=12, H=1.007825, N=14.003074, O=15.994914, S=31.97207.
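As an illustration of how a predicted mass follows from the atomic values listed above, the sketch below sums monoisotopic masses over a molecular formula. Only the five atomic masses come from the text; the helper name `monoisotopic_mass` and the simple flat-formula parser are assumptions made for this example. Water is used as the example formula because no molecular formula for trichamide is given in this passage.

```python
import re

# Monoisotopic atomic masses as listed in the text.
ATOMIC_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994914, "S": 31.97207}

# One element symbol followed by an optional count, e.g. "H2" or "O".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def monoisotopic_mass(formula: str) -> float:
    """Sum atomic masses for a flat formula such as 'C2H6O' (no brackets or charges)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in _TOKEN.findall(formula):
        if element not in ATOMIC_MASS:
            raise ValueError(f"no mass listed for element {element!r}")
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


print(round(monoisotopic_mass("H2O"), 6))
```

The same function can be applied to any formula built from C, H, N, O and S; charge carriers such as an added proton would have to be handled separately.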
b) Results and Discussion
(1) Biosynthetic Genes
298. Using genomic data available from GenBank, a 12.5 kb gene cluster proposed to be responsible for the biosynthesis of trichamide (hence named tri cluster) has been annotated. The % GC is 40, higher than the average % GC of T. erythraeum at 34. On both sides it is bordered by tRNA synthetase genes, potentially implicating horizontal gene transfer. The T. erythraeum genome is not closed, currently residing in 52 contigs at GenBank. The contig containing the tri genes (GenBank accession: NZ_AABK04000003) is 842 kb long and also contains a number of ribosomal proteins. A BLAST analysis of the ribosomal proteins finds similarities
in other cyanobacteria, so it is assumed that this contig is indeed from T. erythraeum and not from a possible contamination by heterotrophic bacteria.
299. The tri cluster contains 11 ORFs designated triA-K (FIG. 4 and Table 4). Four of these (triBCEF) are short and have sequence identity only to conserved hypothetical proteins, while triI is only hypothetical with no significant sequence identities. Some of these ORFs may not be actively transcribed.
302. TriD has high similarity to the N-terminal part of PatG and to oxidases. It was predicted that this part of PatG would oxidize the intermediate thiazoline rings into thiazoles.
303. BLASTP analysis of TriH and K gives homology to subtilisin-like proteases. They have high similarity to PatA and the C-terminal part of PatG. It was predicted that these proteases would be involved in the maturation of PatE by cleaving the product from leader and trailer sequence and
TABLE 4. The tri cluster proteins and their homologs (only partially legible).
Column headings: Protein; length; Homolog (GenBank); % identity/% similarity; Predicted Function.
Legible cell contents, in extraction order: 57/70; 794.178.381 of NZ_AABK04000003; ZP_00672894; hypothetical (NP_942321); TriC; hypothetical (BAB73591); TriD; (ZP_00345329); TriP; hypothetical (ZP_00675293); TriG; ZP_00672893; ZP_00672892; (AAY21151); PatG, C-terminal.
300. The product of triG is the putative precursor protein. It was identified by two 5 amino acid motifs (GPGPS, SYDGD) (SEQ ID NOS: 17 and 18) that closely resemble the proposed cyclization signal found before and after the patellamide A and C sequences in the precursor protein of patellamide biosynthesis, PatE (FIG. 5). Analogous to patellamide biosynthesis, these motifs would define the borders of the eleven amino acid peptide, GDGLHPRLCSC (SEQ ID NO: 19). TriG also contains a leader sequence of 43 amino acids without similarities in GenBank except that the first 5 amino acids are identical to those of PatE.
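The border-motif logic described above can be sketched in a few lines. The example is an illustration only: the two motifs and the eleven-residue peptide are taken from the text, but the leader used to build the toy precursor is a made-up placeholder rather than the real 43 amino acid leader, and `extract_cores` is a hypothetical helper name.

```python
import re

START_MOTIF = "GPGPS"  # motif preceding the core peptide (from the text)
STOP_MOTIF = "SYDGD"   # motif following the core peptide (from the text)


def extract_cores(
    precursor: str, start: str = START_MOTIF, stop: str = STOP_MOTIF
) -> list[str]:
    """Return every shortest segment lying between a start motif and a stop motif."""
    # The stop motif is matched with a lookahead so it is not consumed and
    # consecutive cassettes can still be found.
    pattern = re.compile(re.escape(start) + r"(.+?)(?=" + re.escape(stop) + r")")
    return [match.group(1) for match in pattern.finditer(precursor)]


# Toy precursor: placeholder leader + start motif + peptide from the text + stop motif.
PLACEHOLDER_LEADER = "MAAAAAAAAA"  # hypothetical, for illustration only
toy_precursor = PLACEHOLDER_LEADER + START_MOTIF + "GDGLHPRLCSC" + STOP_MOTIF

print(extract_cores(toy_precursor))
```

Passing different `start` and `stop` arguments lets the same function be tried with other border motifs; motifs that overlap one another would need a more careful pattern.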
301. TriA has high similarity to PatD, which is proposed to be involved in heterocyclization of cysteine and/or threonine/serine into thiazoline and oxazoline rings. The putative function was assigned on the basis of low sequence identity to previously characterized proteins: for the N-terminal part the adenylating enzyme MccB from microcin biosynthesis [Gonzalez-Pastor et al. J. Bacteriol. 177: 7131-7140], for the C-terminal part a possible hydrolase, SagD from Streptococcus iniae [Fuller et al. Infect. Immun. 70: 5730-5739].
assume the same function in trichamide biosynthesis. It is interesting to note that TriH and TriK have 48% identity to each other.
304. TriJ has 50/72% identity/similarity to PatB. There is no other homolog to either of the two proteins in GenBank. PatB is not required for biosynthesis but seems to improve patellamide yield in heterologous expression experiments with the pat cluster. The high identity between TriJ and PatB over their entire length and presence in both clusters does suggest that they serve a role in peptide biosynthesis.
305. There are few differences between the pat and tri clusters. PatG has two domains: one for oxidation and one for proteolytic cleavage. In T. erythraeum these functionalities are separated into two proteins, TriD and TriH, respectively. The only pat gene without a homolog in the tri cluster (excluding very short putative ORFs) is patF, which has no significant homologies in GenBank. Overall, the pat and tri clusters have striking similarities. The biosynthetic genes have between 45-60% identity, and both gene clusters consist of a heterocyclization enzyme, an oxidase, two proteases and patB/triJ, a gene of unknown functionality. Also, while there is variability in the length of the precursor protein, both in
Bacterial cell wall synthesis: New insights from localization studies. Microbiol. Mol. Biol. Rev. 69: 585-607). It is possible that the significant similarities between the two proteases allow them to form a dimer, which catalyzes both the hydrolysis of two peptide bonds and the cyclization in concert. It is interesting to note that the biosynthetic cluster of the linear peptide goadsporin [Onaka H., M. Nakaho, K. Hayashi, Y. Igarashi, T. Furumai. 2005. Cloning and characterization of the goadsporin biosynthetic gene cluster from Streptomyces sp. TP-A0584. Microbiology 151: 3923-3933] does not contain the two subtilisin-like proteases found in the tri and pat clusters, in agreement with an involvement of TriH/K in cyclization. Recently Milne et al. published a computational study in which preorganization of patellamides was predicted to lead to cyclization, so that an enzyme would not be required [Milne B. F., P. F. Long, A. Starcevic, D. Hranueli, M. Jaspars. 2006. Spontaneity in the patellamide biosynthetic pathway. Org. Biomol. Chem. DOI: 10.1039/b515938e]. The differences in size and sequence and the maintenance of dedicated proteases in patellamides and trichamide argue against this possibility. Finally, the absence of a PatF homolog in T. erythraeum and the requirement of PatF in patellamide biosynthesis implicate PatF in oxazoline formation, which is not part of the trichamide pathway.
312. Patellamide and trichamide biosynthesis can be examples of a more common pathway to small peptides. Besides the aforementioned goadsporin from Streptomyces sp. TP-A0584, at the time of this writing clustered ORFs with 35-40% identity to TriA and D are present in the genomes of phylogenetically distant bacteria: plut 0880 and 0878 in Pelodictyon luteolum, Chlorobia (GenBank accession: CP000096), SwolDRAFT 1502 and 1501 in Syntrophomonas wolfei, Clostridia (GenBank accession: NZ_AAJG00000002), and blr4538 and 4539 in Bradyrhizobium japonicum, Rhizobiales (GenBank accession: BA000040).
(4) Trichamide Function
313. Trichamide is hydrophilic, partitioning to the aqueous fraction relative to ethyl acetate. In addition, it is found only in the cells and is not excreted in significant quantities to the growth medium. These properties suggest an antipredation defense function, rather than anticompetitor or communication functions. To test biological activities, T. erythraeum crude methanolic extracts were tested for general cytotoxicity (HCT-116 at 10 µg/ml and CEM-TART at 5 and 50 µg/ml) and anti-HIV (1 and 10 µg/ml), antifungal (Candida albicans at 10 µg/ml) or antimicrobial (Staphylococcus aureus and Enterococcus faecium at 10 µg/ml) effects. No significant activity was found in these assays (data not shown). A number of algal blooms have neurotoxic effects, and neurotoxicity of environmental Trichodesmium sp. in mice has previously been reported. The crude methanolic extract of T. erythraeum IMS101 also exhibited neurotoxicity in a mouse assay, but purified trichamide was not the active component. Guo and Tester have found that healthy Trichodesmium sp. cells do not affect the copepod Acartia tonsa, while aged or lysed Trichodesmium cells are toxic. This result is consistent with the properties of trichamide, which suggest that the compound is maintained inside healthy cells, but would be released into seawater from lysed cells.
3. Example 3 Rapid Recombination of Secondary Metabolic Path ways in Marine Symbiotic Bacteria  a) Methods  314. Collection and processing of samples. Ascid ians were collected in Palau in 2002, the Madang region of Papua New Guinea in 2003, and the Milne Bay region of Papua New Guinea in 2005. Ascidians were monitored for the presence of Prochloron spp. by visual inspection and light microscopy. Prochloron cells have a characteristic large (10 20 ), spherical shape and have a deep green color due to the presence of both chlorophylls a and b and the lack of accessory pigments. Prochloron-containing ascidians were stored frozen (for chemical analysis), or in RNALater or ethanol for later DNA analysis. Some samples, with the exception of size-limited samples, could be enriched for Prochloron cells by simple expression of the bacteria from the organisms followed by gentle centrifugation. Such enriched Prochloron samples were stored in RNALater or processed to obtain purified DNA as previously described. RNALater stored, whole ascidian samples were ground in liquid nitro gen and processed using the Qiagen DNA Spin Kit. The presence of purified DNA was monitored by agarose gel electrophoresis.  315. Analysis of pate variability. Samples were diluted into 3 concentrations (1x, Vox and Vioox), then PCR amplification of the 3 concentrations was done using pate specific primers and HiFi Platinum Taq Polymerase (Invitro gen). Products were visualized with agarose gel electrophore sis. Bands of the appropriate size were excised and gel-ex tracted using the QIAquick Gel Extraction Kit (Qiagen), and amplified pate were direct sequenced by the Sequencing Core Facility of the University of Utah. Sequences were analyzed using Sequencher and BLAST searches. Multiple pate vari ants from the same strain were de-convoluted by visual inspection, leading to the initial identification of pate 2 pate 6. The presence of the new pate genes was confirmed by PCR using specific primers. 
[0379| 316. Pathway analysis. PCR amplification was used to test the conservation of regions flanking new pate genes. Oligonucleotides pate F and patrºR were used to amplify the two-gene fragment, pate-patrº, while pat?)F and pateR were used to amplify patl)-pate. Other primers were used also to amplify shorter fragments linking pat?) to pate. All products with the right size were direct sequenced in both directions and compared to the patellamide cluster DNA sequence.  317. Taxonomic analysis. Specific primers were used to amplify a portion of the cao gene, as previously described, and the products were treated essentially the same way as the pate products.  318. Quantitative pathway analysis. Quantitative PCR was carried out on samples 05-019 and 03-005 using
Light Cycler FastStart DNA Master” SYBR green I
(Roche) and analyzed by the standard curve method. Specific primers were designed for pate 1, pate2, and pate 3. Samples and controls were run in duplicate.  319. Chemical analysis of the samples. nine samples (05-019, 05-023, 05-028, 05-042, 03-001, 03-002, 03-009, 03-012,03-020) were processed in essentially the same man ner. First, a piece of ~10 g of the whole organism was diced and extracted twice with methanol (50 mL). Extracts were then combined, dried on a rotary evaporator, and partitioned between ethyl acetate and water. Following rotary evapora
Varian) was used with a methanol-water gradient. An initial 50:50 mixture of methanol and water (each containing 0.1% formic acid) was subjected to a gradient to 95% methanol over 15 minutes, followed by 10 minutes at 95% methanol. ESI-MS was performed in the positive mode, and selective reaction monitoring (SRM) was applied to patellamide peaks at m/z=743 and 763.
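The gradient program described above can be written as a small function of run time. This is a sketch under the assumption that the ramp from 50% to 95% methanol is linear, which the text does not state; `methanol_percent` is a hypothetical name.

```python
def methanol_percent(t_min: float) -> float:
    """Methanol percentage at time t_min for the 25-minute program described in the text."""
    ramp_end, run_end = 15.0, 25.0     # 15 min ramp, then 10 min hold
    start_pct, final_pct = 50.0, 95.0  # 50:50 start, 95% methanol plateau
    if not 0.0 <= t_min <= run_end:
        raise ValueError("time outside the 0-25 min program")
    if t_min < ramp_end:
        # Assumed linear ramp between the two stated end points.
        return start_pct + (final_pct - start_pct) * t_min / ramp_end
    return final_pct


for t in (0, 7.5, 15, 20):
    print(t, methanol_percent(t))
```

Under the linear assumption the composition rises by 3 percentage points of methanol per minute during the ramp, so a peak eluting during the hold (after 15 min) sees 95% methanol.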
b) Results and Discussion
(1) Prochloron Preparation and Purity
328. Prochloron cells were prepared from whole L. patella and determined to be >95% pure, as previously described. This 95% purity represents a conservative estimate.
 (2) Chemical Analysis of L. patella “Reef’’ and “Omodes”
 329. It was previously reported that the “reef.” sample contained patellamides A and C (Ireland, C. M., Durso, A. R., Newman, R. A. & Hacker, M. P. (1982) J. Org. Chem. 47, 1807-1811) in nearly equimolar amounts. Other patellamides were not detected as major products in the crude extract. The “Omodes” sample did not contain detectable patellamides, and this was one of the criteria used to select “reef.” for whole genome sequencing.  (3) Identification of pat Genes  330. Previously, an exhaustive PCR-based search for NRPS adenylation domains yielded only a single NRPS gene, prinA (GenBank accession number AY390470). Detailed analysis of PrnA revealed that it has the wrong domain architecture for patellamide biosynthesis. Further more, it has been found in some patellamide-producing strains but not others. It was suggested that these results could indicate that prin.A is not responsible for patellamide produc tion; alternatively, prinA-like genes could be highly variable and thus were not detected in all peptide-producers. The preliminary analysis of the draft genome sequence of P didemni showed that prinA contained the only NRPS adeny lation domain identified, bearing out the PCR data. Thus, a
co-occur by chance, this gene was identified as a candidate for the patellamide precursor peptide, pate (FIG. 16). No other oligopeptide 8-mers with identical sequence to patellamide A or C could be identified in GenBank, and the entire pate precursor peptide was not closely related to any other known or predicted CDS. Because of the low probability that these sequences could co-occur by chance, this gene was identified as a candidate for the patellamide precursor peptide, pate (FIG. 16). The presence of two peptide products on a single CDS suggests that synergy may be important to the patella mide mechanism of action (Chatterjee, C., Paul, M., Xie, L. & van der Donk, W.A. (2005) Chem. Rev. 105, 633-684).  331. Surrounding pate, there were several other CDS with intriguing sequences, comprising the patA-G genes in a ~11 kbp cluster (FIG. 17; Table 6). In particular, a protease (patA), a possible adenylating enzyme-hydrolase hybrid (pat)), and an oxidoreductase-protease hybrid (pató) immediately surround pate. Three other CDS with very low or no similarity to other proteins of known function (path, patC, and patrº) are also found in this cluster. On one side, this cluster ends with a gene that can be clearly assigned to pri mary metabolism (a DNA photolyase homolog), while on the other side a putative structural gene was identified extending approximately 1 kbp upstream of patA. These genes and the organization of the cluster are reminiscent of lantibiotic and microcin biosynthetic machinery, which has been character ized in other bacteria (Garneau, S., Martin, N. I. & Vederas, J. C. (2002) Biochemie 84, 577-592; Jack, R. W. & Jung, G. (2000) Curr. Opin. Chem. Biol. 4,310-317). In particular, the microcin B17 peptide contains heterocycles (Yorgey, P., Lee, J., Kordel, J., Vivas, E., Warner, P., Jebaratnam, D. & Kolter,
R. (1994) Proc. Natl. Acad. Sci. USA 91,4519-4523), while microcin J25 (Wilson, K.A., Kalkunt, M., Ottesen, J., Yuzen kova, J., Chait, B. T., Landick, R., Muir, T., Severinov, K. &
1187. Thiazoline oxidase/ subtilisin-like ji Olease
ribosomal synthesis of patellamides was a strong possibility. We performed a TBLASTN search of the draft genome sequence, querying for all eight possible peptides that could lead to the formation of the cyclic patellamide A. A single coding sequence (CDS) was identified, and strikingly this CDS also contained the required sequence for patellamide C. Because of the low probability that these sequences could
 332. Using PCR, four fosmid clones containing pate were identified in a 576-clone arrayed library (FIG. 18). From analysis of the fosmid end sequences, three of these (designated 21A, 28C, and 55F) were found to contain the complete pathway. Additionally, the region encompassing
patA-patG, including putative regulatory regions, was amplified by PCR from whole “reef” genomic DNA and cloned into the pCR2.1-TOPO vector (Invitrogen). 1 L cultures from these fosmids and PCR clones were extracted and partially purified. Positive controls were established by adding patellamides A and C (0.4 mg/L each) directly to E. coli culture broths containing vectors, then extracting these cultures in the same way that other samples were processed. An HPLC-ESI-MS approach was used to identify patellamides in our extracts.
[0407] 333. Two standards were used to set up MS conditions. In the first, pure patellamides A and C, positively identified by NMR (¹H and ¹³C) and mass spectrometry, were used for direct infusion and HPLC-MS experiments. In the second, a standard containing an initial 0.4 mg/L of each patellamide was used for HPLC-MS. From both standards, molecular ions for patellamides A and C could readily be recognized in the mass spectrum (FIG. 18). Partially purified samples from fosmid and PCR clones were then injected. In all cases, blank or negative runs followed the injection of standards and did not contain the relevant ions. Ions of the appropriate mass could be identified at the correct elution time from these samples, but the signal-to-noise ratio was not sufficient to conclusively prove the presence of patellamides. To confirm that these peaks resulted from patellamides, SRM was employed, a commonly used technique in which the sought ions are captured and fragmented by MS-MS. The mass spectrometer then scans only for a single daughter ion. This technique is both extremely sensitive and less subject to error because three pieces of data are obtained from a single experiment (elution time; presence of the parent ion; and fragmentation to a very specific daughter ion). Using this technique, patellamide A could be observed in the standard by monitoring for a major daughter ion at m/z=725 (FIG. 18). In addition, patellamide C was seen in the standard by monitoring for the daughter ion at m/z=680, although with much less sensitivity than for patellamide A. The patellamide A peak at m/z=725 was observed in PCR clones and in fosmid extracts in a peak centered at 20.7 min (FIG. 18), indicating that patellamide A can be heterologously produced in E. coli. In particular, a 2 L fermentation of a PCR clone led to a very clear identification of patellamide A, as shown in FIG. 18. It is estimated that at most 20 pg/L of patellamide A are produced under these conditions.
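The three-way check that SRM provides (elution time, parent ion, daughter ion) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the instrument method used in these experiments: the class and function names, the tolerances, and the parent-ion m/z value are hypothetical; only the daughter ion at m/z=725 and the 20.7 min elution time are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """Expected SRM signature for one analyte."""
    name: str
    parent_mz: float    # precursor ion m/z (ASSUMED placeholder, not given in the text)
    daughter_mz: float  # monitored daughter ion m/z (from the text)
    elution_min: float  # expected elution time in minutes (from the text)


@dataclass(frozen=True)
class Observation:
    """One observed event: which ion was selected, which fragment was seen, and when."""
    parent_mz: float
    daughter_mz: float
    elution_min: float


def srm_match(expected, observed, mz_tol=0.5, rt_tol=0.3):
    """Return (accepted, per-criterion results).

    An observation is accepted only if all three criteria agree.
    The tolerances are illustrative defaults, not values from the text.
    """
    checks = {
        "elution_time": abs(observed.elution_min - expected.elution_min) <= rt_tol,
        "parent_ion": abs(observed.parent_mz - expected.parent_mz) <= mz_tol,
        "daughter_ion": abs(observed.daughter_mz - expected.daughter_mz) <= mz_tol,
    }
    return all(checks.values()), checks


# Hypothetical usage: the parent m/z of 743.0 is an assumed placeholder.
PAT_A = Transition("patellamide A", parent_mz=743.0, daughter_mz=725.0, elution_min=20.7)
accepted, detail = srm_match(PAT_A, Observation(743.1, 725.2, 20.6))
print(accepted, detail)
```

Requiring all three criteria at once is what makes the approach less error-prone than matching a single ion mass: a co-eluting contaminant would have to share the parent mass and also fragment to the same daughter ion.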
[0408] 334. These data unambiguously confirm that the patA-G gene cluster is responsible for patellamide biosynthesis in P. didemni. Because patellamide A is produced by clones containing the ~11 kbp PCR product, we have also correctly identified the limits of the biosynthetic gene cluster.
[0409] (5) Correlation of the Presence of the pat Gene Cluster with Patellamide Production
[0410] 335. While the pat pathway could be amplified using DNA from the patellamide-producing “reef” strain, no products were amplified from the non-producing “Omodes” strain. DNA quantity and quality from these two strains were identical, as judged by multiple PCR techniques, denaturing gradient gel electrophoresis of 16S rRNA, UV spectroscopy, and quantitative gel electrophoresis. Thus, the patellamide cluster was found in a producing strain but was not present in a non-producer. Because these two strains appear to be very similar by sequencing of several gene classes (chlorophyll a oxidase, 16S rRNA, and the prin NRPS operon), it is possible that pat and similar clusters in Prochloron originate via horizontal gene transfer, as has been proposed for other lantibiotic pathways (Fomenko, D. E., Metlitskaya, A. Z., Peduzzi, J., Goulard, C., Katrukha, G. S., Gening, L. V., Rebuffat, S. & Khmel, I. A. (2004) Antimicrob. Agents Chemother. 47, 2868-2874). In fact, the cao and 16S rRNA genes are identical between Prochloron strains, while prin is >98% identical. Further research is required to determine the origin and role of these pathways in Prochloron.
[0411] (6) PatE, a Precursor Peptide Encoding Patellamides A and C
[0412] 336. patE encodes a peptide of 71 amino acids, the first 37 of which are proposed to serve as a leader sequence for processing (FIG. 16). Of the remaining 34 amino acids, 16 directly encode the patellamide C and A sequences, while 18 make up motifs that we propose direct the cyclization of patellamides. The patellamide C peptide is located 8 amino acids upstream of the patellamide A sequence. Prior to both peptides, there is a 5-amino acid conserved region consisting of the consensus G(L/V)E(A/P)S (SEQ ID NO: 40). The sequence AYDGE (SEQ ID NO: 12) terminates the patellamide A sequence and directly precedes the stop codon. Between the two patellamides, the 8 amino acid sequence AYDGVEPS (SEQ ID NO: 11) appears to encode both a start and a stop cyclization sequence, with the consensus stop sequence being AYDG(E/V) (SEQ ID NO: 41). These sequences are of biotechnological interest because they imply that diverse sequences could be synthesized to take advantage of these consensus regions, leading to the biosynthesis of a library of patellamides. It should be emphasized that the roles of these start/stop sequences are putative; further characterization is required. However, the microcin B17 prepeptide has been shown to be essential for proper post-translational modification (Madison, L. L., Vivas, E. I., Li, Y.-M., Walsh, C. T. & Kolter, R. (1997) Mol. Microbiol. 23, 161-168). Conserved residues in leader sequences are known to be important in the modification of some lantibiotics (van der Meer, J. R., Rollema, H. S., Siezen, R. J., Beerthuyzen, M. M., Kuipers, O. P. & de Vos, W. M. (1994) J. Biol. Chem. 269, 3555-3562; Xie, L., Miller, L. M., Chatterjee, C., Averin, O., Kelleher, N. L. & van der Donk, W. A. (2004) Science 303, 679-682), and a consensus sequence (GAEPR) (SEQ ID NO: 42) found in these prepeptides bears a striking resemblance to the PatE start consensus motif, G(L/V)E(A/P)S (SEQ ID NO: 40). Class I lantibiotics appear to usually possess a Pro residue at the -2 position, although in the case of nisin this Pro could be substituted with Gly and Val without impacting production. Another general feature of class I lantibiotic leader peptides also found in PatE is a high proportion of charged residues. Lantibiotics also contain C-terminal propeptide sequences that are cleaved by proteases, often in tandem with secretion from the cell.
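The proposed start and stop consensus motifs of PatE lend themselves to a simple pattern search. The sketch below is a minimal illustration, assuming that a coding cassette is whatever lies between a start motif G(L/V)E(A/P)S and the next stop motif AYDG(E/V). The function names, the placeholder leader, and the two toy cassettes are hypothetical and are not the actual PatE sequence; only the motifs and the segment lengths come from the text. Because the linker AYDGVEPS contains both a stop motif (AYDGV) and an overlapping start motif (GVEPS), the stop motif is matched with a lookahead so that it is not consumed.

```python
import re

START = r"G[LV]E[AP]S"  # start consensus G(L/V)E(A/P)S
STOP = r"AYDG[EV]"      # stop consensus AYDG(E/V)

# The lookahead (?=...) leaves the stop motif unconsumed, so the start motif
# that overlaps it inside the AYDGVEPS linker can still be found.
CASSETTE = re.compile(rf"{START}([A-Z]+?)(?={STOP})")


def find_cassettes(peptide):
    """Return the segments lying between each start motif and the next stop motif."""
    return [m.group(1) for m in CASSETTE.finditer(peptide)]


def build_core(cassettes):
    """Assemble a hypothetical core region: start motif, cassettes joined by the
    AYDGVEPS linker, and the terminal AYDGE motif."""
    return "GLEAS" + "AYDGVEPS".join(cassettes) + "AYDGE"


# Toy precursor: a 37-residue placeholder leader plus a 34-residue core region.
# The cassettes below are invented placeholders, NOT the patellamide sequences.
TOY_LEADER = "M" + "K" * 36
TOY_PRECURSOR = TOY_LEADER + build_core(["AAAACCCC", "TTTTSSSS"])

print(len(TOY_PRECURSOR))
print(find_cassettes(TOY_PRECURSOR))
```

The same `build_core` helper hints at the library idea raised above: swapping in different cassettes between the conserved motifs yields new candidate precursors, although whether the biosynthetic enzymes would accept them is exactly the open question the text flags.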
[0413] (7) The Patellamide Post-Translational Machinery
[0414] 337. The pat cluster encompasses 7 genes, patA-G, which are all transcribed in the same direction and may comprise an operon. Sequence analysis of these genes allows the proposal of a biosynthetic pathway to patellamides (FIG. 19). PatA, PatD, and PatG (Table 6) are most similar to predicted proteins found clustered in the genome of Trichodesmium erythraeum IMS101 (GenBank accession number AABK00000000). In addition, PatB also is most related to a T. erythraeum gene, although the T. erythraeum PatB homolog is not closely clustered with the PatA, PatD, and PatG homologs. The significance of this clustering in T. erythraeum is discussed in the next section.
[0415] 338. The PatA N-terminal region is similar to subtilisin-like proteases, which are usually involved in the recognition of signature sequences in hormone precursor peptides and the cleavage of these peptides near a signature motif (Schnell, N., Entian, K.-D., Schneider, U., Götz, F., Zähner, H., Kellner, R. & Jung, G. (1988) Nature 333, 276-278; van der Meer, J. R., Polman, J., Beerthuyzen, M. M., Siezen, R. J., Kuipers, O. P. & De Vos, W. M. (1993) J. Bacteriol. 175, 2578-2588). The C-terminal region of the predicted protein shares no domain homology with proteins of known function, although it is related to hypothetical protein Orf4 from the cyanobacterium Fremyella diplosiphon (Balabas, B. E., Montgomery, B. L., Ong, L. E. & Kehoe, D. M. (2003) Mol. Microbiol. 50, 781-793). The predicted protein has a proline-rich region (aa 343-401), although the significance of this motif is unknown. Over its entire length, it is 59% identical to T. erythraeum subtilisin-like serine protease ZP_00326030.1. Because of the protease sequence homology, it is proposed that PatA is involved in cleavage of the PatE precursor peptide.
[0416] 339. PatD, like PatA, appears to contain 2 domains. The N-terminal domain (PatD1) shares weak similarity with adenylating enzymes, such as acyl-CoA ligases, and with MccB, the adenylating enzyme responsible for the biosynthesis of the microcins C51 and C7 (Gonzalez-Pastor, J. E., San Millan, J. L., Castilla, M. A. & Moreno, F. (1995) J. Bacteriol. 177, 7131-7140). The PatD C-terminus (PatD2) is similar to YcaO-like conserved domains of unknown function, but also to SagD from Streptococcus iniae that may serve as a hydrolase (Fuller, J. D., Camus, A. C., Duncan, C. L., Nizet, V., Bast, D. J., Thune, R. L., Low, D. E. & De Azavedo, J. C. (2002) Infect. Immun. 70, 5730-5739). PatD2 shows similarity to TfuA, a protein involved in the synthesis of the ribosomally derived trifolitoxin (Breil, B., Borneman, J. & Triplett, E. W. (1996) J. Bacteriol. 178, 4150-4156). The entire PatD peptide sequence is similar to only a handful of proteins, including the T. erythraeum homolog, a protein annotated as AknN (a hydrolase) from Streptomyces galilaeus, and Orf12, a predicted protein of unknown function from the granaticin biosynthetic pathway (Ichinose, K., Bedford, D. J., Tornus, D., Bechthold, A., Bibb, M. J., Revill, W. P., Floss, H. G. & Hopwood, D. A. (1998) Chem. Biol. 5, 647-659). Two possible roles are thus proposed for PatD. PatD2 may be involved in the cyclization of the cysteine and threonine residues of PatE, leading to thiazoline and oxazoline ring formation. PatD1 could activate cleaved patellamide precursors as adenylates, which would then cyclize to form the final patellamide structures. Alternatively, the ATP-binding region could have an as-yet unknown function. For example, it is known that the microcin B17 heterocyclization complex includes an ATP-requiring enzyme, McbD, which is of unknown function (Milne, J. C., Roy, R. S., Eliot, A. C., Kelleher, N. L., Wokhlu, A., Nickels, B. & Walsh, C. T. (1999) Biochemistry 38, 4768-4781). PatD1 does not show significant sequence homology to McbD, but it is often the case in microcin machinery that distantly related peptides serve similar functions.
[0417] 340. PatG is a large, multi-domain predicted protein. An N-terminal domain has homology to NAD(P)H oxidoreductases (PatG1). Intriguingly, the amino-terminal region is distantly related to McbC from microcin B17 biosynthesis. McbC functions to oxidize thiazoline rings to the thiazole oxidation state, and it is likely that this is also the function of this region of PatG. This domain is also similar to an oxidase in the pathway to trifolitoxin, another thiazole-containing microcin. The C-terminal half of PatG (PatG2) is highly similar to PatA, containing subtilisin-like protease and F. diplosiphon Orf4-like regions. From this domain architecture, it appears that PatG is involved in oxidation and maturation of PatE.
[0418] 341. PatB, PatC, and PatF do not have obvious roles in patellamide biosynthesis. In addition, the protein responsible for epimerization is not evident from the sequence analysis, although it seems likely that epimerization could occur in tandem with heterocycle oxidation. The stereocenters adjacent to thiazole rings are highly labile and could also be subject to non-enzymatic epimerization. The D-Ala residues are not derived from Ser, as they are in some lantibiotic pathways (Banerjee, S. & Hansen, J. N. (1988) J. Biol. Chem. 263, 9508-9514).
[0419] (8) Related Pathways
[0420] 342. The closest homologs of the pat cluster are CDS of unknown function from the draft genome sequence of T. erythraeum IMS101. patA, patD, and patG are most similar to four clustered CDS found in T. erythraeum (see Table 6). In fact, the patG homolog in T. erythraeum is split into two separate CDS, comprising an oxidoreductase and a protease. In addition, a short peptide is present in this cluster that shares some structural features with patE. Furthermore, a transposase is found within the T. erythraeum gene cluster, possibly indicating that this cluster may move between strains by horizontal transfer. Several other CDS of unknown function and not homologous to pat genes lie within the identified cluster. In addition, streptolysin S has been known as an important mediator of pathogenesis in the “flesh-eating” Streptococcus spp. since its discovery 50 years ago (Kline, T. C. & Lewin, R. A. (1999) Symbiosis 26, 193-198), yet its structure has not been elucidated. The presence of a McbC-like oxidase and a PatD2-like hydrolase (SagD) in the streptolysin S biosynthetic gene cluster (43) indicates that streptolysin S likely contains thiazole rings. Indeed, the predicted streptolysin S prepropeptide contains numerous cysteine residues that could be cyclized.
[0421] (9) Symbiosis and Secondary Metabolism
[0422] 343. Some didemnid ascidians (including L. patella) contained bioactive secondary metabolites, while others did not contain these compounds. All of the ascidians contained Prochloron, but not all of the Prochloron contain pat-like pathways. Patellamides are often produced in large amounts (up to several percent of animal dry weight), and presumably some selection pressure must be necessary to maintain such a large-scale synthesis. Interestingly, because many Prochloron strains lack these pathways, other unknown selection mechanisms must be important to maintain symbiosis, and there are no obvious visible morphological differences between peptide producer and nonproducer ascidians.
[0423] 344. Nutrient exchange has been demonstrated to be important for some didemnid-Prochloron associations. Photosynthesis by Prochloron has been shown to provide 60-100% of the organic carbon theoretically needed by the host (Koike, I., Yamamuro, M. & Pollard, P. C. (1993) Aust. J. Mar. Fresh. Res. 44, 173-182), and there is evidence for nitrogen cycling between host and symbiont in addition to nitrogen fixation. L. patella actively optimizes growth conditions for its symbiont by moving to regions with proper illumination and by modifying the structure of the tunic covering the upper surface of the colony (Swift, H. & Robertson, D. L. (1991) Symbiosis 10, 95-113).
SEQUENCE LISTING
<160> NUMBER OF SEQ ID NOS: 53
<210> SEQ ID NO 1
<211> LENGTH: 20
<212> TYPE: PRT
Ser Tyr Asp Gly Val Asp Ala Ser Xaa Ser Tyr Asp Asp 1
1. An isolated peptide comprising an amino acid segment comprising the amino acid sequence of SEQ ID NO: 1, an amino acid sequence at least about 90% identical to the amino acid sequence of SEQ ID NO: 1, or the amino acid sequence of SEQ ID NO: 1 having one or more conservative amino acid substitutions.
2. The isolated peptide of claim 1, wherein “Nº” and “Nº”
of SEQ ID NO: 1 represent coding sequences.
3. The isolated peptide of claim 2, wherein the sequences have a length of less than 100 residues.
4. The isolated peptide of claim 2, wherein the sequences have a length of less than 50 residues.
5. The isolated peptide of claim 2, wherein the sequences have a length of less than 20 residues.
6. The isolated peptide of claim 2, wherein the sequences have a length of less than 10 residues.