The annotation of Plant Proteins in
           UniProtKB
                     Michel Schneider

     Plant protein annotation program, Swiss-Prot group
               Swiss Institute of Bioinformatics
                     Geneva, Switzerland
                 Michel.Schneider@isb-sib.ch
1. The UniProt consortium and its products

2. Content of an entry in UniProtKB and manual curation

3. Complete proteomes and reference proteomes

4. Synchronization between UniProtKB and TAIR

5. Some statistics




        “Pioneers at the Heart of Science” 1998 – 2008
                         PAG XX, San Diego, January 15, 2012
The UniProt consortium




The missions of the UniProt consortium
Provide the scientific community with a resource of protein
sequences and functional annotation that is:

• comprehensive
• high quality
• freely accessible


Four components to fulfill specific demands
• UniProtKB, the protein knowledgebase
    - UniProtKB/Swiss-Prot: reviewed, manual curation (533’657 entries)
    - UniProtKB/TrEMBL: unreviewed, automated annotation (19 million entries)
• UniRef, sequence clusters: UniRef100, UniRef90, UniRef50
• UniMES, metagenomic and environmental sample sequences
• UniParc, sequence archive: contains current and obsolete sequences
  (29.6 million sequences)

UniProtKB, the expertly curated
component of UniProt

• The high-quality curated protein knowledge database,
  where data becomes structured knowledge




Content of an entry in UniProtKB

• Protein sequence: one gene, one species
• Protein and gene names
• Taxonomic information
• General annotation: function, subcellular location, catalytic activity,
  tissue specificity, disruption phenotype…
• Sequence annotation: PTMs, alternative splicing products, mutagenesis,
  transmembrane domains, signal peptide…
• References
• Keywords and Gene Ontology terms
• Cross-references (~130 databases)

© 2009 SIB
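
As a rough illustration of how these sections fit together, the sketch below models one entry as a Python dataclass. The class, field and method names are hypothetical and were chosen for this example only; they are not the UniProtKB flat-file or XML schema. The example values are those of the QPCT_ARATH entry used elsewhere in this talk, and every section that the slides do not give data for is simply left empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:
    """Hypothetical model of one entry: one gene in one species."""

    accession: str
    entry_name: str
    protein_names: List[str]
    gene_names: List[str]
    organism: str  # taxonomic information
    sequence: str = ""  # protein sequence
    # General annotation: topic -> free text (e.g. "Function")
    general_annotation: Dict[str, str] = field(default_factory=dict)
    # Sequence annotation: (feature type, start, end)
    sequence_annotation: List[Tuple[str, int, int]] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    go_terms: List[str] = field(default_factory=list)
    # Cross-references: database name -> identifiers in that database
    cross_references: Dict[str, List[str]] = field(default_factory=dict)

    def empty_sections(self) -> List[str]:
        """Return the names of the sections that are still unfilled."""
        sections = {
            "sequence": self.sequence,
            "general_annotation": self.general_annotation,
            "sequence_annotation": self.sequence_annotation,
            "references": self.references,
            "keywords": self.keywords,
            "go_terms": self.go_terms,
            "cross_references": self.cross_references,
        }
        return [name for name, value in sections.items() if not value]


# Identifiers taken from the slides; all other sections are left empty.
entry = Entry(
    accession="Q84WV9",
    entry_name="QPCT_ARATH",
    protein_names=["Glutaminyl-peptide cyclotransferase"],
    gene_names=["At4g25720"],
    organism="Arabidopsis thaliana",
)
print(entry.empty_sections())
```

Keeping each section as its own field makes it easy to see which parts of a record still await curation.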
Origin of the sequences in UniProtKB

• International Nucleotide Sequence Database Collaboration (INSDC)
• Ensembl or Ensembl Genomes
• RefSeq
• Direct submissions (protein sequences)
• Literature
• Protein Data Bank


The process of manual sequence curation
    1. Select entry/gene (priorities)

    2. Identify entries from same gene and homologs
       using BLAST against UniProtKB

    3. Merge entries from the same gene and same
       species into a single record

    4. Select a canonical sequence
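
Steps 3 and 4 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the curators' actual procedure: `RawEntry`, `MergedRecord` and `merge_by_gene` are hypothetical names, the accessions and sequences are invented toy values, and picking the longest sequence as canonical is only a stand-in for the curator's judgment.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class RawEntry(NamedTuple):
    accession: str
    gene: str
    species: str
    sequence: str


class MergedRecord(NamedTuple):
    gene: str
    species: str
    accessions: List[str]
    canonical: str
    other_sequences: List[str]


def merge_by_gene(entries: List[RawEntry]) -> List[MergedRecord]:
    """Merge entries that share gene and species into a single record."""
    groups: Dict[Tuple[str, str], List[RawEntry]] = defaultdict(list)
    for e in entries:
        groups[(e.gene, e.species)].append(e)

    records = []
    for (gene, species), members in groups.items():
        # Distinct sequences, longest first (placeholder rule, see lead-in).
        distinct = sorted({m.sequence for m in members},
                          key=lambda s: (-len(s), s))
        records.append(MergedRecord(
            gene=gene,
            species=species,
            accessions=[m.accession for m in members],
            canonical=distinct[0],
            other_sequences=distinct[1:],
        ))
    return records


# Invented toy data: two entries for the same gene and species, one for another species.
entries = [
    RawEntry("X1", "geneA", "Arabidopsis thaliana", "MKTAYIAK"),
    RawEntry("X2", "geneA", "Arabidopsis thaliana", "MKTAYIAKQR"),
    RawEntry("X3", "geneA", "Oryza sativa", "MKTAYLAK"),
]
records = merge_by_gene(entries)
for r in records:
    print(r.gene, r.species, r.accessions, r.canonical)
```

The grouping key is the pair (gene, species), so the same gene name in two species yields two separate records.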


Critical analysis and report of sequence discrepancies
QPCT_ARATH (Q84WV9) Glutaminyl-peptide cyclotransferase (At4g25720)




Literature-based curation
• Identify relevant papers by searching literature databases
• Read the full text of papers, then extract and summarize the
  relevant information




Controlled vocabularies
• Keywords provide a summary of the entry content
• We annotate using the Gene Ontology (GO)




UniProtKB, complete proteome
sequence sets
  • Genome completely sequenced

  • Proteins mapped to the genome

  2’902 complete proteomes

  Fully manually reviewed (e.g. S. cerevisiae)
  Partially manually reviewed (e.g. A. thaliana)
  Unreviewed (e.g. Chlorella variabilis)
UniProtKB, reference proteome
sequence sets
A reference proteome is the complete proteome of a
representative, well-studied model organism or an organism
of interest for biomedical research.

509 reference proteomes




Arabidopsis thaliana



The building of the complete proteome sequence set:

• Based on the re-annotation of the complete genome by TAIR:

  27’416 protein coding genes



UniProtKB – TAIR synchronization

Starting point (release 2011_03 - Mar 08, 2011):
• Nucleic acid databases: cDNAs, ESTs, genomic sequences
• UniProtKB/TrEMBL, unreviewed (40’574 entries)
• UniProtKB/Swiss-Prot, reviewed (10’340 entries)

Synchronization:
• Genome re-annotation: 35’386 gene products
• Temporary TrEMBL set: 33’341 entries
• 11’508 sequences: compare translations from the same gene, merge if
  100 % identical, report sequence discrepancies, align with orthologs
  and paralogs
• Correct gene models or add new isoforms: 283 corrections
• Feedback to TAIR: 90 gene models
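
The "compare translations from the same gene, merge if 100 % identical, report sequence discrepancies" rule can be sketched as follows. This is a simplified illustration, not the pipeline used for the synchronization: `compare_translations` is a hypothetical helper, the sequences are invented, and the comparison is position by position without any alignment, so an insertion or deletion inside a sequence would be reported as a run of mismatches.

```python
from typing import List, Tuple

Difference = Tuple[int, str, str]  # (1-based position, reviewed, candidate)


def compare_translations(reviewed: str,
                         candidate: str) -> Tuple[str, List[Difference]]:
    """Return ("merge", []) for identical sequences, else the differences."""
    if reviewed == candidate:
        return "merge", []

    # Mismatches over the part that both sequences cover.
    diffs: List[Difference] = [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(reviewed, candidate))
        if a != b
    ]
    # A length difference is reported as one trailing entry ("-" = absent).
    if len(reviewed) != len(candidate):
        shorter = min(len(reviewed), len(candidate))
        diffs.append((shorter + 1,
                      reviewed[shorter:] or "-",
                      candidate[shorter:] or "-"))
    return "discrepancy", diffs


# Invented toy sequences.
same = compare_translations("MKTAYIAK", "MKTAYIAK")
different = compare_translations("MKTAYIAK", "MKTSYIAKQR")
print(same)
print(different)
```

Only exact identity leads to a merge; anything else is kept as a reported discrepancy for a curator to examine.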
UniProtKB – TAIR synchronization (continued)

• Cleaned set of new TrEMBL entries (21’656 entries)

After synchronization (release 2011_12 - Dec 14, 2011):
• UniProtKB/TrEMBL, unreviewed (44’628 entries)
• UniProtKB/Swiss-Prot, reviewed (10’875 entries)

Arabidopsis thaliana, cv. Columbia, complete proteome: 32’521 entries
= cleaned set of new TrEMBL entries (21’656 entries)
+ UniProtKB/Swiss-Prot, reviewed (10’865 entries)
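
The size of the complete proteome can be checked directly from the two figures on this slide. The variable names below are chosen for this example only.

```python
# Figures from the slide (release 2011_12).
cleaned_trembl = 21_656        # cleaned set of new TrEMBL entries
reviewed_in_proteome = 10_865  # UniProtKB/Swiss-Prot, reviewed

complete_proteome = cleaned_trembl + reviewed_in_proteome
reviewed_share = reviewed_in_proteome / complete_proteome

print(complete_proteome)         # 32521
print(f"{reviewed_share:.1%}")   # 33.4%
```

About one third of the Arabidopsis thaliana complete proteome was therefore manually reviewed at that release.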
1001 Arabidopsis genomes

• Deposited in INSDC?

• Fully annotated? With CDS?

• Should we still merge all the identical sequences together?

• If they are not merged but kept separate, how do we get
  relevant BLAST results?


Some UniProtKB/Swiss-Prot Statistics
concerning plant entries
(UniProt release 2011_12 - Dec 14, 2011)


• 31’959 entries of Viridiplantae
• from 1’924 species
• 10’875 entries from Arabidopsis thaliana (with 1’219 isoforms)
• 2’823 entries from Oryza sativa subsp. japonica
• 11’897 plant entries with an EC number
• 966 different complete EC numbers
• 5’744 putative transporters or proteins involved in transport
Summary
UniProtKB/Swiss-Prot, the manually curated knowledgebase:

• Protein sequence database covering all kingdoms of life (533’657
  sequence entries; 12’664 species)
• Manually annotated
• Non-redundant: all products of one gene in one species in a single entry
• Highly cross-referenced (links to ~130 databases).

Plant protein annotation:

• Complete proteome for Arabidopsis thaliana

• Synchronization with TAIR

We need your feedback and your collaboration!

                   help@uniprot.org




Acknowledgements
SIB
Ioannis Xenarios, Lydie Bougueleret, Andrea Auchincloss, Kristian Axelsen, Delphine Baratin, Marie-Claude Blatter,
Brigitte Boeckmann, Jerven Bolleman, Laurent Bollondi, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Edouard de
Castro, Lorenzo Cerutti, Elisabeth Coudert, Béatrice Cuche, Mikael Doche, Dolnide Dornevil, Severine Duvaud, Anne
Estreicher, Livia Famiglietti, Marc Feuermann, Sebastien Gehant, Elisabeth Gasteiger, Vivienne Gerritsen, Arnaud Gos,
Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, Janet James, Florence Jungo, Guillaume Keller,
Vicente Lara, Philippe Lemercier, Damien Lieberherr, Xavier Martin, Patrick Masson, Anne Morgat, Salvo Paesano, Ivo
Pedruzzi, Sandrine Pilbout, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Bernd
Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Eleanor Stanley, André Stutz, Shyamala
Sundaram, Michael Tognolli, Laure Verbregue and Anne-Lise Veuthey

EBI
Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo Antunes,
Benoit Bely, Mark Bingley, David Binns, Lawrence Bower, Wei Mun Chan, Emily Dimmer, Francesco Fazzini, Alexander
Fedotov, John Garavelli, Leyla Garcia Castro, Rachael Huntley, Julius Jacobsen, Michael Kleen, Duncan Legge, Wudong
Liu, Jie Luo, Sandra Orchard, Samuel Patient, Klemens Pichler, Diego Poggioli, Nikolas Pontikos, Steven Rosanoff, Tony
Sawford, Harminder Sehra, Edward Turner, Matt Corbett, Mike Donnelly and Pieter van Rensburg

PIR
Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Winona C. Barker, Chuming Chen, Yongxing Chen, Pratibha Dubey,
Hongzhan Huang, Kati Laiho, Raja Mazumder, Peter McGarvey, Darren A. Natale, Thanemozhi G. Natarajan, Jules
Nchoutmboube, Natalia V. Roberts, Baris E. Suzek, Uzoamaka Ugochukwu, C. R. Vinayaka, Qinghua Wang, Yuqi Wang,
Lai-Su Yeh and Jian Zhang




                                      www.uniprot.org
UniProt is mainly supported by the National Institutes of
Health (NIH) grant 1 U41 HG006104-01. Additional support for
the EBI's involvement in UniProt comes from the NIH grant
2P41 HG02273-07. Swiss-Prot activities at the SIB are
supported by the Swiss Federal Government through the
Federal Office of Education and Science and the European
Commission contracts SLING (226073), Gen2Phen (200754)
and MICROME (222886). PIR activities are also supported by
the NIH grants 5R01GM080646-04, 3R01GM080646-04S2,
1G08LM010720-01, and 3P20RR016472-09S2, and NSF grant
DBI-0850319.



Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 

Recently uploaded (20)

QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 

The annotation of plant proteins in UniProtKB

  • 1. The annotation of Plant Proteins in UniProtKB Michel Schneider Plant protein annotation program, Swiss-Prot group Swiss Institute of Bioinformatics Geneva, Switzerland Michel.Schneider@isb-sib.ch
  • 2. 1. The UniProt consortium and its products 2. Content of an entry in UniProtKB and manual curation 3. Complete proteomes and reference proteomes 4. Synchronization between UniProtKB and TAIR 5. Some statistics “Pioneers at the Heart of Science” 1998 – 2008 PAG XX, San Diego, January 15, 2012
  • 3. The UniProt consortium
  • 4. The missions of the UniProt consortium: provide the scientific community with a resource of protein sequence and functional annotation which has to be comprehensive, high quality and freely accessible.
  • 5. Four components to fulfill specific demands. UniProtKB (Protein Knowledgebase): UniProtKB/Swiss-Prot, reviewed, manual curation (533’657 entries); UniProtKB/TrEMBL, unreviewed, automated annotation (19 million entries). UniRef (sequence clusters): UniRef100, UniRef90, UniRef50. UniMes: metagenomic and environmental sample sequences. UniParc (sequence archive): contains current and obsolete sequences (29.6 million sequences).
  • 6. UniProtKB, the expertly curated component of UniProt: the high-quality curated protein knowledge database where data becomes structured knowledge
  • 7. UniProtKB, the expertly curated component of UniProt. Shigeo Fukuda
  • 8. Protein sequence (One gene - One species). © 2009 SIB
  • 9. Protein sequence (One gene - One species). Protein and gene names. Taxonomic information.
  • 10. Protein sequence (One gene - One species). Protein and gene names. Taxonomic information. Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide…
  • 11. Protein sequence (One gene - One species). Protein and gene names. Taxonomic information. General annotation: Function, Subcellular location, Catalytic activity, Tissue specificity, Disruption phenotype… Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide…
  • 12. Protein sequence (One gene - One species). Protein and gene names. Taxonomic information. General annotation: Function, Subcellular location, Catalytic activity, Tissue specificity, Disruption phenotype… Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide… References.
  • 13. Protein sequence (One gene - One species). Protein and gene names. Taxonomic information. General annotation: Function, Subcellular location, Catalytic activity, Tissue specificity, Disruption phenotype… Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide… References. Keywords - Gene Ontology.
  • 14. Protein sequence (One gene - One species). Protein and gene names. Taxonomic information. General annotation: Function, Subcellular location, Catalytic activity, Tissue specificity, Disruption phenotype… Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide… References. Keywords - Gene Ontology. Cross-references (~130 databases).
  • 15. Origin of the sequences in UniProtKB: International Nucleotide Sequence Database Collection (INSDC); Ensembl or EnsemblGenomes; RefSeq; direct submissions (protein sequences); literature; Protein Data Bank
  • 16. The process of manual sequence curation: 1. Select entry/gene (priorities). 2. Identify entries from the same gene and homologs using BLAST against UniProtKB. 3. Merge entries from the same gene and same species into a single record. 4. Select a canonical sequence.
  • 17. Critical analysis and report of sequence discrepancies: QPCT_ARATH (Q84WV9), Glutaminyl-peptide cyclotransferase (At4g25720)
  • 18. Critical analysis and report of sequence discrepancies: QPCT_ARATH (Q84WV9), Glutaminyl-peptide cyclotransferase (At4g25720)
  • 20. Literature-based curation: identify relevant papers by searching literature databases; read the full text of papers, then extract and summarize relevant information
  • 21. Literature-based curation
  • 22. Literature-based curation
  • 23. Literature-based curation
  • 24. Controlled vocabularies • Keywords provide a summary of the entry content • We annotate using the Gene Ontology (GO)
  • 25. UniProtKB, complete proteome sequence sets • Genome completely sequenced • Proteins mapped to the genome. 2’902 complete proteomes: fully manually reviewed (e.g. S. cerevisiae), partially manually reviewed (e.g. A. thaliana), unreviewed (e.g. Chlorella variabilis)
  • 26. UniProtKB, reference proteome sequence sets. A reference proteome is the complete proteome of a representative, well-studied model organism or an organism of interest for biomedical research. 509 reference proteomes
  • 27. UniProtKB, complete proteome sequence sets
  • 28. Arabidopsis thaliana. The building of the complete proteome sequence set: • Based on the re-annotation of the complete genome by TAIR: 27’416 protein coding genes
  • 29. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). UniProtKB/TrEMBL, Unreviewed (40’574 entries). UniProtKB/Swiss-Prot, Reviewed (10’340 entries). Release 2011_03 - Mar 08, 2011
  • 30. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation: 35’386 gene products. Temporary TrEMBL set: 33’341 entries. UniProtKB/TrEMBL, Unreviewed (40’574 entries). UniProtKB/Swiss-Prot, Reviewed (10’340 entries).
  • 31. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation: 35’386 gene products. Temporary TrEMBL set: 33’341 entries. UniProtKB/TrEMBL, Unreviewed (40’574 entries). 11’508 sequences. UniProtKB/Swiss-Prot, Reviewed (10’340 entries). Compare translations from the same gene, merge if 100 % identical, report sequence discrepancies, align with orthologs and paralogs.
  • 32. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation. Temporary TrEMBL set. UniProtKB/TrEMBL, Unreviewed. UniProtKB/Swiss-Prot, Reviewed. Compare translations from the same gene, merge if 100 % identical, report sequence discrepancies, align with orthologs and paralogs. Feedback to TAIR: 90 gene models. Correct gene models or add new isoforms: 283 corrections.
  • 33. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation. Temporary TrEMBL set. UniProtKB/TrEMBL, Unreviewed. Cleaned set of new TrEMBL entries (21’656 entries). UniProtKB/Swiss-Prot, Reviewed.
  • 34. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation. Temporary TrEMBL set. UniProtKB/TrEMBL, Unreviewed (44’628 entries). UniProtKB/Swiss-Prot, Reviewed (10’875 entries). Cleaned set of new TrEMBL entries (21’656 entries) + UniProtKB/Swiss-Prot, Reviewed (10’865 entries). Release 2011_12 - Dec 14, 2011. Arabidopsis thaliana, cv. Columbia, complete proteome: 32’521 entries
  • 35. 1001 Arabidopsis genomes • Deposited to INSDC? • Fully annotated? With CDS? • Should we still merge all the identical sequences together? • If they are not merged but kept separate, how to get relevant BLAST results?
  • 36. Some UniProtKB/Swiss-Prot statistics concerning plant entries (UniProt release 2011_12 - Dec 14, 2011) • 31’959 entries of Viridiplantae • from 1’924 species • 10’875 entries from Arabidopsis thaliana (with 1’219 isoforms) • 2’823 entries from Oryza sativa subsp. japonica • 11’897 plant entries with an EC number • 966 different complete EC numbers • 5’744 putative transporters or proteins involved in transport
  • 37. Summary. UniProtKB/Swiss-Prot, the manually curated knowledgebase: • Protein sequence database covering all kingdoms of life (533’657 sequence entries; 12’664 species) • Manually annotated • Non-redundant: all products of one gene in one species in a single entry • Highly cross-referenced (links to ~130 databases). Plant protein annotation: • Complete proteome for Arabidopsis thaliana • Synchronization with TAIR
  • 38. We need your feedback and your collaboration! help@uniprot.org
  • 39. Acknowledgements SIB Ioannis Xenarios, Lydie Bougueleret, Andrea Auchincloss, Kristian Axelsen, Delphine Baratin, Marie-Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Laurent Bollondi, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Edouard de Castro, Lorenzo Cerutti, Elisabeth Coudert, Béatrice Cuche, Mikael Doche, Dolnide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Sebastien Gehant, Elisabeth Gasteiger, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, Janet James, Florence Jungo, Guillaume Keller, Vicente Lara, Philippe Lemercier, Damien Lieberherr, Xavier Martin, Patrick Masson, Anne Morgat, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Eleanor Stanley, André Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue and Anne-Lise Veuthey EBI Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo Antunes, Benoit Bely, Mark Bingley, David Binns, Lawrence Bower, Wei Mun Chan, Emily Dimmer, Francesco Fazzini, Alexander Fedotov, John Garavelli, Leyla Garcia Castro, Rachael Huntley, Julius Jacobsen, Michael Kleen, Duncan Legge, Wudong Liu, Jie Luo, Sandra Orchard, Samuel Patient, Klemens Pichler, Diego Poggioli, Nikolas Pontikos, Steven Rosanoff, Tony Sawford, Harminder Sehra, Edward Turner, Matt Corbett, Mike Donnelly and Pieter van Rensburg PIR Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Winona C. Barker, Chuming Chen, Yongxing Chen, Pratibha Dubey, Hongzhan Huang, Kati Laiho, Raja Mazumder, Peter McGarvey, Darren A. Natale, Thanemozhi G. Natarajan, Jules Nchoutmboube, Natalia V. Roberts, Baris E. Suzek, Uzoamaka Ugochukwu, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Lai-Su Yeh and Jian Zhang www.uniprot.org
  • 40. UniProt is mainly supported by the National Institutes of Health (NIH) grant 1 U41 HG006104-01. Additional support for the EBI's involvement in UniProt comes from the NIH grant 2P41 HG02273-07. Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science and the European Commission contracts SLING (226073), Gen2Phen (200754) and MICROME (222886). PIR activities are also supported by the NIH grants 5R01GM080646-04, 3R01GM080646-04S2, 1G08LM010720-01, and 3P20RR016472-09S2, and NSF grant DBI-0850319.
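
The merge step described on slides 16 and 31 (compare translations from the same gene, merge if 100 % identical, report sequence discrepancies) can be illustrated with a minimal Python sketch. This is not the UniProt curation pipeline: the function name `merge_by_gene`, the record layout and the gene and accession labels are hypothetical, and "identical" is taken here to mean an exact, case-insensitive string match.

```python
from collections import defaultdict


def merge_by_gene(records):
    """Group (gene, accession, sequence) records by gene.

    Sequences that are 100 % identical are merged under one key;
    a gene with more than one distinct sequence is flagged so that
    the discrepancy can be reviewed manually.
    """
    by_gene = defaultdict(lambda: defaultdict(list))
    for gene, accession, sequence in records:
        # Assumption: identity is an exact match after upper-casing.
        by_gene[gene][sequence.upper()].append(accession)

    result = {}
    for gene, seqs in by_gene.items():
        result[gene] = {
            "merged": {seq: sorted(accs) for seq, accs in seqs.items()},
            "discrepancy": len(seqs) > 1,
        }
    return result


# Hypothetical input: two identical translations and one variant for GENE_A.
example = [
    ("GENE_A", "ACC1", "MKT"),
    ("GENE_A", "ACC2", "mkt"),
    ("GENE_A", "ACC3", "MKV"),
    ("GENE_B", "ACC4", "MAA"),
]
merged = merge_by_gene(example)
```

In this sketch a flagged gene is only reported, not resolved; choosing the canonical sequence and aligning with orthologs and paralogs remain manual steps, as on the slides.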

Editor's Notes

  1. Alignment of sequences deduced from 2 genomic DNAs, one cDNA and one EST. Annotation of erroneous gene model predictions.
  2. Annotation of isoforms
  3. Information about how to reconstruct all isoforms. Access to the sequences of all isoforms. Can apply various tools.
  4. The sequencing of 1001 Arabidopsis genomes raises several questions, and we have to find new solutions. If the sequences are not merged, one solution for BLAST is to use UniRef, but this is only valid for functional annotation and not for finding out whether a homologous protein is already known in a given species.