4/21/2024 8:54 PM
Introduction to Bioinformatics Databases: Nucleic Acid Databases
Dinesh Gupta
ICGEB
Biological databases: why?
• The need to store and communicate large datasets has grown
• To make biological data available to scientists
• To make biological data available in computer-readable form
Different classifications of
databases
• Type of data
– nucleotide sequences
– protein sequences
– protein sequence patterns or motifs
– macromolecular 3D structure
– gene expression data
– metabolic pathways
Different classifications of databases (continued)
• Primary or derived databases
– Primary databases: experimental results deposited directly into the database
– Secondary databases: results of analysis of primary databases
– Aggregates of many databases
• Links to other data items
• Combination of data
• Consolidation of data
Different classifications of databases (continued)
• Technical design
– Flat-files
– Relational database (SQL)
– Exchange/publication technologies (FTP,
HTML, CORBA, XML,...)
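To make the flat-file design concrete, here is a minimal sketch of reading one record in the style of a GenBank flat file. The record itself is made up for illustration, and the parser is a simplification: it keeps only top-level keyword lines (those starting in the first column), ignores indented continuation lines, and treats everything between ORIGIN and the "//" terminator as sequence.

```python
# Hypothetical, simplified record in the style of a GenBank flat file.
# The accession and sequence are invented for illustration only.
RECORD = """\
LOCUS       DEMO0001       24 bp    DNA     linear   SYN
DEFINITION  Made-up example record for illustration only.
ACCESSION   DEMO0001
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_flat_record(text):
    """Return a dict of top-level fields plus the joined sequence."""
    fields = {}
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            # "//" terminates the record.
            break
        if in_sequence:
            # Drop the leading position number, keep the base blocks.
            sequence_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line and not line[0].isspace():
            # A keyword line: keyword, spaces, then the value.
            keyword, _, value = line.partition(" ")
            fields[keyword] = value.strip()
    fields["sequence"] = "".join(sequence_parts).upper()
    return fields


record = parse_flat_record(RECORD)
print(record["ACCESSION"], len(record["sequence"]))
```

The same fields could equally be loaded into a relational table or serialized as XML; the flat file is simply the form that is easiest to read and exchange as plain text.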
Different classifications of databases (continued)
• Availability
– Publicly available, no restrictions
– Available, but with copyright
– Accessible, but not downloadable
– Academic, but not freely available
– Proprietary, commercial; possibly free for
academics
Where do I find a database of interest?
http://www3.oup.co.uk/nar/database/c/
Nucleotide sequence databases
• EMBL, GenBank, and DDBJ are the three
primary nucleotide sequence
databases
• EMBL www.ebi.ac.uk/embl/
• GenBank
www.ncbi.nlm.nih.gov/Genbank/
• DDBJ www.ddbj.nig.ac.jp
GenBank
• An annotated collection of all publicly
available nucleotide and protein sequences
• Set up in 1979 at LANL (Los Alamos).
• Maintained since 1992 by NCBI (Bethesda).
• http://www.ncbi.nlm.nih.gov
EMBL Nucleotide Sequence
Database
• An annotated collection of all publicly available
nucleotide and protein sequences
• Created in 1980 at the European Molecular
Biology Laboratory in Heidelberg.
• Maintained since 1994 by the EBI (Cambridge).
• http://www.ebi.ac.uk/embl.html
http://www3.ebi.ac.uk/Services/DBStats/
DDBJ–DNA Data Bank of Japan
• An annotated collection of all publicly available
nucleotide and protein sequences
• Started in 1984 at the National Institute of
Genetics (NIG) in Mishima.
• Still maintained at this institute by a team led by
Takashi Gojobori.
• http://www.ddbj.nig.ac.jp
Other NCBI nucleic acid databases
• EST database: A collection of expressed sequence tags, or short, single-pass sequence
reads from mRNA (cDNA).
• GSS database: A database of genome survey sequences, or short, single-pass genomic
sequences.
• HomoloGene: A gene homology tool that compares nucleotide sequences between pairs of
organisms in order to identify putative orthologs.
• HTG database: A collection of high-throughput genome sequences from large-scale
genome sequencing centers, including unfinished and finished sequences.
• SNPs database: A central repository for both single-base nucleotide substitutions and
short deletion and insertion polymorphisms.
• RefSeq: A database of non-redundant reference sequence standards, including genomic
DNA contigs, mRNAs, and proteins for known genes. Multiple collaborations, both within
NCBI and with external groups, support data-gathering efforts.
• STS database: A database of sequence tagged sites, or short sequences that are
operationally unique in the genome.
• UniSTS: A unified, non-redundant view of sequence tagged sites (STSs).
• UniGene: A collection of ESTs and full-length mRNA sequences organized into clusters,
each representing a unique known or putative human gene annotated with mapping and
expression information and cross-references to other sources.
Sequence submission
• Data come mainly from direct submissions by
the authors.
• Submissions through the Internet:
– Web forms.
– Email.
• Sequences are shared/exchanged between
the three centers on a daily basis:
– The sequence content of the banks is
identical.
Derived databases
• CUTG: Codon usage tabulated from GenBank
http://www.kazusa.or.jp/codon/
• Genetic Codes: Deviations from the standard genetic code in various
organisms and organelles
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
• TIGR Gene Indices: Organism-specific databases of EST and gene
sequences
http://www.tigr.org/tdb/tgi.shtml
• UniGene: Unified clusters of ESTs and full-length mRNA sequences
http://www.ncbi.nlm.nih.gov/UniGene/
• ASAP: Alternative spliced isoforms
http://www.bioinformatics.ucla.edu/ASAP
• Intronerator: Introns and alternative splicing in C. elegans and
C. briggsae
http://www.cse.ucsc.edu/~kent/intronerator/
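As an illustration of what a derived database does, the sketch below tabulates codon usage from a coding sequence, which is the kind of summary CUTG compiles from GenBank entries. The sequence is made up, the function name is hypothetical, and the per-thousand frequency is used here as one common way of reporting codon usage; this is not the method any particular database uses.

```python
from collections import Counter


def codon_usage(cds):
    """Count in-frame codons in a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Made-up coding sequence: ATG GCT GCT GCC TAA
demo_cds = "ATGGCTGCTGCCTAA"
counts = codon_usage(demo_cds)
total = sum(counts.values())

for codon, n in sorted(counts.items()):
    # Report the raw count and the frequency per thousand codons.
    print(codon, n, round(1000 * n / total, 1))
```

Run over every coding sequence of an organism in a primary database, the same tally yields an organism-level codon usage table, which is why such resources count as derived rather than primary.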
Nucleic acid structure
databases
• NDB: Nucleic acid-containing structures
http://ndbserver.rutgers.edu/
• NTDB: Thermodynamic data for nucleic acids
http://ntdb.chem.cuhk.edu.hk/
• RNABase: RNA-containing structures from PDB and NDB
http://www.rnabase.org/
• SCOR: Structural classification of RNA: RNA motifs by
structure, function and tertiary interactions
http://scor.lbl.gov/
Database searching tips
• Look for links to Help or Examples
• Try Boolean searches
• Be careful with UK/US spelling differences
– leukaemia vs leukemia
– haemoglobin vs hemoglobin
– colour vs color
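The two tips about Boolean searches and spelling variants combine naturally: each term can be expanded into an OR group of its spellings, and the groups joined with AND. The sketch below assumes generic AND/OR syntax with parentheses; the exact operator syntax differs between databases, so check each site's Help page. The function names are hypothetical.

```python
# UK/US spelling pairs taken from the tips above.
VARIANT_GROUPS = [
    ("leukaemia", "leukemia"),
    ("haemoglobin", "hemoglobin"),
    ("colour", "color"),
]

# Map every spelling to its full group so either form can be looked up.
LOOKUP = {word: group for group in VARIANT_GROUPS for word in group}


def expand_term(term):
    """Return the term, or an OR group covering all of its known spellings."""
    group = LOOKUP.get(term.lower())
    if group is None:
        return term
    return "(" + " OR ".join(group) + ")"


def build_query(terms):
    """Join the expanded terms with AND."""
    return " AND ".join(expand_term(t) for t in terms)


query = build_query(["hemoglobin", "mouse"])
print(query)
```

Searching with the expanded query retrieves records annotated with either spelling, which a single-spelling search would miss.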
Exercises
• Study the statistics of the three primary nucleic acid
databases: do they match?
• Look for a gene of interest in the three primary
nucleic acid databases and compare the information given in
each of them.
• Read the NAR database paper and browse the NAR database index site: search for
different nucleic acid databases using different
search terms.
• Self study:
– http://www3.oup.co.uk/nar/database/c/
– Download NAR database paper (NARDB2004) from:
ftp://cbag.sc.mahidol.ac.th/pub/Course_Materials/dinesh
