FASTA
(FAST-All)
• FASTA stands for “fast-all” (also written “FastA”).
• It was the first database similarity search tool developed, preceding the development of BLAST.
• FASTA is a sequence alignment tool used to search for similarities between DNA or protein sequences.
• FASTA uses a “hashing” strategy to find matches for a short stretch of identical residues of length k. Each such string of residues is known as a k-tuple or ktup; k-tuples are equivalent to words in BLAST but are normally shorter.
• Typically, a ktup is two residues long for protein sequences and six residues long for DNA sequences.
• The query sequence is thus broken down into k-tuples, and the target sequences are searched for these k-tuples to find similarities between the two.
• FASTA is a fine tool for similarity searches.
• Heuristic methods such as FASTA are not guaranteed to find the optimal alignment or true homologs, but are 50–100 times faster than dynamic programming.
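The k-tuple lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the FASTA implementation: the function names `ktup_index` and `ktup_hits` and the example sequences are hypothetical, and a plain dictionary stands in for the hash table.

```python
from collections import defaultdict

def ktup_index(seq, k):
    """Map every k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def ktup_hits(query, target, k):
    """Return (query_pos, target_pos) pairs where identical k-tuples occur."""
    index = ktup_index(query, k)
    hits = []
    for j in range(len(target) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for i in index.get(target[j:j + k], []):
            hits.append((i, j))
    return hits

# DNA example with k = 6, the typical ktup for nucleotide searches
print(ktup_hits("ACGTACGTGA", "TTACGTACGG", k=6))  # [(0, 2), (1, 3)]
```

Indexing the query once and then scanning each target in a single pass is what makes the word lookup cheap compared with filling a full dynamic programming matrix.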
• FASTA - Compares a DNA query sequence to a DNA database, or a protein query to a protein database, detecting the sequence type automatically. Versions 2 and 3 are in common use; version 3 has a highly improved score normalization method, which significantly reduces the overlap between the score distributions.
• FASTX - Compares a DNA query to a protein database. It may introduce gaps only between codons.
• FASTY - Compares a DNA query to a protein database, optimizing gap location, even within codons.
• TFASTA - Compares a protein query to a DNA database.
• It is used for the identification of the species.
• Used for the establishment of the phylogeny
• For DNA mapping
• FASTA is also used for understanding the
biochemical functions of the protein.
• Study the evolution of the species, from where
that specific species evolved, or identify the
ancestors.
• Calculation of the molecular weight
• Identification of mutations in the sequences by
comparing those sequences with the reference
sequences.
Basic steps
• Step 1: Hashing. Set a word size, usually 6 for DNA and 2 for protein. FASTA locates regions of the query sequence and matching regions in the database sequences that have high densities of exact word matches (without gaps). The length of the matched word is called the k-tuple parameter.
• Step 2: Scoring. The ten highest-scoring regions are rescored using the BLOSUM50 scoring matrix. The score of the best rescored region is saved as the init1 score.
• Step 3: Introduction of gaps. FASTA determines whether any of the initial regions from different diagonals can be joined to form an approximate alignment with gaps. Only non-overlapping regions may be joined. The score for the joined regions is the sum of the scores of the initial regions minus a joining penalty for each gap. The score of the highest-scoring region at the end of this step is saved as the initn score.
• Step 4: Alignment. After computing the initial scores, FASTA determines the best segment of similarity between the query sequence and the search set sequence, using a variation of the Smith-Waterman algorithm. The score for this alignment is the opt score.
• Step 5: Random sequence simulation. To evaluate the significance of an alignment, FASTA empirically estimates the score distribution from the alignment of many random pairs of sequences. More precisely, the characters of the query sequence are reshuffled (which preserves the bias due to length and character composition) and searched against a random subset of the database. This empirical distribution is extrapolated, assuming it is an extreme value distribution, and each alignment to the real query is assigned a Z-score and an E-value.
• Modification: in Step 4, restrict the alignment to a band around init1.
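Step 1 looks for regions dense in exact word matches. One way to picture this is that every hit between query position i and target position j lies on the diagonal j - i, so dense regions show up as diagonals with many hits. The sketch below only counts hits per diagonal; it is a simplification under that assumption (real FASTA also weighs the spacing between hits), and the names `diagonal_counts` and `best_diagonals` are hypothetical.

```python
from collections import Counter, defaultdict

def diagonal_counts(query, target, k):
    """Count exact k-tuple hits on each diagonal (target_pos - query_pos)."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    counts = Counter()
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            counts[j - i] += 1
    return counts

def best_diagonals(query, target, k, top=10):
    """Return up to `top` (diagonal, hit_count) pairs, densest first."""
    return diagonal_counts(query, target, k).most_common(top)

# With k = 4 the shared stretch ACGTACG produces four hits on diagonal 2
print(best_diagonals("ACGTACGTGA", "TTACGTACGG", k=4, top=1))  # [(2, 4)]
```

The default `top=10` mirrors the ten regions that Step 2 rescores with a substitution matrix.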
• FASTA calculates significance “on the fly”, which can be problematic if the dataset is small.
• To identify an unknown protein sequence, use FASTA3, Ssearch3 or tFastX3.
• FASTA3 has improved methods for aligning sequences and for calculating the statistical significance of alignments.
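The idea of estimating significance from shuffled sequences can be illustrated with a small sketch. Everything here is an assumption made for illustration: `shared_ktups` is a toy similarity score rather than FASTA's opt score, the query is shuffled against a single target rather than a database subset, and only a Z-score is computed (no extreme value fit or E-value).

```python
import random
import statistics

def shared_ktups(a, b, k=2):
    """Toy similarity score: number of distinct k-tuples present in both."""
    words_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(words_a & words_b)

def shuffled(seq, rng):
    """Return seq with its characters permuted (length and composition kept)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)

def z_score(real, null_scores):
    """Standard deviations by which `real` exceeds the null mean."""
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        return 0.0  # degenerate null distribution: no meaningful Z-score
    return (real - mean) / sd

def shuffle_z(query, target, n=200, seed=0):
    """Z-score of the real score against scores of n shuffled queries."""
    rng = random.Random(seed)
    real = shared_ktups(query, target)
    null = [shared_ktups(shuffled(query, rng), target) for _ in range(n)]
    return z_score(real, null)

print(z_score(10, [2, 4, 4, 4, 5, 5, 7, 9]))  # 2.5
```

With a small `n` the estimated mean and standard deviation are noisy, which is one way to see why on-the-fly statistics become unreliable for small datasets.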
• There is no standard filename extension for a text file containing FASTA-formatted sequences. The table below shows each extension and its respective meaning.
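BLAST
(Basic Local Alignment Search Tool)
• Developed by Stephen Altschul and colleagues in 1990; its statistics are based on work by Samuel Karlin and Altschul.
• Compares nucleotide or amino acid sequences.
• Is a heuristic method.
• Is a fast but approximate method of alignment.
• Locates local alignments by first finding short matches called words.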
blastp: compares a protein sequence against a
protein sequence database.
blastn: compares a nucleotide sequence against a
nucleotide sequence database.
blastx: compares a six frame translation of a
nucleotide sequence against a protein database
tblastn: compares a protein sequence against a
six frame translation of a nucleotide database
tblastx: compares a six frame translation of a
nucleotide sequence against a six frame
translation of a nucleotide database
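Several of these programs rely on a six-frame translation: three reading frames on the given strand and three on its reverse complement. The sketch below shows one way to compute it using the standard genetic code; the function names are hypothetical, incomplete trailing codons are dropped, and unknown codons are rendered as X by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna, frame=0):
    """Translate dna starting at offset `frame` (0, 1 or 2); '*' marks stops."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def six_frames(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    return [translate(s, f) for s in (dna, rc) for f in range(3)]

print(six_frames("ATGGCCTAA"))  # ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```

In this picture, blastx translates the query this way, tblastn translates the database entries, and tblastx translates both.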
• BLAST searches begin with a query sequence that is matched against sequence databases specified by the user.
• BLAST begins by breaking the query sequence into a series of short overlapping “words”.
• Default word size for nucleotide searches is 28 for megablast and 11 for the classic BLASTN algorithm.
• Default word size for BLASTP is 3 amino acids.
• Results obtained depend on the scoring matrix used.
• BLOSUM62 is the default scoring matrix for BLASTP.
Basic steps
• Step 1: Set a word size, usually 11 for DNA and 3 for protein. After filtering out low-complexity regions, compile the list of words in the query, e.g. with a word size of 2, qlnfsagw -> (ql, ln, nf, fs, sa, ag, gw). Extend the list with similar words that form high-scoring word pairs with the query words (using some threshold T).
• Step 2: Scan the database for exact matches to the words compiled in Step 1.
• Step 3: Whenever a word in the list is found, try to extend the match in both directions (no gaps) to reach a score beyond a threshold S. While extending, a parameter L defines how long an extension is tried in order to raise the score over S.
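The word list and the ungapped extension of Step 3 can be sketched as follows. This is an illustration under stated assumptions rather than BLAST's actual code: scoring is a simple +1 for a match and -1 for a mismatch instead of a substitution matrix, and the hypothetical `drop` parameter stops the extension once the running score falls too far below the best score seen, standing in for the limit on how long an extension is tried.

```python
def words(seq, w):
    """All overlapping words of length w in seq."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def extend_one_way(query, target, i, j, step, drop, match=1, mismatch=-1):
    """Extend from (i, j) in direction step; return (best_gain, best_length)."""
    score = best = 0
    best_len = n = 0
    while 0 <= i < len(query) and 0 <= j < len(target):
        score += match if query[i] == target[j] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > drop:
            break  # score has fallen too far below the best: give up
        i += step
        j += step
    return best, best_len

def extend_seed(query, target, i, j, w, drop=2):
    """Ungapped extension of an exact w-long seed at query[i], target[j].

    Returns (score, (start, end)) with the span given in query coordinates.
    """
    right, r_len = extend_one_way(query, target, i + w, j + w, +1, drop)
    left, l_len = extend_one_way(query, target, i - 1, j - 1, -1, drop)
    score = w + left + right  # each seed residue is a match worth +1
    return score, (i - l_len, i + w + r_len)

print(words("qlnfsagw", 2))  # ['ql', 'ln', 'nf', 'fs', 'sa', 'ag', 'gw']
print(extend_seed("qlnfsagw", "xqlnfsagy", 2, 3, 2))  # (7, (0, 7))
```

A hit would then be reported only if the returned score reaches the threshold S.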
• Modifications of Step 3:
- Original BLAST: extension continues as long as the score continues to increase.
- Another version, BLAST2 (gapped BLAST): a lower value of T is used; after extension, hits are combined (allowing gaps) to find the maximal scoring segment. This program uses the BLASTP or BLASTN algorithm for aligning two sequences.
• BLAST calculates probabilities, and this can fail if some assumptions are invalid for a given search.
• There are versions of BLAST for searching nucleic acid and protein databases, some of which translate DNA sequences before comparing them to protein sequence databases.
• Improvements introduced in 1997 are GAPPED-BLAST (three times faster than the original BLAST) and PSI-BLAST (position-specific iterated BLAST).
• The GAPPED-BLAST algorithm allows gaps to be introduced into the alignments, so similar regions are not broken into several segments (as in the older versions). This method reflects biological relationships much better than ordinary BLAST.