SEQUENCE ANALYSIS
Dr. Gobind Ram
Assistant Professor
P.G. Department of
Biotechnology
Lyallpur Khalsa College,
Jalandhar
Why is Bioinformatics Important?
• Application areas include
– Medicine
– Pharmaceutical drug design
– Toxicology
– Molecular evolution
– Biological computing models
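[Figure: Bioinformatics at the intersection of genomics, molecular evolution, biophysics, molecular biology, computer science and mathematics]
Where do the data come from?
[Figure: data sources, including nucleotide sequences (e.g. ctgccgatagc), protein sequences (e.g. MKLVDDYTR) and the literature, turned into information]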
Sequence alignment
Alignment: comparing two (pairwise) or more (multiple) sequences by searching for a series of identical or similar characters in the sequences.
- Similarity: residues with the same physicochemical properties.
- Identity: identical residues.
MVNLTSDEKTAVLALWNKVDVEDCGGE
|| ||  ||||| ||| || |   |||
MVHLTPEEKTAVNALWGKVNVDAVGGE
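As a rough illustration of the identity measure, the sketch below rebuilds the match line for the two sequences shown above and counts identical positions. The function names `match_line` and `percent_identity` are hypothetical helpers, not part of any library. Similarity is not scored here, because that would need a table of physicochemically similar residues.

```python
def match_line(seq_a, seq_b):
    """Return a line with '|' under every identical, non-gap column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "|" if a == b and a != "-" else " " for a, b in zip(seq_a, seq_b)
    )


def percent_identity(seq_a, seq_b):
    """Percentage of alignment columns that hold identical residues."""
    line = match_line(seq_a, seq_b)
    return 100.0 * line.count("|") / len(line)


a = "MVNLTSDEKTAVLALWNKVDVEDCGGE"
b = "MVHLTPEEKTAVNALWGKVNVDAVGGE"
print(a)
print(match_line(a, b))
print(b)
print(f"identity: {percent_identity(a, b):.1f}%")
```

For this pair, 18 of the 27 columns are identical.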
Sequence alignment: why?
• The basis for comparing proteins and genes by the similarity of their sequences is that the proteins or genes are related by evolution; they have a common ancestor.
• Random mutations in the sequences accumulate
over time, so that proteins or genes that have a
common ancestor far back in time are not as similar
as proteins or genes that diverged from each other
more recently.
Alignment
• A way of arranging objects or characters to find the similarities and differences between them.
• In bioinformatics, it is the arrangement of sequences (DNA, RNA or protein) to find the regions of similarity and difference, from which homology can be inferred.
ALIGNMENT
• By extent: local alignment, global alignment
• By number of sequences: pairwise sequence alignment, multiple sequence alignment
Why perform pairwise sequence alignment?
Finding homology between two sequences
Example: protein prediction (sequence or structure).
Similar sequence (or structure) → similar function
Local vs. Global
• Global alignment compares the sequences over their entire length and gives the best overall alignment, but it may fail to find local regions of similarity, which are exactly the regions that carry domain and motif information.
• Local alignment finds the regions with the highest level of similarity without forcing the rest of the sequences to align. It is best for finding a motif even when the two sequences are otherwise different.
Local alignment – finds regions of high similarity in parts of the sequences
Global alignment – finds the best alignment across the entire length of the two sequences
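The difference between the two modes can be sketched with one dynamic-programming routine. This is a minimal illustration, not the method of any particular tool: the global mode follows the Needleman-Wunsch recurrence and the local mode the Smith-Waterman recurrence. The scoring values (match +1, mismatch -1, gap -2) and the name `alignment_score` are assumptions made for the example.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    cols = len(b) + 1
    # Global: leading gaps are penalised. Local: an alignment may start anywhere.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                cell = max(cell, 0)      # never carry a negative prefix
                best = max(best, cell)   # the best cell may be anywhere
            cur[j] = cell
        prev = cur
    # Global score is the bottom-right cell; local score is the best cell.
    return best if local else prev[-1]


x = "CCCCACGTACGT"
y = "ACGTACGTGGGG"
print("local :", alignment_score(x, y, local=True))
print("global:", alignment_score(x, y))
```

The two toy sequences share the block ACGTACGT but differ at their ends, so the local score recovers the shared block while the global score is pulled down by the unrelated flanks.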
Evolutionary changes in sequences
Three types of nucleotide changes:
1. Substitution – a replacement of one (or more) sequence characters by another, e.g. AAGA → AACA
2. Insertion – an insertion of one (or more) sequence characters
3. Deletion – a deletion of one (or more) sequence characters
Insertion + Deletion → Indel
Choosing an alignment:
• Many different alignments between two sequences are possible:
AAGCTGAATTCGAA
AGGCTCATTTCTGA

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-
. . .
How can one determine which is the best alignment?
Exercise
Compute the score of each of the following alignments.
Scoring scheme:
• Match: +1
• Mismatch: -2
• Indel: -1
AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-
Substitution matrix
    A   C   G   T
A   1  -2  -2  -2
C  -2   1  -2  -2
G  -2  -2   1  -2
T  -2  -2  -2   1
Gap penalty (opening = extending)
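The exercise can be checked mechanically. The sketch below applies the scoring scheme from the slide (match +1, mismatch -2, indel -1) column by column; `score_alignment` is a hypothetical helper name chosen for this example.

```python
def score_alignment(top, bottom, match=1, mismatch=-2, indel=-1):
    """Sum the column scores of a gapped pairwise alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += indel       # a gap in either sequence
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


first = score_alignment("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-")
second = score_alignment("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-")
print(first, second)
```

The first alignment has 10 matches, 2 mismatches and 4 indels; the second has 9 matches, 2 mismatches and 6 indels, so under this scheme the first alignment scores higher.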
Open Reading Frames (ORFs)
• 6 possible reading frames
– frames 1, 2, and 3 in the 5’ to 3’ direction of the given strand
– frames 1, 2, and 3 in the 5’ to 3’ direction of the complementary strand
The different reading frames give entirely different proteins.
Each gene uses a single reading frame, so once the ribosome gets started, it just has to count off groups of 3 bases to produce the proper protein.
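To make the six frames concrete, the sketch below splits a DNA string into codons for the three offsets on each strand. The frame labels (+1 to +3, -1 to -3) and the function names are illustrative choices. Translation to amino acids is left out to keep the example short.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complementary strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Map each frame label to its list of complete codons."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = [seq[k:k + 3] for k in range(offset, len(seq) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


frames = six_frames("ATGGCCTAA")
for label, codons in frames.items():
    print(label, " ".join(codons))
```

Shifting the start by a single base changes every codon, which is why the different frames give entirely different proteins.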
PAM matrices
• Family of matrices PAM 80, PAM 120, PAM 250, …
• The number with a PAM matrix (the n in PAMn) represents
the evolutionary distance between the sequences on which
the matrix is based
• The (i, j) cell in a PAMn matrix denotes the probability that amino acid i will be replaced by amino acid j in time n: $P_{i \to j,\,n}$.
• Greater n numbers denote greater distances
BLOSUM matrices
• Different BLOSUMn matrices are calculated independently
from BLOCKS (ungapped, manually created local alignments)
• BLOSUMn is based on a cluster of BLOCKS of sequences
that share at least n percent identity
• The (i, j) cell in a BLOSUM matrix denotes the log of odds of the observed frequency and the expected frequency of amino acids i and j in the same position in the data: $\log\left(\frac{P_{ij}}{q_i \, q_j}\right)$
• Higher n numbers denote higher identity between the
sequences on which the matrix is based
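A log-odds cell can be computed directly from the formula above. The sketch below is a simplification under stated assumptions: the frequencies are made-up toy numbers, and the score is scaled to half-bit units (2 × log base 2) and rounded, which is the convention commonly used for published BLOSUM tables. `log_odds` is a hypothetical helper name.

```python
import math


def log_odds(p_ij, q_i, q_j, scale=2.0):
    """Rounded log-odds score of observed vs. expected pair frequency."""
    return round(scale * math.log2(p_ij / (q_i * q_j)))


# Toy frequencies (assumed): both residues have background frequency 0.1,
# so the pair is expected by chance with frequency 0.01.
print(log_odds(0.02, 0.1, 0.1))    # pair seen twice as often as expected
print(log_odds(0.01, 0.1, 0.1))    # pair seen exactly as often as expected
print(log_odds(0.005, 0.1, 0.1))   # pair seen half as often as expected
```

A positive score means the pair occurs more often than chance predicts, zero means it occurs as often as chance, and a negative score means it is under-represented.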
BLAST
(Basic Local Alignment Search Tool)
• The BLAST program was designed by Eugene
Myers, Stephen Altschul, Warren Gish, David J.
Lipman and Webb Miller at the NIH and was
published in J. Mol. Biol. in 1990.
• OBJECTIVE: Find high-scoring ungapped segments among related sequences
• One of the most widely used bioinformatics programs; the algorithm emphasizes speed over sensitivity.
• An algorithm for comparing primary biological sequence information to find the similarity between a query and database sequences.
• Emphasizes regions of local alignment, to detect relationships among sequences that share only isolated regions of similarity.
• Not only a tool for visualizing alignments; it also gives a basis for comparing structure and function.
Steps for BLAST
• Searches for exact matches of a small fixed length between the query sequence and the sequences in the database. Each such match is called a seed.
• BLAST tries to extend the match in both directions, starting at the seed. The resulting ungapped alignment is a High-scoring Segment Pair (HSP).
• The highest-scoring HSPs are presented in the final report. They are called Maximal-scoring Segment Pairs (MSPs).
• BLAST then performs a gapped alignment between the query sequence and the database sequence using a variation of the Smith-Waterman algorithm. Statistically significant alignments are then displayed to the user.
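The seed-and-extend idea in the steps above can be sketched for nucleotide sequences as follows. This is a toy illustration, not BLAST's actual implementation: the word size, the match/mismatch scores and the drop-off threshold `x_drop` are assumed values, and all three function names are hypothetical.

```python
def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) of every exact word match of length w."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-2, x_drop=3):
    """Ungapped extension; returns (score, query_start, subject_start, length)."""
    best = cur = w * match
    right = left = 0
    k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        cur += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if cur > best:
            best, right = cur, k
        elif best - cur > x_drop:   # score fell too far below the best: stop
            break
    cur, k = best, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        cur += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if cur > best:
            best, left = cur, k
        elif best - cur > x_drop:
            break
    return best, i - left, j - left, w + left + right


def best_hsp(query, subject, w=3):
    """Highest-scoring segment pair over all seeds, or None if no seed exists."""
    hits = [extend_seed(query, subject, i, j, w)
            for i, j in find_seeds(query, subject, w)]
    return max(hits) if hits else None


hsp = best_hsp("ACGTACGT", "TTACGTACGTTT")
print(hsp)
```

Here the whole query matches the subject starting at subject position 2, so every seed on that diagonal extends to the same 8-residue segment pair.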
BLAST PROGRAMS
• BLASTP: protein query sequence against a protein
database, allowing for gaps.
• BLASTN: DNA query sequence against a DNA database,
allowing for gaps.
• BLASTX: DNA query sequence, translated into all six
reading frames, against a protein database, allowing for
gaps.
• TBLASTN: protein query sequence against a DNA
database, translated into all six reading frames, allowing
for gaps.
• TBLASTX: DNA query sequence, translated into all six
reading frames, against a DNA database, translated into
all six reading frames (No gaps allowed)
PSI-BLAST
(Position-Specific Iterated BLAST)
• Used to find distant relatives of a protein.
• First, a list of all closely related proteins is created. These proteins are combined into a general "profile", a position-specific scoring matrix (PSSM).
• This profile is then used as the query and the search is repeated to find more distantly related sequences.
• PSI-BLAST is much more sensitive in picking up distant evolutionary relationships than a standard protein-protein BLAST.
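The profile step can be illustrated with a very small position-specific scoring matrix. This is only a sketch under simplifying assumptions: the input sequences are already aligned without gaps, the background frequency is uniform (1/20), and a fixed pseudocount is added to every residue. The real PSI-BLAST procedure is considerably more elaborate. `build_pssm` and `score_with_pssm` are hypothetical names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """One dict of log-odds scores per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)          # assumed uniform background
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        pssm.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_with_pssm(pssm, seq):
    """Score a sequence of the same length against the profile."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


pssm = build_pssm(["MKLV", "MKIV", "MRLV"])
print(round(score_with_pssm(pssm, "MKLV"), 2))
print(round(score_with_pssm(pssm, "WWWW"), 2))
```

Unlike a single substitution matrix, the profile gives a different score to the same residue at different positions, which is what lets an iterated search reach more distant relatives.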
Statistical Significance
Matrix
• A key element in evaluating the quality of a
pairwise sequence alignment is the
"substitution matrix", which assigns a score for
aligning any possible pair of residues.
• BLAST offers both BLOSUM and PAM matrices.
BLOSUM62 Scoring Matrix
One-Letter Code for Amino Acid Alphabet (L = 20)
ACDEFGHIKLMNPQRSTVWY
S Henikoff & JG Henikoff (1993) Proteins 17:49
    A   C   D   E   F   G   H   I   K   L   M   N   P   Q   R   S   T   V   W   Y
A   4   0  -2  -1  -2   0  -2  -1  -1  -1  -1  -2  -1  -1  -1   1   0   0  -3  -2
C   0   9  -3  -4  -2  -3  -3  -1  -3  -1  -1  -3  -3  -3  -3  -1  -1  -1  -2  -2
D  -2  -3   6   2  -3  -1  -1  -3  -1  -4  -3   1  -1   0  -2   0  -1  -3  -4  -3
E  -1  -4   2   5  -3  -2   0  -3   1  -3  -2   0  -1   2   0   0  -1  -2  -3  -2
F  -2  -2  -3  -3   6  -3  -1   0  -3   0   0  -3  -4  -3  -3  -2  -2  -1   1   3
G   0  -3  -1  -2  -3   6  -2  -4  -2  -4  -3   0  -2  -2  -2   0  -2  -3  -2  -3
H  -2  -3  -1   0  -1  -2   8  -3  -1  -3  -2   1  -2   0   0  -1  -2  -3  -2   2
I  -1  -1  -3  -3   0  -4  -3   4  -3   2   1  -3  -3  -3  -3  -2  -1   3  -3  -1
K  -1  -3  -1   1  -3  -2  -1  -3   5  -2  -1   0  -1   1   2   0  -1  -2  -3  -2
L  -1  -1  -4  -3   0  -4  -3   2  -2   4   2  -3  -3  -2  -2  -2  -1   1  -2  -1
M  -1  -1  -3  -2   0  -3  -2   1  -1   2   5  -2  -2   0  -1  -1  -1   1  -1  -1
N  -2  -3   1   0  -3   0   1  -3   0  -3  -2   6  -2   0   0   1   0  -3  -4  -2
P  -1  -3  -1  -1  -4  -2  -2  -3  -1  -3  -2  -2   7  -1  -2  -1  -1  -2  -4  -3
Q  -1  -3   0   2  -3  -2   0  -3   1  -2   0   0  -1   5   1   0  -1  -2  -2  -1
R  -1  -3  -2   0  -3  -2   0  -3   2  -2  -1   0  -2   1   5  -1  -1  -3  -3  -2
S   1  -1   0   0  -2   0  -1  -2   0  -2  -1   1  -1   0  -1   4   1  -2  -3  -2
T   0  -1  -1  -1  -2  -2  -2  -1  -1  -1  -1   0  -1  -1  -1   1   5   0  -2  -2
V   0  -1  -3  -2  -1  -3  -3   3  -2   1   1  -3  -2  -2  -3  -2   0   4  -3  -1
W  -3  -2  -4  -3   1  -2  -2  -3  -3  -2  -1  -4  -4  -2  -3  -3  -2  -3  11   2
Y  -2  -2  -3  -2   3  -3   2  -1  -2  -1  -1  -2  -3  -1  -2  -2  -2  -1   2   7
Log-odds Score
$$X(a,b) = \log \frac{q_{ab}}{p_a \, p_b}$$
The Score Matrix
Sequence A: ACDEFGH
Sequence B: HICDYGH
Each cell holds the substitution score $X(A_i, B_j)$ for residue i of sequence A (columns) and residue j of sequence B (rows):
    A   C   D   E   F   G   H
H  -2  -3  -1   0  -1  -2   8
I  -1  -1  -3  -3   0  -4  -3
C   0   9  -3  -4  -2  -3  -3
D  -2  -3   6   2  -3  -1  -1
Y  -2  -2  -3  -2   3  -3   2
G   0  -3  -1  -2  -3   6  -2
H  -2  -3  -1   0  -1  -2   8
One possible alignment:
-ACDEFGH
HICD-YGH
It contains gaps (-/H and E/-), identities (C/C, D/D, G/G, H/H) and a similarity (F/Y, score 3).
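The score of the gapped alignment -ACDEFGH / HICD-YGH can be totalled from the substitution scores of its aligned pairs. The sketch below lists only the six pairs that occur in that alignment, with their values from the score matrix. The gap penalty of -4 per gap is an assumed value for illustration, since the slides do not give one for this example, and `score_with_matrix` is a hypothetical name.

```python
# Substitution scores for the residue pairs used in this alignment
# (sequence A residue, sequence B residue), read from the score matrix.
PAIR_SCORES = {
    ("A", "I"): -1,
    ("C", "C"): 9,
    ("D", "D"): 6,
    ("F", "Y"): 3,
    ("G", "G"): 6,
    ("H", "H"): 8,
}


def score_with_matrix(top, bottom, pair_scores, gap=-4):
    """Sum substitution scores and gap penalties over alignment columns."""
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap                 # assumed linear gap penalty
        else:
            total += pair_scores[(a, b)]
    return total


total = score_with_matrix("-ACDEFGH", "HICD-YGH", PAIR_SCORES)
print(total)
```

The six aligned pairs contribute 31 in total, and the two gaps subtract 8 under the assumed penalty.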
Paths in the Score Matrix
-ACDEFGH
HICD-YGH
[Figure: the alignment traced step by step as a path through the score matrix of ACDEFGH against HICDYGH; the steps of the path are labelled as deletions, insertions and matches.]
Alignments are in a one-to-one correspondence with score matrix paths.
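The correspondence between alignments and paths can be shown by converting in both directions. In this sketch a path is a list of (i, j) grid points, where i counts residues consumed from sequence A and j residues consumed from sequence B. A diagonal step pairs two residues, and a step along one axis only places a gap in the other sequence. The function names are hypothetical.

```python
def alignment_to_path(top, bottom):
    """Turn a gapped alignment into a list of (i, j) grid points."""
    i = j = 0
    path = [(0, 0)]
    for a, b in zip(top, bottom):
        if a != "-":
            i += 1          # consumed a residue of sequence A
        if b != "-":
            j += 1          # consumed a residue of sequence B
        path.append((i, j))
    return path


def path_to_alignment(path, seq_a, seq_b):
    """Rebuild the gapped alignment from a path and the two raw sequences."""
    top, bottom = [], []
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        top.append(seq_a[i0] if i1 > i0 else "-")
        bottom.append(seq_b[j0] if j1 > j0 else "-")
    return "".join(top), "".join(bottom)


path = alignment_to_path("-ACDEFGH", "HICD-YGH")
print(path)
print(path_to_alignment(path, "ACDEFGH", "HICDYGH"))
```

Converting the alignment to a path and back returns the original alignment, which is the one-to-one correspondence in practice.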
Low Complexity Regions
• Amino acid or DNA sequence regions that carry very little information because of their highly biased composition
– histidine-rich domains in proteins
– poly-A tails in DNA sequences
– poly-G tails in nucleotide sequences
– runs of purines
– runs of pyrimidines
– runs of a single amino acid, etc.
E-value
• Depends on database size
• Indicates the number of database matches expected as a result of random chance
• The lower the E-value, the more significant the match, and the less likely the database hit is a result of random chance
E = m × n × p
E = E-value
m = total number of residues in the database
n = number of residues in the query sequence
p = probability that a high-scoring pair is the result of random chance
• E-value between 10^-50 and 0.01: the match can be considered a result of homology
• E-value between 0.01 and 10: the match is not significant, but may hint at remote homology
• E-value > 10: the sequences are unrelated, or related too distantly to detect
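The formula E = m × n × p and the rough interpretation bands listed above can be written out as a small helper. The database size, query length and probability in the example are made-up numbers, and the function names and band labels are illustrative only.

```python
def e_value(m, n, p):
    """E = m x n x p: expected number of chance hits in the search space."""
    return m * n * p


def interpret(e):
    """Map an E-value to the rough bands listed on the slide."""
    if e <= 0.01:
        return "homology"
    if e <= 10:
        return "not significant, possible remote homology"
    return "unrelated or too distantly related to detect"


# Assumed example: 1,000,000 database residues, a 100-residue query,
# and a chance probability of 1e-12 for the high-scoring pair.
e = e_value(1_000_000, 100, 1e-12)
print(e, interpret(e))
```

Because m is a factor in the product, the same alignment gets a larger (less significant) E-value when searched against a larger database.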
Bit Score
• Measures sequence similarity independently of query sequence length and database size; it is normalized from the raw pairwise alignment score
• The higher the bit score, the more significant the match
• $S' = \dfrac{\lambda S - \ln K}{\ln 2}$
S' = bit score
S = raw alignment score
λ = Gumbel distribution constant
K = constant associated with the scoring matrix
(λ and K are two statistical parameters)
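The bit-score formula can be evaluated directly. In the sketch below the values of λ and K are illustrative assumptions, since they depend on the scoring matrix and gap penalties in use. The second helper uses the standard relation E = m × n × 2^(-S'), which is not stated on the slides. Both function names are hypothetical.

```python
import math


def bit_score(raw_score, lam, k):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value_from_bits(bits, m, n):
    """Expected chance hits for a bit score in an m x n search space."""
    return m * n * 2.0 ** (-bits)


# Assumed illustrative parameters: lambda = 0.267, K = 0.041.
s_prime = bit_score(100, lam=0.267, k=0.041)
print(round(s_prime, 1))
print(e_value_from_bits(s_prime, m=1_000_000, n=100))
```

Each additional bit halves the expected number of chance hits, which is why bit scores from searches with different scoring systems can be compared.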
Low Complexity Regions (LCR)
Masking:
(I) Hard masking: masked residues are replaced by X (protein) or N (nucleotide)
(II) Soft masking: masked residues are converted to lower case
Programs for masking
(i) SEG: regions with a high frequency of a few residues are declared LCRs
(ii) RepeatMasker: a sequence region scoring above a certain threshold is declared an LCR. Residues are masked with N’s and X’s
Mask repetitive sequences
MNPQQQQQQRST → MNPXXXXXXRST
X will not match anything in the database.
It does preserve position, however.
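The masking example above can be reproduced with a simple run detector. This is a deliberately naive sketch: it only masks runs of one repeated residue, whereas SEG and RepeatMasker use more general criteria. The minimum run length of 4 and the name `mask_runs` are assumptions for the example.

```python
import re


def mask_runs(seq, min_run=4, soft=False, mask_char="X"):
    """Mask runs of a single repeated residue of at least min_run letters."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))

    def replace(match):
        run = match.group(0)
        # Soft masking lower-cases the run; hard masking overwrites it.
        return run.lower() if soft else mask_char * len(run)

    return pattern.sub(replace, seq)


print(mask_runs("MNPQQQQQQRST"))
print(mask_runs("MNPQQQQQQRST", soft=True))
```

Both forms keep the sequence length unchanged, so positions in the masked sequence still map directly onto the original.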
BLAST result page
• The BLAST result page is divided into 3 parts.
• Part 1 contains information about the program version, the database used, the reference, and the length of the query sequence.
• Part 2 shows the conserved regions and a graphical representation of the alignments, where each line represents the alignment of the query sequence with one database sequence.
• It shows the results in 5 different colors depending on the bit score.
• Part 3 contains the list of database sequences found to be similar during the database search, and a detailed view of each alignment along with its bit score, E-value, identities, positives and gaps.
[Figures: screenshots of Part 1, Part 2 and Part 3 of the BLAST result page]
BLAST Preferred
• BLAST uses a substitution matrix to find matching words, while FASTA identifies identical matching words using a hashing procedure. By default FASTA scans smaller window sizes, so it gives more sensitive results than BLAST, with better coverage of homologs, but it is usually slower than BLAST.
• BLAST uses low-complexity masking, so it may have higher specificity than FASTA because false positives are reduced.
• BLAST sometimes gives multiple best-scoring alignments from the same sequence, while FASTA returns only one final alignment.
Comparative structure of adrenal gland in vertebrates
sachin783648
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
rakeshsharma20142015
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptxGLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
SultanMuhammadGhauri
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
muralinath2
 

Recently uploaded (20)

FAIRSpectra - Towards a common data file format for SIMS images
FAIRSpectra - Towards a common data file format for SIMS imagesFAIRSpectra - Towards a common data file format for SIMS images
FAIRSpectra - Towards a common data file format for SIMS images
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
SAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniquesSAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniques
 
Plant Biotechnology undergraduates note.pptx
Plant Biotechnology undergraduates note.pptxPlant Biotechnology undergraduates note.pptx
Plant Biotechnology undergraduates note.pptx
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptxGLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 

Sequence Analysis.ppt

  • 1. SEQUENCE ANALYSIS. Dr. Gobind Ram, Assistant Professor, P.G. Department of Biotechnology, Lyallpur Khalsa College, Jalandhar
  • 2. Why is Bioinformatics Important? • Application areas include: – Medicine – Pharmaceutical drug design – Toxicology – Molecular evolution – Biological computing models. [Figure: bioinformatics at the intersection of genomics, molecular evolution, biophysics, molecular biology, computer science and mathematics]
  • 3. Where do the data come from? [Figure: sources of information, including a DNA sequence (ctgccgatagc), a protein sequence (MKLVDDYTR) and the literature]
  • 4. Sequence alignment. Alignment: comparing two (pairwise) or more (multiple) sequences by searching for a series of identical or similar characters in the sequences. Similarity: residues with the same physicochemical properties. Identity: identical residues.
        MVNLTSDEKTAVLALWNKVDVEDCGGE
        || ||  ||||| ||| || |   |||
        MVHLTPEEKTAVNALWGKVNVDAVGGE
  • 5. Sequence alignment: why? • Proteins and genes can be compared by the similarity of their sequences because they are related by evolution; they have a common ancestor. • Random mutations accumulate in the sequences over time, so proteins or genes whose common ancestor lies far back in time are less similar than proteins or genes that diverged from each other more recently.
  • 6. Alignment • A way of arranging objects or characters to find the similarities and differences between them. • In bioinformatics, it is the arrangement of sequences (DNA, RNA or protein) to find regions of similarity and difference, from which homology can be inferred.
  • 8. ALIGNMENT [Diagram: types of alignment: local alignment, global alignment, pairwise sequence alignment, multiple sequence alignment]
  • 9. Why perform pairwise sequence alignment? To find homology between two sequences. Example: protein prediction (sequence or structure). A similar sequence (or structure) suggests a similar function.
  • 10. Local vs. Global • Global alignment compares the sequences along their entire length and gives the best overall alignment, but it may miss a local region of similarity that contains the domain and motif information. • Local alignment finds regions with a high level of similarity within the sequences. It is best for finding motifs even when the two sequences are otherwise different.
  • 11. Local vs. Global. Local alignment: finds regions of high similarity in parts of the sequences. Global alignment: finds the best alignment across the entire length of the two sequences.
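The difference between the two modes can be sketched with a small dynamic-programming scorer. This is only an illustrative sketch: the function name `alignment_score` and the default scores (match +1, mismatch -2, gap -1) are assumptions chosen for the example, and the global and local branches follow the general Needleman-Wunsch and Smith-Waterman ideas rather than any particular program's implementation.

```python
def alignment_score(a, b, match=1, mismatch=-2, gap=-1, local=False):
    """Best alignment score of sequences a and b by dynamic programming.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of sub-regions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment has to pay for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# Two sequences that share only the core "GATTACA".
s1 = "CCCGATTACACCC"
s2 = "TTTGATTACATTT"
print(alignment_score(s1, s2, local=True))   # 7: only the shared core is scored
print(alignment_score(s1, s2, local=False))  # -5: the unrelated flanks are penalised
```

The local score ignores the dissimilar flanks, while the global score is pulled down by them, which is the behaviour the slide describes.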
  • 12. Evolutionary changes in sequences. Three types of nucleotide changes: 1. Substitution: a replacement of one (or more) sequence characters by another. 2. Insertion: an insertion of one (or more) sequence characters. 3. Deletion: a deletion of one (or more) sequence characters. Insertion + Deletion → Indel. [Figure: worked examples on short sequences such as AAGA and AACA]
  • 13. Choosing an alignment • Many different alignments between two sequences are possible. For the sequences AAGCTGAATTCGAA and AGGCTCATTTCTGA, two of them are:
        A-AGCTGAATTC--GAA
        AG-GCTCA-TTTCTGA-

        AAGCTGAATT-C-GAA
        AGGCT-CATTTCTGA-
    How can one determine which is the best alignment?
  • 14. Exercise. Compute the score of each of the following alignments:
        AAGCTGAATT-C-GAA
        AGGCT-CATTTCTGA-

        A-AGCTGAATTC--GAA
        AG-GCTCA-TTTCTGA-
    Scoring scheme: • Match: +1 • Mismatch: -2 • Indel: -1 (gap penalty: opening = extending)
    Substitution matrix:
            A   C   G   T
        A   1  -2  -2  -2
        C  -2   1  -2  -2
        G  -2  -2   1  -2
        T  -2  -2  -2   1
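The exercise can be checked with a few lines of code. The sketch below walks the two alignments column by column and applies the scoring scheme from the slide (match +1, mismatch -2, indel -1); the function name `score_alignment` is an assumption made for illustration.

```python
def score_alignment(top, bottom, match=1, mismatch=-2, indel=-1):
    """Sum the column scores of a pairwise alignment written with '-' for gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += indel      # a residue aligned to a gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-"))    # 2
print(score_alignment("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-"))  # -1
```

Under this scheme the first alignment (10 matches, 2 mismatches, 4 indels) scores higher than the second (9 matches, 2 mismatches, 6 indels).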
  • 15. Open Reading Frames (ORFs) • 6 possible reading frames: frames 1, 2 and 3 in the 5’ to 3’ direction of the given strand, and frames 1, 2 and 3 in the 5’ to 3’ direction of the complementary strand. • The different reading frames give entirely different proteins. • Each gene uses a single reading frame, so once the ribosome gets started, it just has to count off groups of 3 bases to produce the proper protein.
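The six reading frames can be enumerated directly. The following is a minimal sketch, assuming an upper-case DNA string over A, C, G and T; the names `reverse_complement` and `six_frames` and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Split a DNA sequence into codons for all six reading frames.

    Keys "+1".."+3" are frames on the given strand,
    keys "-1".."-3" are frames on the complementary strand.
    """
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            # Only complete codons are kept; trailing bases are dropped.
            codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


for name, codons in six_frames("ATGGCCTAA").items():
    print(name, codons)
```

Each frame yields a different codon series, which is why the frames would translate to entirely different proteins.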
  • 16. PAM matrices • A family of matrices: PAM 80, PAM 120, PAM 250, … • The number in a PAM matrix name (the n in PAMn) represents the evolutionary distance between the sequences on which the matrix is based. • The (i, j) cell of a PAMn matrix reflects the probability that amino acid i is replaced by amino acid j over evolutionary distance n: P(i→j, n). • Greater n denotes greater distance.
  • 17. BLOSUM matrices • The different BLOSUMn matrices are calculated independently from BLOCKS (ungapped, manually created local alignments). • BLOSUMn is based on clusters of BLOCKS sequences that share at least n percent identity. • The (i, j) cell of a BLOSUM matrix is the log-odds ratio of the observed to the expected frequency of amino acids i and j occurring at the same position in the data: log(Pij / (qi · qj)). • Higher n denotes higher identity between the sequences on which the matrix is based.
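The log-odds entry can be illustrated numerically. This is a sketch under stated assumptions: the function name `log_odds`, the base-2 logarithm and the frequencies in the example are made up for illustration and are not taken from the BLOCKS data; published matrices also scale and round the values, which is omitted here.

```python
import math


def log_odds(p_ij, q_i, q_j, base=2.0):
    """Log-odds score log(Pij / (qi * qj)).

    p_ij: observed frequency of residues i and j in the same alignment column.
    q_i, q_j: background frequencies of residues i and j.
    """
    return math.log(p_ij / (q_i * q_j), base)


# Hypothetical frequencies: both residues have background frequency 0.1,
# so a chance pairing is expected with frequency 0.01.
print(log_odds(0.02, 0.1, 0.1))   # positive: pair seen more often than by chance
print(log_odds(0.005, 0.1, 0.1))  # negative: pair seen less often than by chance
```

A positive score marks a substitution that is observed more often than expected by chance, a negative score one that is observed less often.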
  • 18. BLAST (Basic Local Alignment Search Tool) • The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman and Webb Miller at the NIH and was published in J. Mol. Biol. in 1990. • Objective: find high-scoring ungapped segments among related sequences. • It is one of the most widely used bioinformatics programs because the algorithm emphasizes speed over sensitivity.
  • 19. • An algorithm for comparing primary biological sequence information to find the similarity between sequences. • It focuses on regions of local alignment in order to detect relationships among sequences that share only isolated regions of similarity. • It is not only a tool for visualizing alignments but also a way to compare structure and function.
  • 20. Steps for BLAST • BLAST searches for exact matches of a small fixed length between the query sequence and the database sequences; each such match is called a seed. • BLAST tries to extend the match in both directions, starting at the seed, as an ungapped alignment; the result is a High-scoring Segment Pair (HSP). • The highest-scoring HSPs are presented in the final report. They are called Maximal-scoring Segment Pairs (MSPs).
  • 21. BLAST then performs a gapped alignment between the query sequence and the database sequence using a variation of the Smith-Waterman algorithm. Statistically significant alignments are then displayed to the user.
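The seed-and-extend idea from the steps above can be sketched as a toy program. It is not the BLAST implementation: the word length `k`, the match and mismatch scores, the drop-off threshold `drop` and all function names are assumptions made for the example, and the final gapped alignment step is left out.

```python
def find_seeds(query, subject, k=3):
    """Exact k-letter matches, returned as (query_pos, subject_pos) pairs."""
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]


def extend_seed(query, subject, i, j, k=3, match=1, mismatch=-2, drop=3):
    """Ungapped extension of one seed in both directions.

    Extension stops when the running score falls `drop` below the best
    score seen so far. Returns (best_score, query_start, query_end).
    """
    best = k * match
    start, end = i, i + k

    # Extend to the right of the seed.
    score, qi, sj = best, i + k, j + k
    while qi < len(query) and sj < len(subject) and best - score < drop:
        score += match if query[qi] == subject[sj] else mismatch
        qi, sj = qi + 1, sj + 1
        if score > best:
            best, end = score, qi

    # Extend to the left of the seed.
    score, qi, sj = best, i - 1, j - 1
    while qi >= 0 and sj >= 0 and best - score < drop:
        score += match if query[qi] == subject[sj] else mismatch
        if score > best:
            best, start = score, qi
        qi, sj = qi - 1, sj - 1

    return best, start, end


def best_hsp(query, subject, k=3):
    """Highest-scoring segment pair over all seeds, or None if there is no seed."""
    hsps = [extend_seed(query, subject, i, j, k)
            for i, j in find_seeds(query, subject, k)]
    return max(hsps) if hsps else None


print(best_hsp("ACGTACGTGG", "TTACGTACGTCC"))  # (8, 0, 8): query[0:8] is "ACGTACGT"
```

Indexing the words first and extending only around seeds is what lets this style of search trade some sensitivity for speed.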
  • 22. BLAST PROGRAMS • BLASTP: protein query sequence against a protein database, allowing for gaps. • BLASTN: DNA query sequence against a DNA database, allowing for gaps. • BLASTX: DNA query sequence, translated into all six reading frames, against a protein database, allowing for gaps. • TBLASTN: protein query sequence against a DNA database, translated into all six reading frames, allowing for gaps. • TBLASTX: DNA query sequence, translated into all six reading frames, against a DNA database, translated into all six reading frames (No gaps allowed)
  • 23. PSI-BLAST (Position-Specific Iterated BLAST) • Used to find distant relatives of a protein. • First, a list of all closely related proteins is created. These proteins are combined into a general "profile", a position-specific scoring matrix. • This profile is then used as the query, and the search is performed again to find more distantly related sequences. • PSI-BLAST is much more sensitive in picking up distant evolutionary relationships than a standard protein-protein BLAST.
  • 25. Matrix • A key element in evaluating the quality of a pairwise sequence alignment is the "substitution matrix", which assigns a score for aligning any possible pair of residues. • BLAST offers both BLOSUM and PAM matrices.
  • 26. BLOSUM62 Scoring Matrix. One-letter code for amino acid alphabet (L = 20): ACDEFGHIKLMNPQRSTVWY. S Henikoff & JG Henikoff (1993) Proteins 17:49.
           A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
        A  4  0 -2 -1  0 -2 -1 -1 -1 -1 -2 -2 -1 -1 -1  1  0  0 -3 -2
        C  0  9 -3 -4 -2 -3 -3 -1 -3 -1 -1 -3 -3 -3 -3 -1 -1 -1 -2 -2
        D -2 -3  6  2 -3 -1 -1 -3 -1 -4 -3  1 -1  0 -2  0 -1 -3 -4 -3
        E -1 -4  2  5 -3 -2  0 -3  1 -3 -2  0 -1  2  0  0 -1 -2 -3 -2
        F  0 -2 -3 -3  6 -3 -1  0 -3  0  0 -3 -4 -3 -3 -2 -2 -1  1  3
        G -2 -3 -1 -2 -3  6 -2 -4 -2 -4 -3  0 -2 -2 -2  0 -2 -3 -2 -3
        H -1 -3 -1  0 -1 -2  8 -3 -1 -3 -2  1 -2  0  0 -1 -2 -3 -2  2
        I -1 -1 -3 -3  0 -4 -3  4 -3  2  1 -3 -3 -3 -3 -2 -1  3 -3 -1
        K -1 -3 -1  1 -3 -2 -1 -3  5 -2 -1  0 -1  1  2  0 -1 -2 -3 -2
        L -1 -1 -4 -3  0 -4 -3  2 -2  4  2 -3 -3 -2 -2 -2 -1  1 -2 -1
        M -2 -1 -3 -2  0 -3 -2  1 -1  2  5 -2 -2  0 -1 -1 -1  1 -1 -1
        N -2 -3  1  0 -3  0  1 -3  0 -3 -2  6 -2  0  0  1  0 -3 -4 -2
        P -1 -3 -1 -1 -4 -2 -2 -3 -1 -3 -2 -2  7 -1 -2 -1 -1 -2 -4 -3
        Q -1 -3  0  2 -3 -2  0 -3  1 -2  0  0 -1  5  1  0 -1 -2 -2 -1
        R -1 -3 -2  0 -3 -2  0 -3  2 -2 -1  0 -2  1  5 -1 -1 -3 -3 -2
        S  1 -1  0  0 -2  0 -1 -2  0 -2 -1  1 -1  0 -1  4  1 -2 -3 -2
        T  0 -1 -1 -1 -2 -2 -2 -1 -1 -1 -1  0 -1 -1 -1  1  5  0 -2 -2
        V  0 -1 -3 -2 -1 -3 -3  3 -2  1  1 -3 -2 -2 -3 -2  0  4 -3 -1
        W -3 -2 -4 -3  1 -2 -2 -3 -3 -2 -1 -4 -4 -2 -3 -3 -2 -3 11  2
        Y -2 -2 -3 -2  3 -3  2 -1 -2 -1 -1 -2 -3 -1 -2 -2 -2 -1  2  7
    Log-odds score: $X(a, b) = \log \dfrac{q_{ab}}{p_a \, p_b}$
  • 27. BLOSUM62 Scoring Matrix. One-letter code for amino acid alphabet (L = 20): ACDEFGHIKLMNPQRSTVWY. The matrix is the same as on slide 26.
  • 28. The Score Matrix. For the sequences A = ACDEFGH and B = HICDYGH, the score matrix holds the substitution score $X(A_i, B_j)$ for every pair of residues (columns: residues of A; rows: residues of B):
           A  C  D  E  F  G  H
        H -2 -3 -1  0 -3 -2  8
        I -1 -1 -3 -3  0 -4 -3
        C  0  9 -3 -4 -2 -3 -3
        D -2 -3  6  2 -3 -1 -1
        Y -2 -2 -3 -2  3 -3  2
        G  0 -3 -1 -2 -3  6 -2
        H -2 -3 -1  0 -3 -2  8
    An alignment of the two sequences, showing gaps, similarity and identity:
        -ACDEFGH
        HICD-YGH
  • 29. Paths in the Score Matrix. Alignments are in a one-to-one correspondence with score matrix paths. [Figure: the score matrix for ACDEFGH vs. HICDYGH with the path of the alignment below drawn from O to T; the steps of the path are labelled Matches, Insertion and Deletion]
        -ACDEFGH
        HICD-YGH
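Scoring an alignment amounts to adding up the matrix cells visited by its path. The sketch below does this for the example alignment, using the score matrix values shown on the slides for ACDEFGH vs. HICDYGH. The slides give no gap penalty, so the value `gap=-4` is purely an assumption for illustration, as are the names `ROWS`, `COLUMNS` and `path_score`.

```python
COLUMNS = "ACDEFGH"  # residues of sequence A (matrix columns)

# Rows are residues of sequence B; values are copied from the slide's score matrix.
ROWS = {
    "H": [-2, -3, -1, 0, -3, -2, 8],
    "I": [-1, -1, -3, -3, 0, -4, -3],
    "C": [0, 9, -3, -4, -2, -3, -3],
    "D": [-2, -3, 6, 2, -3, -1, -1],
    "Y": [-2, -2, -3, -2, 3, -3, 2],
    "G": [0, -3, -1, -2, -3, 6, -2],
}


def path_score(a_aligned, b_aligned, gap=-4):
    """Sum the scores along an alignment path.

    Diagonal steps (residue vs. residue) take their score from the matrix;
    horizontal or vertical steps (residue vs. '-') cost the assumed gap penalty.
    """
    total = 0
    for a, b in zip(a_aligned, b_aligned):
        if a == "-" or b == "-":
            total += gap
        else:
            total += ROWS[b][COLUMNS.index(a)]
    return total


print(path_score("-ACDEFGH", "HICD-YGH"))         # 23 with the assumed gap penalty
print(path_score("-ACDEFGH", "HICD-YGH", gap=0))  # 31 from the substitution scores alone
print(path_score("ACDEFGH", "HICDYGH"))           # 13 for the ungapped diagonal path
```

With these values the gapped path scores higher than the plain diagonal, because the two gaps bring the identical residues C and D into the same columns.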
  • 30. Low Complexity Regions • Amino acid or DNA sequence regions that carry very little information because of their highly biased composition: – histidine-rich domains in proteins – poly-A tails in DNA sequences – poly-G tails in nucleotide sequences – runs of purines – runs of pyrimidines – runs of a single amino acid, etc.
  • 31. E-value • Depends on the database size. • Indicates the number of database matches expected as a result of random chance. • The lower the E-value, the more significant the match and the less likely it is that the database hit is the result of random chance.
  • 32. E = m × n × p, where E = E-value, m = total number of residues in the database, n = number of residues in the query sequence, and p = probability that a high-scoring pair is the result of random chance.
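A short calculation makes the dependence on database size concrete. This is a minimal sketch of the formula on the slide; the function name `e_value` and the numbers in the example are hypothetical.

```python
def e_value(m, n, p):
    """E = m x n x p.

    m: total number of residues in the database
    n: number of residues in the query sequence
    p: probability that a high-scoring pair is the result of random chance
    """
    return m * n * p


# Hypothetical numbers: the same hit searched against two database sizes.
small_db = e_value(m=1_000_000, n=100, p=1e-9)
large_db = e_value(m=10_000_000, n=100, p=1e-9)
print(small_db, large_db)
```

The same alignment is ten times less significant in the database that is ten times larger, because more chance matches are expected there.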
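  • 33. • E-value between 10^-50 and 0.01: the match can be considered a result of homology. • E-value between 0.01 and 10: the match is not significant, but may hint at remote homology. • E-value > 10: the sequences are unrelated or only very distantly related.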
  • 34. Bit Score • Measures sequence similarity independently of query sequence length and database size; it is a normalized form of the raw pairwise alignment score. • The higher the bit score, the more significant the match. • $S' = (\lambda S - \ln K) / \ln 2$, where S' = bit score, S = raw alignment score, λ = Gumbel distribution constant, and K = constant associated with the scoring matrix (λ and K are two statistical parameters).
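The conversion from raw score to bit score can be written out directly. This is a sketch of the formula on the slide; the function name `bit_score` is an assumption, and the λ and K values in the example are illustrative only, since the real values depend on the scoring matrix and gap penalties in use.

```python
import math


def bit_score(raw_score, lam, k):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameters (assumed for the example).
lam, k = 0.267, 0.041
for raw in (50, 100, 200):
    print(raw, round(bit_score(raw, lam, k), 1))
```

Because λ and K absorb the properties of the scoring system, bit scores obtained with different matrices can be compared with each other, which raw scores cannot.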
  • 35. Low Complexity Region (LCR) masking: (I) hard masking, (II) soft masking. Programs for masking: (i) SEG: a region with a high frequency of the same residues is declared an LCR; (ii) RepeatMasker: a sequence region whose score is above a certain threshold is declared an LCR. Masked residues are replaced with N's (nucleotides) and X's (amino acids).
  • 36. Mask repetitive sequences: MNPQQQQQQRST → MNPXXXXXXRST. X will not match anything in the database, but it does preserve position.
  • 37. BLAST result page • The BLAST result page is divided into 3 parts. • Part 1 contains information about the program version, the database used, the reference and the length of the query sequence. • Part 2 shows the conserved regions and a graphical representation of the alignments, where each line represents the alignment of the query sequence with one database sequence. It shows the results in 5 different colors depending on the bit score. • Part 3 contains the list of database sequences found to be similar in the search and a detailed view of each alignment along with its bit score, E-value, identities, positives and gaps.
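The masking example can be reproduced with a very simple rule. The sketch below is a toy stand-in, not the SEG or RepeatMasker algorithm: it only masks runs of a single repeated residue, and the function names and the `min_run` threshold are assumptions. Hard masking replaces the residues, while soft masking is shown here in the common lower-case convention.

```python
import re


def _run_pattern(min_run):
    # One character followed by at least (min_run - 1) copies of itself.
    return re.compile(r"(.)\1{%d,}" % (min_run - 1))


def hard_mask_runs(seq, min_run=4, mask_char="X"):
    """Replace every run of one residue of length >= min_run with mask_char."""
    return _run_pattern(min_run).sub(lambda m: mask_char * len(m.group(0)), seq)


def soft_mask_runs(seq, min_run=4):
    """Lower-case every run of one residue of length >= min_run."""
    return _run_pattern(min_run).sub(lambda m: m.group(0).lower(), seq)


print(hard_mask_runs("MNPQQQQQQRST"))  # MNPXXXXXXRST
print(soft_mask_runs("MNPQQQQQQRST"))  # MNPqqqqqqRST
```

In both cases the sequence keeps its length, so positions in the masked query still line up with positions in the original.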
  • 42. BLAST Preferred • BLAST uses a substitution matrix to find matching words, while FASTA identifies identical matching words using a hashing procedure. • By default FASTA scans smaller window sizes. It therefore gives more sensitive results than BLAST, with better coverage of homologs, but it is usually slower than BLAST.
  • 43. • BLAST uses low-complexity masking, so it may have higher specificity than FASTA and therefore fewer false positives. • BLAST sometimes gives multiple best-scoring alignments from the same sequence, whereas FASTA returns only one final alignment.
  • 44. REFERENCES • Xiong, J. (2006). Essential Bioinformatics. Cambridge University Press. • Mount, D. W. (2004). Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press. • URL: WWW.ncbi.nlm.nih.gov