GengLee's Blog

Calculation of KaKs Value

The concept and meaning of Ka and Ks


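Ks (synonymous substitution rate): the number of synonymous substitutions per synonymous site. A synonymous substitution is a base change in a codon that does not change the amino acid encoded by that codon.
Ka (nonsynonymous substitution rate): the number of nonsynonymous substitutions per nonsynonymous site. A nonsynonymous substitution is a base change in a codon that changes the amino acid encoded by that codon.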
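
To make the distinction concrete, here is a minimal Python sketch that classifies a codon difference as synonymous or nonsynonymous under the standard genetic code. The function name `classify_codon_change` and the compact table construction are illustrative choices and are not part of KaKs_Calculator. The two example pairs are codons 4 and 22 of the human and mouse CDS sequences used later in this post.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(codon_a, codon_b):
    """Return 'identical', 'synonymous' or 'nonsynonymous' for two codons."""
    codon_a, codon_b = codon_a.upper(), codon_b.upper()
    if codon_a == codon_b:
        return "identical"
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "nonsynonymous"


if __name__ == "__main__":
    # Codon 4: CCT (human) vs CCC (mouse), both proline
    print(classify_codon_change("CCT", "CCC"))
    # Codon 22: CTT (human, leucine) vs TTC (mouse, phenylalanine)
    print(classify_codon_change("CTT", "TTC"))
```

This only labels individual codon pairs. Estimating Ka and Ks additionally requires counting synonymous and nonsynonymous sites and correcting for multiple substitutions, which is what KaKs_Calculator does.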


dN/dS is equivalent to Ka/Ks, and is interpreted as follows:

Neutral Evolution (drift): a dN/dS ratio of 1 implies that synonymous changes (DNA substitutions that do not affect the protein sequence) and non-synonymous changes (DNA substitutions that do affect the protein sequence) have accumulated at equal rates per site between the ancestral and the modern versions of the protein.

Positive Selection (adaptive evolution): a dN/dS ratio > 1 implies that non-synonymous changes have accumulated at a higher rate than synonymous changes. There has been evolutionary pressure to escape from the ancestral state, i.e. positive selection pressure. This can occur, for example, in paralogues that are required to serve a novel function, or in proteins of parasites that need to escape host immune recognition (e.g. changes that avoid MHC class I binding to evade T-cell attack).

Negative Selection (conservation): a dN/dS ratio < 1 implies that synonymous changes have accumulated at a higher rate than non-synonymous changes. There has been evolutionary pressure to conserve the ancestral state, i.e. negative selection pressure. This can occur, for example, in orthologues that are required to maintain (conserve) some function encoded in the protein sequence, since changes from this state would disrupt that function.
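
The three cases above can be sketched as a small Python helper. This is only an illustration: the function name `interpret_dnds` and its `tolerance` parameter are hypothetical, and a point estimate on its own is not a statistical test for selection.

```python
def interpret_dnds(omega, tolerance=0.0):
    """Map a dN/dS (Ka/Ks) point estimate to the three textbook cases.

    `tolerance` is a hypothetical half-width around 1 within which the
    ratio is treated as neutral; the default of 0 means exactly 1.
    """
    if omega < 0:
        raise ValueError("dN/dS cannot be negative")
    if abs(omega - 1.0) <= tolerance:
        return "neutral evolution"
    if omega > 1.0:
        return "positive selection"
    return "negative selection"


if __name__ == "__main__":
    for value in (0.0853913, 1.0, 2.5):
        print(value, interpret_dnds(value))
```

In practice the estimate comes with uncertainty, so a ratio slightly above or below 1 should not be over-interpreted without a significance test.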

Calculate Ka and Ks


KaKs_Calculator is used for the calculation. Its input is a nucleotide-level alignment in AXT format. The following describes how to prepare the AXT file and gives an example of running the software.

Suppose we have the protein sequences of a pair of homologous genes (here a human and a mouse Ensembl protein) and their corresponding CDS sequences.

Protein sequence (example_pep.fas)

>ENSP00000004982
MEIPVPVQPSWLRRASAPLPGLSAPGRLFDQRFGEGLLEAELAALCPTTLAPYYLRAPSVALPVAQVPTD
PGHFSVLLDVKHFSPEEIAVKVVGEHVEVHARHEERPDEHGFVAREFHRRYRLPPGVDPAAVTSALSPEG
VLSIQAAPASAQAPPPAAAK
>ENSMUSP00000039172
MEIPVPVQPSWLRRASAPLPGFSAPGRLFDQRFGEGLLEAELASLCPAAIAPYYLRAPSVALPTAQVSTD
SGYFSVLLDVKHFLPEEISVKVVDDHVEVHARHEERPDEHGFIAREFHRRYRLPPGVDPAAVTSALSPEG
VLSIQATPASAQAQLPSPPAAK

CDS sequence (example_cds.fas)

>ENSP00000004982
ATGGAGATCCCTGTGCCTGTGCAGCCGTCTTGGCTGCGCCGCGCCTCGGCCCCGTTGCCCGGACTTTCGG
CGCCCGGACGCCTCTTTGACCAGCGCTTCGGCGAGGGGCTGCTGGAGGCCGAGCTGGCTGCGCTCTGCCC
CACCACGCTCGCCCCCTACTACCTGCGCGCACCCAGCGTGGCGCTGCCCGTCGCCCAGGTGCCGACGGAC
CCCGGCCACTTTTCGGTGCTGCTAGACGTGAAGCACTTCTCGCCGGAGGAAATTGCTGTCAAGGTGGTGG
GCGAACACGTGGAGGTGCACGCGCGCCACGAGGAGCGCCCGGATGAGCACGGATTCGTCGCGCGCGAGTT
CCACCGTCGCTACCGCCTGCCGCCTGGCGTGGATCCGGCTGCCGTGACGTCCGCGCTGTCCCCCGAGGGC
GTCCTGTCCATCCAGGCCGCACCAGCGTCGGCCCAGGCCCCACCGCCAGCCGCAGCCAAGTAG
>ENSMUSP00000039172
ATGGAGATCCCCGTGCCTGTGCAGCCTTCTTGGCTGCGCCGTGCTTCAGCTCCTTTACCAGGTTTCTCTG
CTCCGGGACGCCTCTTTGACCAGCGTTTCGGCGAAGGGCTGCTTGAGGCAGAGCTGGCTTCACTGTGCCC
TGCTGCGATCGCCCCCTACTATCTGCGCGCCCCCAGTGTGGCGTTACCCACAGCCCAGGTGTCCACGGAC
TCTGGGTATTTTTCCGTGCTGCTGGATGTGAAGCACTTCTTGCCAGAGGAAATCTCTGTCAAGGTGGTTG
ACGACCATGTGGAGGTCCATGCTCGGCACGAGGAGCGCCCGGATGAACACGGATTCATTGCTCGAGAGTT
CCACCGCCGATACCGCCTGCCTCCTGGTGTGGACCCTGCTGCTGTGACCTCAGCACTGTCTCCTGAGGGT
GTCCTGTCCATCCAGGCCACACCAGCGTCGGCCCAGGCCCAACTTCCGTCACCACCTGCTGCCAAGTAG

Step 1:

Align the protein sequences

mafft --auto example_pep.fas > example_pep_aln.fas

Step 2:

Convert the protein alignment into a codon-based nucleotide alignment

perl pal2nal.pl example_pep_aln.fas example_cds.fas -output fasta > example_cds_aln.fas
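
Conceptually, this step threads each CDS onto its gapped protein sequence so that every alignment gap becomes a three-base gap and codons stay intact. The following Python sketch illustrates that idea only; it is not how pal2nal.pl is implemented, the function name `thread_codons` is hypothetical, and it does not check that each codon actually translates to the aligned residue.

```python
def thread_codons(aligned_protein, cds):
    """Map an ungapped CDS onto a gapped protein sequence, codon by codon.

    Each '-' in the protein alignment becomes '---'. Codons left over at
    the end of the CDS (for example a terminal stop codon) are dropped.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues > len(codons):
        raise ValueError("CDS is too short for the protein sequence")

    out = []
    next_codon = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


if __name__ == "__main__":
    # Toy example: protein 'MK' aligned as 'M-K', CDS with a stop codon
    print(thread_codons("M-K", "ATGAAATAG"))
```

Aligning at the protein level first and then mapping back to codons keeps gaps in frame, which Ka/Ks estimation relies on.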

Step 3:

Convert the nucleotide alignment to AXT format

python FastaIntoAXT.py example_cds_aln.fas > example_cds_aln.axt

FastaIntoAXT.py script

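import sys


def parseFasta(filename):
    """Read a FASTA file; return ({id: sequence}, [ids in file order])."""
    fas = {}
    idlis = []
    seq_id = None
    with open(filename, 'r') as fh:
        for line in fh:
            if line[0] == '>':
                # The ID is the first whitespace-delimited token of the header
                header = line[1:].rstrip()
                seq_id = header.split()[0]
                idlis.append(seq_id)
                fas[seq_id] = []
            else:
                fas[seq_id].append(line.rstrip())
    # Join the wrapped sequence lines into one string per record
    for seq_id, seq in fas.items():
        fas[seq_id] = ''.join(seq)
    return fas, idlis


ALN, IDlis = parseFasta(sys.argv[1])
# One name line (the two IDs joined by '-'), then the two aligned sequences
outid = "-".join(IDlis)
outseq = "\n".join([ALN[IDlis[0]], ALN[IDlis[1]]])
print(">" + outid)
print(outseq)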

Step 4:

Calculate Ka and Ks
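
A possible way to run this step is sketched below in Python. Several things here are assumptions: that the executable is named `KaKs_Calculator` and is on the PATH, that the output file is called `example_cds_aln.axt.kaks` (a hypothetical name), and that the YN method is wanted, which matches the Method column of the result shown in this step. The helper names `build_kaks_command` and `run_kaks` are illustrative.

```python
import subprocess


def build_kaks_command(axt_path, out_path, method="YN"):
    """Build the argument list: -i input AXT, -o output file, -m method."""
    return ["KaKs_Calculator", "-i", axt_path, "-o", out_path, "-m", method]


def run_kaks(axt_path, out_path, method="YN"):
    """Run KaKs_Calculator; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_kaks_command(axt_path, out_path, method), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # even where KaKs_Calculator is not installed.
    cmd = build_kaks_command("example_cds_aln.axt", "example_cds_aln.axt.kaks")
    print(" ".join(cmd))
```

The printed line can equally be typed directly in a shell, in the same way as the commands in the earlier steps.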

The output is a table with one header row and one row per sequence pair. The result for this pair, shown with one field per line:

Sequence                                                ENSP00000004982-ENSMUSP00000039172
Method                                                  YN
Ka                                                      0.0571682
Ks                                                      0.669484
Ka/Ks                                                   0.0853913
P-Value(Fisher)                                         6.30263e-22
Length                                                  480
S-Sites                                                 150.779
N-Sites                                                 329.221
Fold-Sites(0:2:4)                                       NA
Substitutions                                           82
S-Substitutions                                         64
N-Substitutions                                         18
Fold-S-Substitutions(0:2:4)                             NA
Fold-N-Substitutions(0:2:4)                             NA
Divergence-Time                                         0.249511
Substitution-Rate-Ratio(rTC:rAG:rTA:rCG:rTG:rCA/rCA)    6.00228:6.00228:1:1:1:1
GC(1:2:3)                                               0.659465(0.759259:0.503086:0.716049)
ML-Score                                                NA
AICc                                                    NA
Akaike-Weight                                           NA
Model                                                   NA
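
For downstream work it is convenient to read the result into Python. The sketch below assumes the output file is tab-delimited with a header row and reproduces only the first five columns of the result above as inline test data. The function names `to_float` and `read_kaks_rows` are hypothetical.

```python
import csv
import io

# First five columns of the result above, as tab-delimited text
EXAMPLE = (
    "Sequence\tMethod\tKa\tKs\tKa/Ks\n"
    "ENSP00000004982-ENSMUSP00000039172\tYN\t0.0571682\t0.669484\t0.0853913\n"
)


def to_float(text):
    """Convert a field to float, mapping 'NA' (or empty) to None."""
    text = text.strip()
    return None if text in ("", "NA") else float(text)


def read_kaks_rows(handle):
    """Parse result rows into dicts with numeric Ka, Ks and Ka/Ks."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "pair": rec["Sequence"],
            "method": rec["Method"],
            "ka": to_float(rec["Ka"]),
            "ks": to_float(rec["Ks"]),
            "ratio": to_float(rec["Ka/Ks"]),
        })
    return rows


if __name__ == "__main__":
    for row in read_kaks_rows(io.StringIO(EXAMPLE)):
        # Recompute Ka/Ks from Ka and Ks as a consistency check
        print(row["pair"], row["ratio"], row["ka"] / row["ks"])
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper. For this pair Ka/Ks is about 0.085, well below 1, which under the interpretation given earlier points to negative selection on this gene.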
