Converting nucleotides to amino acids using JavaScript


I'm creating a Chrome Extension that converts a string of nucleotides of length nlen into the corresponding amino acids.

I've done something similar to this before in Python, but as I'm still very new to JavaScript, I'm struggling to translate that same logic from Python to JavaScript. The code I have so far is below:

function translateInput(n_seq) {
  // code to translate goes here

  // length of input nucleotide sequence
  var nlen = n_seq.length

  // declare initially empty amino acids string
  var aa_seq = ""

  // iterate over each chunk of three characters/nucleotides
  // to match it with the correct codon
  for (var i = 0; i < nlen; i++) {




      aa_seq.concat(codon)
  }

  // return final string of amino acids   
  return aa_seq
}

I know that I want to iterate over characters three at a time, match them to the correct amino acid, and then continuously concatenate that amino acid to the output string of amino acids (aa_seq), returning that string once the loop is complete.
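Two details in the attempt above are worth pointing out: JavaScript strings are immutable, so aa_seq.concat(codon) returns a new string and leaves aa_seq unchanged unless the result is assigned back, and i++ visits every character rather than every codon. A minimal illustration (the sample values are only for demonstration):

```javascript
// Strings are immutable: concat returns a new string and leaves the original alone
let aa_seq = ''
aa_seq.concat('K')
console.log(JSON.stringify(aa_seq)) // ""

// The result has to be assigned back (or use +=)
aa_seq = aa_seq.concat('K')
aa_seq += 'H'
console.log(aa_seq) // KH

// Stepping by 3 visits one codon per iteration; slice(i, i + 3) extracts it
const n_seq = 'AAGCATAGA'
const codons = []
for (let i = 0; i + 3 <= n_seq.length; i += 3) {
  codons.push(n_seq.slice(i, i + 3))
}
console.log(codons.join(' ')) // AAG CAT AGA
```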

I also tried creating a dictionary of the codon to amino acid relationships and was wondering if there was a way to use something like that as a tool to match the three-character codons to their respective amino acids:

codon_dictionary = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
};

EDIT: An example of an input string of nucleotides would be "AAGCATAGAAATCGAGGG", with the corresponding output string "KHRNRG". Hope this helps!

Solution

Opinion

The first thing I would personally recommend is to build a dictionary that goes from a 3-character codon to its amino acid. This lets your program convert any number of codon strings to amino acid strings with a single property lookup per codon, instead of searching through every codon list in the original dictionary each time. The dictionary will work something like this:

codonDict['GCA'] // 'A'
codonDict['TGC'] // 'C'
// etc
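To make the cost difference concrete: without a reverse dictionary, every codon has to be found by scanning the codon arrays of the original amino-to-codons table, whereas the reverse dictionary answers with one property access. A small sketch, using a trimmed-down table and a hypothetical findAmino helper for the "deep lookup":

```javascript
// A trimmed-down amino -> codons table (same shape as the question's dictionary)
const aminoDict = { A: ['GCA', 'GCC', 'GCG', 'GCT'], C: ['TGC', 'TGT'], K: ['AAA', 'AAG'] }

// Deep lookup: scan each amino acid's codon list until the codon turns up
const findAmino = codon =>
  Object.keys(aminoDict).find(a => aminoDict[a].includes(codon))

// Reverse dictionary: codon -> amino, built once up front
const codonDict = {}
for (const a of Object.keys(aminoDict))
  for (const c of aminoDict[a])
    codonDict[c] = a

console.log(findAmino('TGC'), codonDict['TGC']) // C C
console.log(findAmino('GCA'), codonDict['GCA']) // A A
```

Both give the same answer; the difference is that findAmino repeats the scan for every codon in the input, while codonDict pays the cost once.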

From there, I implemented two utility functions: slide and slideStr. These aren't particularly important, so I'll just cover them with a couple of examples of input and output.

slide (2,1) ([1,2,3,4])
// [[1,2], [2,3], [3,4]]

slide (2,2) ([1,2,3,4])
// [[1,2], [3,4]]

slideStr (2,1) ('abcd')
// ['ab', 'bc', 'cd']

slideStr (2,2) ('abcd')
// ['ab', 'cd']
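One behaviour the examples don't show: slide stops as soon as fewer than n items remain, so a trailing chunk shorter than the window is dropped rather than returned. With a window and step of 3, an incomplete final codon is simply ignored. The definitions below are copied from the runnable demo so the snippet runs on its own:

```javascript
const slide = (n, m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0, n), ...slide(n, m)(xs.slice(m))]
}

const slideStr = (n, m) => str =>
  slide(n, m)(Array.from(str)).map(s => s.join(''))

// 8 characters: two full codons, then 2 leftover characters that are dropped
console.log(slideStr(3, 3)('AAGCATAG').join(' ')) // AAG CAT
```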

With the reverse dictionary and generic utility functions at our disposal, writing codon2amino is a breeze:

// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')


Runnable demo

To clarify, we build codonDict based on aminoDict once, and re-use it for every codon-to-amino computation.

// your original data renamed to aminoDict
const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };

// codon dictionary derived from aminoDict
const codonDict =
 Object.keys(aminoDict).reduce((dict, a) =>
   Object.assign(dict, ...aminoDict[a].map(c => ({[c]: a}))), {})

// slide :: (Int, Int) -> [a] -> [[a]]
const slide = (n,m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0,n), ...slide(n,m) (xs.slice(m))]
}

// slideStr :: (Int, Int) -> String -> [String]
const slideStr = (n,m) => str =>
  slide(n,m) (Array.from(str)) .map(s => s.join(''))

// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')

console.log(codon2amino('AAGCATAGAAATCGAGGG'))
// KHRNRG
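The reduce/Object.assign expression that builds codonDict is fairly dense. If your target environment supports ES2019 (current versions of Chrome support both Object.fromEntries and Array.prototype.flatMap), the same inversion can be written step by step. A sketch using a shortened table:

```javascript
// Shortened amino -> codons table for the example
const aminoDict = {
  K: ['AAA', 'AAG'],
  H: ['CAC', 'CAT'],
  R: ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGT']
}

// [['K', ['AAA','AAG']], ...]  ->  [['AAA','K'], ['AAG','K'], ...]  ->  { AAA: 'K', AAG: 'K', ... }
const codonDict = Object.fromEntries(
  Object.entries(aminoDict).flatMap(([amino, codons]) =>
    codons.map(codon => [codon, amino]))
)

console.log(codonDict['AAG'], codonDict['CAT'], codonDict['CGT']) // K H R
```

The result is the same plain codon -> amino object, so codon2amino can use it unchanged.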


Further explanation

can you clarify what some of these variables are supposed to represent? (n, m, xs, c, etc)

Our slide function gives us a sliding window over an array. It expects two parameters for the window, n (the window size) and m (the step size), and then one parameter holding the items to iterate through: xs, which can be read as "x's", the plural of x, as in a collection of x items.
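To make n and m concrete, here is the same windowing written as a plain loop (slideLoop is a hypothetical name, not part of the answer): the window starts at index 0, takes n items, and advances by m until a full window no longer fits. Unlike the recursive version, it doesn't copy the remaining array on every step or recurse once per window, which may matter for very long sequences.

```javascript
// slideLoop :: (Int, Int) -> [a] -> [[a]]
// n = window size, m = step size (must be positive), xs = array or string
const slideLoop = (n, m) => xs => {
  const windows = []
  for (let start = 0; start + n <= xs.length; start += m) {
    windows.push(xs.slice(start, start + n))
  }
  return windows
}

console.log(JSON.stringify(slideLoop(2, 1)([1, 2, 3, 4]))) // [[1,2],[2,3],[3,4]]
console.log(JSON.stringify(slideLoop(2, 2)([1, 2, 3, 4]))) // [[1,2],[3,4]]
```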

slide is purposefully generic: it only relies on xs having a length and a slice method, so it works equally well with an Array, a String, or a typed array. (slideStr, for its part, passes its input through Array.from, which accepts anything that implements Symbol.iterator.) That's also why we use a generic name like xs: naming it something specific pigeonholes us into thinking it can only work with a specific type.
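A quick way to see which inputs slide accepts, with the demo's definition repeated so the snippet is self-contained: a string works directly because strings have length and slice, while a Set (iterable, but with neither) has to be converted with Array.from first.

```javascript
const slide = (n, m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0, n), ...slide(n, m)(xs.slice(m))]
}

// Strings have .length and .slice, so the windows come back as substrings
console.log(slide(2, 2)('abcd').join(' ')) // ab cd

// A Set is iterable but has no .slice, so convert it to an array first
const set = new Set([1, 2, 3, 4])
console.log(JSON.stringify(slide(2, 2)(Array.from(set)))) // [[1,2],[3,4]]
```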

Other things, like the variable c in .map(c => codonDict[c]), are not particularly important. I named it c for codon, but we could have named it x or foo; it doesn't matter. The "trick" to understanding c is to understand .map.

[1,2,3,4,5].map(c => f(c))
// [f(1), f(2), f(3), f(4), f(5)]

So all we're really doing here is taking an array ([1,2,3,4,5]) and making a new array in which f has been called on each element of the original array.

Now when we look at .map(c => codonDict[c]), we can see that all we're doing is looking up each element c in codonDict.

const codon2amino = str =>
  slideStr(3,3)(str)          // [ 'AAG', 'CAT', 'AGA', 'AAT', ...]
    .map(c => codonDict[c])   // [ codonDict['AAG'], codonDict['CAT'], codonDict['AGA'], codonDict['AAT'], ...]
    .join('')                 // 'KHRN...'
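One consequence of this pipeline worth knowing: the question's table has no entries for the stop codons (TAA, TAG, TGA), so codonDict[c] is undefined for them, and Array.prototype.join converts undefined elements to empty strings. Any codon missing from the table therefore disappears from the output without an error. A small demonstration with a hand-written codonDict fragment:

```javascript
// Fragment of the codon -> amino dictionary (no stop codons, as in the question's table)
const codonDict = { AAG: 'K', CAT: 'H' }

const codons = ['AAG', 'TAA', 'CAT'] // TAA is a stop codon
const aminos = codons.map(c => codonDict[c])

console.log(aminos.length)   // 3 (the middle entry is undefined)
console.log(aminos.join('')) // KH (the undefined entry became '')
```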

Also, are these 'const' items able to essentially replace my original translateInput() function?

If you're not familiar with ES6 (ES2015), some of the syntax used above might seem foreign to you.

// foo using traditional function syntax
function foo (x) { return x + 1 }

// foo as an arrow function (use one form or the other: declaring
// both in the same scope is a SyntaxError)
const foo = x => x + 1
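One practical difference when swapping a function declaration for a const arrow function: declarations are hoisted, so they can be called from code that appears earlier in the file, while a const binding can't be used until the line that defines it has run. A small sketch with hypothetical names:

```javascript
// Function declarations are hoisted, so this call works even though
// the definition appears further down
console.log(addOne(1)) // 2
function addOne (x) { return x + 1 }

// A const binding is in its "temporal dead zone" until its declaration runs
let usedTooEarly = false
try {
  addOneArrow(1)
} catch (e) {
  usedTooEarly = e instanceof ReferenceError
}
const addOneArrow = x => x + 1

console.log(usedTooEarly)   // true
console.log(addOneArrow(1)) // 2
```

So if your extension calls translateInput before the point where it's defined, keep that ordering in mind when replacing it with a const.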

So, in short, yes: codon2amino is a direct replacement for your translateInput, just defined using a const binding and an arrow function. I chose codon2amino as a name because it better describes what the function does. translateInput doesn't say which direction it's translating (A to B, or B to A?), and "input" adds little as a descriptor here, because every function takes input.

You're seeing other const declarations because we're splitting the work of your function into multiple functions. The reasons for this are mostly beyond the scope of this answer, but briefly: one specialized function that takes on several tasks is less useful to us than several generic functions that can be combined and reused in sensible ways.

Sure, codon2amino needs to look at each 3-letter sequence in the input string, but that doesn't mean we have to write the string-splitting code inside the codon2amino function. We can instead write a generic string-splitting function, as we did with slideStr, that is useful to any function that wants to iterate through string sequences, and then have codon2amino use it. If we encapsulated all of that string-splitting code inside codon2amino, the next time we needed to iterate through string sequences, we'd have to duplicate that portion of the code.
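As an illustration of that reuse (countKmers is a hypothetical helper, not part of the answer), the same slideStr with a step of 1 yields every overlapping k-letter substring, which is all that's needed to count k-mers in a sequence:

```javascript
const slide = (n, m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0, n), ...slide(n, m)(xs.slice(m))]
}

const slideStr = (n, m) => str =>
  slide(n, m)(Array.from(str)).map(s => s.join(''))

// countKmers :: (Int, String) -> Map String Int
// Counts every overlapping substring of length k
const countKmers = (k, str) => {
  const counts = new Map()
  for (const kmer of slideStr(k, 1)(str))
    counts.set(kmer, (counts.get(kmer) || 0) + 1)
  return counts
}

const counts = countKmers(2, 'AAGAAG')
console.log(counts.get('AA'), counts.get('AG'), counts.get('GA')) // 2 2 1
```

No string-splitting logic had to be rewritten; only the window and step changed.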


All that said...

Is there any way I can do this while keeping my original for loop structure?

I really think you should spend some time stepping through the code above to see how it works. There are a lot of valuable lessons to learn there if you haven't yet seen program concerns separated in this way.

Of course, that's not the only way to solve your problem. We can use a primitive for loop. For me, it's more mental overhead to think about creating an index variable i and manually incrementing it with i++ or i += 3, making sure it stays within str.length, reassigning the return value with result += something, and so on. Add a couple more variables and your brain quickly turns to soup.

function makeCodonDict (aminoDict) {
  let result = {}
  for (let k of Object.keys(aminoDict))
    for (let a of aminoDict[k])
      result[a] = k
  return result
}

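function translateInput (dict, str) {
  let result = ''
  // step one codon (3 characters) at a time; stop before an incomplete trailing codon
  for (let i = 0; i + 3 <= str.length; i += 3)
    result += dict[str.slice(i, i + 3)] // slice instead of the deprecated substr
  return result
}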

const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };
const codonDict = makeCodonDict(aminoDict)

const codons = 'AAGCATAGAAATCGAGGG'
const aminos = translateInput(codonDict, codons)
console.log(aminos) // KHRNRG
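If the input may contain stop codons, lowercase letters, or characters other than A/C/G/T, the loop above needs a policy for codons that aren't in the table; otherwise dict[...] is undefined for them and result += appends the literal text "undefined". A sketch of one possible policy, assuming '*' for stop codons (a common one-letter convention) and 'X' for anything unrecognised; translateWithStops and STOP_CODONS are hypothetical additions, not part of the answer:

```javascript
// Minimal codon -> amino table for the example (the shape makeCodonDict produces)
const codonDict = { AAG: 'K', CAT: 'H', AGA: 'R', AAT: 'N', CGA: 'R', GGG: 'G' }

// Hypothetical addition: the stop codons of the standard genetic code
const STOP_CODONS = new Set(['TAA', 'TAG', 'TGA'])

function translateWithStops (dict, str) {
  const seq = str.toUpperCase()
  let result = ''
  // stop before an incomplete trailing codon
  for (let i = 0; i + 3 <= seq.length; i += 3) {
    const codon = seq.slice(i, i + 3)
    if (STOP_CODONS.has(codon))
      result += '*'          // stop codon
    else if (codon in dict)
      result += dict[codon]  // known codon
    else
      result += 'X'          // unknown or ambiguous, e.g. contains N
  }
  return result
}

console.log(translateWithStops(codonDict, 'AAGCATAGAAATCGAGGG')) // KHRNRG
console.log(translateWithStops(codonDict, 'aagTAAcatNNN'))       // K*HX
```

Whether translation should instead end at the first stop codon depends on what the extension is meant to show; this sketch just marks it.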
