Dictionary help in Julia - creating dictionary from text file
Problem description
I'm attempting to create a dictionary from the contents of a text file in Julia for use in a bioinformatics problem. The file is formatted like this:
    UUU F CUU L AUU I GUU V
    UUC F CUC L AUC I GUC V
    ...
I want to make a dictionary where the key is the 3-letter part (the codon) and the value is the one-letter part (the amino acid). I'm able to pull out the right components with a regex:
    for m in eachmatch(r"([AUGC]{3,3})\s([A-Z])", file)
        codon, aa = m.captures
If I print codon and aa in this loop, I get the correct output (all the codons, all the amino acids), but I can't figure out how to put them into a dictionary. If I do:
    codons = {codon => aa}
at the end of the loop, I end up with a dictionary that only contains the last entry.
I'm sure the syntax is something really obvious, but I'm a biologist, not a programmer, so my reading of the documentation isn't getting me anywhere. It says:
Given a dictionary D, the syntax D[x] returns the value of key x (if it exists) or throws an error, and D[x] = y stores the key-value pair x => y in D (replacing any existing value for the key x).
But I tried
    codons[codon] = aa
at the end of the loop (I initialized the dictionary with codons = {} before the loop), and I get the error:
    no method setindex!(Array{Any,1},SubString{UTF8String},SubString{UTF8String})
    at In[35]:5
    in anonymous at no file:4
Any help would be greatly appreciated.
EDIT: Evidently, I'm not initializing the dictionary correctly. If I do
    codons = {"blah" => "blahblah"}
at the beginning, the loop works and fills in correctly. So a modified question: how do you initialize an empty dictionary?
EDIT2: Minimal non-working example:
    file = open(readall, "rna_codons.txt")
    codons = {}
    for m in eachmatch(r"([AUGC]{3,3})\s([A-Z])", file)
        codon, aa = m.captures
        codons[codon] = aa
    end
Solution
Just to summarize a Minimal Working Example (MWE) for your case of reading your formatted text file into a Julia Dict:
    file = open(readall, "rna_codons.txt")
    # Dict() creates an empty dictionary; {} creates an empty Array{Any,1},
    # which is why setindex! failed in the question.
    codons = Dict()
    for m in eachmatch(r"([AUGC]{3,3})\s([A-Z])", file)
        codon, aa = m.captures
        codons[codon] = aa
    end
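The snippets in this thread use pre-1.0 Julia syntax (`readall`, `{}` literals, `UTF8String`). As a rough sketch of the same loop for Julia 1.x, the following may help; it assumes the file contents look like the sample in the question, and the inline `text` string is only a stand-in for the contents of `rna_codons.txt`.

```julia
# Sketch for Julia 1.x. The sample text is an assumption standing in for
# the contents of rna_codons.txt; on a real file one would use
#     text = read("rna_codons.txt", String)
text = """
UUU F CUU L AUU I GUU V
UUC F CUC L AUC I GUC V
"""

# An empty dictionary with String keys and String values.
codons = Dict{String,String}()

# Each match captures a 3-letter codon and a 1-letter amino acid.
for m in eachmatch(r"([AUGC]{3})\s([A-Z])", text)
    codon, aa = m.captures      # SubString values
    codons[codon] = aa          # converted to String on insertion
end

println(codons["AUU"])
```

Giving the dictionary concrete key and value types is optional; an untyped `Dict()` also works, but a typed one lets Julia store the entries more compactly.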
N.B.: If the file is very large, there is likely a faster way of generating your Dict.
EDIT
Given your apparent text file format, here's another way to create your Dict. I made no tests to determine any performance loss/gain.
    codon_array = open(readdlm, "rna_codons.txt")
    codons = Dict{ASCIIString,ASCIIString}(codon_array[:,1:2:end][:], codon_array[:,2:2:end][:])
N.B.: If you use it, better check it for correctness.
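One way to act on that advice is to build the dictionary both ways and compare the results. The sketch below is written for Julia 1.x, where `readdlm` lives in the `DelimitedFiles` standard library and the two-array `Dict` constructor is no longer available, so `zip` is used instead. The `sample` string is an assumption standing in for the file contents.

```julia
using DelimitedFiles   # provides readdlm in Julia 1.x

# Sample text standing in for rna_codons.txt (an assumption for illustration).
sample = """
UUU F CUU L AUU I GUU V
UUC F CUC L AUC I GUC V
"""

# Whitespace-delimited table: odd columns are codons, even columns amino acids.
table = readdlm(IOBuffer(sample))

codon_keys = String.(vec(table[:, 1:2:end]))
amino_vals = String.(vec(table[:, 2:2:end]))
codons = Dict(zip(codon_keys, amino_vals))

# Cross-check against the regex approach on the same text.
by_regex = Dict(String(m.captures[1]) => String(m.captures[2])
                for m in eachmatch(r"([AUGC]{3})\s([A-Z])", sample))

println(codons == by_regex)
```

If the two dictionaries differ on a real file, the usual suspects would be entries that do not fit the one-letter pattern or rows with an uneven number of columns.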