Compiling and iterating over a dictionary

Question

I'm fairly new to Python, and I'm working on building a dictionary from a file and then iterating over that dictionary. I have been working in Eclipse, and I'm not getting any output, or even any warnings.

The input looks like this (the actual input is significantly larger):

[Term]
id: GO:0000010
name: trans-hexaprenyltranstransferase activity
namespace: molecular_function
def: "Catalysis of the reaction: all-trans-hexaprenyl diphosphate + isopentenyl diphosphate = all-trans-heptaprenyl diphosphate + diphosphate." [KEGG:R05612, RHEA:20839]
subset: gosubset_prok
xref: KEGG:R05612
xref: RHEA:20839
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups

[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

[Term]
id: GO:0000012
name: single strand break repair
namespace: biological_process
def: "The repair of single strand breaks in DNA. Repair of such breaks is mediated by the same enzyme systems as are used in base excision repair." [http://www.ultranet.com/~jkimball/BiologyPages/D/DNArepair.html]
subset: gosubset_prok
is_a: GO:0006281 ! DNA repair

[Term]
id: GO:0000014
name: single-stranded DNA endodeoxyribonuclease activity
namespace: molecular_function
def: "Catalysis of the hydrolysis of ester linkages within a single-stranded deoxyribonucleic acid molecule by creating internal breaks." [GOC:mah]
synonym: "single-stranded DNA specific endodeoxyribonuclease activity" RELATED []
synonym: "ssDNA-specific endodeoxyribonuclease activity" RELATED [GOC:mah]
is_a: GO:0004520 ! endodeoxyribonuclease activity

The output I am trying to produce would be:

GO:0000010     molecular_function
trans-hexaprenyltranstransferase activity
GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups

GO:0000011    biological_process
vacuole inheritance
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

GO:0000012    biological_process
single strand break repair
is_a: GO:0006281 ! DNA repair

GO:0000014    molecular_function
single-stranded DNA endodeoxyribonuclease activity
is_a: GO:0004520 ! endodeoxyribonuclease activity

My code is:

import re

id_to_info = {} #declare dictionary

def parse_record(term):
    go_id = re.findall(r"id:\s(.*?)\n", term, re.DOTALL)
    name = re.findall(r"name:\s(.*?)\n", term, re.DOTALL)
    namespace = re.findall(r"namespace:\s(.*?)\n", term, re.DOTALL)
    is_a = re.findall(r"is_a:\s(.*?)\n", term, re.DOTALL)
    info = namespace + "\n" + name + "\n" + is_a
    id_to_info[go_id] = info
    for go_id, info in id_to_info.interitems():
        print(go_id + "\t" + info)

def split_record(record):
    sp_file = open(record)
    sp_records = sp_file.read()
    sp_split_records = re.findall(r"(\[.*?)\n\n", sp_records, re.DOTALL)
    for sp_record in sp_split_records:
        parse_record(term=sp_record)
    sp_file.close()

split_record(record="go.rtf")

I don't really know where I am going wrong, but I think the main issue is my dictionary call?

Recommended answer

re.findall returns a list of the things it found; your code assumes strings. Since you have only one hit per line, just add [0] where feasible. is_a can come back empty, so it needs slightly more careful handling.
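To see the difference concretely, here is a minimal sketch using a shortened record based on the question's input. The `record` string and the `first_or_empty` helper are illustrative names, not part of the original code.

```python
import re

# A shortened record; the trailing newline matters because each pattern ends in \n.
record = "[Term]\nid: GO:0000011\nname: vacuole inheritance\n"

# re.findall always returns a list, even when there is exactly one match.
ids = re.findall(r"id:\s(.*?)\n", record)
print(ids)      # a one-element list
print(ids[0])   # the string inside it

# This record has no is_a line, so the result is an empty list
# and indexing it with [0] would raise IndexError.
is_a = re.findall(r"is_a:\s(.*?)\n", record)
print(is_a)


def first_or_empty(matches):
    """Return the first match, or an empty string when nothing matched."""
    return matches[0] if matches else ""


print(repr(first_or_empty(is_a)))
```

Concatenating a list with a string (as in `namespace + "\n" + name`) raises a TypeError, which is why the original code fails before anything is stored in the dictionary.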

Also, the (key, value) method is iteritems (short for "iterate items"), not interitems.
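Note that `iteritems` exists only in Python 2. Assuming Python 3, where that method was removed, the equivalent call is `items()`. A small sketch with a made-up two-entry dictionary:

```python
# Hypothetical sample data mirroring the id -> info mapping in the question.
id_to_info = {
    "GO:0000011": "biological_process",
    "GO:0000014": "molecular_function",
}

# Python 3: dict.items() yields (key, value) pairs.
lines = [go_id + "\t" + info for go_id, info in id_to_info.items()]
for line in lines:
    print(line)

# On Python 3 this prints False, because dictionaries have no iteritems method.
print(hasattr(id_to_info, "iteritems"))
```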

Here is an update:

import re

id_to_info = {} #declare dictionary

def parse_record(term):
    go_id = re.findall(r"id:\s(.*?)\n", term, re.DOTALL)[0]
    name = re.findall(r"name:\s(.*?)\n", term, re.DOTALL)[0]
    namespace = re.findall(r"namespace:\s(.*?)\n", term, re.DOTALL)[0]
    is_a = re.findall(r"is_a:\s(.*?)\n", term, re.DOTALL)
    is_a = is_a[0] if is_a else ""
    # print namespace, name, is_a
    info = namespace + "\n" + name + "\n" + is_a
    id_to_info[go_id] = info
    for go_id, info in id_to_info.iteritems():
        print(go_id + "\t" + info)

def split_record(record):
    sp_file = open(record)
    sp_records = sp_file.read()
    sp_split_records = re.findall(r"(\[.*?)\n\n", sp_records, re.DOTALL)
    for sp_record in sp_split_records:
        parse_record(term=sp_record)
    sp_file.close()

split_record(record="go.rtf")

Output:

GO:0000010  molecular_function
trans-hexaprenyltranstransferase activity
GO:0016765 ! transferase activity, transferring alkyl or aryl (other
GO:0000011  biological_process
vacuole inheritance
GO:0007033 ! vacuole organization
GO:0000010  molecular_function
trans-hexaprenyltranstransferase activity
GO:0016765 ! transferase activity, transferring alkyl or aryl (other
GO:0000011  biological_process
vacuole inheritance
GO:0007033 ! vacuole organization
GO:0000010  molecular_function
trans-hexaprenyltranstransferase activity
GO:0016765 ! transferase activity, transferring alkyl or aryl (other
GO:0000012  biological_process
single strand break repair

I'll leave the rest of the formatting to you. :-)
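As one possible way to finish the formatting, the sketch below assumes Python 3 and makes three changes that go beyond the answer's code: it keeps every is_a line rather than only the first, it splits records on blank lines instead of relying on a trailing blank line, and it prints once after all records are parsed, which avoids the repeated entries seen in the output. The names `parse_records` and `format_records` are illustrative, and the sample text is abbreviated from the question's input.

```python
import re


def parse_records(text):
    """Map each GO id to (namespace, name, list of is_a values)."""
    id_to_info = {}
    # Records are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        if not block.startswith("[Term]"):
            continue
        # re.MULTILINE lets ^ and $ match at each line boundary, so the
        # last line of a block matches even without a trailing newline.
        go_id = re.search(r"^id:[ \t]*(.+)$", block, re.MULTILINE)
        name = re.search(r"^name:[ \t]*(.+)$", block, re.MULTILINE)
        namespace = re.search(r"^namespace:[ \t]*(.+)$", block, re.MULTILINE)
        if not (go_id and name and namespace):
            continue  # skip incomplete records rather than crash
        is_a = re.findall(r"^is_a:[ \t]*(.+)$", block, re.MULTILINE)
        id_to_info[go_id.group(1)] = (namespace.group(1), name.group(1), is_a)
    return id_to_info


def format_records(id_to_info):
    """Render every record in the layout requested in the question."""
    chunks = []
    for go_id, (namespace, name, is_a) in id_to_info.items():
        lines = [go_id + "\t" + namespace, name]
        lines.extend("is_a: " + parent for parent in is_a)
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks)


sample = """[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

[Term]
id: GO:0000012
name: single strand break repair
namespace: biological_process
subset: gosubset_prok
is_a: GO:0006281 ! DNA repair
"""

records = parse_records(sample)
print(format_records(records))
```

To run this on a real file, read its contents into a string first and pass that to `parse_records`. The code assumes the input is plain text; if go.rtf is a genuine RTF document, it would likely need to be exported as plain text before parsing.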
