How can I get taxonomic rank names from taxid?

Problem description

This question is related to: How to get taxonomic specific ids for kingdom, phylum, class, order, family, genus and species from taxid?

The solution given there works, but I would like to have the names for each taxonomic ID at the defined ranks. I have found this in ete3, which can do the job:

names = ncbi.get_taxid_translator(lineage)
print([names[taxid] for taxid in lineage])
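In that snippet, get_taxid_translator returns a dictionary keyed by taxid, so it is the list comprehension over lineage that puts the names in root-to-leaf order. A minimal sketch of just that step, assuming the lineage list and name dictionary printed further down in the question as stand-ins for the ete3 calls, so it runs without downloading the NCBI database:

```python
# Stand-in data copied from the question's printed output for taxid 1204725.
# In real use: lineage = ncbi.get_lineage(1204725)
#              names = ncbi.get_taxid_translator(lineage)
lineage = [1, 131567, 2157, 28890, 183925, 2158, 2159, 2160, 2162, 1204725]
names = {
    1: 'root', 2157: 'Archaea', 2158: 'Methanobacteriales',
    2159: 'Methanobacteriaceae', 2160: 'Methanobacterium',
    2162: 'Methanobacterium formicicum', 183925: 'Methanobacteria',
    28890: 'Euryarchaeota', 131567: 'cellular organisms',
    1204725: 'Methanobacterium formicicum DSM 3637',
}

# Index the (unordered) dict in lineage order to get names from root to leaf.
ordered_names = [names[taxid] for taxid in lineage]
print(' > '.join(ordered_names))
```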

But not being a Python programmer, I am failing to incorporate this into the code given in the link above. Here is what I have tried:

import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)
    print(lineage)
    #[1, 131567, 2157, 28890, 183925, 2158, 2159, 2160, 2162, 1204725]
    names = ncbi.get_taxid_translator(lineage)
    print(names)
    #{1: u'root', 2157: u'Archaea', 2158: u'Methanobacteriales', 2159: u'Methanobacteriaceae', 2160: u'Methanobacterium', 2162: u'Methanobacterium formicicum', 183925: u'Methanobacteria', 28890: u'Euryarchaeota', 131567: u'cellular organisms', 1204725: u'Methanobacterium formicicum DSM 3637'}
    lineage2ranks = ncbi.get_rank(names)
    print(lineage2ranks)
    #{1: u'no rank', 2157: u'superkingdom', 2158: u'order', 2159: u'family', 2160: u'genus', 2162: u'species', 183925: u'class', 28890: u'phylum', 131567: u'no rank', 1204725: u'no rank'}
    ranks2lineage = dict((rank,taxid) for (taxid, rank) in lineage2ranks.items())
    print(ranks2lineage)
    return{'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

def main(taxids, desired_ranks, path):
    with open(path, 'w') as csvfile:
        fieldnames = ['{}_id'.format(rank) for rank in desired_ranks]
        writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=fieldnames)
        writer.writeheader()
        for taxid in taxids:
            writer.writerow(get_desired_ranks(taxid, desired_ranks))

if __name__ == '__main__':
    taxids = [1204725, 2162,  1300163, 420247]
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    path = 'taxids.csv'
    main(taxids, desired_ranks, path)
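A minimal sketch of the missing step, assuming the two dictionaries the code above already prints (lineage2ranks and names) are available. desired_rank_names is a hypothetical helper, not an ete3 function, and the data is copied from the printed comments so the sketch runs without the NCBI database. It also shows why kingdom comes back as <not present>: in this lineage Archaea carries the rank superkingdom.

```python
# Hypothetical helper (not part of ete3): maps each desired rank to its name.
# lineage2ranks and names are the two dicts printed in the question, i.e. the
# results of ncbi.get_rank(...) and ncbi.get_taxid_translator(...).
def desired_rank_names(lineage2ranks, names, desired_ranks):
    # Invert {taxid: rank} into {rank: taxid}. Several taxids share 'no rank',
    # so only one of them survives; that is harmless because 'no rank' is
    # never one of the desired ranks.
    ranks2lineage = {rank: taxid for taxid, rank in lineage2ranks.items()}
    row = {}
    for rank in desired_ranks:
        taxid = ranks2lineage.get(rank)
        row['{}_name'.format(rank)] = names[taxid] if taxid is not None else '<not present>'
    return row


# Stand-in data copied from the question's printed output for taxid 1204725.
lineage2ranks = {1: 'no rank', 2157: 'superkingdom', 2158: 'order', 2159: 'family',
                 2160: 'genus', 2162: 'species', 183925: 'class', 28890: 'phylum',
                 131567: 'no rank', 1204725: 'no rank'}
names = {1: 'root', 2157: 'Archaea', 2158: 'Methanobacteriales',
         2159: 'Methanobacteriaceae', 2160: 'Methanobacterium',
         2162: 'Methanobacterium formicicum', 183925: 'Methanobacteria',
         28890: 'Euryarchaeota', 131567: 'cellular organisms',
         1204725: 'Methanobacterium formicicum DSM 3637'}

desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
row = desired_rank_names(lineage2ranks, names, desired_ranks)
print(row)
# Archaea sits at 'superkingdom' in this data, so request that rank explicitly.
print(desired_rank_names(lineage2ranks, names, ['superkingdom'] + desired_ranks))
```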

Many thanks for any help you could provide.

Recommended answer

Leave your taxids as they are:

taxids = [1204725, 2162,  1300163, 420247]

Then call get_desired_ranks for each individual taxid.

for taxid in taxids:
    ranks = get_desired_ranks(taxid, desired_ranks)

Now call ncbi.get_taxid_translator on each value in ranks (the taxid found at each rank; the keys are the rank labels) and print the output:

for taxid in taxids:
    print(ncbi.get_taxid_translator([taxid]))
    ranks = get_desired_ranks(taxid, desired_ranks)
    # the values of ranks are the taxids found at each desired rank
    for rank_taxid in ranks.values():
        if rank_taxid != '<not present>':
            print(ncbi.get_taxid_translator([rank_taxid]))

Output

{1204725: 'Methanobacterium formicicum DSM 3637'}
{183925: 'Methanobacteria'}
{2159: 'Methanobacteriaceae'}
{2160: 'Methanobacterium'}
{28890: 'Euryarchaeota'}
{2162: 'Methanobacterium formicicum'}
{2158: 'Methanobacteriales'}
{2162: 'Methanobacterium formicicum'}
[...]      
{420247: 'Methanobrevibacter smithii ATCC 35061'}
{183925: 'Methanobacteria'}
{2159: 'Methanobacteriaceae'}
{2172: 'Methanobrevibacter'}
{28890: 'Euryarchaeota'}
{2173: 'Methanobrevibacter smithii'}
{2158: 'Methanobacteriales'}
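The loop above calls get_taxid_translator once per rank. A sketch that collects the taxids for all desired ranks first and translates them in a single call; rank_names is a hypothetical helper, and the ete3 methods are passed in as arguments (with ete3 you would pass ncbi.get_lineage, ncbi.get_rank and ncbi.get_taxid_translator) so the example can run on stand-in data taken from the question's lineage for taxid 2162:

```python
# Hypothetical helper: one get_taxid_translator() call per query taxid
# instead of one call per rank.
def rank_names(taxid, desired_ranks, get_lineage, get_rank, get_taxid_translator):
    lineage = get_lineage(taxid)
    rank2taxid = {rank: t for t, rank in get_rank(lineage).items()}
    wanted = [rank2taxid[r] for r in desired_ranks if r in rank2taxid]
    names = get_taxid_translator(wanted)  # single batched lookup
    return {r: names[rank2taxid[r]] if r in rank2taxid else '<not present>'
            for r in desired_ranks}


# Stand-in data for taxid 2162, taken from the lineage printed in the question.
LINEAGE = {2162: [1, 131567, 2157, 28890, 183925, 2158, 2159, 2160, 2162]}
RANKS = {1: 'no rank', 131567: 'no rank', 2157: 'superkingdom', 28890: 'phylum',
         183925: 'class', 2158: 'order', 2159: 'family', 2160: 'genus', 2162: 'species'}
NAMES = {1: 'root', 131567: 'cellular organisms', 2157: 'Archaea',
         28890: 'Euryarchaeota', 183925: 'Methanobacteria', 2158: 'Methanobacteriales',
         2159: 'Methanobacteriaceae', 2160: 'Methanobacterium',
         2162: 'Methanobacterium formicicum'}

translator_calls = []


def fake_translator(taxids):
    # Records each call so we can check that only one lookup happens.
    translator_calls.append(list(taxids))
    return {t: NAMES[t] for t in taxids}


desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
result = rank_names(2162, desired_ranks,
                    get_lineage=LINEAGE.__getitem__,
                    get_rank=lambda taxids: {t: RANKS[t] for t in taxids},
                    get_taxid_translator=fake_translator)
for rank, name in result.items():
    print(rank + ': ' + name)
```

Fewer calls mainly matters when you loop over many query taxids; the per-rank version in the answer gives the same names.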

Full code, with improved output

import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)
    # get_rank() accepts the lineage list directly; names are not needed here
    lineage2ranks = ncbi.get_rank(lineage)
    ranks2lineage = {rank: lineage_taxid for (lineage_taxid, rank) in lineage2ranks.items()}
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

if __name__ == '__main__':
    taxids = [1204725, 2162,  1300163, 420247]
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    for taxid in taxids:
        print(list(ncbi.get_taxid_translator([taxid]).values())[0])
        ranks = get_desired_ranks(taxid, desired_ranks)
        for key, rank in ranks.items():
            if rank != '<not present>':
                print(key + ': ' + list(ncbi.get_taxid_translator([rank]).values())[0])
        print('=' * 60)

If you want tab-separated output, you can concatenate the strings with \t, or simply add all results to a list and join it with \t.

In the snippet below, which replaces the if __name__ == '__main__': block of the full code above, the results are stored in a list called results that holds one inner list per query taxid with your fields (original ID, kingdom, etc.). In each loop iteration the values are appended to the last entry (results[-1]).

if __name__ == '__main__':
    taxids = [1204725, 2162,  1300163, 420247]
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    results = list()
    for taxid in taxids:
        results.append(list())
        results[-1].append(str(taxid))
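        ranks = get_desired_ranks(taxid, desired_ranks)
        # Iterate over desired_ranks (not ranks.items()) so the columns always
        # follow the header order, independent of dict ordering.
        for rank in desired_ranks:
            rank_taxid = ranks['{}_id'.format(rank)]
            if rank_taxid != '<not present>':
                results[-1].append(list(ncbi.get_taxid_translator([rank_taxid]).values())[0])
            else:
                results[-1].append(rank_taxid)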

    #generate the header
    header = ['Original_query_taxid']
    header.extend(desired_ranks)
    print('\t'.join(header))

    #print the results
    for result in results:
        print('\t'.join(result))

Output

Original_query_taxid	kingdom	phylum	class	order	family	genus	species
1204725	<not present>	Euryarchaeota	Methanobacteria	Methanobacteriales	Methanobacteriaceae	Methanobacterium	Methanobacterium formicicum
2162	<not present>	Euryarchaeota	Methanobacteria	Methanobacteriales	Methanobacteriaceae	Methanobacterium	Methanobacterium formicicum
1300163	<not present>	Euryarchaeota	Methanobacteria	Methanobacteriales	Methanobacteriaceae	Methanobacterium	Methanobacterium formicicum
420247	<not present>	Euryarchaeota	Methanobacteria	Methanobacteriales	Methanobacteriaceae	Methanobrevibacter	Methanobrevibacter smithii
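Since the question started from csv.DictWriter, the same rows can also be written with the csv module instead of manual '\t'.join calls, which fixes the column order through fieldnames and handles quoting. A sketch assuming each row has already been reduced to a {rank: name} dict plus the query taxid; write_rank_table is a hypothetical helper, and the rows are copied from the output above (three ranks only, for brevity):

```python
import csv
import io


def write_rank_table(rows, desired_ranks, fh):
    # One column for the query taxid, then one per desired rank, in that order.
    fieldnames = ['taxid'] + list(desired_ranks)
    writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter='\t')
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


desired_ranks = ['phylum', 'genus', 'species']
rows = [
    {'taxid': 1204725, 'phylum': 'Euryarchaeota', 'genus': 'Methanobacterium',
     'species': 'Methanobacterium formicicum'},
    {'taxid': 420247, 'phylum': 'Euryarchaeota', 'genus': 'Methanobrevibacter',
     'species': 'Methanobrevibacter smithii'},
]

# io.StringIO keeps the example self-contained; for a real file use
# open('taxids.csv', 'w', newline='') and keep the writer inside the with block.
buf = io.StringIO()
write_rank_table(rows, desired_ranks, buf)
print(buf.getvalue())
```

Note that the writer must be created and used inside the with block that opens the file, which is what the indentation in the question's main() was meant to express.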
