Converting values of named tuples from strings to integers


Problem description

I'm creating a script to read a csv file into a set of named tuples from their column headers. I will then use these namedtuples to pull out rows of data which meet certain criteria.

I've worked out the input (shown below), but am having issues with filtering the data before outputting it to another file.

import csv
from collections import namedtuple

with open('test_data.csv') as f:
    f_csv = csv.reader(f) #read using csv.reader()
    Base = namedtuple('Base', next(f_csv)) #create namedtuple keys from header row
    for r in f_csv: #for each row in the file
        row = Base(*r) 
        # Process row
        print(row) #print data

The contents of my input file are as follows:

Locus           Total_Depth     Average_Depth_sample    Depth_for_17
chr1:6484996    1030            1030                    1030
chr1:6484997    14              14                      14
chr1:6484998    0               0                       0

And they are printed from my code as follows:

Base(Locus='chr1:6484996', Total_Depth='1030', Average_Depth_sample='1030', Depth_for_17='1030')
Base(Locus='chr1:6484997', Total_Depth='14', Average_Depth_sample='14', Depth_for_17='14')
Base(Locus='chr1:6484998', Total_Depth='0', Average_Depth_sample='0', Depth_for_17='0')

I want to be able to pull out only the records with a Total_Depth greater than 15.

Intuitively I tried the following function:

if Base.Total_Depth >= 15 :
    print row

However this only prints the final row of data (from the above output table). I think the problem is twofold. As far as I can tell I'm not storing my named tuples anywhere for them to be referenced later. And secondly the numbers are being read in string format rather than as integers.

Firstly, can someone tell me whether I need to store my namedtuples somewhere?

And secondly, how do I convert the string values to integers? Or is this not possible because namedtuples are immutable?

Thanks!

I previously asked a similar question with respect to dictionaries, but now would like to use namedtuples instead. :)

Solution

Map your values to int when creating the named tuple instances:

row = Base(r[0], *map(int, r[1:])) 

This keeps the r[0] value as a string, and maps the remaining values to int().

This does require knowing the CSV layout in advance, because the choice of which columns are converted to integers is hardcoded here.
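If the column layout is not known in advance, one possible alternative is to try the conversion on every value and keep the original string when it fails. This is only a sketch, and the helper name to_int_if_possible is hypothetical; it is not part of the answer above.

```python
from collections import namedtuple


def to_int_if_possible(value):
    """Return int(value) when the text parses as an integer, else the original string."""
    try:
        return int(value)
    except ValueError:
        return value


Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
r = ['chr1:6484996', '1030', '1030', '1030']

# Locus cannot be parsed as an integer, so it stays a string; the depths become ints.
row = Base(*map(to_int_if_possible, r))
print(row)
```

The trade-off is that a column which merely looks numeric (an ID, for example) would also be converted, so the hardcoded version is safer when the layout is fixed.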

Demo:

>>> from collections import namedtuple
>>> Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
>>> r = ['chr1:6484996', '1030', '1030', '1030']
>>> Base(r[0], *map(int, r[1:]))
Base(Locus='chr1:6484996', Total_Depth=1030, Average_Depth_sample=1030, Depth_for_17=1030)
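On the immutability point raised in the question: a namedtuple instance cannot be changed in place, which is why the conversion is done at construction time. If an instance holding strings already exists, its _replace() method returns a new instance with the given fields swapped out. A small sketch, with the field values taken from the demo:

```python
from collections import namedtuple

Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])

# An instance built directly from the CSV strings, without conversion.
row = Base('chr1:6484996', '1030', '1030', '1030')

# _replace() leaves `row` untouched and returns a new Base with Total_Depth as an int.
converted = row._replace(Total_Depth=int(row.Total_Depth))
print(converted)
```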

Note that you should test against the rows, not the Base class:

if row.Total_Depth >= 15:

within the loop, or in a new loop of collected rows.
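Putting the pieces together, the following sketch converts each row, tests row.Total_Depth inside the loop, collects the matching rows in a list, and writes them to a second file. Several things here are assumptions: the function name filter_rows and its min_depth parameter are hypothetical, the data is assumed to be comma-separated (the default for csv.reader), and in-memory io.StringIO objects stand in for the real input and output files so the example runs on its own.

```python
import csv
import io
from collections import namedtuple


def filter_rows(in_file, out_file, min_depth=15):
    """Copy rows whose Total_Depth is at least min_depth; return them as a list."""
    reader = csv.reader(in_file)
    header = next(reader)
    Base = namedtuple('Base', header)  # field names come from the header row

    writer = csv.writer(out_file)
    writer.writerow(header)

    kept = []
    for r in reader:
        # Keep Locus as a string, convert the remaining columns to int.
        row = Base(r[0], *map(int, r[1:]))
        if row.Total_Depth >= min_depth:  # test the instance, not the Base class
            kept.append(row)
            writer.writerow(row)
    return kept


# Stand-in for test_data.csv (assumed comma-separated).
sample = io.StringIO(
    "Locus,Total_Depth,Average_Depth_sample,Depth_for_17\n"
    "chr1:6484996,1030,1030,1030\n"
    "chr1:6484997,14,14,14\n"
    "chr1:6484998,0,0,0\n"
)
output = io.StringIO()
kept = filter_rows(sample, output)
print(kept)
print(output.getvalue())
```

With real files, the two StringIO objects would be replaced by open('test_data.csv', newline='') and an output file opened with mode 'w' and newline=''. Storing the rows in the kept list is only needed if they are to be reused after the loop.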
