csv file column reading and extracting using python


Problem description

I have the following code:

reader=csv.DictReader(open("test1.csv","r"))
allrows = list(reader)

keepcols = [c for c in allrows[0] if all(r[c] != '0' for r in allrows)]

print keepcols
writer=csv.DictWriter(open("output1.csv","w"),fieldnames='keepcols',extrasaction='ignore')
writer.writerows(allrows)

I have a CSV file with about 45 columns. The first column holds names; every other column contains only 0s and 1s. The table also has a header row.
I am trying to read the columns from the CSV file, and I need to extract only those columns that contain 1s.
The problem is that the output file is empty, even though several columns in the table contain 1s.

Could somebody please help me out? I'm terribly stuck.

Title    3003_contact    3003_backbone   3003_sidechain  3003_polar  3003_hydrophobic    3003_acceptor   3003_donor  3003_aromatic
l1  1   1   0   1   1   0   0   0
l1  1   0   1   0   0   0   1   0
l1  1   0   0   0   0   0   0   0
l1  1   0   0   0   1   0   0   1
l1  1   0   0   0   0   0   0   0
l2  1   0   0   0   1   0   0   0
l2  1   0   0   0   0   1   0   0
l3  1   0   0   0   0   0   0   0
l3  1   0   0   0   0   0   1   0
l3  1   0   0   0   0   0   0   1
l3  1   0   0   0   0   0   0   0
l3  1   0   0   0   0   0   0   0
l4  1   0   0   0   0   0   0   0
l4  1   0   0   0   0   0   0   0
l4  1   0   0   0   0   0   0   0

It returns only column 1. I've tried changing 'keepcols' to keepcols, and then I get column 2 first and then column 1 as output.

Solution

Edit: If the input file is a comma-separated values file, then to maintain the order of the keys, use reader.fieldnames instead of the keys in allrows[0].

So the solution would be:

keepcols = [c for c in reader.fieldnames if any(r[c] != '0' for r in allrows)]
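As a rough illustration of the comma-separated case, the sketch below runs the same list comprehension on a small in-memory sample. The sample data and the names keep_any and keep_all are made up for this example; only reader.fieldnames and the any(...) test come from the answer. It also shows the difference from the all(...) test used in the question, which keeps a column only if it never contains '0'.

```python
import csv
import io

# Hypothetical comma-separated sample, cut down from the table in the question.
sample = io.StringIO(
    "Title,3003_contact,3003_backbone,3003_sidechain\n"
    "l1,1,1,0\n"
    "l1,1,0,0\n"
    "l2,1,0,0\n"
)

reader = csv.DictReader(sample)
allrows = list(reader)

# reader.fieldnames is a list, so the header order is preserved.
# any(...): keep a column if at least one row holds something other than '0'.
keep_any = [c for c in reader.fieldnames if any(r[c] != '0' for r in allrows)]

# all(...): keep a column only if no row holds '0' (the test used in the question).
keep_all = [c for c in reader.fieldnames if all(r[c] != '0' for r in allrows)]

print(keep_any)
print(keep_all)
```

With this sample, 3003_backbone survives the any(...) test because one row has a 1, but it is dropped by the all(...) test because other rows have a 0.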

The input file posted above looks like it has space-separated columns. In this case, I don't think csv is the right tool for parsing it. Instead, you can use split:

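import csv

# The sample input is whitespace-separated, so split each line by hand
# instead of parsing it with csv.
with open("test1.csv", "r") as f:
    fields = next(f).split()  # header row, kept as a list to preserve column order
    # One dict per data row; blank lines are skipped.
    allrows = [dict(zip(fields, line.split())) for line in f if line.strip()]

# Keep every column that has at least one value other than '0'.
keepcols = [c for c in fields if any(row[c] != '0' for row in allrows)]
print(keepcols)

# Python 3: the csv module expects output files opened with newline="".
with open("output1.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=keepcols, extrasaction='ignore')
    writer.writeheader()  # added so the kept column names appear in the output
    writer.writerows(allrows)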

Edit2: The column order was changing because for c in allrows[0] returns the keys of allrows[0] in an unspecified order. In Python 2, and in Python 3 before 3.7, plain dict keys are not guaranteed to keep their insertion order. The above code works around this by defining fields to be a list, not a dict.

Original answer: Change fieldnames='keepcols' to fieldnames=keepcols.

fieldnames needs to be a sequence of keys, such as ['fieldA','fieldB',...].

A potential pitfall to be aware of in Python is that strings are sequences. When you iterate over a string, you get the characters of the string. So when you say fieldnames='keepcols', you are setting fieldnames to be the sequence of characters ['k','e','e','p','c','o','l','s']. You don't get an error because this is a valid sequence of keys. But your list of dicts, allrows, doesn't happen to have these keys, and writer.writerows blithely ignores the mismatch since extrasaction='ignore'.
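The pitfall can be reproduced with a short sketch. The single-row data and the names bad_output and good_output are invented for this illustration, and io.StringIO stands in for the output file; the behaviour shown is for Python 3's csv module.

```python
import csv
import io

# Hypothetical one-row sample standing in for allrows.
rows = [{"Title": "l1", "3003_contact": "1"}]

# A string is a sequence of characters, so this is what DictWriter iterates over.
print(list("keepcols"))

# Mistake: fieldnames is the string 'keepcols', not the list keepcols.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames="keepcols", extrasaction="ignore")
writer.writerows(rows)
bad_output = buf.getvalue()  # none of the row's real values end up here

# Fix: pass a real list of column names.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Title", "3003_contact"], extrasaction="ignore")
writer.writerows(rows)
good_output = buf.getvalue()

print(repr(bad_output))
print(repr(good_output))
```

Because none of the keys 'k', 'e', 'p', and so on exist in the row, every field falls back to the default empty value, which is why the output looks empty.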
