Reading selected column only from CSV file, when all other columns are guaranteed to be identical


Problem description

I have a bunch of CSV files that I'm trying to concatenate into one single CSV file. The CSV files are separated by a single space and look like this:

'initial', 'pos', 'orientation', 'ratio'
'chr', '106681', '+', '0.06'
'chr', '106681', '+', '0.88'
'chr', '106681', '+', '0.01'
'chr', '106681', '+', '0.02'

As you can see, all the values are the same except for the ratio. The concatenated file I am creating will look like this:

'filename','initial', 'pos', 'orientation', 'ratio1','ratio2','ratio3'
'jon' , 'chr', '106681', '+', '0.06' , '0.88' ,'0.01'

So basically, I'll be iterating through each file, storing only one value of initial, pos and orientation but all the values of ratio, and updating the table in the concatenated file. This is proving much more confusing than I thought it would be. I have the following piece of code to read the CSV files:

import csv

# Open the CSV file and print every parsed row
with open('josh.csv', newline='') as concatenated_file:
    reader = csv.reader(concatenated_file)
    for row in reader:
        print(row)

which gives:

['chrom', 'pos', 'strand', 'meth_ratio']
['chr2', '106681786', '+', '0.06']
['chr2', '106681796', '+', '0.88']
['chr2', '106681830', '+', '0.01']
['chr2', '106681842', '+', '0.02']

It would be really helpful if someone could show me how to store only one value of initial, pos and orientation (because they remain the same) but all the values of ratio.

Solution

This is a one-liner with pandas.read_csv(). And we can even drop the quoting too:

import pandas as pd

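# sep=r"\s+" splits on runs of whitespace; it replaces delim_whitespace=True,
# which is deprecated in recent pandas releases
csva = pd.read_csv('a.csv', header=0, quotechar="'", sep=r"\s+")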

csva['ratio']
0    0.06
1    0.88
2    0.01
3    0.02
Name: ratio, dtype: float64

A couple of points:

  • Actually, your separator is comma + whitespace, so this is not plain-vanilla CSV. See "How to make separator in read_csv more flexible?"
  • Note that we dropped the quoting on the numeric fields by setting quotechar="'".
  • If you really insist on saving memory (don't), you can drop all columns of csva other than 'ratio' after the read_csv call. See the pandas documentation.
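
As a minimal sketch of the points above, the block below treats the comma as the separator and skips the blank that follows it, rather than splitting on whitespace, and uses usecols so that only 'ratio' is parsed. The helper names read_ratio and build_row and the output layout (filename, the three shared columns, then ratio1, ratio2, ...) are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd


def read_ratio(path):
    """Read only the 'ratio' column from one comma + space separated file."""
    frame = pd.read_csv(
        path,
        sep=",",
        quotechar="'",
        skipinitialspace=True,  # skip the blank that follows each comma
        usecols=["ratio"],      # parse and keep just this column
    )
    return frame["ratio"]


def build_row(name, path):
    """Build one wide row: the shared columns once, then every ratio value."""
    # nrows=1 reads a single data row, which is enough for the columns
    # that are guaranteed to repeat; dtype=str keeps them as plain text.
    first = pd.read_csv(
        path,
        sep=",",
        quotechar="'",
        skipinitialspace=True,
        nrows=1,
        dtype=str,
    ).iloc[0]
    row = {
        "filename": name,
        "initial": first["initial"],
        "pos": first["pos"],
        "orientation": first["orientation"],
    }
    # Number the ratio columns ratio1, ratio2, ... in file order
    for i, value in enumerate(read_ratio(path), start=1):
        row[f"ratio{i}"] = value
    return row
```

Calling build_row once per input file, for example build_row("jon", "jon.csv") with a hypothetical file name, and passing the resulting list of dicts to pd.DataFrame(...).to_csv("concatenated.csv", index=False) would then write the concatenated table.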
