Reading selected column only from CSV file, when all other columns are guaranteed to be identical

Views: 117 | Published: 2017/2/24 22:49:30 | Tags: python, csv, file-format

Problem description

I have a bunch of CSV files that I'm trying to concatenate into a single CSV file. The CSV files are separated by a single space and look like this:
'initial', 'pos', 'orientation', 'ratio'
'chr', '106681', '+', '0.06'
'chr', '106681', '+', '0.88'
'chr', '106681', '+', '0.01'
'chr', '106681', '+', '0.02'
As you can see, all the values are the same except for the ratio. The concatenated file I am creating will look like this:
'filename','initial', 'pos', 'orientation', 'ratio1','ratio2','ratio3'
'jon' , 'chr', '106681', '+', '0.06' , '0.88' ,'0.01'
So basically, I'll be iterating through each file, storing only one value each of initial, pos and orientation but all the values of ratio, and updating the table in the concatenated file. This is proving much more confusing than I thought it would be. I have the following piece of code to read the CSV files:
import csv

# Open one input file and print every parsed row
with open('josh.csv', newline='') as concatenated_file:
    reader = csv.reader(concatenated_file)
    for row in reader:
        print(row)
which gives:
['chrom', 'pos', 'strand', 'meth_ratio']
['chr2', '106681786', '+', '0.06']
['chr2', '106681796', '+', '0.88']
['chr2', '106681830', '+', '0.01']
['chr2', '106681842', '+', '0.02']
It would be really helpful if someone could show me how to store only one value each of initial, pos and orientation (because they remain the same) but all the values of ratio.

Solution

This is a one-liner with pandas.read_csv(), and we can even drop the quoting too:
import pandas as pd

csva = pd.read_csv('a.csv', header=0, quotechar="'", delim_whitespace=True)

csva['ratio']
0    0.06
1    0.88
2    0.01
3    0.02
Name: ratio, dtype: float64
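
If only the ratio column is wanted, read_csv can also be asked to keep just that column while parsing. The following is a minimal sketch, not the answer's own code: it assumes the fields are separated by a comma followed by a space, as in the sample in the question, and it uses an in-memory string as a hypothetical stand-in for a file such as 'a.csv'.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for one input file such as 'a.csv'
sample = io.StringIO(
    "'initial', 'pos', 'orientation', 'ratio'\n"
    "'chr', '106681', '+', '0.06'\n"
    "'chr', '106681', '+', '0.88'\n"
    "'chr', '106681', '+', '0.01'\n"
    "'chr', '106681', '+', '0.02'\n"
)

# sep=',' plus skipinitialspace=True copes with the comma + space separator,
# quotechar="'" strips the single quotes, and usecols keeps only 'ratio'
ratios = pd.read_csv(
    sample,
    sep=',',
    skipinitialspace=True,
    quotechar="'",
    usecols=['ratio'],
)['ratio']

print(ratios.tolist())
```

With a real file, the path would be passed to read_csv in place of the sample object.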
A couple of points:

- Your separator is actually a comma followed by whitespace, so in that sense this is not plain-vanilla CSV. See "How to make separator in read_csv more flexible?"
- Note that we dropped the quoting on numeric fields by setting quotechar="'".
- If you really insist on saving memory (don't), you can drop every column of csva other than 'ratio' after you do the read_csv. See the pandas doc.
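
For the concatenation step the question asks about, here is a rough sketch using only the standard csv module. It keeps initial, pos and orientation once and collects every ratio into a single wide row. The helper names read_ratios and build_row are hypothetical, the input is an in-memory stand-in for a real file, and the comma + space layout from the sample is assumed.

```python
import csv
import io


def read_ratios(lines):
    """Return the shared leading values (kept once) and every ratio value."""
    reader = csv.reader(lines, quotechar="'", skipinitialspace=True)
    next(reader)  # skip the header row
    shared, ratios = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if not shared:
            shared = row[:3]  # initial, pos, orientation repeat on every row
        ratios.append(row[3])
    return shared, ratios


def build_row(name, lines):
    """Build one wide output row: name, shared values, then all ratios."""
    shared, ratios = read_ratios(lines)
    return [name] + shared + ratios


# Hypothetical stand-in for one input file; a real run would use
# open(path, newline='') for each file instead
sample = io.StringIO(
    "'initial', 'pos', 'orientation', 'ratio'\n"
    "'chr', '106681', '+', '0.06'\n"
    "'chr', '106681', '+', '0.88'\n"
    "'chr', '106681', '+', '0.01'\n"
    "'chr', '106681', '+', '0.02'\n"
)

row = build_row('jon', sample)
n_ratios = len(row) - 4
header = ['filename', 'initial', 'pos', 'orientation'] + [
    'ratio%d' % i for i in range(1, n_ratios + 1)
]

# Write the header and the wide row to the (in-memory) concatenated file
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerow(row)
print(out.getvalue())
```

To process several files, build_row could be called once per file and each result written as another row, provided every file has the same number of ratio rows.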