Read pandas dataframe from CSV beginning with a non-fixed header


Problem description


I have a number of data files produced by a rather hackish script used in my lab. The script is quite entertaining in that the number of lines it writes before the header varies from file to file (though the files share the same format and the same header).

I am writing a batch job to load all of these files into dataframes. How can I make pandas identify the correct header if I do not know its position? I know the exact header text, and the text of the two lines that come directly before it (they are the only consecutive instances of \r\n in the document).

I have tried setting skipfooter to zero and selecting the (thankfully) fixed number of data rows each file contains:

df = pd.read_csv(myfile, skipfooter=0, nrows=267)

That did not work.

Do you have any further ideas?

Solution

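You can open the file, iterate over it until two consecutive \r\n lines are met, and then pass the same file object to the parser, which continues reading from the current position, i.e.

import pandas as pd

with open(csv_file_name, 'rb') as source:
    consec_empty_lines = 0
    # Consume lines until two consecutive empty lines have been read.
    for line in source:
        if line == b'\r\n':  # binary mode yields bytes, so compare against bytes
            consec_empty_lines += 1
            if consec_empty_lines == 2:
                break
        else:
            consec_empty_lines = 0
    # The handle is now positioned at the header line.
    df = pd.read_csv(source)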
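
For a batch job it may be convenient to wrap the loop above in a function. The following is only a sketch under stated assumptions: the function name `read_after_blank_lines`, its `n_blank` and `newline` parameters, and the sample bytes are all made up for illustration and do not come from the original answer. It also raises an error when no run of empty lines is found, which the original snippet does not do.

```python
import io

import pandas as pd


def read_after_blank_lines(source, n_blank=2, newline=b"\r\n"):
    """Skip a preamble ending in n_blank consecutive empty lines,
    then parse the rest of the binary stream as CSV.

    Hypothetical helper: name and parameters are illustrative only.
    """
    consecutive = 0
    for line in source:
        if line == newline:
            consecutive += 1
            if consecutive == n_blank:
                break
        else:
            consecutive = 0
    else:
        # The loop ended without break: the marker was never found.
        raise ValueError("no run of consecutive empty lines found")
    # The stream is now positioned at the header line.
    return pd.read_csv(source)


# Made-up sample: a preamble of arbitrary length, two empty lines, then the header.
sample = (
    b"produced by lab script\r\n"
    b"\r\n"
    b"run 7, operator unknown\r\n"
    b"\r\n"
    b"\r\n"
    b"time,value\r\n"
    b"0,1.5\r\n"
    b"1,2.5\r\n"
)

df = read_after_blank_lines(io.BytesIO(sample))
print(df)
```

With real files, the same function could be called as `read_after_blank_lines(open(path, 'rb'))` inside a `with` block, once per file in the batch.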
