Pandas: how to read a CSV with multiple lines in the same cell?

Problem description

I have a CSV that I am unable to read with read_csv. Opening the file in Sublime Text shows something like:

col1,col2,col3
text,2,3
more text,3,4
HELLO

THIS IS FUN
,3,4

As you can see, the text HELLO THIS IS FUN spans three lines, and pd.read_csv gets confused because it treats them as three new observations. How can I parse this correctly in pandas?

Thanks!

Recommended answer

It looks like you'll have to preprocess the data manually:

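import io
import pandas as pd

with open('data.csv', 'r') as f:
    lines = f.read().splitlines()

n_commas = 2  # a complete row of a 3-column file contains 2 commas
processed = []
buffer = ''
for line in lines:
    buffer += line  # append the current line to the buffer (no separator is inserted)
    c = buffer.count(',')
    if c == n_commas:
        processed.append(buffer)  # the buffered row is complete, so keep it
        buffer = ''
    elif c > n_commas:
        # a necessary newline is missing; this should never happen
        raise ValueError(f'too many fields in row: {buffer!r}')

# parse the repaired rows as ordinary CSV text
df = pd.read_csv(io.StringIO('\n'.join(processed)))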

This assumes that the only defect in your data is unwanted newlines. For example, with three columns, if one line holds a complete row and the next line holds only part of a row, then the lines that follow must be blank or supply just the remaining fields. If they supply more than that, i.e. a necessary newline is missing, an error is raised. You can accommodate that case with a minor modification if necessary. Because the code counts commas to decide when a row is complete, it also assumes that no field itself contains a comma.
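To make the assumption above concrete, here is a minimal sketch that wraps the same comma-counting idea in a function and runs it on the sample from the question. The function name merge_broken_rows and the n_cols parameter are illustrative choices, not part of the original answer, and the sketch uses only the standard library so it can be tried without a file on disk.

```python
def merge_broken_rows(text, n_cols):
    """Rejoin rows that were split by stray newlines inside a cell.

    Assumption: no field contains a comma, so a complete row has
    exactly n_cols - 1 commas.
    """
    processed = []
    buffer = ''
    for line in text.splitlines():
        buffer += line  # fragments are concatenated with no separator
        commas = buffer.count(',')
        if commas == n_cols - 1:
            processed.append(buffer)  # row is complete
            buffer = ''
        elif commas > n_cols - 1:
            # two rows ran together: a necessary newline is missing
            raise ValueError(f'too many fields in row: {buffer!r}')
    return '\n'.join(processed)


# The sample from the question, including the blank line inside the cell.
sample = (
    'col1,col2,col3\n'
    'text,2,3\n'
    'more text,3,4\n'
    'HELLO\n'
    '\n'
    'THIS IS FUN\n'
    ',3,4\n'
)
fixed = merge_broken_rows(sample, n_cols=3)
print(fixed)
```

The returned string can then be handed to pandas with pd.read_csv(io.StringIO(fixed)). Note that the fragments are joined without a separator, so the broken cell comes back as HELLOTHIS IS FUN; if the original line breaks should become spaces, the concatenation step would need to insert one.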

Actually, it might be more efficient to remove the unwanted newlines instead, but it shouldn't matter unless you have a lot of data.
