Pandas: how to read a CSV with multiple lines in the same cell?
Question
I have a CSV file that I am not able to read using read_csv. Opening the CSV with Sublime Text shows something like:
col1,col2,col3
text,2,3
more text,3,4
HELLO
THIS IS FUN
,3,4
As you can see, the text HELLO THIS IS FUN takes three lines, and pd.read_csv is confused because it thinks these are three new observations. How can I parse this correctly in Pandas?
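To reproduce the behavior, the sample can be passed to pd.read_csv from an in-memory string. Here io.StringIO stands in for the asker's file, and the inlined data is an assumption for illustration. Rows with fewer fields than the header are padded with NaN rather than rejected. As a result, the broken record silently turns into three separate rows:

```python
import io
import pandas as pd

# The sample from the question, inlined so the snippet runs without a file
raw = """col1,col2,col3
text,2,3
more text,3,4
HELLO
THIS IS FUN
,3,4
"""

df = pd.read_csv(io.StringIO(raw))
print(df)
# 5 rows instead of 3: "HELLO", "THIS IS FUN" and ",3,4" each became a row
print(len(df))
```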
Thanks!
Answer
It looks like you'll have to preprocess the data manually:
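import io
import pandas as pd

with open('data.csv', 'r') as f:
    lines = f.read().splitlines()

n_commas = 2  # a complete row of 3 columns contains exactly 2 commas
processed = []
buffer = ''
for line in lines:
    # Append the current line to the buffer; join fragments with a space,
    # except when the new fragment starts with a comma
    sep = ' ' if buffer and not line.startswith(',') else ''
    buffer += sep + line
    c = buffer.count(',')
    if c == n_commas:
        processed.append(buffer)  # the buffered row is complete
        buffer = ''
    elif c > n_commas:
        # This should never happen: a row is missing a necessary newline
        raise ValueError(f'Too many fields in row: {buffer!r}')

df = pd.read_csv(io.StringIO('\n'.join(processed)))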
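The same buffering idea can be wrapped in a reusable function that works on a string. It infers the expected comma count from the header instead of hard-coding 2. This is a minimal sketch: read_csv_with_broken_rows is a hypothetical helper name, and it assumes fields are unquoted and never contain commas themselves.

```python
import io
import pandas as pd

def read_csv_with_broken_rows(text: str) -> pd.DataFrame:
    """Hypothetical helper: rejoin rows split across lines, then parse.

    Assumes unquoted fields without embedded commas, so a complete row
    has exactly as many commas as the header line.
    """
    lines = text.splitlines()
    n_commas = lines[0].count(',')  # the header defines the expected width
    processed = []
    buffer = ''
    for line in lines:
        # Join fragments with a space, except before a leading comma
        sep = ' ' if buffer and not line.startswith(',') else ''
        buffer += sep + line
        c = buffer.count(',')
        if c == n_commas:
            processed.append(buffer)
            buffer = ''
        elif c > n_commas:
            raise ValueError(f'Too many fields in row: {buffer!r}')
    if buffer:
        raise ValueError(f'Incomplete final row: {buffer!r}')
    return pd.read_csv(io.StringIO('\n'.join(processed)))

raw = "col1,col2,col3\ntext,2,3\nmore text,3,4\nHELLO\nTHIS IS FUN\n,3,4\n"
df = read_csv_with_broken_rows(raw)
print(df)
```

Raising on leftover buffer content at the end is a design choice in this sketch. It makes a truncated last row fail loudly instead of being dropped silently.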
This assumes that the only problem in your data is unwanted newlines: the lines that make up one row must together contain exactly two commas. For example, if the buffer holds a fragment with one comma, the next line may contain at most one more. If it contains two or more, the row ends up with too many fields (i.e. a necessary newline is missing), and an error is raised. You can accommodate this case with a minor modification if necessary.
Alternatively, it might be more efficient to remove the unwanted newlines from the raw text directly instead of buffering lines, but that shouldn't matter unless you have a lot of data.
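One possible reading of "remove newlines instead" is sketched below; it is an interpretation, not necessarily what the answer meant. The function walks the raw text once and keeps a newline only when the current record already has its two commas. Every other newline is treated as stray and replaced with a space, or dropped before a comma. The resulting string is then parsed normally. remove_stray_newlines is a hypothetical helper. It assumes Unix line endings and unquoted fields without embedded commas. No claim is made that it is faster than the line buffer above.

```python
import io
import pandas as pd

def remove_stray_newlines(text: str, n_commas: int = 2) -> str:
    """Hypothetical helper: delete newlines that fall inside a record.

    Assumes Unix line endings and unquoted fields without commas, so a
    record is complete once it contains n_commas commas.
    """
    out = []
    commas = 0
    for i, ch in enumerate(text):
        if ch == ',':
            commas += 1
            if commas > n_commas:
                raise ValueError('record has too many fields')
            out.append(ch)
        elif ch == '\n':
            if commas == n_commas:
                out.append(ch)  # genuine end of record: keep it
                commas = 0
            else:
                # Stray newline: replace it with a space, unless the next
                # character is a comma, another newline, or end of text
                nxt = text[i + 1] if i + 1 < len(text) else ','
                if nxt not in (',', '\n'):
                    out.append(' ')
        else:
            out.append(ch)
    return ''.join(out)

raw = "col1,col2,col3\ntext,2,3\nmore text,3,4\nHELLO\nTHIS IS FUN\n,3,4\n"
cleaned = remove_stray_newlines(raw)
df = pd.read_csv(io.StringIO(cleaned))
print(df)
```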