Adding a line-terminator in pandas ends up adding another \r
Question
I can load a csv file into a pandas DataFrame without problems using the pandas defaults:
>>> df = pd.read_csv(file)
>>> df
      distance  recession_velocity
0  # not a row                 NaN
1        0.032               170.0
2        0.034               290.0
3        0.214              -130.0
However, as soon as I add the lineterminator argument, the output goes haywire and every line ends in a stray \r:
>>> df = pd.read_csv(file, lineterminator='\n')
>>> df
      distance recession_velocity\r
0  # not a row                 \r
1        0.032              170\r
2        0.034              290\r
3        0.214             -130\r
The file does appear to use \n as its line separator:
>>> print(repr(open('/Users/david/example.csv').read()))
'distance,recession_velocity\n# not a row,\n0.032,170\n0.034,290\n0.214,-130\n0.263,
What is the issue here, and is there a way to fix it without having to trim all the column values?
Recommended answer
Python's file objects automatically translate \r\n to \n in text mode, which is why the repr of open(...).read() shows only \n. read_csv does its own file handling, so it sees the original \r\n instead. If you pass lineterminator="\n", it strips only that one character and leaves the \r attached to the last field of each line.
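The translation described above can be reproduced with the standard library alone. The following is a minimal sketch, not taken from the original post: the sample data and the temporary file are made up for illustration, and the file is written in binary mode so that it is guaranteed to contain \r\n on disk.

```python
import os
import tempfile

# Hypothetical sample data with Windows-style (\r\n) line endings.
raw = b"distance,recession_velocity\r\n0.032,170\r\n0.034,290\r\n"

# Write the bytes unchanged to a temporary file (illustrative path only).
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Default text mode: universal newlines turn \r\n into \n on read.
    with open(path) as f:
        text_default = f.read()

    # newline="" disables the translation, so the \r characters stay visible.
    with open(path, newline="") as f:
        text_untranslated = f.read()

    # Binary mode shows the bytes exactly as they are stored on disk.
    with open(path, "rb") as f:
        raw_bytes = f.read()
finally:
    os.remove(path)

print(repr(text_default))
print(repr(text_untranslated))
print(repr(raw_bytes))
```

Checking a file with open(path, "rb") or open(path, newline="") is therefore a more reliable way to see its real line endings than a plain open(path).read().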
If you don't pass the lineterminator parameter at all, read_csv will guess the line-ending style. You can also pass in a file object instead of a path. This may slow things down a bit, but it gives you the same newline translation that you see when you read the file directly.
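Both suggestions can be tried side by side. This is a sketch under assumptions rather than the poster's actual setup: the data and temporary file are invented, it relies on the default C parser, and the variable names (guessed, forced, via_file_object) are purely illustrative.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data with \r\n line endings, written as raw bytes.
raw = b"distance,recession_velocity\r\n0.032,170\r\n0.034,290\r\n"

with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # 1. No lineterminator: pandas works out the line endings itself.
    guessed = pd.read_csv(path)

    # 2. Path plus lineterminator="\n": the \r stays on the last field.
    forced = pd.read_csv(path, lineterminator="\n")

    # 3. Text-mode file object: Python translates \r\n to \n before
    #    pandas sees the data, so lineterminator="\n" is now harmless.
    with open(path) as f:
        via_file_object = pd.read_csv(f, lineterminator="\n")
finally:
    os.remove(path)

print(list(guessed.columns))
print(list(forced.columns))
print(list(via_file_object.columns))
```

Of the two fixes, dropping lineterminator is usually the simpler one, since it keeps the fast path-based reading; the file-object route is mainly useful when the argument has to stay for some other reason.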