Why is `pandas.read_csv` not the reciprocal of `pandas.DataFrame.to_csv`?
Question
It seems strange to me that `pandas.read_csv` is not a direct reciprocal of `df.to_csv`. In the session below, every call uses the default settings, yet the final DataFrame differs from the original by an extra "Unnamed: 0" column.
In [1]: import pandas as pd
In [2]: orig_df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); orig_df
Out[2]:
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
[4 rows x 3 columns]
In [3]: orig_df.to_csv('test.csv')
In [4]: final_df = pd.read_csv('test.csv'); final_df
Out[4]:
   Unnamed: 0  AAA  BBB  CCC
0           0    4   10  100
1           1    5   20   50
2           2    6   30  -30
3           3    7   40  -50
[4 rows x 4 columns]
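To see where the extra column comes from, it can help to look at the CSV text itself. The sketch below rebuilds the same DataFrame and keeps the CSV in memory rather than in `test.csv` (an assumption made only so the example is self-contained). By default `to_csv` writes the index as the first column, and because this index has no name its header field is empty; `read_csv` then labels that nameless column "Unnamed: 0".

```python
import io

import pandas as pd

orig_df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                        'BBB': [10, 20, 30, 40],
                        'CCC': [100, 50, -30, -50]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = orig_df.to_csv()

# The header row starts with an empty field: that is the unnamed index column.
header = csv_text.splitlines()[0]
print(repr(header))

# read_csv treats every column as data by default, including the written index.
final_df = pd.read_csv(io.StringIO(csv_text))
print(list(final_df.columns))
```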
It seems that either the default for `read_csv` should be
In [6]: final2_df = pd.read_csv('test.csv', index_col=0); final2_df
Out[7]:
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
[4 rows x 3 columns]
or the default for `to_csv` should be
In [8]: orig_df.to_csv('test2.csv', index=False)
which, when read back, gives
In [9]: pd.read_csv('test2.csv')
Out[9]:
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
[4 rows x 3 columns]
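The two alternatives above can be checked side by side. This is a minimal sketch that uses in-memory buffers in place of `test.csv` and `test2.csv` (an assumption for self-containment); the variable names `via_index_col` and `via_no_index` are illustrative only.

```python
import io

import pandas as pd

orig_df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                        'BBB': [10, 20, 30, 40],
                        'CCC': [100, 50, -30, -50]})

# Option 1: write the index (the to_csv default) and tell read_csv
# that the first column holds the index.
via_index_col = pd.read_csv(io.StringIO(orig_df.to_csv()), index_col=0)

# Option 2: do not write the index; read_csv then creates a fresh 0..n-1 index.
via_no_index = pd.read_csv(io.StringIO(orig_df.to_csv(index=False)))

# Both round trips reproduce the original values, columns, and index labels.
print(via_index_col.equals(orig_df))
print(via_no_index.equals(orig_df))
```

Option 2 only round-trips here because the original index is the default 0, 1, 2, 3. If the index carried real labels, dropping it with `index=False` would lose them, so option 1 is the safer choice in that case.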
(Perhaps this should be sent to the developers instead, but I am genuinely interested in why this is the default behavior. Hopefully it can also help someone else avoid the confusion I had.)
Recommended answer
Thanks to @EdChum for the tip to post this on the GitHub page. That led me to the `pandas.DataFrame.from_csv` function, which is indeed the reciprocal of `pandas.DataFrame.to_csv`.
In [6]: final_df = pd.DataFrame.from_csv('test.csv')
In [7]: final_df
Out[7]:
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
[4 rows x 3 columns]
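One caveat for readers on newer pandas: `DataFrame.from_csv` was deprecated in pandas 0.21 and removed in pandas 1.0, so the call above raises an `AttributeError` there. It defaulted to `index_col=0` and `parse_dates=True`, so a rough stand-in is `read_csv` with `index_col=0`. The sketch below wraps that in a hypothetical helper named `read_back` (not part of pandas) and writes to a temporary directory rather than the working directory; `parse_dates` is left out because this example has no date index.

```python
import os
import tempfile

import pandas as pd


def read_back(path):
    """Hypothetical helper: read a CSV that was written by DataFrame.to_csv
    with its default settings, restoring the first column as the index."""
    return pd.read_csv(path, index_col=0)


orig_df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                        'BBB': [10, 20, 30, 40],
                        'CCC': [100, 50, -30, -50]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.csv')
    orig_df.to_csv(path)          # default settings: the index is written
    final_df = read_back(path)    # the index is restored from column 0

print(final_df)
print(final_df.equals(orig_df))
```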