Precision lost while using read_csv in pandas
Question
I have text files in the format below, which I am trying to read into a pandas DataFrame.
895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|0.8224291972|0.8652525793|0.6829942860|0.5139162227|
As you can see, there are 10 digits after the decimal point in the input file.
df = pd.read_csv('mockup.txt',header=None,delimiter='|')
When I read it into a DataFrame, the last 4 digits are missing:
df[5].head()
0 0.467798
1 0.258165
2 0.860384
3 0.803388
4 0.249820
Name: 5, dtype: float64
How can I get the complete precision present in the input file? I have some matrix operations to perform on the data, so I cannot cast it to string.
I figured out that I have to do something about dtype, but I am not sure where I should use it.
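Recommended answer
It is only a display problem: the values are stored with full precision, and only the printed output is rounded. See the docs: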
# temporarily set display precision
with pd.option_context('display.precision', 10):
    print(df)
0 1 2 3 4 5 6 7
0 895 2015-4-23 19 10000 LA 0.4677978806 0.477346934 0.4089938425
8 9 10 11 12
0 0.8224291972 0.8652525793 0.682994286 0.5139162227 NaN
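To check that the digits are really stored and only hidden by the default display, the value can be formatted directly. This is a minimal sketch: it feeds the single sample row from the question through io.StringIO instead of the original mockup.txt, and the variable names are illustrative.

```python
import io
import pandas as pd

# One row copied from the question; the trailing '|' produces an extra all-NaN column.
row = (
    "895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|"
    "0.8224291972|0.8652525793|0.6829942860|0.5139162227|\n"
)

df = pd.read_csv(io.StringIO(row), header=None, delimiter='|')

# Pull the stored float out of column 5 and format it ourselves,
# bypassing the pandas display settings.
value = df[5].iloc[0]
full_text = format(value, '.10f')

print(df[5])       # rounded by the default display precision
print(full_text)   # all 10 decimal places of the stored value
```

The same idea works for any column: display.precision only affects how pandas prints, not what the float64 column holds.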
(Thanks to Mark Dickinson):
Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing float_precision='round_trip' to read_csv fixes this. See the documentation for more.
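As a rough illustration of the float_precision option, the sketch below parses the same text twice and compares the round-trip result with Python's own float(). The two-value input string and the names default_df and exact_df are made up for the example, and the default parser may well agree with the round-trip one for this particular input, depending on the pandas version.

```python
import io
import pandas as pd

# Two values taken from the sample row in the question.
text = "0.4677978806|0.4773469340\n"

# Default parsing (float_precision is only supported by the C engine,
# which is the default for a single-character delimiter).
default_df = pd.read_csv(io.StringIO(text), header=None, delimiter='|')

# Round-trip parsing: slower, but intended to match Python's float() exactly.
exact_df = pd.read_csv(
    io.StringIO(text), header=None, delimiter='|',
    float_precision='round_trip',
)

default_value = default_df[0].iloc[0]
exact_value = exact_df[0].iloc[0]

print(repr(default_value), repr(exact_value))
print(exact_value == float("0.4677978806"))
```

Any difference between the two parsers is at most in the last bits of the float64, far below the 10 decimal places in the file, so the display setting remains the main thing to change here.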