Python Pandas Scientific Notation Inconsistent
Question
I am looking into rewriting some data analysis code using Pandas (since I just discovered it) on Ubuntu 14.04 64-bit and I have hit upon some strange behaviour. My data files look like this:
26/09/2014 00:00:00 2.423009 -58.864655 3.312355E-7 6.257226E-8 302 305
26/09/2014 00:00:00 2.395637 -62.73302 3.321525E-7 7.065322E-8 302 305
26/09/2014 00:00:01 2.332541 -57.763269 3.285718E-7 6.873837E-8 302 305
26/09/2014 00:00:02 2.366828 -51.900812 3.262279E-7 7.397762E-8 302 305
26/09/2014 00:00:03 2.435500 -40.820161 3.241068E-7 6.777224E-8 302 305
26/09/2014 00:00:04 2.428922 -65.573049 3.212358E-7 6.761804E-8 302 305
26/09/2014 00:00:05 2.419931 -59.414711 3.185517E-7 7.243236E-8 302 305
26/09/2014 00:00:06 2.416663 -60.064279 3.209795E-7 6.242328E-8 302 305
26/09/2014 00:00:07 2.411954 -52.586242 3.184297E-7 5.825581E-8 302 304
26/09/2014 00:00:08 2.457342 -61.874388 3.151493E-7 6.327384E-8 303 304
The columns are tab-separated. To read these into Pandas, I am using the following simple commands:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv("path/to/file.dat", sep="\t", header=None)
print(data)
This produces the following output:
0 1 2 3 4 5 6 7
0 26/09/2014 00:00:00 2.423009 -58.864655 0 6.257226e-08 302 305
1 26/09/2014 00:00:00 2.395637 -62.733020 0 7.065322e-08 302 305
2 26/09/2014 00:00:01 2.332541 -57.763269 0 6.873837e-08 302 305
3 26/09/2014 00:00:02 2.366828 -51.900812 0 7.397762e-08 302 305
4 26/09/2014 00:00:03 2.435500 -40.820161 0 6.777224e-08 302 305
5 26/09/2014 00:00:04 2.428922 -65.573049 0 6.761804e-08 302 305
6 26/09/2014 00:00:05 2.419931 -59.414711 0 7.243236e-08 302 305
7 26/09/2014 00:00:06 2.416663 -60.064279 0 6.242328e-08 302 305
8 26/09/2014 00:00:07 2.411954 -52.586242 0 5.825581e-08 302 304
9 26/09/2014 00:00:08 2.457342 -61.874388 0 6.327384e-08 303 304
[10 rows x 8 columns]
The important thing to notice here is column 4. Compare it to column 5 and to the original data: column 5 has been rendered in scientific notation, while column 4 is shown as 0. Pandas hasn't simply zeroed out the column or converted it to int, though, because:
>>> data[4][0]*1e7
3.3123550000000002
Which is what I would expect. So the data values are the same but the representation has changed. If this is just a cosmetic thing, then I could put up with it, but it makes me feel uneasy and I'd like to know what's going on here.
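A minimal way to confirm this without the original file is to feed a few of the rows above to read_csv through io.StringIO and inspect column 4 directly. This is a Python 3 sketch; it assumes, as the eight-column output suggests, that the date and time are separated by a tab as well, and the name sample is only illustrative.

```python
import io

import pandas as pd

# Three rows copied from the data file above (assumed: every field,
# including date and time, is separated by a tab).
sample = (
    "26/09/2014\t00:00:00\t2.423009\t-58.864655\t3.312355E-7\t6.257226E-8\t302\t305\n"
    "26/09/2014\t00:00:00\t2.395637\t-62.73302\t3.321525E-7\t7.065322E-8\t302\t305\n"
    "26/09/2014\t00:00:01\t2.332541\t-57.763269\t3.285718E-7\t6.873837E-8\t302\t305\n"
)

data = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# Column 4 is parsed as float64 (not int) and none of its values is zero,
# so a "0" in the printed table can only be a display choice.
print(data[4].dtype)
print((data[4] == 0).any())
print(data[4].tolist())
```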
Answer
Yes, it's a cosmetic thing. You can change it using set_option:
In [21]:
pd.set_option('display.precision',20)
data[4]
Out[21]:
0 0.0000003312355
1 0.0000003321525
2 0.0000003285718
3 0.0000003262279
4 0.0000003241068
5 0.0000003212358
6 0.0000003185517
7 0.0000003209795
8 0.0000003184297
9 0.0000003151493
Name: 4, dtype: float64
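If changing the global precision is more than you want, two other standard pandas options may help. The sketch below, using a small hypothetical Series s built from the column 4 values, raises the precision only inside a with-block via pd.option_context, then uses the display.float_format option to force scientific notation for every float and resets it afterwards. Both affect printing only, never the stored values.

```python
import pandas as pd

# Hypothetical Series holding a few of the column 4 values.
s = pd.Series([3.312355e-7, 3.321525e-7, 3.285718e-7], name=4)

# Raise the display precision only for the duration of the with-block.
with pd.option_context("display.precision", 12):
    print(s)

# Alternatively, format every float in scientific notation via a callable.
pd.set_option("display.float_format", "{:.6e}".format)
print(s)
formatted = s.to_string()
pd.reset_option("display.float_format")  # back to the default (None)
```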
The underlying data has not been truncated and will be preserved, including when you write the data back out to CSV.
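One way to check this is a round trip through an in-memory CSV: even with display.precision deliberately set very low, to_csv writes the full values, and reading them back gives identical floats. This is a sketch with a small hypothetical DataFrame; the float_precision="round_trip" argument to read_csv is used here so the comparison can be exact.

```python
import io

import pandas as pd

# Hypothetical frame holding a few values from columns 4 and 5.
data = pd.DataFrame({4: [3.312355e-7, 3.321525e-7], 5: [6.257226e-8, 7.065322e-8]})

# A deliberately coarse display setting: it changes printing, not the data.
pd.set_option("display.precision", 2)
buf = io.StringIO()
data.to_csv(buf, sep="\t", header=False, index=False)
pd.reset_option("display.precision")

# Read the CSV text back; round_trip parsing recovers the exact doubles.
buf.seek(0)
roundtrip = pd.read_csv(buf, sep="\t", header=None, float_precision="round_trip")

print(buf.getvalue())
print((roundtrip.to_numpy() == data.to_numpy()).all())
```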
If you are in IPython, you can check what the default settings are. For display precision (significant digits) it is normally 7; note that pandas 0.17 redefined display.precision as the number of decimal places and changed its default to 6.
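To see the current value programmatically rather than from memory, pandas provides get_option and describe_option; the sketch below prints both. The value you get depends on your pandas version and on any options you have already set in the session.

```python
import pandas as pd

# Current value of the option (an int).
current = pd.get_option("display.precision")
print("display.precision =", current)

# Print the option's built-in documentation, including its default value.
pd.describe_option("display.precision")
```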