Bizarre behaviour of pandas Series.value_counts()


Question

I have a pandas Series with numerical data, and I want to find its unique values together with how often each one appears. I use the standard procedure:

# my_data holds the name of a column of the pd.DataFrame df
unique = df[my_data].value_counts()
print(unique)

and this is the result I get:

# -------------------OUTPUT
-0.010000    46483 
-0.010000    16895
-0.027497    12215
-0.294492    11915
 0.027497    11397

What I don't understand is why the "same value" (-0.01) appears twice. Is there some internal threshold for small values, or am I doing something wrong?

UPDATE

If I store the dataframe in a CSV file and read it back, I get the correct result, namely:

# -------------------OUTPUT
-0.010000    63378
-0.027497    12215
-0.294492    11915
 0.027497    11397

SOLUTION

Based on the discussion, I found the source of the problem and the solution. As mentioned, it is a floating-point precision issue, which can be solved by rounding the values. However, I would not have been able to see that without

pd.set_option('display.float_format', repr)
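The display option above changes how floats are printed. A different way to check whether two entries that look identical are really distinct floats is to pull the values out of the result as plain Python floats. The following is only a minimal sketch under assumptions: the Series `s` is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Illustrative data (an assumption, not the original dataframe):
# two values that agree to six decimals but are different floats.
s = pd.Series([-0.01, -0.01, -0.01000000000123, 0.2])

counts = s.value_counts()

# tolist() converts the index to plain Python floats, and printing a
# Python float shows enough digits to tell the two values apart.
distinct_values = counts.index.tolist()
print(distinct_values)
```

If the printed list contains two entries that differ only in the far decimal places, the duplicate row in the value_counts() output comes from the data itself and not from pandas.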

Thanks a lot for your help!!

Recommended answer

I think it's a floating-point precision issue, similar to the following one:

In [1]: 0.1 + 0.2
Out[1]: 0.30000000000000004

In [2]: 0.1 + 0.2 == 0.3
Out[2]: False
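When exact equality fails like this, the usual workaround is to compare with a tolerance. As a small sketch using only the standard library (the variable names are illustrative):

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because of the tiny representation error.
exactly_equal = (total == 0.3)

# math.isclose compares within a relative tolerance (default rel_tol=1e-09).
close_enough = math.isclose(total, 0.3)

print(exactly_equal, close_enough)
```

value_counts() groups by exact equality, so values that differ only by such a tiny error end up in separate groups.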

Try this:

df[my_data].round(6).value_counts() 


UPDATE:

Demo:

In [14]: s = pd.Series([-0.01, -0.01, -0.01000000000123, 0.2])

In [15]: s
Out[15]:
0   -0.01
1   -0.01
2   -0.01
3    0.20
dtype: float64

In [16]: s.value_counts()
Out[16]:
-0.01    2
-0.01    1
 0.20    1
dtype: int64

In [17]: s.round(6).value_counts()
Out[17]:
-0.01    3
 0.20    1
dtype: int64
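Before choosing how many decimals to round to, it can help to see which distinct values are nearly equal. The sketch below reuses the demo Series and compares neighbouring sorted values with np.isclose at its default tolerances. This is one possible check, not part of the original answer, and the tolerance that counts as "the same value" is an assumption that depends on the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([-0.01, -0.01, -0.01000000000123, 0.2])

# Sort the distinct values so that near-duplicates become neighbours.
distinct = np.sort(s.unique())

# Compare each value with the one before it (defaults: rtol=1e-05, atol=1e-08).
near_dup = np.isclose(distinct[1:], distinct[:-1])

# Pairs that are different floats but numerically almost equal.
suspicious = list(zip(distinct[:-1][near_dup].tolist(),
                      distinct[1:][near_dup].tolist()))
print(suspicious)
```

Each pair in `suspicious` is a candidate for merging, for example with round() as shown above.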
