pandas KeyError, can't find index when using floats


Problem description

I ran into the following problem:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))
print(np.linspace(0, 1, 401))

We can see that 0.47 is in there:

[ 0.      0.0025  0.005   0.0075  0.01    0.0125  0.015   0.0175  0.02
  0.0225  0.025   0.0275  0.03    0.0325  0.035   0.0375  0.04    0.0425
  0.045   0.0475  0.05    0.0525  0.055   0.0575  0.06    0.0625  0.065
  0.0675  0.07    0.0725  0.075   0.0775  0.08    0.0825  0.085   0.0875
  0.09    0.0925  0.095   0.0975  0.1     0.1025  0.105   0.1075  0.11
  0.1125  0.115   0.1175  0.12    0.1225  0.125   0.1275  0.13    0.1325
  0.135   0.1375  0.14    0.1425  0.145   0.1475  0.15    0.1525  0.155
  0.1575  0.16    0.1625  0.165   0.1675  0.17    0.1725  0.175   0.1775
  0.18    0.1825  0.185   0.1875  0.19    0.1925  0.195   0.1975  0.2
  0.2025  0.205   0.2075  0.21    0.2125  0.215   0.2175  0.22    0.2225
  0.225   0.2275  0.23    0.2325  0.235   0.2375  0.24    0.2425  0.245
  0.2475  0.25    0.2525  0.255   0.2575  0.26    0.2625  0.265   0.2675
  0.27    0.2725  0.275   0.2775  0.28    0.2825  0.285   0.2875  0.29
  0.2925  0.295   0.2975  0.3     0.3025  0.305   0.3075  0.31    0.3125
  0.315   0.3175  0.32    0.3225  0.325   0.3275  0.33    0.3325  0.335
  0.3375  0.34    0.3425  0.345   0.3475  0.35    0.3525  0.355   0.3575
  0.36    0.3625  0.365   0.3675  0.37    0.3725  0.375   0.3775  0.38
  0.3825  0.385   0.3875  0.39    0.3925  0.395   0.3975  0.4     0.4025
  0.405   0.4075  0.41    0.4125  0.415   0.4175  0.42    0.4225  0.425
  0.4275  0.43    0.4325  0.435   0.4375  0.44    0.4425  0.445   0.4475
  0.45    0.4525  0.455   0.4575  0.46    0.4625  0.465   0.4675  0.47
  0.4725  0.475   0.4775  0.48    0.4825  0.485   0.4875  0.49    0.4925
  0.495   0.4975  0.5     0.5025  0.505   0.5075  0.51    0.5125  0.515
  0.5175  0.52    0.5225  0.525   0.5275  0.53    0.5325  0.535   0.5375
  0.54    0.5425  0.545   0.5475  0.55    0.5525  0.555   0.5575  0.56
  0.5625  0.565   0.5675  0.57    0.5725  0.575   0.5775  0.58    0.5825
  0.585   0.5875  0.59    0.5925  0.595   0.5975  0.6     0.6025  0.605
  0.6075  0.61    0.6125  0.615   0.6175  0.62    0.6225  0.625   0.6275
  0.63    0.6325  0.635   0.6375  0.64    0.6425  0.645   0.6475  0.65
  0.6525  0.655   0.6575  0.66    0.6625  0.665   0.6675  0.67    0.6725
  0.675   0.6775  0.68    0.6825  0.685   0.6875  0.69    0.6925  0.695
  0.6975  0.7     0.7025  0.705   0.7075  0.71    0.7125  0.715   0.7175
  0.72    0.7225  0.725   0.7275  0.73    0.7325  0.735   0.7375  0.74
  0.7425  0.745   0.7475  0.75    0.7525  0.755   0.7575  0.76    0.7625
  0.765   0.7675  0.77    0.7725  0.775   0.7775  0.78    0.7825  0.785
  0.7875  0.79    0.7925  0.795   0.7975  0.8     0.8025  0.805   0.8075
  0.81    0.8125  0.815   0.8175  0.82    0.8225  0.825   0.8275  0.83
  0.8325  0.835   0.8375  0.84    0.8425  0.845   0.8475  0.85    0.8525
  0.855   0.8575  0.86    0.8625  0.865   0.8675  0.87    0.8725  0.875
  0.8775  0.88    0.8825  0.885   0.8875  0.89    0.8925  0.895   0.8975
  0.9     0.9025  0.905   0.9075  0.91    0.9125  0.915   0.9175  0.92
  0.9225  0.925   0.9275  0.93    0.9325  0.935   0.9375  0.94    0.9425
  0.945   0.9475  0.95    0.9525  0.955   0.9575  0.96    0.9625  0.965
  0.9675  0.97    0.9725  0.975   0.9775  0.98    0.9825  0.985   0.9875
  0.99    0.9925  0.995   0.9975  1.    ]

Now, for example, I try df[0.47] and get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()

KeyError: 0.47

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-117-76c97f917184> in <module>()
----> 1 df[0.47]

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()

KeyError: 0.47

I don't understand why this happens.

Recommended answer

The issue here is floating-point imprecision. You can use the method get_slice_bound to return the ordinal position of that row:

In [237]:
df.iloc[df.index.get_slice_bound(0.47, side='left', kind='loc')]

Out[237]:
0    0.854001
Name: 0.47, dtype: float64

We can see the actual value of that index label:

In [238]:
df.index[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[238]:
0.47000000000000003
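Two separate things go wrong in the question, and a short sketch can pull them apart. First, the traceback passes through _getitem_column: on a DataFrame, df[key] selects a column, and this frame's only column is labelled 0, so df[0.47] fails regardless of precision. Rows are selected with df.loc (by label) or df.iloc (by position). Second, df.loc[0.47] would still need an exact float match, which the stored label 0.47000000000000003 does not provide. The sketch below, assuming a reasonably recent pandas and NumPy, selects the row with a tolerance-based mask from np.isclose instead; the seeded random generator is only there to make the example repeatable.

```python
import numpy as np
import pandas as pd

# Same frame as in the question; the seeded generator only makes it repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random(401), index=np.linspace(0, 1, 401))

# df[key] selects columns, and the only column label here is 0.
print(df.columns.tolist())  # [0]
try:
    df[0.47]
except KeyError as exc:
    print("column lookup failed:", exc)

# Select rows by tolerance instead of exact equality. np.isclose defaults to
# rtol=1e-05 and atol=1e-08, far tighter than the 0.0025 grid spacing,
# so only the row labelled approximately 0.47 matches.
mask = np.isclose(df.index, 0.47)
row = df.loc[mask]
print(row)
```

The mask approach returns a one-row DataFrame rather than a Series; use row.iloc[0] if a Series is wanted.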

Whilst pandas does support Float64Index, exact label lookup on it is going to be problematic; you'd be better off sticking with the default Int64Index.
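One way to follow that advice is sketched below, assuming the float values lie on a regular grid like the one in the question: keep the default integer index, store the floats in an ordinary column, and turn a float into an integer label by rounding. The column names x and value, the constants N and STEP, and the helper row_at are all hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

N = 401
STEP = 1 / (N - 1)  # spacing of np.linspace(0, 1, N), i.e. 0.0025

rng = np.random.default_rng(0)
# Keep the default integer (RangeIndex) index; the float grid becomes a column.
df = pd.DataFrame({"x": np.linspace(0, 1, N), "value": rng.random(N)})

def row_at(x: float) -> pd.Series:
    """Hypothetical helper: snap x to the nearest grid step, then look up by integer label."""
    i = int(round(x / STEP))  # 0.47 / 0.0025 rounds to 188
    return df.loc[i]

print(row_at(0.47))
```

Rounding x / STEP absorbs the tiny representation error, so the lookup itself is an exact integer match. This only works while the data really sit on a known regular grid.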

At the time of writing, get_slice_bound was an undocumented method, but its docstring gives you enough information:

Signature: df.index.get_slice_bound(label, side, kind)
Docstring:
Calculate slice bound that corresponds to given label.

Returns leftmost (one-past-the-rightmost if ``side=='right'``) position of given label.

Parameters
----------
label : object
side : {'left', 'right'}
kind : {'ix', 'loc', 'getitem'}
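A small sketch of what the docstring describes, assuming a recent pandas in which kind is no longer required (it was deprecated and later removed, so it is omitted here). In current pandas, when the label is not in the index, get_slice_bound falls back to a sorted search, so side='left' returns the position of the first label not less than 0.47. That is why the trick above lands on the right row: the stored 0.47000000000000003 is slightly larger than 0.47. If the stored value had rounded slightly below the query instead, the same call would return the following row, which makes this approach more fragile than a nearest or tolerance-based lookup. The one-ulp offset around idx[100] is an arbitrary illustration.

```python
import numpy as np
import pandas as pd

idx = pd.Index(np.linspace(0, 1, 401))

# 0.47 is not stored exactly, so the sorted-search fallback points at the
# first label >= 0.47.
left = idx.get_slice_bound(0.47, side="left")
print(left, repr(idx[left]))

# For a label that *is* present, 'right' returns one past its position.
exact = idx[left]
print(idx.get_slice_bound(exact, side="left"), idx.get_slice_bound(exact, side="right"))

# Fragility check: a query one ulp *above* the stored idx[100] is placed
# on the next row by side='left'.
target = np.nextafter(idx[100], np.inf)
print(idx.get_slice_bound(target, side="left"))  # 101, not 100
```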

You can also use get_loc and pass method='nearest' to achieve the same result:

In [240]:
df.iloc[df.index.get_loc(0.47, method='nearest')]

Out[240]:
0    0.854001
Name: 0.47, dtype: float64
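In recent pandas releases (2.0 onwards) get_loc no longer accepts method; the deprecation pointed to get_indexer instead. A minimal sketch of the same nearest-label lookup with get_indexer follows. It also shows the tolerance argument, which stops a far-away label from being matched silently. The tolerance of 1e-9 and the second target 0.4712 are arbitrary illustrative values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random(401), index=np.linspace(0, 1, 401))

# get_indexer takes a list of targets and returns their integer positions.
# method='nearest' requires a monotonic index, which linspace provides.
pos = df.index.get_indexer([0.47], method="nearest")[0]
print(df.iloc[pos])

# tolerance bounds how far the nearest label may be; targets farther away
# get -1 instead of silently matching a distant row.
strict = df.index.get_indexer([0.47, 0.4712], method="nearest", tolerance=1e-9)
print(strict)  # positions 188 and -1
```

Checking for -1 before calling iloc avoids accidentally selecting the last row, since iloc[-1] is a valid position.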
