pandas KeyError, can't find index when using floats
Question
I have the following problem:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))
print(np.linspace(0, 1, 401))
We can see that 0.47 is in there:
[ 0. 0.0025 0.005 0.0075 0.01 0.0125 0.015 0.0175 0.02
0.0225 0.025 0.0275 0.03 0.0325 0.035 0.0375 0.04 0.0425
0.045 0.0475 0.05 0.0525 0.055 0.0575 0.06 0.0625 0.065
0.0675 0.07 0.0725 0.075 0.0775 0.08 0.0825 0.085 0.0875
0.09 0.0925 0.095 0.0975 0.1 0.1025 0.105 0.1075 0.11
0.1125 0.115 0.1175 0.12 0.1225 0.125 0.1275 0.13 0.1325
0.135 0.1375 0.14 0.1425 0.145 0.1475 0.15 0.1525 0.155
0.1575 0.16 0.1625 0.165 0.1675 0.17 0.1725 0.175 0.1775
0.18 0.1825 0.185 0.1875 0.19 0.1925 0.195 0.1975 0.2
0.2025 0.205 0.2075 0.21 0.2125 0.215 0.2175 0.22 0.2225
0.225 0.2275 0.23 0.2325 0.235 0.2375 0.24 0.2425 0.245
0.2475 0.25 0.2525 0.255 0.2575 0.26 0.2625 0.265 0.2675
0.27 0.2725 0.275 0.2775 0.28 0.2825 0.285 0.2875 0.29
0.2925 0.295 0.2975 0.3 0.3025 0.305 0.3075 0.31 0.3125
0.315 0.3175 0.32 0.3225 0.325 0.3275 0.33 0.3325 0.335
0.3375 0.34 0.3425 0.345 0.3475 0.35 0.3525 0.355 0.3575
0.36 0.3625 0.365 0.3675 0.37 0.3725 0.375 0.3775 0.38
0.3825 0.385 0.3875 0.39 0.3925 0.395 0.3975 0.4 0.4025
0.405 0.4075 0.41 0.4125 0.415 0.4175 0.42 0.4225 0.425
0.4275 0.43 0.4325 0.435 0.4375 0.44 0.4425 0.445 0.4475
0.45 0.4525 0.455 0.4575 0.46 0.4625 0.465 0.4675 0.47
0.4725 0.475 0.4775 0.48 0.4825 0.485 0.4875 0.49 0.4925
0.495 0.4975 0.5 0.5025 0.505 0.5075 0.51 0.5125 0.515
0.5175 0.52 0.5225 0.525 0.5275 0.53 0.5325 0.535 0.5375
0.54 0.5425 0.545 0.5475 0.55 0.5525 0.555 0.5575 0.56
0.5625 0.565 0.5675 0.57 0.5725 0.575 0.5775 0.58 0.5825
0.585 0.5875 0.59 0.5925 0.595 0.5975 0.6 0.6025 0.605
0.6075 0.61 0.6125 0.615 0.6175 0.62 0.6225 0.625 0.6275
0.63 0.6325 0.635 0.6375 0.64 0.6425 0.645 0.6475 0.65
0.6525 0.655 0.6575 0.66 0.6625 0.665 0.6675 0.67 0.6725
0.675 0.6775 0.68 0.6825 0.685 0.6875 0.69 0.6925 0.695
0.6975 0.7 0.7025 0.705 0.7075 0.71 0.7125 0.715 0.7175
0.72 0.7225 0.725 0.7275 0.73 0.7325 0.735 0.7375 0.74
0.7425 0.745 0.7475 0.75 0.7525 0.755 0.7575 0.76 0.7625
0.765 0.7675 0.77 0.7725 0.775 0.7775 0.78 0.7825 0.785
0.7875 0.79 0.7925 0.795 0.7975 0.8 0.8025 0.805 0.8075
0.81 0.8125 0.815 0.8175 0.82 0.8225 0.825 0.8275 0.83
0.8325 0.835 0.8375 0.84 0.8425 0.845 0.8475 0.85 0.8525
0.855 0.8575 0.86 0.8625 0.865 0.8675 0.87 0.8725 0.875
0.8775 0.88 0.8825 0.885 0.8875 0.89 0.8925 0.895 0.8975
0.9 0.9025 0.905 0.9075 0.91 0.9125 0.915 0.9175 0.92
0.9225 0.925 0.9275 0.93 0.9325 0.935 0.9375 0.94 0.9425
0.945 0.9475 0.95 0.9525 0.955 0.9575 0.96 0.9625 0.965
0.9675 0.97 0.9725 0.975 0.9775 0.98 0.9825 0.985 0.9875
0.99 0.9925 0.995 0.9975 1. ]
Now, for example, I try df[0.47] and get the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2133 try:
-> 2134 return self._engine.get_loc(key)
2135 except KeyError:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()
pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()
KeyError: 0.47
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-117-76c97f917184> in <module>()
----> 1 df[0.47]
/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
2057 return self._getitem_multilevel(key)
2058 else:
-> 2059 return self._getitem_column(key)
2060
2061 def _getitem_column(self, key):
/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
2064 # get column
2065 if self.columns.is_unique:
-> 2066 return self._get_item_cache(key)
2067
2068 # duplicate columns & possible reduce dimensionality
/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
1384 res = cache.get(item)
1385 if res is None:
-> 1386 values = self._data.get(item)
1387 res = self._box_item_values(item, values)
1388 cache[item] = res
/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
3541
3542 if not isnull(item):
-> 3543 loc = self.items.get_loc(item)
3544 else:
3545 indexer = np.arange(len(self.items))[isnull(self.items)]
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2134 return self._engine.get_loc(key)
2135 except KeyError:
-> 2136 return self._engine.get_loc(self._maybe_cast_indexer(key))
2137
2138 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()
pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()
KeyError: 0.47
I don't understand why this happens.
Recommended answer
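The issue here is float imprecision: the label stored in the index is not exactly 0.47, so an exact lookup fails. (Note that df[0.47] also searches the column labels rather than the row index, as the _getitem_column frames in the traceback show; row lookup by label is df.loc[0.47].) You can use the method get_slice_bound to return the ordinal position of that row: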
In [237]:
df.iloc[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[237]:
0 0.854001
Name: 0.47, dtype: float64
We can see the real value of that index label:
In [238]:
df.index[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[238]:
0.47000000000000003
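To make the mismatch concrete, here is a minimal sketch that rebuilds the frame from the question and compares the stored label against 0.47 with a tolerance instead of exact equality. The use of np.isclose with its default tolerances is my own choice for illustration, not something from the original post.

```python
import numpy as np
import pandas as pd

index = np.linspace(0, 1, 401)
df = pd.DataFrame(np.random.rand(401), index=index)

# df[key] selects a column; the only column label here is the integer 0.
print(df.columns.tolist())

# Position of the stored label closest to 0.47, and whether it equals 0.47 exactly.
pos = int(np.argmin(np.abs(index - 0.47)))
print(pos, repr(float(index[pos])), index[pos] == 0.47)

# Comparing with a tolerance finds the row even when the stored label
# differs from 0.47 in the last bit.
mask = np.isclose(df.index.to_numpy(), 0.47)
row = df.loc[mask]
print(row)
```

Neighbouring labels are 0.0025 apart, far more than the default tolerance of np.isclose, so the mask should select a single row.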
Whilst pandas does support Float64Index, using float labels makes exact label lookup problematic; you'd be better off sticking with the default Int64Index.
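One way to follow that advice, sketched under my own assumptions: keep the default integer index, store the float coordinate in an ordinary column, and look rows up either by tolerance or by an integer step number. The column names "x" and "value" and the variable step are hypothetical names chosen for this example.

```python
import numpy as np
import pandas as pd

x = np.linspace(0, 1, 401)
# Default integer index; the float coordinate lives in a regular column.
df = pd.DataFrame({"x": x, "value": np.random.rand(401)})

# Tolerance-based lookup on the float column.
hit = df.loc[np.isclose(df["x"].to_numpy(), 0.47)]
print(hit)

# Integer lookup: convert the coordinate to a step number first.
step = 0.0025                    # grid spacing of linspace(0, 1, 401)
key = int(round(0.47 / step))    # nearest grid position
row = df.loc[key]                # exact integer label, no float comparison
print(key, row["x"])
```

The integer key sidesteps float equality entirely, which is the point of staying with an integer index.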
get_slice_bound is an undocumented method, but the docstring gives you enough info:
Signature: df.index.get_slice_bound(label, side, kind)
Docstring:
Calculate slice bound that corresponds to given label.

Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.

Parameters
----------
label : object
side : {'left', 'right'}
kind : {'ix', 'loc', 'getitem'}
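The left/right semantics in that docstring can be illustrated with Index.searchsorted, which I am swapping in here because its signature has been stable across pandas versions (the kind argument of get_slice_bound was later removed). For a sorted index the two should return the same positions; the small index below is made up for the example.

```python
import pandas as pd

# Sorted float index with a duplicated label (example data, not from the question).
idx = pd.Index([0.0, 0.25, 0.5, 0.5, 0.75])

left = idx.searchsorted(0.5, side="left")    # first position with label >= 0.5
right = idx.searchsorted(0.5, side="right")  # one past the last position with label <= 0.5
print(left, right)
print(idx[left:right])

# For a label that is absent, both sides give the same insertion position.
absent_left = idx.searchsorted(0.3, side="left")
absent_right = idx.searchsorted(0.3, side="right")
print(absent_left, absent_right)
```

This is also why side='left' works for the lookup above: it returns the first stored label that is not smaller than 0.47, which is the slightly larger neighbour.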
You can also use get_loc and pass method='nearest' to achieve the same:
In [240]:
df.iloc[df.index.get_loc(0.47, method='nearest')]
Out[240]:
0 0.854001
Name: 0.47, dtype: float64
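On recent pandas versions (2.0 and later, as far as I know) get_loc no longer accepts the method argument; Index.get_indexer takes method and tolerance and can be used in the same way. A minimal sketch, where the tolerance of 1e-9 is an assumption picked for this grid rather than a recommended value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))

# get_indexer takes a list of labels and returns an array of positions;
# -1 means no label was found within the tolerance.
pos = df.index.get_indexer([0.47], method="nearest", tolerance=1e-9)[0]
if pos == -1:
    raise KeyError(0.47)
row = df.iloc[pos]
print(pos, row.name)

# A value that is not on the grid is rejected instead of silently
# snapping to the nearest row.
miss = df.index.get_indexer([0.4712], method="nearest", tolerance=1e-9)[0]
print(miss)
```

Passing a tolerance is optional, but without it method="nearest" returns the closest row however far away it is, which can hide genuine lookup mistakes.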