Alternatives to awkward Pandas/Python DataFrame indexing: df_REPEATED[df_REPEATED['var']>0]?
Question
In Pandas/Python, I have to write the dataframe name twice when conditioning on its own variable:
df_REPEATED[df_REPEATED['var']>0]
This happens so often that it seems unreasonable. 90-99% of users would be happy 95% of the time with something like:
df_REPEATED[['var']>0]
This syntax is also necessary when using .loc[]. Is there any alternative or shortcut to writing this?
On the other hand, is there some use case I don't understand, and my education in Python has actually been woefully insufficient?
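Part of the answer to that last question is Python's evaluation order: the expression inside the brackets is evaluated on its own, before df_REPEATED has any say in it. So in the wished-for df_REPEATED[['var']>0], the inner ['var'] > 0 is an ordinary comparison between a list and an integer, which raises TypeError in Python 3. A minimal check of just that inner expression:

```python
# Evaluate only the bracket contents of the wished-for df_REPEATED[['var'] > 0].
# Python does this before the DataFrame is involved at all.
try:
    result = ['var'] > 0
    outcome = f'no error: {result}'
except TypeError as exc:
    # Python 3 refuses to order a list against an int.
    outcome = f'TypeError: {exc}'

print(outcome)
```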
Answer
Not an official answer... but it already made my life simpler recently:
https://github.com/toobaz/generic_utils/blob/master/generic_utils/pandas/where.py
You don't need to download the entire repo: saving the file and doing
from where import Where as W
should suffice. Then you use it like this:
import pandas as pd

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])
# On specific column:
print(df.loc[W['a'] > 2])
print(df.loc[-W['a'] == W['b']])
print(df.loc[~W['c']])
# On entire DataFrame:
print(df.loc[W.sum(axis=1) > 3])
print(df.loc[W[['a', 'b']].diff(axis=1)['b'] > 1])
A slightly less stupid usage example:
data = pd.read_csv('ugly_db.csv').loc[~(W == '$null$').any(axis=1)]
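One way to picture how an object like W can work: .loc (like plain []) calls any callable key with the DataFrame being indexed, so a placeholder only has to record the operations applied to it and replay them when called. The sketch below is a hypothetical minimal stand-in written for illustration; the Deferred class is an assumption and is not the actual code of where.py, which may be implemented quite differently. It only supports the handful of operators used in the examples above.

```python
import pandas as pd


class Deferred:
    """Hypothetical placeholder: records an expression, evaluates it later on a DataFrame."""

    def __init__(self, func=lambda df: df):
        # func maps the DataFrame being indexed to an intermediate result.
        self._func = func

    def __call__(self, df):
        # .loc calls a callable key with the DataFrame itself.
        return self._func(df)

    @staticmethod
    def _resolve(value, df):
        # Operands that are themselves Deferred are evaluated on the same DataFrame.
        return value(df) if isinstance(value, Deferred) else value

    def __getitem__(self, key):
        return Deferred(lambda df: self._func(df)[key])

    def __gt__(self, other):
        return Deferred(lambda df: self._func(df) > self._resolve(other, df))

    def __eq__(self, other):
        return Deferred(lambda df: self._func(df) == self._resolve(other, df))

    def __neg__(self):
        return Deferred(lambda df: -self._func(df))

    def __invert__(self):
        return Deferred(lambda df: ~self._func(df))


# Hypothetical stand-in for the W imported from where.py.
W = Deferred()

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])

print(df.loc[W['a'] > 2])
print(df.loc[-W['a'] == W['b']])
print(df.loc[~W['c']])
```

The design choice is that every operator returns a new Deferred instead of a result, so nothing touches the data until .loc supplies it.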
Edit: this answer mentions an analogous approach not requiring external components, resulting in:
data = (pd.read_csv('ugly_db.csv')
        .loc[lambda df: ~(df == '$null$').any(axis=1)])
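Applied to the question's own pattern, the callable form means the DataFrame name is written only once: .loc passes the DataFrame being indexed to the lambda, and plain [] accepts a callable in the same way. The data below is hypothetical, made up only to stand in for df_REPEATED.

```python
import pandas as pd

# Hypothetical data standing in for the question's df_REPEATED.
df_REPEATED = pd.DataFrame({'var': [-1, 0, 2, 5], 'other': list('wxyz')})

# Original form: the DataFrame name appears twice.
explicit = df_REPEATED[df_REPEATED['var'] > 0]

# Callable form: .loc calls the lambda with df_REPEATED itself.
via_loc = df_REPEATED.loc[lambda d: d['var'] > 0]

# Plain [] also accepts a callable.
via_getitem = df_REPEATED[lambda d: d['var'] > 0]

print(via_loc)
```

The saving is small for a single named variable, but it matters at the end of a method chain, where the intermediate DataFrame has no name to repeat.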
and another possibility is to use .pipe(), as in
data = (pd.read_csv('ugly_db.csv')
        .pipe(lambda df: df[~(df == '$null$').any(axis=1)]))
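.pipe(f) simply calls f with the DataFrame, so the filtering function must return the filtered DataFrame, not just the boolean mask. A self-contained sketch, assuming an inline CSV (via io.StringIO) in place of the hypothetical ugly_db.csv and a helper named drop_null_rows that is introduced here only for illustration:

```python
import io

import pandas as pd

# Hypothetical stand-in for 'ugly_db.csv': '$null$' marks missing values.
raw = io.StringIO("a,b\n1,2\n$null$,4\n5,$null$\n7,8\n")


def drop_null_rows(df, marker='$null$'):
    # Keep only the rows in which no cell equals the marker string.
    return df[~(df == marker).any(axis=1)]


data = pd.read_csv(raw).pipe(drop_null_rows)
print(data)
```

Because the columns contain the '$null$' strings, read_csv leaves them as strings; converting the survivors to numbers would be a separate step.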