Alternatives to awkward Pandas/Python DataFrame indexing: df_REPEATED[df_REPEATED['var']>0]?


Question

In Pandas/Python, I have to write the dataframe name twice when conditioning on one of its own variables:

df_REPEATED[df_REPEATED['var']>0]

This happens so often that it seems unreasonable. 90-99% of users would be happy 95% of the time with something like:

df_REPEATED[['var']>0]

The repetition is also necessary when using .loc[]. Is there any alternative or shortcut for writing this?

On the other hand, is there some use case I don't understand, and has my education in Python actually been woefully insufficient?

Recommended answer

Not an official answer... but it has already made my life simpler recently:

https://github.com/toobaz/generic_utils/blob/master/generic_utils/pandas/where.py

You don't need to download the entire repo: saving the file and doing

from where import Where as W

should suffice. Then you use it like this:

import pandas as pd
from where import Where as W  # where.py saved from the link above

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])
# On specific column:
print(df.loc[W['a'] > 2])
print(df.loc[-W['a'] == W['b']])
print(df.loc[~W['c']])
# On entire DataFrame:
print(df.loc[W.sum(axis=1) > 3])
print(df.loc[W[['a', 'b']].diff(axis=1)['b'] > 1])
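For readers curious how an object like W can stand in for the DataFrame, here is a minimal sketch of the general idea. It is not the code in where.py: the class name Deferred, the helper _resolve, and the set of supported operators are all hypothetical and chosen for illustration. The only pandas behavior it relies on is that .loc[] accepts a callable and calls it with the DataFrame being indexed.

```python
import pandas as pd


def _resolve(value, df):
    """Evaluate value against df if it is deferred, else return it unchanged."""
    return value(df) if isinstance(value, Deferred) else value


class Deferred:
    """Hypothetical helper: records an expression and evaluates it later."""

    def __init__(self, func=lambda df: df):
        self._func = func

    def __call__(self, df):
        # .loc[] invokes callables with the DataFrame being indexed.
        return self._func(df)

    def __getitem__(self, key):
        # D['a'] becomes "take column 'a' of whatever frame arrives later".
        return Deferred(lambda df: self._func(df)[key])

    def __gt__(self, other):
        return Deferred(lambda df: self._func(df) > _resolve(other, df))

    def __invert__(self):
        return Deferred(lambda df: ~self._func(df))


D = Deferred()

df = pd.DataFrame({'a': [1, 3, 5], 'b': [2, 4, 7], 'c': [True, False, True]})
print(df.loc[D['a'] > 2])       # rows where a > 2
print(df.loc[D['b'] > D['a']])  # both sides deferred
print(df.loc[~D['c']])          # rows where c is False
```

Only >, ~ and [] are covered here; a usable version would need the remaining operators and some way to forward method calls such as .sum().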

A slightly less stupid usage example:

data = pd.read_csv('ugly_db.csv').loc[~(W == '$null$').any(axis=1)]

Edit: another answer mentions an analogous approach that requires no external components, resulting in:

data = (pd.read_csv('ugly_db.csv')
          .loc[lambda df : ~(df == '$null$').any(axis=1)])
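As a quick way to try the callable form without a CSV file, the sketch below uses a small in-memory DataFrame as a stand-in for 'ugly_db.csv' (the column names and values are made up). It also shows the question's original case written with a callable and with DataFrame.query(), another built-in way to avoid repeating the frame's name.

```python
import pandas as pd

# Hypothetical in-memory stand-in for 'ugly_db.csv'.
raw = pd.DataFrame({'id': [1, 2, 3],
                    'city': ['Rome', '$null$', 'Oslo'],
                    'code': ['x', 'y', '$null$']})

# .loc[] calls the lambda with the DataFrame it is indexing, so the
# frame never needs a name of its own inside the condition.
clean = raw.loc[lambda df: ~(df == '$null$').any(axis=1)]
print(clean)

# The question's case: filter on a single column, two equivalent ways.
positive = raw.loc[lambda df: df['id'] > 1]
also_positive = raw.query('id > 1')
print(positive.equals(also_positive))
```

The callable form works with any boolean expression, while query() takes a string and is mostly convenient for simple column comparisons.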

Another possibility is to use .pipe(), with the function doing the indexing itself, as in

data = (pd.read_csv('ugly_db.csv')
          .pipe(lambda df: df[~(df == '$null$').any(axis=1)]))
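One detail worth keeping in mind with .pipe(): it simply returns whatever the function returns. A function that computes only the condition yields a boolean Series, while a function that also indexes the frame yields the filtered rows. The sketch below illustrates the difference on a made-up in-memory frame (the column names and values are assumptions).

```python
import pandas as pd

# Hypothetical stand-in data; '$null$' marks a missing value.
raw = pd.DataFrame({'city': ['Rome', '$null$', 'Oslo'],
                    'code': ['x', 'y', '$null$']})

# Returning only the condition gives a boolean mask, one entry per row.
mask = raw.pipe(lambda df: ~(df == '$null$').any(axis=1))
print(mask)

# Indexing inside the function gives the filtered DataFrame.
filtered = raw.pipe(lambda df: df[~(df == '$null$').any(axis=1)])
print(filtered)
```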
