Alternatives to awkward Pandas/Python DataFrame indexing: df_REPEATED[df_REPEATED['var'] > 0]?

Views: 107 | Published: 2018/8/2 13:41:42 | Tags: python, dataframe, indexing, syntax

Question

In Pandas/Python, I have to write the dataframe name twice when conditioning on one of its own columns:

df_REPEATED[df_REPEATED['var']>0]

This happens so often that it seems unreasonable. 90-99% of users would be happy 95% of the time with something like:

df_REPEATED[['var']>0]

The repeated name is also necessary when using .loc[]. Is there any alternative or shortcut for writing this?

On the other hand, is there some use case I don't understand, and has my education in Python actually been woefully insufficient?

Recommended answer

Not an official answer... but it has already made my life simpler recently:

https://github.com/toobaz/generic_utils/blob/master/generic_utils/pandas/where.py

You don't need to download the entire repo: saving the file and doing

from where import Where as W

should suffice. Then you use it like this:

import pandas as pd

# Small example frame: two numeric columns and one boolean column.
df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])
# On specific column:
print(df.loc[W['a'] > 2])
print(df.loc[-W['a'] == W['b']])
print(df.loc[~W['c']])
# On entire DataFrame:
print(df.loc[W.sum(axis=1) > 3])
print(df.loc[W[['a', 'b']].diff(axis=1)['b'] > 1])
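
To give a feel for why something like W can work at all: .loc accepts any callable and calls it with the frame being indexed. The sketch below is a hypothetical, much-reduced stand-in (the class name Deferred and everything inside it are my own assumptions, not the code in where.py). It records operations and replays them once .loc hands over the frame, and it supports only [], > and ~.

```python
import pandas as pd


class Deferred:
    """Hypothetical toy stand-in for a W-like helper (not the real where.py)."""

    def __init__(self, func=lambda df: df):
        # func maps the frame being indexed to some derived object.
        self._func = func

    def __call__(self, df):
        # .loc calls this with the frame it is indexing.
        return self._func(df)

    def __getitem__(self, key):
        # Defer column selection until the frame is known.
        return Deferred(lambda df: self._func(df)[key])

    def __gt__(self, other):
        # Defer the comparison; the result is a boolean mask once evaluated.
        return Deferred(lambda df: self._func(df) > other)

    def __invert__(self):
        # Defer boolean negation of a mask.
        return Deferred(lambda df: ~self._func(df))


D = Deferred()

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])

big_a = df.loc[D['a'] > 2]   # rows where column 'a' exceeds 2
not_c = df.loc[~D['c']]      # rows where column 'c' is False
print(big_a)
print(not_c)
```

The frame name appears only once per expression, because the condition is not evaluated until .loc supplies the frame.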

A slightly less stupid usage example:

data = pd.read_csv('ugly_db.csv').loc[~(W == '$null$').any(axis=1)]

Edit: another answer mentions an analogous approach not requiring external components, resulting in:

data = (pd.read_csv('ugly_db.csv')
          .loc[lambda df : ~(df == '$null$').any(axis=1)])
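
As a self-contained illustration of the callable form, here is a minimal sketch that uses an in-memory frame instead of the CSV file. The column names and values are made up for the example. It also shows the same trick applied to the single-column condition from the question.

```python
import pandas as pd

# Hypothetical stand-in for the contents of 'ugly_db.csv'.
raw = pd.DataFrame({
    "name": ["ann", "$null$", "cy"],
    "city": ["Rome", "Oslo", "$null$"],
    "age": [31, 45, 27],
})

# .loc calls the lambda with the frame it is indexing, so the
# frame's name never has to be written a second time.
clean = raw.loc[lambda df: ~(df == "$null$").any(axis=1)]

# Same idea for a condition on one column, as in the question.
over_30 = raw.loc[lambda df: df["age"] > 30]

print(clean)
print(over_30)
```

This is most useful at the end of a method chain, where the intermediate frame has no name to repeat in the first place.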

Another possibility is to use .pipe(), in which case the lambda must return the filtered frame itself rather than only the boolean mask:

data = (pd.read_csv('ugly_db.csv')
          .pipe(lambda df : df[~(df == '$null$').any(axis=1)]))
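
One thing to keep in mind with .pipe() is that it simply returns whatever the function returns, so the function has to do the row selection itself. A small sketch on made-up data, with a hypothetical helper name drop_null_rows, shows the difference between returning the filtered frame and returning only the mask:

```python
import pandas as pd

# Made-up data standing in for the CSV contents.
raw = pd.DataFrame({"a": ["x", "$null$", "z"],
                    "b": ["1", "2", "$null$"]})


def drop_null_rows(df, marker="$null$"):
    """Hypothetical helper: keep only rows without the marker in any column."""
    return df[~(df == marker).any(axis=1)]


# The function returns a filtered frame, so pipe yields a frame.
clean = raw.pipe(drop_null_rows)

# The function returns only the mask, so pipe yields a boolean Series.
mask_only = raw.pipe(lambda df: ~(df == "$null$").any(axis=1))

print(clean)
print(mask_only)
```

A named function passed to .pipe() also makes the filter reusable across several chains.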
