Using vectorized logical not in pandas to filter a frame

Views: 84 | Published: 2020/5/24 4:13:26 | Tags: python, pandas

Question

I have a pandas data frame that I would like to prune. I want to remove the rows where the section is 2 and the identifier does not start with a digit. First I would like to count them. If I run this

len(analytic_events[analytic_events['section']==2].index)

I get the result 1247669

When I narrow things down and run this

len(analytic_events[(analytic_events['section']==2) & ~(analytic_events['identifier'][0].isdigit())].index)

I get exactly the same answer: 1247669

I know, for example, that ten of the rows have this as their identifier

.help.your_tools.subtopic2

which does not start with a digit, and that 15,000 rows have this as their identifier

240.1007

which does start with a digit.

Why is my filter passing all the rows rather than just those whose identifier does not start with a digit?

Recommended answer

Use the str accessor to apply string methods to every element: str[0] takes the first character of each string, str.isdigit() tests it, and a final sum counts the True values:

mask= ((analytic_events['section']==2) & 
       ~(analytic_events['identifier'].str[0].str.isdigit()))

print (mask.sum())
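As a minimal sketch of the mask above in use, the snippet below builds a small made-up frame (the values are illustrative, not the asker's data), counts the matching rows, and then takes them out. The names starts_with_digit, n_to_remove and pruned are introduced here only for illustration.

```python
import pandas as pd

# Hypothetical sample data with the same two columns as the question
analytic_events = pd.DataFrame(
    {"section": [2, 2, 2, 3, 2],
     "identifier": ["4hj", "8hj", "gh", "th", "h6h"]}
)

# Element-wise test: does each identifier start with a digit?
starts_with_digit = analytic_events["identifier"].str[0].str.isdigit()

# Rows in section 2 whose identifier does NOT start with a digit
mask = (analytic_events["section"] == 2) & ~starts_with_digit

n_to_remove = int(mask.sum())    # count of rows the mask selects
pruned = analytic_events[~mask]  # frame with those rows taken out

print(n_to_remove)
print(pruned)
```

Counting with mask.sum() works because True is summed as 1, and negating the finished mask with ~ keeps every row that should survive the pruning.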

If performance matters and the column has no missing values, use a list comprehension:

import numpy as np
# x[0] raises an error on missing values (NaN/None) and on empty strings
arr = ~np.array([x[0].isdigit() for x in analytic_events['identifier']])
mask = ((analytic_events['section']==2) & arr)
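To check that the list comprehension and the str accessor agree, a small comparison can help. This is only a sketch on made-up data with no missing values; the names vectorized and same are assumptions added for the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, no missing values and no empty strings
analytic_events = pd.DataFrame(
    {"section": [2, 2, 2, 3, 2],
     "identifier": ["4hj", "8hj", "gh", "th", "h6h"]}
)

# Vectorized version through the .str accessor
vectorized = ~analytic_events["identifier"].str[0].str.isdigit()

# List-comprehension version: one Python-level isdigit() call per value
arr = ~np.array([x[0].isdigit() for x in analytic_events["identifier"]])

# Both approaches should flag the same rows
same = bool((vectorized.to_numpy(dtype=bool) == arr).all())

mask = (analytic_events["section"] == 2) & arr
print(same, int(mask.sum()))
```

Here ~ is applied to a NumPy boolean array, where it is an element-wise logical negation, so the result can be combined with the section mask directly.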

Why is my filter passing all the rows rather than just those whose identifier does not start with a digit?

Testing your solution step by step on a small sample shows what happens:

import pandas as pd

analytic_events = pd.DataFrame(
                        {'section':[2,2,2,3,2],
                         'identifier':['4hj','8hj','gh','th','h6h']})

print (analytic_events)
   section identifier
0        2        4hj
1        2        8hj
2        2         gh
3        3         th
4        2        h6h

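Indexing the column with [0] returns its first value (the row labelled 0), not the first character of every string: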

print ((analytic_events['identifier'][0]))
4hj

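Calling isdigit() on that scalar tests one whole string and returns a single Python bool, and ~ on a Python bool is bitwise inversion rather than logical not, so ~False is -1: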

print ((analytic_events['identifier'][0].isdigit()))
False

print (~(analytic_events['identifier'][0].isdigit()))
-1
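The difference between the scalar and the element-wise case can be sketched side by side. The Series below is made-up sample data, and the variable names are introduced only for this illustration; int() is applied explicitly to make the integer behaviour of the bool visible.

```python
import pandas as pd

identifiers = pd.Series(["4hj", "8hj", "gh", "th", "h6h"])

# [0] picks the row labelled 0, giving one Python string
first_value = identifiers[0]
# Plain Python bool: is the WHOLE string made of digits?
scalar_flag = first_value.isdigit()

# bool is a subclass of int, so ~ inverts the bits of 0 or 1;
# the result (-1 or -2) is always truthy, never a logical "not"
inverted_scalar = ~int(scalar_flag)

# On a boolean Series, ~ is an element-wise logical negation
series_flags = identifiers.str[0].str.isdigit()
inverted_series = ~series_flags

print(type(scalar_flag).__name__, inverted_scalar)
print(inverted_series.tolist())
```

For a single Python bool the logical negation is written with the not keyword; ~ only behaves as a logical negation on boolean Series and arrays.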

When chained with the first mask, the scalar -1 is applied to every row and acts as True:

print ((analytic_events['section']==2) & ~(analytic_events['identifier'][0].isdigit()))
0     True
1     True
2     True
3    False
4     True
Name: section, dtype: bool

So the filter behaves as if the second mask did not exist:

print (analytic_events['section']==2)
0     True
1     True
2     True
3    False
4     True
Name: section, dtype: bool
