根据其分组属性过滤DataFrame [英] Filtering DataFrame based on its groups properties

查看：112 发布时间：2020/5/24 4:22:34 python pandas grouping

本文介绍了根据其分组属性过滤DataFrame的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

假设我们有问题跟踪日志，我们想找出问题所有者(记录问题时间最多的人)

Let's say we have issue tracker logs and we want to find out issues owners (guys who logged the most time to the issue)

用户可以多次记录同一问题的时间
如果2个用户同时登录，则均为所有者

所以我们有一些示例数据:

So we have some sample data:

df = pd.DataFrame([
        [1, 10, 'John'],
        [1, 20, 'John'],
        [1, 30, 'Tom'],
        [1, 10, 'Bob'],
        [2, 25, 'John'],
        [2, 15, 'Bob']], columns = ['IssueKey','TimeSpent','User'])

作为输出，我们想要这样的东西:

As the output we want something like this:

issues_owners = pd.DataFrame([
        [1, 30, 'John'],
        [1, 30, 'Tom'],
        [2, 25, 'John']], columns = ['IssueKey','TimeSpent','User'])

John和Tom都是问题1的所有者，因为他们都花了30分钟的时间.
John实际上在另外2天中处理了问题1
John也是问题2
鲍勃很懒，没有任何问题:)

Both John and Tom are owners of issue 1, as they both spent 30 minutes on it.
John actually worked on issue 1 on 2 separate days
John is also the owner of the issue 2
Bob is lazy and doesn't own any issues :)

我想到的东西让人很恶心(我是Python的新手):

What I came up with feels quite disgusting (I'm relatively new to Python):

df = df.groupby(['IssueKey', 'User']).sum().reset_index()
maxTimesPerIssue = df.groupby('IssueKey')['TimeSpent'].max().reset_index()
maxTimesPerIssue = dict(zip(maxTimesPerIssue['IssueKey'], maxTimesPerIssue['TimeSpent']))
df['MaxTimePerIssue'] = [maxTimesPerIssue[key] for key in df['IssueKey']]
df = df[df.MaxTimePerIssue == df.TimeSpent]
df = df.drop(columns=['MaxTimePerIssue'])

我不喜欢自己的Python代码:

What I dislike about my Python code:

maxTimesPerIssue出现在处理df破坏思维过程(或管道)的过程中
maxTimesPerIssue本身的创建有点混乱
添加MaxTimePerIssue df
由于使用了许多低级的东西，例如reset_index()，list()，dict()，列表解析，删除列

maxTimesPerIssue appears in the middle of processing the df disrupting the thought process (or pipeline)
The creation of maxTimesPerIssue itself is kind of messy
Adding MaxTimePerIssue the df
It's definitely way less self-explanatory than the C# version, due to using lots of low level stuff like: reset_index(), list(), dict(), list comprehensions, dropping columns

有人可以帮我清理吗?

根据其分组属性过滤DataFrame [英] Filtering DataFrame based on its groups properties

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

根据其分组属性过滤DataFrame [英] Filtering DataFrame based on its groups properties

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭