How to select dataframe rows according to multi-(other column)-condition on columnar groups?


Problem description

Copy the following dataframe to your clipboard:

  textId   score              textInfo
0  name1     1.0            text_stuff
1  name1     2.0  different_text_stuff
2  name1     2.0            text_stuff
3  name2     1.0  different_text_stuff
4  name2     1.3  different_text_stuff
5  name2     2.0  still_different_text
6  name2     1.0              yoko ono
7  name2     3.0     I lika da Gweneth
8  name3     1.0     Always a tradeoff
9  name3     3.0                What?!

Now use

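import pandas as pd
# Raw string so the regex separator (two or more whitespace characters) is not treated as string escapes
df = pd.read_clipboard(sep=r'\s\s+')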

to load it into your environment. How does one slice this dataframe so that all the rows of a particular textId are returned only if the score group of that textId contains at least one score equal to each of 1.0, 2.0 and 3.0? Here, the desired result excludes the name1 rows, since that score group is missing a 3.0, and excludes the name3 rows, since that score group is missing a 2.0:

  textId   score              textInfo
0  name2     1.0  different_text_stuff
1  name2     1.3  different_text_stuff
2  name2     2.0  still_different_text
3  name2     1.0              yoko ono
4  name2     3.0     I lika da Gweneth

Attempts

  1. df[df.textId == "textIdRowName" & df.score == 1.0 & df.score == 2.0 & df.score == 3.0] isn't right, since the condition acts on individual rows rather than on the textId group (and no single row can have a score equal to 1.0, 2.0 and 3.0 at once). If this could be rewritten to match against textId groups, it could be placed in a for loop and fed each unique textIdRowName. Such a function would collect the matching textId names in a series (say textIdThatMatchScore123), which could then be used to slice the original df with df[df.textId.isin(textIdThatMatchScore123)].
  2. Failing at groupby.
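
The loop described in attempt 1 could be sketched roughly as follows. This is only an illustration, not code from the question: the dataframe is built explicitly instead of being read from the clipboard, and the name `required` is introduced here for the example.

```python
import pandas as pd

# Same data as in the question, constructed directly rather than via the clipboard.
df = pd.DataFrame({
    "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
    "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
    "textInfo": [
        "text_stuff", "different_text_stuff", "text_stuff",
        "different_text_stuff", "different_text_stuff",
        "still_different_text", "yoko ono", "I lika da Gweneth",
        "Always a tradeoff", "What?!",
    ],
})

# Scores that every kept textId group must contain (hypothetical helper name).
required = {1.0, 2.0, 3.0}

# Loop over the unique textId values and keep those whose scores cover `required`.
textIdThatMatchScore123 = [
    text_id
    for text_id in df.textId.unique()
    if required <= set(df.loc[df.textId == text_id, "score"])
]

# Slice the original dataframe using the collected names.
result = df[df.textId.isin(textIdThatMatchScore123)]
print(result)
```

The check is done once per textId on the whole set of that group's scores, which is the group-level condition the row-wise boolean mask cannot express.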

Solution

Here's one solution: group by textId, then keep only those groups whose set of score values is a superset (>=) of {1.0, 2.0, 3.0}.

In [58]: df.groupby('textId').filter(lambda x: set(x['score']) >= set([1.,2.,3.]))
Out[58]: 
  textId  score              textInfo
3  name2    1.0  different_text_stuff
4  name2    1.3  different_text_stuff
5  name2    2.0  still_different_text
6  name2    1.0              yoko ono
7  name2    3.0     I lika da Gweneth
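
For comparison, here is a self-contained sketch that runs the same groupby/filter call next to one possible alternative that avoids the Python-level lambda. The alternative is not part of the original answer, and the names `required`, `has_all` and `keep` are made up for this example; it counts how many distinct required scores each textId has and keeps the ids that have all of them.

```python
import pandas as pd

# Same data as in the question, constructed directly rather than via the clipboard.
df = pd.DataFrame({
    "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
    "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
    "textInfo": [
        "text_stuff", "different_text_stuff", "text_stuff",
        "different_text_stuff", "different_text_stuff",
        "still_different_text", "yoko ono", "I lika da Gweneth",
        "Always a tradeoff", "What?!",
    ],
})

required = [1.0, 2.0, 3.0]  # assumed to contain no duplicates

# The approach from the answer: keep groups whose scores are a superset of `required`.
filtered = df.groupby("textId").filter(lambda x: set(x["score"]) >= set(required))

# Alternative sketch: look only at rows with a required score, count the
# distinct required scores per textId, and keep ids that have all of them.
has_all = (
    df[df["score"].isin(required)]
    .groupby("textId")["score"]
    .nunique()
    .eq(len(required))
)
keep = has_all[has_all].index
result = df[df["textId"].isin(keep)]

print(result)
```

Both approaches keep the original row index (3 to 7 here); call reset_index(drop=True) on the result if a fresh 0-based index, as shown in the question's desired output, is wanted.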
