How to select dataframe rows according to multi-(other column)-condition on columnar groups?
Problem description
Copy the following dataframe to your clipboard:
  textId  score              textInfo
0  name1    1.0            text_stuff
1  name1    2.0  different_text_stuff
2  name1    2.0            text_stuff
3  name2    1.0  different_text_stuff
4  name2    1.3  different_text_stuff
5  name2    2.0  still_different_text
6  name2    1.0              yoko ono
7  name2    3.0     I lika da Gweneth
8  name3    1.0     Always a tradeoff
9  name3    3.0                What?!
Now use
import pandas as pd
df = pd.read_clipboard(sep=r'\s\s+')
to load it into your environment. How does one slice this dataframe so that all the rows of a particular textId are returned if the score group of that textId includes at least one score equal to each of 1.0, 2.0 and 3.0? Here, the desired operation's result would exclude the name1 rows, since their score group is missing a 3.0, and exclude name3, since its score group is missing a 2.0:
  textId  score              textInfo
0  name2    1.0  different_text_stuff
1  name2    1.3  different_text_stuff
2  name2    2.0  still_different_text
3  name2    1.0              yoko ono
4  name2    3.0     I lika da Gweneth
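pd.read_clipboard needs a working clipboard backend, which is not always available (for example on a headless server). As a minimal sketch, the same sample frame can be built directly with the pd.DataFrame constructor; the variable name df matches the question, and the values are copied from the table above.

```python
import pandas as pd

# Same sample data as the clipboard table, built without read_clipboard.
df = pd.DataFrame(
    {
        "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
        "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
        "textInfo": [
            "text_stuff",
            "different_text_stuff",
            "text_stuff",
            "different_text_stuff",
            "different_text_stuff",
            "still_different_text",
            "yoko ono",
            "I lika da Gweneth",
            "Always a tradeoff",
            "What?!",
        ],
    }
)

print(df)
```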
Attempts
df[(df.textId == "textIdRowName") & (df.score == 1.0) & (df.score == 2.0) & (df.score == 3.0)]
isn't right since the condition acts on individual rows rather than on the textId group: no single row can have a score equal to 1.0, 2.0 and 3.0 at the same time, so the mask is always False. If this could be rewritten to match against textId groups, then it could be placed in a for loop and fed each unique textIdRowName. Such a function would collect the matching textId names in a series (say textIdThatMatchScore123) that could then be used to slice the original df like df[df.textId.isin(textIdThatMatchScore123)].
I'm failing at doing this with groupby.
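The for-loop idea described above can be sketched as follows, assuming the sample frame is rebuilt directly instead of read from the clipboard. Each unique textId's scores are tested against the required set, the matching names are collected into the series textIdThatMatchScore123, and that series is used to slice df with isin. The names required and matching_names are illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
        "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
        "textInfo": ["text_stuff", "different_text_stuff", "text_stuff",
                     "different_text_stuff", "different_text_stuff",
                     "still_different_text", "yoko ono", "I lika da Gweneth",
                     "Always a tradeoff", "What?!"],
    }
)

required = {1.0, 2.0, 3.0}  # illustrative: every score a group must contain
matching_names = []

# Test each textId group as a whole, not row by row.
for text_id in df.textId.unique():
    group_scores = df.loc[df.textId == text_id, "score"]
    if required.issubset(group_scores):
        matching_names.append(text_id)

textIdThatMatchScore123 = pd.Series(matching_names)
result = df[df.textId.isin(textIdThatMatchScore123)]
print(result)
```

This works, but it rescans the whole frame once per textId; the groupby answer below does the grouping in a single pass.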
Here's one solution: groupby textId, then keep only those groups where the set of unique score values is a superset (>=) of [1.0, 2.0, 3.0].
In [58]: df.groupby('textId').filter(lambda x: set(x['score']) >= set([1.,2.,3.]))
Out[58]:
  textId  score              textInfo
3  name2    1.0  different_text_stuff
4  name2    1.3  different_text_stuff
5  name2    2.0  still_different_text
6  name2    1.0              yoko ono
7  name2    3.0     I lika da Gweneth
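As an alternative sketch (not part of the original answer), the same rows can be selected without calling a Python function per group: keep only the rows whose score is one of the required values, count the distinct required scores per textId with nunique, and keep the textIds that found all of them. The helper name rows_with_all_scores is hypothetical. Note that filter, like this helper, keeps the original index (3 to 7), while the desired output in the question is numbered from 0; reset_index(drop=True) reproduces that numbering.

```python
import pandas as pd


def rows_with_all_scores(df, group_col, value_col, required):
    """Hypothetical helper: rows of df whose group contains every value in `required`."""
    required = list(set(required))
    # Only rows whose value is one of the required values can contribute a match.
    hits = df[df[value_col].isin(required)]
    # Number of distinct required values present in each group.
    found = hits.groupby(group_col)[value_col].nunique()
    # Groups that contain all of the required values.
    keep = found[found == len(required)].index
    # Return every row of those groups, including non-required scores such as 1.3.
    return df[df[group_col].isin(keep)]


df = pd.DataFrame(
    {
        "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
        "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
        "textInfo": ["text_stuff", "different_text_stuff", "text_stuff",
                     "different_text_stuff", "different_text_stuff",
                     "still_different_text", "yoko ono", "I lika da Gweneth",
                     "Always a tradeoff", "What?!"],
    }
)

result = rows_with_all_scores(df, "textId", "score", [1.0, 2.0, 3.0])
print(result)
print(result.reset_index(drop=True))  # renumbered 0..4 as in the question
```

Both this and the answer's filter compare floats exactly, which is fine for scores written as literals like 1.0 but would need rounding if the scores come out of arithmetic.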