Delete rows of a data frame based on quantity of data in another data frame
Question
I have two pandas data frames, A and B. B is a subset of A.
I want to delete every number from A that also appears in B. However, if a number occurs twice in A and once in B, only one occurrence of that number should be deleted from A.
Here is my sample dataset:
df_A      df_B
[Test]    [Test]
1         1
2         2
3         5
2         5
4
5
5
After the operation, I want a new data frame C that looks like this:
df_C
[Test]
3
2
4
Can you suggest how to do this?
The suggested duplicate removes all occurrences from A if present in B, not just the first N occurrences.
Recommended answer
I might do something like this (stealing SR's setup):
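import pandas as pd

dfA = pd.DataFrame({'A': [1, 2, 3, 2, 4, 5, 5]})
dfB = pd.DataFrame({'B': [1, 2, 5, 5]})
# How many times each value in A has already been seen (0 for its first occurrence)
counts = dfA.groupby('A').cumcount()
# How many copies of each value B contains, aligned to the rows of A (0 if absent from B)
limits = dfB['B'].value_counts().reindex(dfA.A).fillna(0).values
# Keep a row only if its value has already been seen at least `limits` times
dfC = dfA.loc[counts >= limits]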
This gives me:
In [121]: dfC
Out[121]:
A
2 3
3 2
4 4
This works by using groupby to get the number of times a given value in A has been seen before:
In [124]: dfA.groupby('A').cumcount()
Out[124]:
0 0
1 0
2 0
3 1
4 0
5 0
6 1
dtype: int64
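
To make the cumcount step concrete, here is a minimal plain-Python sketch of what it computes for this column. The helper name `occurrence_index` is hypothetical and exists only for illustration; it is not part of pandas.

```python
from collections import defaultdict

def occurrence_index(values):
    """Return, for each item, how many times it has appeared earlier in the list."""
    seen = defaultdict(int)  # value -> number of times seen so far
    result = []
    for v in values:
        result.append(seen[v])
        seen[v] += 1
    return result

print(occurrence_index([1, 2, 3, 2, 4, 5, 5]))
# prints [0, 0, 0, 1, 0, 0, 1]
```

The second 2 and the second 5 get index 1, matching rows 3 and 6 of the cumcount output above.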
and using value_counts to get the limits, which we then reindex to match the counts:
In [139]: dfB['B'].value_counts()
Out[139]:
5 2
2 1
1 1
Name: B, dtype: int64
In [140]: dfB['B'].value_counts().reindex(dfA.A)
Out[140]:
A
1 1.0
2 1.0
3 NaN
2 1.0
4 NaN
5 2.0
5 2.0
Name: B, dtype: float64
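
The final step compares the two aligned sequences: a row of A is kept only when its occurrence index is at least the number of copies of that value in B (NaN is treated as 0). As a minimal sketch, the whole approach can be wrapped in a function. The name `drop_matched_occurrences` and its parameters are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def drop_matched_occurrences(df_a, col_a, df_b, col_b):
    """Remove from df_a one row per matching occurrence in df_b, earliest rows first."""
    # 0 for the first time a value appears in df_a, 1 for the second, and so on
    counts = df_a.groupby(col_a).cumcount()
    # Copies of each value in df_b, aligned to the rows of df_a (0 if absent)
    limits = df_b[col_b].value_counts().reindex(df_a[col_a]).fillna(0).to_numpy()
    # Keep rows whose value has already appeared at least `limits` times
    return df_a.loc[counts >= limits]

dfA = pd.DataFrame({'A': [1, 2, 3, 2, 4, 5, 5]})
dfB = pd.DataFrame({'B': [1, 2, 5, 5]})
dfC = drop_matched_occurrences(dfA, 'A', dfB, 'B')
print(dfC['A'].tolist())  # prints [3, 2, 4]
```

The result keeps A's original index labels (2, 3, 4 here); calling reset_index(drop=True) on it gives a fresh 0-based index if that is preferred.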