Extracting specific rows from a data frame

Question

I have a data frame df1 with two columns, 'ids' and 'names':

ids     names
fhj56   abc
ty67s   pqr
yu34o   xyz

I have another data frame, df2, some of whose columns are:

user     values                       
1        ['fhj56','fg7uy8']
2        ['glao0','rt56yu','re23u']
3        ['fhj56','ty67s','hgjl09']
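
For reproduction, the two frames above could be built as in the sketch below. It assumes that each cell of 'values' holds a real Python list of strings; the question does not say how the column is stored. If the cells are instead text such as "['fhj56','fg7uy8']", the hypothetical helper to_list (not part of the original post) parses them with ast.literal_eval first.

```python
import ast

import pandas as pd

df1 = pd.DataFrame({
    'ids': ['fhj56', 'ty67s', 'yu34o'],
    'names': ['abc', 'pqr', 'xyz'],
})

df2 = pd.DataFrame({
    'user': [1, 2, 3],
    'values': [
        ['fhj56', 'fg7uy8'],
        ['glao0', 'rt56yu', 're23u'],
        ['fhj56', 'ty67s', 'hgjl09'],
    ],
})


def to_list(cell):
    """Hypothetical helper: parse a string-encoded list, pass real lists through."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell


# A no-op for the frames above; only matters when the cells were read in as text.
df2['values'] = df2['values'].apply(to_list)
```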

My result should list the users from df2 whose values contain at least one of the ids from df1, and should also show which ids caused each user to be included. The result should look like this:

   user     values_responsible     names
   1        ['fhj56']              ['abc']
   3        ['fhj56','ty67s']      ['abc','pqr']

User 2 does not appear in the result table because none of its values exist in df1.

I tried this:

df2.query('values in @df1.ids')

But this does not seem to work.

Recommended answer

You can iterate over the rows of df2 and use .loc together with isin to find the rows of df1 whose ids appear in that row's values. The matches are collected into lists, which are then assembled into a new data frame:

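import pandas as pd

ids = []
names = []
users = []
for _, row in df2.iterrows():
    # Rows of df1 whose id appears in this user's list of values
    result = df1.loc[df1['ids'].isin(row['values'])]
    if not result.empty:
        # Keep one list of matched ids and one list of names per user
        ids.append(result['ids'].tolist())
        names.append(result['names'].tolist())
        users.append(row['user'])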

>>> pd.DataFrame({'user': users, 'values_responsible': ids, 'names': names})[['user', 'values_responsible', 'names']]
   user values_responsible       names
0     1            [fhj56]       [abc]
1     3     [fhj56, ty67s]  [abc, pqr]
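
The same nested result can also be produced without iterrows. The following is a minimal sketch, not the answer's own method: it assumes the ids in df1 are unique and that 'values' holds Python lists. The name id_to_name is introduced here for illustration only. Unlike the loop above, the matched ids come out in the order they appear in each user's list rather than in df1's row order; for the sample data the two orders coincide.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ids': ['fhj56', 'ty67s', 'yu34o'],
    'names': ['abc', 'pqr', 'xyz'],
})
df2 = pd.DataFrame({
    'user': [1, 2, 3],
    'values': [
        ['fhj56', 'fg7uy8'],
        ['glao0', 'rt56yu', 're23u'],
        ['fhj56', 'ty67s', 'hgjl09'],
    ],
})

# Lookup table from id to name (assumes ids are unique in df1).
id_to_name = dict(zip(df1['ids'], df1['names']))

result = df2[['user']].copy()
# For each user, keep only the values that are known ids.
result['values_responsible'] = df2['values'].apply(
    lambda vals: [v for v in vals if v in id_to_name]
)
# Translate the surviving ids into their names.
result['names'] = result['values_responsible'].apply(
    lambda matched: [id_to_name[i] for i in matched]
)
# Drop users with no matching id at all.
result = result[result['values_responsible'].map(len) > 0].reset_index(drop=True)

print(result)
```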

Or, for tidy data, with one row per matched id:

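ids = []
names = []
users = []
for _, row in df2.iterrows():
    # Rows of df1 whose id appears in this user's list of values
    result = df1.loc[df1['ids'].isin(row['values'])]
    if not result.empty:
        # Flatten: one entry per matched id, repeating the user each time
        ids.extend(result['ids'].tolist())
        names.extend(result['names'].tolist())
        users.extend([row['user']] * len(result['ids']))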

>>> pd.DataFrame({'user': users, 'values_responsible': ids, 'names': names})[['user', 'values_responsible', 'names']]
   user values_responsible names
0     1              fhj56   abc
1     3              fhj56   abc
2     3              ty67s   pqr
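
The tidy form also lends itself to a loop-free variant. The sketch below is an alternative to the answer's code, not a restatement of it: it assumes pandas 0.25 or later (for DataFrame.explode) and that 'values' holds Python lists. Each list element becomes its own row, and an inner merge against df1 keeps only the elements that are known ids.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ids': ['fhj56', 'ty67s', 'yu34o'],
    'names': ['abc', 'pqr', 'xyz'],
})
df2 = pd.DataFrame({
    'user': [1, 2, 3],
    'values': [
        ['fhj56', 'fg7uy8'],
        ['glao0', 'rt56yu', 're23u'],
        ['fhj56', 'ty67s', 'hgjl09'],
    ],
})

tidy = (
    df2.explode('values')                        # one row per (user, value)
       .merge(df1, left_on='values', right_on='ids', how='inner')  # keep known ids
       [['user', 'ids', 'names']]
       .rename(columns={'ids': 'values_responsible'})
       .reset_index(drop=True)
)

print(tidy)
```

Users with no matching id, such as user 2, drop out because the inner merge finds no partner row for any of their values.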
