Pandas - Merge two data frames, create new column, append values to array

Question

I am looking to merge two data frames on the ID column they share, but instead of a plain merge I want to create a new column and append the values of a specified column to an array in that new column. The same ID can appear multiple times in the second data frame.

Here is an example to clarify what I am looking for:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(3, size=(5, 4)), columns=('ID', 'X1', 'X2', 'X3'))
print(df1)

   ID  X1  X2  X3
0   1   1   0   2
1   0   1   0   1
2   0   1   2   2
3   1   2   2   0
4   2   1   0   0

d = {'ID': pd.Series([1, 2, 1, 4, 5]), 'Tag': pd.Series(['One', 'Two', 'Two', 'Four', 'Five'])}
df2 = pd.DataFrame(d)
print(df2)

   ID   Tag
0   1   One
1   2   Two
2   1   Two
3   4  Four
4   5  Five
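
Because `np.random.randint` is called without a seed, the df1 printed above will not reproduce on another run (the answer below shows different df1 values for this reason). A minimal sketch, assuming you want a fixed frame to test against, builds df1 from the literal values printed above and df2 exactly as in the question:

```python
import pandas as pd

# df1 built from the literal values printed above instead of
# np.random.randint, so every run produces the same frame.
df1 = pd.DataFrame(
    {
        "ID": [1, 0, 0, 1, 2],
        "X1": [1, 1, 1, 2, 1],
        "X2": [0, 0, 2, 2, 0],
        "X3": [2, 1, 2, 0, 0],
    }
)

# df2 exactly as in the question; ID 1 appears twice.
df2 = pd.DataFrame(
    {
        "ID": [1, 2, 1, 4, 5],
        "Tag": ["One", "Two", "Two", "Four", "Five"],
    }
)

print(df1)
print(df2)
```

Alternatively, calling `np.random.seed(...)` before generating df1 also makes the random version repeatable.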

This is what I am expecting to see for the first row:

   ID  X1  X2  X3  Merged_Tags
0   1   1   0   2  ['One', 'Two']

I want to join on df1's ID column by searching all of df2 for matching IDs (an ID can match several rows). For each match, the value stored in df2['Tag'] should be appended to a new column in df1, for example as a list.

I managed this iteratively, but my dataset is relatively large, so that approach is not viable.
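
The question does not show its loop, so the following is only an assumed reconstruction of a straightforward row-by-row version, useful as a reference result to check faster methods against. It uses the df1 values printed above (not fresh random ones). For every df1 row it scans all of df2, so the work grows with len(df1) * len(df2), which is why it becomes slow on large data:

```python
import pandas as pd

# df1 uses the literal values printed in the question; df2 as given.
df1 = pd.DataFrame({"ID": [1, 0, 0, 1, 2], "X1": [1, 1, 1, 2, 1],
                    "X2": [0, 0, 2, 2, 0], "X3": [2, 1, 2, 0, 0]})
df2 = pd.DataFrame({"ID": [1, 2, 1, 4, 5],
                    "Tag": ["One", "Two", "Two", "Four", "Five"]})

# Naive approach (assumed, not the asker's actual code): for each df1 ID,
# scan every df2 row and collect the Tag values whose ID matches.
merged_tags = []
for id_value in df1["ID"]:
    tags = []
    for other_id, tag in zip(df2["ID"], df2["Tag"]):
        if other_id == id_value:
            tags.append(tag)
    merged_tags.append(tags)

df1["Merged_Tags"] = merged_tags
print(df1)
```

IDs with no match in df2 end up with an empty list here.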

Answer

Try this:

In [35]: pd.merge(df1, df2.groupby('ID').Tag.apply(list).reset_index(), on='ID', how='left')
Out[35]:
   ID  X1  X2  X3         Tag
0   2   1   1   2       [Two]
1   1   0   1   1  [One, Two]
2   0   2   1   2         NaN
3   1   0   2   2  [One, Two]
4   0   0   2   2         NaN
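
The one-liner above does two things: it collapses df2 to one row per ID with all matching Tag values gathered into a list, then left-merges that onto df1 so every df1 row is kept. A sketch splitting it into those two steps, assuming df1 holds the fixed values printed in the question so the output is reproducible (`tags_per_id` and `result` are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 0, 0, 1, 2], "X1": [1, 1, 1, 2, 1],
                    "X2": [0, 0, 2, 2, 0], "X3": [2, 1, 2, 0, 0]})
df2 = pd.DataFrame({"ID": [1, 2, 1, 4, 5],
                    "Tag": ["One", "Two", "Two", "Four", "Five"]})

# Step 1: one entry per ID; the Tag values of each group become a list.
# groupby keeps the original row order inside each group.
tags_per_id = df2.groupby("ID")["Tag"].apply(list)
print(tags_per_id)

# Step 2: turn the ID index back into a column and left-merge, so df1 rows
# whose ID is missing from df2 are kept and get NaN in the Tag column.
result = pd.merge(df1, tags_per_id.reset_index(), on="ID", how="left")
print(result)
```

Because step 1 leaves each ID only once, the merge cannot duplicate df1 rows, so the result has the same number of rows as df1.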

Or you can use the map() method:

In [38]: df1['Merged_Tags'] = df1.ID.map(df2.groupby('ID').Tag.apply(list))

In [39]: df1
Out[39]:
   ID  X1  X2  X3 Merged_Tags
0   2   1   1   2       [Two]
1   1   0   1   1  [One, Two]
2   0   2   1   2         NaN
3   1   0   2   2  [One, Two]
4   0   0   2   2         NaN
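
In both versions, IDs that never appear in df2 get NaN rather than a list. If every cell should hold a list, one option is to replace those NaN values with empty lists after the map. `fillna` does not accept a list as the fill value, so the sketch below (again assuming the question's fixed df1 values) checks each element instead:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 0, 0, 1, 2], "X1": [1, 1, 1, 2, 1],
                    "X2": [0, 0, 2, 2, 0], "X3": [2, 1, 2, 0, 0]})
df2 = pd.DataFrame({"ID": [1, 2, 1, 4, 5],
                    "Tag": ["One", "Two", "Two", "Four", "Five"]})

# One list of tags per ID, indexed by ID.
tags_per_id = df2.groupby("ID")["Tag"].apply(list)

# map() looks each df1 ID up in the index of tags_per_id;
# IDs not found there become NaN.
merged = df1["ID"].map(tags_per_id)

# Keep real lists, turn the NaN placeholders into empty lists.
df1["Merged_Tags"] = [tags if isinstance(tags, list) else [] for tags in merged]
print(df1)
```

Using `map` instead of `merge` writes the new column straight into df1 and never changes its row order or row count.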
