Pandas - Merge two data frames, create new column, append values to array
Question
I am looking to merge two data frames on the ID column they share, creating a new column that collects every matching value from a specified column into an array. I expect the second data frame to contain multiple rows with the same ID.
Here is an example to clarify what I am looking for:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(3, size=(5, 4)), columns=('ID', 'X1', 'X2', 'X3'))
print(df1)
   ID  X1  X2  X3
0   1   1   0   2
1   0   1   0   1
2   0   1   2   2
3   1   2   2   0
4   2   1   0   0
d = {'ID' : pd.Series([1, 2, 1, 4, 5]), 'Tag' : pd.Series(['One', 'Two', 'Two', 'Four', 'Five'])}
df2 = (pd.DataFrame(d))
print(df2)
   ID   Tag
0   1   One
1   2   Two
2   1   Two
3   4  Four
4   5  Five
This is what I am expecting to see for the first row:
   ID  X1  X2  X3     Merged_Tags
0   1   1   0   2  ['One', 'Two']
I want to join on the ID column of df1 by looking through all of df2 for matching IDs (there will be multiple matching IDs). When a matching ID is found, the value stored in df2['Tag'] should be appended to a column in df1, perhaps an array.
I managed this iteratively, but my dataset is relatively large, so that approach is not viable.
Recommended answer
Try this:
In [35]: pd.merge(df1, df2.groupby('ID').Tag.apply(list).reset_index(), on='ID', how='left')
Out[35]:
   ID  X1  X2  X3         Tag
0   2   1   1   2       [Two]
1   1   0   1   1  [One, Two]
2   0   2   1   2         NaN
3   1   0   2   2  [One, Two]
4   0   0   2   2         NaN
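Because df1 in the question is built from random integers, the values above differ from the frame printed in the question. For anyone who wants a reproducible run, here is a minimal sketch of the same groupby-then-merge idea. The fixed df1 values are copied from the frame printed in the question, and the names tags and merged are my own illustrative choices.

```python
import pandas as pd

# Fixed copy of the df1 printed in the question (the original used random values).
df1 = pd.DataFrame(
    {
        "ID": [1, 0, 0, 1, 2],
        "X1": [1, 1, 1, 2, 1],
        "X2": [0, 0, 2, 2, 0],
        "X3": [2, 1, 2, 0, 0],
    }
)
df2 = pd.DataFrame(
    {
        "ID": [1, 2, 1, 4, 5],
        "Tag": ["One", "Two", "Two", "Four", "Five"],
    }
)

# Collapse df2 to one row per ID, collecting that ID's tags into a list.
tags = df2.groupby("ID")["Tag"].apply(list).reset_index()

# A left merge keeps every row of df1; IDs with no match in df2 get NaN.
merged = pd.merge(df1, tags, on="ID", how="left")
print(merged)
```

The grouping step is what avoids the row-by-row loop: df2 is reduced to one row per ID first, so the merge is a plain one-to-one lookup from df1's point of view.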
Or you can use the map() method:
In [38]: df1['Merged_Tags'] = df1.ID.map(df2.groupby('ID').Tag.apply(list))
In [39]: df1
Out[39]:
   ID  X1  X2  X3 Merged_Tags
0   2   1   1   2       [Two]
1   1   0   1   1  [One, Two]
2   0   2   1   2         NaN
3   1   0   2   2  [One, Two]
4   0   0   2   2         NaN
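The NaN entries for unmatched IDs can be awkward if later code expects every cell to hold a list. One possible way around that, sketched here on top of the map() approach, is to swap the missing entries for empty lists. This step is not part of the original answer; the fixed df1 values (copied from the question) and the name tag_lists are assumptions for illustration.

```python
import pandas as pd

# Fixed copy of the df1 printed in the question (the original used random values).
df1 = pd.DataFrame(
    {
        "ID": [1, 0, 0, 1, 2],
        "X1": [1, 1, 1, 2, 1],
        "X2": [0, 0, 2, 2, 0],
        "X3": [2, 1, 2, 0, 0],
    }
)
df2 = pd.DataFrame(
    {
        "ID": [1, 2, 1, 4, 5],
        "Tag": ["One", "Two", "Two", "Four", "Five"],
    }
)

# Series indexed by ID whose values are the lists of tags for that ID.
tag_lists = df2.groupby("ID")["Tag"].apply(list)

# map() looks up each ID of df1 in tag_lists; unmatched IDs become NaN.
df1["Merged_Tags"] = df1["ID"].map(tag_lists)

# Replace the missing entries with a fresh empty list for each row.
df1["Merged_Tags"] = df1["Merged_Tags"].apply(
    lambda value: value if isinstance(value, list) else []
)
print(df1)
```

Whether an empty list or NaN is the better marker for "no tags" depends on what the downstream code does with the column, so treat this as an optional extra step.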