How to merge two list columns when merging DataFrames?


Question

I have two DataFrames:

df1:

       date        ids
0   2015-10-13       [978]
1   2015-10-14  [978, 121]

df2:

       date        ids
0   2015-10-13  [978, 12]
1   2015-10-14     [2, 1]

When I merge them on date as below:

df = pandas.merge(df1, df2, on='date', sort=False)

I get the following DataFrame:

   date            ids_x             ids_y
0   2015-10-13    [978]            [978, 12]
1   2015-10-14    [978, 121]       [2, 1]

I want a single ids column that combines both lists, like [978, 978, 12], or preferably with duplicates removed, like [978, 12].

Recommended answer

You can add the two columns together to get the list you are looking for (adding two list-valued columns concatenates the lists row by row), and then use df.drop() with axis=1 to drop the ids_x and ids_y columns. Example -

df = pd.merge(df1, df2, on='date', sort=False)
df['ids'] = df['ids_x'] + df['ids_y']
df = df.drop(['ids_x','ids_y'],axis=1)

Demo -

In [65]: df
Out[65]:
         date       ids_x      ids_y
0  2015-10-13       [978]  [978, 12]
1  2015-10-14  [978, 121]     [2, 1]

In [67]: df['ids'] = df['ids_x'] + df['ids_y']

In [68]: df
Out[68]:
         date       ids_x      ids_y               ids
0  2015-10-13       [978]  [978, 12]    [978, 978, 12]
1  2015-10-14  [978, 121]     [2, 1]  [978, 121, 2, 1]

In [70]: df = df.drop(['ids_x','ids_y'],axis=1)

In [71]: df
Out[71]:
         date               ids
0  2015-10-13    [978, 978, 12]
1  2015-10-14  [978, 121, 2, 1]
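For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the question, assuming the dates are stored as plain strings (the question does not say which dtype they have); the variable name merged is only for illustration.

```python
import pandas as pd

# Rebuild the two frames from the question.
# Assumption: dates are plain strings; the original dtype is not stated.
df1 = pd.DataFrame({"date": ["2015-10-13", "2015-10-14"],
                    "ids": [[978], [978, 121]]})
df2 = pd.DataFrame({"date": ["2015-10-13", "2015-10-14"],
                    "ids": [[978, 12], [2, 1]]})

# Merging on 'date' produces ids_x (from df1) and ids_y (from df2).
merged = pd.merge(df1, df2, on="date", sort=False)

# Adding two list-valued columns concatenates the lists row by row.
merged["ids"] = merged["ids_x"] + merged["ids_y"]

# Drop the two intermediate columns.
merged = merged.drop(["ids_x", "ids_y"], axis=1)

print(merged)
```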


If you want to remove the duplicate values as well, and you do not care about order, then you can use Series.apply to convert each list to a set and then back to a list. Example -

df['ids'] = df['ids'].apply(lambda x: list(set(x)))

Demo -

In [72]: df['ids'] = df['ids'].apply(lambda x: list(set(x)))

In [73]: df
Out[73]:
         date               ids
0  2015-10-13         [978, 12]
1  2015-10-14  [121, 978, 2, 1]
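If the order does matter, the set-based approach is not a good fit, because a set does not keep insertion order. One possible alternative, which is not part of the original answer, is dict.fromkeys, which keeps the first occurrence of each value in order on Python 3.7 and later. The sketch below assumes a frame that already holds the concatenated ids column from the earlier step.

```python
import pandas as pd

# Assumption: this frame mirrors the merged result shown earlier,
# with the concatenated (not yet deduplicated) 'ids' column.
df = pd.DataFrame({"date": ["2015-10-13", "2015-10-14"],
                   "ids": [[978, 978, 12], [978, 121, 2, 1]]})

# dict.fromkeys keeps the first occurrence of each key in insertion order
# (guaranteed from Python 3.7), so duplicates are dropped without reordering.
df["ids"] = df["ids"].apply(lambda x: list(dict.fromkeys(x)))

print(df)
```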


Or, as asked in the comments, if you want to do it with numpy.unique(), you can use that along with Series.apply as well. Note that numpy.unique() returns a sorted NumPy array rather than a list -

import numpy as np
df['ids'] = df['ids'].apply(lambda x: np.unique(x))

Demo -

In [79]: df['ids'] = df['ids'].apply(lambda x: np.unique(x))

In [80]: df
Out[80]:
         date               ids
0  2015-10-13         [12, 978]
1  2015-10-14  [1, 2, 121, 978]
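Since numpy.unique() returns a NumPy array, each cell in the column ends up holding an array rather than a Python list. If plain lists are needed downstream, one option is to call .tolist() on the result. This is a small variation on the answer's code, sketched here on an assumed frame that mirrors the concatenated result from earlier.

```python
import numpy as np
import pandas as pd

# Assumption: this frame mirrors the merged result shown earlier.
df = pd.DataFrame({"date": ["2015-10-13", "2015-10-14"],
                   "ids": [[978, 978, 12], [978, 121, 2, 1]]})

# np.unique sorts and deduplicates; .tolist() turns the resulting
# array back into a plain Python list of ints.
df["ids"] = df["ids"].apply(lambda x: np.unique(x).tolist())

print(df)
```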
