Pandas: Merge Multiple Related DataFrames

Question

I have a question about merging pandas DataFrames.

Please see the data below:

Rating csv
UserID ContentID Rating 
U-1      C-1       3
U-1      C-2       4
U-3      C-3       1
U-5      C-1       5

Content csv
Title ContentID Language
T-1      C-1      EN
T-2      C-2      EN
T-3      C-3      EN

User csv
UserID Age Gender
U-1    10    1
U-2    20    0
U-3    30    1
U-4    40    0 
U-5    50    1
U-6    60    0
U-7    70    1
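
To reproduce the example without the CSV files, the three tables can be built directly. The names df1 (Rating), df2 (Content) and df3 (User) are inferred from the answer's code and printed output; with the real files you would load them with pd.read_csv instead (the question does not give the file names).

```python
import pandas as pd

# Sample data from the question; df1/df2/df3 naming is inferred from the answer
df1 = pd.DataFrame({  # Rating csv
    "UserID": ["U-1", "U-1", "U-3", "U-5"],
    "ContentID": ["C-1", "C-2", "C-3", "C-1"],
    "Rating": [3, 4, 1, 5],
})
df2 = pd.DataFrame({  # Content csv
    "Title": ["T-1", "T-2", "T-3"],
    "ContentID": ["C-1", "C-2", "C-3"],
    "Language": ["EN", "EN", "EN"],
})
df3 = pd.DataFrame({  # User csv
    "UserID": [f"U-{i}" for i in range(1, 8)],
    "Age": [10, 20, 30, 40, 50, 60, 70],
    "Gender": [1, 0, 1, 0, 1, 0, 1],
})
print(df1.shape, df2.shape, df3.shape)  # (4, 3) (3, 3) (7, 3)
```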

The result I want:

UserID ContentID Rating Title Language Age  Gender
U-1      C-1       3     T-1     EN     10     1
U-1      C-2       4     T-2     EN     10     1
U-1      C-3      NaN    T-3     EN     10     1
U-2      C-1      NaN    T-1     EN     20     0
U-2      C-2      NaN    T-2     EN     20     0
U-2      C-3      NaN    T-3     EN     20     0
U-3      C-1      NaN    T-1     EN     30     1
U-3      C-2      NaN    T-2     EN     30     1
U-3      C-3       1     T-3     EN     30     1
U-4      C-1      NaN    T-1     EN     40     0
U-4      C-2      NaN    T-2     EN     40     0
U-4      C-3      NaN    T-3     EN     40     0
U-5      C-1       5     T-1     EN     50     1
U-5      C-2      NaN    T-2     EN     50     1
U-5      C-3      NaN    T-3     EN     50     1
U-6      C-1      NaN    T-1     EN     60     0
U-6      C-2      NaN    T-2     EN     60     0
U-6      C-3      NaN    T-3     EN     60     0
U-7      C-1      NaN    T-1     EN     70     1
U-7      C-2      NaN    T-2     EN     70     1
U-7      C-3      NaN    T-3     EN     70     1

The total number of result rows is the number of UserIDs (User csv) times the number of ContentIDs (Content csv), e.g. 7 × 3 = 21 rows above.
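
That row count is exactly the size of a cross (Cartesian) join: every UserID is paired with every ContentID. A minimal check on the sample keys, assuming pandas 1.2 or later, where merge accepts how="cross":

```python
import pandas as pd

users = pd.DataFrame({"UserID": [f"U-{i}" for i in range(1, 8)]})  # 7 users
contents = pd.DataFrame({"ContentID": ["C-1", "C-2", "C-3"]})      # 3 content items

# how="cross" pairs every user row with every content row
pairs = pd.merge(users, contents, how="cross")
print(len(pairs))  # 21, i.e. 7 * 3
```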

All the DataFrames are related by these keys:
- Rating / Content -> ContentID
- Rating / User -> UserID

In other words, only the Rating column of the result may contain NaN; every other column is always filled.

Real sizes: Content has 6,000 rows and User has 220,000 rows, so the result would have 6,000 × 220,000 = 1,320,000,000 rows (about 1.3 billion).
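
A back-of-the-envelope estimate shows why this size runs out of memory. The sketch assumes all seven result columns are stored as 8-byte values, which is optimistic: string (object) columns such as UserID or Title cost more than 8 bytes per value, and the merge keeps intermediate frames alive while it runs, so the real peak is higher.

```python
# Rough lower bound on the memory needed to hold the full result
n_users, n_contents = 220_000, 6_000
rows = n_users * n_contents   # every user paired with every content item
n_columns = 7                 # UserID, ContentID, Rating, Title, Language, Age, Gender
bytes_per_value = 8           # assumption: each column stored as an 8-byte value
total_bytes = rows * n_columns * bytes_per_value
print(f"{rows:,} rows, about {total_bytes / 1e9:.0f} GB")  # 1,320,000,000 rows, about 74 GB
```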

I tried this, but it raises a MemoryError.
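
Building the whole result in one step cannot fit at this size. One common workaround, sketched here as an assumption rather than the method from the question or the answer, is to cross join one block of users at a time and process each block before moving on, so peak memory scales with chunk_size × number of contents instead of the full result. iter_result_chunks and chunk_size are hypothetical names.

```python
import pandas as pd


def iter_result_chunks(ratings, contents, users, chunk_size=10_000):
    """Yield the users x contents result one block of users at a time.

    Hypothetical helper: at most chunk_size * len(contents) rows are built per step.
    """
    for start in range(0, len(users), chunk_size):
        user_block = users.iloc[start:start + chunk_size]
        # Cross join this block of users with every content item
        block = pd.merge(user_block, contents, how="cross")
        # Left join keeps every (UserID, ContentID) pair; unrated pairs get NaN
        yield block.merge(ratings, on=["UserID", "ContentID"], how="left")


# Sample data from the question
ratings = pd.DataFrame({"UserID": ["U-1", "U-1", "U-3", "U-5"],
                        "ContentID": ["C-1", "C-2", "C-3", "C-1"],
                        "Rating": [3, 4, 1, 5]})
contents = pd.DataFrame({"Title": ["T-1", "T-2", "T-3"],
                         "ContentID": ["C-1", "C-2", "C-3"],
                         "Language": ["EN", "EN", "EN"]})
users = pd.DataFrame({"UserID": [f"U-{i}" for i in range(1, 8)],
                      "Age": [10, 20, 30, 40, 50, 60, 70],
                      "Gender": [1, 0, 1, 0, 1, 0, 1]})

total = 0
for chunk in iter_result_chunks(ratings, contents, users, chunk_size=3):
    # Placeholder for real work: aggregate the chunk, or append it to a file on disk
    total += len(chunk)
print(total)  # 21
```

Chunking only bounds peak memory; the full output is still about 1.3 billion rows, so each chunk normally has to be reduced (for example with groupby) or appended to disk (for example DataFrame.to_csv with mode="a") rather than collected back into one frame.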

Please help. Thanks!

Recommended Answer

You can use a cross join followed by a left join. This requires the values in df2.ContentID and df3.UserID to be unique:

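import pandas as pd
# df1 = Rating, df2 = Content, df3 = User: cross join users x contents via dummy key A, then left-join the ratings
df = pd.merge(pd.merge(df3.assign(A=1), df2.assign(A=1), on='A'), df1, how='left').drop(columns='A')
print(df)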
   UserID  Age  Gender Title ContentID Language  Rating
0     U-1   10       1   T-1       C-1       EN     3.0
1     U-1   10       1   T-2       C-2       EN     4.0
2     U-1   10       1   T-3       C-3       EN     NaN
3     U-2   20       0   T-1       C-1       EN     NaN
4     U-2   20       0   T-2       C-2       EN     NaN
5     U-2   20       0   T-3       C-3       EN     NaN
6     U-3   30       1   T-1       C-1       EN     NaN
7     U-3   30       1   T-2       C-2       EN     NaN
8     U-3   30       1   T-3       C-3       EN     1.0
9     U-4   40       0   T-1       C-1       EN     NaN
10    U-4   40       0   T-2       C-2       EN     NaN
11    U-4   40       0   T-3       C-3       EN     NaN
12    U-5   50       1   T-1       C-1       EN     5.0
13    U-5   50       1   T-2       C-2       EN     NaN
14    U-5   50       1   T-3       C-3       EN     NaN
15    U-6   60       0   T-1       C-1       EN     NaN
16    U-6   60       0   T-2       C-2       EN     NaN
17    U-6   60       0   T-3       C-3       EN     NaN
18    U-7   70       1   T-1       C-1       EN     NaN
19    U-7   70       1   T-2       C-2       EN     NaN
20    U-7   70       1   T-3       C-3       EN     NaN
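
On pandas 1.2 or later, the helper column A is unnecessary because merge accepts how="cross". The sketch below rebuilds the sample data, checks the uniqueness requirement mentioned above, and passes validate="one_to_one" so pandas raises a MergeError if any (UserID, ContentID) pair is rated twice, which would otherwise push the row count above users × contents. It still builds the full result in memory, so on its own it does not avoid the MemoryError at the real size.

```python
import pandas as pd

df1 = pd.DataFrame({"UserID": ["U-1", "U-1", "U-3", "U-5"],      # Rating
                    "ContentID": ["C-1", "C-2", "C-3", "C-1"],
                    "Rating": [3, 4, 1, 5]})
df2 = pd.DataFrame({"Title": ["T-1", "T-2", "T-3"],             # Content
                    "ContentID": ["C-1", "C-2", "C-3"],
                    "Language": ["EN", "EN", "EN"]})
df3 = pd.DataFrame({"UserID": [f"U-{i}" for i in range(1, 8)],  # User
                    "Age": [10, 20, 30, 40, 50, 60, 70],
                    "Gender": [1, 0, 1, 0, 1, 0, 1]})

# The cross join only yields users x contents rows if these keys are unique
if not (df2["ContentID"].is_unique and df3["UserID"].is_unique):
    raise ValueError("ContentID and UserID must be unique in their lookup tables")

df = (
    pd.merge(df3, df2, how="cross")  # every user paired with every content item
      .merge(df1, on=["UserID", "ContentID"], how="left", validate="one_to_one")
)
print(df.shape)  # (21, 7)
```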
