如何合并一个csv文件中的两个字段与另一个csv文件中的一个字段? [英] How to merge two field on one csv file with one field of other csv file?

查看:151
本文介绍了如何合并一个csv文件中的两个字段与另一个csv文件中的一个字段?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想按以下方式合并两个CSV文件:

I would like to merge two CSV files as follow:

第一个CSV文件:

df = pd.DataFrame()
df["ticket_number"] = ['AAA', 'AAA', 'AAA', 'ABC', 'ABA','ADC','ABA','BBB']
df["train_board_station"] = ['Tokyo', 'LA', 'Paris', 'New_York', 'Delhi','Phoenix', 'London','LA']
df["train_off_station"] = ['Phoenix', 'London', 'Sydney', 'Berlin', 'Shanghai','LA', 'Paris', 'New_York']

第二个CSV文件:

rec = pd.DataFrame()
rec["code"] = ['Tokyo','London','Paris','New_York','Shanghai','LA','Sydney','Berlin','Phoenix','Delhi']
rec["count_A"] = ['1.2','7.8','4','8','7.8','3','8','5','2','10']
rec["count_B"] = ['12','78','4','8','78','36','88','51','25','10']

我使用以下代码:

for x in ["board", "off"]:
    df["station"] = df["train_" + x + "_station"]
    df["code"] = df["train_" + x + "_station"]
    df = pd.concat([df,rec], axis=1, join_axes=[df.index])
    df[x + "_count_A"] = df["count_A"]
    df[x + "_count_B"] = df["count_B"]
    df = df.drop(["station", "code","count_A","count_B"], axis=1)

我得到以下错误输出:

ticket_number,train_board_station,train_off_station,board_count_A,board_count_B,off_count_A,off_count_B
AAA,Tokyo,Phoenix,1.2,12,1.2,12
AAA,LA,London,7.8,78,7.8,78
AAA,Paris,Sydney,4,4,4,4
ABC,New_York,Berlin,8,8,8,8
ABA,Delhi,Shanghai,7.8,78,7.8,78
ADC,Phoenix,LA,3,36,3,36
ABA,London,Paris,8,88,8,88
BBB,LA,New_York,5,51,5,51

我注意到,而不是count_A和count_B与train_board station和train_off_station合并相同的行,第一行与train_board_station合并,第二行与train_off_station合并两次。

I notice that instead of count_A and count_B merging with train_board station and train_off_station of same line, first line gets merged with train_board_station and second lines gets merged with train_off_station twice.

预期输出为:

ticket_number,train_board_station,train_off_station,board_count_A,board_count_B,off_count_A,off_count_B
AAA,Tokyo,Phoenix,1.2,12,2,25
AAA,LA,London,3,36,7.8,78
AAA,Paris,Sydney,4,4,8,88
ABC,New_York,Berlin,8,8,5,51
ABA,Delhi,Shanghai,10,10,7.8,78
ADC,Phoenix,LA,2,26,3,36
ABA,London,Paris,7.7,78,4,4
BBB,LA,New_York,36,36,8,8


推荐答案

重复项有问题,我使用 加入 并加入左加入:

There is problem with duplicates, I use join with left join:

for x in ["board", "off"]:
    df["code"] = df["station"] = df["train_" + x + "_station"]
    df = df.join(rec.set_index('code'), on='code')
    df[x + "_count_A"] = df["count_A"]
    df[x + "_count_B"] = df["count_B"]
    df = df.drop(["station", "code","count_A","count_B"], axis=1)

print (df)
  ticket_number train_board_station train_off_station board_count_A  \
0           AAA               Tokyo           Phoenix           1.2   
1           AAA                  LA            London             3   
2           AAA               Paris            Sydney             4   
3           ABC            New_York            Berlin             8   
4           ABA               Delhi          Shanghai            10   
5           ADC             Phoenix                LA             2   
6           ABA              London             Paris           7.8   
7           BBB                  LA          New_York             3   

  board_count_B off_count_A off_count_B  
0            12           2          25  
1            36         7.8          78  
2             4           8          88  
3             8           5          51  
4            10         7.8          78  
5            25           3          36  
6            78           4           4  
7            36           8           8  

这篇关于如何合并一个csv文件中的两个字段与另一个csv文件中的一个字段?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆