将一个csv文件中的值匹配到另一个csv文件中,然后使用pandas/python替换整列 [英] Matching values from one csv file to another and replace entire column using pandas/python

查看:115
本文介绍了将一个csv文件中的值匹配到另一个csv文件中,然后使用pandas/python替换整列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

请考虑以下示例:

我有一个Movielens数据集-

I have a dataset of Movielens-

u.item.csv

ID|MOVIE NAME (YEAR)|REL.DATE|NULL|IMDB LINK|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|
1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0

此处使用的分隔符是Pipe,仍可管理.

Seperator used here is Pipe, which is still manageable.

training_data.csv

,user_id,movie_id,rating,unix_timestamp
0,1,1,5,874965758
1,1,2,3,876893171
2,1,3,4,878542960

因为我需要在"Training_data"中显示电影名称,而不是"movie id",所以我需要将 u.item.csv 的每个ID与带有 training_data.csv的movie_id匹配,然后将其替换.

Since I need to show the Movie names in "Training_data", instead of "movie id" I need to match every ID of u.item.csv with movie_id with training_data.csv and then replace it.

我正在使用Python Pandas,并且训练数据已从Sframe转换为Dataframe到CSV.这样我就可以获取所需的更改,但仍未成功.我当然可以使用一些循环结构,但是匹配和替换是我面临的真正挑战.

I'm using Python Pandas, and The training data was converted from Sframe to Dataframe to CSV. So that I could acquire the required change, which is yet unsuccessful. I can surely use some looping structure, but matching and replacing is real challenge I face.

P.s.我知道培训数据将按用户顺序排列,如果被替换,它将产生准确的输出,但是我需要学习这一点,以便在推荐电影时我需要显示电影名称而不是ID.

P.s. I know Training data will be in sequence per user and will produce the exact output if replaced, but I need to learn this so that when I recommend movies I need MOVIE Names to displayed and not IDs.

我已经尝试过

  1. THIS(pandas-python-replace-multiple-values-in -multiple-columns)-但是当我在数据集中有100K值时,可能会花费很多时间
  2. THIS(pandas-replace-multiple-values-one-column)-匹配值未解释
  3. 这(pandas-replacing-column-values)-完成手动输入
  1. THIS (pandas-python-replace-multiple-values-in-multiple-columns) - But can cost a lot of time when I have 100K values in Dataset
  2. THIS (pandas-replace-multiple-values-one-column) - Matching values not explained
  3. THIS (pandas-replacing-column-values) - Manual entries are done

推荐答案

我认为您需要Series .html"rel =" nofollow noreferrer> set_index :

I think you need map by Series created by set_index:

print (df1.set_index('ID')['MOVIE NAME (YEAR)'])
ID
1     Toy Story (1995)
2     GoldenEye (1995)
3    Four Rooms (1995)
Name: MOVIE NAME (YEAR), dtype: object

df2['movie_id'] = df2['movie_id'].map(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id           movie_id  rating  unix_timestamp
0        1   Toy Story (1995)       5       874965758
1        1   GoldenEye (1995)       3       876893171
2        1  Four Rooms (1995)       4       878542960

或使用 replace :

df2['movie_id'] = df2['movie_id'].replace(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id           movie_id  rating  unix_timestamp
0        1   Toy Story (1995)       5       874965758
1        1   GoldenEye (1995)       3       876893171
2        1  Four Rooms (1995)       4       878542960

如果不匹配则为差异,map创建NaN并替换为let原始值:

Difference is if not match, map create NaN and replace let original value:

print (df2)
   user_id  movie_id  rating  unix_timestamp
0        1         1       5       874965758
1        1         2       3       876893171
2        1         5       4       878542960 <- 5 not match

df2['movie_id'] = df2['movie_id'].map(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id          movie_id  rating  unix_timestamp
0        1  Toy Story (1995)       5       874965758
1        1  GoldenEye (1995)       3       876893171
2        1               NaN       4       878542960


df2['movie_id'] = df2['movie_id'].replace(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id          movie_id  rating  unix_timestamp
0        1  Toy Story (1995)       5       874965758
1        1  GoldenEye (1995)       3       876893171
2        1                 5       4       878542960

这篇关于将一个csv文件中的值匹配到另一个csv文件中,然后使用pandas/python替换整列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆