从大型字典替换DataFrame中值的更好方法 [英] Better way to replace values in DataFrame from large dictionary
问题描述
我已经编写了一些代码,使用字典将DataFrame中的值替换为另一帧中的值,并且可以正常工作,但是我在一些大型文件中使用了此代码,字典可能会变得很长.几千双.然后,当我使用此代码时,它的运行速度非常慢,并且在某些情况下它也已耗尽内存.
I have written some code that replaces values in a DataFrame with values from another frame using a dictionary, and it is working, but i am using this on some large files, where the dictionary can get very long. A few thousand pairs. When I then uses this code it runs very slow, and it have also been going out of memory on a few ocations.
我有些坚信,我的方法远非最佳,而且必须有一些更快的方法来做到这一点.我创建了一个简单的示例,该示例可以满足我的要求,但是对于大量数据而言这很慢.希望有人有一个更简单的方法来做到这一点.
I am somewhat convinced that my method of doing this is far from optimal, and that there must be some faster ways to do this. I have created a simple example that does what I want, but that is slow for large amounts of data. Hope someone have a simpler way to do this.
import pandas as pd
#Frame with data where I want to replace the 'id' with the name from df2
df1 = pd.DataFrame({'id' : [1, 2, 3, 4, 5, 3, 5, 9], 'values' : [12, 32, 42, 51, 23, 14, 111, 134]})
#Frame containing names linked to ids
df2 = pd.DataFrame({'id' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'name' : ['id1', 'id2', 'id3', 'id4', 'id5', 'id6', 'id7', 'id8', 'id9', 'id10']})
#My current "slow" way of doing this.
#Starts by creating a dictionary from df2
#Need to create dictionaries from the domain and banners tables to link ids
df2_dict = dict(zip(df2['id'], df2['name']))
#and then uses the dict to replace the ids with name in df1
df1.replace({'id' : df2_dict}, inplace=True)
推荐答案
我认为您可以使用 to_dict
-如果df2
中不存在值,则获取NaN
:
I think you can use map
with Series
converted to_dict
- get NaN
if not exist value in df2
:
df1['id'] = df1.id.map(df2.set_index('id')['name'].to_dict())
print (df1)
id values
0 id1 12
1 id2 32
2 id3 42
3 id4 51
4 id5 23
5 id3 14
6 id5 111
7 id9 134
或 replace
,如果df2
中不存在值,则将df1
中的原始值设为:
Or replace
, if dont exist value in df2
let original values from df1
:
df1['id'] = df1.id.replace(df2.set_index('id')['name'])
print (df1)
id values
0 id1 12
1 id2 32
2 id3 42
3 id4 51
4 id5 23
5 id3 14
6 id5 111
7 id9 134
示例:
#Frame with data where I want to replace the 'id' with the name from df2
df1 = pd.DataFrame({'id' : [1, 2, 3, 4, 5, 3, 5, 9], 'values' : [12, 32, 42, 51, 23, 14, 111, 134]})
print (df1)
#Frame containing names linked to ids
df2 = pd.DataFrame({'id' : [1, 2, 3, 4, 6, 7, 8, 9, 10], 'name' : ['id1', 'id2', 'id3', 'id4', 'id6', 'id7', 'id8', 'id9', 'id10']})
print (df2)
df1['new_map'] = df1.id.map(df2.set_index('id')['name'].to_dict())
df1['new_replace'] = df1.id.replace(df2.set_index('id')['name'])
print (df1)
id values new_map new_replace
0 1 12 id1 id1
1 2 32 id2 id2
2 3 42 id3 id3
3 4 51 id4 id4
4 5 23 NaN 5
5 3 14 id3 id3
6 5 111 NaN 5
7 9 134 id9 id9
这篇关于从大型字典替换DataFrame中值的更好方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!