Using df.merge to populate a new column in df gives strange matches
Question
I want to create a new column in my dataframe (df) based on another dataframe. Basically, df1 contains updated information that I want to plug into df. To replicate my real case (more than 1M rows), I will just populate two small dataframes with simple columns.
I use pandas.merge() to do this, but it gives me strange results.
Here is a typical example. Let's create df randomly and create df1 with a simple relationship: "New Type" = "Type" + 1. I chose this simple relationship so that the output is easy to check. In my real application the relationship is of course not this simple.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 1)),columns = ["Type"])
df.head()
Type
0 45
1 3
2 89
3 6
4 39
df1 = pd.DataFrame({"Type":range(1,100)})
df1["New Type"] = df1["Type"] + 1
print(df1.head())
Type New Type
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
Now let's say I want to update df "Type" based on the "New Type" column in df1:
df["Type2"] = df.merge(df1,on="Type")["New Type"]
print(df.head())
I get this strange output, which clearly shows that it does not work:
Type Type2
0 45 46.0
1 3 4.0
2 89 4.0
3 6 4.0
4 39 90.0
I would expect the output to look like this:
Type Type2
0 45 46.0
1 3 4.0
2 89 90.0
3 6 7.0
4 39 40.0
Only the first row is properly matched. Do you know what I've missed?
1. I need to merge with how="left"; otherwise the default, "inner", produces a table whose dimensions differ from those of df.
2. I also need to pass sort=False to merge; otherwise the merged result is sorted before being assigned to df. (In current pandas, sort=False is already the default for merge.)
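The left-merge fix described above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the small hand-written df is hypothetical, and it assumes the keys in df1 are unique so that the left merge returns exactly one row per row of df.

```python
import pandas as pd

# Hypothetical small frames mirroring the question's setup.
df = pd.DataFrame({"Type": [45, 3, 89, 3, 0, 6]})
df1 = pd.DataFrame({"Type": range(1, 100)})
df1["New Type"] = df1["Type"] + 1

# The default inner merge drops the unmatched row (Type == 0),
# so its result is shorter than df.
inner = df.merge(df1, on="Type")

# A left merge keeps every row of df in its original order;
# unmatched keys get NaN in "New Type".
left = df.merge(df1, on="Type", how="left")

# Assign by position so the merge's fresh 0..n-1 index cannot misalign
# with df's own index.
df["Type2"] = left["New Type"].to_numpy()
print(df)
```

Assigning through to_numpy() is a deliberate choice here: the merge result carries a new index, so assigning it as a Series would align on index labels rather than on row position.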
Recommended answer
One approach, using map, set_index, and squeeze:
df['Type2'] = df['Type'].map(df1.set_index('Type').squeeze())
Output:
Type Type2
0 22 23.0
1 56 57.0
2 63 64.0
3 33 34.0
4 25 26.0
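To make the role of each call clearer, here is a small sketch of the same idea with the lookup table built explicitly. The sample values in df are hypothetical, and the sketch assumes the "Type" values in df1 are unique, since map needs an unambiguous lookup.

```python
import pandas as pd

# Hypothetical small frames mirroring the question's setup.
df = pd.DataFrame({"Type": [22, 56, 63, 0]})
df1 = pd.DataFrame({"Type": range(1, 100)})
df1["New Type"] = df1["Type"] + 1

# set_index leaves a one-column DataFrame; squeeze() turns it into a Series
# indexed by "Type", which map can use as a lookup table.
lookup = df1.set_index("Type").squeeze()

# Selecting the column by name gives the same Series and still works
# if df1 later gains extra columns.
lookup_by_name = df1.set_index("Type")["New Type"]

# map looks up each value of df["Type"] in the lookup index, row by row,
# so the result is already aligned with df; missing keys become NaN.
df["Type2"] = df["Type"].map(lookup)
print(df)
```

Because map works value by value on df["Type"], the result keeps df's length and order, which is what the merge-based assignment failed to guarantee.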