Pandas - Vlookup - Duplicate values in search column


Problem description


I am trying to mimic a VLOOKUP (the Excel function) in pandas. Using test data sets, the merge function seems to work, but I have a question about the example at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html . The first example there merges the two DataFrames, and the output has more columns and more ROWS. I just want to return a new column, the way a VLOOKUP works. Nonetheless, when I try my code, even for the above, I get this error:

agingdf = agingdf.merge(plannerdf, left_on ='Cust_PO_Number', right_on='Cust_PO_Number')

ValueError: The column label 'Cust_PO_Number' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.

Below is an open ticket that seems similar to my issue, but it has no resolution.

https://github.com/pandas-dev/pandas/issues/20769

I apologize if this is vague. I can't upload the df and Excel file because they are for work, and the test DFs I tried did not throw the same error.

At the end of the day I just want to do a VLOOKUP with pandas. The lookup values may be duplicated, and in that case whichever duplicate is hit first is the value that should be returned in the new column.

Below is an example df to show what I mean by duplicates in Cust_PO_Number:

import pandas as pd

a = {'Cust_PO_Number': ['A', 'B', 'C', 'C'], 'ColumnB': [1, 2, 3, 4]}
b = {'Cust_PO_Number': ['A', 'B', 'C', 'C'], 'Column_That_I_Want_added': [2, 3, 4, 5]}
df = pd.DataFrame(data=a)
df2 = pd.DataFrame(data=b)

# desired df
c = {'ColumnA': ['A', 'B', 'C', 'C'], 'ColumnB': [1, 2, 3, 4], 'MatchedColumn': [2, 3, 4, 5]}

desireddf = pd.DataFrame(data=c)
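
For reference, the extra rows seen with merge can be reproduced on this example data. The sketch below is only an illustration, not part of the original question: it assumes the duplicates on the lookup side can be reduced to their first occurrence, and the names plain, first_hits and vlookup_like are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Cust_PO_Number': ['A', 'B', 'C', 'C'],
                   'ColumnB': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Cust_PO_Number': ['A', 'B', 'C', 'C'],
                    'Column_That_I_Want_added': [2, 3, 4, 5]})

# A plain merge pairs every 'C' on the left with every 'C' on the right,
# so the two 'C' rows become four and the result has 6 rows.
plain = df.merge(df2, on='Cust_PO_Number')
print(len(plain))

# Keeping only the first row per key on the lookup side, combined with
# how='left', keeps one output row per row of df.
first_hits = df2.drop_duplicates('Cust_PO_Number')
vlookup_like = df.merge(first_hits, on='Cust_PO_Number', how='left')
print(vlookup_like)
```

With a first-hit rule both 'C' rows receive 4, so the looked-up column comes out as 2, 3, 4, 4.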

Now to explore the multi-level columns:

print(plannerdf.columns)
MultiIndex(levels=[['Cust_PO_Number', 'Department']],
           labels=[[0, 1]])
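
The printed columns are a MultiIndex with a single level, which may be why merge asks for tuple labels. Whether that is the actual cause of the error cannot be confirmed without the real data. As a rough sketch, a one-level MultiIndex can be flattened into ordinary column labels as shown below; the plannerdf here is a hypothetical stand-in with invented values.

```python
import pandas as pd

# Hypothetical stand-in for plannerdf: ordinary data whose columns are then
# wrapped in a one-level MultiIndex to imitate the printed output.
plannerdf = pd.DataFrame({'Cust_PO_Number': ['A', 'B'],
                          'Department': ['X', 'Y']})
plannerdf.columns = pd.MultiIndex.from_arrays([['Cust_PO_Number', 'Department']])
print(type(plannerdf.columns).__name__)

# Replace the MultiIndex with a flat Index built from its only level.
plannerdf.columns = plannerdf.columns.get_level_values(0)
print(list(plannerdf.columns))
```

After flattening, 'Cust_PO_Number' is a plain string label again and can be used directly as a merge or lookup key.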

Solution

Try this:

# Keep the first row per key, then use the result as a key -> value lookup
lookup = df2.drop_duplicates('Cust_PO_Number').set_index('Cust_PO_Number')['Column_That_I_Want_added']
df.insert(2, 'Column_That_I_Want_added', df['Cust_PO_Number'].map(lookup))

Here df is the original DataFrame, which is modified in place and becomes the desired DataFrame, and df2 is the DataFrame you look the values up from.
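
To see what the answer above produces, here is a self-contained run on the example data from the question. It is a sketch under the assumption that "first hit wins" is the intended rule; the variable names lookup and probe and the extra key 'Z' are made up to show what happens when a key has no match.

```python
import pandas as pd

df = pd.DataFrame({'Cust_PO_Number': ['A', 'B', 'C', 'C'],
                   'ColumnB': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Cust_PO_Number': ['A', 'B', 'C', 'C'],
                    'Column_That_I_Want_added': [2, 3, 4, 5]})

# One value per key: drop_duplicates keeps the first occurrence by default.
lookup = (df2.drop_duplicates('Cust_PO_Number')
             .set_index('Cust_PO_Number')['Column_That_I_Want_added'])

# map replaces each key in df with its looked-up value; insert puts the
# result at column position 2 and modifies df in place.
df.insert(2, 'Column_That_I_Want_added', df['Cust_PO_Number'].map(lookup))
print(df)

# A key that is missing from the lookup table maps to NaN.
probe = pd.Series(['A', 'Z']).map(lookup)
print(probe)
```

Both 'C' rows get 4, the value of the first 'C' in df2, and the row count of df does not change.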
