Pandas - Vlookup - Duplicate values in search column
Problem description
I am trying to mimic a VLOOKUP (the Excel function) in pandas. With test data sets the merge function seems to work, but I have a question about the example here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html . If you look at the first example, it merges the two DataFrames and the output has more columns and more rows. I just want to return a new column, the way a VLOOKUP works. Nonetheless, when I try my code, even for the above, I get the ValueError listed with the code below.

Below is an open ticket that seems similar to my issue, but it has no resolution. https://github.com/pandas-dev/pandas/issues/20769

I apologize if this is vague. I can't upload the df and the Excel file because they are for work, and the test DataFrames I tried did not throw the same error.

At the end of the day I just want to do a VLOOKUP with pandas. The lookup values may be duplicated, and in that case whichever duplicate is hit first is the value that should be returned in the new column.

Below is an example df to help you imagine what I mean by duplicates in Cust_PO_Number.

Now to explore multi-level columns.
Try this (the df.insert line in the code below):
# Merge attempt that raises the ValueError below
agingdf = agingdf.merge(plannerdf, left_on='Cust_PO_Number', right_on='Cust_PO_Number')
ValueError: The column label 'Cust_PO_Number' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
import pandas as pd

a = {'Cust_PO_Number': ['A', 'B', 'C', 'C'], 'ColumnB': [1, 2, 3, 4]}
b = {'Cust_PO_Number': ['A', 'B', 'C', 'C'], 'Column_That_I_Want_added': [2, 3, 4, 5]}
df = pd.DataFrame(data=a)
df2 = pd.DataFrame(data=b)

# desired df
c = {'ColumnA': ['A', 'B', 'C', 'C'], 'ColumnB': [1, 2, 3, 4], 'MatchedColumn': [2, 3, 4, 5]}
desireddf = pd.DataFrame(data=c)
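To illustrate why a plain merge returns more rows than a VLOOKUP would, here is a minimal sketch that reuses the sample data above. It assumes the default inner merge; with the key 'C' appearing twice on each side, every left 'C' row pairs with every right 'C' row.

```python
import pandas as pd

df = pd.DataFrame({'Cust_PO_Number': ['A', 'B', 'C', 'C'], 'ColumnB': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Cust_PO_Number': ['A', 'B', 'C', 'C'],
                    'Column_That_I_Want_added': [2, 3, 4, 5]})

# Default (inner) merge: A and B match once each, C matches 2 x 2 = 4 times.
merged = df.merge(df2, on='Cust_PO_Number')
print(len(df), len(merged))  # 4 6
```

This is why the duplicates in the lookup table need to be removed before the lookup if the row count has to stay the same.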
print(plannerdf.columns)
MultiIndex(levels=[['Cust_PO_Number', 'Department']],
labels=[[0, 1]])
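The output above suggests that plannerdf's columns are a MultiIndex with a single level, which is what the ValueError complains about. One possible workaround, sketched here under that assumption, is to flatten the columns to a plain Index before merging. The frames below are hypothetical stand-ins, since the real agingdf and plannerdf were not shared, and the 'Amount' column and the department values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the real agingdf and plannerdf.
agingdf = pd.DataFrame({'Cust_PO_Number': ['A', 'B', 'C'], 'Amount': [10, 20, 30]})
columns = pd.MultiIndex.from_arrays([['Cust_PO_Number', 'Department']])
plannerdf = pd.DataFrame([['A', 'Sales'], ['B', 'Ops'], ['C', 'IT']], columns=columns)

# Replace the one-level MultiIndex with a plain Index of strings.
if isinstance(plannerdf.columns, pd.MultiIndex):
    plannerdf.columns = plannerdf.columns.get_level_values(0)

# With ordinary column labels, merging on a single label works as usual.
result = agingdf.merge(plannerdf, on='Cust_PO_Number', how='left')
print(result)
```

Whether this matches the real data depends on how plannerdf was built, so treat it as something to try rather than a confirmed fix.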
df.insert(2, 'Column_That_I_Want_added', df['Cust_PO_Number'].map(df2.drop_duplicates('Cust_PO_Number').set_index('Cust_PO_Number')['Column_That_I_Want_added']))
Here, df is the original dataframe as well as the desired dataframe, and df2 is where you look up the data from.
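The same idea can be written step by step, as a minimal sketch on the sample data from the question. The name lookup is only an illustrative variable, and the new column is assigned directly rather than inserted at position 2.

```python
import pandas as pd

df = pd.DataFrame({'Cust_PO_Number': ['A', 'B', 'C', 'C'], 'ColumnB': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Cust_PO_Number': ['A', 'B', 'C', 'C'],
                    'Column_That_I_Want_added': [2, 3, 4, 5]})

# Keep only the first row per key, then build a key -> value Series.
lookup = (df2.drop_duplicates(subset='Cust_PO_Number')
             .set_index('Cust_PO_Number')['Column_That_I_Want_added'])

# map() looks up each key and never changes the number of rows in df.
df['Column_That_I_Want_added'] = df['Cust_PO_Number'].map(lookup)
print(df['Column_That_I_Want_added'].tolist())  # [2, 3, 4, 4]
```

Both 'C' rows receive 4 because only the first 'C' entry in df2 survives drop_duplicates, which follows the "first hit wins" rule described in the question rather than the [2, 3, 4, 5] shown in the desired df. Keys that are missing from df2 would come back as NaN.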