Python pandas merge with OR logic

Question

I've searched but haven't found an answer to this question: can you merge pandas DataFrames using OR logic? Basically, the equivalent of a SQL join using "where t1.A = t2.A OR t1.A = t2.B".

I have a situation where I am pulling information from one database into a dataframe (df1) and need to merge it with information from another database, which I pulled into another dataframe (df2), merging on a single column (col1). If the two sources always used the same value when they matched, this would be very straightforward. The problem is that sometimes they match and sometimes one of them uses a synonym. A third database has a table that provides a lookup between synonyms for this data entity (col1 and col1_alias), which could be pulled into a third dataframe (df3). What I am looking to do is merge the columns I need from df1 with the columns I need from df2.

As stated above, in cases where df1.col1 and df2.col1 match, this would work:

df = df1.merge(df2, on='col1', how='left')

However, they don't always have the same value and sometimes use the synonyms. I thought about filtering df3 to the rows where df3.col1 is in df1.col1 OR df3.col1_alias is in df1.col1, then creating a single list of values from df3.col1 and df3.col1_alias (list1) and selecting the rows of df2 where df2.col1 is in list1. This would give me the rows from df2 that I need, but it still wouldn't put me in a position to merge df1 and df2 with the appropriate rows matched up. I think that if there were an OR merge option I could step through this and make it work, but all of the following threw a syntax error:

df = df1.merge((df3, left_on='col1', right_on='col1', how='left')|(df3, left_on='col1', right_on='col1_alias', how='left'))

df = df1.merge(df3, (left_on='col1', right_on='col1')|(left_on='col1', right_on='col1_alias'), how='left')

df = df1.merge(df3, left_on='col1', right_on='col1'|right_on='col1_alias', how='left')

and several other variations. Any guidance on how to perform an OR merge, or suggestions for a completely different approach to merging df1 and df2 using the synonyms in the two columns of df3?

Recommended answer

I think I would do this as two merges:

In [10]: import pandas as pd

In [11]: df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["A", "B"])

In [12]: df2 = pd.DataFrame([[1, 7], [2, 8], [4, 9]], columns=["C", "D"])

In [13]: res = df.merge(df2, left_on="B", right_on="C", how="left")

In [14]: res.update(df.merge(df2, left_on="A", right_on="C", how="left"))

In [15]: res
Out[15]:
   A  B    C    D
0  1  2  1.0  7.0
1  3  4  4.0  9.0
2  5  6  NaN  NaN

As you can see, this picks A = 1 -> D = 7 rather than B = 2 -> D = 8, because `update` overwrites the B-based matches with any non-NaN values from the A-based merge.
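If both matches should be kept rather than letting one take priority, something closer to the SQL "OR" join can be sketched by running one inner merge per condition and stacking the results. This is only a sketch under assumptions: it reuses the example frames above, the names `on_a`, `on_b`, `matches` and `or_merged` are made up for illustration, and it assumes the rows of `df` are unique on (A, B) so the final left merge does not multiply rows.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["A", "B"])
df2 = pd.DataFrame([[1, 7], [2, 8], [4, 9]], columns=["C", "D"])

# One inner merge per side of the OR condition.
on_a = df.merge(df2, left_on="A", right_on="C")  # A == C
on_b = df.merge(df2, left_on="B", right_on="C")  # B == C

# Stack the matches; a pair that satisfied both conditions appears only once.
matches = pd.concat([on_a, on_b], ignore_index=True).drop_duplicates()

# Left-merge back onto df so unmatched rows survive with NaN
# (assumes rows of df are unique on A and B).
or_merged = df.merge(matches, on=["A", "B"], how="left")
print(or_merged)
```

Unlike the `update` approach, the row with A = 1, B = 2 appears twice here, once for each match in `df2`, which mirrors what a SQL join with an OR condition would return.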

Note: For more extensibility (matching different columns), it might make sense to pull out a single column, although they're both the same in this example:

In [21]: res = df.merge(df2, left_on="B", right_on="C", how="left")["C"]

In [22]: res.update(df.merge(df2, left_on="A", right_on="C", how="left")["C"])

In [23]: res
Out[23]:
0    1.0
1    4.0
2    NaN
Name: C, dtype: float64
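For the synonym lookup described in the question, a different approach that may be simpler is to normalize both key columns to one canonical name before merging, so a plain merge is enough. The sketch below is hypothetical throughout: the sample values, the extra columns `x` and `y`, and the helper names `alias_to_canonical` and `canonical` are invented for illustration, and it assumes each alias in df3 maps to exactly one canonical col1 value.

```python
import pandas as pd

# Hypothetical data; only col1 and col1_alias come from the question.
df1 = pd.DataFrame({"col1": ["apple", "car", "dog"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"col1": ["apple", "automobile", "cat"], "y": [10, 20, 30]})
df3 = pd.DataFrame({"col1": ["car"], "col1_alias": ["automobile"]})

# Map each alias to its canonical name (assumes one canonical name per alias).
alias_to_canonical = dict(zip(df3["col1_alias"], df3["col1"]))

def canonical(series):
    # Values without an entry in the lookup are left unchanged.
    return series.map(lambda v: alias_to_canonical.get(v, v))

left = df1.assign(key=canonical(df1["col1"]))
right = df2.assign(key=canonical(df2["col1"]))

# Merge on the normalized key, then drop the helper column.
merged = left.merge(right[["key", "y"]], on="key", how="left").drop(columns="key")
print(merged)
```

Here "car" in df1 lines up with "automobile" in df2 through the lookup, while "dog" has no counterpart and gets NaN in `y`.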
