Conditional slicing based on the values of 2 columns
Question
I have a DataFrame df1 that looks like:
Out[43]:
city1 city2
0 YYZ SFO
1 YYZ EWR
2 YYZ DFW
3 YYZ LAX
4 YYZ YYC
I have another DataFrame, df2, that I want to slice based on df1, i.e. city1 and city2 in df2 have to correspond to the same city1 and city2 pair in df1.
I only want the rows in df2 whose city1 and city2 values exactly match a (city1, city2) pair in df1.
Is merging/joining the DataFrames (as a left join on df1) the only clean way to do this? I don't want to create another column that concatenates city1 and city2. That would work, but there must be a simple way built into pandas that doesn't require manipulating my data.
Update:
df2 has more than just 2 columns; it has 20 in total. For simplicity I only mentioned city1 and city2.
In any case, I want to return all rows in df2 (the DataFrame with 20 columns) where the (city1, city2) pair matches a pair present in df1.
Recommended answer
Setup
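import pandas as pd

# df1 is the frame shown in the question; df2 is the frame to filter.
df2 = pd.DataFrame([
    ['YYZ', 'SFO', 1],
    ['YYZ', 'YYD', 1]
], columns=['city1', 'city2', 'val'])
# Columns that must match as a pair
cols = ['city1', 'city2']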
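The setup does not define df1, which the options below rely on. A minimal sketch that rebuilds it from the table printed in the question (assuming df1 has exactly those two columns and five rows) could look like this:

```python
import pandas as pd

# Rebuilt from the table printed in the question; assumed to be the full frame.
df1 = pd.DataFrame({
    'city1': ['YYZ'] * 5,
    'city2': ['SFO', 'EWR', 'DFW', 'LAX', 'YYC'],
})
print(df1)
```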
Option 1: numpy broadcasting
multi_isin_cond = (df2[cols].values[:, None] == df1[cols].values).all(-1).any(-1)
df2.loc[multi_isin_cond]
city1 city2 val
0 YYZ SFO 1
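To see why the one-liner works, it may help to split it into steps and look at the array shapes. The sketch below uses the question's data (df1 rebuilt from the printed table, which is an assumption) and intermediate names (a, b, eq, row_match, mask) that are introduced only for illustration. It uses to_numpy(), which should behave like .values here.

```python
import pandas as pd

df1 = pd.DataFrame({'city1': ['YYZ'] * 5,
                    'city2': ['SFO', 'EWR', 'DFW', 'LAX', 'YYC']})
df2 = pd.DataFrame([['YYZ', 'SFO', 1], ['YYZ', 'YYD', 1]],
                   columns=['city1', 'city2', 'val'])
cols = ['city1', 'city2']

a = df2[cols].to_numpy()[:, None]  # shape (2, 1, 2): one extra axis per df2 row
b = df1[cols].to_numpy()           # shape (5, 2)
eq = a == b                        # broadcasts to (2, 5, 2): every df2 row vs every df1 row
row_match = eq.all(-1)             # (2, 5): True where both columns agree
mask = row_match.any(-1)           # (2,): True where a df2 row matches any df1 row
result = df2.loc[mask]

print(eq.shape, row_match.shape, mask)
print(result)
```

Note that the intermediate comparison array has len(df2) * len(df1) * len(cols) elements, so this approach is likely to become memory-heavy when both frames are large.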
Option 2: pandas merge
df2.merge(df1, on=cols)
city1 city2 val
0 YYZ SFO 1
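One caveat with the merge approach: if df1 contains the same (city1, city2) pair more than once, an inner merge repeats the matching df2 rows. The sketch below illustrates this with a hypothetical duplicated row (df1_dup is not in the question's data) and one possible guard, merging against de-duplicated key columns only.

```python
import pandas as pd

df1 = pd.DataFrame({'city1': ['YYZ'] * 5,
                    'city2': ['SFO', 'EWR', 'DFW', 'LAX', 'YYC']})
df2 = pd.DataFrame([['YYZ', 'SFO', 1], ['YYZ', 'YYD', 1]],
                   columns=['city1', 'city2', 'val'])
cols = ['city1', 'city2']

# Hypothetical: df1 with the (YYZ, SFO) pair appearing twice.
df1_dup = pd.concat([df1, df1.iloc[[0]]], ignore_index=True)

naive = df2.merge(df1_dup, on=cols)            # matching df2 row appears twice
keys = df1_dup[cols].drop_duplicates()         # unique key pairs only
safe = df2.merge(keys, on=cols, how='inner')   # each df2 row kept at most once

print(len(naive), len(safe))
```

Restricting the right-hand side to the key columns also avoids pulling extra df1 columns into the result. The merged frame gets a fresh index, so df2's original row labels are not preserved.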
Option 3: I don't know what to call it, and I don't recommend it.
idx = pd.MultiIndex.from_arrays(df1.values.T, names=df1.columns)
df2[df2[cols].apply(tuple, 1).isin(idx)]
city1 city2 val
0 YYZ SFO 1
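A related variant, not part of the original answer, builds a MultiIndex from the key columns of both frames and uses MultiIndex.isin, which avoids the row-wise apply(tuple, 1). This is a sketch under the assumption that df1 is the frame from the question; keys1, keys2, and mask are names introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'city1': ['YYZ'] * 5,
                    'city2': ['SFO', 'EWR', 'DFW', 'LAX', 'YYC']})
df2 = pd.DataFrame([['YYZ', 'SFO', 1], ['YYZ', 'YYD', 1]],
                   columns=['city1', 'city2', 'val'])
cols = ['city1', 'city2']

keys1 = pd.MultiIndex.from_frame(df1[cols])  # (city1, city2) pairs allowed by df1
keys2 = pd.MultiIndex.from_frame(df2[cols])  # (city1, city2) pair of each df2 row
mask = keys2.isin(keys1)                     # boolean array, one entry per df2 row
result = df2[mask]

print(result)
```

Unlike the merge, this keeps df2's original index and all of its columns, and duplicated pairs in df1 do not duplicate rows in the result.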