How to achieve an inner join in pandas
Question
I need to do what is effectively an inner join, implemented in Python.
I have 2 data sets which come from separate sources but share a common key.
Let's say (for the sake of argument) that they look like this:
person_likes = [{'person_id': '1', 'food': 'ice_cream', 'pastimes': 'swimming'},
                {'person_id': '2', 'food': 'paella', 'pastimes': 'banjo'}]
person_accounts = [{'person_id': '1', 'blogs': ['swimming digest', 'cooking puddings']},
                   {'person_id': '2', 'blogs': ['learn flamenca']}]
How can I best join these two sets of data? I have something like this:
joins = []
for like in person_likes:
    for acc in person_accounts:
        if like['person_id'] == acc['person_id']:
            join = {}
            join.update(like)
            join.update(acc)
            joins.append(join)
print(joins)
This appears to work fine (I haven't tested it extensively), and at first glance it looks like the best we can do. But I wonder whether there is a known algorithm that is more performant, and whether there is a more idiomatic or Pythonic way of doing this.
Recommended answer
Pandas seems like an obvious answer here.
import pandas as pd
accounts = pd.DataFrame(person_accounts)
likes = pd.DataFrame(person_likes)
pd.merge(accounts, likes, on='person_id')
#                                  blogs person_id       food  pastimes
# 0  [swimming digest, cooking puddings]         1  ice_cream  swimming
# 1                     [learn flamenca]         2     paella     banjo
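The question also asks whether a known algorithm beats the nested loop. One common option is a hash join: index one list by the key, then probe that index once per row of the other list. That takes roughly O(n + m) time on average, against O(n * m) for the nested loop. The following is a minimal sketch rather than a definitive implementation. It assumes the rows are plain dicts with hashable key values, as in the question, and `inner_join` is a hypothetical helper name, not part of pandas or the standard library.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Hash join of two lists of dicts on `key` (hypothetical helper)."""
    # Build phase: group the right-hand rows by their key value.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    # Probe phase: pair each left row with every right row sharing its key.
    # Left rows with no match are dropped, which makes this an inner join.
    # On a column-name clash the right-hand value wins, as with dict.update.
    return [{**l_row, **r_row}
            for l_row in left
            for r_row in index.get(l_row[key], [])]


person_likes = [{'person_id': '1', 'food': 'ice_cream', 'pastimes': 'swimming'},
                {'person_id': '2', 'food': 'paella', 'pastimes': 'banjo'}]
person_accounts = [{'person_id': '1', 'blogs': ['swimming digest', 'cooking puddings']},
                   {'person_id': '2', 'blogs': ['learn flamenca']}]

joins = inner_join(person_likes, person_accounts, 'person_id')
print(joins)
```

For these two small lists the result should match the nested-loop version in the question. The difference only starts to matter as the inputs grow.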