pandas three-way joining multiple dataframes on columns

Question

I have 3 CSV files. In each one, the first column holds the (string) names of people, and all the other columns are attributes of that person.

How can I "join" all three CSV documents together to create a single CSV in which each row has all the attributes for each unique value of the person's string name?

The join() function in pandas specifies that I need a multiindex, but I don't understand what a hierarchical indexing scheme has to do with making a join based on a single index.

Recommended answer

Assumed imports:

import pandas as pd

John Galt's answer is basically a reduce operation. If I have more than a handful of dataframes, I'd put them in a list like this (generated via list comprehensions or loops or whatnot):

dfs = [df0, df1, df2, dfN]

Assuming they have some common column, like name in your example, I'd do the following:

df_final = reduce(lambda left,right: pd.merge(left,right,on='name'), dfs)

That way, your code should work with whatever number of dataframes you want to merge.
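To make the reduce step concrete, here is a minimal sketch on three small frames. The sample data and the column names other than name (age, city, team) are made up for illustration; only the reduce/pd.merge line comes from the answer above.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data: every frame shares the "name" column.
df0 = pd.DataFrame({"name": ["alice", "bob", "carol"], "age": [30, 25, 41]})
df1 = pd.DataFrame({"name": ["alice", "bob", "carol"], "city": ["Oslo", "Lima", "Pune"]})
df2 = pd.DataFrame({"name": ["alice", "bob", "carol"], "team": ["red", "blue", "red"]})

dfs = [df0, df1, df2]

# reduce merges the first two frames, then merges that result with the
# third, and so on for however many frames are in the list.
df_final = reduce(lambda left, right: pd.merge(left, right, on="name"), dfs)

print(df_final)
```

Each merge adds the next frame's attribute columns to the running result, so df_final ends up with one row per name and the columns name, age, city, and team.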

Edit August 1, 2016: For those using Python 3: reduce has been moved into functools. So to use this function, you'll first need to import it from that module:

from functools import reduce
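One thing to watch: pd.merge defaults to an inner join, so a name that is missing from any one frame is dropped from the result. If the goal is a row for every unique name, as the question seems to ask, passing how="outer" may be closer to what is wanted. The sketch below assumes that reading; the helper merge_all and the sample data are hypothetical, not part of the original answer.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data: no single name appears in all three frames.
df0 = pd.DataFrame({"name": ["alice", "bob"], "age": [30, 25]})
df1 = pd.DataFrame({"name": ["bob", "carol"], "city": ["Lima", "Pune"]})
df2 = pd.DataFrame({"name": ["alice", "carol"], "team": ["red", "blue"]})


def merge_all(frames, key="name", how="outer"):
    """Merge a list of dataframes pairwise on a shared key column."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=key, how=how),
        frames,
    )


# Outer join: keeps every name and fills missing attributes with NaN.
merged = merge_all([df0, df1, df2])

# Inner join (the pd.merge default): keeps only names present in all frames.
inner = merge_all([df0, df1, df2], how="inner")

print(merged)
print(len(inner))
```

From there, merged.to_csv("merged.csv", index=False) would write the combined table back out as a single CSV; the file name is just an example.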
