pandas three-way joining multiple dataframes on columns
Problem description
I have 3 CSV files. Each has the first column as the (string) names of people, while all the other columns in each dataframe are attributes of that person.
How can I "join" together all three CSV documents to create a single CSV with each row having all the attributes for each unique value of the person's string name?
The join() function in pandas specifies that I need a multiindex, but I'm confused about what a hierarchical indexing scheme has to do with making a join based on a single index.
Assumed imports:
import pandas as pd
John Galt's answer is basically a reduce operation. If I have more than a handful of dataframes, I'd put them in a list like this (generated via list comprehensions or loops or whatnot):
dfs = [df0, df1, df2, dfN]
Assuming they have some common column, like name in your example, I'd do the following:
df_final = reduce(lambda left,right: pd.merge(left,right,on='name'), dfs)
That way, your code should work with whatever number of dataframes you want to merge.
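As a rough illustration of the reduce approach above, here is a minimal sketch using three small made-up dataframes in place of the CSV files (the column names age, city and team and all the values are hypothetical). Note that pd.merge defaults to an inner join, so names missing from any one frame are dropped; passing how="outer" is one way to keep every unique name, which may be closer to what the question asks for.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data standing in for the three CSV files.
df0 = pd.DataFrame({"name": ["alice", "bob", "carol"], "age": [30, 25, 41]})
df1 = pd.DataFrame({"name": ["alice", "bob"], "city": ["Paris", "Oslo"]})
df2 = pd.DataFrame({"name": ["bob", "carol"], "team": ["red", "blue"]})
dfs = [df0, df1, df2]

# Default (inner) merge: keeps only names present in every frame.
inner = reduce(lambda left, right: pd.merge(left, right, on="name"), dfs)

# Outer merge: keeps every name seen in any frame; missing attributes become NaN.
outer = reduce(
    lambda left, right: pd.merge(left, right, on="name", how="outer"), dfs
)

print(inner)
print(outer)
```

With this sample data, only bob appears in all three frames, so the inner result has a single row, while the outer result has one row per unique name.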
Edit August 1, 2016: For those using Python 3: reduce has been moved into functools. So to use this function, you'll first need to import that module:
from functools import reduce
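On the join() question itself: as far as I can tell, a multiindex is not needed for this case. A possible alternative, sketched here under the assumption that each name appears at most once per frame, is to make name the index of every dataframe and pass the remaining frames to DataFrame.join as a list. The sample data is hypothetical, and the attribute column names must not overlap between frames.

```python
import pandas as pd

# Hypothetical sample data; in practice these would come from pd.read_csv.
df0 = pd.DataFrame({"name": ["alice", "bob", "carol"], "age": [30, 25, 41]})
df1 = pd.DataFrame({"name": ["alice", "bob"], "city": ["Paris", "Oslo"]})
df2 = pd.DataFrame({"name": ["bob", "carol"], "team": ["red", "blue"]})

# Use the shared 'name' column as a plain single-level index on each frame.
indexed = [df.set_index("name") for df in (df0, df1, df2)]

# join() accepts a list of frames and aligns them on the index.
# how="outer" keeps every name that appears in any frame.
joined = indexed[0].join(indexed[1:], how="outer")

# Turn the index back into an ordinary 'name' column.
result = joined.rename_axis("name").reset_index()
print(result)
```

From there, result.to_csv(...) with index=False would write the combined table back out as a single CSV.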