Python: pandas merge multiple dataframes

Question

I have different dataframes and need to merge them together based on the date column. If I only had two dataframes, I could use df1.merge(df2, on='date'); with three dataframes, I use df1.merge(df2.merge(df3, on='date'), on='date'). However, this becomes really complex and unreadable with more dataframes.

All dataframes have one column in common, date, but they don't have the same number of rows or columns, and I only need the rows whose date is common to every dataframe.

So I'm trying to write a recursive function that returns a dataframe with all the data, but it doesn't work. How should I merge multiple dataframes, then?

I tried different ways and got errors like out of range, KeyError 0/1/2/3 and can not merge DataFrame with instance of type <class 'NoneType'>.

Here is the script I wrote:

dfs = [df1, df2, df3] # list of dataframes

def mergefiles(dfs, countfiles, i=0):
    if i == (countfiles - 2): # it gets to the second to last and merges it with the last
        return

    dfm = dfs[i].merge(mergefiles(dfs[i+1], countfiles, i=i+1), on='date')
    return dfm

print(mergefiles(dfs, len(dfs)))
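For reference, the recursive function above most likely fails for two reasons. The base case executes a bare return, so it hands None back to the caller, which then tries to merge a DataFrame with None (the NoneType error). The recursive call also passes dfs[i+1], a single DataFrame, instead of the whole list, so in the next call dfs[i] looks up a column labelled with the integer i and raises KeyError. A minimal corrected sketch, assuming every dataframe has a 'date' column and that only dates common to all of them should be kept (the default inner merge); the countfiles parameter is dropped because len(dfs) gives the same information, and the value columns v1/v2/v3 are hypothetical:

```python
import pandas as pd


def mergefiles(dfs, i=0):
    """Recursively inner-merge dfs[i:] on the 'date' column (needs at least one frame)."""
    # Base case: the last dataframe has nothing left to merge with,
    # so return the dataframe itself instead of None.
    if i == len(dfs) - 1:
        return dfs[i]
    # Pass the whole list (not dfs[i+1]) down the recursion and merge the
    # current dataframe with the merged result of the remaining ones.
    return dfs[i].merge(mergefiles(dfs, i + 1), on='date')


# Hypothetical frames built from the question's sample dates and values.
df1 = pd.DataFrame({'date': ['May 19, 2017', 'May 18, 2017', 'May 17, 2017', 'May 15, 2017'],
                    'v1': [1200.0, 1100.0, 1000.0, 1901.0]})
df2 = pd.DataFrame({'date': ['May 20, 2017', 'May 18, 2017', 'May 16, 2017', 'May 15, 2017'],
                    'v2': [2200.0, 2100.0, 2000.0, 2902.0]})
df3 = pd.DataFrame({'date': ['May 21, 2017', 'May 17, 2017', 'May 16, 2017', 'May 15, 2017'],
                    'v3': [3200.0, 3100.0, 3000.0, 3903.0]})

merged = mergefiles([df1, df2, df3])
print(merged)  # a single row: May 15, 2017 with v1, v2 and v3
```

The recursion works, but it is just a hand-written fold over the list, which is what functools.reduce does for you.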

An example. df_1:

May 19, 2017;1,200.00;0.1%
May 18, 2017;1,100.00;0.1%
May 17, 2017;1,000.00;0.1%
May 15, 2017;1,901.00;0.1%

df_2:

May 20, 2017;2,200.00;1000000;0.2%
May 18, 2017;2,100.00;1590000;0.2%
May 16, 2017;2,000.00;1230000;0.2%
May 15, 2017;2,902.00;1000000;0.2%

df_3:

May 21, 2017;3,200.00;2000000;0.3%
May 17, 2017;3,100.00;2590000;0.3%
May 16, 2017;3,000.00;2230000;0.3%
May 15, 2017;3,903.00;2000000;0.3%

Expected merge result:

May 15, 2017;  1,901.00;0.1%;  2,902.00;1000000;0.2%;   3,903.00;2000000;0.3%   

Recommended answer

Below is the cleanest and most comprehensible way to merge multiple dataframes when no complex queries are involved.

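Simply merge on the DATE column with how='outer' to keep every date from every file (all the data). If, as in the question, you only want the dates present in every dataframe, use how='inner' instead.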

import pandas as pd
from functools import reduce

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df3 = pd.read_csv('file3.csv')
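The question's sample files differ from what this answer assumes: they are semicolon-separated, have no header row, and use commas as thousands separators. A sketch of loading one of them with read_csv, assuming hypothetical column names (date, price1, change1) and using an in-memory string in place of a real file:

```python
import io

import pandas as pd

# First sample file from the question, inlined so the sketch is self-contained.
SAMPLE_DF1 = """May 19, 2017;1,200.00;0.1%
May 18, 2017;1,100.00;0.1%
May 17, 2017;1,000.00;0.1%
May 15, 2017;1,901.00;0.1%
"""


def load(text, names):
    # sep=';' splits the fields, header=None because there is no header row,
    # thousands=',' turns "1,200.00" into the float 1200.0.
    df = pd.read_csv(io.StringIO(text), sep=';', header=None,
                     names=names, thousands=',')
    # Parse "May 19, 2017" into real timestamps so dates compare reliably.
    df['date'] = pd.to_datetime(df['date'], format='%B %d, %Y')
    return df


# Hypothetical column names; the question's files show none.
df1 = load(SAMPLE_DF1, ['date', 'price1', 'change1'])
print(df1)
```

Giving each file's value columns distinct names avoids the _x/_y suffixes that merge adds to overlapping column names; the key column must then be called DATE (or the on= argument changed to 'date') for the merge below.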

Next, collect all the files you loaded as dataframes into a list, then merge them by applying merge repeatedly with the reduce function.

# compile the list of dataframes you want to merge
data_frames = [df1, df2, df3]

Note: you can add as many dataframes to the list above as you need. That is the advantage of this method: no complex queries are involved.
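Because the list is the only thing that changes, it can also be built programmatically instead of naming each dataframe by hand. A sketch that reads every CSV file in a folder into the list; the folder, the file pattern and the helper name load_all are hypothetical, and a temporary directory with three small files stands in for real data:

```python
import glob
import os
import tempfile

import pandas as pd


def load_all(folder, pattern='*.csv'):
    """Read every file matching pattern in folder into a list of dataframes."""
    # sorted() keeps the order (and therefore the merged column order) deterministic.
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    return [pd.read_csv(p) for p in paths]


with tempfile.TemporaryDirectory() as tmp:
    # Write three tiny example files: file1.csv, file2.csv, file3.csv.
    for n in range(1, 4):
        frame = pd.DataFrame({'DATE': ['2017-05-15', '2017-05-16'],
                              f'VALUE{n}': [n, n * 10]})
        frame.to_csv(os.path.join(tmp, f'file{n}.csv'), index=False)
    data_frames = load_all(tmp)

print([list(df.columns) for df in data_frames])
```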

To keep the values that belong to the same date on the same row, merge on the DATE column:

df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames)

# to fill the cells that have no value in the merged dataframe (dates missing from some inputs), pass a placeholder string to fillna

df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames).fillna('void')
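A small check of how the how argument changes the result, using the question's sample values with hypothetical column names DATE and VALUE1 to VALUE3 (dates written as ISO strings for brevity). An inner merge keeps only the date shared by all three frames (May 15), while an outer merge keeps every date and leaves NaN where a frame has no row. The last step shows an index-based alternative, pd.concat on DATE-indexed frames with join='inner', which gives the same rows as long as DATE is unique within each frame:

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'DATE': ['2017-05-19', '2017-05-18', '2017-05-17', '2017-05-15'],
                    'VALUE1': [1200.0, 1100.0, 1000.0, 1901.0]})
df2 = pd.DataFrame({'DATE': ['2017-05-20', '2017-05-18', '2017-05-16', '2017-05-15'],
                    'VALUE2': [2200.0, 2100.0, 2000.0, 2902.0]})
df3 = pd.DataFrame({'DATE': ['2017-05-21', '2017-05-17', '2017-05-16', '2017-05-15'],
                    'VALUE3': [3200.0, 3100.0, 3000.0, 3903.0]})
data_frames = [df1, df2, df3]


def merge_all(frames, how):
    # Fold the list pairwise: ((df1 + df2) + df3), joining on DATE each time.
    return reduce(lambda left, right: pd.merge(left, right, on=['DATE'], how=how),
                  frames)


inner = merge_all(data_frames, 'inner')  # only dates present in every frame
outer = merge_all(data_frames, 'outer')  # union of all dates, NaN where missing

# Index-based alternative: align on the DATE index and keep the intersection.
via_concat = pd.concat([d.set_index('DATE') for d in data_frames],
                       axis=1, join='inner')

print(inner)
print(outer)
```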

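  • Now the values for the same date are output on the same row.
  • You can use fillna() to fill in the data that is missing from different frames, column by column.
  • Then write the merged data to a CSV file if desired.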

    df_merged.to_csv('merged.txt', sep=',', na_rep='.', index=False)
    

    This should give you:

    DATE VALUE1 VALUE2 VALUE3 ....
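fillna also accepts a dict mapping column names to fill values, which is one way to use a different placeholder per column; na_rep then covers any cells that are still empty when the file is written. A sketch with a small hypothetical merged frame, writing to an in-memory buffer instead of merged.txt so it does not touch the disk:

```python
import io

import pandas as pd

# Hypothetical result of an outer merge: each row is missing one value.
df_merged = pd.DataFrame({'DATE': ['2017-05-15', '2017-05-16'],
                          'VALUE1': [1901.0, float('nan')],
                          'VALUE2': [float('nan'), 2000.0]})

# Fill only VALUE1 with 0; VALUE2 keeps its NaN.
filled = df_merged.fillna({'VALUE1': 0})

buf = io.StringIO()
# na_rep='.' writes the remaining NaN in VALUE2 as a dot.
filled.to_csv(buf, sep=',', na_rep='.', index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Filling with a numeric value keeps the column numeric, whereas fillna('void') mixes strings into numeric columns and turns them into object dtype.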
