pandas - merging multiple DataFrames

Problem Description

This is a multi-part question. I just can't seem to combine everything together. The goal is to create one DataFrame (I'm guessing using a MultiIndex) that I can access as follows:

ticker = 'GOLD'
date = pd.to_datetime('1978/03/31')
current_bar = df.loc[ticker].loc[date]  # .ix has been removed from pandas; .loc is the label-based equivalent

Can I then just say current_bar.Last?

Anyway, here are the files, and how I load them.

In [108]: df = pd.read_csv('GOLD.csv', parse_dates=['Date'], index_col='Date')
In [109]: df
Out[109]: 
            Exp       Last     Volume
Date
1978-03-30  198002    995.6    54
1978-03-31  198002    999.5    78

In [110]: df2 = pd.read_csv('SPX.csv', parse_dates=['Date'], index_col='Date')
In [111]: df2
Out[111]: 
            Exp       Last     Volume
Date
1978-03-30  198003    215.5    25
1978-03-31  198003    214.1    99

Ideally, I want it to look like this (I think):

ticker      GOLD                            SPX
values      Exp       Last     Volume       Exp       Last     Volume
Date
1978-03-30  198002    995.6    54           198003    215.5    25
1978-03-31  198002    999.5    78           198003    214.1    99

  1. I guess my questions are:
    • How do I make this hierarchical (the actual data has 20+ identical columns for each file)?
    • How do I then combine the files (I have about 100 that all need to go into one DataFrame)?
    • Is my assumption correct that I can then just use current_bar.Last to get values?

Many thanks.

Recommended Answer

You can use pd.concat to concatenate DataFrames. (Concatenation stacks DataFrames along an axis, whereas merging joins DataFrames on common index values or columns.) When you supply the keys parameter, you get a hierarchical index:

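import pandas as pd
# sep=r'\s+' reads whitespace-separated text like the tables pasted above; omit it for real comma-separated files
df = pd.read_csv('GOLD.csv', parse_dates=['Date'], index_col='Date', sep=r'\s+')
df2 = pd.read_csv('SPX.csv', parse_dates=['Date'], index_col='Date', sep=r'\s+')
# keys= adds an outer 'ticker' level to the row index; unstack('ticker') moves that level into the columns
result = pd.concat([df, df2], keys=['GOLD', 'SPX'], names=['ticker']).unstack('ticker')
# put ticker on the outer column level and group each ticker's columns together (sortlevel no longer exists)
result = result.reorder_levels([1, 0], axis=1).sort_index(axis=1, level=0)
print(result)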

yields

ticker        GOLD                    SPX               
               Exp   Last  Volume     Exp   Last  Volume
Date                                                    
1978-03-30  198002  995.6      54  198003  215.5      25
1978-03-31  198002  999.5      78  198003  214.1      99
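To make the intermediate steps of that one-liner visible, here is a minimal, self-contained sketch. It assumes the two files contain exactly the rows shown in the question, so the frames are built inline instead of being read from GOLD.csv and SPX.csv; the variable names gold, spx, stacked and wide are illustrative only.

```python
import pandas as pd

# Inline stand-ins for GOLD.csv and SPX.csv (values copied from the question).
dates = pd.DatetimeIndex(['1978-03-30', '1978-03-31'], name='Date')
gold = pd.DataFrame({'Exp': [198002, 198002], 'Last': [995.6, 999.5], 'Volume': [54, 78]}, index=dates)
spx = pd.DataFrame({'Exp': [198003, 198003], 'Last': [215.5, 214.1], 'Volume': [25, 99]}, index=dates)

# Step 1: stack the frames vertically; keys= adds an outer row level named 'ticker'.
stacked = pd.concat([gold, spx], keys=['GOLD', 'SPX'], names=['ticker'])
print(list(stacked.index.names))  # ['ticker', 'Date']

# Step 2: unstack moves the 'ticker' level from the rows into the columns,
# giving (field, ticker) column pairs such as ('Last', 'GOLD').
wide = stacked.unstack('ticker')

# Step 3: swap the column levels to (ticker, field) and sort so that each
# ticker's columns sit next to each other.
wide = wide.reorder_levels([1, 0], axis=1).sort_index(axis=1, level=0)
print(wide)
```

The row-then-unstack route works because both frames share the same column names; the keys become the ticker labels regardless of how many files are involved.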

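On the frame returned by unstack (before the reorder_levels step, while the field names are still the outer column level), result['Last'] yields the DataFrame: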

In [147]: result['Last']
Out[147]: 
ticker       GOLD    SPX
Date                    
1978-03-30  995.6  215.5
1978-03-31  999.5  214.1
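Once the columns have been reordered to (ticker, field), 'Last' lives on the inner column level, so a plain result['Last'] looks for it among the tickers and raises a KeyError. A minimal sketch of selecting one field across all tickers in that layout, using DataFrame.xs and pd.IndexSlice (standard pandas calls), with the question's two rows rebuilt inline as an assumption:

```python
import pandas as pd

# Rebuild the answer's (ticker, field) layout from inline data.
dates = pd.DatetimeIndex(['1978-03-30', '1978-03-31'], name='Date')
gold = pd.DataFrame({'Exp': [198002, 198002], 'Last': [995.6, 999.5], 'Volume': [54, 78]}, index=dates)
spx = pd.DataFrame({'Exp': [198003, 198003], 'Last': [215.5, 214.1], 'Volume': [25, 99]}, index=dates)
result = pd.concat([gold, spx], keys=['GOLD', 'SPX'], names=['ticker']).unstack('ticker')
result = result.reorder_levels([1, 0], axis=1).sort_index(axis=1, level=0)

# 'Last' is on column level 1, so take a cross-section of that level.
# xs drops the selected level, leaving one column per ticker.
last = result.xs('Last', axis=1, level=1)
print(last)

# pd.IndexSlice selects the same data but keeps both column levels.
last_both = result.loc[:, pd.IndexSlice[:, 'Last']]
print(last_both.columns.tolist())
```

xs is the shorter choice when you want a flat ticker-by-date table like the one above; the IndexSlice form is handy when you need several fields at once, e.g. pd.IndexSlice[:, ['Last', 'Volume']].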

I'd recommend avoiding the syntax result.Last because it is too easily confused with result.last, which is a DataFrame method rather than a column.
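For the question's third bullet, here is a sketch of getting a single "bar" out of the (ticker, field) layout, assuming the inline data from the question. Selecting one date and one ticker returns a Series indexed by the field names, so both current_bar['Last'] and current_bar.Last work; the bracket form is the safer habit for the reason given above.

```python
import pandas as pd

# Same (ticker, field) layout as the answer, built from the question's rows.
dates = pd.DatetimeIndex(['1978-03-30', '1978-03-31'], name='Date')
gold = pd.DataFrame({'Exp': [198002, 198002], 'Last': [995.6, 999.5], 'Volume': [54, 78]}, index=dates)
spx = pd.DataFrame({'Exp': [198003, 198003], 'Last': [215.5, 214.1], 'Volume': [25, 99]}, index=dates)
result = pd.concat([gold, spx], keys=['GOLD', 'SPX'], names=['ticker']).unstack('ticker')
result = result.reorder_levels([1, 0], axis=1).sort_index(axis=1, level=0)

ticker = 'GOLD'
date = pd.Timestamp('1978-03-31')

# Row label plus the outer column label -> Series indexed by Exp, Last, Volume.
current_bar = result.loc[date, ticker]
print(current_bar['Last'])  # bracket access: unambiguous
print(current_bar.Last)     # attribute access also works on this Series

# A single value can also be read in one step with a (ticker, field) tuple.
print(result.loc[date, (ticker, 'Last')])  # 999.5
```

Note that the mixed int and float fields are upcast to float64 in the row Series, so current_bar['Exp'] comes back as 198002.0.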

To handle more files, you might use code like this:

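import os
import pandas as pd

filenames = ['GOLD.csv', 'SPX.csv']  # hypothetical: your ~100 CSV paths, e.g. sorted(glob.glob('*.csv'))
dfs = list()
for filename in filenames:
    df = pd.read_csv(filename, parse_dates=['Date'], index_col='Date')
    # any per-file processing (e.g. computing a moving mean) goes here
    dfs.append(df)

# use the bare file name as the ticker, e.g. 'data/GOLD.csv' -> 'GOLD'
keys = [os.path.splitext(os.path.basename(filename))[0] for filename in filenames]
result = pd.concat(dfs, keys=keys, names=['ticker']).unstack('ticker')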

Note that this requires enough memory to hold the list of all the DataFrames plus the combined result.
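A variation worth knowing (not the approach used above): pd.concat also accepts a dict, and with axis=1 the dict keys become the outer column level directly, so the (ticker, field) layout comes out without unstack or reorder_levels. The sketch below assumes hypothetical in-memory file contents read through io.StringIO so it runs on its own; with real files you would pass each path to pd.read_csv instead. The memory caveat above applies equally here.

```python
import io
import os
import pandas as pd

# Hypothetical stand-ins for GOLD.csv, SPX.csv, ...; with real files,
# call pd.read_csv(path, ...) instead of wrapping text in io.StringIO.
files = {
    'GOLD.csv': 'Date,Exp,Last,Volume\n1978-03-30,198002,995.6,54\n1978-03-31,198002,999.5,78\n',
    'SPX.csv': 'Date,Exp,Last,Volume\n1978-03-30,198003,215.5,25\n1978-03-31,198003,214.1,99\n',
}

frames = {}
for filename, text in files.items():
    ticker = os.path.splitext(os.path.basename(filename))[0]  # 'GOLD.csv' -> 'GOLD'
    frames[ticker] = pd.read_csv(io.StringIO(text), parse_dates=['Date'], index_col='Date')

# Concatenate side by side: dict keys become the outer 'ticker' column level,
# and rows are aligned on Date (a date missing from one file becomes NaN there).
result = pd.concat(frames, axis=1, names=['ticker'])
print(result)
```

By default the row alignment is an outer join on the dates; passing join='inner' would keep only the dates present in every file, which may matter when the ~100 files cover different date ranges.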
