How to append multiple pandas DataFrames in a loop?

Views: 220  Published: 2020/6/6 19:24:48  Tags: python pandas csv dataframe append


Problem description

I've been banging my head on this Python problem for a while and am stuck. I am looping over several CSV files and want a single data frame that combines them so that the pod number from each CSV file becomes a column name, with time_stamp as the common index.

There are 11 CSV files that look like this data frame. The value column and the pod number differ between files, but the time_stamp values are the same in all of them.

data

    pod time_stamp  value
0   97  2016-02-22  3.048000
1   97  2016-02-29  23.622001
2   97  2016-03-07  13.970001
3   97  2016-03-14  6.604000
4   97  2016-03-21  NaN

This is the for-loop I have so far:

import glob
import pandas as pd

filenames = sorted(glob.glob('*.csv'))

new = []

for f in filenames:
    data = pd.read_csv(f)

    time_stamp = [pd.to_datetime(d) for d in time_stamp]

    new.append(data)

my_df = pd.DataFrame(new, columns=['pod','time_stamp','value'])

What I want is a data frame like the one below, where each column holds the value column from one of the CSV files.

time_stamp  97        98       99 ...
2016-02-22  3.04800   4.20002  3.5500
2016-02-29  23.62201  24.7392  21.1110
2016-03-07  13.97001  11.0284  12.0000

But right now the output of my_df is very wrong and looks like this. Any idea where I went wrong?

    0
0   pod time_stamp value 0 22 2016-...
1   pod time_stamp value 0 72 2016-...
2   pod time_stamp value 0 79 2016-0...
3   pod time_stamp value 0 86 2016-...
4   pod time_stamp value 0 87 2016-...
5   pod time_stamp value 0 88 2016-...
6   pod time_stamp value 0 90 2016-0...
7   pod time_stamp value 0 93 2016-0...
8   pod time_stamp value 0 95 2016-...

Recommended answer

I'd recommend first concatenating all your dataframes with pd.concat, and then doing one final pivot operation.

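import glob
import pandas as pd

filenames = sorted(glob.glob('*.csv'))

# Parse time_stamp as datetime while reading each file
new = [pd.read_csv(f, parse_dates=['time_stamp']) for f in filenames]
df = pd.concat(new)  # omit axis argument since it is 0 by default

# One row per time_stamp, one column per pod
df = df.pivot(index='time_stamp', columns='pod')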

Note that I'm having read_csv parse time_stamp as dates while loading each file (via parse_dates), so no parsing is needed after loading.
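To try the concat-then-pivot approach without any files on disk, here is a minimal sketch that reads two small CSV strings through io.StringIO. The strings csv_97 and csv_98 and the names frames, long_df and wide_df are hypothetical stand-ins, not part of the original question; the sample values are borrowed from the tables in the question.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for two of the CSV files.
csv_97 = """pod,time_stamp,value
97,2016-02-22,3.048
97,2016-02-29,23.622
"""
csv_98 = """pod,time_stamp,value
98,2016-02-22,4.20002
98,2016-02-29,24.7392
"""

# Read each "file", parsing time_stamp as datetime on load.
frames = [
    pd.read_csv(io.StringIO(text), parse_dates=["time_stamp"])
    for text in (csv_97, csv_98)
]

# Stack the frames row-wise into one long table.
long_df = pd.concat(frames, ignore_index=True)

# Reshape: one row per time_stamp, one column per pod.
wide_df = long_df.pivot(index="time_stamp", columns="pod")
print(wide_df)
```

With real files, the list comprehension would iterate over the sorted glob result instead of the two strings.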

MCVE

df

   pod  time_stamp      value
0   97  2016-02-22   3.048000
1   97  2016-02-29  23.622001
2   97  2016-03-07  13.970001
3   97  2016-03-14   6.604000
4   97  2016-03-21        NaN

df.pivot(index='time_stamp', columns='pod')

                value
pod                97
time_stamp           
2016-02-22   3.048000
2016-02-29  23.622001
2016-03-07  13.970001
2016-03-14   6.604000
2016-03-21        NaN
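The pivot output above carries an extra "value" level on top of the pod columns. If you prefer flat columns named only by pod number, as in the layout the question asks for, one option is to pass values='value' to pivot. This is a sketch that rebuilds the MCVE frame by hand; the name flat is an assumption introduced here.

```python
import pandas as pd

# Rebuild the MCVE frame shown above.
df = pd.DataFrame({
    "pod": [97, 97, 97, 97, 97],
    "time_stamp": pd.to_datetime(
        ["2016-02-22", "2016-02-29", "2016-03-07", "2016-03-14", "2016-03-21"]
    ),
    "value": [3.048000, 23.622001, 13.970001, 6.604000, float("nan")],
})

# Selecting a single values column yields plain (single-level) pod columns.
flat = df.pivot(index="time_stamp", columns="pod", values="value")
print(flat)
```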
