Python pandas - pd.melt a dataframe with datetime index results in NaN

Question

I have the following dataframe (sim_2005):

Date         ELEM1 ELEM2 ... ELEM1133
2005-01-01   0.021 2.455 ... 345.2
2005-01-02   0.321 2.331 ... 355.1
...          ...   ...   ... ...
2005-12-31   0.789 3.456 ... 459.9
[365 rows x 1133 columns]

Here Date is the index, a pandas.tseries.index.DatetimeIndex. With the help of @ami-tavory, I reshaped it using the pandas melt function:

 sim_2005_melted = pd.melt(sim_2005, id_vars=sim_2005.index.name, value_vars=list(sim_2005.columns.values), var_name='ELEM', value_name='Q_sim').sort(columns='Date')

This results in:

ID     Date   ELEM     Q_sim
1      NaN    ELEM1    0.021
2      NaN    ELEM1    0.321
...
366    NaN    ELEM2    2.455
367    NaN    ELEM2    2.331
...
402983 NaN    ELEM1133 345.2
402984 NaN    ELEM1133 355.1

For some reason the datetime index is not carried over and the Date column is filled with NaNs. Any help or ideas about what is going wrong?

Solution

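pd.melt works on columns only and does not carry the index over, so the dates are lost unless they are moved into a column first. Here is one way to solve your problem using .stack() instead, which keeps the index.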

import pandas as pd
import numpy as np

# try to simulate your data
columns = ['ELEM' + str(x) for x in np.arange(1, 1134, 1)]
sim_2005 = pd.DataFrame(np.random.randn(365, 1133), index=pd.date_range('2005-01-01', periods=365, freq='D'), columns=columns)

processed_sim_2005 = sim_2005.stack().reset_index()
processed_sim_2005.columns = ['Date', 'ELEM', 'Q_sim']

Out[82]: 
             Date      ELEM   Q_sim
0      2005-01-01     ELEM1  0.6221
1      2005-01-01     ELEM2  0.1862
2      2005-01-01     ELEM3 -1.0736
3      2005-01-01     ELEM4 -0.9756
4      2005-01-01     ELEM5  0.8397
...           ...       ...     ...
413540 2005-12-31  ELEM1129  0.0345
413541 2005-12-31  ELEM1130  0.5522
413542 2005-12-31  ELEM1131 -0.6900
413543 2005-12-31  ELEM1132 -0.2269
413544 2005-12-31  ELEM1133  0.1243

[413545 rows x 3 columns]
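
If you would rather keep pd.melt, a minimal sketch of that route is below. It assumes the only problem is that the dates live in the index: the index is given the name Date and moved into a regular column with reset_index() before melting. The small 3 x 2 frame and the names sim, wide, melted and stacked are made up for illustration and are not from the original post. sort_values is used in place of the older .sort(columns=...) call, which recent pandas versions no longer provide.

```python
import numpy as np
import pandas as pd

# Small stand-in for sim_2005 (3 days x 2 elements); the values are made up.
dates = pd.date_range('2005-01-01', periods=3, freq='D')
sim = pd.DataFrame(np.arange(6.0).reshape(3, 2),
                   index=dates, columns=['ELEM1', 'ELEM2'])

# Name the index and turn it into an ordinary column so melt can see it.
wide = sim.rename_axis('Date').reset_index()

# Melt every remaining column, using Date as the identifier column.
melted = (
    pd.melt(wide, id_vars='Date', var_name='ELEM', value_name='Q_sim')
    .sort_values(['Date', 'ELEM'])
    .reset_index(drop=True)
)

# The .stack() approach from the answer above, for comparison.
stacked = sim.stack().reset_index()
stacked.columns = ['Date', 'ELEM', 'Q_sim']

print(melted)
```

On this toy frame both routes give the same rows in the same order. Note that the sort on ELEM is a plain string sort, so with many columns ELEM10 would come before ELEM2; stack() keeps the original column order instead.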
