Fill an empty DataFrame with common index values from another DataFrame

Question

I have a dataframe holding a time series that spans one month at a nominal frequency of one second.

The problem is that the time step between records is not always 1 second.

time                c1  c2
2013-01-01 00:00:01 5   3
2013-01-01 00:00:03 7   2
2013-01-01 00:00:04 1   5
2013-01-01 00:00:05 4   3
2013-01-01 00:00:06 5   6
2013-01-01 00:00:09 4   2
2013-01-01 00:00:10 7   8

I then want to create an empty dataframe with the same columns and a corrected index covering the whole period, that is, with as many rows as there are seconds in the month. This empty dataframe is initially filled with nan values:

time                c1  c2
2013-01-01 00:00:01 nan nan
2013-01-01 00:00:02 nan nan
2013-01-01 00:00:03 nan nan
2013-01-01 00:00:04 nan nan
2013-01-01 00:00:05 nan nan
2013-01-01 00:00:06 nan nan
2013-01-01 00:00:07 nan nan
2013-01-01 00:00:08 nan nan
2013-01-01 00:00:09 nan nan
2013-01-01 00:00:10 nan nan

Then I want to compare the two and fill the empty one with the rows whose index also appears in my first dataframe. The rows that are not common should keep their nan values.

time                c1  c2
2013-01-01 00:00:01 5   3
2013-01-01 00:00:02 nan nan
2013-01-01 00:00:03 7   2
2013-01-01 00:00:04 1   5
2013-01-01 00:00:05 4   3
2013-01-01 00:00:06 5   6
2013-01-01 00:00:07 nan nan
2013-01-01 00:00:08 nan nan
2013-01-01 00:00:09 4   2
2013-01-01 00:00:10 7   8

My attempt:

import pandas as pd

# Read the first dataframe from a file, parsing the first column as the datetime index
df1 = pd.read_table(fin, parse_dates=[0], names=ch, index_col=0, header=0, decimal='.', skiprows=c)
# Create an empty dataframe on a regular 1-second index
N = 86400 * 31  # seconds in a 31-day month
index = pd.date_range(df1.index[0], periods=N-1, freq='1s')
df2 = pd.DataFrame(index=index, columns=df1.columns)

I then tried merge and concat, but neither gives the expected result:

df2.merge(df1, how='outer')
pd.concat([df2,df1], axis=0, join='outer')

Recommended answer

I don't think you need a second dataframe. If you call resample without any fill step (the fill_method argument in older pandas, or a chained ffill()/bfill() in current versions), it stores NaN for the missing periods:

df.resample("s").max()
Out[62]: 
                      c1   c2
time                         
2013-01-01 00:00:01  5.0  3.0
2013-01-01 00:00:02  NaN  NaN
2013-01-01 00:00:03  7.0  2.0
2013-01-01 00:00:04  1.0  5.0
2013-01-01 00:00:05  4.0  3.0
2013-01-01 00:00:06  5.0  6.0
2013-01-01 00:00:07  NaN  NaN
2013-01-01 00:00:08  NaN  NaN
2013-01-01 00:00:09  4.0  2.0
2013-01-01 00:00:10  7.0  8.0

max() here is just an arbitrary aggregation, used so that resample returns a dataframe. As long as the index has no duplicate timestamps, you can replace it with mean(), min(), etc. and get the same result. If there are duplicates, the rows sharing a timestamp are aggregated by that function.
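The behaviour described above can be checked with a small sketch built from the question's sample rows. The extra record at 00:00:03 is a hypothetical duplicate added only for illustration, and the variable names are not from the original post.

```python
import pandas as pd

# Sample rows from the question; seconds :02, :07 and :08 are missing.
times = pd.to_datetime([
    "2013-01-01 00:00:01", "2013-01-01 00:00:03", "2013-01-01 00:00:04",
    "2013-01-01 00:00:05", "2013-01-01 00:00:06", "2013-01-01 00:00:09",
    "2013-01-01 00:00:10",
])
df = pd.DataFrame(
    {"c1": [5, 7, 1, 4, 5, 4, 7], "c2": [3, 2, 5, 3, 6, 2, 8]},
    index=pd.DatetimeIndex(times, name="time"),
)

# One row per second between the first and last record;
# seconds without a record come out as NaN.
filled = df.resample("s").max()

# Hypothetical duplicate: a second record with the same timestamp 00:00:03.
ts = pd.Timestamp("2013-01-01 00:00:03")
dup = pd.DataFrame({"c1": [9], "c2": [1]}, index=pd.DatetimeIndex([ts], name="time"))
df_dup = pd.concat([df, dup]).sort_index()

# With duplicates, the choice of aggregation matters, and it is applied
# column by column, so the result can mix values from different rows.
agg_max = df_dup.resample("s").max()
agg_min = df_dup.resample("s").min()

print(filled)
print(agg_max.loc[ts], agg_min.loc[ts], sep="\n")
```

Note that the aggregation works per column: for the duplicated timestamp, max() takes c1 from one record and c2 from the other, so neither original row is returned intact.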

As Paul H suggested in the comments, you can use df.resample("s").asfreq() without any aggregation. It skips the unnecessary aggregation step, so it is probably more efficient. It raises an error if the index contains duplicate values.
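A minimal sketch of the asfreq() variant follows, again using the question's sample rows. Three things here are assumptions rather than part of the original answer: that the duplicate-index error surfaces as a ValueError, the keep="first" policy used to drop duplicates, and the 00:00:00 to 00:00:12 window, which stands in for the full month. The last part uses reindex against an explicit date_range, which is one way to cover a period that extends beyond the first and last record, something resample alone does not do.

```python
import pandas as pd

times = pd.to_datetime([
    "2013-01-01 00:00:01", "2013-01-01 00:00:03", "2013-01-01 00:00:04",
    "2013-01-01 00:00:05", "2013-01-01 00:00:06", "2013-01-01 00:00:09",
    "2013-01-01 00:00:10",
])
df = pd.DataFrame(
    {"c1": [5, 7, 1, 4, 5, 4, 7], "c2": [3, 2, 5, 3, 6, 2, 8]},
    index=pd.DatetimeIndex(times, name="time"),
)

# Regular 1-second index with no aggregation; gaps become NaN.
regular = df.resample("s").asfreq()

# Hypothetical duplicate timestamp: repeat the 00:00:03 row.
df_dup = pd.concat([df, df.iloc[[1]]]).sort_index()
try:
    df_dup.resample("s").asfreq()
    raised = False
except ValueError:
    # Assumption: the duplicate-index failure is reported as ValueError.
    raised = True

# One possible policy: keep the first record per timestamp, then upsample.
deduped = df_dup[~df_dup.index.duplicated(keep="first")]
regular_dedup = deduped.resample("s").asfreq()

# To cover a fixed period (a short window here, a whole month in the
# question), reindex against an explicit range instead.
full_index = pd.date_range("2013-01-01 00:00:00", "2013-01-01 00:00:12", freq="s")
whole = df.reindex(full_index)

print(regular)
print(whole)
```

Like asfreq(), reindex does no aggregation, so duplicates in the source index need to be resolved first.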
