How to fill the missing hour values in a pandas DataFrame

Question

I have a pandas DataFrame that is the output of an SQL query. The query returns an hourly value only when that value does not meet a particular threshold.


 date_date  | hour24 | column
------------------------------------
 2017-10-29 | 00:00  | 5.8055152395
 2017-10-29 | 01:00  | 1.2578616352
 2017-10-29 | 02:00  | -1.5197568389
 2017-10-29 | 03:00  | -12.5560538117
 2017-10-29 | 04:00  | -15.6862745098
 2017-10-29 | 05:00  | -18.487394958
 2017-10-29 | 06:00  | -13.2911392405
 2017-10-29 | 07:00  | -9.3385214008
 2017-10-29 | 08:00  | -15.3846153846
 2017-10-28 | 00:00  | 6.9666182874
 2017-10-28 | 01:00  | 8.3857442348
 2017-10-28 | 02:00  | 8.8145896657
 2017-10-28 | 03:00  | 4.0358744395
 2017-10-28 | 04:00  | 13.0718954248
 2017-10-28 | 05:00  | 0
 2017-10-28 | 06:00  | 13.9240506329
 2017-10-28 | 07:00  | 24.513618677

I use this output to create a report. For each hour where the query returns a value, I want the hour marked as Failed, but I also want the hours whose values did not cross the threshold (and are therefore missing from the output) marked as Passed. For example:


 date_date  | hour24 | Result
------------------------------
 2017-10-29 | 00:00  | Failed
 2017-10-29 | 01:00  | Failed
 2017-10-29 | 02:00  | Failed
 2017-10-29 | 03:00  | Failed
 2017-10-29 | 04:00  | Failed
 2017-10-29 | 05:00  | Failed
 2017-10-29 | 06:00  | Failed
 2017-10-29 | 07:00  | Failed
 2017-10-29 | 08:00  | Failed
 2017-10-29 | 09:00  | Passed
 2017-10-29 | 10:00  | Passed
 2017-10-29 | 11:00  | Passed
 2017-10-29 | 12:00  | Passed
 2017-10-29 | 13:00  | Passed
 2017-10-29 | 14:00  | Passed
 2017-10-29 | 15:00  | Passed
 2017-10-29 | 16:00  | Passed
 2017-10-29 | 17:00  | Passed
 2017-10-29 | 18:00  | Passed
 2017-10-29 | 19:00  | Passed
 2017-10-29 | 20:00  | Passed
 2017-10-29 | 21:00  | Passed
 2017-10-29 | 22:00  | Passed
 2017-10-29 | 23:00  | Passed
 2017-10-28 | 00:00  | Failed
 2017-10-28 | 01:00  | Failed
.
.
.

Answer

You can create a reporting DataFrame that holds one row for every date and hour the report should cover:

In [1]: reporting_df.columns
Out[1]: Index(['date_date', 'hour24'], dtype='object')

Name the hour column hour24 so that both DataFrames share the same key columns. Then left-merge reporting_df with the DataFrame from your SQL query output (query_df) on both date_date and hour24, with pandas imported as pd. A left join keeps every hour in reporting_df, and hours that the query did not return get NaN in column:

In [2]: out_df = pd.merge(left=reporting_df, right=query_df, on=['date_date', 'hour24'], how='left')
   ...: out_df.iloc[6:11]
Out[2]:
     date_date hour24     column
6   2017-10-29  06:00 -13.291139
7   2017-10-29  07:00  -9.338521
8   2017-10-29  08:00 -15.384615
9   2017-10-29  09:00        NaN
10  2017-10-29  10:00        NaN
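The answer does not show how reporting_df is built. Below is a minimal sketch, assuming date_date holds 'YYYY-MM-DD' strings and hour24 holds 'HH:MM' strings, as in the question's output (if your SQL driver returns date or time objects instead, convert both DataFrames to the same types before merging, or the keys will not match). The helper build_reporting_df is a hypothetical name; it uses pd.MultiIndex.from_product to generate every date/hour pair and then applies the same left merge to a few rows taken from the question's data:

```python
import pandas as pd


def build_reporting_df(dates):
    """Return one row per (date, hour) pair for the given dates.

    Assumption: date_date is a 'YYYY-MM-DD' string and hour24 is an
    'HH:MM' string, matching the query output shown in the question.
    """
    hours = [f"{h:02d}:00" for h in range(24)]  # '00:00' ... '23:00'
    index = pd.MultiIndex.from_product([dates, hours], names=["date_date", "hour24"])
    return index.to_frame(index=False)


# A subset of the question's query output (only failing hours are returned).
query_df = pd.DataFrame({
    "date_date": ["2017-10-29"] * 3 + ["2017-10-28"] * 2,
    "hour24": ["00:00", "01:00", "08:00", "00:00", "07:00"],
    "column": [5.8055152395, 1.2578616352, -15.3846153846, 6.9666182874, 24.513618677],
})

reporting_df = build_reporting_df(["2017-10-29", "2017-10-28"])

# Left join keeps every reporting hour; hours the query did not return get NaN.
out_df = pd.merge(left=reporting_df, right=query_df, on=["date_date", "hour24"], how="left")
print(len(out_df))                     # 48 rows: 2 dates x 24 hours
print(out_df["column"].isna().sum())   # 43 hours with no returned value
```

Because a left merge preserves the row order of the left DataFrame, the result comes out grouped by date in the order the dates were passed to the helper.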

Use np.where (with NumPy imported as np) to derive the result: an hour with no returned value (NaN) passed, and an hour with a returned value failed:

In [3]: out_df['Result'] = np.where(out_df['column'].isnull(), 'Passed', 'Failed')

Finally, drop the column that is no longer needed:

In [4]: out_df.drop('column', axis=1, inplace=True)
In [5]: out_df.iloc[6:11]
Out[5]:
     date_date hour24  Result
6   2017-10-29  06:00  Failed
7   2017-10-29  07:00  Failed
8   2017-10-29  08:00  Failed
9   2017-10-29  09:00  Passed
10  2017-10-29  10:00  Passed
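As a variant of the np.where step (a swapped-in technique, not part of the original answer), merge can record where each row came from via indicator=True, which adds a _merge column whose value is 'both' when the (date, hour) pair exists in the query output and 'left_only' when it does not. Labeling from _merge rather than from NaN avoids marking an hour as Passed if the query ever returns a row whose value is NULL. A minimal sketch; mark_results is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def mark_results(reporting_df, query_df, keys=("date_date", "hour24")):
    """Label each reporting hour 'Failed' if the query returned a row for it, else 'Passed'.

    merge(indicator=True) adds a categorical '_merge' column: 'both' when the
    key pair exists in query_df, 'left_only' when it exists only in reporting_df.
    """
    merged = pd.merge(reporting_df, query_df, on=list(keys), how="left", indicator=True)
    merged["Result"] = np.where(merged["_merge"] == "both", "Failed", "Passed")
    return merged[list(keys) + ["Result"]]


# Usage with a few hours from the question's data.
reporting_df = pd.DataFrame({
    "date_date": ["2017-10-29"] * 4,
    "hour24": ["07:00", "08:00", "09:00", "10:00"],
})
query_df = pd.DataFrame({
    "date_date": ["2017-10-29", "2017-10-29"],
    "hour24": ["07:00", "08:00"],
    "column": [-9.3385214008, -15.3846153846],
})
report = mark_results(reporting_df, query_df)
print(report["Result"].tolist())  # ['Failed', 'Failed', 'Passed', 'Passed']
```

Note that if query_df can contain the same (date, hour) pair more than once, the left merge will duplicate that reporting row; deduplicating query_df on the key columns first keeps one row per hour.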
