How to fill in the missing hour values in a pandas DataFrame
Question
I have a pandas DataFrame that holds the output of an SQL query. The query returns hourly values, but only for the hours whose values do not meet a particular threshold:
date_date | hour24 | column
------------------------------------
2017-10-29 | 00:00 | 5.8055152395
2017-10-29 | 01:00 | 1.2578616352
2017-10-29 | 02:00 | -1.5197568389
2017-10-29 | 03:00 | -12.5560538117
2017-10-29 | 04:00 | -15.6862745098
2017-10-29 | 05:00 | -18.487394958
2017-10-29 | 06:00 | -13.2911392405
2017-10-29 | 07:00 | -9.3385214008
2017-10-29 | 08:00 | -15.3846153846
2017-10-28 | 00:00 | 6.9666182874
2017-10-28 | 01:00 | 8.3857442348
2017-10-28 | 02:00 | 8.8145896657
2017-10-28 | 03:00 | 4.0358744395
2017-10-28 | 04:00 | 13.0718954248
2017-10-28 | 05:00 | 0
2017-10-28 | 06:00 | 13.9240506329
2017-10-28 | 07:00 | 24.513618677
I use this output to create a report. Every hour for which the query returns a value should be marked as Failed, and every hour whose value didn't cross the threshold (and is therefore missing from the query output) should be marked as Passed. For example:
date_date | hour24 | Result
------------------------------
2017-10-29 | 00:00 | Failed
2017-10-29 | 01:00 | Failed
2017-10-29 | 02:00 | Failed
2017-10-29 | 03:00 | Failed
2017-10-29 | 04:00 | Failed
2017-10-29 | 05:00 | Failed
2017-10-29 | 06:00 | Failed
2017-10-29 | 07:00 | Failed
2017-10-29 | 08:00 | Failed
2017-10-29 | 09:00 | Passed
2017-10-29 | 10:00 | Passed
2017-10-29 | 11:00 | Passed
2017-10-29 | 12:00 | Passed
2017-10-29 | 13:00 | Passed
2017-10-29 | 14:00 | Passed
2017-10-29 | 15:00 | Passed
2017-10-29 | 16:00 | Passed
2017-10-29 | 17:00 | Passed
2017-10-29 | 18:00 | Passed
2017-10-29 | 19:00 | Passed
2017-10-29 | 20:00 | Passed
2017-10-29 | 21:00 | Passed
2017-10-29 | 22:00 | Passed
2017-10-29 | 23:00 | Passed
2017-10-28 | 00:00 | Failed
2017-10-28 | 01:00 | Failed
.
.
.
Recommended answer
You can create a reporting DataFrame that contains one row for every date and hour the report needs, using the same key columns as the query output, e.g.
In [1]: reporting_df.columns
Out[1]: Index(['date_date', 'hour24'], dtype='object')
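One way to build such a reporting frame is sketched below. It assumes that date_date and hour24 are stored as strings (e.g. "2017-10-29" and "00:00"), and the small query_df is a hypothetical stand-in for the real SQL output. It takes every date present in the query output and pairs it with all 24 hour labels via pd.MultiIndex.from_product.

```python
import pandas as pd

# Hypothetical stand-in for the SQL query output (only the failed hours).
query_df = pd.DataFrame({
    "date_date": ["2017-10-29", "2017-10-29", "2017-10-28"],
    "hour24": ["00:00", "01:00", "00:00"],
    "column": [5.8055152395, 1.2578616352, 6.9666182874],
})

# All 24 hour labels "00:00" .. "23:00", formatted like hour24.
hours = [f"{h:02d}:00" for h in range(24)]

# Dates in order of first appearance in the query output.
dates = query_df["date_date"].unique()

# One row per (date, hour) pair.
reporting_df = pd.MultiIndex.from_product(
    [dates, hours], names=["date_date", "hour24"]
).to_frame(index=False)

print(reporting_df.shape)  # (48, 2)
```

Note that a day with no failures at all never appears in the query output, so this approach would omit it; if the report period is known in advance, generating the dates from that period is safer.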
Then left-merge reporting_df with the DataFrame from your SQL query output on both the date_date and hour24 columns, so that hours missing from the query output get NaN in column:
In [2]: out_df = pd.merge(left=reporting_df, right=query_df, on=['date_date', 'hour24'], how='left')
   ...: out_df.iloc[7:11]
Out[2]:
date_date hour24 column
2017-10-29 07:00 -9.3385214008
2017-10-29 08:00 -15.3846153846
2017-10-29 09:00 NaN
2017-10-29 10:00 NaN
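The following self-contained sketch shows why the merge type and keys matter, using small hypothetical frames with string keys. A left merge on both date_date and hour24 keeps every reporting row, while an inner merge keeps only the hours the query returned. Passing indicator=True (a standard DataFrame.merge option) adds a _merge column that records whether each row found a match.

```python
import pandas as pd

# Hypothetical query output: only 00:00 and 01:00 failed.
query_df = pd.DataFrame({
    "date_date": ["2017-10-29", "2017-10-29"],
    "hour24": ["00:00", "01:00"],
    "column": [5.8055152395, 1.2578616352],
})

# Hypothetical reporting frame covering four hours.
reporting_df = pd.DataFrame({
    "date_date": ["2017-10-29"] * 4,
    "hour24": ["00:00", "01:00", "02:00", "03:00"],
})

# Left merge on both keys: all reporting rows survive; unmatched hours get NaN.
out_df = reporting_df.merge(
    query_df, on=["date_date", "hour24"], how="left", indicator=True
)
print(out_df)

# An inner merge would drop the hours that need to be reported as Passed.
inner_df = reporting_df.merge(query_df, on=["date_date", "hour24"], how="inner")
print(len(out_df), len(inner_df))  # 4 2
```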
Then use np.where to set the result: hours with no value from the query (NaN after the merge) passed, and all other hours failed.
In [3]: out_df['Result'] = np.where(out_df['column'].isna(), 'Passed', 'Failed')
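A small sketch of this step follows, using a hypothetical frame. It tests for missing values with isna() rather than for truthiness, which matters here because the query output can legitimately contain 0 (2017-10-28 05:00 in the question). A zero is still a returned value, so that hour must be marked Failed.

```python
import numpy as np
import pandas as pd

# Hypothetical merged frame: 04:00 and 05:00 came back from the query
# (05:00 with a value of 0), 08:00 did not and is NaN after the merge.
out_df = pd.DataFrame({
    "date_date": ["2017-10-28"] * 3,
    "hour24": ["04:00", "05:00", "08:00"],
    "column": [13.0718954248, 0.0, np.nan],
})

# NaN means "not returned by the query", i.e. the hour passed.
out_df["Result"] = np.where(out_df["column"].isna(), "Passed", "Failed")
print(out_df["Result"].tolist())  # ['Failed', 'Failed', 'Passed']
```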
Finally, drop the column that is no longer needed:
In [4]: out_df.drop('column', axis=1, inplace=True)
In [5]: out_df.iloc[7:11]
Out[5]:
date_date hour24 Result
2017-10-29 07:00 Failed
2017-10-29 08:00 Failed
2017-10-29 09:00 Passed
2017-10-29 10:00 Passed
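The whole procedure can be wrapped in one function, sketched below. Here the report window is given explicitly and expanded with pd.date_range, so days without any failures are still reported. The function name build_report and its start/end parameters are hypothetical, and the sketch assumes the query returns date_date and hour24 as strings in the "YYYY-MM-DD" and "HH:MM" formats; if the SQL driver returns date or time objects instead, convert them first so the merge keys have matching types.

```python
import numpy as np
import pandas as pd


def build_report(query_df: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Hypothetical helper: one row per hour in [start, end] with a Result column."""
    # Every hourly timestamp in the report window (both ends inclusive).
    stamps = pd.date_range(start=start, end=end, freq=pd.offsets.Hour())

    # Reporting frame with the same string key format as the query output.
    reporting_df = pd.DataFrame({
        "date_date": stamps.strftime("%Y-%m-%d"),
        "hour24": stamps.strftime("%H:%M"),
    })

    # Left merge keeps every hour; hours absent from the query get NaN.
    out_df = reporting_df.merge(
        query_df[["date_date", "hour24", "column"]],
        on=["date_date", "hour24"],
        how="left",
    )
    out_df["Result"] = np.where(out_df["column"].isna(), "Passed", "Failed")
    return out_df.drop(columns="column")


# Usage with a hypothetical two-row query result.
query_df = pd.DataFrame({
    "date_date": ["2017-10-29", "2017-10-29"],
    "hour24": ["00:00", "01:00"],
    "column": [5.8055152395, 1.2578616352],
})
report = build_report(query_df, "2017-10-29 00:00", "2017-10-29 23:00")
print(report.head(3))
```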