Box plot for continuous data in Python

Views: 325  Published: 2020/9/23 2:33:35  Tags: python seaborn boxplot continuous


Problem description

I have a csv file with 2 columns:

col1: Timestamp data (yyyy-mm-dd hh:mm:ss.ms, 8 months of data)
col2: Heat data (continuous variable)

Since there are almost 50k records, I would like to partition col1 (the timestamp column) into months or weeks and then draw box plots of the heat data for each period. I tried this in R, but it takes a long time. I need help doing it in Python. I think I need to use seaborn.boxplot.

Please guide.

Solution

Group by Frequency then plot groups

First, read your csv data into a pandas DataFrame

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# assumes NO header line in csv
# raw string, so the backslashes in the path are not read as escapes such as \f
df = pd.read_csv(r'\file\path', names=['time','temp'], parse_dates=[0])
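
To check that timestamps in the yyyy-mm-dd hh:mm:ss.ms layout from the question are parsed as dates, the same call can be tried on a small in-memory sample. This is only a sketch: the two rows and the names `sample_csv` and `df_sample` are made up for illustration and are not part of the original answer.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the layout described in the question:
# timestamp with fractional seconds, then a heat reading; no header line.
sample_csv = io.StringIO(
    "2011-01-01 00:00:00.250,12.5\n"
    "2011-01-01 01:00:00.500,47.1\n"
)

# Same arguments as the read_csv call above, applied to the in-memory sample.
df_sample = pd.read_csv(sample_csv, names=['time', 'temp'], parse_dates=[0])
df_sample = df_sample.set_index('time')

# A datetime64 dtype on the index means the timestamps were parsed as dates.
print(df_sample.index.dtype)
print(df_sample)
```

If the index dtype comes back as `object` instead, the timestamps were left as strings and the time-based grouping further down would not work.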

I will use some fake data, 30 days of hourly samples.

heat = np.random.random(24*30) * 100
dates = pd.date_range('1/1/2011', periods=24*30, freq='h')
df = pd.DataFrame({'time':dates,'temp':heat})

Set the timestamps as the DataFrame's index

df = df.set_index('time')

Now group by the period you want, seven days for this example

gb = df.groupby(pd.Grouper(freq='7D'))
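
Before plotting, it can help to look at how many samples land in each bin. The sketch below rebuilds the same kind of fake data and prints the bin sizes. The `'MS'` (month start) alias for calendar-month bins is an addition here, since the question also asks about months, and the names `weekly_sizes` and `monthly_sizes` are illustrative. Lowercase `'h'` is used for the hourly frequency because recent pandas releases deprecate the uppercase form.

```python
import numpy as np
import pandas as pd

# Same kind of fake data as above: 30 days of hourly samples.
dates = pd.date_range('1/1/2011', periods=24 * 30, freq='h')
df = pd.DataFrame({'temp': np.random.random(24 * 30) * 100}, index=dates)

# Number of samples in each 7-day bin.
weekly_sizes = df.groupby(pd.Grouper(freq='7D')).size()
print(weekly_sizes)

# Calendar-month bins; 'MS' labels each bin by the first day of the month.
monthly_sizes = df.groupby(pd.Grouper(freq='MS')).size()
print(monthly_sizes)
```

Thirty days do not divide evenly into weeks, so the last 7-day bin holds fewer samples than the others.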

Now you can plot each group separately

for g, week in gb:
    #week.plot()
    week.boxplot()
    plt.title(f'Week Of {g.date()}')
    plt.show()
    plt.close()

And... I didn't realize you could do this but it is pretty cool

ax = gb.boxplot(subplots=False)
plt.setp(ax.xaxis.get_ticklabels(),rotation=30)
plt.show()
plt.close()
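
The numbers behind each box can also be inspected without plotting. As a rough sketch (not part of the original answer), `describe()` on the grouped column returns the quartiles per week; the name `summary` is illustrative.

```python
import numpy as np
import pandas as pd

# Fake data again: 30 days of hourly samples.
dates = pd.date_range('1/1/2011', periods=24 * 30, freq='h')
df = pd.DataFrame({'temp': np.random.random(24 * 30) * 100}, index=dates)

gb = df.groupby(pd.Grouper(freq='7D'))

# One row per week; the 25%, 50% and 75% columns are the quartiles
# that a box plot draws as the box edges and the median line.
summary = gb['temp'].describe()
print(summary[['count', '25%', '50%', '75%']])
```

With about 50k records this table is cheap to compute, so it is a quick sanity check before drawing the plots.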

heat = np.random.random(24*300) * 100
dates = pd.date_range('1/1/2011', periods=24*300, freq='h')
df = pd.DataFrame({'time':dates,'temp':heat})
df = df.set_index('time')

To partition the data into five time periods, then get weekly boxplots of each:

Determine the total timespan; divide by five; create a frequency alias; then groupby

dt = df.index[-1] - df.index[0]
dt = dt/5
alias = f'{dt.total_seconds()}s'
gb = df.groupby(pd.Grouper(freq=alias))
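
A possible variation, sketched here as an assumption rather than as the answer's method: the Timedelta can be handed to `pd.Grouper` directly instead of building an alias string, and printing the group sizes shows how the samples were actually distributed. The names `span`, `parts` and `part_sizes` are illustrative.

```python
import numpy as np
import pandas as pd

# 300 days of hourly fake data, as in the example above.
dates = pd.date_range('1/1/2011', periods=24 * 300, freq='h')
df = pd.DataFrame({'temp': np.random.random(24 * 300) * 100}, index=dates)

# One fifth of the total time span, as a pandas Timedelta.
span = (df.index[-1] - df.index[0]) / 5

# Assumption: the Timedelta is passed straight to Grouper instead of
# formatting an alias string from total_seconds().
parts = df.groupby(pd.Grouper(freq=span))

# Number of samples that fell into each time bin.
part_sizes = parts.size()
print(part_sizes)
```

Checking `part_sizes` before plotting shows whether a small leftover group appears at the end of the range.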

Each group is a DataFrame so iterate over the groups; create weekly groups from each and boxplot them.

for g,d_frame in gb:
    gb_tmp = d_frame.groupby(pd.Grouper(freq='7D'))
    ax = gb_tmp.boxplot(subplots=False)
    plt.setp(ax.xaxis.get_ticklabels(),rotation=90)
    plt.show()
    plt.close()

There might be a better way to do this; if so I'll post it, or maybe someone will feel free to edit this. It looks like this could leave the last group without a full set of data. ...

If you know that your data is periodic you can just use slices to split it up.

n = len(df) // 5
for tmp_df in (df[i:i+n] for i in range(0, len(df), n)):
    gb_tmp = tmp_df.groupby(pd.Grouper(freq='7D'))
    ax = gb_tmp.boxplot(subplots=False)
    plt.setp(ax.xaxis.get_ticklabels(),rotation=90)
    plt.show()
    plt.close()
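
When the number of rows is not a multiple of five, stepping through `range(0, len(df), n)` yields a short sixth slice. One way around that, offered only as a sketch, is to split the row positions with `np.array_split`, which returns exactly five runs whose lengths differ by at most one. The three extra rows and the names `position_chunks` and `chunks` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Fake data whose length (7203 rows) is deliberately not divisible by 5.
dates = pd.date_range('1/1/2011', periods=24 * 300 + 3, freq='h')
df = pd.DataFrame({'temp': np.random.random(len(dates)) * 100}, index=dates)

# Split the row positions, not the DataFrame itself, into five runs.
position_chunks = np.array_split(np.arange(len(df)), 5)
chunks = [df.iloc[pos] for pos in position_chunks]

# Each chunk is a DataFrame that can be grouped by '7D' and plotted as above.
for chunk in chunks:
    print(chunk.index[0], chunk.index[-1], len(chunk))
```

Each element of `chunks` can then take the place of `tmp_df` in the loop above.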

Frequency aliases
pandas.read_csv()
pandas.Grouper()
