Plotting results of Pandas GroupBy
Question
I'm starting to learn Pandas and am trying to find the most Pythonic (or panda-thonic?) ways to do certain tasks.
Suppose we have a DataFrame with columns A, B, and C.
- Column A contains boolean values: each row's A value is either true or false.
- Column B has some important values we want to plot.
What we want to discover is the subtle differences between the B values of rows where A is false and the B values of rows where A is true.
In other words, how can I group by the value of column A (either true or false), then plot the values of column B for both groups on the same graph? The two datasets should be colored differently so that the points can be told apart.
Next, let's add another feature to this program: before graphing, we want to compute another value for each row and store it in column D. This value is the mean of all B values recorded in the five minutes before a record, counting only rows that have the same boolean value in A.
In other words, if I have a row where A=True and time=t, I want to compute a value for column D that is the mean of B for all records from time t-5 to t that also have A=True.
In this case, how can we execute the groupby on values of A, then apply this computation to each individual group, and finally plot the D values for the two groups?
Recommended answer
I think @herrfz hit all the high points. I'll just flesh out the details:
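import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

N = 100
x = np.linspace(0, np.pi, N)
a = np.sin(x)
b = np.cos(x)

# The first N rows have A=True (sine values); the next N have A=False (cosine values).
df = pd.DataFrame({
    'A': [True] * N + [False] * N,
    'B': np.hstack((a, b)),
})

# Group by the column name (not a one-element list) so each key is a plain bool.
for key, grp in df.groupby('A'):
    plt.plot(grp['B'], label=key)
    # pd.rolling_mean has been removed from pandas; use Series.rolling(...).mean().
    # Note that window=5 averages the last 5 rows, not the last 5 minutes.
    grp = grp.assign(D=grp['B'].rolling(window=5).mean())
    plt.plot(grp['D'], label='rolling ({k})'.format(k=key))
plt.legend(loc='best')
plt.show()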
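The answer's rolling mean uses a fixed window of 5 rows, while the question asks for the mean over the preceding five minutes. If the data carries timestamps, one way to get closer to that is a time-based rolling window applied within each group. The following is only a sketch: the `time` column, the sample values, and the helper name `add_rolling_mean` are assumptions made for illustration and do not come from the question or the answer.

```python
import numpy as np
import pandas as pd


def add_rolling_mean(df, time_col="time", window="5min"):
    """Return a copy of df with column D: the trailing mean of B over
    `window`, computed separately for each value of A.

    Hypothetical helper; assumes `time_col` holds datetimes.
    """
    # Time-based rolling needs a sorted datetime index.
    out = df.sort_values(time_col).copy()
    rolled = (
        out.set_index(time_col)
           .groupby("A")["B"]
           # transform keeps the rows in their original (sorted) order
           .transform(lambda s: s.rolling(window).mean())
    )
    # Row order matches `out`, so assign positionally.
    out["D"] = rolled.to_numpy()
    return out


# Made-up sample data: one record per minute, A alternating True/False.
times = pd.date_range("2024-01-01 00:00", periods=8, freq="1min")
df = pd.DataFrame({
    "time": times,
    "A": [True, False] * 4,
    "B": [1.0, 10.0, 3.0, 20.0, 5.0, 30.0, 7.0, 40.0],
})

result = add_rolling_mean(df)
print(result)
```

With an offset window such as "5min", pandas by default includes the current row and excludes a row lying exactly five minutes earlier; the `closed` argument of `rolling` can change which endpoints count. The D values can then be plotted per group with the same `for key, grp in result.groupby('A')` loop as in the answer, passing `grp['time']` as the x values.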