Plotting results of Pandas GroupBy


Problem description

I'm starting to learn Pandas and am trying to find the most Pythonic (or panda-thonic?) ways to do certain tasks.

Suppose we have a DataFrame with columns A, B, and C.

  • Column A contains boolean values: each row's A value is either true or false.
  • Column B has some important values we want to plot.

What we want to discover is the subtle distinctions between the B values for rows where A is false and the B values for rows where A is true.

In other words, how can I group by the value of column A (either true or false), then plot the values of column B for both groups on the same graph? The two datasets should be colored differently to be able to distinguish the points.


Next, let's add another feature to this program: before graphing, we want to compute another value for each row and store it in column D. This value is the mean of all data stored in B for the entire five minutes before a record - but we only include rows that have the same boolean value stored in A.

In other words, if I have a row where A=True and time=t, I want to compute a value for column D that is the mean of B for all records from time t-5 to t that have the same A=True.

In this case, how can we execute the groupby on values of A, then apply this computation to each individual group, and finally plot the D values for the two groups?

Solution

I think @herrfz hit all the high points. I'll just flesh out the details:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

sin = np.sin
cos = np.cos
pi = np.pi
N = 100

x = np.linspace(0, pi, N)
a = sin(x)
b = cos(x)

df = pd.DataFrame({
    'A': [True]*N + [False]*N,
    'B': np.hstack((a,b))
    })

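# Group on the column name (not a one-item list) so each key is a plain bool.
for key, grp in df.groupby('A'):
    plt.plot(grp['B'], label=key)
    # pd.rolling_mean was removed from pandas; Series.rolling replaces it.
    # assign() returns a new frame, so we never write into a group slice.
    grp = grp.assign(D=grp['B'].rolling(window=5).mean())
    plt.plot(grp['D'], label='rolling ({k})'.format(k=key))
plt.legend(loc='best')
plt.show()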
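
Note that `window=5` above averages the last five rows of each group, not the last five minutes. If the records carry timestamps, a time-based window is closer to what the question asks for. The following is only a sketch under assumptions that are not in the question: the DataFrame has a sorted, unique `DatetimeIndex`, the sample data is made up, and `plot_d` is a hypothetical helper name. By default pandas treats a `'5min'` window as covering the interval (t-5min, t], so pass `closed='both'` to `rolling` if the record at exactly t-5 should count too.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one reading per minute, alternating between the two groups.
N = 20
times = pd.date_range("2024-01-01 00:00", periods=N, freq="min")
df = pd.DataFrame(
    {"A": [True, False] * (N // 2), "B": np.arange(N, dtype=float)},
    index=times,
)

# For each row, average B over the trailing five minutes, using only rows
# that share the same value of A. transform() returns a Series aligned with
# df.index, so the result can be stored directly as column D.
df["D"] = df.groupby("A")["B"].transform(lambda s: s.rolling("5min").mean())


def plot_d(frame):
    """Plot column D for each group of A on one set of axes."""
    # Imported here so the computation above can run without a display.
    import matplotlib.pyplot as plt

    for key, grp in frame.groupby("A"):
        plt.plot(grp.index, grp["D"], label="D (A={k})".format(k=key))
    plt.legend(loc="best")
    plt.show()
```

Calling `plot_d(df)` draws one line per group. Using `transform` rather than a loop keeps D in the original DataFrame, which is convenient if the column is needed again after plotting.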
