Process pandas dataframe into violinplot

Problem description

I have data I'm reading from an Excel spreadsheet. The data has a number of observations for each of six scenarios, S1 to S6. When I read in the data to my dataframe df, it looks as follows:

      Scenario        LMP
0           S1 -21.454544
1           S1 -20.778094
2           S1 -20.027689
3           S1 -19.747170
4           S1 -20.814405
5           S1 -21.955406
6           S1 -23.018960
...
12258       S6 -34.089906
12259       S6 -34.222814
12260       S6 -26.712010
12261       S6 -24.555973
12262       S6 -23.062616
12263       S6 -20.488411

I want to create a violinplot that has a different violin for each of the six scenarios. I'm new to Pandas and dataframes, and despite much research/testing over the last day, I can't quite figure out an elegant way to pass some reference(s) to my dataframe (to split it into different series for each scenario) that will work in the axes.violinplot() statement. For instance, I've tried the following, which doesn't work. I get a "ValueError: cannot copy sequence with size 1752 to array axis with dimension 2" on my axes.violinplot statement.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

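# load data into a dataframe
# (sheet_name/usecols replace the older sheetname/parse_cols arguments,
#  which newer pandas versions no longer accept)
df = pd.read_excel('Modeling analysis charts.xlsx',
                   sheet_name='lmps',
                   usecols=[7,12],
                   skiprows=0,
                   header=1)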

fontsize = 10

fig, axes = plt.subplots()

axes.violinplot(dataset = [[df.loc[df.Scenario == 'S1']],
                           [df.loc[df.Scenario == 'S2']],
                           [df.loc[df.Scenario == 'S3']],
                           [df.loc[df.Scenario == 'S4']],
                           [df.loc[df.Scenario == 'S5']],
                           [df.loc[df.Scenario == 'S6']]
                          ]
                )
axes.set_title('Day Ahead Market')

axes.yaxis.grid(True)
axes.set_xlabel('Scenario')
axes.set_ylabel('LMP ($/MWh)')

plt.show()

Recommended answer

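You need to be careful about how you build the dataset to plot. The code in the question passes a list of lists, each wrapping a whole filtered dataframe (with both the Scenario and LMP columns). What violinplot needs instead is simply a list of one-dimensional sequences, one per violin.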
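
To make the difference concrete, here is a small sketch on a made-up five-row frame (the values are illustrative only, not the asker's data) that compares what each approach hands to `violinplot` for scenario S1:

```python
import pandas as pd

# Toy long-format data in the same shape as the question's df (made-up values)
df = pd.DataFrame({
    "Scenario": ["S1", "S1", "S1", "S2", "S2"],
    "LMP": [-21.45, -20.78, -20.03, -34.09, -34.22],
})

# Question's entry for one violin: a list wrapping a whole two-column DataFrame
question_entry = [df.loc[df.Scenario == "S1"]]
print(len(question_entry), question_entry[0].shape)  # → 1 (3, 2)

# Answer's entry for one violin: a flat 1-D array holding only the LMP values
answer_entry = df[df.Scenario == "S1"]["LMP"].values
print(answer_entry.ndim, answer_entry.shape)  # → 1 (3,)
```

Each violin needs one flat sample of numbers; the question's entry instead nests a 2-D table (Scenario and LMP side by side) inside an extra list.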

You therefore also need to take only the "LMP" column from each filtered dataframe; otherwise violinplot would not know which column to plot.
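
As a minimal sketch on toy data (made-up values), the two-step selection `df[mask]["LMP"]` used in the example below and a single `.loc[mask, "LMP"]` call return the same 1-D values; the `.loc` form filters rows and picks the column in one step, which avoids chained indexing. `.to_numpy()` is the newer spelling of `.values`.

```python
import pandas as pd

# Toy data with interleaved scenarios (made-up values)
df = pd.DataFrame({
    "Scenario": ["S1", "S2", "S1", "S2"],
    "LMP": [-21.45, -34.09, -20.78, -34.22],
})

mask = df["Scenario"] == "S1"  # boolean row filter for one scenario

# Two steps: filter rows, then select the LMP column from the result
two_step = df[mask]["LMP"].to_numpy()

# One step: rows by boolean mask and column by label in a single .loc call
one_step = df.loc[mask, "LMP"].to_numpy()

print(one_step)
```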

Here is a working example which stays close to the original code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


x = np.random.poisson(lam =3, size=100)
y = np.random.choice(["S{}".format(i+1) for i in range(6)], size=len(x))
df = pd.DataFrame({"Scenario":y, "LMP":x})

fig, axes = plt.subplots()

axes.violinplot(dataset = [df[df.Scenario == 'S1']["LMP"].values,
                           df[df.Scenario == 'S2']["LMP"].values,
                           df[df.Scenario == 'S3']["LMP"].values,
                           df[df.Scenario == 'S4']["LMP"].values,
                           df[df.Scenario == 'S5']["LMP"].values,
                           df[df.Scenario == 'S6']["LMP"].values ] )

axes.set_title('Day Ahead Market')
axes.yaxis.grid(True)
axes.set_xlabel('Scenario')
axes.set_ylabel('LMP ($/MWh)')

plt.show()
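
The six near-identical filter lines above can also be generated from the data itself. This is a sketch, assuming the long-format `Scenario`/`LMP` columns shown in the question; the function name `violin_by_scenario` and the random sample data are hypothetical, and this `groupby` approach is an alternative to, not part of, the original answer. It also labels the x-axis with the scenario names, since `violinplot` places violins at positions 1..n by default.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def violin_by_scenario(df, group_col="Scenario", value_col="LMP"):
    """Draw one violin per scenario from a long-format DataFrame (hypothetical helper)."""
    labels, data = [], []
    # groupby yields (key, sub-DataFrame) pairs, sorted by key by default
    for name, group in df.groupby(group_col):
        labels.append(name)
        data.append(group[value_col].to_numpy())  # one 1-D array per violin

    fig, ax = plt.subplots()
    parts = ax.violinplot(data)  # violins are placed at x = 1..len(data)
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)  # show scenario names instead of 1..n
    ax.set_title("Day Ahead Market")
    ax.yaxis.grid(True)
    ax.set_xlabel("Scenario")
    ax.set_ylabel("LMP ($/MWh)")
    return fig, ax, parts, labels


# Hypothetical sample data: 50 random LMP values for each of S1..S6
rng = np.random.default_rng(0)
scenarios = np.repeat([f"S{i}" for i in range(1, 7)], 50)
sample = pd.DataFrame({"Scenario": scenarios,
                       "LMP": rng.normal(-25, 4, size=scenarios.size)})

fig, ax, parts, labels = violin_by_scenario(sample)
```

Call `plt.show()` afterwards to display the figure, as in the original code. Note that `groupby` sorts string keys lexicographically, so a name like `S10` would land before `S2`; build `labels` in an explicit order if that matters.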
