Python: Matplotlib - probability plot for several data sets


Question

I have several data sets (distributions) as follows:

set1 = [1,2,3,4,5]
set2 = [3,4,5,6,7]
set3 = [1,3,4,5,8]

How do I make a scatter plot of the data sets above, with the y-axis being the probability (i.e. the percentile of the distribution within each set, 0%-100%) and the x-axis being the data set names? In JMP, it is called a 'Quantile Plot'.

Something similar to the attached picture:

Please advise. Thanks.

My data is in a CSV file, like this:

Using the JMP analysis tool, I'm able to plot the probability distribution plot (a QQ plot / normal quantile plot, as in the figure far below):

I believe Joe Kington almost has my problem solved, but I'm wondering how to process the raw CSV data into arrays of probabilities or percentiles.

I'm doing this to automate some stats analysis in Python rather than depending on JMP for plotting.

Recommended answer

I'm not entirely clear on what you want, so I'm going to guess here...

Do you want the "Probability/Percentile" values to be a cumulative histogram?

So for a single plot, you'd have something like this? (Plotting it with markers as you've shown above, instead of the more traditional step plot...)

import scipy.stats
import numpy as np
import matplotlib.pyplot as plt

# 100 values from a normal distribution with a std of 3 and a mean of 0.5
data = 3.0 * np.random.randn(100) + 0.5

counts, start, dx, _ = scipy.stats.cumfreq(data, numbins=20)
x = np.arange(counts.size) * dx + start

plt.plot(x, counts, 'ro')
plt.xlabel('Value')
plt.ylabel('Cumulative Frequency')

plt.show()
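
Since the question asks for a y-axis running from 0% to 100%, the cumulative counts can be rescaled to percentages. The following is a minimal sketch under my own assumptions, not part of the original answer: it uses np.histogram and np.cumsum in place of scipy.stats.cumfreq, a seeded random generator so the result is reproducible, and the upper edge of each bin as the x position.

```python
import numpy as np

# Assumption: same kind of synthetic data as above, but seeded.
rng = np.random.default_rng(0)
data = 3.0 * rng.standard_normal(100) + 0.5

# Count the values in each of 20 equal-width bins.
counts, edges = np.histogram(data, bins=20)

# The running total divided by the sample size gives the cumulative percent.
cum_percent = 100.0 * np.cumsum(counts) / counts.sum()

# Each cumulative value applies to everything up to the bin's upper edge.
x = edges[1:]
```

Passing x and cum_percent to plt.plot(x, cum_percent, 'ro') would then give the same kind of marker plot with a 0-100 y-axis.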

If that's roughly what you want for a single plot, there are several ways of putting multiple plots on one figure. The easiest is just to use subplots.

Here, we'll generate some datasets and plot them on different subplots with different symbols...

import itertools
import scipy.stats
import numpy as np
import matplotlib.pyplot as plt

# Generate some data... (Using a list to hold it so that the datasets don't 
# have to be the same length...)
numdatasets = 4
stds = np.random.randint(1, 10, size=numdatasets)
means = np.random.randint(-5, 5, size=numdatasets)
values = [std * np.random.randn(100) + mean for std, mean in zip(stds, means)]

# Set up several subplots
fig, axes = plt.subplots(nrows=1, ncols=numdatasets, figsize=(12,6))

# Set up some colors and markers to cycle through...
colors = itertools.cycle(['b', 'g', 'r', 'c', 'm', 'y', 'k'])
markers = itertools.cycle(['o', '^', 's', r'$\Phi$', 'h'])

# Now let's actually plot our data...
for ax, data, color, marker in zip(axes, values, colors, markers):
    counts, start, dx, _ = scipy.stats.cumfreq(data, numbins=20)
    x = np.arange(counts.size) * dx + start
    ax.plot(x, counts, color=color, marker=marker, 
            markersize=10, linestyle='none')

# Next we'll set the various labels...
axes[0].set_ylabel('Cumulative Frequency')
labels = ['This', 'That', 'The Other', 'And Another']
for ax, label in zip(axes, labels):
    ax.set_xlabel(label)

plt.show()

If we want this to look like one continuous plot, we can just squeeze the subplots together and turn off some of the boundaries. Just add the following before calling plt.show():

# Because we want this to look like a continuous plot, we need to hide the
# boundaries (a.k.a. "spines") and yticks on most of the subplots
for ax in axes[1:]:
    ax.spines['left'].set_color('none')
    ax.spines['right'].set_color('none')
    ax.yaxis.set_ticks([])
axes[0].spines['right'].set_color('none')

# To reduce clutter, let's leave off the first and last x-ticks.
for ax in axes:
    xticks = ax.get_xticks()
    ax.set_xticks(xticks[1:-1])

# Now, we'll "scrunch" all of the subplots together, so that they look like one
fig.subplots_adjust(wspace=0)
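
Once the y ticks are hidden on all but the first subplot, the panels only read as one plot if they share the same y limits. A hedged sketch of one way to ensure that is shown below; it is my own illustration rather than the original answer's code. It assumes the three sets from the question, passes sharey=True to plt.subplots, and uses a simple "percent of points at or below each value" for the y-axis.

```python
import numpy as np
import matplotlib.pyplot as plt

# Data sets from the question; the x labels are simply their names.
datasets = {
    "set1": [1, 2, 3, 4, 5],
    "set2": [3, 4, 5, 6, 7],
    "set3": [1, 3, 4, 5, 8],
}

# sharey=True keeps the same y limits on every subplot.
fig, axes = plt.subplots(nrows=1, ncols=len(datasets), sharey=True,
                         figsize=(9, 4))

for ax, (name, values) in zip(axes, datasets.items()):
    x = np.sort(values)
    # Cumulative percent of points at or below each sorted value.
    y = 100.0 * np.arange(1, x.size + 1) / x.size
    ax.plot(x, y, marker="o", linestyle="none")
    ax.set_xlabel(name)

# Hide the inner boundaries and tick marks, similar to the snippet above.
for ax in axes[1:]:
    ax.spines["left"].set_visible(False)
    ax.tick_params(left=False)
for ax in axes[:-1]:
    ax.spines["right"].set_visible(False)

axes[0].set_ylabel("Cumulative percent")
fig.subplots_adjust(wspace=0)
```

Call plt.show() afterwards to display the figure.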

Hope that helps!

If you want percentile values instead of a cumulative histogram (I really shouldn't have used 100 as the sample size!), it's easy to do.

Just do something like this (using numpy.percentile instead of normalizing things by hand):

# Replacing the for loop from before...
plot_percentiles = range(0, 110, 10)
for ax, data, color, marker in zip(axes, values, colors, markers):
    x = np.percentile(data, plot_percentiles)
    ax.plot(x, plot_percentiles, color=color, marker=marker, 
            markersize=10, linestyle='none')
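
As for turning raw CSV data into arrays of percentiles, here is a rough sketch. The layout of the CSV file is not shown in the question, so the format here (one column per data set, with a header row) is purely an assumption, as is the helper name percent_positions. Each sorted value is assigned the plotting position (i - 0.5) / n, a common convention often called the Hazen position; other conventions exist and JMP may use a different one.

```python
import io
import numpy as np

# Hypothetical CSV layout: one column per data set, with a header row.
csv_text = """set1,set2,set3
1,3,1
2,4,3
3,5,4
4,6,5
5,7,8
"""

# names=True reads the header row and returns a structured array.
table = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

def percent_positions(values):
    """Return the sorted values and their plotting positions in percent."""
    values = np.asarray(values, dtype=float)
    values = np.sort(values[~np.isnan(values)])  # drop missing entries
    n = values.size
    # Plotting position (i - 0.5) / n for i = 1..n, scaled to 0-100.
    positions = 100.0 * (np.arange(1, n + 1) - 0.5) / n
    return values, positions

# Map each column name to its (sorted values, percent positions) pair.
results = {name: percent_positions(table[name]) for name in table.dtype.names}
```

Each pair in results can then be passed to ax.plot(x, y, ...) in place of the np.percentile output above. For a real file, replace io.StringIO(csv_text) with the file path.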
