Kurtosis, Skewness of a bar graph? - Python


Question

What is an efficient method for determining the skew/kurtosis of a bar graph in Python? Because bar graphs are not binned (unlike histograms), the question may seem ill-posed, but what I am trying to do is determine the symmetry of a graph's height vs. distance (rather than frequency vs. bins). In other words, given heights (y) measured along distance (x), i.e.

y = [6.18, 10.23, 33.15, 55.25, 84.19, 91.09, 106.6, 105.63, 114.26, 134.24, 137.44, 144.61, 143.14, 150.73, 156.44, 155.71, 145.88, 120.77, 99.81, 85.81, 55.81, 49.81, 37.81, 25.81, 5.81]
x = [0.03, 0.08, 0.14, 0.2, 0.25, 0.31, 0.36, 0.42, 0.48, 0.53, 0.59, 0.64, 0.7, 0.76, 0.81, 0.87, 0.92, 0.98, 1.04, 1.09, 1.15, 1.2, 1.26, 1.32, 1.37]

What are the symmetry (skewness) and peakedness (kurtosis) of that height (y) distribution as measured over distance (x)? Are skewness and kurtosis appropriate measures for judging how close real-valued data are to a normal distribution? Or does scipy/numpy offer something similar for this type of measurement?

I can get a skew/kurtosis estimate of the height (y) frequency values binned along distance (x) with the following:

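from itertools import chain
from scipy import stats
from matplotlib.pyplot import hist, xlabel, ylabel

# repeat each distance x_v round(y_v) times to turn heights into frequencies
freq = list(chain(*[[x_v] * int(round(y_v)) for x_v, y_v in zip(x, y)]))
bins = x + [x[-1] + x[0]]  # add one extra bin edge without mutating x
hist(freq, bins=bins)
ylabel("Height Frequency")
xlabel("Distance(km) Bins")
print("Skewness, Kurtosis:", stats.describe(freq)[4:])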

Skewness, Kurtosis: (-0.019354300509997705, -0.7447085398785758)

In this case the height distribution is nearly symmetrical (skew ≈ -0.02) around the midpoint distance and platykurtic (excess kurtosis ≈ -0.74, i.e. broad).

Because I repeat each x value as many times as its height y to create a frequency, the resulting list can sometimes get very large. Is there a better way to approach this problem? I suppose I could always normalize dataset y to a range of perhaps 0 - 100 without losing too much information on the dataset's skew/kurtosis.

Recommended answer

This isn't a Python question, nor is it really a programming question, but the answer is simple nonetheless. Instead of skew and kurtosis, let's first consider the easier quantities based on the lower moments: the mean and the standard deviation. To make it concrete, and to fit your question, let's assume your data looks like:

X = 3, 3, 5, 5, 5, 7 = x1, x2, x3 ....

Which would give a "bar graph" that looks like:

{3:2, 5:3, 7:1} = {k1:p1, k2:p2, k3:p3}
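As a small illustration of this value-to-count mapping, the sketch below collapses the toy data into its "bar graph" form using the standard library. The variable names `X` and `bars` are chosen here for illustration only.

```python
from collections import Counter

# Toy data from the answer: raw observations with repeats
X = [3, 3, 5, 5, 5, 7]

# Collapse to {value: count}, i.e. {k1: p1, k2: p2, k3: p3}
bars = dict(Counter(X))
print(bars)
```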

The mean, u, is given by

E[X] = (1/N) * (x1 + x2 + x3 + ...) = (1/N) * (3 + 3 + 5 + ...)

Our data, however, has repeated values, so with N = p1 + p2 + ... = 6 as the total count this can be rewritten as

E[X] = (1/N) * (p1*k1 + p2*k2 + ...) = (1/N) * (3*2 + 5*3 + 7*1)
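A minimal numerical check of this rewrite, assuming NumPy is available. The names `k`, `p`, `mean_weighted` and `mean_expanded` are illustrative. The weighted form should agree with the mean of the fully expanded data.

```python
import numpy as np

k = np.array([3.0, 5.0, 7.0])  # distinct values (bar positions)
p = np.array([2.0, 3.0, 1.0])  # counts (bar heights)

N = p.sum()                        # total count, 6 here
mean_weighted = (p * k).sum() / N  # (1/N) * (p1*k1 + p2*k2 + ...)

# Same quantity from the expanded data 3, 3, 5, 5, 5, 7
mean_expanded = np.repeat(k, p.astype(int)).mean()
print(mean_weighted, mean_expanded)
```

`np.average(k, weights=p)` computes the same weighted mean in a single call.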

The next term, the standard deviation, s, is simply

s = sqrt(E[(X-u)^2]) = sqrt((1/N)*( (x1-u)^2 + (x2-u)^2 + ...))

But we can apply the same reduction to the E[(X-u)^2] term and write it as

E[(X-u)^2] = (1/N)*( p1*(k1-u)^2 + p2*(k2-u)^2 + ... )
           = (1/6)*( 2*(3-u)^2 + 3*(5-u)^2 + 1*(7-u)^2 )

This means we don't have to keep multiple copies of each data item, as you did in your question.
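The same reduction for the second central moment can be sketched as follows, again assuming NumPy. The variable names are illustrative. The expanded array is built only to confirm that both routes give the same variance.

```python
import numpy as np

k = np.array([3.0, 5.0, 7.0])  # distinct values
p = np.array([2.0, 3.0, 1.0])  # counts

N = p.sum()
u = (p * k).sum() / N                # weighted mean
var = (p * (k - u) ** 2).sum() / N   # E[(X-u)^2] from counts only
s = np.sqrt(var)                     # standard deviation

# Cross-check against the expanded data (population variance, ddof=0)
expanded = np.repeat(k, p.astype(int))
print(var, expanded.var(), s)
```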

The skew and kurtosis are quite simple at this point:

skew     = E[(X-u)^3] / (E[(X-u)^2])^(3/2)
kurtosis = ( E[(X-u)^4] / (E[(X-u)^2])^2 ) - 3
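Putting the pieces together, here is a hedged sketch that applies these formulas to the x and y lists from the question. The helper name `weighted_skew_kurtosis` is hypothetical, not a library function. It also assumes the heights can be used directly as real-valued weights without rounding, which goes slightly beyond the integer counts used above, so the result will differ a little from the rounded-frequency estimate in the question.

```python
import numpy as np

def weighted_skew_kurtosis(k, p):
    """Return (skew, excess kurtosis) of positions k weighted by heights p.

    Hypothetical helper for illustration; assumes non-negative weights.
    """
    k = np.asarray(k, dtype=float)
    p = np.asarray(p, dtype=float)
    N = p.sum()                    # total weight
    u = (p * k).sum() / N          # weighted mean
    d = k - u
    m2 = (p * d ** 2).sum() / N    # E[(X-u)^2]
    m3 = (p * d ** 3).sum() / N    # E[(X-u)^3]
    m4 = (p * d ** 4).sum() / N    # E[(X-u)^4]
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

y = [6.18, 10.23, 33.15, 55.25, 84.19, 91.09, 106.6, 105.63, 114.26,
     134.24, 137.44, 144.61, 143.14, 150.73, 156.44, 155.71, 145.88,
     120.77, 99.81, 85.81, 55.81, 49.81, 37.81, 25.81, 5.81]
x = [0.03, 0.08, 0.14, 0.2, 0.25, 0.31, 0.36, 0.42, 0.48, 0.53, 0.59,
     0.64, 0.7, 0.76, 0.81, 0.87, 0.92, 0.98, 1.04, 1.09, 1.15, 1.2,
     1.26, 1.32, 1.37]

skew, kurt = weighted_skew_kurtosis(x, y)
print("Skewness, Kurtosis:", skew, kurt)
```

No expanded list is built, so memory use stays proportional to the number of bars regardless of how large the heights are. For integer counts, the result can be cross-checked against `scipy.stats.skew` and `scipy.stats.kurtosis` on the expanded data, since their defaults use the same biased moment estimates and report excess kurtosis.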
