How to calculate coskew and cokurtosis
Question
You can calculate skew and kurtosis with built-in pandas methods such as df.skew() and df.kurtosis().
However, there is no convenient way to calculate the coskew or cokurtosis between variables or, even better, the coskew or cokurtosis matrix.
Consider the pd.DataFrame df:
import pandas as pd
import numpy as np
np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(10, 2), columns=list('ab'))
df
          a         b
0  0.444939  0.407554
1  0.460148  0.465239
2  0.462691  0.016545
3  0.850445  0.817744
4  0.777962  0.757983
5  0.934829  0.831104
6  0.879891  0.926879
7  0.721535  0.117642
8  0.145906  0.199844
9  0.437564  0.100702
How do I calculate the coskew and cokurtosis of a and b?
Recommended answer
References
- Coskewness
- Cokurtosis
My interpretation of coskew is the "correlation" between one series and the variance of another. As such, there are actually two types of coskew, depending on which series we calculate the variance of. Wikipedia shows these two formulas:
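'left': $S(X, X, Y) = \frac{E[(X - E[X])^2 (Y - E[Y])]}{\sigma_X^2 \sigma_Y}$
'right': $S(X, Y, Y) = \frac{E[(X - E[X]) (Y - E[Y])^2]}{\sigma_X \sigma_Y^2}$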
Fortunately, when we calculate the coskew matrix, one is the transpose of the other.
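The transpose relationship can be checked directly from the two definitions. The sketch below is only an illustration under stated assumptions: the helper names coskew_left and coskew_right and the random test data are hypothetical and not part of the answer, and the values are the biased (uncorrected) estimates.

```python
import numpy as np

def coskew_left(x, y):
    # 'left' variant: x is squared, E[(x - mx)^2 (y - my)] / (sx^2 * sy)
    dx, dy = x - x.mean(), y - y.mean()
    return np.mean(dx ** 2 * dy) / (x.std() ** 2 * y.std())

def coskew_right(x, y):
    # 'right' variant: y is squared, E[(x - mx) (y - my)^2] / (sx * sy^2)
    dx, dy = x - x.mean(), y - y.mean()
    return np.mean(dx * dy ** 2) / (x.std() * y.std() ** 2)

rng = np.random.default_rng(0)
data = rng.random((50, 3))  # illustrative data: 50 rows, 3 columns
n = data.shape[1]

# Entry (i, j) pairs column i (rows of the matrix) with column j (columns)
left = np.array([[coskew_left(data[:, i], data[:, j]) for j in range(n)]
                 for i in range(n)])
right = np.array([[coskew_right(data[:, i], data[:, j]) for j in range(n)]
                  for i in range(n)])

print(np.allclose(right, left.T))  # True
```

Because swapping the roles of the two series is the same as swapping row and column, only one of the two matrices needs to be computed.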
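def coskew(df, bias=False):
    v = df.values
    # population standard deviation of each column (ddof=0), shape 1 x n
    s1 = sigma = v.std(0, keepdims=True)
    means = v.mean(0, keepdims=True)
    # means is 1 x n (n is the number of columns),
    # so this difference broadcasts appropriately
    v1 = v - means
    s2 = sigma ** 2
    v2 = v1 ** 2
    m = v.shape[0]
    # entry (i, j) is mean(v1_i ** 2 * v1_j) / (sigma_i ** 2 * sigma_j)
    skew = pd.DataFrame(v2.T.dot(v1) / s2.T.dot(s1) / m, df.columns, df.columns)
    if not bias:
        # sample-size correction; with it the diagonal matches df.skew()
        skew *= ((m - 1) * m) ** .5 / (m - 2)
    return skew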
Demonstration
coskew(df)
          a         b
a -0.369380  0.096974
b  0.325311  0.067020
We can compare this to df.skew() and check that the diagonals are the same:
df.skew()
a -0.36938
b 0.06702
dtype: float64
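The diagonals agree because the bias=False branch rescales the biased skew by sqrt(m (m - 1)) / (m - 2). As a minimal sketch of that step in isolation (the series s and the names g1 and G1 are illustrative assumptions, not part of the answer):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
s = pd.Series(rng.random(20))  # illustrative data
m = len(s)

d = s.to_numpy() - s.mean()
# Biased skew: third central moment over the 1.5 power of the second
g1 = np.mean(d ** 3) / np.mean(d ** 2) ** 1.5
# Same sample-size correction as in coskew(..., bias=False)
G1 = g1 * ((m - 1) * m) ** 0.5 / (m - 2)

print(np.isclose(G1, s.skew()))  # True
```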
Calculating cokurtosis
My interpretation of cokurtosis is one of two things:
- the "correlation" between a series and the skew of another
- the "correlation" between the variances of two series
For option 1, we again have both a left and a right variant which, in matrix form, are transposes of one another, so we will only focus on the left variant. Together with option 2 (the 'middle' variant), that leaves us with a total of two variations to calculate.
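'left': $K(X, X, X, Y) = \frac{E[(X - E[X])^3 (Y - E[Y])]}{\sigma_X^3 \sigma_Y}$
'middle': $K(X, X, Y, Y) = \frac{E[(X - E[X])^2 (Y - E[Y])^2]}{\sigma_X^2 \sigma_Y^2}$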
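For a single pair of series, the 'left' and 'middle' variants can be sketched as below. This is an illustration only: cokurt_pair is a hypothetical helper, the data is random, and the values are biased and non-excess (no bias correction, no subtraction of 3).

```python
import numpy as np

def cokurt_pair(x, y, variant="middle"):
    """Biased, non-excess cokurtosis of two 1-D arrays (hypothetical helper)."""
    dx, dy = x - x.mean(), y - y.mean()
    sx, sy = x.std(), y.std()
    if variant == "left":
        # series vs. the "skew" of another: x enters cubed
        return np.mean(dx ** 3 * dy) / (sx ** 3 * sy)
    if variant == "middle":
        # "correlation" between the two variances: both enter squared
        return np.mean(dx ** 2 * dy ** 2) / (sx ** 2 * sy ** 2)
    raise ValueError(f"unknown variant: {variant!r}")

rng = np.random.default_rng(2)
x = rng.random(100)  # illustrative data
y = rng.random(100)

# Ordinary (biased, non-excess) kurtosis of x for comparison
kurt_x = np.mean((x - x.mean()) ** 4) / x.std() ** 4

print(np.isclose(cokurt_pair(x, x, "left"), kurt_x))    # True
print(np.isclose(cokurt_pair(x, x, "middle"), kurt_x))  # True
print(np.isclose(cokurt_pair(x, y, "middle"),
                 cokurt_pair(y, x, "middle")))          # True
```

Both variants reduce to the ordinary kurtosis when the two series are the same, and the 'middle' variant is symmetric in its arguments, which is why its matrix needs no left/right distinction.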
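def cokurt(df, bias=False, fisher=True, variant='middle'):
    v = df.values
    # population standard deviation of each column (ddof=0), shape 1 x n
    s1 = sigma = v.std(0, keepdims=True)
    means = v.mean(0, keepdims=True)
    # means is 1 x n (n is the number of columns),
    # so this difference broadcasts appropriately
    v1 = v - means
    s2 = sigma ** 2
    s3 = sigma ** 3
    v2 = v1 ** 2
    v3 = v1 ** 3
    m = v.shape[0]
    if variant in ['left', 'right']:
        kurt = pd.DataFrame(v3.T.dot(v1) / s3.T.dot(s1) / m, df.columns, df.columns)
        if variant == 'right':
            kurt = kurt.T
    elif variant == 'middle':
        kurt = pd.DataFrame(v2.T.dot(v2) / s2.T.dot(s2) / m, df.columns, df.columns)
    else:
        raise ValueError("variant must be 'left', 'right' or 'middle'")
    if not bias:
        # sample-size correction; the result is an excess (Fisher) value
        kurt = kurt * (m ** 2 - 1) / (m - 2) / (m - 3) - 3 * (m - 1) ** 2 / (m - 2) / (m - 3)
    else:
        # fix: also convert the biased value to excess form, so that
        # `fisher` means the same thing for both settings of `bias`
        kurt = kurt - 3
    if not fisher:
        kurt += 3
    return kurt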
Demonstration
cokurt(df, variant='middle', bias=False, fisher=False)
          a        b
a  1.882817  0.86649
b  0.866490  1.63200
cokurt(df, variant='left', bias=False, fisher=False)
          a        b
a  1.882817  0.19175
b -0.020567  1.63200
The diagonal should be equal to the kurtosis:
df.kurtosis() + 3
a 1.882817
b 1.632000
dtype: float64
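The match on the diagonal comes from the bias=False correction, which turns the biased, non-excess kurtosis into the unbiased excess value. A minimal sketch of that correction on one series, assuming illustrative random data and the hypothetical names b2 and G2:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
s = pd.Series(rng.random(20))  # illustrative data
m = len(s)

d = s.to_numpy() - s.mean()
# Biased, non-excess kurtosis: fourth central moment over squared variance
b2 = np.mean(d ** 4) / np.mean(d ** 2) ** 2
# Same correction as in cokurt(..., bias=False); the result is excess kurtosis
G2 = b2 * (m ** 2 - 1) / (m - 2) / (m - 3) - 3 * (m - 1) ** 2 / (m - 2) / (m - 3)

print(np.isclose(G2, s.kurt()))  # True
```

Adding 3 to G2 gives the non-excess value, which is what the fisher=False demonstration compares against df.kurtosis() + 3.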