How to calculate coskew and cokurtosis


Question

You can calculate skew and kurtosis with the built-in pandas methods, such as df.skew() and df.kurtosis().

However, there is no convenient way to calculate the coskew or cokurtosis between variables, or, even better, the full coskew or cokurtosis matrix.

Consider the pd.DataFrame df

import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(10, 2), columns=list('ab'))

df

          a         b
0  0.444939  0.407554
1  0.460148  0.465239
2  0.462691  0.016545
3  0.850445  0.817744
4  0.777962  0.757983
5  0.934829  0.831104
6  0.879891  0.926879
7  0.721535  0.117642
8  0.145906  0.199844
9  0.437564  0.100702

How do I calculate the coskew and cokurtosis of a and b?

Recommended answer

References

  • Coskewness
  • Cokurtosis

My interpretation of coskew is the "correlation" between one series and the variance of another. As such, there are actually two types of coskew, depending on which series we take the variance of. Wikipedia shows these two formulas:

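'left'

$$S(X, X, Y) = \frac{E\left[(X - E[X])^2 \, (Y - E[Y])\right]}{\sigma_X^2 \, \sigma_Y}$$

'right'

$$S(X, Y, Y) = \frac{E\left[(X - E[X]) \, (Y - E[Y])^2\right]}{\sigma_X \, \sigma_Y^2}$$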
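To make the two variants concrete, here is a minimal pairwise sketch. The helper name coskew_pair and the synthetic random data are assumptions made for illustration only; they are not part of the answer. It computes the biased (population) value, squaring the deviations of the first argument, so swapping the arguments switches between 'left' and 'right'.

```python
import numpy as np

def coskew_pair(x, y):
    """Biased 'left' coskew of two 1-D arrays: x is squared, y is not."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # np.std defaults to ddof=0, i.e. the population standard deviation
    return np.mean(dx ** 2 * dy) / (x.std() ** 2 * y.std())

# synthetic data, for illustration only
rng = np.random.RandomState(0)
x = rng.rand(50)
y = rng.rand(50)

left = coskew_pair(x, y)    # deviations of x are squared
right = coskew_pair(y, x)   # deviations of y are squared
same = coskew_pair(x, x)    # reduces to the biased skewness of x

print(left, right, same)
```

When both arguments are the same series, the expression collapses to the ordinary (biased) skewness, which is why the diagonal of a coskew matrix can be compared against a skewness routine.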

Fortunately, when we calculate the coskew matrix, one is the transpose of the other.
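The transpose relationship can be checked numerically. The sketch below is an assumption-laden illustration (synthetic data, three made-up variables, biased estimates) that builds both matrices independently with np.einsum and compares them.

```python
import numpy as np

# synthetic data, for illustration only: 20 observations of 3 variables
rng = np.random.RandomState(0)
v = rng.rand(20, 3)

d = v - v.mean(axis=0)   # centered columns
s = v.std(axis=0)        # population standard deviation of each column
m = v.shape[0]

# left[i, j]: deviations of column i squared, column j unsquared
left = np.einsum('ki,kj->ij', d ** 2, d) / np.outer(s ** 2, s) / m
# right[i, j]: column i unsquared, deviations of column j squared
right = np.einsum('ki,kj->ij', d, d ** 2) / np.outer(s, s ** 2) / m

print(np.allclose(left, right.T))
```

Because the two matrices differ only by a transpose, computing one of them is enough.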

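def coskew(df, bias=False):
    """Return the 'left' coskew matrix of the columns of df.

    Entry [i, j] uses the squared deviations of column i and the
    plain deviations of column j. The 'right' matrix is the transpose.
    """
    v = df.values
    # population standard deviation (ddof=0) of each column, shape 1 x n
    s1 = sigma = v.std(0, keepdims=True)
    means = v.mean(0, keepdims=True)

    # means is 1 x n (n is the number of columns),
    # so this difference broadcasts appropriately
    v1 = v - means

    s2 = sigma ** 2

    v2 = v1 ** 2

    m = v.shape[0]

    # entry [i, j] is mean(v1[:, i] ** 2 * v1[:, j]) / (sigma_i ** 2 * sigma_j)
    skew = pd.DataFrame(v2.T.dot(v1) / s2.T.dot(s1) / m, df.columns, df.columns)

    if not bias:
        # same sample-size adjustment that df.skew() applies
        skew *= ((m - 1) * m) ** .5 / (m - 2)

    return skew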

Demonstration

coskew(df)

          a         b
a -0.369380  0.096974
b  0.325311  0.067020

We can compare this to df.skew() and check that the diagonal entries are the same

df.skew()

a   -0.36938
b    0.06702
dtype: float64
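The diagonal matches because the bias=False branch applies the same sample-size adjustment that pandas uses for skewness. A small standalone check, on synthetic data chosen purely for illustration, might look like this:

```python
import numpy as np
import pandas as pd

# synthetic data, for illustration only
rng = np.random.RandomState(0)
x = pd.Series(rng.rand(30))
n = len(x)

d = x.values - x.values.mean()
g1 = np.mean(d ** 3) / np.mean(d ** 2) ** 1.5   # biased skewness
G1 = g1 * np.sqrt(n * (n - 1)) / (n - 2)        # adjusted for sample size

print(np.isclose(G1, x.skew()))
```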

Calculating cokurtosis

My interpretation of cokurtosis is one of two:

  1. "correlation" between a series and the skew of another
  2. "correlation" between the variances of two series

For option 1, we again have both a left and a right variant that, in matrix form, are transposes of one another. So we will focus only on the left variant. That leaves us with a total of two variations to calculate.

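'left'

$$K(X, X, X, Y) = \frac{E\left[(X - E[X])^3 \, (Y - E[Y])\right]}{\sigma_X^3 \, \sigma_Y}$$

'middle'

$$K(X, X, Y, Y) = \frac{E\left[(X - E[X])^2 \, (Y - E[Y])^2\right]}{\sigma_X^2 \, \sigma_Y^2}$$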
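As a rough pairwise illustration of these two variants, the sketch below computes the biased, non-excess value for a pair of series. The helper name cokurt_pair and the synthetic data are hypothetical and exist only for this example.

```python
import numpy as np

def cokurt_pair(x, y, variant='middle'):
    """Biased, non-excess cokurtosis of two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    sx, sy = x.std(), y.std()   # population standard deviations (ddof=0)
    if variant == 'left':
        # cubed deviations of x against plain deviations of y
        return np.mean(dx ** 3 * dy) / (sx ** 3 * sy)
    if variant == 'middle':
        # squared deviations of both series
        return np.mean(dx ** 2 * dy ** 2) / (sx ** 2 * sy ** 2)
    raise ValueError("variant must be 'left' or 'middle'")

# synthetic data, for illustration only
rng = np.random.RandomState(0)
x = rng.rand(50)
y = rng.rand(50)

print(cokurt_pair(x, y, 'left'), cokurt_pair(x, y, 'middle'))
```

The 'middle' variant is symmetric in its arguments, while 'left' is not; with the same series in both slots, both reduce to the ordinary biased kurtosis.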

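def cokurt(df, bias=False, fisher=True, variant='middle'):
    """Return a cokurtosis matrix of the columns of df.

    variant is one of 'left', 'right' or 'middle'.
    fisher=True returns excess values (3 subtracted), as df.kurtosis() does.
    """
    v = df.values
    # population standard deviation (ddof=0) of each column, shape 1 x n
    s1 = sigma = v.std(0, keepdims=True)
    means = v.mean(0, keepdims=True)

    # means is 1 x n (n is the number of columns),
    # so this difference broadcasts appropriately
    v1 = v - means

    s2 = sigma ** 2
    s3 = sigma ** 3

    v2 = v1 ** 2
    v3 = v1 ** 3

    m = v.shape[0]

    if variant in ['left', 'right']:
        # entry [i, j]: cubed deviations of column i against column j
        kurt = pd.DataFrame(v3.T.dot(v1) / s3.T.dot(s1) / m, df.columns, df.columns)
        if variant == 'right':
            kurt = kurt.T
    elif variant == 'middle':
        # entry [i, j]: squared deviations of column i against squared deviations of column j
        kurt = pd.DataFrame(v2.T.dot(v2) / s2.T.dot(s2) / m, df.columns, df.columns)
    else:
        raise ValueError("variant must be 'left', 'right' or 'middle'")

    if bias:
        # biased estimate: subtract 3 to get the excess (Fisher) value
        kurt = kurt - 3
    else:
        # sample-size adjustment used by df.kurtosis(); the result is an excess value
        kurt = kurt * (m ** 2 - 1) / (m - 2) / (m - 3) - 3 * (m - 1) ** 2 / (m - 2) / (m - 3)
    if not fisher:
        # convert from Fisher (excess) back to Pearson
        kurt += 3

    return kurt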

Demonstration

cokurt(df, variant='middle', bias=False, fisher=False)

          a        b
a  1.882817  0.86649
b  0.866490  1.63200

cokurt(df, variant='left', bias=False, fisher=False)

          a        b
a  1.882817  0.19175
b -0.020567  1.63200

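The diagonal should equal the kurtosis. df.kurtosis() returns excess kurtosis, so we add 3 to compare against the fisher=False output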

df.kurtosis() + 3

a    1.882817
b    1.632000
dtype: float64
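The agreement on the diagonal comes from the bias=False adjustment, which appears to be algebraically the same as the sample-size correction pandas applies for kurtosis. A standalone check on synthetic data (an assumption made for illustration) could be sketched as:

```python
import numpy as np
import pandas as pd

# synthetic data, for illustration only
rng = np.random.RandomState(0)
x = pd.Series(rng.rand(30))
n = len(x)

d = x.values - x.values.mean()
raw = np.mean(d ** 4) / np.mean(d ** 2) ** 2   # biased, non-excess kurtosis

# same adjustment as the bias=False branch; the result is excess kurtosis
adjusted = (raw * (n ** 2 - 1) / (n - 2) / (n - 3)
            - 3 * (n - 1) ** 2 / (n - 2) / (n - 3))

print(np.isclose(adjusted, x.kurt()))
```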
