Convert array into percentiles

Problem description

I have an array that I want to convert to percentiles. For example, say I have a normally distributed array:

import numpy as np
import matplotlib.pyplot as plt

arr = np.random.normal(0, 1, 1000)
plt.hist(arr)

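For each value in that array, I want to calculate its percentile (e.g. 0 is the 50th percentile of the above distribution, so 0 -> 0.5). The result should be uniformly distributed, since each percentile should have equal weight.

I found np.percentile, but that function returns a value given an array and a quantile, whereas I need the reverse: a quantile given an array and a value.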
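To make the two directions concrete, here is a minimal sketch using only NumPy. The fixed seed and the variable names `median_value` and `fraction_below` are assumptions added for illustration; they are not part of the original question.

```python
import numpy as np

# Fixed seed so the example is reproducible (an assumption, not in the question)
rng = np.random.default_rng(0)
arr = rng.normal(0, 1, 1000)

# np.percentile goes from a percentile (in percent) to a value in the data range
median_value = np.percentile(arr, 50)

# The direction the question asks for: from a value to the fraction
# of the array that lies at or below it
fraction_below = np.mean(arr <= median_value)

print(median_value, fraction_below)
```

The second computation is what needs to be done for every element of the array, which is why a vectorized or library-based approach is preferable to a hand-written loop.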

Is there a relatively efficient way to do this?

Solution

import numpy as np
import pandas as pd
from scipy.stats import percentileofscore

# generate example data
arr = np.random.normal(0, 1, 10)

# pre-sort array
arr_sorted = sorted(arr)

# calculate percentiles using scipy func percentileofscore on each array element
s = pd.Series(arr)
percentiles = s.apply(lambda x: percentileofscore(arr_sorted, x))

Checking that the results are correct:

# put values and percentiles side by side; sorted by value, the percentiles should increase
df = pd.DataFrame({'data': s, 'percentiles': percentiles})
df.sort_values(by='data')

       data  percentiles
3 -1.692881         10.0
8 -1.395427         20.0
7 -1.162031         30.0
6 -0.568550         40.0
9  0.047298         50.0
5  0.296661         60.0
0  0.534816         70.0
4  0.542267         80.0
1  0.584766         90.0
2  1.185000        100.0
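Calling percentileofscore once per element scans the whole array each time, so the cost likely grows quadratically with the array length. Since the question asks for an efficient method, one possible alternative is to rank the values in a single pass with scipy.stats.rankdata. This is a sketch of a swapped-in technique, not the answer's own method; the fixed seed and the name `expected` are assumptions for illustration.

```python
import numpy as np
from scipy.stats import percentileofscore, rankdata

# Fixed seed so the example is reproducible (an assumption)
rng = np.random.default_rng(0)
arr = rng.normal(0, 1, 10)

# rankdata assigns ranks 1..n (ties get the average rank by default);
# dividing by n and scaling by 100 turns ranks into percentiles
percentiles = rankdata(arr) / len(arr) * 100

# Cross-check against the per-element approach from the answer
expected = np.array([percentileofscore(arr, x) for x in arr])
print(np.allclose(percentiles, expected))
```

Dividing by 100 gives fractions in (0, 1], matching the 0 -> 0.5 convention used in the question. For data already held in a pandas Series, `Series.rank(pct=True)` should give the same fractions.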
