How to compute percentiles from a frequency table?

Views: 727 | Published: 2020/5/18 22:13:52 | Tags: python numpy pandas statistics


Question

I have a CSV file:

fr id
 1 10000152
 1 10000212
 1 10000847
 1 10001018
 2 10001052
 2 10001246
14 10001908
...........

This is a frequency table, where id is an integer variable and fr is the number of occurrences of the given value. The file is sorted in ascending order by value. I would like to compute percentiles of the variable (i.e. 90%, 80%, 70% ... 10%).

I have done this in pure Python, similar to this pseudocode:

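# fr and id are the two columns of the table, read into lists
bucket = sum(fr) / 10.0          # number of observations per decile
percentile = 1                   # index of the next decile to report
running_total = 0
for current_fr, current_id in zip(fr, id):
    running_total += current_fr
    if running_total > percentile * bucket:
        print("%i percentile: %i" % (percentile * 10, current_id))
        percentile += 1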

But this code is very crude: it doesn't take into account that a percentile may fall between two values from the set, it can't step back, etc.

Is there a more elegant, general, ready-made solution?

Recommended answer

It seems you want the cumulative sum of fr. You can do

cumfr = [sum(fr[:i+1]) for i in range(len(fr))]
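The list comprehension above re-sums a prefix of fr for every row, so its cost grows quadratically with the number of rows. As a minimal sketch, the same running totals can be built in a single pass with itertools.accumulate from the standard library. The fr values below are only the sample rows visible in the question; the real file is longer.

```python
from itertools import accumulate

# Frequencies from the sample rows shown in the question (the file is truncated there).
fr = [1, 1, 1, 1, 2, 2, 14]

# Running total of fr, computed in one pass instead of re-summing each prefix.
cumfr = list(accumulate(fr))
print(cumfr)
```

The last element of cumfr is the total number of observations, which is what the next step divides by.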

Then the percentile rank of each value (the percentage of observations at or below it) is

percentile = [100.0 * c / cumfr[-1] for c in cumfr]
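The cumulative percentages give the percentile rank of every row, but the question asks for the value at each decile. One possible way to go from one to the other, sketched here with NumPy (one of the question's tags), is to look up each requested percentage in the cumulative percentages with np.searchsorted. The helper name percentiles_from_freq and the variable ids are hypothetical, and the data is only the sample shown in the question. This sketch returns an existing value from the table (the smallest one whose cumulative percentage reaches the target) and does not interpolate between values.

```python
import numpy as np

# Sample rows from the question; the real file is longer.
fr = np.array([1, 1, 1, 1, 2, 2, 14])
ids = np.array([10000152, 10000212, 10000847, 10001018,
                10001052, 10001246, 10001908])

def percentiles_from_freq(values, freqs, qs):
    """Hypothetical helper: for each q in qs (0 < q <= 100), return the
    smallest value whose cumulative percentage is at least q.
    Assumes values are sorted ascending, as in the question."""
    cum = np.cumsum(freqs)                 # running total of occurrences
    cum_pct = 100.0 * cum / cum[-1]        # cumulative percentage per row
    idx = np.searchsorted(cum_pct, qs, side="left")
    return values[idx]

qs = np.arange(10, 100, 10)                # 10, 20, ..., 90
result = percentiles_from_freq(ids, fr, qs)
for q, v in zip(qs, result):
    print("%i percentile: %i" % (q, v))
```

If interpolated percentiles are wanted and the expanded data fits in memory, an alternative is to rebuild the raw sample with np.repeat(ids, fr) and pass it to np.percentile, which interpolates linearly by default.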
