Calculating percentile rank in MySQL


Question

I have a very big table of measurement data in MySQL and I need to compute the percentile rank for every one of these values. Oracle appears to have a function called percent_rank, but I can't find anything similar for MySQL. Sure, I could just brute-force it in Python, which I use anyway to populate the table, but I suspect that would be quite inefficient because one sample might have 200,000 observations.

Recommended answer

This is a relatively ugly answer, and I feel guilty saying it. That said, it might help you with your issue.

One way to determine the percentage would be to count all of the rows, and then count the rows whose value is greater than the number you provided. You can count either the rows greater than or the rows less than that number and take the inverse as necessary.

Create an index on your number column. total = select count(*); less_equal = select count(*) where value <= indexed_number;

The percentage would be something like: less_equal / total (the share of rows at or below your number) or (total - less_equal) / total (the share of rows above it).
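The two counts and the ratio above can be sketched in Python as follows. This is only an illustration under stated assumptions: the table name measurements, the column name value, the index name, and the helper percent_at_or_below are all hypothetical, and the standard-library sqlite3 module stands in for MySQL so that the example runs on its own. The two COUNT(*) statements are plain SQL, but they have not been verified against a MySQL server here.

```python
import sqlite3

# Hypothetical schema: a `measurements` table with an indexed `value` column.
# SQLite is used only as a self-contained stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL NOT NULL)")
conn.execute("CREATE INDEX idx_measurements_value ON measurements (value)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(v,) for v in (10.0, 20.0, 20.0, 30.0, 40.0)],
)


def percent_at_or_below(conn, number):
    """Return the fraction of rows whose value is <= number, or None if the table is empty."""
    # First query: total number of rows.
    total = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
    if total == 0:
        return None
    # Second query: rows at or below the given number (can use the index on value).
    less_equal = conn.execute(
        "SELECT COUNT(*) FROM measurements WHERE value <= ?", (number,)
    ).fetchone()[0]
    return less_equal / total


fraction = percent_at_or_below(conn, 20.0)
print(fraction)      # share of rows at or below 20.0 (3 of the 5 sample rows)
print(1 - fraction)  # the inverse: share of rows above 20.0
```

Multiply the result by 100 if a percentage rather than a fraction is wanted.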

Make sure that both queries are using the index that you created. If they are not, tweak them until they are. The EXPLAIN output should show "Using index" in the Extra column (the rightmost one). In the case of the select count(*), it should be using the index for InnoDB and something like const for MyISAM. MyISAM keeps the row count stored, so it knows this value at any time without having to calculate it.

If you need to store the percentage in the database, you can use the setup from above for performance and then calculate the value for each row by using the second query as an inner select. The result of the first query can be computed once and passed in as a constant.
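A rough sketch of that per-row calculation, again in Python with sqlite3 standing in for MySQL, might look like the following. The table measurements, its columns id, value and pct, and the index name are assumptions made for the example. The sketch reads the computed values with a SELECT and writes them back in a separate step, because MySQL restricts an UPDATE from selecting from its own target table in a subquery.

```python
import sqlite3

# Hypothetical schema; `pct` is the column that will hold the stored percentage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    "id INTEGER PRIMARY KEY, value REAL NOT NULL, pct REAL)"
)
conn.execute("CREATE INDEX idx_measurements_value ON measurements (value)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(v,) for v in (10.0, 20.0, 20.0, 30.0, 40.0)],
)

# First query: run once, then reuse the result as a constant.
total = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]

# Second query used as an inner (correlated) select, evaluated for each row.
rows = conn.execute(
    """
    SELECT m.id,
           (SELECT COUNT(*)
              FROM measurements AS m2
             WHERE m2.value <= m.value) * 1.0 / ?
      FROM measurements AS m
    """,
    (total,),
).fetchall()

# Write the computed fractions back, one row per id.
conn.executemany(
    "UPDATE measurements SET pct = ? WHERE id = ?",
    [(pct, row_id) for row_id, pct in rows],
)
conn.commit()

stored = conn.execute("SELECT value, pct FROM measurements ORDER BY id").fetchall()
for value, pct in stored:
    print(value, pct)
```

Rows with equal values receive the same fraction, since each one counts every row at or below its own value.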

Does that help?

Jacob
