SQL: Returning the most common value for each person

Problem description

I'm using MySQL. I found another post with the same question, but it is for Postgres; I need MySQL.

Get most common value for each value of another column in SQL

I ask this question after extensive searching of this site and others but have not found a result that works as I intend it to.

I have a table of people (recordid, personid, transactionid) and a transaction table (transactionid, rating). I require a single SQL statement that can return the most common rating each person has.

I currently have this SQL statement that returns the most common rating for a specified person id. It works and perhaps it may help others.

SELECT transactionTable.rating as MostCommonRating 
FROM personTable, transactionTable 
WHERE personTable.transactionid = transactionTable.transactionid 
AND personTable.personid = 1
GROUP BY transactionTable.rating 
ORDER BY COUNT(transactionTable.rating) desc 
LIMIT 1

However I require a statement that does what the above statement does for each personid in personTable.

My attempt is below; however, it times out my MySQL server.

SELECT personid AS pid, 
(SELECT transactionTable.rating as MostCommonRating 
FROM personTable, transactionTable 
WHERE personTable.transactionid = transactionTable.transactionid 
AND personTable.personid = pid
GROUP BY transactionTable.rating 
ORDER BY COUNT(transactionTable.rating) desc 
LIMIT 1)
FROM persontable
GROUP BY personid

Any help you can give me would be much appreciated. Thanks.

PERSONTABLE:

RecordID,   PersonID,   TransactionID
1,      Adam,       1
2,      Adam,       2
3,      Adam,       3
4,      Ben,        1
5,      Ben,        3
6,      Ben,        4
7,      Caitlin,    4
8,      Caitlin,    5
9,      Caitlin,    1

TRANSACTIONTABLE:

TransactionID,  Rating
1,              Good
2,              Bad
3,              Good
4,              Average
5,              Average

The output of the SQL statement I am searching for would be:

OUTPUT:

PersonID,   MostCommonRating
Adam,       Good
Ben,        Good
Caitlin,    Average
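
For anyone who wants to reproduce this, the sample data above can be loaded with a script along these lines. The column types and keys are assumptions, since the actual schema is not shown; PersonID is declared as a string only because the sample rows contain names.

```sql
-- Assumed schema: types, lengths and keys are guesses, not taken from the question.
CREATE TABLE TransactionTable (
    TransactionID INT         NOT NULL PRIMARY KEY,
    Rating        VARCHAR(16) NOT NULL
);

CREATE TABLE PersonTable (
    RecordID      INT         NOT NULL PRIMARY KEY,
    PersonID      VARCHAR(16) NOT NULL,
    TransactionID INT         NOT NULL
);

-- Rows copied from the TRANSACTIONTABLE listing.
INSERT INTO TransactionTable (TransactionID, Rating) VALUES
    (1, 'Good'),
    (2, 'Bad'),
    (3, 'Good'),
    (4, 'Average'),
    (5, 'Average');

-- Rows copied from the PERSONTABLE listing.
INSERT INTO PersonTable (RecordID, PersonID, TransactionID) VALUES
    (1, 'Adam',    1),
    (2, 'Adam',    2),
    (3, 'Adam',    3),
    (4, 'Ben',     1),
    (5, 'Ben',     3),
    (6, 'Ben',     4),
    (7, 'Caitlin', 4),
    (8, 'Caitlin', 5),
    (9, 'Caitlin', 1);
```

MySQL table names can be case-sensitive depending on the platform, so the queries should spell the table names the same way as the CREATE TABLE statements.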

Recommended answer

Preliminary comment

Please learn to use the explicit JOIN notation, not the old (pre-1992) implicit join notation.

Old style:

SELECT transactionTable.rating as MostCommonRating 
FROM personTable, transactionTable 
WHERE personTable.transactionid = transactionTable.transactionid 
AND personTable.personid = 1
GROUP BY transactionTable.rating 
ORDER BY COUNT(transactionTable.rating) desc 
LIMIT 1

Preferred style:

SELECT transactionTable.rating AS MostCommonRating 
  FROM personTable
  JOIN transactionTable 
    ON personTable.transactionid = transactionTable.transactionid 
 WHERE personTable.personid = 1
 GROUP BY transactionTable.rating 
 ORDER BY COUNT(transactionTable.rating) desc 
 LIMIT 1

You need an ON condition for each JOIN.

Also, the personID values in the data are strings, not numbers, so you'd need to write

 WHERE personTable.personid = 'Ben'

for example, to get the query to work on the tables shown.

You're seeking to find an aggregate of an aggregate: in this case, the maximum of a count. So, any general solution is going to involve both MAX and COUNT. You can't apply MAX directly to COUNT, but you can apply MAX to a column from a sub-query where the column happens to be a COUNT.
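
As a small illustration of that last point, here is a sketch for a single person ('Adam' from the sample data). The alias c and the column name MaxRatingCount are only illustrative. The nested form is left as a comment because MySQL rejects it, typically with an "Invalid use of group function" error; the derived-table form is the one that runs.

```sql
-- Not allowed: an aggregate nested directly inside another aggregate.
-- SELECT MAX(COUNT(*)) FROM PersonTable GROUP BY PersonID;

-- Allowed: compute the counts in a sub-query (derived table),
-- then apply MAX to the column that holds those counts.
SELECT MAX(c.RatingCount) AS MaxRatingCount
  FROM (SELECT t.Rating, COUNT(*) AS RatingCount
          FROM PersonTable AS p
          JOIN TransactionTable AS t
            ON p.TransactionID = t.TransactionID
         WHERE p.PersonID = 'Adam'
         GROUP BY t.Rating
       ) AS c;
```

On the sample data this returns 2, since Adam has two Good ratings and one Bad.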

Build the query up using Test-Driven Query Design — TDQD.

SELECT p.PersonID, t.Rating, t.TransactionID
  FROM PersonTable AS p
  JOIN TransactionTable AS t
    ON p.TransactionID = t.TransactionID

Select person, rating, and number of occurrences of rating

SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
  FROM PersonTable AS p
  JOIN TransactionTable AS t
    ON p.TransactionID = t.TransactionID
 GROUP BY p.PersonID, t.Rating

This result will become a sub-query.

SELECT s.PersonID, MAX(s.RatingCount)
  FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
          FROM PersonTable AS p
          JOIN TransactionTable AS t
            ON p.TransactionID = t.TransactionID
         GROUP BY p.PersonID, t.Rating
       ) AS s
 GROUP BY s.PersonID

Now we know which is the maximum count for each person.

To get the result, we need to select the rows from the sub-query which have the maximum count. Note that if someone has 2 Good and 2 Bad ratings (and 2 is the maximum number of ratings of the same type for that person), then two records will be shown for that person.

SELECT s.PersonID, s.Rating
  FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
          FROM PersonTable AS p
          JOIN TransactionTable AS t
            ON p.TransactionID = t.TransactionID
         GROUP BY p.PersonID, t.Rating
       ) AS s
  JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
          FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
                  FROM PersonTable AS p
                  JOIN TransactionTable AS t
                    ON p.TransactionID = t.TransactionID
                 GROUP BY p.PersonID, t.Rating
               ) AS s
         GROUP BY s.PersonID
       ) AS m
    ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount

If you want the actual rating count too, that's easily selected.
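
For example, a sketch of the same query with the count added to the select list. The column aliases MostCommonRating and RatingCount in the output, and the ORDER BY, are additions for readability rather than part of the original query.

```sql
SELECT s.PersonID, s.Rating AS MostCommonRating, s.RatingCount
  FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
          FROM PersonTable AS p
          JOIN TransactionTable AS t
            ON p.TransactionID = t.TransactionID
         GROUP BY p.PersonID, t.Rating
       ) AS s
  JOIN (SELECT c.PersonID, MAX(c.RatingCount) AS MaxRatingCount
          FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
                  FROM PersonTable AS p
                  JOIN TransactionTable AS t
                    ON p.TransactionID = t.TransactionID
                 GROUP BY p.PersonID, t.Rating
               ) AS c
         GROUP BY c.PersonID
       ) AS m
    ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
 ORDER BY s.PersonID;
```

With the sample data from the question, each person's most common rating occurs twice, so the count column is 2 on every row (Adam/Good, Ben/Good, Caitlin/Average).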

That's a fairly complex piece of SQL. I would hate to try writing that from scratch. Indeed, I probably wouldn't bother; I'd develop it step-by-step, more or less as shown. But because we've debugged the sub-queries before we use them in bigger expressions, we can be confident of the answer.

Note that Standard SQL provides a WITH clause that prefixes a SELECT statement, naming a sub-query. (It can also be used for recursive queries, but we don't need that here.)

WITH RatingList AS
     (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
        FROM PersonTable AS p
        JOIN TransactionTable AS t
          ON p.TransactionID = t.TransactionID
       GROUP BY p.PersonID, t.Rating
     )
SELECT s.PersonID, s.Rating
  FROM RatingList AS s
  JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
          FROM RatingList AS s
         GROUP BY s.PersonID
       ) AS m
    ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount

This is simpler to write. When this answer was written, MySQL did not support the WITH clause; MySQL 8.0 and later do.
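
On MySQL 8.0 or later, which added both WITH and window functions, an alternative is to rank the counts per person instead of joining back to the maximum. This is a sketch of a different technique from the one developed above, not a restatement of it; the names Ranked and rn are illustrative. Unlike the join version, ROW_NUMBER returns exactly one row per person, breaking ties here by the alphabetically first rating.

```sql
-- Requires MySQL 8.0 or later (common table expressions and window functions).
WITH RatingList AS
     (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
        FROM PersonTable AS p
        JOIN TransactionTable AS t
          ON p.TransactionID = t.TransactionID
       GROUP BY p.PersonID, t.Rating
     ),
     Ranked AS
     (SELECT r.PersonID, r.Rating, r.RatingCount,
             -- Number each person's ratings from most to least frequent;
             -- the second sort key makes the choice deterministic on ties.
             ROW_NUMBER() OVER (PARTITION BY r.PersonID
                                    ORDER BY r.RatingCount DESC, r.Rating) AS rn
        FROM RatingList AS r
     )
SELECT PersonID, Rating AS MostCommonRating
  FROM Ranked
 WHERE rn = 1
 ORDER BY PersonID;
```

Replacing ROW_NUMBER() with RANK() would keep all tied ratings for a person, which matches the behaviour of the join-based query.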

The SQL above has now been tested against IBM Informix Dynamic Server 11.70.FC2 running on Mac OS X 10.7.4. That test exposed the problem diagnosed in the preliminary comment. The SQL for the main answer worked correctly without needing to be changed.
