高级MySQL:查找轮询响应之间的相关性 [英] Advanced MySQL: Find correlations between poll responses

查看:89
本文介绍了高级MySQL:查找轮询响应之间的相关性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有四个MySQL表:

I've got four MySQL tables:

用户(ID,名称)
投票(ID,文本)
选项(id,poll_id,文本)
响应(id,poll_id,option_id,user_id)

users (id, name)
polls (id, text)
options (id, poll_id, text)
responses (id, poll_id, option_id, user_id)

鉴于特定的民意调查和特定的选项,我想生成一个表,该表显示其他民意调查中哪些选项之间的相关性最强.

Given a particular poll and a particular option, I'd like to generate a table that shows which options from other polls are most strongly correlated.

假设这是我们的数据集:

Suppose this is our data set:

TABLE users:
+------+-------+
| id   | name  |
+------+-------+
|    1 | Abe   |
|    2 | Bob   |
|    3 | Che   |
|    4 | Den   |
+------+-------+

TABLE polls:
+------+-----------------------+
| id   | text                  |
+------+-----------------------+
|    1 | Do you like apples?   |
|    2 | What is your gender?  |
|    3 | What is your height?  |
|    4 | Do you like polls?    |
+------+-----------------------+

TABLE options:

+------+----------+---------+
| id   | poll_id  | text    |
+------+----------+---------+
|    1 | 1        | Yes     |
|    2 | 1        | No      |
|    3 | 2        | Male    |
|    4 | 2        | Female  |
|    5 | 3        | Short   |
|    6 | 3        | Tall    |
|    7 | 4        | Yes     |
|    8 | 4        | No      |
+------+----------+---------+

TABLE responses:

+------+----------+------------+----------+
| id   | poll_id  | option_id  | user_id  |
+------+----------+------------+----------+
|    1 | 1        | 1          | 1        |
|    2 | 1        | 2          | 2        |
|    3 | 1        | 2          | 3        |
|    4 | 1        | 2          | 4        |
|    5 | 2        | 3          | 1        |
|    6 | 2        | 3          | 2        |
|    7 | 2        | 3          | 3        |
|    8 | 2        | 4          | 4        |
|    9 | 3        | 5          | 1        |
|   10 | 3        | 6          | 2        |
|   10 | 3        | 5          | 3        |
|   10 | 3        | 6          | 4        |
|   10 | 4        | 7          | 1        |
|   10 | 4        | 7          | 2        |
|   10 | 4        | 7          | 3        |
|   10 | 4        | 7          | 4        |
+------+----------+------------+----------+

给定轮询ID 1和选项ID 2,生成的表应如下所示:

Given the poll ID 1 and the option ID 2, the generated table should be something like this:

+----------+------------+-----------------------+
| poll_id  | option_id  | percent_correlated    |
+----------+------------+-----------------------+
| 4        | 7          | 100                   |
| 2        | 3          | 66.66                 |
| 3        | 6          | 66.66                 |
| 2        | 4          | 33.33                 |
| 3        | 5          | 33.33                 |
| 4        | 8          | 0                     |
+----------+------------+-----------------------+

因此,基本上,我们正在确定所有对民意调查ID 1和选择选项ID 2做出响应的用户,并且正在调查所有其他民意调查,以查看其中有多少百分比的人还选择了其他选项.

So basically, we're identifying all of the users who responded to poll ID 1 and selected option ID 2, and we're looking through all the other polls to see what percentage of them also selected each other option.

推荐答案

这似乎为我提供了正确的结果:

This seems to give the right results for me:

select poll_stats.poll_id,
       option_stats.option_id,
       (100 * option_responses / poll_responses) as percent_correlated
from (select response.poll_id,
             count(*) as poll_responses
      from response selecting_response
           join response on response.user_id = selecting_response.user_id
      where selecting_response.poll_id = 1 and selecting_response.option_id = 2
      group by response.poll_id) poll_stats
      join (select options.poll_id,
                   options.id as option_id,
                   count(response.id) as option_responses
            from options
                 left join response on response.poll_id = options.poll_id
                           and response.option_id = options.id
                           and exists (
                            select 1 from response selecting_response
                            where selecting_response.user_id = response.user_id
                                  and selecting_response.poll_id = 1
                                  and selecting_response.option_id = 2)
            group by options.poll_id, options.id
           ) as option_stats
       on option_stats.poll_id = poll_stats.poll_id
where poll_stats.poll_id <> 1
order by 3 desc, option_responses desc

这篇关于高级MySQL:查找轮询响应之间的相关性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆