SQL:如何选择具有最多已知值的行? [英] SQL: how to select the row with most known values?

查看:126
本文介绍了SQL:如何选择具有最多已知值的行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的用户名(username,gender,date_of_birth,zip)的用户名是永久性的,但用户可能会在过去多次注册,有时候他填写了所有的数据,有时候还没有注册。此外,他可以改变居民身份(在这种情况下邮政可以改变)。

I have the table of users (username, gender, date_of_birth, zip) where the user's id is permanent but the user could be registered many times in the past where sometimes he filled out all the data and sometimes not. Besides that, he could change the residency (in this case zip can change).

所以查询

SELECT username, sex, date_birth, zip FROM users_log WHERE username IN('user1', 'user2', 'user3')

返回以下结果:

"user1";"M";"1982-10-04 00:00:00";"6320"
"user2";"";"";"1537"
"user3";"";"";"1537"
"user3";"";"";"1000"
"user3";"";"";"1000"
"user3";"";"1979-05-29 00:00:00";"1000"
"user3";"";"";"1537"
"user3";"";"1979-05-29 00:00:00";"1000"
"user1";"";"";"1000"
"user3";"";"";"1537"

在这种情况下,user1已更改了住所;邮政编码改变了; 属于他的第二行不包含人口统计数据。 User3还有多个记录,只有两个记录包含人口统计数据。

In this case the user1 has changed the residence; the zip code changed; and the second row that 'belongs' to him does not contain demographic data. User3 has also multiple records and only two records contain demographic data.

我想做的是将用户绑定到包含最多数据的行,并考虑行中包含最多已知值的zip。有没有人知道如何编写适当的查询?

What I would like to do is to bind users with the row that contains the most data about him and consider the zip included in the row with the most known values. Does anyone know how to write the appropriate query?

谢谢!

推荐答案

这将是痛苦的;非常痛苦。

It's gonna be painful; very painful.

您的问题不清楚此问题,但我假定您所指的用户ID是用户名。如果出现错误,则会作出相应的修改。

Your question isn't clear about this issue, but I'm assuming that the 'user id' you're referring to is the user name. There are consequential modifications to make if that's wrong.

与任何复杂查询一样,分阶段构建。

As with any complex query, build it up in stages.

阶段1:每个记录有多少非空字段?

Stage 1: How many non-null fields are there per record?

SELECT username, sex, date_of_birth, zip,
       CASE WHEN sex           IS NULL THEN 0 ELSE 1 END +
       CASE WHEN date_of_birth IS NULL THEN 0 ELSE 1 END +
       CASE WHEN zip           IS NULL THEN 0 ELSE 1 END AS num_non_null_fields
  FROM users_log

阶段2:给定用户名的最大字段数是多少?

Stage 2: Which is the maximum such number of fields for a given user name?

SELECT username, MAX(num_non_null_fields) AS num_non_null_fields
  FROM (SELECT username, sex, date_of_birth, zip,
               CASE WHEN sex           IS NULL THEN 0 ELSE 1 END +
               CASE WHEN date_of_birth IS NULL THEN 0 ELSE 1 END +
               CASE WHEN zip           IS NULL THEN 0 ELSE 1 END AS num_non_null_fields
          FROM users_log
       ) AS u
 GROUP BY username

阶段3:选择(全部)具有最大非空字段数的给定用户的行:

Stage 3: Select (all) the rows for a given user with that maximal number of non-null fields:

SELECT u.username, u.sex, u.date_of_birth, u.zip
  FROM (SELECT username, MAX(num_non_null_fields) AS num_non_null_fields
          FROM (SELECT username, sex, date_of_birth, zip,
                       CASE WHEN sex           IS NULL THEN 0 ELSE 1 END +
                       CASE WHEN date_of_birth IS NULL THEN 0 ELSE 1 END +
                       CASE WHEN zip           IS NULL THEN 0 ELSE 1 END AS num_non_null_fields
                  FROM users_log
               ) AS u
         GROUP BY username
       ) AS v
  JOIN (SELECT username, sex, date_of_birth, zip,
               CASE WHEN sex           IS NULL THEN 0 ELSE 1 END +
               CASE WHEN date_of_birth IS NULL THEN 0 ELSE 1 END +
               CASE WHEN zip           IS NULL THEN 0 ELSE 1 END AS num_non_null_fields
          FROM users_log
       ) AS u
    ON u.username = v.username AND u.num_non_null_fields = v.num_non_null_fields;

现在,如果有人拥有多个行(说)填写所有三个字段,那么所有这些行将被退回。但是,您尚未指定在这些行之间进行选择的任何条件。

Now, if someone has multiple rows with (say) all three fields filled in, then all those rows will be returned. However, you've not specified any criteria by which to choose between those rows.

这里的基本技术可以适应任何更改的要求。关键是在您访问时构建和测试子查询。

The basic techniques here can be adapted to any changed requirements. The key is to build and test the sub-queries as you go.

此SQL中没有一个已经靠近DBMS;可能会出现错误。

None of this SQL has been near a DBMS; there could be bugs in it.

您尚未指定使用哪个DBMS。但是,似乎Oracle不喜欢用于表别名的AS表示法,尽管它对列别名没有问题。如果您正在使用任何其他DBMS,则不必担心这种轻微的偏心。

You've not specified which DBMS you are using. However, it seems that Oracle won't like the AS notation used for table aliases, though it has no problem with AS on column aliases. If you're using any other DBMS, you shouldn't have to worry about that minor eccentricity.

这篇关于SQL:如何选择具有最多已知值的行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆