MySQL query order by "most completed fields"

Problem description

I have a table which has 45 columns, but only a few of these are filled in so far. This table is continuously updated, added to, etc. In my auto-complete function I want to select these records ordered by the most completed fields (I hope you understand)?

One of the solutions is to create another field (the "rank" field) and create a PHP function that selects all the records and gives a rank to each record.

... but I was wondering if there is a simpler way of doing this with only a single ORDER BY?

Solution

MySQL has no function to count the number of non-NULL fields on a row, as far as I know.

So the only way I can think of is to use an explicit condition:

SELECT * FROM mytable
    ORDER BY (IF( column1 IS NULL, 0, 1)
             +IF( column2 IS NULL, 0, 1)
             ...
             +IF( column45 IS NULL, 0, 1)) DESC;

...it is ugly as sin, but should do the trick.
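
Typing 45 terms by hand is error-prone, so the expression can be generated instead. The following is a minimal sketch in Python, not part of the original answer. It assumes hypothetical column names and relies on the fact that a comparison such as col IS NOT NULL evaluates to 1 or 0, which makes it equivalent to IF(col IS NULL, 0, 1). The in-memory SQLite database is only a stand-in so the example runs without a MySQL server.

```python
import sqlite3

def filled_count_expr(columns):
    """Build an SQL expression that counts the non-NULL columns of a row."""
    # (col IS NOT NULL) evaluates to 1 or 0 in both MySQL and SQLite.
    # Column names are interpolated directly, so they must come from a
    # trusted list, never from user input.
    return " + ".join(f"({col} IS NOT NULL)" for col in columns)

# Hypothetical three-column table standing in for the 45-column one.
columns = ["column1", "column2", "column3"]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, column3 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, "a", None, None), (2, "a", "b", "c"), (3, None, None, None), (4, "a", "b", None)],
)

# Most complete rows first; id breaks ties so the order is stable.
query = f"SELECT id FROM mytable ORDER BY ({filled_count_expr(columns)}) DESC, id"
ordered_ids = [row[0] for row in conn.execute(query)]
print(ordered_ids)
```

The same generated expression could be pasted into the MySQL query above in place of the hand-written IF terms.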

You could also devise a TRIGGER to increment an extra column "fields_filled". The trigger costs you on UPDATE, the 45 IFs hurt you on SELECT; you'll have to model what is more convenient.

Note that indexing all fields to speed up SELECT will cost you when updating (and 45 different indexes probably cost as much as a table scan on SELECT, all the more so if the indexed fields are VARCHARs). Run some tests, but I believe that the 45-IF solution is likely to be the best overall.

UPDATE: If you can rework your table structure to normalize it somewhat, you could put the fields in a my_values table. Then you would have a "header table" (maybe with only a unique ID) and a "data table". Empty fields would not exist at all, and then you could sort by how many filled fields there are by using a RIGHT JOIN, counting the filled fields with COUNT(). This would also greatly speed up UPDATE operations, and would allow you to employ indexes efficiently.

EXAMPLE (from table setup to two normalized tables setup):

Let us say we have a set of Customer records. We will have a short subset of "mandatory" data such as ID, username, password, email, etc.; then we will have a maybe much larger subset of "optional" data such as nickname, avatar, date of birth, and so on. As a first step let us assume that all these data are varchar (this, at first sight, looks like a limitation when compared to the single table solution where each column may have its own datatype).

So we have a table like,

ID   username    ....
1    jdoe        etc.
2    jqaverage   etc.
3    jkilroy     etc.

Then we have the optional-data table. Here John Doe has filled all fields, Joe Q. Average only two, and Kilroy none (even if he was here).

userid  var   val
1       name  John
1       born  Stratford-upon-Avon
1       when  11-07-1974
2       name  Joe Quentin
2       when  09-04-1962

In order to reproduce the "single table" output in MySQL we have to create a quite complex VIEW with lots of LEFT JOINs. This view will nonetheless be very fast if we have an index based on (userid, var) (even better if we use a numeric constant or a SET instead of a varchar for the datatype of var):

CREATE OR REPLACE VIEW usertable AS SELECT users.*,
    names.val AS name -- (1)
FROM users
    LEFT JOIN userdata AS names ON ( users.id = names.id AND names.var = 'name') -- (2)
;

Each field in our logical model, e.g., "name", will be contained in a tuple ( id, 'name', value ) in the optional data table.

And it will yield a line of the form <FIELDNAME>s.val AS <FIELDNAME> in the section (1) of the above query, referring to a line of the form LEFT JOIN userdata AS <FIELDNAME>s ON ( users.id = <FIELDNAME>s.id AND <FIELDNAME>s.var = '<FIELDNAME>') in section (2). So we can construct the query dynamically by concatenating the first textline of the above query with a dynamic Section 1, the text 'FROM users ' and a dynamically-built Section 2.
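
The concatenation described above can be sketched as follows. This is an illustrative Python version, not the original author's code; the function name build_view_select and the field list are assumptions, and the userdata key column is called id as in the queries of this answer. SQLite is used only as a stand-in to show that the generated SELECT runs. On MySQL the view would be created from the mysql_view_sql string.

```python
import sqlite3

def build_view_select(fields):
    """Return the SELECT body of the usertable view for the given optional fields."""
    # Section (1): one "<field>s.val AS <field>" column per optional field.
    section1 = "".join(f",\n    {f}s.val AS {f}" for f in fields)
    # Section (2): one LEFT JOIN on userdata per optional field.
    section2 = "".join(
        f"\n    LEFT JOIN userdata AS {f}s"
        f" ON ( users.id = {f}s.id AND {f}s.var = '{f}' )"
        for f in fields
    )
    return f"SELECT users.*{section1}\nFROM users{section2}"

# Field names must come from a trusted list and must not be SQL keywords.
select_sql = build_view_select(["name", "born"])
mysql_view_sql = "CREATE OR REPLACE VIEW usertable AS " + select_sql + ";"

# SQLite stand-in: it has no CREATE OR REPLACE VIEW, so a plain CREATE VIEW is used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE userdata (id INTEGER, var TEXT, val TEXT, UNIQUE (id, var));
INSERT INTO users VALUES (1, 'jdoe'), (2, 'jqaverage'), (3, 'jkilroy');
INSERT INTO userdata VALUES
    (1, 'name', 'John'), (1, 'born', 'Stratford-upon-Avon'), (2, 'name', 'Joe Quentin');
""")
conn.execute("CREATE VIEW usertable AS " + select_sql)
rows = conn.execute("SELECT * FROM usertable ORDER BY id").fetchall()
print(rows)
```

Missing optional fields come back as NULL (None in Python), exactly as they would in the single-table layout.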

Once we do this, SELECTs on the view are exactly identical to before -- but now they fetch data from two normalized tables via JOINs.

EXPLAIN SELECT * FROM usertable;

will tell us that adding columns to this setup does not appreciably slow down operations, i.e., this solution scales reasonably well.

INSERTs will have to be modified (we only insert mandatory data, and only in the first table) and UPDATEs as well: we either UPDATE the mandatory data table, or a single row of the optional data table. But if the target row isn't there, then it must be INSERTed.

So we have to replace

UPDATE usertable SET name = 'John Doe', born = 'New York' WHERE id = 1;

with an 'upsert', in this case

INSERT INTO userdata VALUES
        ( 1, 'name', 'John Doe' ),
        ( 1, 'born', 'New York' )
    ON DUPLICATE KEY UPDATE val = VALUES(val);

(We need a UNIQUE INDEX on userdata(id, var) for ON DUPLICATE KEY to work).
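
To see the upsert behaviour end to end, here is a small runnable sketch in Python. It is an approximation, not the MySQL statement itself: SQLite has no ON DUPLICATE KEY UPDATE, so the sketch uses SQLite's closest equivalent, ON CONFLICT ... DO UPDATE (available from SQLite 3.24), with the same UNIQUE constraint on (id, var). The helper name upsert_fields is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint on (id, var) is what makes the upsert possible.
conn.execute("CREATE TABLE userdata (id INTEGER, var TEXT, val TEXT, UNIQUE (id, var))")
conn.execute("INSERT INTO userdata VALUES (1, 'name', 'John')")

def upsert_fields(conn, user_id, fields):
    """Insert each optional field, or overwrite its value if the row already exists."""
    conn.executemany(
        "INSERT INTO userdata (id, var, val) VALUES (?, ?, ?) "
        "ON CONFLICT (id, var) DO UPDATE SET val = excluded.val",
        [(user_id, var, val) for var, val in fields.items()],
    )

# 'name' already exists and is updated; 'born' is missing and is inserted.
upsert_fields(conn, 1, {"name": "John Doe", "born": "New York"})
rows = conn.execute("SELECT id, var, val FROM userdata ORDER BY var").fetchall()
print(rows)
```

With a MySQL driver, the SQL string would be the INSERT ... ON DUPLICATE KEY UPDATE statement shown above, using the driver's own placeholder syntax.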

Depending on row size and disk issues, this change might yield an appreciable performance gain.

Note that if this modification is not performed, the existing queries will not yield errors - they will silently fail.

Here for example we modify the names of two users; one does have a name on record, the other has NULL. The first is modified, the second is not.

mysql> SELECT * FROM usertable;
+------+-----------+-------------+------+------+
| id   | username  | name        | born | age  |
+------+-----------+-------------+------+------+
|    1 | jdoe      | John Doe    | NULL | NULL |
|    2 | jqaverage | NULL        | NULL | NULL |
|    3 | jtkilroy  | NULL        | NULL | NULL |
+------+-----------+-------------+------+------+
3 rows in set (0.00 sec)
mysql> UPDATE usertable SET name = 'John Doe II' WHERE username = 'jdoe';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> UPDATE usertable SET name = 'James T. Kilroy' WHERE username = 'jtkilroy';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 0  Changed: 0  Warnings: 0
mysql> select * from usertable;
+------+-----------+-------------+------+------+
| id   | username  | name        | born | age  |
+------+-----------+-------------+------+------+
|    1 | jdoe      | John Doe II | NULL | NULL |
|    2 | jqaverage | NULL        | NULL | NULL |
|    3 | jtkilroy  | NULL        | NULL | NULL |
+------+-----------+-------------+------+------+
3 rows in set (0.00 sec)

To know the rank of each row, for those users that do have a rank, we simply retrieve the count of userdata rows per id:

-- RANK is a reserved word from MySQL 8.0.2 onwards, hence the backticks
SELECT id, COUNT(*) AS `rank` FROM userdata GROUP BY id

Now to extract rows in "filled status" order, we do:

SELECT usertable.* FROM usertable
    LEFT JOIN ( SELECT id, COUNT(*) AS `rank` FROM userdata GROUP BY id ) AS ranking
    ON (usertable.id = ranking.id)
ORDER BY `rank` DESC, id;

The LEFT JOIN ensures that rankless individuals get retrieved too, and the additional ordering by id ensures that people with identical rank always come out in the same order.
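
The ranking query can be tried on the sample data from this answer with the sketch below. It is written in Python against an in-memory SQLite database as a stand-in for MySQL, and it joins the users table directly rather than the usertable view to keep the example short. The alias fields_filled and the COALESCE call are additions made here so that users without any optional data show an explicit 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE userdata (id INTEGER, var TEXT, val TEXT, UNIQUE (id, var));
INSERT INTO users VALUES (1, 'jdoe'), (2, 'jqaverage'), (3, 'jkilroy');
INSERT INTO userdata VALUES
    (1, 'name', 'John'), (1, 'born', 'Stratford-upon-Avon'), (1, 'when', '11-07-1974'),
    (2, 'name', 'Joe Quentin'), (2, 'when', '09-04-1962');
""")

# Count optional-data rows per user, then LEFT JOIN so that users with
# no optional data at all are still returned.
ranked = conn.execute("""
SELECT users.username, COALESCE(ranking.cnt, 0) AS fields_filled
FROM users
    LEFT JOIN ( SELECT id, COUNT(*) AS cnt FROM userdata GROUP BY id ) AS ranking
    ON ( users.id = ranking.id )
ORDER BY fields_filled DESC, users.id
""").fetchall()
print(ranked)
```

John Doe comes first with three filled fields, Joe Q. Average second with two, and Kilroy last with none.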
