百分位COUNT(DISTINCT),相关WHERE仅适用于视图(或不使用DISTINCT) [英] percentile by COUNT(DISTINCT) with correlated WHERE only works with a view (or without DISTINCT)

查看:300
本文介绍了百分位COUNT(DISTINCT),相关WHERE仅适用于视图(或不使用DISTINCT)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个奇怪的,我不知道这是我的语法(这看起来很直接)或一个错误(或只是不支持)。



这里是我的查询,但是是不必要的慢:

  table1 
SET table1column1 =
(SELECT COUNT(DISTINCT table2column1)FROM table2view WHERE table2column1< =(SELECT table2column1 FROM table2 WHERE table2.id = table1.id))
/
(SELECT COUNT(DISTINCT table2column1)FROM table2)

+(SELECT COUNT(DISTINCT table2column2)FROM table2view WHERE table2column2< =(SELECT table2column2 FROM table2 WHERE table2.id = table1.id))$ b $ table $($)$ b /
(SELECT COUNT(DISTINCT table2column2)FROM table2)

+(SELECT COUNT(DISTINCT table2column3)FROM table2view WHERE table2column3< =(SELECT table2column3 FROM table2 WHERE table2.id = table1.id))
/(SELECT COUNT(DISTINCT table2column3)FROM table2);

这只是三个百分位数(table2column1,table2column2和table2column3)的总和, / p>

这里是奇怪的地方。我必须使用一个视图来使用 WHERE 在子查询上工作,或者它将只有 UPDATE 第一行 table1 ,并将剩余行 table1column1 设置为 0 table2view table2 完全重复。是的,很奇怪。



如果我不使用 DISTINCT ,我可以没有视图。那有意义吗? 请注意:我必须拥有 DISTINCT ,因为我有很多重复项。



我试图让它 SELECT 只是从视图,但是减慢了更糟。 b
$ b

有没有人知道什么是问题是什么和最好的方式重做这个查询,所以它不需要这么长时间?它位于 TRIGGER 中,更新的数据很漂亮。



非常感谢! >

详细



我在phpMyAdmin的命令行中测试速度。



我确定退化是来自视图,因为更多的视图和更少的实际表我使用,它得到更慢。



当我做一个没有 DISTINCT 的程序时,它闪电很快。



strong>只适用于视图?



好,我只是设置了 table2 。我试着先做原始查询,用副本代替视图。没有。



我试图用副本而不是视图来执行查询。没有。



希望这些常量的介绍能更好地显示我想要做什么。

  SET @ table2column1_distinct_count =(SELECT COUNT(DISTINCT table2column1)FROM table2); 
SET @ table2column2_distinct_count =(SELECT COUNT(DISTINCT table2column2)FROM table2);
SET @ table2column3_distinct_count =(SELECT COUNT(DISTINCT table2column3)FROM table2);
UPDATE table1,table2
SET table1.table1column1 =(SELECT COUNT(DISTINCT table2column1)FROM table2view WHERE table2column1< = table2.table2column1)/ @ table2column1_distinct_count
+ FROM table2view WHERE table2column2< = table2.table2column2)/ @ table2column2_distinct_count
+(SELECT COUNT(DISTINCT table2column3)FROM table2view WHERE table2column3< = table2.table2column3)/ @ table2column3_distinct_count
WHERE table1.id = table2 。ID;

再次,当我使用 table2 table2view ,它只更新第一行并正确设置所有其他行table1.table1column1 = 0 p>

数学



我试图设置 table1。 table1column1 = table2column1 table2column2 和<$ c的百分位数之和$ c> table2column3 通过 id



table2columnX < =到当前 table2columnX 的明确值 table2columnX s)。



我使用 DISTINCT



这里是 SELECT

/ code>。这有帮助吗?

  CREATE VIEW myTable.table2view AS SELECT 
table2.table2column1 AS table2column1,
table2 .table2column2 AS table2column2,
table2.table2column2 AS table2column3,
FROM table2
GROUP BY table2.id;

GROUP BY 在视图的 SELECT 中,这使得这项工作(我没有看到)?

解决方案

我可能会说,查询很慢,因为它是重复访问表时触发器触发。



我不是SQL专家,但我试过使用临时表来组合查询。您可以看到它是否有助于加快查询。我在下面的代码示例中使用了不同但类似的声音列名。



编辑:我的早期代码中存在计算错误。现在更新。

  SELECT COUNT(id)INTO @no_of_attempts from tb2; 

- 如果存在S1Percentiles;
- DROP TABLE IF EXISTS S2Percentiles;
- DROP TABLE IF EXISTS S3Percentiles;

CREATE TEMPORARY TABLE S1Percentiles(
s1 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);

CREATE TEMPORARY TABLE S2Percentiles(
s2 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);

CREATE TEMPORARY TABLE S3Percentiles(
s3 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);



INSERT INTO S1Percentiles(s1,percentile)
SELECT A.s1,((COUNT(B.s1)/ @ no_of_attempts)* 100)
FROM(SELECT DISTINCT s1 from tb2)A
INNER JOIN tb2 B
ON B.s1 <= A.s1
GROUP BY A.s1;

INSERT INTO S2Percentiles(s2,percentile)
SELECT A.s2,((COUNT(B.s2)/ @ no_of_attempts)* 100)
FROM(SELECT DISTINCT s2 from tb2)A
INNER JOIN tb2 B
ON B.s2 <= A.s2
GROUP BY A.s2;

INSERT INTO S3Percentiles(s3,percentile)
SELECT A.s3,((COUNT(B.s3)/ @ no_of_attempts)* 100)
FROM(SELECT DISTINCT s3 from tb2)A
INNER JOIN tb2 B
ON B.s3 <= A.s3
GROUP BY A.s3;

- select * from S1Percentiles;
- select * from S2Percentiles;
- select * from S3Percentiles;

UPDATE tb1 A
INNER JOIN

SELECT B.tb1_id AS id,(C.percentile + D.percentile + E.percentile)AS sum from tb2 B
INNER JOIN S1Percentiles C
ON B.s1 = C.s1
INNER JOIN S2Percentiles D
ON B.s2 = D.s2
INNER JOIN S3Percentiles E
ON B.s3 = E.s3
)F
ON A.id = F.id

SET A.sum = F.sum;

- SELECT * FROM tb1;

DROP TABLE S1Percentiles;
DROP TABLE S2Percentiles;
DROP TABLE S3Percentiles;

这样做是记录每个分数组的百分位数, c $ c> tb1 列中包含必需数据,而不是重新计算每个学生行的百分位数。



$ c> s1 , s2 s3 / p>

注意:请根据您的数据库模式更新列名称。还要注意,每个百分位数计算已乘以 100 ,因为我相信百分位数通常是这样计算的。


I've got a weird one, and I don't know if it's my syntax (which seems straightforward) or a bug (or just unsupported).

Here's my query that works but is needlessly slow:

UPDATE table1 
    SET table1column1 = 
        (SELECT COUNT(DISTINCT table2column1) FROM table2view WHERE table2column1 <= (SELECT table2column1 FROM table2 WHERE table2.id = table1.id) )
        / 
        (SELECT COUNT(DISTINCT table2column1) FROM table2) 

       + (SELECT COUNT(DISTINCT table2column2) FROM table2view WHERE table2column2 <= (SELECT table2column2 FROM table2 WHERE table2.id = table1.id) ) 
        / 
        (SELECT COUNT(DISTINCT table2column2) FROM table2) 

       + (SELECT COUNT(DISTINCT table2column3) FROM table2view WHERE table2column3 <= (SELECT table2column3 FROM table2 WHERE table2.id = table1.id) ) 
        / (SELECT COUNT(DISTINCT table2column3) FROM table2);

It's just the sum of three percentiles (of table2column1, table2column2, and table2column3) with duplicates removed.

Here's where it gets weird. I have to use a view for this to work on the subquery with the WHERE or it will only UPDATE the first row of table1, and set the rest of the rows' table1column1 to 0. That table2view is an exact duplicate of table2. Yeah, weird.

If I don't use DISTINCT, I can do it without the view. Does that make sense? Note: I have to have DISTINCT because I have lots of duplicates.

I tried making it SELECT only from the view, but that slowed it down worse.

Does anyone know what the problem is and the best way to rework this query so it doesn't take so long? It's in a TRIGGER, and the updated data is pretty on demand.

Many thanks in advance!

Details

I'm testing the speed in phpMyAdmin's command line.

I'm pretty sure the degradation is coming from the view since the more of the view and the less of the actual table I use, the slower it gets.

When I do the one without DISTINCT, it's lightning fast.

Only works on views?

OK, so I just set up a copy of table2. I tried first to do the original query substituting the view with the copy. No go.

I tried to do the query below with the copy instead of the view. No go.

Hopefully the introduction of these constants will better show what I'm trying to do.

SET @table2column1_distinct_count = (SELECT COUNT(DISTINCT table2column1) FROM table2);
SET @table2column2_distinct_count = (SELECT COUNT(DISTINCT table2column2) FROM table2);
SET @table2column3_distinct_count = (SELECT COUNT(DISTINCT table2column3) FROM table2);
UPDATE table1, table2
    SET table1.table1column1 = (SELECT COUNT(DISTINCT table2column1) FROM  table2view WHERE table2column1 <= table2.table2column1) / @table2column1_distinct_count 
    + (SELECT COUNT(DISTINCT table2column2) FROM  table2view WHERE table2column2 <= table2.table2column2) / @table2column2_distinct_count 
    + (SELECT COUNT(DISTINCT table2column3) FROM  table2view WHERE table2column3 <= table2.table2column3) / @table2column3_distinct_count 
        WHERE table1.id = table2.id;

Again, when I use table2 instead of the table2view, it only updates the first row properly and sets all other rows' table1.table1column1 = 0.

Math

I'm trying to set table1.table1column1 = to the sum of the percentiles of table2column1, table2column2, and table2column3 by id.

I do a percentile by (counting the distinct values of a table2columnX <= to the current table2columnX ) / (the total count of distinct table2columnXs).

I use DISTINCT to get rid of the excessive duplicates.

View

Here's the SELECT for the view. Does this help?

CREATE VIEW myTable.table2view AS SELECT
    table2.table2column1 AS table2column1,
    table2.table2column2 AS table2column2,
    table2.table2column2 AS table2column3,
FROM table2
GROUP BY table2.id;

Is there something special about the GROUP BY in the view's SELECT that makes this work (that I'm not seeing)?

解决方案

I would probably say that the query is slow because it is repeatedly accessing the table when the trigger fires.

I am no SQL expert but I have tried to put together a query using temporary tables. You can see if it helps speed up the query. I have used different but similar sounding column names in my code sample below.

EDIT : There was a calculation error in my earlier code. Updated now.

SELECT COUNT(id) INTO @no_of_attempts from tb2;

-- DROP TABLE IF EXISTS S1Percentiles;
-- DROP TABLE IF EXISTS S2Percentiles;
-- DROP TABLE IF EXISTS S3Percentiles;

CREATE TEMPORARY TABLE S1Percentiles (
    s1 FLOAT NOT NULL,
    percentile FLOAT NOT NULL DEFAULT 0.00
);

CREATE TEMPORARY TABLE S2Percentiles (
    s2 FLOAT NOT NULL,
    percentile FLOAT NOT NULL DEFAULT 0.00
);

CREATE TEMPORARY TABLE S3Percentiles (
    s3 FLOAT NOT NULL,
    percentile FLOAT NOT NULL DEFAULT 0.00
);



INSERT INTO S1Percentiles (s1, percentile)
    SELECT A.s1, ((COUNT(B.s1)/@no_of_attempts)*100)
    FROM (SELECT DISTINCT s1 from tb2) A
    INNER JOIN tb2 B
    ON B.s1 <= A.s1
    GROUP BY A.s1;

INSERT INTO S2Percentiles (s2, percentile)
    SELECT A.s2, ((COUNT(B.s2)/@no_of_attempts)*100)
    FROM (SELECT DISTINCT s2 from tb2) A
    INNER JOIN tb2 B
    ON B.s2 <= A.s2
    GROUP BY A.s2;

INSERT INTO S3Percentiles (s3, percentile)
    SELECT A.s3, ((COUNT(B.s3)/@no_of_attempts)*100)
    FROM (SELECT DISTINCT s3 from tb2) A
    INNER JOIN tb2 B
    ON B.s3 <= A.s3
    GROUP BY A.s3;

-- select * from S1Percentiles;
-- select * from S2Percentiles;
-- select * from S3Percentiles;

UPDATE tb1 A
    INNER JOIN
    (
    SELECT B.tb1_id AS id, (C.percentile + D.percentile + E.percentile) AS sum FROM tb2 B
        INNER JOIN S1Percentiles C
        ON B.s1 = C.s1
        INNER JOIN S2Percentiles D
        ON B.s2 = D.s2
        INNER JOIN S3Percentiles E
        ON B.s3 = E.s3
    ) F
    ON A.id = F.id

    SET A.sum = F.sum;

-- SELECT * FROM tb1;

DROP TABLE S1Percentiles;
DROP TABLE S2Percentiles;
DROP TABLE S3Percentiles;

What this does is that it records the percentile for each score group and then finally just updates the tb1 column with the requisite data instead of recalculating the percentile for each student row.

You should also index columns s1, s2 and s3 for optimizing the queries on these columns.

Note: Please update the column names according to your db schema. Also note that each percentile calculation has been multiplied by 100 as I believe that percentile is usually calculated that way.

这篇关于百分位COUNT(DISTINCT),相关WHERE仅适用于视图(或不使用DISTINCT)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆