MySQL GROUP BY不能按预期工作 [英] MySQL GROUP BY not working as expected
问题描述
我编写了一系列嵌套程序,为员工执行一些薪酬计算。在整个过程中,数据存储在临时表中。
现在,当我在其中一个临时表上执行GROUP BY时,问题就出现了一些意外的行为。你可以在这里看到一个例子:
http:// www。 sqlfiddle.com/#!9/f4095/2
预期的行为是为该GROUP BY输出 employeeid列表
和相应的聚合。 sqlfiddle按预期工作。但是,发生的情况是,我在输出中获得了一行,其中包含空白 employeeid
和所有行的汇总值。但是,如果我在前面的步骤中过滤数据以包含一个唯一的 employeeid
,那么我在输出中获得一行(如预期的那样)并使用 employeeid
列包含一个值,而不是空白。
最初,我认为这可能是一个内存问题,因为我使用很多的临时表(我的理解是这些都存储在内存中),并没有放弃它们。所以我修改了所有的程序,在不再需要时立即删除任何临时表。我仍然面临同样的问题。我的表格大约有2000行和大约100列,GROUP BY的输出应该只有大约300行和7列。
我将继续思考不同的方式我可以测试这一点,并会更新,如果我发现任何东西,但我的问题是:有人知道任何可能导致此类行为的任何事情吗?
编辑:我在上面的sqlfiddle中提供了一个简化的版本,但是这里是导致问题的语句(被分类的表 temp_rawpaycalc
包含 shift_id
, employeeid
以及所有正在汇总的列。 employeeid
位于 SELECT
和 GROUP BY
和 shift_id
。
$ b
DROP TEMPORARY TABLE IF EXISTS temp_topupaggr;
CREATE TEMPORARY TABLE temp_topupaggr
SELECT employeeid,
SUM(小时工)AS TotalHours,
SUM(minguaranteepayamoun t)AS TotalMinGuarantee,
(
SUM(baseWagePayAmount)+
SUM(basicAppPayAmount)+
...
SUM(suppPayAmount))AS TotalEarnings
SUM(reported_baseWagePayAmount)+
SUM(reported_basicAppPayAmount)+
... $(已报告的小时数)AS报告的总时数,
SUM(reported_minGuaranteePayAmount)AS reported_TotalMinGuarantee,
b $ b SUM(reported_suppPayAmount))AS reported_TotalEarnings
FROM temp_rawpaycalc
GROUP BY employeeid;
存储过程参数似乎比存储过程中表的字段名称具有更高的优先级;如果在存储过程中使用的查询中存在字段名称和参数的模糊标识符,则可能会导致难以识别问题。
解决方案:始终限定此类字段名称(无论如何,最佳实践)和/或不要将标识符用于存储过程上的参数,这些参数可能与表中的字段名称冲突(即使字段名称始终是合格的,也是一般可读性的良好实践)。
I have written a series of nested procedures which perform some pay calculations for employees. Throughout these procedures data is stored in temp tables.
Now the issue is some unexpected behaviour when I perform a GROUP BY on one of these temp tables. You can see an example of that step here: http://www.sqlfiddle.com/#!9/f4095/2
The expected behaviour is for that GROUP BY to output a list of employeeid
and the corresponding aggregations. The sqlfiddle works as expected. However, what is happening is that I am getting a single row in the output with a blank employeeid
and aggregated values for all rows. However, if I filter the data in a previous step to include a single unique employeeid
, then I get a single row in the output (as expected) and with the employeeid
column containing a value instead of being blank.
Initially, I thought this could be a memory issue since I am using lots of temporary tables (my understanding is that these are stored in memory) and wasn't dropping them. So I modified all of the procedures to drop any temporary tables as soon as they were no longer required. I am still facing the same issue. My tables are about 2000 rows and around 100 columns and the output of the GROUP BY should only be about 300 rows and 7 columns.
I'm going to continue thinking of different ways I can test this and will update if I find anything, but my question is: Does anyone know of anything that could potentially cause this type of behaviour?
EDIT: I've provided a simplified version in the sqlfiddle above, but here is the statement causing the problem (the table temp_rawpaycalc
which is being GROUPED contains shift_id
, employeeid
, and all of the columns which are being aggregated. employeeid
is in the SELECT
and the GROUP BY
, and shift_id
is excluded from both.
DROP TEMPORARY TABLE IF EXISTS temp_topupaggr;
CREATE TEMPORARY TABLE temp_topupaggr
SELECT employeeid,
SUM(hoursworked) AS TotalHours,
SUM(minguaranteepayamount) AS TotalMinGuarantee,
(
SUM(baseWagePayAmount) +
SUM(basicAppPayAmount) +
...
SUM(suppPayAmount)) AS TotalEarnings,
SUM(reported_hoursworked) AS reported_TotalHours,
SUM(reported_minGuaranteePayAmount) AS reported_TotalMinGuarantee,
(
SUM(reported_baseWagePayAmount) +
SUM(reported_basicAppPayAmount) +
...
SUM(reported_suppPayAmount)) AS reported_TotalEarnings
FROM temp_rawpaycalc
GROUP BY employeeid;
Stored procedure parameters seem to get higher priority than tables' field names inside stored procedures; if there are ambiguous identifiers for field names and parameters in queries used in stored procedures, it can cause hard to identify problems.
Solution: always qualify such field names (good practice anyway) and/or do not use identifiers for parameters on stored procedures that might conflict with field names in tables the procedure references (also a good practice for general readability even if field names are always qualified).
这篇关于MySQL GROUP BY不能按预期工作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!