Best way to do a pivot table in SQLite?


Question

I'm using C# and SQLite to slice large amounts of data, and I often need to display my data in pivot table form. I can easily make my pivots dynamic by using C# to build the SQL command from the result of another query, but I still can't decide how to do the pivoting itself, so I would like to hear some opinions on the matter from programmers more experienced than me.
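The "build the SQL command from another query" step could be sketched as follows. This is only an illustration in Python with the standard sqlite3 module rather than the C# the question uses; the helper name build_case_pivot_sql and the sample rows are assumptions, and the table layout is the tData(row, col, val) table described in the next paragraph.

```python
import sqlite3

def build_case_pivot_sql(conn, table="tData"):
    """Hypothetical helper: build a CASE-based pivot from the distinct col values."""
    # First query: find out which pivot columns exist.
    cols = [r[0] for r in conn.execute(f"SELECT DISTINCT col FROM {table} ORDER BY col")]
    # Column keys are forced through int() before being spliced into the SQL text.
    fields = ",\n".join(
        f"      sum(CASE col WHEN {int(c)} THEN val END) AS col{int(c)}" for c in cols
    )
    return f"SELECT\n      row,\n{fields}\nFROM {table}\nGROUP BY row"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tData (row INTEGER, col INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO tData VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 20), (2, 1, 30), (2, 3, 40), (2, 3, 5)],  # made-up sample data
)

pivot_sql = build_case_pivot_sql(conn)
pivot_rows = conn.execute(pivot_sql + " ORDER BY row").fetchall()
print(pivot_sql)
print(pivot_rows)
```

Because the column list comes from the data, adding a new col value changes the generated statement without any code change.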

I have three methods in mind. Let's say we have a simple table named tData with three columns: "row" is the row number of a data point, "col" is its column number, and "val" is its value.

The orthodox method is to use CASE expressions:

SELECT
      row,
      sum(CASE col WHEN 1 THEN val END) AS col1,
      sum(CASE col WHEN 2 THEN val END) AS col2,
      sum(CASE col WHEN 3 THEN val END) AS col3
FROM tData
GROUP BY row

However, I was thinking it might be faster to ditch the CASE expressions and use a logical expression directly on the value, relying on the fact that true==1 and false==0:

SELECT
      row,
      sum((col=1)*val) AS col1,
      sum((col=2)*val) AS col2,
      sum((col=3)*val) AS col3
FROM tData
GROUP BY row

I suspect this method should be faster, since the CASE expression should have some overhead, but I'm not really sure.
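Speed aside, the two forms are not quite interchangeable. The following sketch (Python with the standard sqlite3 module, made-up sample rows) shows one difference: when a row has no data at all for a column, the CASE form sums only NULLs and returns NULL, while the multiplication form sums zeros and returns 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tData (row INTEGER, col INTEGER, val INTEGER)")
# Row 1 deliberately has no entry for col 2.
conn.executemany(
    "INSERT INTO tData VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 3), (2, 2, 7)],
)

case_rows = conn.execute("""
    SELECT row,
           sum(CASE col WHEN 1 THEN val END) AS col1,
           sum(CASE col WHEN 2 THEN val END) AS col2
    FROM tData GROUP BY row ORDER BY row""").fetchall()

bool_rows = conn.execute("""
    SELECT row,
           sum((col=1)*val) AS col1,
           sum((col=2)*val) AS col2
    FROM tData GROUP BY row ORDER BY row""").fetchall()

print(case_rows)  # missing cell comes back as None (NULL)
print(bool_rows)  # missing cell comes back as 0
```

Whether an empty cell should read as NULL or as 0 is a display decision, so it may matter more than any speed difference when choosing between the two.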

The third method is a bit more complex: it uses JOINs to do the pivoting:

SELECT
      rows.row,
      col1.valSum AS col1,
      col2.valSum AS col2,
      col3.valSum AS col3
FROM
    (SELECT row FROM tData GROUP BY row) AS rows
LEFT JOIN
    (SELECT row,sum(val) AS valSum FROM tData WHERE col=1 GROUP BY row) AS col1
    ON rows.row=col1.row
LEFT JOIN
    (SELECT row,sum(val) AS valSum FROM tData WHERE col=2 GROUP BY row) AS col2
    ON rows.row=col2.row
LEFT JOIN
    (SELECT row,sum(val) AS valSum FROM tData WHERE col=3 GROUP BY row) AS col3
    ON rows.row=col3.row

True, those JOINs have serious overhead, but in my limited experience with large tables, SQL implementations can do simple filter-group-and-sum operations much faster than custom per-row data manipulation, and that more than makes up for the overhead. The problem is that this kind of SQL statement is more complex to generate, since each column appears in two places in the statement: once in the fields clause and once in the FROM clause, instead of only in the fields clause as in the first two methods. I also need to be careful with the names of all those temporary tables.
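To make the generation cost concrete, here is a minimal sketch of a generator for the JOIN form, in Python with the standard sqlite3 module. The helper name build_join_pivot_sql and the sample rows are assumptions for illustration. Each column key is used twice, once for the field list and once for its aliased subquery.

```python
import sqlite3

def build_join_pivot_sql(cols, table="tData"):
    """Hypothetical helper: build the JOIN-based pivot for the given column keys."""
    keys = [int(c) for c in cols]  # only integers are spliced into the SQL text
    # Place 1: one entry per column in the field list.
    fields = "".join(f",\n      col{c}.valSum AS col{c}" for c in keys)
    # Place 2: one LEFT JOIN subquery per column, aliased col<N>.
    joins = "".join(
        f"\nLEFT JOIN\n"
        f"    (SELECT row,sum(val) AS valSum FROM {table} WHERE col={c} GROUP BY row) AS col{c}\n"
        f"    ON rows.row=col{c}.row"
        for c in keys
    )
    return (
        f"SELECT\n      rows.row{fields}\nFROM\n"
        f"    (SELECT row FROM {table} GROUP BY row) AS rows{joins}"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tData (row INTEGER, col INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO tData VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 20), (2, 1, 30), (2, 3, 40), (2, 3, 5)],  # made-up sample data
)

join_sql = build_join_pivot_sql([1, 2, 3])
join_rows = conn.execute(join_sql + "\nORDER BY rows.row").fetchall()
print(join_sql)
print(join_rows)
```

Deriving both the alias and the output name from the same key keeps the subquery names unique without separate bookkeeping.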

So, any opinions?

Recommended answer

I would expect the CASE approach to run faster than doing as many group-bys and joins against your table as there are distinct values in the pivot column. The former is CPU-intensive; the latter is disk-intensive. For example, if the column whose values become the column headers held a day of the week, you'd have seven pivot columns and seven grouped selects. That could be expensive, depending on the size of the table.
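Since the outcome depends on table size, measuring on your own data is the safest way to decide. The following is a rough timing sketch, in Python with the standard sqlite3 module, for the seven-column day-of-week case. The synthetic data and the timed helper are assumptions. An in-memory database will not show the disk cost mentioned above, so connect to a real database file for a meaningful comparison.

```python
import sqlite3
import time

DAYS = range(1, 8)  # seven pivot columns, as in the day-of-week example

conn = sqlite3.connect(":memory:")  # swap in a file path to include disk I/O
conn.execute("CREATE TABLE tData (row INTEGER, col INTEGER, val INTEGER)")
# Synthetic, deterministic data: 500 distinct rows, 7 distinct columns.
conn.executemany(
    "INSERT INTO tData VALUES (?, ?, ?)",
    ((i % 500, i % 7 + 1, i) for i in range(20000)),
)

case_sql = (
    "SELECT row"
    + "".join(f", sum(CASE col WHEN {d} THEN val END) AS col{d}" for d in DAYS)
    + " FROM tData GROUP BY row ORDER BY row"
)

join_sql = (
    "SELECT rows.row"
    + "".join(f", col{d}.valSum AS col{d}" for d in DAYS)
    + " FROM (SELECT row FROM tData GROUP BY row) AS rows"
    + "".join(
        f" LEFT JOIN (SELECT row, sum(val) AS valSum FROM tData"
        f" WHERE col={d} GROUP BY row) AS col{d} ON rows.row=col{d}.row"
        for d in DAYS
    )
    + " ORDER BY rows.row"
)

def timed(sql):
    """Run a query to completion and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    result = conn.execute(sql).fetchall()
    return result, time.perf_counter() - start

case_result, case_seconds = timed(case_sql)
join_result, join_seconds = timed(join_sql)
print(f"CASE: {case_seconds:.4f}s  JOIN: {join_seconds:.4f}s")
```

A single run is noisy, so repeat each query several times and compare the best or median time before drawing conclusions.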
