积在许多列的SQL表 [英] Dot product in an SQL table with many columns
问题描述
我想通过单一的指定行乘以每一行,再总结为每行该产物(点积)。我SQL表是名称和相关的高维向量的列表。该表有1串+ 1000的数值列。通常有几百万行。在矢量/阵列的每个浮子处于一个新列:
I would like to multiply each row by a single specified row, and then sum that product for each row (a dot product.) My SQL table is a list of names and associated high-dimensional vectors. The table has 1 string + 1000 numerical columns. There are usually a few million rows. Each float in the vectors/arrays is in a new column:
+--------+------+-------+------+---+--------+
| NAME | COL0 | COL1 | COL2 | … | COL999 |
+--------+------+-------+------+---+--------+
| TOPAZ | 0.73 | 0.77 | 0.15 | | 0.55 |
| GARDEN | 0.41 | 0.57 | 0.61 | | 0.00 |
| HAVE | 0.40 | 0.32 | 0.23 | | 0.52 |
+--------+------+-------+------+---+--------+
在阵列数学,这将仅仅是总和(表*行,轴= 1)
。这是一个大阵,并需要几秒钟时,它用C写的,生活在纯净的内存来执行,所以速度是非常重要的。但它变得太大,是在本地内存,现在,数据需要生活在一个SQL服务器。如何计算表* GARDEN一种有效的方式,这样我可以得到每一行的点积带花园(下面的例子输出)?
In array math, this would simply be sum(table * row, axis=1)
. It's a large array, and takes a few seconds to execute when it's written in C and lives in pure memory, so speed is very important. But it's gotten too big to be in local memory now, and the data needs to live in a SQL server. How do I calculate table * GARDEN in an efficient way such that I can get the dot product of every row with GARDEN (example output below)?
+--------+------+-------+------+---+--------+------+
| NAME | COL0 | COL1 | COL2 | … | COL999 | DOT |
+--------+------+-------+------+---+--------+------+
| TOPAZ | 0.57 | 0.24 | 0.34 | | 0.21 | 0.46 |
| GARDEN | 0.42 | 0.43 | 0.12 | | 0.66 | 0.48 |
| HAVE | 0.31 | 0.37 | 0.30 | | 0.47 | 0.40 |
+--------+------+-------+------+---+--------+------+
非常感谢!
推荐答案
的结构是:
select t.*,
(t.col0 * garden.col0 +
t.col1 * garden.col1 + . . .
t.col999 * garden.col999
) as DOT
from t cross join
(select t.*
from t
where name = 'GARDEN'
) garden;
这仍然不会特别快。在C#几秒钟可能是许多分钟,除非你有并行的硬件和可以利用它一个很好的SQL数据库。
This still won't be particularly fast. "A few seconds" in C# might be many minutes, unless you have parallel hardware and a good SQL database that can take advantage of it.
这篇关于积在许多列的SQL表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!