Multiple Regression in Snowflake SQL

Problem description

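I am trying to build a multiple regression model in Snowflake, following the structure described in this blog post (http://sqldatamine.blogspot.com/2013/12/true-multiple-regression-using-sql.html), but I am struggling to adapt it to Snowflake's SQL, in particular to stored procedures written in JavaScript.

Here is the part of the blog post I am trying to replicate: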

declare @p int
set @p = 1

while @p <= (select max(xn) from #x)
 begin  
  insert into #c
  select  xn cxn,  zn czn, sum(xv*zv)/sum(zv*zv) cv 
   from #x join  #z on  xid = zid where zn = @p-1 and xn>zn group by xn, zn
  insert into #z
  select zid, xn,xv- sum(cv*zv) 
   from #x join #z on xid = zid   join  #c  on  czn = zn and cxn = xn  where xn = @p and zn<xn  group by zid, xn,xv
  set @p = @p +1

 end

Here is my attempt:

CREATE TEMP TABLE TEST_TABLE (ID int, AREA float, ROOMS float, ODD float, PRICE float);
    INSERT INTO TEST_TABLE SELECT 1, 2202, 3, 1, 400;
    INSERT INTO TEST_TABLE SELECT 2, 1600, 3, 0, 330;
    INSERT INTO TEST_TABLE SELECT 3, 2400, 3, 1, 369;
    INSERT INTO TEST_TABLE SELECT 4, 1416, 2, 1, 232;
    INSERT INTO TEST_TABLE SELECT 5, 3000, 4, 0, 540;

--INDEPENDENT VARIABLE VECTOR--
CREATE TEMP TABLE X_VAR AS
  SELECT ID xid, 0 xn, 1 xv FROM TEST_TABLE
    UNION ALL
    SELECT ID, 1, ROOMS FROM TEST_TABLE
    UNION ALL
    SELECT ID, 2, AREA FROM TEST_TABLE
    UNION ALL
    SELECT ID, 3, ODD FROM TEST_TABLE;

--DEPENDANT VARIABLE VECTOR--
CREATE TEMP TABLE Y_VAR AS
  SELECT ID yid, 0 yn, PRICE yv FROM TEST_TABLE;

--ORTHOGONAL PROCESSED VALUES--
CREATE TEMP TABLE Z_VAR (zid int, zn int, zv float);
    INSERT INTO Z_VAR SELECT ID, 0 zn, 1 zv FROM TEST_TABLE;

--ORTHOGONALIZATION COEFFICIENTS--
CREATE TEMP TABLE C_VAR (cxn int, czn int, cv float);
    INSERT INTO C_VAR SELECT ID, 0 zn, 1 zv from TEST_TABLE;

--REGRESSION COEFFICIENTS--
CREATE TEMP TABLE B_VAR (bn int, bv float);

--FIRST LOOP: ORTHOGONALIZATION COEFFICIENT CALC--
CREATE OR REPLACE PROCEDURE ORTH()
    RETURNS FLOAT NOT NULL
    LANGUAGE JAVASCRIPT
    AS
  $$
        var sql_counter =
            `SELECT MAX(XN) FROM X_VAR`;
        var sql_bulk =
            `INSERT INTO C_VAR
                    SELECT XN CXN, ZN CZN, SUM(XV*ZV)/SUM(ZV*ZV) CV
                        FROM X_VAR
                    JOIN Z_VAR ON XID = ZID
                        WHERE ZN = p-1
                        AND XN > ZN
                        GROUP BY XN, ZN
                    INSERT INTO Z_VAR
                    SELECT ZID, XN, XV-SUM(CV*ZV)
                        FROM X_VAR
                    JOIN Z_VAR ON XID = ZID
                    JOIN C_VAR ON CZN = ZN AND CXN = XN
                        WHERE
                        1=1
                        AND XN = P
                        AND ZN < XN
                    GROUP BY ZID, XN, XV`;
        var p = 1;
        while (p <= snowflake.execute(sql_counter)) {
            snowflake.execute ({sqlText: sql_bulk})
            p = p + 1
            }
    $$
;

CALL ORTH();
SELECT * FROM C_VAR;

I keep getting a null-argument error on the snowflake.execute line. What am I doing wrong?

Recommended answer

This will run, but I'm not sure what it is trying to do. The problem is on this line:

while (p <= snowflake.execute(sql_counter)) {

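The problem is that the query is not being executed correctly. snowflake.execute must be given a statement object, like the {sqlText: sql_bulk} you pass on the next line. It also returns a result set rather than a number, so the value has to be read from the first row before it can be compared with p.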
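The difference is easier to see in isolation. The sketch below runs outside Snowflake, so it uses a small stand-in for the `snowflake` object that imitates only `execute()`, `next()` and `getColumnValue()`. The stand-in, its error message, the hard-coded row, and the helper name `singleValue` are all assumptions made for illustration, not Snowflake's real implementation.

```javascript
// Stand-in for the object Snowflake injects into JavaScript procedures.
// It imitates only execute(), next() and getColumnValue(); it is not the real API.
const snowflake = {
  execute(stmt) {
    if (typeof stmt !== "object" || stmt === null || !stmt.sqlText) {
      throw new Error("execute() expects an object such as {sqlText: '...'}");
    }
    // Pretend result of: SELECT MAX(XN) AS C FROM X_VAR
    const rows = [{ C: 3 }];
    let i = -1;
    return {
      next: () => ++i < rows.length,            // advance to the next row
      getColumnValue: (name) => rows[i][name],  // read a column of the current row
    };
  },
};

// Read one column from the first row of a query result.
function singleValue(sqlText, columnName) {
  const rs = snowflake.execute({ sqlText });
  if (!rs.next()) {
    throw new Error("query returned no rows");
  }
  return rs.getColumnValue(columnName);
}

// Fetch the loop bound once as a plain number, then loop on it.
const maxXn = singleValue("SELECT MAX(XN) AS C FROM X_VAR", "C");
let iterations = 0;
for (let p = 1; p <= maxXn; p++) {
  iterations++; // the INSERT statements would run here
}
console.log(maxXn, iterations);
```

Inside a real procedure the stand-in is dropped and the built-in `snowflake` object is used directly; the helper in the procedure below follows the same pattern.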

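--FIRST LOOP: ORTHOGONALIZATION COEFFICIENT CALC--
CREATE OR REPLACE PROCEDURE ORTH()
    RETURNS FLOAT NOT NULL
    LANGUAGE JAVASCRIPT
    AS
  $$
        var sql_counter =
            `SELECT MAX(XN) AS C FROM X_VAR`;
        // snowflake.execute runs one statement per call, so the two inserts are
        // kept as separate strings. The loop counter is passed in as a bind (?),
        // because a JavaScript variable named inside the SQL text is not substituted.
        var sql_c =
            `INSERT INTO C_VAR
                    SELECT XN CXN, ZN CZN, SUM(XV*ZV)/SUM(ZV*ZV) CV
                        FROM X_VAR
                    JOIN Z_VAR ON XID = ZID
                        WHERE ZN = ?
                        AND XN > ZN
                        GROUP BY XN, ZN`;
        var sql_z =
            `INSERT INTO Z_VAR
                    SELECT ZID, XN, XV-SUM(CV*ZV)
                        FROM X_VAR
                    JOIN Z_VAR ON XID = ZID
                    JOIN C_VAR ON CZN = ZN AND CXN = XN
                        WHERE XN = ?
                        AND ZN < XN
                    GROUP BY ZID, XN, XV`;
        // X_VAR does not change inside the loop, so read the bound once.
        var max_xn = ExecuteSingleValueQuery('C', sql_counter);
        var p = 1;
        while (p <= max_xn) {
            snowflake.execute({sqlText: sql_c, binds: [p - 1]});
            snowflake.execute({sqlText: sql_z, binds: [p]});
            p = p + 1;
        }
        // The procedure is declared RETURNS FLOAT NOT NULL, so return a number.
        return max_xn;

        // Added a helper function: run a query and return one column of its first row.
        function ExecuteSingleValueQuery(columnName, queryString) {
            var stmt = snowflake.createStatement({sqlText: queryString});
            var rs = stmt.execute();
            rs.next();
            return rs.getColumnValue(columnName);
        }
    $$
;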

CALL ORTH();
SELECT * FROM C_VAR;
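
As for what the loop is trying to do: the blog's procedure appears to be a classical Gram-Schmidt orthogonalization of the predictor columns, where each z column is the matching x column minus its projections onto the earlier z columns. The following plain-JavaScript sketch, which runs outside Snowflake, does the same thing on in-memory arrays using the five rows from TEST_TABLE. The names `rows`, `x`, `dot` and `orthogonalize` are illustrative assumptions and do not come from the blog post or from Snowflake.

```javascript
// The five sample rows from TEST_TABLE.
const rows = [
  { id: 1, area: 2202, rooms: 3, odd: 1, price: 400 },
  { id: 2, area: 1600, rooms: 3, odd: 0, price: 330 },
  { id: 3, area: 2400, rooms: 3, odd: 1, price: 369 },
  { id: 4, area: 1416, rooms: 2, odd: 1, price: 232 },
  { id: 5, area: 3000, rooms: 4, odd: 0, price: 540 },
];

// Columns in the same order as X_VAR: 0 = intercept, 1 = ROOMS, 2 = AREA, 3 = ODD.
const x = [
  rows.map(() => 1),
  rows.map((r) => r.rooms),
  rows.map((r) => r.area),
  rows.map((r) => r.odd),
];

// Dot product of two equal-length vectors.
const dot = (a, b) => a.reduce((sum, v, i) => sum + v * b[i], 0);

// z[0] = x[0]; for p >= 1, z[p] = x[p] - sum over j < p of c[p][j] * z[j],
// with c[p][j] = (x[p] . z[j]) / (z[j] . z[j]).
// c[p][j] plays the role of a C_VAR row with cxn = p and czn = j.
function orthogonalize(x) {
  const z = [x[0].slice()];
  const c = [];
  for (let p = 1; p < x.length; p++) {
    c[p] = [];
    let zp = x[p].slice();
    for (let j = 0; j < p; j++) {
      c[p][j] = dot(x[p], z[j]) / dot(z[j], z[j]);
      zp = zp.map((v, i) => v - c[p][j] * z[j][i]);
    }
    z.push(zp);
  }
  return { z, c };
}

const result = orthogonalize(x);
console.log(result.c);
console.log(result.z);
```

The sketch assumes no predictor is an exact linear combination of the earlier ones; if one were, its z column would be all zeros and the division by z[j] . z[j] would fail, just as SUM(ZV*ZV) would be zero in the SQL version.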
