在内部连接中迭代使用 MySQL LAG() [英] Use MySQL LAG() within inner join iteratively

查看:64
本文介绍了在内部连接中迭代使用 MySQL LAG()的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个表,我需要在循环中迭代地连接它们以创建值来填充其中一个表.我主要是寻找最简洁的方式来进行单周期移位连接,详情如下. 下面两个输入表的示例:

I have two tables that I need to join iteratively within a loop to create values to populate one of the tables. I'm primarily looking for the cleanest way to do a one-period-shifted join, details below. Sample of the two input tables below:

DROP TABLE IF EXISTS holdings;

CREATE TABLE holdings
(date DATE NOT NULL
,ticker CHAR(4) NOT NULL
,wgt DECIMAL(5,2)
,PRIMARY KEY(date,ticker)
);

INSERT INTO holdings VALUES
('2019-03-29','MTUM',0.2),
('2019-03-29','QUAL',0.2),
('2019-03-29','SIZE',0.2),
('2019-03-29','USMV',0.2),
('2019-03-29','VLUE',0.2),
('2019-06-28','MTUM',0.2),
('2019-06-28','QUAL',0.2),
('2019-06-28','SIZE',0.2),
('2019-06-28','USMV',0.2),
('2019-06-28','VLUE',0.2);

DROP TABLE IF EXISTS returns;

CREATE TABLE returns
(monthEnd  DATE NOT NULL
,ticker CHAR(4) NOT NULL
,ret DECIMAL(11,8) NOT NULL
,PRIMARY KEY(monthend,ticker)
);

INSERT INTO returns VALUES
('2019-03-29',  'USMV' ,   0.02715291),
('2019-03-29',  'SIZE' ,   0.00512113),
('2019-03-29',  'VLUE' ,  -0.00943159),
('2019-03-29',  'MTUM' ,   0.02118479),
('2019-03-29',  'QUAL' ,   0.02533432),
('2019-04-30',  'USMV' ,   0.02176873),
('2019-04-30',  'SIZE' ,   0.03818616),
('2019-04-30',  'VLUE' ,   0.03418481),
('2019-04-30',  'MTUM' ,   0.02255305),
('2019-04-30',  'QUAL' ,   0.03794464),
('2019-05-31',  'VLUE' ,  -0.09601646),
('2019-05-31',  'MTUM' ,  -0.02196844),
('2019-05-31',  'QUAL' ,  -0.06582526),
('2019-05-31',  'USMV' ,  -0.01614514),
('2019-05-31',  'SIZE' ,  -0.06918445),
('2019-06-28',  'QUAL' ,   0.07073081),
('2019-06-28',  'VLUE' ,   0.09571038),
('2019-06-28',  'MTUM' ,   0.06121113),
('2019-06-28',  'USMV' ,   0.04984654),
('2019-06-28',  'SIZE' ,   0.07531133),
('2019-07-31',  'QUAL' ,   0.013775  ),
('2019-07-31',  'MTUM' ,   0.01795953),
('2019-07-31',  'SIZE' ,   0.01208791),
('2019-07-31',  'VLUE' ,   0.01601182),
('2019-07-31',  'USMV' ,   0.01668555);

这里的第一步是将持有加入到回报,将日期对齐移动一个周期,这样第一次迭代的输出如下:

First step here is to join holdings to returns, shifting date alignment by one period, such that output from the first iteration is as follows:

date    portName    ticker  wgt monthEnd    ticker  ret
2019-03-29  test    MTUM    0.2 2019-04-30  MTUM    0.02255305
2019-03-29  test    QUAL    0.2 2019-04-30  QUAL    0.03794464
2019-03-29  test    SIZE    0.2 2019-04-30  SIZE    0.03818616
2019-03-29  test    USMV    0.2 2019-04-30  USMV    0.02176873
2019-03-29  test    VLUE    0.2 2019-04-30  VLUE    0.03418481

此时,wgtret 组合在一起,为每个条目提供一个新的 wgt(为简单起见,省略了计算).这些新的权重被插入到持仓表中,这些记录如下所示:

At this point, wgt and ret are combined to give a new wgt for each entry (calculation omitted for simplicity). These new weights are inserted into the holdings table and these records look as follows:

date    portName    ticker  wgt
2019-04-30  test    MTUM    0.201442484998052
2019-04-30  test    QUAL    0.202261035805858
2019-04-30  test    SIZE    0.198273711216605
2019-04-30  test    USMV    0.202619777232855
2019-04-30  test    VLUE    0.19540299074663

下一步是应用与之前相同的程序,采用这些权重,加入单周期偏移回报,产生如下所示的输出:

Next step is to apply the same procedure as before, taking these weights, joining with one-period-shifted return, to produce output that looks like this:

date    portName    ticker  wgt monthEnd    ticker  ret
2019-04-30  test    MTUM    0.201442484998052   2019-05-31  MTUM    -0.02196844
2019-04-30  test    QUAL    0.202261035805858   2019-05-31  QUAL    -0.06582526
2019-04-30  test    SIZE    0.198273711216605   2019-05-31  SIZE    -0.06918445
2019-04-30  test    USMV    0.202619777232855   2019-05-31  USMV    -0.01614514
2019-04-30  test    VLUE    0.19540299074663    2019-05-31  VLUE    -0.09601646

我们再次结合 wgtret 来确定新的 wgt 值并将这些值插入到 holdings 表中.

We again combine wgt and ret to determine new wgt values and insert these into the holdings table.

这将持续所有日期,直到我们到达日期列表的末尾(鉴于上述示例,持有 表中的最后一个条目将具有 date 2019-07-31).

This continues for all dates until we reach the end of our list of dates (given the above example, the last entry in the holdings table would have date 2019-07-31).

需要注意的是,当日期条目已经存在时,此过程用于填充持有.换句话说,我们会对 4/30、5/31 和 7/31 执行此过程,但会将 6/28 的记录按原样保留在 持仓 表中.

One caveat, this process is used to populate the holdings table except when a date entry already exists. In other words, we'd do this process for 4/30, 5/31, and 7/31, but would leave the 6/28 records as-is in the holdings table.

所需的样本输出(对于子集)如下所示:

Desired sample output (for a subset) would look like this:

date    portName    ticker  wgt
2019-03-29  test    MTUM    0.2
2019-03-29  test    QUAL    0.2
2019-03-29  test    SIZE    0.2
2019-03-29  test    USMV    0.2
2019-03-29  test    VLUE    0.2
2019-04-30  test    MTUM    0.201442484998052
2019-04-30  test    QUAL    0.202261035805858
2019-04-30  test    SIZE    0.198273711216605
2019-04-30  test    USMV    0.202619777232855
2019-04-30  test    VLUE    0.19540299074663
2019-05-31  test    MTUM    0.205430582
2019-05-31  test    QUAL    0.201024226
2019-05-31  test    SIZE    0.198113682
2019-05-31  test    USMV    0.204236958
2019-05-31  test    VLUE    0.205066864
2019-06-28  test    MTUM    0.2
2019-06-28  test    QUAL    0.2
2019-06-28  test    SIZE    0.2
2019-06-28  test    USMV    0.2
2019-06-28  test    VLUE    0.2

我的基本方法是在唯一的日期列表上使用循环,首先检查currentDate"值是否已经在 holdings 表中,然后在按照所述进行计算之前加入.专门寻找以最简洁的方式进行上面引用的单周期移位连接的输入.我来自 SQL Server,我通常会在其中创建一个我可以使用的虚拟rowNum"列(即,a.rowNum = b.rowNum+1)进行移位.想知道是否有一种方法可以在连接中使用 LAG()(或其他运算符),或者是否有另一种更简洁的方法.

My basic approach is to use a loop over a unique list of dates, first checking to see if the 'currentDate' value is already in the holdings table, and then joining before doing calculation as described. Looking specifically for input on the cleanest way to do the one-period-shifted join referenced above. I'm coming from SQL Server, where I would typically create a dummy 'rowNum' column which I could use (ie, a.rowNum = b.rowNum+1) to do the shift. Wondering if there's a way to use the LAG() (or other operator) within the join, of if there's another approach that would be cleaner still.

我认为没有更简洁的(即基于非循环的)方法,因为每组条目都依赖于最后一个条目.欢迎对方法提出其他想法.

I don't think there's a cleaner (ie, non-loop-based) approach as each set of entries depend on the last. Welcome other thoughts on approach.

推荐答案

考虑以下内容;我还在使用 MySQL 8.0 之前的版本(我知道),所以这里没有窗口函数...

Consider the following; I'm still using a pre-8.0 version of MySQL (I know), so no windowing functions here...

DROP TABLE IF EXISTS holdings;

CREATE TABLE holdings
(date DATE NOT NULL
,ticker CHAR(4) NOT NULL
,wgt DECIMAL(5,2)
,PRIMARY KEY(date,ticker)
);

INSERT INTO holdings VALUES
('2019-03-29','MTUM',0.2),
('2019-03-29','QUAL',0.2),
('2019-03-29','SIZE',0.2),
('2019-03-29','USMV',0.2),
('2019-03-29','VLUE',0.2),
('2019-06-28','MTUM',0.2),
('2019-06-28','QUAL',0.2),
('2019-06-28','SIZE',0.2),
('2019-06-28','USMV',0.2),
('2019-06-28','VLUE',0.2);

DROP TABLE IF EXISTS returns;

CREATE TABLE returns
(monthEnd  DATE NOT NULL
,ticker CHAR(4) NOT NULL
,ret DECIMAL(11,8) NOT NULL
,PRIMARY KEY(monthend,ticker)
);

INSERT INTO returns VALUES
('2019-03-29',  'USMV' ,   0.02715291),
('2019-03-29',  'SIZE' ,   0.00512113),
('2019-03-29',  'VLUE' ,  -0.00943159),
('2019-03-29',  'MTUM' ,   0.02118479),
('2019-03-29',  'QUAL' ,   0.02533432),
('2019-04-30',  'USMV' ,   0.02176873),
('2019-04-30',  'SIZE' ,   0.03818616),
('2019-04-30',  'VLUE' ,   0.03418481),
('2019-04-30',  'MTUM' ,   0.02255305),
('2019-04-30',  'QUAL' ,   0.03794464),
('2019-05-31',  'VLUE' ,  -0.09601646),
('2019-05-31',  'MTUM' ,  -0.02196844),
('2019-05-31',  'QUAL' ,  -0.06582526),
('2019-05-31',  'USMV' ,  -0.01614514),
('2019-05-31',  'SIZE' ,  -0.06918445),
('2019-06-28',  'QUAL' ,   0.07073081),
('2019-06-28',  'VLUE' ,   0.09571038),
('2019-06-28',  'MTUM' ,   0.06121113),
('2019-06-28',  'USMV' ,   0.04984654),
('2019-06-28',  'SIZE' ,   0.07531133),
('2019-07-31',  'QUAL' ,   0.013775  ),
('2019-07-31',  'MTUM' ,   0.01795953),
('2019-07-31',  'SIZE' ,   0.01208791),
('2019-07-31',  'VLUE' ,   0.01601182),
('2019-07-31',  'USMV' ,   0.01668555);

SELECT a.date
     , a.ticker a_ticker
     , a.wgt
     , b.monthend
     , b.ticker b_ticker
     , b.ret
  FROM 
     ( SELECT h.*
            , MIN(monthend) monthend 
         FROM holdings h 
         LEFT -- if appropriate
         JOIN returns r 
           ON r.monthend > h.date 
          AND r.ticker = h.ticker 
        GROUP 
           BY h.date
            , h.ticker
     ) a 
  LEFT -- if appropriate
  JOIN returns b 
    ON b.monthend = a.monthen
d AND b.ticker = a.ticker;
+------------+--------+------+------------+------------+--------+------------+
| date       | ticker | wgt  | monthend   | monthEnd   | ticker | ret        |
+------------+--------+------+------------+------------+--------+------------+
| 2019-03-29 | MTUM   | 0.20 | 2019-04-30 | 2019-04-30 | MTUM   | 0.02255305 |
| 2019-03-29 | QUAL   | 0.20 | 2019-04-30 | 2019-04-30 | QUAL   | 0.03794464 |
| 2019-03-29 | SIZE   | 0.20 | 2019-04-30 | 2019-04-30 | SIZE   | 0.03818616 |
| 2019-03-29 | USMV   | 0.20 | 2019-04-30 | 2019-04-30 | USMV   | 0.02176873 |
| 2019-03-29 | VLUE   | 0.20 | 2019-04-30 | 2019-04-30 | VLUE   | 0.03418481 |
| 2019-06-28 | MTUM   | 0.20 | 2019-07-31 | 2019-07-31 | MTUM   | 0.01795953 |
| 2019-06-28 | QUAL   | 0.20 | 2019-07-31 | 2019-07-31 | QUAL   | 0.01377500 |
| 2019-06-28 | SIZE   | 0.20 | 2019-07-31 | 2019-07-31 | SIZE   | 0.01208791 |
| 2019-06-28 | USMV   | 0.20 | 2019-07-31 | 2019-07-31 | USMV   | 0.01668555 |
| 2019-06-28 | VLUE   | 0.20 | 2019-07-31 | 2019-07-31 | VLUE   | 0.01601182 |
+------------+--------+------+------------+------------+--------+------------+

冲洗并重复.

这篇关于在内部连接中迭代使用 MySQL LAG()的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆