Matlab中的财务代码和日期汇总重复组合表 [英] Aggregate Duplicate Combinations in Table for Financial Ticker and Dates in Matlab

查看:113
本文介绍了Matlab中的财务代码和日期汇总重复组合表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑描述N个金融交易的三个N x 1向量:tickersdatesvolumes.这些向量的来源是这样的表格:

Consider three N by 1 vectors describing N financial transactions: tickers, dates, and volumes. The source for these vectors is a table like this:

Tickers    Dates    Volumes
-------    -----    -------
TICKER1    1        200
TICKER1    1        400
TICKER1    2        100
TICKER2    1        300
...        ...      ...

源表首先按代码分类,然后按日期排序.

The source table is sorted firstly by tickers, and secondly by dates.

我想合并(即计算总和)给定公司在给定日期内发生的所有交易;这意味着将消除给定日期内与一个公司内的交易相对应的所有报价和日期的重复,而将与这些交易相对应的交易量全部加在一起并保存在唯一剩余的条目中.最终输出应如下所示:

I would like to consolidate (i.e. compute the sum of) all transactions that happened within a given day for a given company; meaning that the duplicates of all tickers and dates corresponding to transactions within one company within a given day are eliminated, while the volumes corresponding to these transactions are all added together and saved in the only remaining entry. The final output should look like this:

Tickers    Dates    Volumes
-------    -----    -------
TICKER1    1        600
TICKER1    2        100
TICKER2    1        300

请注意,由于不同公司(此处为TICKER1TICKER2)可以在同一天(此处为1)进行交易,因此单独的Dates向量仍包含非唯一条目.同样,Tickers仍包含非唯一条目,因为同一家公司(此处为TICKER1)可以在不同日期(此处为12)交易.寻求实现仅针对TickersDates的组合键"来定义.

Note that the Dates vector alone still contains non-unique entries because different companies (here TICKER1 and TICKER2) can trade on the same day (here 1); similarly, the Tickers still contain non-unique entries because the same company (here TICKER1) can trade on different days (here 1 and 2. The kind of uniqueness I am looking to achieve is only defined with respect to the combined "key" of Tickers and Dates.

到目前为止,我的想法是继续进行以下操作:

My idea so far has been to proceed as follows:

  1. 标识volumes中所有与之对应的股票代号不唯一的系数的坐标.
  2. volume中属于该非唯一条目序列的所有系数求和,并将和保存为非唯一序列中的第一个条目.
  3. 删除属于该非唯一序列的所有后续条目以及它们在datestickers中的相应条目.
  1. Identify the coordinates of all coefficients in volumes for which the corresponding ticker and the corresponding date are non-unique.
  2. Sum over all coefficients in volume that belong to this series of non-unique entries and save the sum as the first entry in the non-unique series..
  3. Delete all subsequent entries that belong to this non-unique series along with their corresponding entries in dates and tickers.

到目前为止,我一直在尝试[~,idx] = unique(),但没有成功.此函数仅返回一系列非唯一条目中第一个的坐标.

So far I have been experimenting with [~,idx] = unique() but without much success. This function returns only the coordinate of the first of any series of non-unique entries.

我的问题有两个方面:(1)鉴于我的目标,上述伪代码"在逻辑上是正确的吗?如果不是,如何对其进行纠正才能表现出所需的效果? (2)如何在MATLAB中实现?

My question is two-fold: (1) Given my objective is the above "pseudocode" logically correct? If not, how would it have to be corrected in order to behave as desired? (2) How can this be implemented in MATLAB?

请注意,为了方便演示,我将向量显示为一个表变量.我正在使用三个单独的阵列,并希望使用最底层的解决方案.

Note that I displayed the vectors as one table variable for easier presentation. I am working with three separate arrays and prefer the most low-level solution possible.

任何建议将不胜感激!

推荐答案

您只需使用container.Map即可简单地将代码自动映射到一个数字.然后,使用映射来构建包含您的数据的矩阵.然后,您可以使用代码ID和日期的唯一组合来汇总总和.最后,您重新构建一个新表,并将股票代号重新映射回股票代号.以下代码经过大量注释,以指导您完成整个过程.

You can simply map your tickers to a number first by using a container.Map. Then use the mapping to construct a matrix with your data. You can then use the unique combination of the ticker ID and the date to aggregate the sum. Finally you reconstruct a new table and remap the ticker IDs back into ticker names. The following code is heavily commented to guide you through the process.

您将需要我的超级有用 自定义row2cell.m函数.

% Dummy Data
T = table({'a','a','a','b'}',[1 1 2 1]', [1 1 1 1]' , [1 1 1 1]'*10);

% Find unique ticker name
C = unique( table2cell( T(:,1)));

% Create map of ticker name to num
M = containers.Map( C, 1:length(C) );
I = 1:length(C);

% Transform Table to Array
F = [cellfun( @(x) M(x), table2cell( T(:,1) ) ) table2array( T(:,2:end) )];

% Find unique combinations of ticker/day
U = unique(F(:,1:2),'rows');

% Aggregate by ticker and date
T = array2table( cell2mat( cellfun(@(x) [x sum( F( F(:,1) == x(1) & F(:,2) == x(2), 3:4 ), 1 )], rows2cell( U ), 'UniformOutput', false ) ) );

% Remap number to ticker name
T.Var1 = C(table2array( T(:,1) ) );

第18行如下所示,这是脚本的强大之处

Line 18 is as follows and this is the powerhouse of the script

T = array2table( cell2mat( cellfun(@(x) [x sum( F( F(:,1) == x(1) & F(:,2) == x(2), 3:4 ), 1 )], rows2cell( U ), 'UniformOutput', false ) ) );

我们使用以下方式将每日/每日股票组合作为单元格:

We have the unique combo of ticker/day as cells using:

rows2cell( U )

在单元格中,x(1)是代码,而x(2)是日期.我们想运行一些将在这两个参数上聚合的东西.假设采用这种形式,我们可以使用以下代码获取逻辑掩码,以获取与此每日报价器组合相对应的所有数据.

In the cell, x(1) is the ticker and x(2) is the date. We want to run something that will aggregate on these two parameters. Assuming this form, we can get our logical mask using the following to obtain all the data that corresponds to this ticker/day combo.

F(:,1) == x(1) & F(:,2) == x(2)

使用此索引,我们可以使用此索引拉出第三列和第四列:

Using this index, we can pull the 3rd and 4th columns using this:

F( F(:,1) == x(1) & F(:,2) == x(2), 3:4 )

然后使用以下方法在第一个方向(行)上求和:

Then sum them on the first direction (rows) using:

sum( F( F(:,1) == x(1) & F(:,2) == x(2), 3:4 ), 1 )

由于我们要通过连接输入(行情/天)和数据(col 3/4)来构造新表的行,因此可以在cellfun中使用此匿名函数:

Since we want to construct the row of our new table by concatenating our input (ticker/day) and our data (col 3/4), we can use this anonymous function in cellfun:

@(x) [x sum( F( F(:,1) == x(1) & F(:,2) == x(2), 3:4 ), 1 )]

由于cellfun将输出代表我们行的单元格向量,因此我们需要使用cell2mat将其转换为矩阵,然后使用array2table将其从矩阵转换为表,如下所示:

Since our cellfun will output a vector of cells representing our rows, we need to convert it to a matrix using cell2mat and then from a matrix to a table using array2table as follows:

array2table( cell2mat( ... ) ).

这是结果.输入表:

Var1    Var2    Var3    Var4
____    ____    ____    ____
'a'     1       1       10  
'a'     1       1       10  
'a'     2       1       10  
'b'     1       1       10  

输出表:

Var1    Var2    Var3    Var4
____    ____    ____    ____
'a'     1       2       20  
'a'     2       1       10  
'b'     1       1       10  

这篇关于Matlab中的财务代码和日期汇总重复组合表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆