在Matlab中对数据进行排序 [英] Sorting the data in Matlab

查看:292
本文介绍了在Matlab中对数据进行排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想将国家A的产品编号(第1列)和年份(第2列)与相同的产品和国家B的年份进行匹配,以便能够减去相应的贸易编号(第3列减去第6列)该年度和该产品中的国家/地区.如果找不到对应的内容,我希望将该帖子扔掉.

I want to match a product number (column 1) and a year (column 2) for Country A with the same product and year for Country B to be able to subtract the corresponding trade numbers (column 3 minus column 6) for the countries in that year and in that product. When no counterpart is found I want that post to be thrown out.

在下面的链接中的示例中,我手动执行此操作直到第22行(​​减去交易额除外).例如,下面的行(23)说产品030420在1995年的交易价格为2.823,但是该产品在该年度的交易未记录在国家B中,因此我要删除该帖子.相应地,B国在1994年记录了产品030420的贸易,但A国没有对应产品,因此该职位也应删除.

In the example in the link below I have manually done this until row 22 (except subtracting the trade figures). The row below (23) for instance says that product 030420 was traded in 1995 at value 2.823, but no trade in that product in that year is recorded in country B, so that post I want deleted. Correspondingly, in Country B, trade of product 030420 is recorded in 1994, but there is no counterpart in country A, so that post also should be deleted.

即使下面示例中的数据在excel中显示(我试图在excel中解决此问题,但也很棘手),我现在将数据存储在Matlab中的矩阵中,并希望为其编写代码,但是对Matlab/编码来说是很新的东西.换句话说,它可能是:

Even though the data in the example below is shown in excel (I tried to solve this in excel but it got tricky) I now have the data stored in a matrix in Matlab and want to write a code for it, but I'm quite new to Matlab/coding. In words it might be something like:

  • 如果第1列中的第一个条目=第4列中的任何条目
  • 然后,如果第2列中的第一个条目=第5列中的任何条目,则从第3列的第一个条目中减去交易量(第6列,无论最终找到何处).
  • 如果找不到匹配项,我希望程序如上所述将其丢掉.
  • IF first entry in column 1 = any entry in column 4
  • then also IF first entry in column 2 = any entry in column 5, subtract the trade figure (column 6, wherever it ends up being found) from the first entry in column 3.
  • If no match is found I'd like the program to throw out that post as described above.

然后当然要对其余的样本重复该过程.

And then of course repeat the procedure for the rest of the sample.

示例:

如果有人想读这篇简短的文章并有任何建议,将不胜感激.

If anyone feels like reading this shorter essay and have any suggestions it'd be much appreciated.

推荐答案

在此解决方案中,我将使用

In this solution, I will be using Dataset Arrays from the Statistics Toolbox.

请考虑以下两个示例CSV文件(类似于您的Excel文件,但我将两个国家/地区分成了单独的文件):

Consider the following two sample CSV files (similar to your Excel file, but I divided the two countries into separate files):

ProductCode,Year,TradeValue
011111,1992,5.934
011111,1999,7.05
022222,2002,5.2
033333,2005,16.6
033333,2006,55

countryB.csv

ProductCode,Year,TradeValue
011111,1992,24.5
011111,1999,25
033333,2005,33.11
033333,2006,44.92
044444,2010,10.8

下面的代码读取两个数据集,并执行内部的 join 使用(ProductCode,Year)作为行键,然后我们计算匹配行的两个交易值之差:

The code below reads the two datasets, and performs an inner join using (ProductCode,Year) as row keys, then we compute the difference of the two trade values for matching rows:

%# read datasets (note we are reading ProductCode/Year as strings to preserve leading zeros)
dA = dataset('File','countryA.csv', 'Delimiter',',', 'Format','%s %s %f','ReadVarNames',true);
dB = dataset('File','countryB.csv', 'Delimiter',',', 'Format','%s %s %f','ReadVarNames',true);

%# inner join (keep only rows that exist in both datasets)
ds = join(dA, dB, 'keys',{'ProductCode' 'Year'}, 'type','inner', 'MergeKeys',true);

%# add a new variable for the difference
dsTradeDiff = dataset(ds.TradeValue_left - ds.TradeValue_right, 'VarNames','TradeDifference');
ds = cat(2, ds, dsTradeDiff);

结果数据集:

ds = 
    ProductCode     Year          TradeValue_left    TradeValue_right    TradeDifference
    '011111'        '1992'        5.934               24.5               -18.566        
    '011111'        '1999'         7.05                 25                -17.95        
    '033333'        '2005'         16.6              33.11                -16.51        
    '033333'        '2006'           55              44.92                 10.08   


编辑:这是仅使用基本的内置MATLAB函数来实现与上述相同的替代方法:


This is an alternative way to achieve the same as above, using only basic built-in MATLAB functions:

%# read countryA
fid = fopen('countryA.csv','rt');
C = textscan(fid, '%s %s %f', 'Delimiter',',', 'HeaderLines',1);
fclose(fid);
[prodCodeA yearA tradeValA] = deal(C{:});

%# read countryB
fid = fopen('countryB.csv','rt');
C = textscan(fid, '%s %s %f', 'Delimiter',',', 'HeaderLines',1);
fclose(fid);
[prodCodeB yearB tradeValB] = deal(C{:});

%# build rows merged-keys
keysA = strcat(prodCodeA,yearA);
keysB = strcat(prodCodeB,yearB);

%# match rows
[idx1 loc1] = ismember(keysA,keysB);
[idx2 loc2] = ismember(keysB,keysA);

%# compute result for intersection of rows
tradeDiff = tradeValA(loc2(idx2)) - tradeValB(loc1(idx1))

具有相同的结果:

tradeDiff =
      -18.566
       -17.95
       -16.51
        10.08

这篇关于在Matlab中对数据进行排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆