Algorithm for calculating most stable, consecutive values from a database


Question

I have some questions and I'm in need of your input.

Say I have a database table with 2000-3000 rows, and each row has a value and some identifiers. I need to extract ~100 consecutive rows with the most stable values (lowest spread). A few jumper values (outliers) are okay if they can be excluded. How would you do this, and what algorithm would you use?

I'm currently using SAS Enterprise Guide for my DB, which runs on Oracle. I don't know much of the generic SAS language, and I don't know what other language I could use for this. Some scripting language? I have limited programming knowledge, but this task seems pretty easy, correct?

The algorithms I've been thinking of are:

  1. Select 100 consecutive rows and calculate the standard deviation. Shift the selection by 1 row and calculate the standard deviation again. Loop through the whole table. Export the rows with the lowest standard deviation.

  2. Same as 1, but calculate variance instead of standard deviation (basically the same thing). When the whole table has been looped through, do it again but exclude the 1 row whose value deviates most from the average. Repeat the process until 5 jumpers have been excluded and compare the results. Pros and cons compared to method 1?
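Outside SAS, the two methods above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `trimmed_spread` and `most_stable_window`, the `n_exclude` parameter, and the sample values are all made up here, and "jumper" is taken to mean the value farthest from the window's current mean. Because every window has the same size, ranking by variance or by standard deviation picks the same window.

```python
from statistics import pstdev


def trimmed_spread(window, n_exclude=0):
    """Std dev of a window after dropping the n_exclude values farthest from the mean."""
    kept = list(window)
    for _ in range(n_exclude):
        mean = sum(kept) / len(kept)
        # Drop the single value that deviates most from the current mean.
        kept.remove(max(kept, key=lambda v: abs(v - mean)))
    return pstdev(kept)


def most_stable_window(values, size=100, n_exclude=0):
    """Return (start_index, spread) of the consecutive window with the lowest spread."""
    if len(values) < size:
        raise ValueError("need at least `size` values")
    best_start, best_spread = 0, float("inf")
    for start in range(len(values) - size + 1):
        spread = trimmed_spread(values[start:start + size], n_exclude)
        if spread < best_spread:
            best_start, best_spread = start, spread
    return best_start, best_spread


# Hypothetical data: a mildly noisy stretch, then a flat stretch with one jumper (40).
values = [10, 12, 11, 13, 12, 5, 5, 5, 40, 5, 5]
print(most_stable_window(values, size=5, n_exclude=0))  # method 1
print(most_stable_window(values, size=5, n_exclude=1))  # method 2, one jumper excluded
```

On this made-up series, method 1 picks the window starting at index 0, while excluding one jumper moves the choice to the flat stretch starting at index 5. That is the main trade-off between the two methods: trimming lets a window win despite a few outliers, at the cost of extra work per window. For 2000-3000 rows the brute-force loop is still cheap.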

Questions:

  • Best and easiest method?
  • Preferred language? Is it possible in SAS?
  • Do you have any other method you would recommend?

Thanks in advance

/Niklas

Recommended answer

The code below does what you are asking. It uses some sample data and calculates over windows of only 10 observations (rather than 100). I'll leave it to you to adapt as required.

Create some sample data from a dataset available in all SAS installations:

data xx;
  set sashelp.stocks;
  where stock = 'IBM';
  obs = _n_;
run;

Sort by the row number (obs, created above) in descending order. This makes it easier to calculate the standard deviation:

proc sort data=xx;
  by descending obs;
run;

Use an array to keep, for every row, its own value and the values of the 9 rows that follow it. Calculate the stddev for each row using the array (except for the last 9 rows, which do not have a full window of 10). Remember we are working backwards through the data.

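data calcs;
  set xx;

  /* Rolling buffer: arr1 is the current row, arr10 is 9 rows further on */
  array a[10] arr1-arr10;

  retain arr1-arr10 .;

  /* Shift the buffer by one slot, then store the current value in slot 1 */
  do tmp=10 to 2 by -1;
    a[tmp] = a[tmp-1];
  end;
  a[1] = close;

  /* Only compute once the buffer holds a full window of 10 values */
  if _n_ ge 10 then do;
    std = std(of arr1-arr10);
  end;

  drop tmp;
run;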
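To make it clearer which rows each standard deviation covers, the rolling-buffer logic of the DATA step above can be mimicked in Python. This is a sketch rather than part of the original answer: the function name `rolling_std_backwards` and the sample values are hypothetical, and a window of 3 is used to keep the example small. `statistics.stdev` is the sample standard deviation (n-1 divisor), the same definition the SAS `std` function uses.

```python
from collections import deque
from statistics import stdev


def rolling_std_backwards(values, size=10):
    """Walk the rows last-to-first, keeping the most recent `size` values in a buffer."""
    buffer = deque(maxlen=size)      # plays the role of arr1-arr10
    result = {}                      # 1-based row number (like obs) -> std
    for obs in range(len(values), 0, -1):
        # Newest value goes to the front; the oldest falls off the back when full.
        buffer.appendleft(values[obs - 1])
        if len(buffer) == size:
            # The buffer now holds rows obs .. obs+size-1 in original order.
            result[obs] = stdev(buffer)
    return result


values = [3, 8, 1, 4, 4, 5, 6, 9, 2, 7]   # hypothetical data
stds = rolling_std_backwards(values, size=3)
start_row = min(stds, key=stds.get)
print(start_row, values[start_row - 1:start_row - 1 + 3])
```

The value stored for a given row describes that row and the `size - 1` rows after it, so the last `size - 1` rows get no value at all. The same reasoning applies when choosing the upper bound for the final selection.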

Find which obs (i.e. row) had the lowest stddev. Save it to a macro variable.

proc sql noprint;
  select obs into :start_row
  from calcs
  having std = min(std)
  ;
quit;

Select the 10 observations from the sample data that were involved in calculating the lowest stddev.

proc sql noprint;
  create table final as
  select *
  from xx
  /* BETWEEN is inclusive, so start_row .. start_row+9 gives exactly 10 rows */
  where obs between &start_row and %eval(&start_row+9)
  order by obs
  ;
quit;
