SAS中的/此数据语句到底在做什么?相当于PostgreSQL? [英] What exactly is the/this data statement in SAS doing? PostgreSQL equivalent?

查看:131
本文介绍了SAS中的/此数据语句到底在做什么?相当于PostgreSQL?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在将SAS脚本转换为PostgreSQL环境的Python。在一些地方,我在SAS中找到了一条数据语句,它看起来像这样(在多个脚本中):

I'm converting a SAS script to Python for a PostgreSQL environment. In a few places I've found a data statement in SAS, which looks something like this (in multiple scripts):

data dups;
  set picc;
  by btn wtn resp_ji;
if not (first.resp_ji and last.resp_ji);
run;

很明显,在python或SQL环境中,数据集是不一样的,我在确定什么内容时遇到了麻烦此特定声明正在执行。需要明确的是,有许多正在转换的脚本以这种方式使用相同的名称创建了一个数据集。因此,我的期望是大部分这些内容将被一遍又一遍地覆盖。

Obviously datasets aren't the same in python or SQL environments, and I'm having trouble determining what this specific statement is doing. To be clear, there are a number of scripts being converted which create a dataset in this manner with this same name. So my expectation would be that most of these would be overwritten over and over.

我也不清楚与数据伪造语句中的条件对应的postgres是什么

I'm also unclear as to what the postgres equivalent to the condition in the data dups statement would be.

是否有明显的PostgreSQL语句可以代替它?像这样吗?:

Is there an obvious PostgreSQL statement that would work in its place? Something like this?:

CREATE TABLE dups AS
  SELECT btn, wtn, resp_ji
  WHERE /*some condition that matches the condition in the data statement*/

是否

by btn wtn respji;

语句意味着要复制哪些列,或者等于PostgreSQL中的ORDER BY子句?

statement mean which columns are copied over, or is that the equivalent of an ORDER BY clause in PostgreSQL?

谢谢。

推荐答案

该语句使用的是'by分组处理。在运行该步骤之前,它要求数据按 btn wtn resp_ji 排序。

The statement is using what's called 'by group processing'. Before the step can run, it requires that the data is sorted by btn wtn resp_ji.

first.resp_ji 件正在检查是否是第一次看到resp_ji的当前值当前的btn / wtn组合。同样, last.resp_ji 件正在检查是否是最后一次看到 resp_ji 的当前值了。当前的btn / wtn组合。

The first.resp_ji piece is checking to see if it's the first time it's seen the current value of resp_ji within the current btn/wtn combination. Likewise the last.resp_ji piece is checking if it's the final time that it will see the current value of resp_ji within the current btn/wtn combination.

将所有语句组合在一起:

Combining it all together the statement:

if not (first.resp_ji and last.resp_ji);

是说,如果resp_ji的当前值对于btn / wtn的当前组合出现多次,则保留记录,否则丢弃记录。像这样使用 if 语句时,其行为会隐式保留/丢弃记录。

Is saying, if the current value of resp_ji occurs multiple times for the current combination of btn/wtn then keep the record, otherwise discard the record. The behaviour of the if statement when used like that implicitly keeps/discards the record.

SQL,您可以执行以下操作:

To do the equivalent in SQL, you could do something like:


  1. 查找所有要丢弃的记录。

  2. 丢弃这些记录

所以...

create table rows_to_discard as 
select btn, wtn, resp_ji, count(*) as freq
from mytable
group by btn, wtn, resp_ji
having count(*) = 1

create table want as 
select a.*
from mytable a
left join rows_to_discard b  on b.btn = a.btn
                            and b.wtn = a.wtn
                            and b.resp_ji = a.resp_ji
where b.btn is null

编辑:我应该提到,没有简单的SQL等效项。可以通过对子查询中的行编号,然后在其之上构建逻辑来实现,但这很麻烦。这可能还取决于所使用的SQL的特定风格。

EDIT : I should mention that there is no simple SQL equivalent. It may be possible by numbering rows in subqueries, and then building logic on top of that but it'd be ugh-ly. It may also depend on the specific flavour of SQL being used.

这篇关于SAS中的/此数据语句到底在做什么?相当于PostgreSQL?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆