SAS中的/这个数据语句到底在做什么?PostgreSQL 等价物? [英] What exactly is the/this data statement in SAS doing? PostgreSQL equivalent?

查看:20
本文介绍了SAS中的/这个数据语句到底在做什么?PostgreSQL 等价物?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在为 PostgreSQL 环境将 SAS 脚本转换为 Python.在一些地方,我在 SAS 中找到了一个数据语句,它看起来像这样(在多个脚本中):

I'm converting a SAS script to Python for a PostgreSQL environment. In a few places I've found a data statement in SAS, which looks something like this (in multiple scripts):

data dups;
  set picc;
  by btn wtn resp_ji;
if not (first.resp_ji and last.resp_ji);
run;

显然,python 或 SQL 环境中的数据集并不相同,我无法确定该特定语句在做什么.需要明确的是,有许多脚本正在转换,它们以这种方式创建具有相同名称的数据集.所以我的期望是其中大部分会被一遍又一遍地覆盖.

Obviously datasets aren't the same in python or SQL environments, and I'm having trouble determining what this specific statement is doing. To be clear, there are a number of scripts being converted which create a dataset in this manner with this same name. So my expectation would be that most of these would be overwritten over and over.

我也不清楚与 data dups 语句中的条件等效的 postgres 是什么.

I'm also unclear as to what the postgres equivalent to the condition in the data dups statement would be.

有一个明显的 PostgreSQL 语句可以代替它吗?像这样?:

Is there an obvious PostgreSQL statement that would work in its place? Something like this?:

CREATE TABLE dups AS
  SELECT btn, wtn, resp_ji
  WHERE /*some condition that matches the condition in the data statement*/

有没有

by btn wtn respji;

statement 是指复制了哪些列,还是相当于 PostgreSQL 中的 ORDER BY 子句?

statement mean which columns are copied over, or is that the equivalent of an ORDER BY clause in PostgreSQL?

谢谢.

推荐答案

该语句使用所谓的按组处理".在该步骤可以运行之前,它要求数据按 btn wtn resp_ji 排序.

The statement is using what's called 'by group processing'. Before the step can run, it requires that the data is sorted by btn wtn resp_ji.

first.resp_ji 块正在检查它是否是第一次在当前 btn/wtn 组合中看到 resp_ji 的当前值.同样,last.resp_ji 部分正在检查它是否是最后一次在当前 btn/wtn 组合中看到 resp_ji 的当前值.

The first.resp_ji piece is checking to see if it's the first time it's seen the current value of resp_ji within the current btn/wtn combination. Likewise the last.resp_ji piece is checking if it's the final time that it will see the current value of resp_ji within the current btn/wtn combination.

将它们组合在一起声明:

Combining it all together the statement:

if not (first.resp_ji and last.resp_ji);

就是说,如果当前的 btn/wtn 组合的 resp_ji 的当前值出现多次,则保留该记录,否则丢弃该记录.if 语句的行为在这样使用时会隐式保留/丢弃记录.

Is saying, if the current value of resp_ji occurs multiple times for the current combination of btn/wtn then keep the record, otherwise discard the record. The behaviour of the if statement when used like that implicitly keeps/discards the record.

要在 SQL 中执行等效操作,您可以执行以下操作:

To do the equivalent in SQL, you could do something like:

  1. 查找所有要丢弃的记录.
  2. 从原始数据集中丢弃这些记录.

所以...

create table rows_to_discard as 
select btn, wtn, resp_ji, count(*) as freq
from mytable
group by btn, wtn, resp_ji
having count(*) = 1

create table want as 
select a.*
from mytable a
left join rows_to_discard b  on b.btn = a.btn
                            and b.wtn = a.wtn
                            and b.resp_ji = a.resp_ji
where b.btn is null

EDIT:我应该提一下,没有简单的 SQL 等价物.通过对子查询中的行进行编号,然后在此之上构建逻辑,这可能是可能的,但它会很糟糕.它还可能取决于所使用的 SQL 的特定风格.

EDIT : I should mention that there is no simple SQL equivalent. It may be possible by numbering rows in subqueries, and then building logic on top of that but it'd be ugh-ly. It may also depend on the specific flavour of SQL being used.

这篇关于SAS中的/这个数据语句到底在做什么?PostgreSQL 等价物?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆