SAS PRX to extract substring please

Question

I am trying to use the SAS PRX function to extract a substring from my dataset. But it only returns the exact matches, whereas I need it to be more flexible and extract those that match a variety of conditions.

I have copied my data below. As you can see, one of the variables in my data is "brandmodel" which contains both the brand name and the model# of a particular camera. I need to have a separate column just for the model#s. So I am using the PRX function to extract them as they usually follow one of the following patterns:

For example: JX100, JX10, or JX1 (i.e., 1-2 letters followed immediately by 1-3 digits). My program (copied below the data) can handle these. Where I run into problems is: how do I extract the model#s where the letters are separated from the digits by a space or a hyphen, and how do I extract those into the same column "Model" as the ones written together? Also, some of the observations do not have model#s; how can I get them set to missing instead of being dropped altogether?

Brandmodel|Price

iTwist F124 Digital Camera -red|49.00
Vivitar IF045 Digital Camera -Blue|72.83
Liquid Image Underwater Camera Mask|128.00
Impact Series Video Camera MX Goggles™|188.00
Olympus VR 340  Silver|148.00
Olympus TG820 Digital Camera Black|278.00
Olympus VR 340 16MP 10x 3.0 LCD Red|148.00
Vivitar VX137-Pur Digital Camera|39.00

Olympus SZ-12 Digital Camera -Black|198.00
Olympus VG160 Digital Camera Red|98.00
Olympus VR340   Purple|148.00
Olympus TG820 Digital Camera Silver|298.00
Olympus TG820 Digital Camera Blue|278.00
Olympus VG160 Digital Camera    Orange|98.00
Olympus TG820 Digital Camera Red|298.00
Fujifilm FinePix AX500 Red|78.63
Canon A2300 Silver|98.63
Canon A810 Red|75.00
Nikon Coolpix S2600 Digital Camera - Red|88.00
Nikon Coolpix L25 Digital Camera - Silver|82.00
Casio Exilim ZS10BK|128.00

Olympus TG-310 14 MP blue Digital Camera|148.00
Hipstreet Kidz Digital Camera - Blue|14.93
Casio Exilim ZS10PK|128.00
Olympus TG-310 14 MP Digital Camera orange|148.00

SAS program

data walnov21p2; 
 length brandmodel $ 80;
 infile "G:File2datastore_nov21storenv21p2.csv" firstobs=2 dlm="|" dsd;
 input brandmodel price;
 re= prxparse('/[[:alpha:]]{1,3} \d{1,4}/');
 if prxmatch(re, brandmodel) then
 do;
   model=prxposn(re, 0, brandmodel);
   output;
 end;
run;

Recommended answer

For your very last question (setting the variable to missing rather than dropping the observation), remove the output statement from the conditional do block at the end. Just change it to:

if prxmatch(re, brandmodel) then model=prxposn(re, 0, brandmodel);

This will cause all observations to be output, regardless of whether model is defined.
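This works because a data step with no explicit output statement writes every observation automatically at the end of each iteration. A minimal sketch of the whole step with that change is below. It assumes inline datalines in place of the CSV file, a pattern with no space between the letters and the digits, and an illustrative length of $ 20 for model; none of these come from the original program.

```sas
data walnov21p2;
  length brandmodel $ 80 model $ 20;   /* model length is an assumption */
  infile datalines dlm="|" dsd;
  input brandmodel price;

  /* Compile the pattern once and keep the handle across iterations */
  retain re;
  if _n_ = 1 then re = prxparse('/[[:alpha:]]{1,3}\d{1,4}/');

  /* No explicit OUTPUT: every row is written by the implicit output.
     model is reset to missing each iteration, so it stays blank
     whenever the pattern does not match. */
  if prxmatch(re, brandmodel) then model = prxposn(re, 0, brandmodel);

  drop re;
  datalines;
Olympus TG820 Digital Camera Black|278.00
Liquid Image Underwater Camera Mask|128.00
;
run;
```

Both rows end up in the output data set; the second row has no letters-plus-digits token, so its model is blank.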

For the rest of your question, it is really about pattern matching with Perl regular expressions, and is not specific to SAS. It's also tricky because some models have spaces in them. Try posting a different question asking about the Perl regular expression (with those tags) that would match what you want.
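As a starting point only, one possible pattern allows a single optional space or hyphen between the letters and the digits, and then strips that separator so every model lands in the same form. This sketch assumes the desired output is the compact form (for example VR340 for "VR 340"), which the question does not confirm. The names model_raw and models are hypothetical, and the sample rows are taken from the posted data without the price column.

```sas
data models;
  length brandmodel $ 80 model_raw model $ 20;
  infile datalines truncover;
  input brandmodel $80.;

  retain re;
  /* \b          start at a word boundary, so the tail of a longer word is skipped
     [ -]?       at most one space or hyphen between letters and digits */
  if _n_ = 1 then re = prxparse('/\b[[:alpha:]]{1,3}[ -]?\d{1,4}/');

  if prxmatch(re, brandmodel) then do;
    model_raw = prxposn(re, 0, brandmodel);   /* text exactly as matched  */
    model     = compress(model_raw, ' -');    /* drop spaces and hyphens  */
  end;

  drop re;
  datalines;
Olympus VR 340  Silver
Olympus SZ-12 Digital Camera -Black
Olympus TG820 Digital Camera Black
Hipstreet Kidz Digital Camera - Blue
;
run;
```

Only the first match in each string is taken, and a short word followed by a number (such as "LCD 3") would also match if it came first, so the pattern would need tightening against the full data set.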

Also, post some examples of what you want the output to be. For example, what do you expect for input like this:

Olympus VR 340 16MP 10x 3.0 LCD Red|148.00 
Vivitar VX137-Pur Digital Camera|39.00
