Parse in R a file of exported SAS macros

Question

I have a file of macros exported from SAS that I want to parse in order to build documentation with R and markdown (I can't use existing external software due to security limitations at work).

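In particular, I want to extract:

  • the name of the macro
  • the parameters and their descriptions
  • the contents of the two sections named "USES" and "EXAMPLES"
  • the body of the macro function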

Unfortunately, my lack of regex skills is hurting me again, though I don't think the rules are that complicated.

See my example below and expected output:

my_text <- "
%macro macro_name_1
/*----------------------------------------------------------------------------------
optional macro description on one or several lines.
this section always starts with slash star dashes and ends with dashes star slash
and it never contains these combinations of characters in the text                                                         
----------------------------------------------------------------------------------*/
(param1 /* optional description of param1 */
,param2
,param3 /* optional description of param3 */
);
/* USES: 
some info on one or several lines,
always starts with 'slash star USES:'
and ends with 'star slash'
but doesn't contain these combinations of characters
*/
/* EXAMPLES:
some examples on one or several lines,
always starts with 'slash star EXAMPLES:' OR 'slash star EXAMPLE:'
and ends with 'star slash'
but doesn't contain these combinations of characters
*/
some code on one or several lines,
always after USES and EXAMPLE(S) sections
that may or not contain combinations of /* and */
%mend;

some text outside of a macro-mend pattern, which I wish to ignore

%macro macro_name_2
/*---------------------
desc of macro_name_2                                                     
---------------------*/
(x
,y /* desc of y*/
);
/* USES: something */
/* EXAMPLE:
example for macro_name2
*/
code2
%mend;

some more irrelevant text

%macro macro_name_3;
code3
%mend;
"

The output doesn't have to be identical to what I propose here, but it should have at least a similar structure (text is abbreviated for readability):

expected_output <- tibble::tribble(
  ~'macro_name',          ~'description',                   ~'parameters',        ~'uses',        ~'examples',        ~'code',
  "macro_name_1",    "optional macro...",  list(param1="optional desc...",
                                               param2="",
                                               param3="optional desc..."), "Some info...", "some examples...", "some code...",
  "macro_name_2", "desc of macro_name_2",       list(x="", y="desc of y"),    "something",   "example for...",        "code2",
  "macro_name_3",                     "",                          list(),             "",                 "",        "code3")


# # A tibble: 3 x 6
#     macro_name          description parameters         uses         examples         code
#          <chr>                <chr>     <list>        <chr>            <chr>        <chr>
# 1 macro_name_1    optional macro... <list [3]> Some info... some examples... some code...
# 2 macro_name_2 desc of macro_name_2 <list [2]>    something   example for...        code2
# 3 macro_name_3                      <list [0]>                                      code3

Recommended answer

I believe this will get you very close to what you want. I haven't taken the time to put it into a tibble, but I suspect you can figure out how to arrange those components to your own preference.

It relies heavily on this answer. It uses only one function from stringr: str_split_fixed is really convenient for getting the parameters and their descriptions organized.
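The code below leans on one base R idiom throughout: gregexpr() with perl = TRUE and lookaround assertions, combined with regmatches() to pull out the matched text. The following is a minimal sketch of that idiom on a made-up miniature input (the string `x` and the names `blocks` and `names_only` are illustrative and not part of the original answer).

```r
# Hypothetical miniature input: two macros with unrelated text in between
x <- "%macro a; code_a %mend; junk %macro b; code_b %mend;"

# (?=%macro)  : start where "%macro" begins (the lookahead consumes nothing)
# .*?         : lazily take as few characters as possible
# (?<=%mend)  : stop as soon as the text matched so far ends in "%mend"
blocks <- regmatches(x, gregexpr("(?=%macro).*?(?<=%mend)", x, perl = TRUE))[[1]]
blocks

# The same idea with the delimiters excluded from the match:
# a lookbehind for "%macro " and a lookahead for ";"
names_only <- regmatches(x, gregexpr("(?<=%macro ).*?(?=;)", x, perl = TRUE))[[1]]
names_only
```

With perl = TRUE the dot does not match a newline by default, which is presumably why the answer replaces newlines with spaces before matching.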

library(stringr)

# Make one character string per macro
macro <- gsub("\\n", "   ", my_text)
macro <- regmatches(macro, gregexpr("(?=%macro).*?(?<=%mend)", macro, perl=TRUE))[[1]]


# MACRO NAME --------------------------------------------------------
macro_name <- 
  unlist(
    regmatches(macro, gregexpr("(?<=%macro ).*?(?= )", macro, perl = TRUE))
  )
macro_name <- 
  sub(";", "", macro_name)

# MACRO DESCRIPTION -------------------------------------------------

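# Only a comment delimited by dashes counts as the description, so a macro
# without one does not pick up a parameter or USES comment instead
macro_desc <- 
  regmatches(macro, gregexpr("/[*]-+.*?-+[*]/", macro, perl = TRUE))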

macro_desc <- vapply(macro_desc,
                     function(x) if (length(x)) x[1] else "",
                     character(1))
macro_desc <- gsub("(/[*][-]+|[-]+[*]/)", "", macro_desc)
macro_desc <- trimws(macro_desc)

# MACRO PARAMETERS --------------------------------------------------

param <- regmatches(macro, gregexpr("(?=\\().*?(?<=\\);)", macro, perl = TRUE))
param <- 
  vapply(param,
         function(x) if (length(x)) gsub("(\\(|\\)|;)", "", x) else "",
         character(1))
param <- strsplit(param, ",")

param <- 
  lapply(param,
         str_split_fixed,
         " ",
         n = 2)

param <- 
  lapply(param,
         function(x) trimws(gsub("(/[*]|[*]/)", "", x)))

# Clean out everything up to the end of the parameters.
# This removes text up to the first ';', so it cuts too early if a ';'
# appears before the end of the parameter definition (for example inside
# the description or a parameter comment)

macro <- trimws(sub("^%macro.*?;", "", macro, perl = TRUE))

# MACRO USES --------------------------------------------------------

# Just in case there are multiple spaces between /* and USES, coerce it to 
# only one space.
macro <- sub("/[*] +USES", "/* USES", macro)
uses <- regmatches(macro, gregexpr("(?<=/[*] USES[:]).*?(?=[*]/)", macro, perl = TRUE))
uses <- vapply(uses,
               function(x) if (length(x)) trimws(x) else "",
               character(1))

# Clean out everything up to the end of the USES

macro <- trimws(sub(".+(?<=/[*] USES[:]).*?(?<=[*]/)", "", macro, perl = TRUE))

# MACRO EXAMPLES ----------------------------------------------------

# Just in case there are multiple spaces between /* and EXAMPLES, coerce it to 
# only one space.
macro <- sub("/[*] +EXAMPLE", "/* EXAMPLE", macro)
examples <- regmatches(macro, gregexpr("(?<=/[*] EXAMPLE).*?(?=[*]/)", macro, perl = TRUE))
examples <- vapply(examples,
                   function(x) if (length(x)) trimws(sub("^(S[:]|[:]) +", "", x)) else "",
                   character(1))

# Clean out everything up to the end of the EXAMPLES

macro <- trimws(sub(".+(?<=/[*] EXAMPLE).*?(?<=[*]/)", "", macro, perl = TRUE))

# MACRO BODY --------------------------------------------------------

# At this point, the body should be the only thing left in `macro`, except 
# for the `%mend` call

body <- trimws(sub("%mend.*$", "", macro))

# RESULTS

macro_name # character vector
macro_desc # character vector
param      # list of two column matrices
uses       # character vector
examples   # character vector
body       # character vector
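
To arrange these pieces into the table the question asks for, one option is a data frame with a list column. This is a minimal sketch and not part of the original answer: it uses base R only, and the small stand-in vectors (hypothetical, abbreviated values) take the place of the objects created above so that the block runs on its own. It assumes each element of `param` is a two-column matrix of names and descriptions, as described in the results.

```r
# Stand-in values shaped like the objects produced by the parsing code
# (hypothetical, abbreviated); in practice use the real objects instead.
macro_name <- c("macro_name_2", "macro_name_3")
macro_desc <- c("desc of macro_name_2", "")
param <- list(
  matrix(c("x", "y", "", "desc of y"), ncol = 2),  # columns: name, description
  matrix(character(0), ncol = 2)                   # macro without parameters
)
uses     <- c("something", "")
examples <- c("example for macro_name2", "")
body     <- c("code2", "code3")

# Turn each two-column matrix into a named list: list(name = description)
param_list <- lapply(param, function(m) {
  stats::setNames(as.list(m[, 2]), m[, 1])
})

# One row per macro; I() keeps `parameters` as a list column
macros <- data.frame(
  macro_name  = macro_name,
  description = macro_desc,
  parameters  = I(param_list),
  uses        = uses,
  examples    = examples,
  code        = body,
  stringsAsFactors = FALSE
)

# Inspect the parameters of the first macro
str(macros$parameters[[1]])
```

If the tibble package is available, the same columns could be passed to tibble::tibble() instead, in which case the I() wrapper is not needed.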
