How exactly does R parse `->`, the right-assignment operator?

Views: 43 · Published: 2021/10/4 18:56:53 · Tags: r, yacc


Question

So this is kind of a trivial question, but it's bugging me that I can't answer it, and perhaps the answer will teach me some more details about how R works.

The title says it all: how does R parse ->, the obscure right-side assignment function?

My usual tricks to dive into this failed:
`->`

Error: object -> not found

getAnywhere("->")

no object named -> was found

And we can't call it directly:
`->`(3,x)

Error: could not find function "->"

But of course, it works:
(3 -> x) #assigns the value 3 to the name x
# [1] 3
It appears R knows how to simply reverse the arguments, but I thought the above approaches would surely have cracked the case:
pryr::ast(3 -> y)
# \- ()
#   \- `<-    #R interpreter clearly flipped things around
#   \- `y     #  (by the time it gets to `ast`, at least...)
#   \-  3     #  (note: this is because `substitute(3 -> y)`
#             #   already returns the reversed version)
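The same flip can be seen in base R without pryr. This is only a minimal sketch; `capture` is a made-up helper name, used to grab the unevaluated argument with substitute():

```r
# Hypothetical helper: return the argument as an unevaluated call.
capture <- function(expr) substitute(expr)

flipped <- capture(3 -> y)
print(flipped)            # printed in left-assignment form
print(typeof(flipped))    # the type of the captured object

# The captured call cannot be told apart from one typed with `<-`.
print(identical(flipped, quote(y <- 3)))
```

Nothing in the captured object records that the arrow originally pointed to the right.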
Compare this to the regular assignment operator:
`<-`
.Primitive("<-")

`<-`(x, 3) #assigns the value 3 to the name x, as expected
?"->" , ?assignOps, and the R Language Definition all simply mention it in passing as the right assignment operator.

But there's clearly something unique about how -> is used. It's not a function/operator (as the calls to getAnywhere and directly to `->` seem to demonstrate), so what is it? Is it completely in a class of its own?

Is there anything to learn from this besides "-> is completely unique within the R language in how it's interpreted and handled; memorize and move on"?
Answer
Let me preface this by saying I know absolutely nothing about how parsers work. Having said that, line 296 of gram.y defines the following tokens to represent assignment in the (YACC?) parser R uses:
%token LEFT_ASSIGN EQ_ASSIGN RIGHT_ASSIGN LBB
Then, on lines 5140 through 5150 of gram.c, this looks like the corresponding C code:
case '-':
    if (nextchar('>')) {
        if (nextchar('>')) {
            yylval = install_and_save2("<<-", "->>");
            return RIGHT_ASSIGN;
        }
        else {
            yylval = install_and_save2("<-", "->");
            return RIGHT_ASSIGN;
        }
    }
Finally, starting on line 5044 of gram.c, the definition of install_and_save2:
/* Get an R symbol, and set different yytext.  Used for translation of -> to <-. ->> to <<- */
static SEXP install_and_save2(char * text, char * savetext)
{
    strcpy(yytext, savetext);
    return install(text);
}
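One way to see both halves of install_and_save2 from the R prompt is to look at the parse data. This is a rough sketch, assuming a version of R that provides utils::getParseData(): the token keeps the text that was typed, while the call built from it carries the installed symbol.

```r
# keep.source = TRUE is needed so that parse data is recorded.
parsed <- parse(text = "3 -> y", keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Terminal tokens, with the text saved for each of them.
print(tokens[tokens$terminal, c("token", "text")])

# The operator symbol stored in the resulting call.
print(parsed[[1]][[1]])
```

The token named RIGHT_ASSIGN should appear with the original text, which would match the "set different yytext" part of the comment above.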

So again, having zero experience working with parsers, it seems that -> and ->> are translated directly into <- and <<-, respectively, at a very low level in the interpretation process.
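A small check of that claim, using only quote() and deparse() from base R (the variable names are just for illustration):

```r
right  <- quote(3 -> y)
right2 <- quote(3 ->> y)

# The first element of a call is the function being called.
print(right[[1]])     # symbol stored for `->`
print(right2[[1]])    # symbol stored for `->>`

# Turning the calls back into text uses the left-pointing forms.
print(deparse(right))
print(deparse(right2))
```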

You brought up a very good point in asking how the parser "knows" to reverse the arguments to ->, considering that -> appears to be installed into the R symbol table as <-, and is thus able to correctly interpret x -> y as y <- x and not x <- y. The best I can do is provide further speculation as I continue to come across "evidence" to support my claims. Hopefully some merciful YACC expert will stumble on this question and provide a little insight; I'm not going to hold my breath on that, though.

Back to lines 383 and 384 of gram.y, this looks like some more parsing logic related to the aforementioned LEFT_ASSIGN and RIGHT_ASSIGN symbols:
|   expr LEFT_ASSIGN expr       { $$ = xxbinary($2,$1,$3);  setId( $$, @$); }
|   expr RIGHT_ASSIGN expr      { $$ = xxbinary($2,$3,$1);  setId( $$, @$); }
Although I can't really make heads or tails of this crazy syntax, I did notice that the second and third arguments to xxbinary are swapped between the LEFT_ASSIGN rule (xxbinary($2,$1,$3)) and the RIGHT_ASSIGN rule (xxbinary($2,$3,$1)).

Here's what I'm picturing in my head:

LEFT_ASSIGN Scenario: y <- x

$2 is the second "argument" to the parser in the above expression, i.e. <-

$1 is the first; namely y

$3 is the third; x

Therefore, the resulting (C?) call would be xxbinary(<-, y, x).

Applying this logic to RIGHT_ASSIGN, i.e. x -> y, combined with my earlier conjecture about <- and -> getting swapped,

$2 gets translated from -> to <-

$1 is x

$3 is y

But since the result is xxbinary($2,$3,$1) instead of xxbinary($2,$1,$3), the result is still xxbinary(<-, y, x).
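The walk-through above can be checked from R by comparing the two calls element by element. It is a sketch of the end result only; it does not show the parser's internals.

```r
left  <- quote(y <- x)
right <- quote(x -> y)

# Both calls have the shape: operator, target, value.
str(as.list(left))
str(as.list(right))

print(identical(left, right))
```

If the swap in the RIGHT_ASSIGN rule did not happen, the second and third elements of `right` would come out in the opposite order.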

Building off of this a little further, we have the definition of xxbinary on line 3310 of gram.c:
static SEXP xxbinary(SEXP n1, SEXP n2, SEXP n3)
{
    SEXP ans;
    if (GenerateCode)
        PROTECT(ans = lang3(n1, n2, n3));
    else
        PROTECT(ans = R_NilValue);
    UNPROTECT_PTR(n2);
    UNPROTECT_PTR(n3);
    return ans;
}
Unfortunately I could not find a proper definition of lang3 (or its variants lang1, lang2, etc...) in the R source code, but I'm assuming that it is used for evaluating special functions (i.e. symbols) in a way that is synchronized with the interpreter.
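For what it's worth, lang3 looks to be one of the small inline helpers in R's internal headers (Rinlinedfuns.h, as far as I know), and it only assembles a three-element call object; it does not evaluate anything. A rough R-level analogue, with as.call() standing in for lang3 (my substitution, not what the parser literally runs):

```r
# Assemble the call from its three parts: operator symbol, target, value.
built <- as.call(list(as.name("<-"), as.name("y"), 3))

print(built)                             # still unevaluated at this point
print(identical(built, quote(3 -> y)))   # compare with what the parser builds

# Evaluation is a separate, later step.
env <- new.env()
eval(built, env)
print(get("y", envir = env))
```

So xxbinary hands back a call, and the assignment itself only happens when that call is later evaluated.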

Updates

I'll try to address some of your additional questions in the comments as best I can given my (very) limited knowledge of the parsing process.

1) Is this really the only object in R that behaves like this?? (I've got in mind the John Chambers quote via Hadley's book: "Everything that exists is an object. Everything that happens is a function call." This clearly lies outside that domain -- is there anything else like this?)

First, I agree that this lies outside of that domain. I believe Chambers' quote concerns the R Environment, i.e. processes that are all taking place after this low level parsing phase. I'll touch on this a little bit more below, however. Anyways, the only other example of this sort of behavior I could find is the ** operator, which is a synonym for the more common exponentiation operator ^. As with right assignment, ** doesn't seem to be "recognized" as a function call, etc... by the interpreter:
R> `->`
#Error: object '->' not found
R> `**`
#Error: object '**' not found
I found this because it's the only other case where install_and_save2 is used by the C parser:
case '*':
    /* Replace ** by ^.  This has been here since 1998, but is
       undocumented (at least in the obvious places).  It is in
       the index of the Blue Book with a reference to p. 431, the
       help for 'Deprecated'.  S-PLUS 6.2 still allowed this, so
       presumably it was for compatibility with S. */
    if (nextchar('*')) {
        yylval = install_and_save2("^", "**");
        return '^';
    } else
        yylval = install_and_save("*");
    return c;
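The ** case can be poked at the same way as right assignment. A minimal sketch in base R:

```r
power <- quote(2 ** 3)

print(power)          # how the call deparses
print(power[[1]])     # the operator symbol stored in the call
print(2 ** 3)

# Only one of the two spellings has a function bound to it.
print(exists("^"))
print(exists("**"))
```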

2) When exactly does this happen? I've got in mind that substitute(3 -> y) has already flipped the expression; I couldn't figure out from the source what substitute does that would have pinged the YACC...

Of course I'm still speculating here, but yes, I think we can safely assume that when you call substitute(3 -> y), from the perspective of the substitute function, the expression always was y <- 3; i.e. the function is completely unaware that you typed 3 -> y. do_substitute, like 99% of the C functions used by R, only handles SEXP arguments - a language object (LANGSXP) in the case of 3 -> y (== y <- 3), I believe. This is what I was alluding to above when I made a distinction between the R Environment and the parsing process.

I don't think there is anything that specifically triggers the parser to spring into action; rather, everything you input into the interpreter gets parsed. I did a little more reading about the YACC / Bison parser generator last night, and as I understand it (a.k.a. don't bet the farm on this), Bison uses the grammar you define (in the .y file(s)) to generate a parser in C, i.e. a C function which does the actual parsing of input. In turn, everything you input in an R session is first processed by this C parsing function, which then delegates the appropriate action to be taken in the R Environment (I'm using this term very loosely by the way).

During this phase, lhs -> rhs will get translated to rhs <- lhs, ** to ^, etc. For example, this is an excerpt from one of the tables of primitive functions in names.c:
/* Language Related Constructs */

/* Primitives */

{"if",       do_if,       0,          200, -1, {PP_IF,       PREC_FN,   1}},
{"while",    do_while,    0,          100,  2, {PP_WHILE,    PREC_FN,   0}},
{"for",      do_for,      0,          100,  3, {PP_FOR,      PREC_FN,   0}},
{"repeat",   do_repeat,   0,          100,  1, {PP_REPEAT,   PREC_FN,   0}},
{"break",    do_break,    CTXT_BREAK, 0,    0, {PP_BREAK,    PREC_FN,   0}},
{"next",     do_break,    CTXT_NEXT,  0,    0, {PP_NEXT,     PREC_FN,   0}},
{"return",   do_return,   0,          0,   -1, {PP_RETURN,   PREC_FN,   0}},
{"function", do_function, 0,          0,   -1, {PP_FUNCTION, PREC_FN,   0}},
{"<-",       do_set,      1,          100, -1, {PP_ASSIGN,   PREC_LEFT, 1}},
{"=",        do_set,      3,          100, -1, {PP_ASSIGN,   PREC_EQ,   1}},
{"<<-",      do_set,      2,          100, -1, {PP_ASSIGN2,  PREC_LEFT, 1}},
{"{",        do_begin,    0,          200, -1, {PP_CURLY,    PREC_FN,   0}},
{"(",        do_paren,    0,          1,    1, {PP_PAREN,    PREC_FN,   0}},
You will notice that ->, ->>, and ** are not defined here. As far as I know, R primitive expressions such as <- and [, etc. are the closest interaction the R Environment ever has with any underlying C code. What I am suggesting is that by this stage in the process (from you typing a set of characters into the interpreter and hitting 'Enter', up through the actual evaluation of a valid R expression), the parser has already worked its magic, which is why you can't get a function definition for -> or ** by surrounding them with backticks, as you typically can.
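The absence from that table can be confirmed from the R side as well. This sketch assumes a fresh session in which nobody has defined functions named `->` or `->>`:

```r
ops <- c("<-", "<<-", "=", "->", "->>")

# TRUE where a function of that name can be found.
found <- vapply(ops, exists, logical(1), mode = "function")
print(found)

# The assignment operators that do exist are primitives implemented in C.
print(is.primitive(`<-`))
print(is.primitive(`<<-`))
print(is.primitive(`=`))
```

Only the three names listed in the names.c excerpt resolve to functions; the right-pointing spellings exist only as far as the parser.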
