Rscript behaves inconsistently on Windows with single and double quotes


Question

If I invoke

Rscript -e "print('hello')"

It correctly prints out the answer

[1] "hello"

However, if I switch single and double quotes, it does not work, and it looks like the double quotes are removed:

Rscript -e 'print("hello")'

gives:

Error in print(hello) : object 'hello' not found
Execution halted

Note that it's not PowerShell doing the escaping incorrectly. Echoing gives exactly the expected results:

PS> echo 'print("hello")'
print("hello")
PS> echo "print('hello')"
print('hello')

And the same behavior is not observed on macOS or Linux, where both variants are correctly parsed.

Interestingly enough, it's even crazier for command.com:

C:>Rscript -e "print('hello')"
[1] "hello"

C:>Rscript -e 'print("hello")'
[1] "print(hello)"

I mean... what?!? This has already been mentioned here:

Single line code to run R code from Windows command line

but there's no explanation of it. In my opinion it's a bug in Rscript on Windows, but I want to hear other opinions.

Solution

Dabombber's helpful answer provides all the pointers, but let me try to boil it down conceptually:

The problem is not specific to Rscript.exe and potentially affects calls to any external executable from PowerShell:

Up to at least PowerShell 7.1 (current as of this writing), passing arguments with embedded double quotes (") to external programs is fundamentally broken, as detailed in GitHub issue #1995. In short: behind the scenes, PowerShell constructs a command line for the target program (process) that uses "..." quoting only, but neglects to escape embedded verbatim " characters, which is required for them to be syntactically valid inside such double-quoted strings. A fix may be coming in v7.2 - see this answer.

  • For now, you have to manually escape embedded " chars. as \".

  • However, if and when the bug gets fixed, this workaround will break, because the fix requires that this escaping be applied automatically, which would then escape a verbatim \" as \\\".

# WORKAROUND as of v7.0, which will break if and when the problem gets fixed.
PS> Rscript -e 'print(\"hello\")'
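The escaping rule, and the reason the manual workaround would break after a fix, can be sketched in Python. This is only a model under stated assumptions, not PowerShell's actual code: build_unescaped is a hypothetical stand-in for the buggy behavior, and subprocess.list2cmdline is a semi-public helper in Python's standard library that applies the MS C runtime quoting conventions.

```python
import subprocess


def build_unescaped(args):
    """Hypothetical model of the buggy behavior: wrap arguments that contain
    whitespace in "...", but never escape embedded double quotes."""
    return " ".join(
        f'"{a}"' if (" " in a or "\t" in a or not a) else a for a in args
    )


def build_escaped(args):
    """Build a command line with embedded double quotes escaped as \\"."""
    return subprocess.list2cmdline(args)


script = 'print("hello")'

# Buggy model: the embedded quotes land on the command line unescaped.
print(build_unescaped(["Rscript", "-e", script]))
# Rscript -e print("hello")

# Conventional escaping: each embedded " becomes \".
print(build_escaped(["Rscript", "-e", script]))
# Rscript -e print(\"hello\")

# If escaping were applied automatically to the manual workaround,
# each verbatim \" would turn into \\\".
print(build_escaped(["Rscript", "-e", r'print(\"hello\")']))
# Rscript -e print(\\\"hello\\\")
```

The last call shows the double-escaping described above: an argument that already contains \" is escaped again, so the target program would receive literal backslashes.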

The third-party Native module (install with Install-Module -Scope CurrentUser Native, for instance) offers the helper function ie, which compensates for the broken behavior; it is written in a forward-compatible manner so that it will simply defer to the built-in behavior if and when that gets fixed:

# Thanks to `ie`, no workarounds are required.
PS> ie Rscript -e 'print("hello")'

As for ad hoc workarounds - both of them work for Rscript.exe, but can't be expected to be a general solution:

  • For target programs that support both '...' and "..." quoting: Swap the quotes to use only embedded ' chars., as shown in your question, but note that '...' and "..." strings have different semantics in PowerShell ("..." strings are expandable (interpolating) strings), and may have different semantics in the target program too (not the case in Rscript):

    • Rscript -e "print('hello')"
  • For target programs that accept input via stdin, use the PowerShell pipeline, where the bug doesn't surface (though note that you may have to set the $OutputEncoding preference variable to the character encoding expected by the target program):

    • 'print("hello")' | Rscript -
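The stdin approach sidesteps command-line quoting entirely, which a small Python sketch can illustrate. As an assumption for the sake of a runnable example, the Python interpreter itself (invoked with - to read the script from stdin) stands in for Rscript; run_via_stdin is a hypothetical helper name.

```python
import subprocess
import sys


def run_via_stdin(script, interpreter=(sys.executable, "-")):
    """Feed `script` to an interpreter on stdin and return its stdout.

    The script text never appears on the command line, so no shell or
    command-line quoting rules are applied to it.
    """
    completed = subprocess.run(
        list(interpreter),
        input=script,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# The embedded double quotes arrive intact.
print(run_via_stdin('print("hello")'), end="")
# hello
```

With R installed, passing interpreter=("Rscript", "-") should behave analogously. The text=True setting makes Python encode the input using the locale's encoding, which plays a role similar to $OutputEncoding in PowerShell.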

As for your observations and background information, including about cmd.exe and POSIX-compatible shells:

> Note that it's not PowerShell doing the escaping incorrectly.

As Dabombber points out, it is PowerShell that is the problem, but the problem only occurs when calling external programs, whereas echo is a built-in alias for the PowerShell-native Write-Output cmdlet (verify with Get-Command echo). On Windows, you could see the problem with the flawed parameter passing as follows, by invoking choice.exe (ignore the [Y,N]?N suffix):

PS> 'n' | choice /m 'print("hello")'
print(hello) [Y,N]?N

choice.exe with /m can be used to echo an argument as an external program receives it. As you can see, the double quotes were effectively lost: PowerShell mistakenly placed print("hello") verbatim on the process command line, without escaping the " characters. External programs parse that as verbatim print(hello), because they allow a single argument to be composed of unquoted and double-quoted parts (print( + hello (stripped of the syntactic double quotes) + )).

  • If verbatim print(hello) is interpreted as an R script, it looks for a variable (object) named hello - which in this scenario doesn't exist and triggers the error message you saw.
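The way a single argument is assembled from unquoted and double-quoted parts can be sketched as follows. This is a deliberately simplified illustration: split_quoted_parts is a hypothetical helper that ignores backslash escapes and treats every " purely as a toggle, which is enough to reproduce the print( + hello + ) decomposition.

```python
def split_quoted_parts(raw_arg):
    """Split one raw command-line token into unquoted and double-quoted parts.

    Simplified model: every " only toggles quoted mode and is itself dropped;
    backslash escapes are not handled.
    """
    parts = []
    current = []
    in_quotes = False
    for ch in raw_arg:
        if ch == '"':
            if current:
                kind = "quoted" if in_quotes else "unquoted"
                parts.append((kind, "".join(current)))
                current = []
            in_quotes = not in_quotes
        else:
            current.append(ch)
    if current:
        kind = "quoted" if in_quotes else "unquoted"
        parts.append((kind, "".join(current)))
    return parts


parts = split_quoted_parts('print("hello")')
print(parts)
# [('unquoted', 'print('), ('quoted', 'hello'), ('unquoted', ')')]

# The program sees the parts joined, with the syntactic quotes gone.
print("".join(text for _, text in parts))
# print(hello)
```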

On Unix-like platforms (macOS, Linux), using the cross-platform PowerShell [Core] edition, /bin/echo 'print("hello")' shows the same problem.

> And the same behavior is not observed on macOS or Linux, where both variants are correctly parsed.

Yes, if you use a native, POSIX-compatible shell there, such as bash, you'll get the correct behavior (see below).

> it's even crazier for command.com:

As an aside: You probably meant cmd.exe, the legacy command processor (Command Prompt) on NT-based Windows platforms up to the current Windows 10. (command.com was the command processor on the extinct DOS-based Windows versions that ended with Windows ME).

cmd.exe only recognizes double-quoting ("...") to demarcate argument boundaries for itself, not also single-quoting ('...'); irrespective of that, it essentially passes the original quoting through to the target executable (after performing its own interpretation of the command line, such as environment-variable expansion).

This differs fundamentally from what PowerShell and POSIX-compatible shells do:

  • On Unix-like platforms - where POSIX-compatible shells recognize '...'-quoted arguments - the concept of a process command line doesn't exist, and whatever arguments a POSIX-like shell has itself parsed out of its command line are passed as-is - as an array of verbatim arguments - to the target executable; thus shell string literals "print('hello')" and 'print("hello")' are passed as verbatim print('hello') and print("hello"), respectively, which works as expected, given that R too recognizes both '...' and "..." string literals.

  • PowerShell too has '...' strings (it treats their content verbatim), but on Windows it translates them to "..." strings behind the scenes (if quoting is needed), which is where the aforementioned bug can surface as of v7.0. The bug aside, this translation makes sense, because only "..." quoting can be assumed to have syntactic meaning on the command line for other programs (see bottom section). Unfortunately, PowerShell does the same thing on Unix-like platforms, even though it shouldn't (it constructs a pseudo command line that the .NET API then parses into an array of verbatim arguments passed to the target process), so the bug surfaces there as well.
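The POSIX-shell behavior described in the first bullet can be approximated with Python's shlex module, which splits a string using POSIX-like quoting rules. This is a sketch of the argument array a shell such as bash would hand to the target executable, not an invocation of a real shell; posix_argv is a hypothetical helper name.

```python
import shlex


def posix_argv(command_line):
    """Return the verbatim argument array a POSIX-like shell would pass on."""
    return shlex.split(command_line)


# Double-quoted shell string with embedded single quotes.
print(posix_argv('''Rscript -e "print('hello')"'''))
# ['Rscript', '-e', "print('hello')"]

# Single-quoted shell string with embedded double quotes.
print(posix_argv("""Rscript -e 'print("hello")'"""))
# ['Rscript', '-e', 'print("hello")']
```

In both cases the third argument reaches the program with its inner quotes intact, which is why both variants work with Rscript on macOS and Linux.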

Because cmd.exe preserves the original quoting, Rscript interprets 'print("hello")' in the command line Rscript -e 'print("hello")' as a string literal rather than as a command: it first removes any " characters that have syntactic function on the command line (whereas ' (single quotes) by convention have no syntactic meaning there), and only then interprets the result as an R script:

  • 'print("hello")' is therefore parsed as 'print( + hello (the command-line " are stripped) + ), resulting in verbatim 'print(hello)' getting interpreted as R code, which is an R string literal that therefore prints as-is (the output uses "..." quoting, but that's just an artifact of output formatting; note that an explicit call to print() isn't necessary, the result of an expression - such as string literal 'print(hello)' in this case - is automatically printed).

  • By contrast, "print('hello')" is parsed as verbatim print('hello') (the command-line " are stripped), which - due to the absence of enclosing quoting - is then interpreted as a command, namely a print() function call, as intended.


Ultimately, there are no hard and fast rules in the anarchic world of process command-line parsing on Windows: it is up to each program to interpret its command line - this answer contains excellent background information.

Fortunately, however, there are widely adhered-to conventions, as implemented in the MS C/C++/.NET compilers and documented here.
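A minimal Python sketch of these conventions reproduces all the results discussed above. It is a simplified model and rests on assumptions: parse_windows_command_line is a hypothetical function, it ignores the special rules for the program name, and it omits the rule for doubled "" inside a quoted string.

```python
def parse_windows_command_line(command_line):
    """Split a raw command line into arguments, MS C runtime style (simplified).

    Rules modeled:
      - whitespace outside "..." separates arguments
      - an unescaped " toggles quoted mode and is dropped
      - 2n backslashes before a " yield n backslashes; the " toggles quoting
      - 2n+1 backslashes before a " yield n backslashes and a literal "
      - backslashes not followed by a " are literal
    """
    args, current = [], []
    in_quotes = False
    have_token = False
    i, n = 0, len(command_line)
    while i < n:
        ch = command_line[i]
        if ch == "\\":
            start = i
            while i < n and command_line[i] == "\\":
                i += 1
            count = i - start
            if i < n and command_line[i] == '"':
                current.append("\\" * (count // 2))
                if count % 2 == 1:
                    current.append('"')  # escaped, literal double quote
                    i += 1
                # even count: the " is handled by the next loop iteration
            else:
                current.append("\\" * count)
            have_token = True
            continue
        if ch == '"':
            in_quotes = not in_quotes
            have_token = True
        elif ch in " \t" and not in_quotes:
            if have_token:
                args.append("".join(current))
                current, have_token = [], False
        else:
            current.append(ch)
            have_token = True
        i += 1
    if have_token:
        args.append("".join(current))
    return args


# Command line as built by PowerShell (embedded quotes unescaped).
print(parse_windows_command_line('Rscript -e print("hello")'))
# ['Rscript', '-e', 'print(hello)']

# Manual \" workaround: the quotes survive.
print(parse_windows_command_line(r'Rscript -e print(\"hello\")'))
# ['Rscript', '-e', 'print("hello")']

# cmd.exe passes the original quoting through; ' has no syntactic meaning.
print(parse_windows_command_line("""Rscript -e 'print("hello")'"""))
# ['Rscript', '-e', "'print(hello)'"]
```

The third result is the R string literal 'print(hello)', which matches the [1] "print(hello)" output seen from cmd.exe.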

Unfortunately, as of PowerShell 7.0, PowerShell doesn't adhere to these conventions, due to the aforementioned bug. Since the bug has been around since v1, users have learned to work around it, such as with manual \"-escaping, as shown above. The problem is that fixing the bug would break all workarounds. Implementing a fix as an experimental feature is now being considered, for v7.1 at the earliest - see this PR on GitHub and the associated discussion here, which suggests that, in addition to implementing the widely established conventions, accommodations be made for calls to batch files and msiexec.exe-style programs, which have non-conventional quoting requirements.
