R: strsplit on backslash (\)

Problem description

I am trying to extract the part of a string before the first backslash, but I can't seem to get it to work properly.

I have tried multiple ways of getting it to work, based on the manual page for strsplit and after searching online.

In my actual situation the strings are in a dataframe which I get from a database connection but I can simplify the situation with the following:

> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\",fixed=TRUE)
[[1]]
[1] "BLAAT1\022E:" "BLAAT2"      "BLAAT3"  

> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\",fixed=FALSE)
Error in strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3", "\\", fixed = FALSE) : 
  invalid regular expression '\', reason 'Trailing backslash'

> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\\\",fixed=TRUE)
[[1]]
[1] "BLAAT1\022E:\\BLAAT2\\BLAAT3"

> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\\\",fixed=FALSE)
[[1]]
[1] "BLAAT1\022E:" "BLAAT2"       "BLAAT3"      

The expected output would also split on the \ between BLAAT1 and 022E.

Thanks in advance

Solution

When strsplit treats the split argument as a regular expression (fixed = FALSE, the default), a literal backslash has to be written in the regex as two backslashes, \\, because a single \ is a regex metacharacter that introduces escapes such as \d and \w. On top of that, R string literals have their own escape sequences (such as "\r" for a carriage return and "\n" for a newline), so every backslash in the pattern must itself be doubled inside the string literal.

So the R string "\\" holds a single \, and the regex \\ that matches a literal backslash has to be written with four backslashes, "\\\\". With fixed = TRUE no regex is involved, so "\\" is enough.
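
The two escaping layers can be checked at the console. The snippet below is only a minimal sketch; the variable names and the sample value "a\\b\\c" are made up for illustration and do not come from the question.

```r
# Hypothetical sample value: this literal holds the 5 characters a\b\c
x <- "a\\b\\c"

nchar("\\")      # 1: the literal "\\" is a single backslash
writeLines(x)    # prints the actual contents, without R's escaping

# fixed = TRUE: the split string is taken literally, so one backslash suffices
parts_fixed <- strsplit(x, "\\", fixed = TRUE)[[1]]

# regex (the default): the pattern \\ is written as "\\\\" in R source
parts_regex <- strsplit(x, "\\\\")[[1]]

# Both approaches give the same pieces
identical(parts_fixed, parts_regex)
```

Using writeLines() or cat() rather than print() is a convenient way to see what a string really contains, since print() shows the escaped form.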

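Note that in the R literal "BLAAT1\022E:" the sequence \022 is an octal escape for a single control character (code 18), so the test string contains no backslash at that position. Here is a regex that you can use: it splits at a \ or at a non-printable character: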

strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\\\|[^[:print:]]",fixed=FALSE)
# [1] "BLAAT1" "E:"     "BLAAT2" "BLAAT3"
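
To see why the earlier attempts did not split between BLAAT1 and 022E, it may help to inspect what the test string really contains. The sketch below assumes that the values coming from the database hold an actual backslash at that position, which the question does not confirm; the names typed, from_db, pieces, first_part, and first_part_sub are made up for illustration.

```r
# The literal from the question: \022 is an octal escape (one control character)
typed <- "BLAAT1\022E:\\BLAAT2\\BLAAT3"
utf8ToInt(substr(typed, 7, 7))                    # 18
grepl("\\", substr(typed, 1, 9), fixed = TRUE)    # FALSE: no backslash before "E:"

# Assumed database value: a real backslash, which is written "\\" in R source
from_db <- "BLAAT1\\022E:\\BLAAT2\\BLAAT3"
pieces <- strsplit(from_db, "\\", fixed = TRUE)[[1]]

# Part before the first backslash
first_part <- pieces[1]

# Alternative without splitting: drop everything from the first backslash on
first_part_sub <- sub("\\\\.*", "", from_db)
```

If the real data does contain a backslash there, either the fixed = TRUE call or the "\\\\" regex from the question should already split as expected, and the non-printable alternative in the pattern is only needed for the hand-typed test string.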

