strsplit inconsistent with gregexpr

Question

A comment on my answer to this question suggested a strsplit call that should give the desired result, but it does not, even though the regex seems to correctly match only the first and last commas in a character vector. This can be demonstrated with gregexpr and regmatches.

So why does strsplit split on each comma in this example, even though regmatches only returns two matches for the same regex?

#  We would like to split on the first comma and
#  the last comma (positions 4 and 13 in this string)
x <- "123,34,56,78,90"

#  Splits on every comma. Must be wrong.
strsplit( x , '^\\w+\\K,|,(?=\\w+$)' , perl = TRUE )[[1]]
#[1] "123" "34"  "56"  "78"  "90" 


#  Ok. Let's check the positions of matches for this regex
m <- gregexpr( '^\\w+\\K,|,(?=\\w+$)' , x , perl = TRUE )

# Matching positions are at
unlist(m)
#[1]  4 13

#  And extracting them...
regmatches( x , m )
#[[1]]
#[1] "," ","

Huh?! What is going on?

Recommended answer

@Aprillion's theory is correct. From the R documentation for strsplit:

The algorithm applied to each input string is

repeat {
    if the string is empty
        break.
    if there is a match
        add the string to the left of the match to the output.
        remove the match and all to the left of it.
    else
        add the string to the output.
        break.
}

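In other words, at each iteration ^ matches the beginning of a new, shorter string from which the preceding pieces have already been removed. gregexpr, by contrast, searches the original string as a whole, so ^ can only match at its very start.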
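To see the re-anchoring in action, the documented loop can be sketched directly in R. split_like_strsplit is a hypothetical helper written for illustration, not part of base R. It handles a single string and, as a simplifying assumption, refuses delimiters that match the empty string (strsplit itself treats that case specially).

```r
# Hypothetical helper: re-implements the loop quoted from the strsplit
# documentation for ONE input string, to show how ^ re-anchors.
# Assumption: the delimiter never produces an empty match.
split_like_strsplit <- function(s, pattern) {
  out <- character(0)
  repeat {
    if (nchar(s) == 0) break
    m <- regexpr(pattern, s, perl = TRUE)
    if (m != -1) {
      len <- attr(m, "match.length")
      if (len == 0) stop("empty matches are not handled in this sketch")
      # the text to the left of the match goes to the output
      out <- c(out, substr(s, 1, m - 1))
      # drop the match and everything to its left; on the next
      # iteration ^ refers to the start of this shorter string
      s <- substr(s, m + len, nchar(s))
    } else {
      out <- c(out, s)
      break
    }
  }
  out
}

x <- "123,34,56,78,90"
split_like_strsplit(x, '^\\w+\\K,|,(?=\\w+$)')
#[1] "123" "34"  "56"  "78"  "90"
```

After "123," is removed, the remaining string "34,56,78,90" starts with \w+ followed by a comma, so the first alternative matches again, and so on until only "90" is left.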

A simple illustration of this behavior:

x <- "12345"
strsplit( x , "^." , perl = TRUE )
#[[1]]
#[1] "" "" "" "" ""
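The same "^." pattern reports a single match at position 1 when passed to gregexpr. Building on that, one possible way to get the split the question wanted (a sketch of my own, not taken from the thread) is to locate the delimiters with gregexpr and let regmatches(..., invert = TRUE) return the pieces between them:

```r
# gregexpr scans the original string once, so "^." matches only at position 1
x <- "12345"
unlist(gregexpr("^.", x, perl = TRUE))
#[1] 1

# Split on the first and last comma only, by inverting the gregexpr matches
y <- "123,34,56,78,90"
m <- gregexpr('^\\w+\\K,|,(?=\\w+$)', y, perl = TRUE)
regmatches(y, m, invert = TRUE)[[1]]
#[1] "123"      "34,56,78" "90"
```

Because regmatches only slices the original string at the positions gregexpr already found (4 and 13), ^ is never re-evaluated against a shortened string.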

The linked example shows the consequence of this behavior when a lookahead assertion is used as the delimiter (thanks to @JoshO'Brien for the link).
