Scala Regular Expressions (string delimited by double quotes)
Question
I am new to Scala. I am trying to match a string delimited by double quotes, and I am a bit puzzled by the following behavior.
If I do the following:
val stringRegex = """"([^"]*)"(.*$)"""
val regex = stringRegex.r
val tidyTokens = Array[String]("1", "\"test\"", "'c'", "-23.3")
tidyTokens.foreach {
token => if (token.matches (stringRegex)) println (token + " matches!")
}
I get
"test" matches!
Otherwise, if I do the following:
tidyTokens.foreach {
token => token match {
case regex(token) => println (token + " matches!")
case _ => println ("No match for token " + token)
}
}
I get
No match for token 1
No match for token "test"
No match for token 'c'
No match for token -23.3
Why doesn't "test" match in the second case?
Recommended answer
Take your regex:
"([^"]*)"(.*$)
When compiled with .r, this string yields a regex object which, if it matches its input string, must yield 2 captured strings: one for ([^"]*) and the other for (.*$). Your code
case regex(token) => ...
ought to reflect this, so maybe you want
case regex(token, otherStuff) => ...
or just
case regex(token, _) => ...
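As a minimal, self-contained sketch of that fix, binding both capture groups lets the quoted token match. The object name TwoGroupMatch and the helper quoted are made up for illustration and are not part of the question's code:

```scala
object TwoGroupMatch {
  // Same pattern as in the question: two capturing groups.
  val regex = """"([^"]*)"(.*$)""".r

  // Returns the text between the quotes when the token matches.
  // The pattern binds both groups; the second one is ignored with `_`.
  def quoted(token: String): Option[String] = token match {
    case regex(content, _) => Some(content)
    case _                 => None
  }

  def main(args: Array[String]): Unit = {
    val tidyTokens = Array("1", "\"test\"", "'c'", "-23.3")
    tidyTokens.foreach { t =>
      println(quoted(t).fold("No match for token " + t)(_ + " matches!"))
    }
  }
}
```

Note that the bound value is the first capture group, so it no longer includes the surrounding double quotes.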
Why? The case regex(matchedCaptures...) syntax works because regex is an object with an unapplySeq method. case regex(token) => ... translates (roughly) to:
case List(token) => ...
where List(token) is matched against what regex.unapplySeq(inputString) returns:
regex.unapplySeq("\"test\"") // Returns Some(List("test", ""))
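To see this directly, the following sketch calls unapplySeq by hand and then matches on its result. UnapplySeqDemo, captures and describe are hypothetical names introduced only for this example:

```scala
import scala.util.matching.Regex

object UnapplySeqDemo {
  // The two-group pattern from the question.
  val regex: Regex = """"([^"]*)"(.*$)""".r

  // What the extractor hands to a pattern: one element per capturing group,
  // or None when the regex does not match the whole input.
  def captures(input: String): Option[List[String]] = regex.unapplySeq(input)

  // Spells out by hand roughly what `case regex(...)` does with that result.
  def describe(input: String): String = captures(input) match {
    case Some(List(token))       => s"one group: $token"
    case Some(List(token, rest)) => s"two groups: '$token' and '$rest'"
    case Some(other)             => s"${other.length} groups"
    case None                    => "regex did not match"
  }

  def main(args: Array[String]): Unit =
    Seq("1", "\"test\"", "'c'", "-23.3").foreach(t => println(describe(t)))
}
```

A pattern with a single binder corresponds to the first case, which a two-element list can never satisfy.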
Your regex does match the string "test", but in the case statement the regex extractor's unapplySeq method returns a list of 2 strings, because that is what the regex says it captures. That's unfortunate, but the compiler can't help you here because regular expressions are compiled from strings at runtime.
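Since the compiler cannot check the number of binders against the number of groups, one option is to check it yourself at runtime. This is only a sketch, assuming you are happy to inspect the underlying java.util.regex.Pattern; GroupCountCheck and groupCount are illustrative names, not an existing API:

```scala
import scala.util.matching.Regex

object GroupCountCheck {
  // Number of capturing groups declared by the pattern, read from the
  // underlying java.util.regex.Pattern through a throwaway Matcher.
  def groupCount(r: Regex): Int = r.pattern.matcher("").groupCount()

  def main(args: Array[String]): Unit = {
    val regex = """"([^"]*)"(.*$)""".r
    // A pattern such as `case regex(a)` can only succeed when this is 1,
    // so a different value explains a silent "no match".
    println(s"capturing groups: ${groupCount(regex)}")
  }
}
```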
One alternative would be to use a non-capturing group:
val stringRegex = """"([^"]*)"(?:.*$)"""
//                             ^^
Then your code would work, because regex will now be an extractor object whose unapplySeq method returns only a single captured group:
tidyTokens foreach {
case regex(token) => println (token + " matches!")
case t => println ("No match for token " + t)
}
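Put together as a runnable sketch (NonCapturingMatch and quoted are hypothetical names; the token list is the one from the question), the single-binder pattern matches because only one group is reported:

```scala
object NonCapturingMatch {
  // (?:...) groups without capturing, so only ([^"]*) is reported.
  val regex = """"([^"]*)"(?:.*$)""".r

  // Single-binder extraction, as in the original (failing) code.
  def quoted(token: String): Option[String] = token match {
    case regex(content) => Some(content)
    case _              => None
  }

  def main(args: Array[String]): Unit = {
    val tidyTokens = Array("1", "\"test\"", "'c'", "-23.3")
    tidyTokens.foreach {
      case regex(token) => println(token + " matches!")
      case t            => println("No match for token " + t)
    }
  }
}
```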
Have a look at the tutorial on Extractor Objects for a better understanding of how apply / unapply / unapplySeq work.