Scala Regular Expressions (string delimited by double quotes)

Question

I am new to Scala. I am trying to match a string delimited by double quotes, and I am a bit puzzled by the following behavior:

If I do the following:

val stringRegex = """"([^"]*)"(.*$)"""
val regex = stringRegex.r
val tidyTokens = Array[String]("1", "\"test\"", "'c'", "-23.3")
tidyTokens.foreach {
    token => if (token.matches (stringRegex)) println (token + " matches!")
}

I get

"test" matches!

Otherwise, if I do the following:

tidyTokens.foreach {
    token => token match {
        case regex(token) => println (token + " matches!")
        case _ => println ("No match for token " + token)
    }
}

I get

No match for token 1
No match for token "test"
No match for token 'c'
No match for token -23.3

Why doesn't "test" match in the second case?

Recommended answer

Given the regex:

 "([^"]*)"(.*$)

When compiled with .r, this string yields a regex object which, if it matches its input string, must yield 2 captured strings: one for the ([^"]*) and the other for the (.*$). Your code

  case regex(token) => ...

ought to reflect this, so maybe you want

  case regex(token, otherStuff) => ...

or just

  case regex(token, _) => ...
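As a minimal runnable sketch of this fix, assuming the same pattern as in the question: the wrapper object `TwoGroupMatch` and the helper `describe` are names made up for illustration, not part of the original code.

```scala
object TwoGroupMatch {
  // Same pattern as in the question: it declares two capturing groups.
  val regex = """"([^"]*)"(.*$)""".r

  // Bind the first group and ignore the second one with a wildcard,
  // so the pattern's arity agrees with the number of groups.
  def describe(token: String): String = token match {
    case regex(inner, _) => inner + " matches!"
    case _               => "No match for token " + token
  }

  def main(args: Array[String]): Unit =
    Array("1", "\"test\"", "'c'", "-23.3").map(describe).foreach(println)
}
```

Note that the bound value is the content of the first group, so the surrounding quotes are not part of it.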

Why? The case regex(matchedCaptures...) syntax works because regex is an object with an unapplySeq method. case regex(token) => ... translates (roughly) to:

 case List(token) => ...

where List(token) is matched against what regex.unapplySeq(inputString) returns:

 regex.unapplySeq("\"test\"") // Returns Some(List("test", ""))
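The arity mismatch can be checked directly. The sketch below is illustrative only: `InspectGroups`, `groupCount` and `fitsOneBinder` are hypothetical names, and `fitsOneBinder` merely imitates what a one-binder pattern has to do, it is not what the compiler literally generates.

```scala
object InspectGroups {
  // Same two-group pattern as in the question.
  val regex = """"([^"]*)"(.*$)""".r

  // How many strings the extractor hands to a pattern, if the input matches.
  def groupCount(input: String): Option[Int] =
    regex.unapplySeq(input).map(_.length)

  // Imitates `case regex(token)`: succeeds only if exactly one string came back.
  def fitsOneBinder(input: String): Boolean =
    regex.unapplySeq(input) match {
      case Some(List(_)) => true
      case _             => false
    }

  def main(args: Array[String]): Unit = {
    println(groupCount("\"test\""))    // two groups were captured
    println(fitsOneBinder("\"test\"")) // so a single binder cannot fit them
    println(groupCount("1"))           // no match at all
  }
}
```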

Your regex does match the string "test", but in the case statement the regex extractor's unapplySeq method returns a list of 2 strings, because that is what the regex says it captures. That's unfortunate, but the compiler can't help you here because regular expressions are compiled from strings at runtime.

One alternative would be to use a non-capturing group:

 val stringRegex = """"([^"]*)"(?:.*$)"""
 //                             ^^

Then your code would work, because regex will now be an extractor object whose unapplySeq method returns only a single captured group:

 tidyTokens foreach { 
    case regex(token) => println (token + " matches!")
    case t => println ("No match for token " + t)
 }
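To compare the two patterns side by side, here is a small sketch. It assumes that counting groups through the underlying `java.util.regex` matcher is an acceptable runtime check; `NonCapturingDemo`, `groups` and `describe` are illustrative names, not from the original code.

```scala
import scala.util.matching.Regex

object NonCapturingDemo {
  val capturing    = """"([^"]*)"(.*$)""".r
  val nonCapturing = """"([^"]*)"(?:.*$)""".r

  // Number of capturing groups declared by the compiled pattern.
  def groups(r: Regex): Int = r.pattern.matcher("").groupCount()

  // With a single capturing group, a one-binder pattern fits.
  def describe(token: String): String = token match {
    case nonCapturing(inner) => inner + " matches!"
    case t                   => "No match for token " + t
  }

  def main(args: Array[String]): Unit = {
    println(groups(capturing))    // 2
    println(groups(nonCapturing)) // 1
    Array("1", "\"test\"", "'c'", "-23.3").map(describe).foreach(println)
  }
}
```

Since the compiler cannot verify the arity, a check like `groups` in a unit test is one way to catch a mismatch early.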

Have a look at the tutorial on Extractor Objects for a better understanding of how apply / unapply / unapplySeq work.
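For a feel of the mechanism outside of regexes, here is a hand-written extractor. `CommaSeparated` and `ExtractorDemo` are hypothetical objects invented for this sketch; the point is only that the number of binders in a pattern has to agree with the length of the sequence that unapplySeq returns.

```scala
// Hypothetical extractor: splits a non-empty string on commas.
object CommaSeparated {
  def unapplySeq(s: String): Option[Seq[String]] =
    if (s.isEmpty) None else Some(s.split(",").toSeq)
}

object ExtractorDemo {
  def describe(s: String): String = s match {
    // Matches only when unapplySeq returns exactly one element.
    case CommaSeparated(only)          => "one field: " + only
    // Matches only when it returns exactly two elements.
    case CommaSeparated(first, second) => "two fields: " + first + ", " + second
    // `_*` accepts any number of remaining elements.
    case CommaSeparated(first, _*)     => "many fields, starting with " + first
    case _                             => "no fields"
  }

  def main(args: Array[String]): Unit =
    Seq("a", "a,b", "a,b,c", "").map(describe).foreach(println)
}
```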
