Substitute values in a string with placeholders in Scala


Question

I have just started using Scala and wish to better understand the functional approach to problem solving. I have pairs of strings: the first has placeholders for parameters, and its pair has the values to substitute. For example:

"select col1 from tab1 where id > $1 and name like $2"
"parameters: $1 = '250', $2 = 'some%'"

There may be many more than 2 parameters.

I can build the correct string by stepping through and using regex.findAllIn(line) on each line and then going through the iterators to construct the substitution but this seems fairly inelegant and procedurally driven.

Could anyone point me towards a functional approach that will be neater and less error prone?

Solution

Speaking strictly to the replacement problem, my preferred solution is one enabled by a feature that should probably be available in the upcoming Scala 2.8, which is the ability to replace regex patterns using a function. Using it, the problem can be reduced to this:

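import scala.util.matching.Regex

def replaceRegex(input: String, values: IndexedSeq[String]) =
  """\$(\d+)""".r.replaceAllIn(input, _ match {
    // $N is 1-based, so it maps to values(N - 1);
    // quoteReplacement keeps any "$" or "\" in the value literal
    case Regex.Groups(index) => Regex.quoteReplacement(values(index.toInt - 1))
  })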

Which reduces the problem to what you actually intend to do: replace all $N patterns by the corresponding Nth value of a list.
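A minimal sketch of the same idea applied to the query from the question, assuming the `replaceAllIn(target, Match => String)` overload of `scala.util.matching.Regex` and 1-based placeholders. The name `replaceRegexLenient`, and the choice to leave a placeholder untouched when no value exists for it, are assumptions made for illustration and are not part of the original answer.

```scala
import scala.util.matching.Regex

object ReplaceRegexSketch {
  private val placeholder: Regex = """\$(\d+)""".r

  // Hypothetical variant: $N maps to values(N - 1); a placeholder with no
  // corresponding value is left in the output unchanged.
  def replaceRegexLenient(input: String, values: IndexedSeq[String]): String =
    placeholder.replaceAllIn(input, m => {
      val replacement = values.lift(m.group(1).toInt - 1).getOrElse(m.matched)
      // quoteReplacement keeps "$" and "\" in the replacement literal
      Regex.quoteReplacement(replacement)
    })
}
```

With the values `Vector("'250'", "'some%'")`, the query from the question becomes `select col1 from tab1 where id > '250' and name like 'some%'`.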

Or, if you can actually set the standards for your input string, you could do it like this:

"select col1 from tab1 where id > %1$s and name like %2$s" format ("one", "two")

If that's all you want, you can stop here. If, however, you are interested in how to go about solving such problems in a functional way, absent clever library functions, please do continue reading.

Thinking functionally about it means thinking of the function. You have a string, some values, and you want a string back. In a statically typed functional language, that means you want something like this:

(String, List[String]) => String

If one considers that those values may be used in any order, we may ask for a type better suited for that:

(String, IndexedSeq[String]) => String

That should be good enough for our function. Now, how do we break down the work? There are a few standard ways of doing it: recursion, comprehension, folding.
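To make the target type concrete, here is a small sketch; the alias name `Substitution` and the trivial `identitySubstitution` are hypothetical. `IndexedSeq` (for example a `Vector`) fits better than `List` because placeholders can appear in any order, so values are looked up by position, which on a `List` means walking from the head each time.

```scala
object SubstitutionType {
  // Hypothetical alias for the shape shared by every solution discussed here.
  type Substitution = (String, IndexedSeq[String]) => String

  // A do-nothing implementation, only to show a value of that type.
  val identitySubstitution: Substitution = (input, _) => input

  // Vector is the default IndexedSeq; positional lookup does not walk the collection.
  val params: IndexedSeq[String] = Vector("'250'", "'some%'")
  val second: String = params(1)
}
```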

RECURSION

Let's start with recursion. Recursion means dividing the problem into a first step and then repeating it over the remaining data. To me, the most obvious division here would be the following:

  1. Replace the first placeholder
  2. Repeat with the remaining placeholders

That is actually pretty straightforward to do, so let's get into further details. How do I replace the first placeholder? One thing that can't be avoided is that I need to know what that placeholder is, because I need to get the index into my values from it. So I need to find it:

(String, Pattern) => String

Once found, I can replace it on the string and repeat:

val stringPattern = "\\$(\\d+)"
val regexPattern = stringPattern.r
def replaceRecursive(input: String, values: IndexedSeq[String]): String =
  regexPattern findFirstMatchIn input match {
    case Some(m) =>
      // $N is 1-based; quoteReplacement keeps "$" and "\" in the value literal
      val value = java.util.regex.Matcher.quoteReplacement(values(m.group(1).toInt - 1))
      replaceRecursive(input replaceFirst (stringPattern, value), values)
    case None => input // no placeholder found, finished
  }

That is inefficient, because it repeatedly produces new strings, instead of just concatenating each part. Let's try to be more clever about it.

To efficiently build a string through concatenation, we need to use StringBuilder. We also want to avoid creating new strings. StringBuilder can accept a CharSequence, which we can get from String. I'm not sure whether a new string is actually created or not; if it is, we could roll our own CharSequence that acts as a view into the String instead of creating a new one. Assured that we can easily change this if required, I'll proceed on the assumption that it is not.

So, let's consider what functions we need. Naturally, we'll want a function that returns the index into the first placeholder:

String => Int

But we also want to skip any part of the string we have already looked at. That means we also want a starting index:

(String, Int) => Int

There's one small detail, though. What if there's no further placeholder? Then there wouldn't be any index to return. Java reuses the index for that, returning -1 to signal the exceptional case. When doing functional programming, however, it is always best to return what you mean. And what we mean is that we may return an index, or we may not. The signature for that is this:

(String, Int) => Option[Int]

Let's build this function:

def indexOfPlaceholder(input: String, start: Int): Option[Int] = if (start < input.length) {
  input indexOf ("$", start) match {
    case -1 => None
    case index => 
      if (index + 1 < input.length && input(index + 1).isDigit)
        Some(index)
      else
        indexOfPlaceholder(input, index + 1)
  }
} else {
  None
}

That's rather complex, mostly to deal with boundary conditions, such as index being out of range, or false positives when looking for placeholders.
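As a cross-check for those boundary conditions, the same search can be sketched with a range and `find`. The name `indexOfPlaceholderAlt` is hypothetical, and this version trades the `indexOf` jump for a plain position-by-position scan.

```scala
object PlaceholderSearch {
  // Hypothetical alternative: scan positions from `start` and stop at the first
  // "$" that is immediately followed by a digit. The range ends one short of
  // the last character so that input(i + 1) is always in bounds.
  def indexOfPlaceholderAlt(input: String, start: Int): Option[Int] =
    (start until input.length - 1).find(i => input(i) == '$' && input(i + 1).isDigit)
}
```

For instance, `indexOfPlaceholderAlt("a $ b $1", 0)` skips the lone `$` and gives `Some(6)`, while `indexOfPlaceholderAlt("cost $", 0)` and `indexOfPlaceholderAlt("$1", 1)` both give `None`.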

To skip the placeholder, we'll also need to know its length, with signature (String, Int) => Int:

def placeholderLength(input: String, start: Int): Int = {
  def recurse(pos: Int): Int = if (pos < input.length && input(pos).isDigit)
    recurse(pos + 1)
  else
    pos
  recurse(start + 1) - start  // start + 1 skips the "$" sign
}

Next, we also want to know exactly which value the placeholder stands for, that is, its index into the values. The signature for this is a bit ambiguous:

(String, Int) => Int

The first Int is an index into the input, while the second is an index into the values. We could do something about that, but not that easily or efficiently, so let's ignore it. Here's an implementation for it:

def indexOfValue(input: String, start: Int): Int = {
  def recurse(pos: Int, acc: Int): Int = if (pos < input.length && input(pos).isDigit)
    recurse(pos + 1, acc * 10 + input(pos).asDigit)
  else
    acc
  recurse(start + 1, 0) // start + 1 skips "$"
}

We could have used the length too, and achieved a simpler implementation:

def indexOfValue2(input: String, start: Int, length: Int): Int = if (length > 1) { // length 1 is just the "$" sign
  input(start + length - 1).asDigit + 10 * indexOfValue2(input, start, length - 1)
} else {
  0
}

As a note, using curly brackets around simple expressions, such as above, is frowned upon by conventional Scala style, but I use it here so it can be easily pasted into the REPL.

So, we can get the index to the next placeholder, its length, and the index of the value. That's pretty much everything needed for a more efficient version of replaceRecursive:

def replaceRecursive2(input: String, values: IndexedSeq[String]): String = {
  val sb = new StringBuilder(input.length)
  def recurse(start: Int): String = if (start < input.length) {
    indexOfPlaceholder(input, start) match {
      case Some(placeholderIndex) =>
        // named `length` so it does not shadow the placeholderLength method
        val length = placeholderLength(input, placeholderIndex)
        sb.append(input subSequence (start, placeholderIndex))
        // $N is 1-based, so it maps to values(N - 1)
        sb.append(values(indexOfValue(input, placeholderIndex) - 1))
        recurse(placeholderIndex + length) // resume right after the placeholder
      case None =>
        sb.append(input subSequence (start, input.length)) // copy the remaining tail
        sb.toString
    }
  } else {
    sb.toString
  }
  recurse(0)
}

Much more efficient, and as functional as one can be using StringBuilder.

COMPREHENSION

Comprehensions, at their most basic level, mean transforming T[A] into T[B] given a function A => B. It's a monad thing, but it is easily understood when it comes to collections. For instance, I may transform a List[String] of names into a List[Int] of name lengths through a function String => Int which returns the length of a string. That's a list comprehension.
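The names-to-lengths example can be sketched as follows; the sample names and the object name are made up for illustration.

```scala
object ListComprehension {
  val names: List[String] = List("Ann", "Robert", "Li")

  // T[A] = List[String], A => B = String => Int, T[B] = List[Int]
  val lengths: List[Int] = for (name <- names) yield name.length

  // The for/yield above is rewritten by the compiler into this map call.
  val sameLengths: List[Int] = names.map(_.length)
}
```

Both `lengths` and `sameLengths` are `List(3, 6, 2)`.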

There are other operations that can be done through comprehensions, given functions with signatures A => T[B] or A => Boolean.
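A short sketch of those two other shapes, using made-up sample lines: a guard corresponds to A => Boolean, and a nested generator corresponds to A => T[B].

```scala
object ComprehensionOperations {
  val lines: List[String] = List("id > $1", "", "name like $2")

  // The guard is desugared to withFilter and the second generator to flatMap.
  val words: List[String] =
    for {
      line <- lines
      if line.nonEmpty               // String => Boolean
      word <- line.split(' ').toList // String => List[String]
    } yield word
}
```

Here `words` is `List("id", ">", "$1", "name", "like", "$2")`: the empty line is filtered out and the remaining lines are flattened into one list.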

That means we need to see the input string as a T[A]. We can't use Array[Char] as input because we want to replace the whole placeholder, which is larger than a single char. Let's propose, therefore, this type signature:

(List[String], String => String) => String

Since the input we receive is a String, we need a function String => List[String] first, which will divide our input into placeholders and non-placeholders. I propose this:

val regexPattern2 = """((?:[^$]+|\$(?!\d))+)|(\$\d+)""".r
def tokenize(input: String): List[String] = regexPattern2.findAllIn(input).toList
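To see what the tokenizer produces, here is a small self-contained sketch that repeats the two definitions inside a hypothetical `TokenizeDemo` object and runs them on a made-up query containing a lone `$`.

```scala
object TokenizeDemo {
  private val regexPattern2 = """((?:[^$]+|\$(?!\d))+)|(\$\d+)""".r

  def tokenize(input: String): List[String] = regexPattern2.findAllIn(input).toList

  // A lone "$" stays inside the surrounding text; "$1" and "$2" become their own tokens.
  val tokens: List[String] = tokenize("price in $ where id > $1 and name like $2")
}
```

The result is `List("price in $ where id > ", "$1", " and name like ", "$2")`, so concatenating the tokens gives back the original input.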

Another problem we have is that we got an IndexedSeq[String], but we need a String => String. There are many ways around that, but let's settle with this:

def valuesMatcher(values: IndexedSeq[String]): String => String = (input: String) => values(input.substring(1).toInt - 1)

We also need a function List[String] => String, but List's mkString does that already. So there's little left to do aside from composing all this stuff:

def comprehension(input: List[String], matcher: String => String) = 
  for (token <- input) yield (token: @unchecked) match {
    case regexPattern2(_, placeholder: String) => matcher(placeholder)
    case regexPattern2(other: String, _) => other
  }

I use @unchecked because there shouldn't be any pattern other than the two above if my regex pattern was built correctly. The compiler doesn't know that, however, so I use that annotation to silence the warning it would produce. If an exception is thrown, there's a bug in the regex pattern.

The final function, then, unifies all that:

def replaceComprehension(input: String, values: IndexedSeq[String]) =
  comprehension(tokenize(input), valuesMatcher(values)).mkString

One problem with this solution is that I apply the regex pattern twice: once to break up the string, and the other to identify the placeholders. Another problem is that the List of tokens is an unnecessary intermediate result. We can solve that with these changes:

def tokenize2(input: String): Iterator[List[String]] = regexPattern2.findAllIn(input).matchData.map(_.subgroups)

def comprehension2(input: Iterator[List[String]], matcher: String => String) = 
  for (token <- input) yield (token: @unchecked) match {
    case List(_, placeholder: String) => matcher(placeholder)
    case List(other: String, _) => other
  }

def replaceComprehension2(input: String, values: IndexedSeq[String]) =
  comprehension2(tokenize2(input), valuesMatcher(values)).mkString

FOLDING

Folding is a bit similar to both recursion and comprehension. With folding, we take a T[A] input that can be comprehended, a B "seed", and a function (B, A) => B. We comprehend the list using the function, always taking the B that resulted from the last element processed (the first element takes the seed). Finally, we return the result of the last comprehended element.

I'll admit I could hardly explain it in a less obscure way. That's what happens when you try to keep things abstract. I explained it that way so that the type signatures involved become clear. But let's just see a trivial example of folding to understand its usage:

def factorial(n: Int) = {
  val input = 2 to n
  val seed = 1
  val function = (b: Int, a: Int) => b * a
  input.foldLeft(seed)(function)
}

Or, as a one-liner:

def factorial2(n: Int) = (2 to n).foldLeft(1)(_ * _)

Ok, so how would we go about solving the problem with folding? The result, of course, should be the string we want to produce. Therefore, the seed should be an empty string. Let's use the result from tokenize2 as the comprehensible input, and do this:

def replaceFolding(input: String, values: IndexedSeq[String]) = {
  val seed = new StringBuilder(input.length)
  val matcher = valuesMatcher(values)
  val foldingFunction = (sb: StringBuilder, token: List[String]) => {
    token match {          
      case List(_, placeholder: String) => sb.append(matcher(placeholder))
      case List(other: String, _) => sb.append(other)
    }
    sb
  }
  tokenize2(input).foldLeft(seed)(foldingFunction).toString
}

And, with that, I finish showing the most usual ways one would go about this in a functional manner. I have resorted to StringBuilder because concatenation of String is slow. If that wasn't the case, I could easily replace StringBuilder in functions above by String. I could also convert Iterator into a Stream, and completely do away with mutability.
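A sketch of what the String-seeded variant could look like, trading speed for the absence of mutable state. The name `replaceFoldingPure` is hypothetical; it uses `findAllMatchIn` directly instead of tokenize2, and it still walks an Iterator rather than a Stream.

```scala
object ImmutableFolding {
  private val token = """((?:[^$]+|\$(?!\d))+)|(\$\d+)""".r

  // Hypothetical variant of replaceFolding: the seed is an empty String and
  // every step returns a new String instead of appending to a StringBuilder.
  def replaceFoldingPure(input: String, values: IndexedSeq[String]): String =
    token.findAllMatchIn(input).foldLeft("") { (acc, m) =>
      // group(2) is non-null only when the token is a $N placeholder (1-based)
      Option(m.group(2)) match {
        case Some(placeholder) => acc + values(placeholder.substring(1).toInt - 1)
        case None              => acc + m.matched
      }
    }
}
```

Each step copies the accumulated string, which is exactly the cost the StringBuilder version avoids.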

This is Scala, though, and Scala is about balancing needs and means, not about purist solutions. Though, of course, you are free to go purist. :-)
