Splitting complex String Pattern (without regex) in a functional approach


Question

I am trying to split a string without regex, using a more idiomatic functional approach.

case class Parsed(blocks: Vector[String], block: String, depth: Int)

def processChar(parsed: Parsed, c: Char): Parsed = {
  import parsed._
  c match {
    case '|' if depth == 0 =>
      parsed.copy(block = "", blocks = blocks :+ block, depth = depth)
    case '[' =>
      parsed.copy(block = block + c, depth = depth + 1)
    case ']' if depth == 1 =>
      parsed.copy(block = "", blocks = blocks :+ (block + c), depth = depth - 1)
    case ']' =>
      parsed.copy(block = block + c, depth = depth - 1)
    case _ =>
      parsed.copy(block = block + c)
  }
}

val s = "Str|[ts1:tssub2|ts1:tssub2]|BLANK|[INT1|X.X.X.X|INT2|BLANK |BLANK | |X.X.X.X|[INT3|s1]]|[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]|BLANK |BLANK |[s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]|[[s8|s9|s10|INT20|INT21]|ts3:tssub3| | ];[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK |BLANK ]|BLANK |BLANK |[s14|s15]"
val parsed = s.foldLeft(Parsed(Vector(), "", 0))(processChar)
parsed.blocks.size //20 
parsed.blocks foreach println

I expect the following result (parsed.blocks.size should be 12):

Str
[ts1:tssub2|ts1:tssub2]
BLANK|
[INT1|X.X.X.X|INT2|BLANK |BLANK | |X.X.X.X|[INT3|s1]]
[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]
BLANK 
BLANK 
[s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]
[[s8|s9|s10|INT20|INT21]|ts3:tssub3| | ];[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK |BLANK ]
BLANK 
BLANK 
[s14|s15]

However, the result I actually get is this (parsed.blocks.size is 20):

Str
[ts1:tssub2|ts1:tssub2]

BLANK
[INT1|X.X.X.X|INT2|BLANK|BLANK||X.X.X.X|[INT3|s1]]

[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]

BLANK
BLANK
[s2|s3|s4|INT16|INT17]
;[s5|s6|s7|INT18|INT19]

[[s8|s9|s10|INT20|INT21]|ts1:tssub2||]
;[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK|BLANK]

BLANK
BLANK
[s14|s15]

To my understanding, this is a slight variation of the parenthesis-balancing problem. In this case, however, ; acts as a kind of continuation.

I have two questions in this case:

1) Why does an extra (empty) entry appear after [ts1:tssub2|ts1:tssub2], and also after

[INT1|X.X.X.X|INT2|BLANK|BLANK||X.X.X.X|[INT3|s1]], [INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15] and ;[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK|BLANK]

in my result?

2) At present, [s2|s3|s4|INT16|INT17] and ;[s5|s6|s7|INT18|INT19] go in as two different entries. However, they should be merged into a single entry, [s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]. The same applies to

[[s8|s9|s10|INT20|INT21]|ts1:tssub2||]

;[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK|BLANK]

Any clues on how to do this?

Answer

Question 1

The extra empty-string block appears because, each time, the case that ran immediately before is

case ']'  if depth == 1

It pushes the completed bracketed block, resets the current block to the empty string, and decreases the depth. Then we have

case '|'  if depth == 0

which pushes that still-empty current block into the resulting Vector and starts yet another empty one.
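The mechanism is easier to see on a tiny input. The following is a minimal sketch that reuses the fold from the question; the EmptyBlockDemo wrapper object, the run helper, and the sample input "[a]|b|" are hypothetical names chosen for illustration, and the import parsed._ is replaced by explicit p. accessors.

```scala
object EmptyBlockDemo {
  final case class Parsed(blocks: Vector[String], block: String, depth: Int)

  // Same cases as in the question, written with explicit field access.
  def processChar(p: Parsed, c: Char): Parsed = c match {
    case '|' if p.depth == 0 =>
      // Pushes the current block even when it is empty.
      p.copy(blocks = p.blocks :+ p.block, block = "")
    case '[' =>
      p.copy(block = p.block + c, depth = p.depth + 1)
    case ']' if p.depth == 1 =>
      // Pushes the bracketed block and leaves an empty current block behind.
      p.copy(blocks = p.blocks :+ (p.block + c), block = "", depth = 0)
    case ']' =>
      p.copy(block = p.block + c, depth = p.depth - 1)
    case _ =>
      p.copy(block = p.block + c)
  }

  def run(input: String): Vector[String] =
    input.foldLeft(Parsed(Vector.empty, "", 0))(processChar).blocks
}

// "[a]" is pushed at ']', then the '|' right after it pushes "".
println(EmptyBlockDemo.run("[a]|b|")) // Vector([a], , b)
```

The empty string in the middle of the result is exactly the extra entry seen after every top-level bracketed block in the question's output.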

Before answering the second question, I would like to suggest another way to implement this parser, which is slightly more idiomatic. My main criticism of the current one is that it uses an intermediate object (Parsed) to wrap the state and copies it in every case. We do not actually need it: a more common approach is to use a recursive function, especially when depth is involved.

So, without significantly changing how your cases are processed, it can be written as follows:

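// Consumes `remaining` one character at a time. Finished blocks accumulate in
// `blocks`; the block under construction is `currentBlock`.
def parse(blocks: Seq[String],
          currentBlock: String,
          remaining: String,
          depth: Int): Seq[String] =
  if (remaining.isEmpty) {
    blocks
  } else {
    val curChar = remaining.head
    curChar match {
      // A top-level separator closes the current block.
      case '|' if depth == 0 =>
        parse(blocks :+ currentBlock, "", remaining.tail, depth)
      case '[' =>
        parse(blocks, currentBlock + curChar, remaining.tail, depth + 1)
      case ']' =>
        if (depth == 1) // closing the outermost bracket finishes the block
          parse(blocks :+ (currentBlock + curChar), "", remaining.tail, depth - 1)
        else
          parse(blocks, currentBlock + curChar, remaining.tail, depth - 1)
      // Any other character, including a nested '|', is kept as-is.
      case _ =>
        parse(blocks, currentBlock + curChar, remaining.tail, depth)
    }
  }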

It produces exactly the same output as the original solution.

To fix the issue with empty blocks, we need to change case '|':

case '|' if depth == 0 =>
  val updatedBlocks = if (currentBlock.isEmpty) blocks
                      else blocks :+ currentBlock
  parse(updatedBlocks, "", remaining.tail, depth)

We simply skip the current block if it is an empty string.

Question 2

To merge the two blocks on either side of a ; character, we need to take the last parsed block back out of blocks and make it the currentBlock again, so that parsing continues appending to it. This requires an additional case:

case ';' =>
  parse(blocks.init, blocks.last + curChar, remaining.tail, depth)

Now, after running

val result = parse(Seq(), "", s, 0)
result.foreach(println)

the output is

Str
[ts1:tssub2|ts1:tssub2]
BLANK
[INT1|X.X.X.X|INT2|BLANK |BLANK | |X.X.X.X|[INT3|s1]]
[INT3|INT4|INT5|INT6|INT7|INT8|INT9|INT10|INT11|INT12|INT13|INT14|INT15]
BLANK 
BLANK 
[s2|s3|s4|INT16|INT17];[s5|s6|s7|INT18|INT19]
[[s8|s9|s10|INT20|INT21]|ts3:tssub3| | ];[[s11|s12|s13|INT21|INT22]|INT23:INT24|BLANK |BLANK ]
BLANK 
BLANK 
[s14|s15]

This looks very similar to what you were looking for.
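For reference, the pieces above can be combined into one self-contained sketch. This is one possible arrangement, not the only one: the BlockSplitter object, the split method, and the index-based inner loop are hypothetical names and choices made for this example. It also adds two things the answer does not state and which should be read as assumptions: the ; case is guarded so that it only fires at depth 0, with an empty current block and at least one finished block (otherwise blocks.last would throw or nested text would be disturbed), and a non-empty trailing block is flushed at the end of the input.

```scala
import scala.annotation.tailrec

object BlockSplitter {
  def split(input: String): Vector[String] = {
    @tailrec
    def loop(blocks: Vector[String], current: String, i: Int, depth: Int): Vector[String] =
      if (i >= input.length) {
        // Assumption: keep a trailing block that was not closed by '|' or ']'.
        if (current.isEmpty) blocks else blocks :+ current
      } else {
        val c = input.charAt(i)
        c match {
          case '|' if depth == 0 =>
            // Skip empty blocks, e.g. the '|' right after a top-level ']'.
            loop(if (current.isEmpty) blocks else blocks :+ current, "", i + 1, depth)
          case ';' if depth == 0 && current.isEmpty && blocks.nonEmpty =>
            // Continuation: reopen the last finished block (guards are an assumption).
            loop(blocks.init, blocks.last + c, i + 1, depth)
          case '[' =>
            loop(blocks, current + c, i + 1, depth + 1)
          case ']' if depth == 1 =>
            loop(blocks :+ (current + c), "", i + 1, 0)
          case ']' =>
            loop(blocks, current + c, i + 1, depth - 1)
          case _ =>
            loop(blocks, current + c, i + 1, depth)
        }
      }
    loop(Vector.empty, "", 0, 0)
  }
}

BlockSplitter.split("Str|[a|b]|X|[c];[d|[e]]|tail").foreach(println)
```

Iterating by index avoids the repeated remaining.tail copies of the recursive version, at the cost of closing over input. Unbalanced brackets are not validated here; that would need a separate check.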
