XMLEventReader generates two EvText events for a single tag

Problem description

I spotted a weird behavior in the Scala XML event reader. For XML like this:

  <page>
    <title>AT&amp;T Bell Labs</title>
    <ns>0</ns>
    <id>63739</id>
  </page>

It generates two EvText events for the title, because the title contains &amp;, the XML escape for &.

reader.foreach {
  case EvText(text) => println(text)
  case _            => // other events are ignored here
}

With the code above, I get the output

AT 
 T Bell Labs

instead of AT&T Bell Labs as a single piece of text.
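The split can be reproduced with a simplified model of how the reader treats character data: each run of ordinary characters becomes a text event, and each `&name;` becomes a separate entity-reference event. This is a sketch under that assumption, not the real parser; `SplitModel`, `Text`, `EntityRef` and `splitCharData` are hypothetical stand-ins for `EvText`, `EvEntityRef` and the reader's internals.

```scala
object SplitModel {
  // Hypothetical stand-ins for scala.xml.pull.EvText / EvEntityRef.
  sealed trait Event
  final case class Text(text: String) extends Event
  final case class EntityRef(entity: String) extends Event

  // Simplified model (an assumption): split character data at every "&name;".
  // Character references such as &#38; are not modelled; an '&' with no
  // later ';' is kept as an ordinary character.
  def splitCharData(s: String): List[Event] = {
    val out = List.newBuilder[Event]
    val buf = new StringBuilder
    var i = 0
    while (i < s.length) {
      val end = if (s.charAt(i) == '&') s.indexOf(';', i) else -1
      if (end > i) {
        // Flush the pending text run before emitting the entity reference.
        if (buf.nonEmpty) { out += Text(buf.toString); buf.clear() }
        out += EntityRef(s.substring(i + 1, end))
        i = end + 1
      } else {
        buf.append(s.charAt(i))
        i += 1
      }
    }
    if (buf.nonEmpty) out += Text(buf.toString)
    out.result()
  }
}
```

Printing only the `Text` events of `splitCharData("AT&amp;T Bell Labs")` gives the two lines `AT` and `T Bell Labs`, which matches the output in the question.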

Recommended answer

Entity reference events are represented by their own constructor, EvEntityRef (and in general you shouldn't count on consecutive characters being represented by a single EvText event, anyway, if I remember correctly).
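One way to stop relying on a single text event is to normalize the stream first, merging every run of adjacent text and entity-reference events into one text event. A minimal sketch over hypothetical stand-in types (`Text`, `EntityRef` and `Other` in place of `EvText`, `EvEntityRef` and the remaining `XMLEvent` subclasses); only the five entities predefined by XML 1.0 are decoded, and any other reference is left as its own event.

```scala
object Coalesce {
  // Hypothetical stand-ins for scala.xml.pull.XMLEvent and its subclasses.
  sealed trait Event
  final case class Text(text: String) extends Event
  final case class EntityRef(entity: String) extends Event
  final case class Other(description: String) extends Event

  // The five entities predefined by the XML 1.0 specification.
  val predefined: Map[String, String] =
    Map("amp" -> "&", "lt" -> "<", "gt" -> ">", "quot" -> "\"", "apos" -> "'")

  // The character data an event contributes, or None if it contributes none.
  private def charData(e: Event): Option[String] = e match {
    case Text(t)      => Some(t)
    case EntityRef(n) => predefined.get(n)
    case _            => None
  }

  // Merge each maximal run of character-data events into a single Text.
  def coalesce(events: List[Event]): List[Event] =
    events.foldRight(List.empty[Event]) { (e, acc) =>
      (charData(e), acc) match {
        case (Some(t), Text(rest) :: tail) => Text(t + rest) :: tail
        case (Some(t), _)                  => Text(t) :: acc
        case (None, _)                     => e :: acc
      }
    }
}
```

Because it folds over a `List`, this needs the events in memory; for a streaming reader, the iterator-based loop in the answer is the better fit.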

Here's some ugly imperative code I wrote at some point in the past to handle both kinds of text events:

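import scala.xml.pull._

// Collects consecutive text and entity-reference events into one String.
// The first event that is neither is consumed and discarded.
def readText(reader: Iterator[XMLEvent]): String = {
  val builder = new StringBuilder
  var current = reader.next()
  while (
    (current match {
      case EvText(text)        => builder.append(text); true
      case EvEntityRef("amp")  => builder.append("&"); true
      case EvEntityRef("lt")   => builder.append("<"); true
      case EvEntityRef("gt")   => builder.append(">"); true
      case EvEntityRef("quot") => builder.append("\""); true
      case EvEntityRef("apos") => builder.append("'"); true
      case _                   => false
    }) && reader.hasNext // avoid NoSuchElementException at end of input
  ) current = reader.next()
  builder.toString
}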

Note that this burns the first non-text event (the loop has to read it to find out that the text has ended), and it is generally unpleasant, the kind of code you never want to have to read again, but it should give you some idea of how you could handle this kind of thing.
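If burning that event is a problem, a `BufferedIterator` lets the loop peek at the next event with `head` and consume it only when it is character data. A minimal sketch over hypothetical stand-in types (`Text`, `EntityRef` and `Other` in place of the `scala.xml.pull` classes), decoding only the five entities predefined by XML 1.0:

```scala
object ReadText {
  // Hypothetical stand-ins for scala.xml.pull.XMLEvent and its subclasses.
  sealed trait Event
  final case class Text(text: String) extends Event
  final case class EntityRef(entity: String) extends Event
  final case class Other(description: String) extends Event

  private val predefined: Map[String, String] =
    Map("amp" -> "&", "lt" -> "<", "gt" -> ">", "quot" -> "\"", "apos" -> "'")

  private def charData(e: Event): Option[String] = e match {
    case Text(t)      => Some(t)
    case EntityRef(n) => predefined.get(n)
    case _            => None
  }

  // Reads consecutive character-data events. The first event that is not
  // character data stays in the iterator instead of being discarded.
  def readText(reader: BufferedIterator[Event]): String = {
    val builder = new StringBuilder
    @annotation.tailrec
    def loop(): String =
      if (!reader.hasNext) builder.toString
      else charData(reader.head) match {
        case Some(t) =>
          builder.append(t)
          reader.next() // consume only once we know it is character data
          loop()
        case None => builder.toString // leave the non-text event in place
      }
    loop()
  }
}
```

With a real reader this would be called as `readText(reader.buffered)`, and the buffered iterator should then be used for the rest of the parse, since the original iterator must not be advanced directly once it has been buffered.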
