XMLEventReader generates two EvText events for a single tag
Question
I spotted some odd behavior in the Scala XML event reader. For XML like this:
<page>
  <title>AT&amp;T Bell Labs</title>
  <ns>0</ns>
  <id>63739</id>
</page>
It generates two EvText events for the title, because the title contains the XML-encoded form of & (&amp;).
case EvText(text) =>
  println(text)
With the code above, I get the output
AT
T Bell Labs
instead of AT&T Bell Labs.
Recommended answer
Entity references are represented by their own event constructor, EvEntityRef, so the &amp; in the title arrives as a separate event between two EvText events. More generally, if I remember correctly, you shouldn't count on consecutive characters being delivered as a single EvText event anyway.
Here's some ugly imperative code I wrote at some point in the past to handle both kinds of text events:
import scala.xml.pull._

def readText(reader: Iterator[XMLEvent]): String = {
  val builder = new StringBuilder
  var current = reader.next()
  // Keep appending while the current event is text or a recognized entity reference.
  while (
    current match {
      case EvText(text)       => builder.append(text); true
      case EvEntityRef("amp") => builder.append("&"); true
      case EvEntityRef("lt")  => builder.append("<"); true
      case EvEntityRef("gt")  => builder.append(">"); true
      case _                  => false // any other event ends the text run
    }
  ) current = reader.next()
  builder.toString
}
Note that this consumes the first non-text event (I think; this is the kind of code you never want to have to read again) and is generally unpleasant, but it should give you some idea of how to handle this kind of thing.
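One way to avoid consuming the first non-text event is to peek at the next event through a BufferedIterator before deciding whether to take it. The following is only a sketch under some assumptions: it needs a scala-xml version that still ships the scala.xml.pull package, the names TextRuns, predefinedEntities and readTextRun are made up for illustration, and any entity reference outside XML's five predefined ones is treated as the end of the text run.

```scala
import scala.xml.pull.{EvElemEnd, EvEntityRef, EvText, XMLEvent}

object TextRuns {
  // The five entities predefined by the XML specification.
  val predefinedEntities: Map[String, String] = Map(
    "amp" -> "&", "lt" -> "<", "gt" -> ">", "quot" -> "\"", "apos" -> "'"
  )

  // Hypothetical helper: concatenates a run of consecutive text and
  // predefined-entity events, leaving the first other event unconsumed.
  def readTextRun(events: BufferedIterator[XMLEvent]): String = {
    val builder = new StringBuilder
    var inText = true
    while (inText && events.hasNext) {
      events.head match { // head peeks without advancing the iterator
        case EvText(text) =>
          builder.append(text)
          events.next()
        case EvEntityRef(name) if predefinedEntities.contains(name) =>
          builder.append(predefinedEntities(name))
          events.next()
        case _ =>
          inText = false // stop, but do not consume this event
      }
    }
    builder.toString
  }

  def main(args: Array[String]): Unit = {
    // Hand-built events mimicking the <title> content from the question.
    val events = Iterator[XMLEvent](
      EvText("AT"), EvEntityRef("amp"), EvText("T Bell Labs"),
      EvElemEnd(null, "title")
    ).buffered
    println(readTextRun(events)) // prints "AT&T Bell Labs"
    println(events.head)         // the closing-tag event is still available
  }
}
```

With a real reader, calling .buffered on the XMLEventReader should give an iterator that can be passed to this helper, since the reader is itself an Iterator[XMLEvent].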