Using Ruby, how can I confirm that an XML snippet is valid?


Question

As some of you may know, I'm working on XMPP (Jabber) integration for the StackOverflow chat system, as an XMPP component written in Ruby using the xmpp4r package.

I'm struggling with one issue (well, many issues, but one issue at the moment :-) I am taking the JSON feed from the chat and extracting the HTML for the messages. I am using the Ruby TidyHTML bindings to convert the HTML from the JSON feed to XHTML, so that I can send it as an XMPP message -- since XMPP messages are just XML, converting the HTML to XHTML should make it valid XML, which I can just stick into the <message> stanza.

For most messages, it works great!

However for other messages, it completely chokes -- the XMPP server closes the stream and the script grinds to a halt. (And rchern and others in The Tavern get upset. Well, maybe not upset, but they laugh at me. This makes me sad!)

I am almost certain that what's happening is that, for some reason or other, the messages are not valid XML, and so the XMPP server is closing the connection because it encounters a parse error in the XML stream from the Ruby component. Here's an example of one such message:

<message to='jeswah@smart-safe-secure.com/Token' type='groupchat' xmlns='jabber:client'><body>&lt;div class=&quot;onebox ob-message&quot;&gt;&lt;a class=&quot;roomname&quot; href=&quot;/transcript/message/263372#263372&quot;&gt;&lt;span title=&quot;2010-11-04 19:20:23Z&quot;&gt;1 hour ago&lt;/span&gt;&lt;/a&gt;, by &lt;span class=&quot;user-name&quot;&gt;Fosco&lt;/span&gt; &lt;br/&gt;&lt;div class=&quot;quote&quot;&gt;&lt;div class=&quot;room-mini&quot;&gt;&lt;div class=&quot;room-mini-header&quot;&gt;&lt;h3&gt;&lt;img class=&quot;small-site-logo&quot; title=&quot;Gaming&quot; alt=&quot;Gaming&quot; width=&quot;16&quot; height=&quot;16&quot; src=&quot;http://sstatic.net/gaming/img/favicon.ico&quot; /&gt;&amp;nbsp;&lt;span class=&quot;room-name&quot;&gt;&lt;a href=&quot;http://chat.stackexchange.com/rooms/28/minecraft-talk&quot;&gt;Minecraft Talk&lt;/a&gt;&lt;/span&gt;&lt;/h3&gt;&lt;div class=&quot;room-mini-description&quot;&gt;Everything Minecraft, including classic and survival mode&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;room-current-user-count&quot; title=&quot;current users&quot;&gt;9&lt;/div&gt;&lt;div class=&quot;mspark&quot; style=&quot;height:25px;width:205px&quot;&gt;
&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:13px;left:0px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:9px;left:8px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:2px;left:16px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:8px;left:24px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:1px;left:32px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:1px;left:56px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:0px;left:64px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:0px;left:88px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:0px;left:96px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:11px;left:104px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:7px;left:112px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:7px;left:120px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:25px;left:128px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:14px;left:136px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:4px;left:144px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:7px;left:152px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:19px;left:160px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:19px;left:168px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:12px;left:176px;&quot;&gt;&lt;/div&gt;&lt;div class=&quot;mspbar&quot; style=&quot;width:8px;height:11px;left:184px;&quot;&gt;&lt;/div&gt;&lt;div 
class=&quot;mspbar now&quot; style=&quot;height:25px;left:154px;&quot;&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;clear-both&quot;&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</body><html xmlns='http://jabber.org/protocol/xhtml-im'><body xmlns='http://www.w3.org/1999/xhtml'><div class="onebox ob-message"><a class="roomname" href="/transcript/message/263372#263372"><span title="2010-11-04 19:20:23Z">1 hour ago</span></a>, by <span class="user-name">Fosco</span><br />
<div class="quote">
<div class="room-mini"><div class="room-mini-header">
<h3><img class="small-site-logo" title="Gaming" alt="Gaming" width="16" height="16" src="http://sstatic.net/gaming/img/favicon.ico" />&nbsp;<span class="room-name"><a href="http://chat.stackexchange.com/rooms/28/minecraft-talk">Minecraft Talk</a></span></h3>
<div class="room-mini-description">Everything Minecraft, including classic and survival mode</div>
</div>
<div class="room-current-user-count" title="current users">9</div>
<div class="mspark" style="height:25px;width:205px">
<div class="mspbar" style="width:8px;height:13px;left:0px;"></div>
<div class="mspbar" style="width:8px;height:9px;left:8px;"></div>
<div class="mspbar" style="width:8px;height:2px;left:16px;"></div>
<div class="mspbar" style="width:8px;height:8px;left:24px;"></div>
<div class="mspbar" style="width:8px;height:1px;left:32px;"></div>
<div class="mspbar" style="width:8px;height:1px;left:56px;"></div>
<div class="mspbar" style="width:8px;height:0px;left:64px;"></div>
<div class="mspbar" style="width:8px;height:0px;left:88px;"></div>
<div class="mspbar" style="width:8px;height:0px;left:96px;"></div>
<div class="mspbar" style="width:8px;height:11px;left:104px;"></div><div class="mspbar" style="width:8px;height:7px;left:112px;"></div><div class="mspbar" style="width:8px;height:7px;left:120px;"></div><div class="mspbar" style="width:8px;height:25px;left:128px;"></div><div class="mspbar" style="width:8px;height:14px;left:136px;"></div>
<div class="mspbar" style="width:8px;height:4px;left:144px;"></div>
<div class="mspbar" style="width:8px;height:7px;left:152px;"></div>
<div class="mspbar" style="width:8px;height:19px;left:160px;"></div>
<div class="mspbar" style="width:8px;height:19px;left:168px;"></div><div class="mspbar" style="width:8px;height:12px;left:176px;"></div>
<div class="mspbar" style="width:8px;height:11px;left:184px;"></div>
<div class="mspbar now" style="height:25px;left:154px;"></div>
</div>
<div class="clear-both"></div>
</div>
</div>
</div>
</body></html></message>

(This message happened to be a quote of a oneboxed link to a chat room)

Here was the error Ruby gave me:

IOError: stream closed
/usr/lib/ruby/1.8/xmpp4r/stream.rb:594:in `empty?'
/usr/lib/ruby/1.8/rexml/parsers/baseparser.rb:153:in `empty?'
/usr/lib/ruby/1.8/rexml/parsers/baseparser.rb:193:in `pull'
/usr/lib/ruby/1.8/rexml/parsers/sax2parser.rb:92:in `parse'
/usr/lib/ruby/1.8/xmpp4r/streamparser.rb:79:in `parse'
/usr/lib/ruby/1.8/xmpp4r/stream.rb:75:in `start'
/usr/lib/ruby/1.8/xmpp4r/stream.rb:72:in `initialize'
/usr/lib/ruby/1.8/xmpp4r/stream.rb:72:in `new'
/usr/lib/ruby/1.8/xmpp4r/stream.rb:72:in `start'
/usr/lib/ruby/1.8/xmpp4r/connection.rb:119:in `start'
/usr/lib/ruby/1.8/xmpp4r/component.rb:70:in `start'
/usr/lib/ruby/1.8/xmpp4r/connection.rb:77:in `connect'
/usr/lib/ruby/1.8/xmpp4r/component.rb:47:in `connect'
./classes/SOXMPP_Bridge.rb:20:in `initialize'
./soxmpp.rb:81:in `new'
./soxmpp.rb:81

Finally, my question!

Given that sending invalid XML to the XMPP server kicks me off, is there any way using Ruby I can validate (and, preferably, correct) the XML before sending it to the XMPP server? Most likely, correcting it will be a matter of my writing additional code for each case where Tidy isn't producing valid XML, but I'd at least like to stop the script from crashing. So, how can I validate the XML before sending it to the XMPP server?

Solution

The actual error in this case is your &nbsp;. According to XEP-0071, section 8, point 5:

Section 11.1 of XMPP Core stipulates that character entities other than the five general entities defined in Section 4.6 of the XML specification (i.e., &lt;, &gt;, &amp;, &apos;, and &quot;) MUST NOT be sent over an XML stream. Therefore implementations of XHTML-IM MUST NOT include predefined XHTML 1.0 entities such as &nbsp; -- instead, implementations MUST use the equivalent character references as specified in Section 4.1 of the XML specification (even in non-obvious places such as URIs that are included in the 'href' attribute).
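
One way to act on that rule before a stanza is sent is to scan the XHTML for named entities and rewrite the ones XML does not predefine as numeric character references. The following is only a minimal sketch: the helper names (`disallowed_entities`, `numeric_entities`) and the `XHTML_ENTITIES` table are hypothetical, and the table is deliberately incomplete, so it would need extending to cover whatever entities Tidy actually emits.

```ruby
# The five entities XML itself predefines; these may stay as they are.
XML_PREDEFINED = %w[lt gt amp apos quot].freeze

# Hypothetical, deliberately incomplete lookup table (name => code point).
XHTML_ENTITIES = {
  'nbsp' => 160, 'copy' => 169, 'laquo' => 171, 'raquo' => 187,
  'ndash' => 8211, 'mdash' => 8212, 'hellip' => 8230
}.freeze

# Matches a named entity reference such as &nbsp; (not &#160; or &#xA0;).
NAMED_ENTITY = /&([A-Za-z][A-Za-z0-9]*);/

# Returns the names of entities that must not be sent over the stream.
def disallowed_entities(xml)
  xml.scan(NAMED_ENTITY).flatten.uniq - XML_PREDEFINED
end

# Rewrites known XHTML entities as numeric character references.
# Unknown names are left untouched so the caller can still detect them.
def numeric_entities(xml)
  xml.gsub(NAMED_ENTITY) do |match|
    name = Regexp.last_match(1)
    next match if XML_PREDEFINED.include?(name)

    code = XHTML_ENTITIES[name]
    code ? format('&#%d;', code) : match
  end
end
```

A cautious caller could run `numeric_entities` on the XHTML, then refuse to send the message if `disallowed_entities` still returns anything, rather than risk having the stream closed.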

So this issue is about more than just generating well-formed XML, which is a pre-requisite. You'll also want to ensure that you're only using XHTML from the approved set in section 6.
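
For the well-formedness prerequisite itself, a parse attempt before sending is probably the simplest guard. The sketch below assumes REXML (which xmpp4r already relies on, as the stack trace shows) and uses a hypothetical helper name, `xml_parse_error`. It wraps the fragment in a throwaway root element so that fragments with several top-level nodes still parse. Whether a given parser also rejects an undeclared entity such as &nbsp; varies, so this is best treated as a complement to an entity check, not a replacement for one.

```ruby
require 'rexml/document'

# Returns nil when the fragment parses as XML, or the parser's error
# message when it does not. The <wrapper> element is only scaffolding
# for the check and is never sent anywhere.
def xml_parse_error(fragment)
  REXML::Document.new("<wrapper>#{fragment}</wrapper>")
  nil
rescue REXML::ParseException => e
  e.message
end
```

With something like this in place, the component could log and skip (or fall back to a plain-text body for) any message where `xml_parse_error` returns a message, instead of letting the server drop the connection.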

In short, you need to read XEP-0071.
