Xerces behaving differently on SUN JRE v1.5 and IBM J9 v1.5


Problem description


I am trying to parse some HTML using NekoHTML.

The problem is that the code snippet below works fine on SUN JDK 1.5.0_01 (this is when I am using Eclipse with the Sun JRE), but it does not work when the same code is executed on the IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)); this is when I am using IBM RAD for development.

NodeList tags = doc.getElementsByTagName("td");

for (int i = 0; i < tags.getLength(); i++)
{
    Element elem = (Element) tags.item(i);
    // do something with elem
}

By "working fine" I mean that I get a list of "td" elements that I can process further. On J9, the body of the for loop is never entered.

I am using the latest version of NekoHTML (along with the bundled Xerces jars). The doc in the above code is of type org.w3c.dom.Document (the runtime class used is org.apache.html.dom.HTMLDocumentImpl).

The IBM J9 details are as follows:

java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pwi32devifx-20070323 (ifix 117674: SR4 + 116644 + 114941 + 116110 + 114881))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)
J9VM - 20070322_12058_lHdSMR
JIT  - 20070109_1805ifx3_r8
GC   - WASIFIX_2007)
JCL  - 20070131

Any ideas, suggestions, or workarounds are appreciated. Thanks.

Solution

I have 2 ideas.

  1. I have just verified that Xerces is part of the JRE installation, so I believe it reaches your application's classpath from there. SUN and IBM probably ship different versions of Xerces. So, as a first step, check this, and try replacing the Xerces you get under IBM with SUN's version. If that helps, you have two options: keep running IBM Java with the Xerces from SUN, or continue investigating what is wrong with the Xerces from IBM.
  2. Are there other differences between your dev and production environments? Are they the same operating system? Is there a chance that you are using (for example) Windows for development and Unix for production, but your XML was written on Windows with \r\n as the line separator? Going further: if your XML contains Unicode characters and was written on Windows, it can contain a special (invisible) prefix, the byte order mark, that indicates the file is Unicode. This prefix may cause the parser to fail.
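Both ideas can be checked with a few lines of diagnostic code. The following is only a sketch, not a definitive fix: the class name ParserDiagnostics and the helper methods originOf and hasUtf8Bom are hypothetical names made up for this illustration, and the only names taken from the question are org.apache.html.dom.HTMLDocumentImpl and the standard JAXP factory. It prints where the parser classes are loaded from and tests whether a byte array starts with the UTF-8 byte order mark.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical diagnostic helper; class and method names are illustrative only.
public class ParserDiagnostics {

    // Reports where a class was loaded from. Classes on the JRE's boot class path
    // usually have no CodeSource, so that case is reported as "JRE boot class path".
    static String originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return cls.getName() + " <- JRE boot class path";
        }
        return cls.getName() + " <- " + source.getLocation();
    }

    // True if the data starts with the UTF-8 byte order mark (bytes EF BB BF).
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // Which JAXP implementation does this JVM pick by default?
        System.out.println(originOf(DocumentBuilderFactory.newInstance().getClass()));

        // Which jar (if any) supplies the DOM class named in the question?
        try {
            System.out.println(originOf(Class.forName("org.apache.html.dom.HTMLDocumentImpl")));
        } catch (ClassNotFoundException e) {
            System.out.println("org.apache.html.dom.HTMLDocumentImpl is not on the class path");
        }

        // Made-up sample input: a BOM followed by the start of a tag.
        byte[] sample = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'p', '>' };
        System.out.println("sample starts with BOM: " + hasUtf8Bom(sample));
    }
}
```

Running this once under the Sun JRE and once under IBM J9 and comparing the printed locations should show whether the two environments resolve the classes from different places. In the real application, calling originOf(doc.getClass()) on the parsed document would be the more direct check.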
