"Content is not allowed in prolog" when parsing perfectly valid XML on GAE


Problem description



    I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and try asking here before I throw my laptop out the window.

    I'm trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for example, it may look like:

    <?xml version="1.0" encoding="utf-8"?> 
    <ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
        <ListDomainsResult>
            <DomainName>Audio</DomainName>
            <DomainName>Course</DomainName>
            <DomainName>DocumentContents</DomainName>
            <DomainName>LectureSet</DomainName>
            <DomainName>MetaData</DomainName>
            <DomainName>Professors</DomainName>
            <DomainName>Tag</DomainName>
        </ListDomainsResult>
        <ResponseMetadata>
            <RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
            <BoxUsage>0.0000071759</BoxUsage>
        </ResponseMetadata>
    </ListDomainsResponse>
    

    I pass in this XML to a parser with

    XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());
    

    and call eventReader.nextEvent(); a bunch of times to get the data I want.

    Here's the bizarre part -- it works great inside the local server. The response comes in, I parse it, everyone's happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works, and the response XML seems 100% identical and correct to me, but the response fails to parse with the following exception:

    com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1]
    Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?> 
    <ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
    javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
    Message: Content is not allowed in prolog.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
        at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
        at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
        ... (rest of lines omitted)
    

    I have double, triple, quadruple checked this XML for 'invisible characters' or non-UTF8 encoded characters, etc. I looked at it byte-by-byte in an array for byte-order-marks or something of that nature. Nothing; it passes every validation test I could throw at it. Even stranger, it happens if I use a Saxon-based parser as well -- but ONLY on GAE, it always works fine in my local environment.

    It makes it very hard to trace the code for problems when I can only run the debugger on an environment that works perfectly (I haven't found any good way to remotely debug on GAE). Nevertheless, using the primitive means I have, I've tried a million approaches including:

    • XML with and without the prolog
    • With and without newlines
    • With and without the "encoding=" attribute in the prolog
    • Both newline styles
    • With and without the chunking information present in the HTTP stream

    And I've tried most of these in multiple combinations where it made sense they would interact -- nothing! I'm at my wit's end. Has anyone seen an issue like this before that can hopefully shed some light on it?

    Thanks!

    Solution

    One possible cause is that the encodings declared in your XML and in your XSD (or DTD) are different.
    XML file header: <?xml version='1.0' encoding='utf-8'?>
    XSD file header: <?xml version='1.0' encoding='utf-16'?>

    Another possible cause is that something comes before the XML declaration (the <?xml ...?> line), which must be the very first thing in the document. For example, you might have something like this in the buffer:

    helloworld<?xml version="1.0" encoding="utf-8"?>  
    

    or even a space or special character.
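To see this failure mode in isolation, here is a minimal sketch that feeds a few strings to the JDK's StAX parser and reports whether each one parses. The class name `PrologDemo` and the helper `parses` are made up for illustration, and the exact error message depends on the StAX implementation in use.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

class PrologDemo {
    /** Returns true if the StAX parser reads the whole document without an error. */
    static boolean parses(String xml) {
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            // Pull every event; a malformed prolog fails on creation or on the first event.
            while (reader.hasNext()) {
                reader.nextEvent();
            }
            return true;
        } catch (XMLStreamException e) {
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        String clean = "<?xml version=\"1.0\" encoding=\"utf-8\"?><a/>";
        System.out.println("clean: " + parses(clean));
        // Any text before the XML declaration is rejected.
        System.out.println("leading text: " + parses("helloworld" + clean));
        // Even a single leading space is rejected, because the declaration must come first.
        System.out.println("leading space: " + parses(" " + clean));
    }
}
```

Logging the result of a check like this on the deployed environment can help tell whether the bytes reaching the parser really start with `<`.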

    Special characters called byte order marks (BOMs) could also be at the start of the buffer. Before passing the buffer to the parser, strip any leading junk like this:

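    String xml = "<?xml ...";
    // trim() drops leading whitespace; the regex drops other leading non-word characters (such as U+FEFF) before the first '<'
    xml = xml.trim().replaceFirst("^([\\W]+)<", "<");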
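The question passes an InputStream rather than a String to the parser, so the same idea can be applied at the byte level. The following is a sketch only, assuming the stray bytes are a UTF-8 BOM (EF BB BF). `BomSkipper`, `skipUtf8Bom`, and `firstElementName` are hypothetical names, not part of the AWS SDK or the JDK. Some parsers already skip a UTF-8 BOM on byte streams themselves, in which case the wrapper is harmless but redundant.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class BomSkipper {
    /** Wraps the stream and consumes a leading UTF-8 BOM (EF BB BF) if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read may return fewer than 3 bytes, so loop until we have 3 or hit EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            // Not a BOM: put the bytes back so the parser sees the stream unchanged.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    /** Parses the stream with StAX and returns the local name of the first element. */
    static String firstElementName(InputStream in) {
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(skipUtf8Bom(in));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    return event.asStartElement().getName().getLocalPart();
                }
            }
            return null;
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "<?xml version=\"1.0\" encoding=\"utf-8\"?><ListDomainsResponse/>"
                .getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);
        System.out.println(firstElementName(new ByteArrayInputStream(withBom)));
    }
}
```

In the question's code, this would mean passing `skipUtf8Bom(response.getContent())` to `createXMLEventReader` instead of the raw stream. Working on bytes avoids decoding the whole response into a String first.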
    
