Correcting XmlReader problems using ReadToDescendant and/or ReadElementContentAsObject

Question

I'm working on a mysterious bug in the usually very good open source project Excel Data Reader. It skips values when reading from my particular OpenXML .xlsx spreadsheet.

The problem occurs in the ReadSheetRow method (http://exceldatareader.codeplex.com/sourcecontrol/changeset/view/38911?projectName=ExcelDataReader#600479); demonstration code is below. The strange behaviour occurs when the source XML is exactly as saved by Excel, which contains no whitespace. XML that has been reformatted with whitespace (e.g. in Visual Studio via Edit, Advanced, Format Document) works completely fine!

Test data with whitespace:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <sheetData>
        <row r="5" spans="1:73" s="7" customFormat="1">
            <c r="B5" s="12">
                <v>39844</v>
            </c>
            <c r="C5" s="8"/>
            <c r="D5" s="8"/>
            <c r="E5" s="8"/>
            <c r="F5" s="8"/>
            <c r="G5" s="8"/>
            <c r="H5" s="12">
                <v>39872</v>
            </c>
            <c r="I5" s="8"/>
            <c r="J5" s="8"/>
            <c r="K5" s="8"/>
            <c r="L5" s="8"/>
            <c r="M5" s="8"/>
            <c r="N5" s="12">
                <v>39903</v>
            </c>
        </row>
    </sheetData>
</worksheet>

Test data without whitespace:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"><sheetData><row r="5" spans="1:73" s="7" customFormat="1"><c r="B5" s="12"><v>39844</v></c><c r="C5" s="8"/><c r="D5" s="8"/><c r="E5" s="8"/><c r="F5" s="8"/><c r="G5" s="8"/><c r="H5" s="12"><v>39872</v></c><c r="I5" s="8"/><c r="J5" s="8"/><c r="K5" s="8"/><c r="L5" s="8"/><c r="M5" s="8"/><c r="N5" s="12"><v>39903</v></c></row></sheetData></worksheet>

Sample code demonstrating the problem:

Note that A is output after reader.Read(), B after ReadToDescendant, and C after ReadElementContentAsObject. Whitespace nodes are never printed.

while (reader.Read())
{
    if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*A* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));

    if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
    {
        string a_s = reader.GetAttribute("s");
        string a_t = reader.GetAttribute("t");
        string a_r = reader.GetAttribute("r");

        bool matchingDescendantFound = reader.ReadToDescendant("v");
        if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*B* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
        object o = reader.ReadElementContentAsObject();
        if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*C* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
    }
}

Test results for XML with whitespace:


*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*A* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...

Test results for XML without whitespace:


*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*C* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...

The change in pattern points to an issue in ReadElementContentAsObject, or possibly in where ReadToDescendant leaves the XmlReader.

Does anyone know what might be happening here?

Recommended answer

It's fairly simple. As you can see from the output, the first time you're on the "B" line, you're positioned at the first 'v' element. Then you call ReadElementContentAsObject. That returns the text content of v and "moves the reader past the end element tag" (of v). You are now pointing to a whitespace node if there is whitespace, or to the EndElement node of c if there is not. Of course, your output doesn't print anything if the node is whitespace. Either way, you then do a Read() and move on to the next node. In the no-whitespace case, that Read() skips over the EndElement.
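To see this positioning in isolation, here is a minimal sketch. The class and method names (AfterReadDemo, NodeAfterValue) and the tiny XML strings are made up for illustration; only the XmlReader calls come from the framework. It reports which node the reader is on immediately after ReadElementContentAsObject returns.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AfterReadDemo
{
    // Hypothetical helper: returns the node type the reader is positioned on
    // right after the content of the first <v> element has been read.
    public static XmlNodeType NodeAfterValue(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("v");           // position on <v>
            reader.ReadElementContentAsObject();   // reads the text and moves past </v>
            return reader.NodeType;
        }
    }

    public static void Main()
    {
        // Compact XML: the node after </v> is the </c> end tag.
        Console.WriteLine(NodeAfterValue("<c><v>1</v></c>"));
        // Indented XML: the node after </v> is the whitespace before </c>.
        Console.WriteLine(NodeAfterValue("<c>\n  <v>1</v>\n</c>"));
    }
}
```

With default reader settings this should print EndElement for the compact string and Whitespace for the indented one, which matches the *C* line that appears only in the no-whitespace trace.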

The problem is much worse in other situations. When there is no whitespace and you call ReadElementContentAsObject on a c itself (call it c1), which is what happens for an empty c because ReadToDescendant finds no v and leaves the reader where it is, the reader moves on to the next c (c2). Then you do a Read(), moving to c3, and lose c2 for good.
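The skipped-cell case can be reproduced with a stripped-down version of the question's loop. This is only a sketch: SkippedCellDemo, VisitedCells, and the three-cell test rows are hypothetical, and the input contains only empty cells so that ReadElementContentAsObject is always called on a c element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SkippedCellDemo
{
    // Hypothetical helper: runs the same loop shape as the question and returns
    // the "r" attribute of every <c> element seen at the top of the loop.
    public static List<string> VisitedCells(string xml)
    {
        var visited = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())                        // advance #1
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
                {
                    visited.Add(reader.GetAttribute("r"));
                    reader.ReadToDescendant("v");        // false for an empty <c/>; reader stays put
                    reader.ReadElementContentAsObject(); // advance #2: moves past the empty <c/>
                }
            }
        }
        return visited;
    }

    public static void Main()
    {
        string compact = "<row><c r=\"C5\"/><c r=\"D5\"/><c r=\"E5\"/></row>";
        string indented = "<row>\n<c r=\"C5\"/>\n<c r=\"D5\"/>\n<c r=\"E5\"/>\n</row>";
        Console.WriteLine(string.Join(",", VisitedCells(compact)));  // C5,E5
        Console.WriteLine(string.Join(",", VisitedCells(indented))); // C5,D5,E5
    }
}
```

In the compact row the second advance lands directly on D5 and the loop's Read() then steps over it. In the indented row the second advance lands on a whitespace node instead, so nothing is lost.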

I'm not going to try to fix the real code (http://exceldatareader.codeplex.com/sourcecontrol/changeset/view/38911?projectName=ExcelDataReader#600479). But it's clear what you need to watch out for: moving the stream forward in more than one place. This is a common source of looping errors in general.
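One way to follow that advice is to make sure each loop iteration advances the reader exactly once, either through Read() or through a content-reading call, never both. The following is a sketch under that assumption and is not the Excel Data Reader fix; SingleAdvanceDemo, ReadCells, and the "ref=value" output format are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SingleAdvanceDemo
{
    // Hypothetical helper: returns "cellRef=value" for every <c> that has a <v> child.
    public static List<string> ReadCells(string xml)
    {
        var cells = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            string currentRef = null;
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "v")
                {
                    // This call already leaves the reader on the node after </v>,
                    // so this branch must not call Read() as well.
                    cells.Add(currentRef + "=" + reader.ReadElementContentAsString());
                }
                else
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
                    {
                        currentRef = reader.GetAttribute("r"); // remember which cell we are in
                    }
                    reader.Read(); // the only other place the reader advances
                }
            }
        }
        return cells;
    }

    public static void Main()
    {
        string compact = "<row><c r=\"B5\"><v>39844</v></c><c r=\"C5\"/><c r=\"D5\"/><c r=\"H5\"><v>39872</v></c></row>";
        Console.WriteLine(string.Join(",", ReadCells(compact))); // B5=39844,H5=39872
    }
}
```

Because the reader position is examined again at the top of every iteration, the result does not depend on whether whitespace nodes sit between the elements.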
