Read a 'fake' XML document (XML fragment) in an XmlTextReader object


Problem description

[Case] I have received a bunch of 'xml files' containing metadata about a large number of documents. At least, that was what I requested. What I actually received were 'xml files' without a root element, structured something like this (I left out a bunch of elements):

<folder name = "abc"></folder>
<folder name = "abc/def">
<document name = "ghi1">
</document>
<document name = "ghi2">
</document>
</folder>

[Problem] When I try to read the file in an XmlTextReader object, it fails, telling me that there is no root element.

[Current workaround] Of course I can read the file as a stream, wrap its content in <xmlroot> and </xmlroot>, write the stream to a new file, and read that one in XmlTextReader. That is exactly what I am doing now, but I would prefer not to 'tamper' with the original data.

[Requested solution] I understand that I should use XmlTextReader for this, with the DocumentFragment option. However, this fails at run time with the following exception:

An unhandled exception of type 'System.Xml.XmlException' occurred in System.Xml.dll

Additional information: XmlNodeType DocumentFragment is not supported for partial content parsing. Line 1, position 1.

[Failing code]

using System.Diagnostics;
using System.Xml;

namespace XmlExample
{
    class Program
    {
        static void Main(string[] args)
        {
            string file = @"C:\test.txt";
            XmlTextReader tr = new XmlTextReader(file, XmlNodeType.DocumentFragment, null);
            while(tr.Read())
                Debug.WriteLine("NodeType: {0} NodeName: {1}", tr.NodeType, tr.Name);
        }
    }
}

Recommended answer

Even though the XmlReader can be made to read the data using the ConformanceLevel.Fragment option, as demonstrated by Martijn, it seems that XmlDataDocument does not like the idea of having multiple root elements.
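For reference, the ConformanceLevel.Fragment option mentioned above might look roughly like the following. This is a minimal sketch, not the code from Martijn's answer: the class name FragmentReaderSketch and the method ReadElementNames are made up for illustration, and the reader is fed from any TextReader rather than from a specific file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only.
public static class FragmentReaderSketch
{
    // Returns the names of all elements found in an XML fragment,
    // i.e. input that may have several top-level elements and no single root.
    public static List<string> ReadElementNames(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            // Accept multiple top-level elements instead of one document root.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

Passing a StreamReader over the original file (or a StringReader over the sample from the question) should walk all folder and document elements without touching the data. This only helps when a forward-only reader is enough; it does not by itself give you a loaded document.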

I thought I'd try a different approach, much like the one you're currently using, but without the intermediate file. Most XML libraries (XmlDocument, XDocument, XmlDataDocument) can take a TextReader as input, so I've implemented one of my own. It's used like so:

var dataDocument = new XmlDataDocument();
dataDocument.Load(new FakeRootStreamReader(File.OpenRead("test.xml")));

The code for the actual class:

using System;
using System.IO;

public class FakeRootStreamReader : TextReader
{
    private static readonly char[] _rootStart;
    private static readonly char[] _rootEnd;

    private readonly TextReader _innerReader;
    private int _charsRead;
    private bool _eof;

    static FakeRootStreamReader()
    {
        _rootStart = "<root>".ToCharArray();
        _rootEnd = "</root>".ToCharArray();
    }

    public FakeRootStreamReader(Stream stream)
    {
        _innerReader = new StreamReader(stream);
    }

    public FakeRootStreamReader(TextReader innerReader)
    {
        _innerReader = innerReader;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        if (!_eof && _charsRead < _rootStart.Length)
        {
            // Prepend root element
            return ReadFake(_rootStart, buffer, index, count);
        }

        if (!_eof)
        {
            // Normal reading operation
            int charsRead = _innerReader.Read(buffer, index, count);
            if (charsRead > 0) return charsRead;

            // We've reached the end of the Stream
            _eof = true;
            _charsRead = 0;
        }

        // Append root element end tag at the end of the Stream
        return ReadFake(_rootEnd, buffer, index, count);
    }

    private int ReadFake(char[] source, char[] buffer, int offset, int count)
    {
        int length = Math.Min(source.Length - _charsRead, count);
        Array.Copy(source, _charsRead, buffer, offset, length);
        _charsRead += length;
        return length;
    }
}

The first call to Read(...) returns only the <root> start tag. Subsequent calls read the stream as normal until the end of the stream is reached, and then the </root> end tag is output.

The code is a bit... meh... mostly because I wanted to handle some never-gonna-happen cases where someone tries to read the stream fewer than 6 characters at a time.
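To make the small-buffer case concrete, here is the same copy logic as ReadFake pulled out into a standalone sketch. The names PartialCopySketch and CopyChunk are hypothetical, and the position is passed by reference only so the snippet can run without the rest of the class.

```csharp
using System;

// Hypothetical standalone version of the ReadFake logic, for illustration only.
public static class PartialCopySketch
{
    // Copies as much of source as fits into buffer, starting at position,
    // and returns the number of characters copied (0 once source is exhausted).
    public static int CopyChunk(char[] source, ref int position, char[] buffer, int offset, int count)
    {
        // Never copy more than what is left in source or what the caller asked for.
        int length = Math.Min(source.Length - position, count);
        Array.Copy(source, position, buffer, offset, length);
        position += length;
        return length;
    }
}
```

With "<root>" as the source and a 4-character buffer, successive calls copy 4 characters, then 2, then 0. That is why the class tracks _charsRead instead of assuming the whole tag fits into the first read.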
