Is it possible to alter the node value with XmlReader?

Problem description

I'm reading in an XML stream that's approximately 100 MB, and I'd like to replace values that are over 1 MB.

Sample input

<root>
    <visit>yes</visit>
    <filedata>SDFSFDSDFfgdfgsgdf==(this is 5 mb)</filedata>
    <type>pdf</type>
    <moredata>sssssssssssssss (this 2mb)</moredata>
</root>

Expected output

<root>
    <visit>yes</visit>
    <filedata>REPLACED TEXT</filedata>
    <type>pdf</type>
    <moredata>REPLACED TEXT</moredata>
</root>

Here's what I am using to read the stream and check the size:

XmlReader rdr = XmlReader.Create(new System.IO.StringReader(xml));
while (rdr.Read()) {
    if (rdr.Value.Length > ONEMEGABYTE) {
        // replace value with "REPLACED TEXT"
    }
}

How do I replace the value in rdr.Value?

Recommended answer

You can subclass XmlReader to "filter" out undesired nodes, then call XmlDocument.Load() with your reader instead of letting it create its own.

Note that this will exclude only the value of the offending tags. If you put a breakpoint in your Read() loop, you'll find that <foo>bar</foo> comes in three pieces: <foo> has NodeType Element with no value, "bar" has NodeType Text with an empty LocalName, and </foo> has NodeType EndElement with no value. If "bar" were over the length limit, the "filter" below would turn <foo>bar</foo> into <foo></foo>. To exclude all of <foo>bar</foo> based on the length of "bar", you'd have to look ahead. Doable, but maybe not worth your time. Hopefully that's not a requirement here.
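To see the three pieces without a debugger, a small sketch like the following can list what the reader reports for each node. The `NodeDump` class and its `Describe` method are hypothetical names made up for this illustration; only `XmlReader.Create`, `Read`, `NodeType`, `LocalName` and `Value` come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class NodeDump
{
    // Returns one "NodeType|LocalName|Value" line per node the reader reports.
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                lines.Add(reader.NodeType + "|" + reader.LocalName + "|" + reader.Value);
            }
        }
        return lines;
    }
}
```

Calling `NodeDump.Describe("<foo>bar</foo>")` should give three entries: `Element|foo|`, `Text||bar` and `EndElement|foo|`. Only the middle one carries a value, which is why a filter based on `Value.Length` can only ever reject the text node.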

An alternative (or addition) to this class might be a version with a Func<string, string> that every Value is passed through: s => (s.Length > MAX_LEN) ? "" : s.
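A minimal sketch of that alternative might look like the following. `MappingXmlReader` and `MappingDemo` are hypothetical names, and the mapping substitutes "REPLACED TEXT" (as in the question's expected output) rather than an empty string. It assumes the consumer reads node text through `Value`, which is the property overridden here. Because attribute values are also exposed through `Value`, they pass through the same function. The full string is still materialized before its length is checked, so this changes the output, not the memory profile.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical wrapper: forwards everything to an inner reader, except that
// every Value is passed through a caller-supplied mapping function.
public class MappingXmlReader : XmlReader
{
    private readonly XmlReader _reader;
    private readonly Func<string, string> _map;

    public MappingXmlReader(TextReader input, Func<string, string> map)
    {
        _reader = XmlReader.Create(input);
        _map = map ?? (s => s); // no mapping supplied: behave like a plain wrapper
    }

    // The only member that changes behavior.
    public override string Value => _map(_reader.Value);

    // Everything below simply forwards to the inner reader.
    public override bool Read() => _reader.Read();
    public override XmlNodeType NodeType => _reader.NodeType;
    public override string LocalName => _reader.LocalName;
    public override string NamespaceURI => _reader.NamespaceURI;
    public override string Prefix => _reader.Prefix;
    public override int Depth => _reader.Depth;
    public override string BaseURI => _reader.BaseURI;
    public override bool IsEmptyElement => _reader.IsEmptyElement;
    public override int AttributeCount => _reader.AttributeCount;
    public override bool EOF => _reader.EOF;
    public override ReadState ReadState => _reader.ReadState;
    public override XmlNameTable NameTable => _reader.NameTable;
    public override string GetAttribute(string name) => _reader.GetAttribute(name);
    public override string GetAttribute(string name, string ns) => _reader.GetAttribute(name, ns);
    public override string GetAttribute(int i) => _reader.GetAttribute(i);
    public override string LookupNamespace(string prefix) => _reader.LookupNamespace(prefix);
    public override bool MoveToAttribute(string name) => _reader.MoveToAttribute(name);
    public override bool MoveToAttribute(string name, string ns) => _reader.MoveToAttribute(name, ns);
    public override bool MoveToElement() => _reader.MoveToElement();
    public override bool MoveToFirstAttribute() => _reader.MoveToFirstAttribute();
    public override bool MoveToNextAttribute() => _reader.MoveToNextAttribute();
    public override bool ReadAttributeValue() => _reader.ReadAttributeValue();
    public override void ResolveEntity() => _reader.ResolveEntity();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _reader.Dispose();
        base.Dispose(disposing);
    }
}

// Hypothetical usage helper: loads a document, replacing over-long values.
public static class MappingDemo
{
    public static XmlDocument Load(string xml, int maxLen)
    {
        var doc = new XmlDocument();
        using (var reader = new MappingXmlReader(
            new StringReader(xml),
            s => s.Length > maxLen ? "REPLACED TEXT" : s))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

Unlike the filtering approach, this keeps the text node in place and only swaps its content, which is closer to the expected output in the question.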

Also, for all I know, XmlTextReaderImpl (the actual type of _reader) may cache the whole text and kill your performance anyway. You may have to write your own guts for the thing as well.
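If building a multi-megabyte string through `Value` is the concern, one option worth testing is `XmlReader.ReadValueChunk`, which copies a node's text into a caller-supplied buffer piece by piece. The sketch below only measures the length of the first text node; `ChunkedLength` and `FirstTextLength` are hypothetical names. Whether this actually avoids the internal buffering the paragraph above worries about is an assumption that has not been verified here, so measure before relying on it.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class ChunkedLength
{
    // Returns the character count of the first text node, read in chunks
    // where the reader supports it, or 0 if the input has no text node.
    public static long FirstTextLength(string xml, int bufferSize = 4096)
    {
        var buffer = new char[bufferSize];
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Text) continue;

                // Not every XmlReader implementation supports chunked reads.
                if (!reader.CanReadValueChunk) return reader.Value.Length;

                long total = 0;
                int read;
                // ReadValueChunk returns 0 once the node's text is exhausted.
                while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) > 0)
                {
                    total += read;
                }
                return total;
            }
        }
        return 0;
    }
}
```

A length obtained this way could drive the decision to replace a value without ever holding the whole value as a single string.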

using System;
using System.IO;
using System.Xml;

public class FilteredXmlReader : XmlReader
{
    public Func<XmlReader, bool> Filter;

    private XmlReader _reader;
    private FilteredXmlReader(TextReader input, Func<XmlReader, bool> filterProc)
    {
        Filter = filterProc;
        _reader = XmlReader.Create(input);
    }

    public static new XmlReader Create(TextReader input, Func<XmlReader, bool> filterProc)
    {
        return new FilteredXmlReader(input, filterProc);
    }

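    public override bool Read()
    {
        var b = _reader.Read();

        // Skip nodes the filter rejects. Stop at end of input so a filter that
        // rejects the final state cannot loop forever; a null filter keeps every node.
        while (b && Filter != null && !Filter(_reader))
        {
            b = _reader.Read();
        }

        return b;
    }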

    #region Wrapper Boilerplate

    public override XmlNodeType NodeType => _reader.NodeType;

    public override string LocalName => _reader.LocalName;

    public override string NamespaceURI => _reader.NamespaceURI;

    public override string Prefix => _reader.Prefix;

    public override string Value => _reader.Value;

    public override int Depth => _reader.Depth;

    public override string BaseURI => _reader.BaseURI;

    public override bool IsEmptyElement => _reader.IsEmptyElement;

    public override int AttributeCount => _reader.AttributeCount;

    public override bool EOF => _reader.EOF;

    public override ReadState ReadState => _reader.ReadState;

    public override XmlNameTable NameTable => _reader.NameTable;

    public override string GetAttribute(string name) => _reader.GetAttribute(name);

    public override string GetAttribute(string name, string namespaceURI) => _reader.GetAttribute(name, namespaceURI);

    public override string GetAttribute(int i) => _reader.GetAttribute(i);

    public override string LookupNamespace(string prefix) => _reader.LookupNamespace(prefix);

    public override bool MoveToAttribute(string name) => _reader.MoveToAttribute(name);

    public override bool MoveToAttribute(string name, string ns) => _reader.MoveToAttribute(name, ns);

    public override bool MoveToElement() => _reader.MoveToElement();

    public override bool MoveToFirstAttribute() => _reader.MoveToFirstAttribute();

    public override bool MoveToNextAttribute() => _reader.MoveToNextAttribute();

    public override bool ReadAttributeValue() => _reader.ReadAttributeValue();

    public override void ResolveEntity() => _reader.ResolveEntity();

    #endregion Wrapper Boilerplate
}

Usage:

var xml = "<test />";
XmlDocument doc = new XmlDocument();

XmlReader rdr = FilteredXmlReader.Create(new System.IO.StringReader(xml),
                    r => r.Value.Length < 20);

// Load through the filtering reader; without this call the document stays empty.
doc.Load(rdr);

var filteredXML = doc.OuterXml;
