Is it possible to alter the node value with XmlReader?
Question
I'm reading in an XML stream that's approximately 100 MB, and I'd like to replace any values that are over 1 MB.
Sample input
<root>
<visit>yes</visit>
<filedata>SDFSFDSDFfgdfgsgdf==(this is 5 mb)</filedata>
<type>pdf</type>
<moredata>sssssssssssssss (this 2mb)</moredata>
</root>
Expected output
<root>
<visit>yes</visit>
<filedata>REPLACED TEXT</filedata>
<type>pdf</type>
<moredata>REPLACED TEXT</moredata>
</root>
Here's what I'm using to read the stream and check the size of each value:
XmlReader rdr = XmlReader.Create(new System.IO.StringReader(xml));
while (rdr.Read())
{
    // Value.Length counts characters, not bytes
    if (rdr.Value.Length > ONEMEGABYTE)
    {
        // replace value with "REPLACED TEXT"
    }
}
How do I replace the value in rdr.Value?
Recommended answer
You can subclass XmlReader to "filter" out undesired nodes, then pass your reader to XmlDocument.Load() instead of letting it create its own.
Note that this will exclude only the value of the offending tags. If you put a breakpoint in your Read() loop, you'll find that <foo>bar</foo> comes in three pieces: <foo> has NodeType Element with no value, "bar" has NodeType Text with an empty LocalName, and </foo> has NodeType EndElement with no value. If "bar" were over the length limit, the "filter" below would turn <foo>bar</foo> into <foo></foo>.
To exclude all of <foo>bar</foo> based on the length of "bar", you'd have to look ahead. That's doable, but maybe not worth your time. Hopefully that's not a requirement here.
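To see the three pieces without a debugger, a minimal sketch like the following records what the reader reports after each Read() call. The NodeDump class and its Dump method are names made up for this illustration; only XmlReader and its NodeType, LocalName and Value properties come from the framework.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class NodeDump
{
    // Returns one line per node, formatted as "NodeType|LocalName|Value".
    public static string Dump(string xml)
    {
        var sb = new StringBuilder();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                sb.AppendLine($"{reader.NodeType}|{reader.LocalName}|{reader.Value}");
            }
        }
        return sb.ToString();
    }
}

// NodeDump.Dump("<foo>bar</foo>") produces three lines:
//   Element|foo|
//   Text||bar
//   EndElement|foo|
```

Only the middle line carries a Value, which is why a filter that looks at Value.Length can drop the text node but leaves the surrounding start and end tags in place.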
An alternative (or addition) to this class might be a version that takes a Func<string, string> through which every Value is passed: s => (s.Length > MAX_LEN) ? "" : s.
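A rough sketch of that Func<string, string> variant might look like this. ValueMappingXmlReader, ValueMappingDemo and ReplaceLongValues are hypothetical names invented for the example, and the replacement string "REPLACED TEXT" is taken from the question's expected output. Treat it as an illustration under those assumptions rather than a tested solution for a 100 MB stream.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical wrapper: forwards everything to an inner reader, except that
// Value is passed through a caller-supplied transform first.
public class ValueMappingXmlReader : XmlReader
{
    private readonly XmlReader _reader;
    private readonly Func<string, string> _map;

    public ValueMappingXmlReader(TextReader input, Func<string, string> map)
    {
        _reader = XmlReader.Create(input);
        _map = map ?? (s => s); // no transform supplied: leave values alone
    }

    // The only member that differs from a plain pass-through wrapper.
    public override string Value => _map(_reader.Value);

    public override bool Read() => _reader.Read();
    public override XmlNodeType NodeType => _reader.NodeType;
    public override string LocalName => _reader.LocalName;
    public override string NamespaceURI => _reader.NamespaceURI;
    public override string Prefix => _reader.Prefix;
    public override int Depth => _reader.Depth;
    public override string BaseURI => _reader.BaseURI;
    public override bool IsEmptyElement => _reader.IsEmptyElement;
    public override int AttributeCount => _reader.AttributeCount;
    public override bool EOF => _reader.EOF;
    public override ReadState ReadState => _reader.ReadState;
    public override XmlNameTable NameTable => _reader.NameTable;
    public override string GetAttribute(string name) => _reader.GetAttribute(name);
    public override string GetAttribute(string name, string ns) => _reader.GetAttribute(name, ns);
    public override string GetAttribute(int i) => _reader.GetAttribute(i);
    public override string LookupNamespace(string prefix) => _reader.LookupNamespace(prefix);
    public override bool MoveToAttribute(string name) => _reader.MoveToAttribute(name);
    public override bool MoveToAttribute(string name, string ns) => _reader.MoveToAttribute(name, ns);
    public override bool MoveToElement() => _reader.MoveToElement();
    public override bool MoveToFirstAttribute() => _reader.MoveToFirstAttribute();
    public override bool MoveToNextAttribute() => _reader.MoveToNextAttribute();
    public override bool ReadAttributeValue() => _reader.ReadAttributeValue();
    public override void ResolveEntity() => _reader.ResolveEntity();
}

public static class ValueMappingDemo
{
    // Loads xml through the mapping reader and returns the resulting markup.
    public static string ReplaceLongValues(string xml, int maxLen)
    {
        var doc = new XmlDocument();
        var rdr = new ValueMappingXmlReader(new StringReader(xml),
            s => s.Length > maxLen ? "REPLACED TEXT" : s);
        doc.Load(rdr);
        return doc.OuterXml;
    }
}
```

With maxLen set to 10, the input <root><a>short</a><b>this value is too long</b></root> should come back with the text of <b> replaced and <a> untouched. Two caveats: the transform sees every Value the reader exposes, including attribute values and comments, so it may need a NodeType check; and Length counts characters rather than bytes, so a 1 MB threshold needs to account for the encoding.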
Also, for all I know, XmlTextReaderImpl (the actual type of _reader) may cache the whole text and kill your performance anyway. You may have to write your own guts for the thing as well.
using System;
using System.IO;
using System.Xml;

public class FilteredXmlReader : XmlReader
{
public Func<XmlReader, bool> Filter;
private XmlReader _reader;
private FilteredXmlReader(TextReader input, Func<XmlReader, bool> filterProc)
{
Filter = filterProc;
_reader = XmlReader.Create(input);
}
public static new XmlReader Create(TextReader input, Func<XmlReader, bool> filterProc)
{
return new FilteredXmlReader(input, filterProc);
}
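public override bool Read()
{
// Advance the inner reader, skipping every node the filter rejects.
// Checking b ends the loop at end of input; a null Filter accepts everything.
var b = _reader.Read();
while (b && Filter != null && !Filter(_reader))
{
b = _reader.Read();
}
return b;
}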
#region Wrapper Boilerplate
public override XmlNodeType NodeType => _reader.NodeType;
public override string LocalName => _reader.LocalName;
public override string NamespaceURI => _reader.NamespaceURI;
public override string Prefix => _reader.Prefix;
public override string Value => _reader.Value;
public override int Depth => _reader.Depth;
public override string BaseURI => _reader.BaseURI;
public override bool IsEmptyElement => _reader.IsEmptyElement;
public override int AttributeCount => _reader.AttributeCount;
public override bool EOF => _reader.EOF;
public override ReadState ReadState => _reader.ReadState;
public override XmlNameTable NameTable => _reader.NameTable;
public override string GetAttribute(string name) => _reader.GetAttribute(name);
public override string GetAttribute(string name, string namespaceURI) => _reader.GetAttribute(name, namespaceURI);
public override string GetAttribute(int i) => _reader.GetAttribute(i);
public override string LookupNamespace(string prefix) => _reader.LookupNamespace(prefix);
public override bool MoveToAttribute(string name) => _reader.MoveToAttribute(name);
public override bool MoveToAttribute(string name, string ns) => _reader.MoveToAttribute(name, ns);
public override bool MoveToElement() => _reader.MoveToElement();
public override bool MoveToFirstAttribute() => _reader.MoveToFirstAttribute();
public override bool MoveToNextAttribute() => _reader.MoveToNextAttribute();
public override bool ReadAttributeValue() => _reader.ReadAttributeValue();
public override void ResolveEntity() => _reader.ResolveEntity();
#endregion Wrapper Boilerplate
}
Usage:
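var xml = "<test />";
XmlDocument doc = new XmlDocument();
XmlReader rdr = FilteredXmlReader.Create(new System.IO.StringReader(xml),
r => r?.Value.Length < 20);
// Load the document through the filtering reader; without this call doc stays empty.
doc.Load(rdr);
var filteredXML = doc.OuterXml;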