How do you remove invalid hexadecimal characters from an XML-based data source prior to constructing an XmlReader or XPathDocument that uses the data?

Question

Is there any easy/general way to clean an XML based data source prior to using it in an XmlReader so that I can gracefully consume XML data that is non-conformant to the hexadecimal character restrictions placed on XML?

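Note:

  • The solution needs to handle XML data sources that use character encodings other than UTF-8, for example an encoding specified in the XML document declaration. Not mangling the character encoding of the source while stripping invalid hexadecimal characters has been a major sticking point.

  • The removal of invalid hexadecimal characters should only remove hexadecimal-encoded values, because the data often contains href values that happen to include a string that would match a hexadecimal character.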
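One way to read the second note is that only numeric character references such as `&#x1F;` should be touched, while look-alike text inside an href is left alone. The following is a minimal sketch under that assumption. `CharRefCleaner` and `RemoveInvalidCharRefs` are hypothetical names that do not come from the question, and the validity ranges follow the Char production of the XML 1.0 specification.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CharRefCleaner
{
    // Matches numeric character references such as &#x1F; (hex) or &#31; (decimal).
    private static readonly Regex NumericCharRef = new Regex(
        @"&#(?:[xX](?<hex>[0-9A-Fa-f]+)|(?<dec>[0-9]+));",
        RegexOptions.Compiled);

    // Hypothetical helper: removes only the references that resolve to a code
    // point XML 1.0 forbids. All other text is returned unchanged.
    public static string RemoveInvalidCharRefs(string xml)
    {
        if (xml == null) return null;

        return NumericCharRef.Replace(xml, m =>
        {
            bool isHex = m.Groups["hex"].Success;
            string digits = isHex ? m.Groups["hex"].Value : m.Groups["dec"].Value;

            int codePoint;
            bool parsed = int.TryParse(
                digits,
                isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None,
                CultureInfo.InvariantCulture,
                out codePoint);

            // Keep the reference if it is valid; drop it if it is out of
            // range or too large to parse.
            return parsed && IsValidXmlCodePoint(codePoint) ? m.Value : string.Empty;
        });
    }

    // Char production from the XML 1.0 specification.
    private static bool IsValidXmlCodePoint(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

Because the pattern requires the leading `&#` and the trailing `;`, a string such as `0x1F` inside an href value does not match and passes through untouched. This sketch handles only references; raw invalid characters in the text need a separate pass.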

Background:

I need to consume an XML-based data source that conforms to a specific format (think Atom or RSS feeds), but want to be able to consume data sources that have been published which contain invalid hexadecimal characters per the XML specification.

In .NET if you have a Stream that represents the XML data source, and then attempt to parse it using an XmlReader and/or XPathDocument, an exception is raised due to the inclusion of invalid hexadecimal characters in the XML data. My current attempt to resolve this issue is to parse the Stream as a string and use a regular expression to remove and/or replace the invalid hexadecimal characters, but I am looking for a more performant solution.
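For reference, the string-plus-regular-expression approach described above might look roughly like the sketch below. The asker's actual pattern is not shown in the question, so `RegexXmlCleaner`, `Clean`, and `CanParse` are hypothetical names, and the character class is an assumption based on the Char production of the XML 1.0 specification.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class RegexXmlCleaner
{
    // First alternative: a well-formed surrogate pair (kept as-is).
    // Second alternative: any single UTF-16 unit XML 1.0 does not allow (removed).
    private static readonly Regex InvalidOrPair = new Regex(
        @"[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]",
        RegexOptions.Compiled);

    public static string Clean(string xml)
    {
        // A match of length 2 is a valid surrogate pair; everything else is dropped.
        return InvalidOrPair.Replace(xml, m => m.Length == 2 ? m.Value : string.Empty);
    }

    // Reads the whole document and reports whether XmlReader accepted it.
    public static bool CanParse(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With default reader settings, a document containing a raw U+001F character is rejected, while the cleaned string parses. The cost is that the whole document must be held in memory as a string before the reader sees it.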

Recommended answer

It may not be perfect (emphasis added, since people keep missing this disclaimer), but what I've done in that case is below. You can adapt it to work with a stream.

// Requires: using System.Text; using System.Xml;

/// <summary>
/// Removes characters that are not allowed in an XML 1.0 document,
/// such as most control characters and unpaired surrogates.
/// </summary>
/// <param name="inString">The string to process</param>
/// <returns>The input string with every character that is invalid in XML 1.0 removed</returns>
public static string RemoveTroublesomeCharacters(string inString)
{
    if (inString == null) return null;

    StringBuilder newString = new StringBuilder(inString.Length);

    for (int i = 0; i < inString.Length; i++)
    {
        char ch = inString[i];

        // XmlConvert.IsXmlChar is new in .NET 4. On earlier versions, an explicit
        // range test based on the XML 1.0 Char production can be used instead:
        // if ((ch >= 0x0020 && ch <= 0xD7FF) || (ch >= 0xE000 && ch <= 0xFFFD)
        //     || ch == '\t' || ch == '\n' || ch == '\r')
        if (XmlConvert.IsXmlChar(ch))
        {
            newString.Append(ch);
        }
        else if (i + 1 < inString.Length && char.IsSurrogatePair(ch, inString[i + 1]))
        {
            // IsXmlChar rejects lone surrogates, so keep a valid pair
            // (a character above U+FFFF) together and skip its second half.
            newString.Append(ch).Append(inString[i + 1]);
            i++;
        }
    }
    return newString.ToString();
}
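The answer leaves the stream adaptation to the reader. One possible shape, sketched here under assumptions, is a `TextReader` wrapper that filters characters as `XmlReader` pulls them, so the document never has to be loaded into a single string. `XmlSanitizingReader` is a hypothetical name, not something from the answer. The sketch overrides only `Read()` and relies on the base `TextReader.Read(char[], int, int)` implementation, which calls `Read()` repeatedly. It favors clarity over raw speed and does not override `Peek()`.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical wrapper: hands out only characters that are valid in XML 1.0.
public sealed class XmlSanitizingReader : TextReader
{
    private readonly TextReader _inner;
    private int _emitNext = -1;  // validated low surrogate to return on the next Read()
    private int _lookahead = -2; // raw character read ahead but not yet examined (-2 = empty)

    public XmlSanitizingReader(TextReader inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        _inner = inner;
    }

    private int NextRaw()
    {
        if (_lookahead != -2)
        {
            int c = _lookahead;
            _lookahead = -2;
            return c;
        }
        return _inner.Read();
    }

    public override int Read()
    {
        if (_emitNext != -1)
        {
            int low = _emitNext;
            _emitNext = -1;
            return low;
        }

        while (true)
        {
            int c = NextRaw();
            if (c == -1) return -1;

            char ch = (char)c;
            if (XmlConvert.IsXmlChar(ch)) return c;

            if (char.IsHighSurrogate(ch))
            {
                int next = NextRaw();
                if (next != -1 && char.IsLowSurrogate((char)next))
                {
                    _emitNext = next; // keep the pair together
                    return c;
                }
                _lookahead = next;    // not a low surrogate: examine it normally
            }
            // Any other invalid character is dropped and the loop continues.
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

A caller could then write something like `XmlReader.Create(new XmlSanitizingReader(new StreamReader(stream, encoding)))`, where `stream` and `encoding` are supplied by the caller. `StreamReader` decodes with the encoding it is given and does not read the encoding named in the XML declaration, so choosing the right encoding for a non-UTF-8 source remains a separate step that this sketch does not cover.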
