How to tell if a string is XML?


Problem description

We have a string field which can contain XML or plain text. The XML contains no <?xml header and no root element, i.e. it is not well formed.

We need to be able to redact XML data, emptying element and attribute values, leaving just their names, so I need to test if this string is XML before it's redacted.

Currently I'm using this approach:

string redact(string eventDetail)
{
    string detail = eventDetail.Trim();
    if (!detail.StartsWith("<") && !detail.EndsWith(">")) return eventDetail;
    ...

Is there a better way?

Are there any edge cases this approach could miss?

I appreciate I could use XmlDocument.LoadXml and catch XmlException, but this feels like an expensive option, since I already know that a lot of the data will not be in XML.

Here's an example of the XML data. Apart from missing a root element (which is omitted to save space, since there will be a lot of data), we can assume it is well formed:

<TableName FirstField="Foo" SecondField="Bar" /> 
<TableName FirstField="Foo" SecondField="Bar" /> 
...

Currently we are only using attribute-based values, but we may use elements in the future if the data becomes more complex.

SOLUTION

Based on multiple comments (thanks guys!)

string redact(string eventDetail)
{
    if (string.IsNullOrEmpty(eventDetail)) return eventDetail; //+1 for unit tests :)
    string detail = eventDetail.Trim();
    // Cheap pre-check: only text that both starts with '<' and ends with '>' goes on to the parser.
    if (!detail.StartsWith("<") || !detail.EndsWith(">")) return eventDetail;
    XmlDocument xml = new XmlDocument();
    try
    {
        xml.LoadXml(string.Format("<Root>{0}</Root>", detail));
    }
    catch (XmlException e)
    {
        log.WarnFormat("Data NOT redacted. Caught {0} loading eventDetail {1}", e.Message, eventDetail);
        return eventDetail;
    }
    ... // redact
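
The redaction step itself is elided above. As a rough illustration only, here is one way it might look using LINQ to XML, assuming the goal stated in the question (blank every attribute value and drop element text, keeping only the names). The class `RedactSketch` and the method `RedactFragment` are hypothetical names, not part of the original post, and the sketch uses `XElement` rather than the `XmlDocument` shown above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RedactSketch
{
    // Hypothetical helper (not from the original post): blanks every attribute
    // value and removes all text content, keeping element and attribute names.
    public static string RedactFragment(string detail)
    {
        // Wrap in a temporary root so a fragment with several top-level elements
        // parses; XElement.Parse throws XmlException if the fragment is malformed.
        XElement root = XElement.Parse("<Root>" + detail + "</Root>");

        foreach (XAttribute attribute in root.Descendants().Attributes().ToList())
        {
            // Leave namespace declarations alone so prefixes still resolve.
            if (!attribute.IsNamespaceDeclaration)
            {
                attribute.Value = string.Empty;
            }
        }

        // XCData derives from XText, so CDATA sections are removed as well.
        foreach (XText text in root.DescendantNodes().OfType<XText>().ToList())
        {
            text.Remove();
        }

        // Serialize only the children, dropping the temporary root again.
        return string.Join(Environment.NewLine, root.Nodes().Select(node => node.ToString()));
    }
}
```

Called on the sample data from the question, this should return the same `TableName` elements with `FirstField` and `SecondField` present but empty. Whether empty values are the right redaction format for a given log is a design choice the original post does not settle.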

Answer

One possibility is to mix both solutions. You can keep the cheap check from your redact method and only try to load the string when it passes (inside the if). This way, you'll only try to load what is likely to be well-formed XML, and discard most of the non-XML entries without paying for a parse.
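
A minimal sketch of that combined approach, under a few assumptions: `XmlSniffer` and `LooksLikeXmlFragment` are hypothetical names, the pre-check rejects text unless it both starts with `<` and ends with `>`, and the parse uses `XmlReader` with `ConformanceLevel.Fragment` instead of `XmlDocument.LoadXml`, so no root wrapper or in-memory DOM is needed just to answer yes or no. Fragment mode accepts bare text at the top level, so the sketch also requires at least one element.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlSniffer
{
    // Hypothetical helper: true only if the text passes the cheap check and
    // then parses as an XML fragment containing at least one element.
    public static bool LooksLikeXmlFragment(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return false;

        string detail = text.Trim();

        // Cheap filter first, so most plain-text values never reach the parser.
        if (!detail.StartsWith("<", StringComparison.Ordinal) ||
            !detail.EndsWith(">", StringComparison.Ordinal))
        {
            return false;
        }

        // Fragment conformance allows several top-level elements and no root.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(detail), settings))
            {
                bool sawElement = false;
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element) sawElement = true;
                }
                return sawElement;
            }
        }
        catch (XmlException)
        {
            // Looked like markup but is not well formed; treat it as plain text.
            return false;
        }
    }
}
```

The exception path is still relatively costly, but it is only reached for values that look like markup and fail to parse, which should be a small share of the data if most entries are plain text.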
