Find empty tags and incorrect attribute values


Problem description

Hi everyone,

Thanks a lot for answering my previous queries. :)

These days I am working with huge XML files, and my job is to find the errors that the software introduces while generating them.

All of these files are converted into XML by some software.

During this conversion, the software can make the following errors:

1) Some tags can be left blank or empty.

2) The value of the designator's value attribute can be incorrect.

These are the two errors that occur most often.

Now my questions are:


1) How do I create a C#.NET application that reads such an XML file, finds the location of every empty element, and writes the result to Notepad or any text file? (The rule applies to every empty element.)

2) How do I check that the designator's value attribute matches the text given inside the element?

Below is a sample of the XML tagging:

<lnci:content>ABC,123</lnci:content>
<heading>
<designator value="a">(a).</designator>
<title>This is Title.</title>
</heading>
<lawTextComponet>
<p>This is the Content.</p>
<p></p>
</lawTextComponet>
...




Please advise as soon as possible.

Regards
Mayur Alaspure

Recommended answer

You can try XmlReaderSettings together with an XSD.

With this MSDN example:

- it never loads the entire document
- the while (reader.Read()) loop just enumerates the entire file at the node level
- validation is enabled via the XmlReaderSettings

To reject empty strings, use minLength in your XSD (you can generate the XSD with XSD.exe).
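As a rough illustration of the streaming approach described above, here is a minimal sketch that validates against an XSD while reading node by node and collects each problem with its line number. The class and method names (StreamingXsdCheck, Validate) are made up for this example, and it assumes the XSD already declares a minLength restriction on the elements that must not be empty.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper, not part of any library.
public static class StreamingXsdCheck
{
    // Validates the XML against the schema while streaming and returns
    // one "line,column: message" entry per validation problem.
    public static List<string> Validate(TextReader xml, TextReader xsd)
    {
        var problems = new List<string>();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };

        // Passing null uses the targetNamespace declared in the schema itself.
        using (var schemaReader = XmlReader.Create(xsd))
        {
            settings.Schemas.Add(null, schemaReader);
        }

        // With a handler attached, problems are reported here instead of thrown.
        settings.ValidationEventHandler += (sender, e) =>
            problems.Add($"{e.Exception.LineNumber},{e.Exception.LinePosition}: {e.Message}");

        using (var reader = XmlReader.Create(xml, settings))
        {
            // Validation happens as each node is read; the document is never fully loaded.
            while (reader.Read()) { }
        }
        return problems;
    }
}
```

The returned list can be written to a text file with File.WriteAllLines. For a file on disk, pass File.OpenText(path) as the xml argument so that the input is still read as a stream.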


To match the designator's value, you can use regular expressions.
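One possible way to combine the streaming reader with a regular expression for the designator check is sketched below. It assumes a convention inferred from the sample in the question, namely that the element text is the attribute value wrapped in parentheses and followed by a period (value="a" goes with "(a)."); the real rule may differ. The names DesignatorCheck and FindMismatches are hypothetical. The input must be well formed with all namespace prefixes declared, and designator elements are assumed to contain only text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class DesignatorCheck
{
    // Assumed convention: text looks like "(a)." for value="a".
    private static readonly Regex TextPattern = new Regex(@"^\((?<label>[^)]+)\)\.?$");

    public static List<string> FindMismatches(TextReader xml)
    {
        var mismatches = new List<string>();
        using (var reader = XmlReader.Create(xml))
        {
            var lineInfo = reader as IXmlLineInfo;
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "designator")
                {
                    int line = lineInfo != null ? lineInfo.LineNumber : 0;
                    string value = reader.GetAttribute("value");
                    // Reads the text and leaves the reader on the node after the end tag,
                    // so no extra Read() is needed in this branch.
                    string text = reader.ReadElementContentAsString().Trim();

                    Match m = TextPattern.Match(text);
                    if (!m.Success || m.Groups["label"].Value != value)
                        mismatches.Add($"line {line}: value=\"{value}\" but text is \"{text}\"");
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return mismatches;
    }
}
```

The loop advances manually instead of using while (reader.Read()) because ReadElementContentAsString already moves past the end tag; calling Read() again would skip a designator that follows immediately.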


Create a valid XML file and generate an XSD from it using xsd.exe (it comes with Visual Studio).

Edit your XSD to add string restrictions.

<xs:element minOccurs="0" name="UserName">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:minLength value="5" />
      <xs:maxLength value="50" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>






Now you can use it to validate your XML with XmlSchemaValidator; an example can be found at:
http://msdn.microsoft.com/en-us/library/system.xml.schema.xmlschemavalidator.aspx


I know this looks a little strange, but sometimes syntactic text manipulation performs better than semantic processing. Matching an XML file against a schema is not always the best option. If I have understood correctly, the second "error" to be found is more of a semantic one, so schema validation would not be straightforward.

These two regular expressions could identify the problematic elements:
1) <(.*?)>\s*</\1>
2) <designator value="(.*?)">(?!(\1<)).*?</designator>
These could be refined, but if the XML is well formed, they should be enough.
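To make the regex route concrete, the following sketch applies the two patterns to text that is already in memory and reports a line number for each hit. The second pattern is used with the closing tag spelled designator. RegexScan, Scan and LineOf are names invented for this example, and the line lookup rescans the text for every match, which is fine for a sketch but slow for very large inputs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class RegexScan
{
    // Pattern 1: an element whose content is empty or whitespace only.
    private static readonly Regex EmptyElement = new Regex(@"<(.*?)>\s*</\1>");

    // Pattern 2: a designator whose text does not start with its attribute value.
    private static readonly Regex BadDesignator =
        new Regex(@"<designator value=""(.*?)"">(?!(\1<)).*?</designator>");

    public static List<string> Scan(string xml)
    {
        var findings = new List<string>();
        foreach (Match m in EmptyElement.Matches(xml))
            findings.Add($"line {LineOf(xml, m.Index)}: empty element <{m.Groups[1].Value}>");
        foreach (Match m in BadDesignator.Matches(xml))
            findings.Add($"line {LineOf(xml, m.Index)}: designator value \"{m.Groups[1].Value}\" does not match its text");
        return findings;
    }

    // Converts a character offset into a 1-based line number.
    private static int LineOf(string text, int index)
    {
        int line = 1;
        for (int i = 0; i < index; i++)
            if (text[i] == '\n') line++;
        return line;
    }
}
```

A report could then be produced with something like File.WriteAllLines("report.txt", RegexScan.Scan(File.ReadAllText("input.xml"))), where both file names are placeholders. Two limitations are worth noting: pattern 1 does not catch empty elements that carry attributes, because the backreference would have to repeat the attributes in the closing tag, and pattern 2 expects the text to equal the attribute value exactly, so the sample pair value="a" with text "(a)." would be reported unless the pattern is adjusted.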

The only additional thing to consider is the size of the file. I suppose it can be split into fragments on load if needed, since it has a structure. But several hundred MiB does not look like very much; of course, this depends on the machine. The framework will be able to process it, but might use more virtual memory.

By the way, if you decide to take this path, there are also implementations of regex over streams, like this one: http://www.developer.com/net/article.php/3719741/Building-a-Regular-Expression-Stream-Search-with-the-NET-Framework.htm

Good luck!

