C#: reading an XML file that is not correctly formatted

Question

I have to read an XML file that has no root element and extract the data it contains. The XML has many elements like these:

<DocumentElement>
  <LOG_x0020_ParityRate>
    <DATE>12/09/2017 - 00:00</DATE>
    <CHANNELNAME>ParityRate</CHANNELNAME>
    <SQL>update THROOMDISP set ID_HOTEL = '104', ID_ROOM = '920', NUM = '3', MYDATA = '20171006' where id_hotel =104 and id_room ='920' and MYDATA ='20171006'</SQL>
    <ID_HOTEL>104</ID_HOTEL>
    <TYPEREQUEST>updateTHROOMDISP(OK)</TYPEREQUEST>
  </LOG_x0020_ParityRate>
</DocumentElement><DocumentElement>
  <LOG_x0020_ParityRate>
    <DATE>12/09/2017 - 00:00</DATE>
    <CHANNELNAME>ParityRate</CHANNELNAME>
    <SQL>update THROOMDISP set ID_HOTEL = '105', ID_ROOM = '923', NUM = '1', MYDATA = '20171006' where id_hotel =105 and id_room ='923' and MYDATA ='20171006'</SQL>
    <ID_HOTEL>105</ID_HOTEL>
    <TYPEREQUEST>updateTHROOMDISP(OK)</TYPEREQUEST>
  </LOG_x0020_ParityRate>
</DocumentElement><DocumentElement>
  <LOG_x0020_ParityRate>
    <DATE>12/09/2017 - 00:00</DATE>
    <CHANNELNAME>ParityRate</CHANNELNAME>
    <SQL>update THROOMDISP set ID_HOTEL = '104', ID_ROOM = '920', NUM = '3', MYDATA = '20171007' where id_hotel =104 and id_room ='920' and MYDATA ='20171007'</SQL>
    <ID_HOTEL>104</ID_HOTEL>
    <TYPEREQUEST>updateTHROOMDISP(OK)</TYPEREQUEST>
  </LOG_x0020_ParityRate>
</DocumentElement><DocumentElement>

I tried reading it as a string, manually adding opening and closing root tags, and parsing it as an XDocument, but the file also contains some malformed tags, like these:

</DocumentElement>
<TYPEREQUEST>updateTHROOMPRICE(OK)</TYPEREQUEST>

These tags don't match any opening tag, so calling XDocument.Parse on the resulting string throws an exception. The file has millions of rows, so I can't read it line by line, or the iteration would last for hours. How can I get rid of all these malformed tags and parse the document?

Recommended answer

Your XML is simply not well-formed, which often happens when XML data is merged together. It has multiple elements at the root level, so use an XmlReader with ConformanceLevel.Fragment, like below:

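using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;
using System.Xml.Linq;


namespace ConsoleApplication4
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            // Fragment mode lets the reader accept several top-level elements.
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.ConformanceLevel = ConformanceLevel.Fragment;
            using (XmlReader reader = XmlReader.Create(FILENAME, settings))
            {
                while (!reader.EOF)
                {
                    try
                    {
                        // Advance to the next record unless already positioned on one.
                        if (reader.Name != "LOG_x0020_ParityRate")
                        {
                            reader.ReadToFollowing("LOG_x0020_ParityRate");
                        }
                        if (!reader.EOF)
                        {
                            // Materialize only this record; the reader ends up just past it.
                            XElement parityRate = (XElement)XElement.ReadFrom(reader);

                            ParityRate newLog = new ParityRate();
                            // "HH" is the 24-hour specifier; "hh" is the 12-hour one and fails on times such as 13:00.
                            // The sample does not show whether dates are MM/dd or dd/MM; adjust the pattern if needed.
                            newLog.date = DateTime.ParseExact((string)parityRate.Element("DATE"), "MM/dd/yyyy - HH:mm", CultureInfo.InvariantCulture);
                            newLog.name = (string)parityRate.Element("CHANNELNAME");
                            newLog.sql = (string)parityRate.Element("SQL");
                            newLog.hotel = (int)parityRate.Element("ID_HOTEL");
                            // Store the record only once every field has been read successfully.
                            ParityRate.logs.Add(newLog);
                        }
                    }
                    catch (XmlException ex)
                    {
                        // A stray or mismatched tag leaves the reader in an error state it
                        // cannot recover from, so stop here instead of looping forever.
                        Console.Error.WriteLine("Malformed XML: " + ex.Message);
                        break;
                    }
                    catch (Exception ex)
                    {
                        // Bad or missing field value: the reader is already past this record, so skip it.
                        Console.Error.WriteLine("Skipped record: " + ex.Message);
                    }
                }
            }
        }
    }
    public class ParityRate
    {
        public static List<ParityRate> logs = new List<ParityRate>();

        public DateTime date { get; set; }
        public string name { get; set; }
        public string sql { get; set; }
        public int hotel { get; set; }
    }
}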
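
An XmlReader stops at the first tag that has no matching opening tag, so the stray tags shown in the question may still end the loop above early. One possible workaround, given here only as a rough sketch and not as part of the original answer, is to cut the input into individual records as text and parse each record on its own, so that one bad record or stray tag cannot break the rest. The sketch assumes that every <LOG_x0020_ParityRate> opening tag and its closing tag sit on their own lines, as in the sample; the names RecordScanner, ReadRecords and TryParse are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class RecordScanner
{
    const string OpenTag = "<LOG_x0020_ParityRate>";
    const string CloseTag = "</LOG_x0020_ParityRate>";

    // Streams the input and yields every record that parses cleanly.
    // Text outside a record (stray closing tags, orphaned elements) is ignored.
    public static IEnumerable<XElement> ReadRecords(TextReader input)
    {
        StringBuilder buffer = null; // non-null only while inside a record
        string line;
        while ((line = input.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            if (trimmed.StartsWith(OpenTag, StringComparison.Ordinal))
            {
                // Starting a new record also discards an unterminated previous one.
                buffer = new StringBuilder();
            }
            if (buffer == null)
            {
                continue; // not inside a record: skip the line
            }
            buffer.AppendLine(line);
            if (trimmed.EndsWith(CloseTag, StringComparison.Ordinal))
            {
                XElement record = TryParse(buffer.ToString());
                buffer = null;
                if (record != null)
                {
                    yield return record;
                }
            }
        }
    }

    // Returns null when the collected text is not well-formed XML.
    static XElement TryParse(string xml)
    {
        try { return XElement.Parse(xml); }
        catch (XmlException) { return null; }
    }
}

public static class Program
{
    public static void Main()
    {
        // Two valid records with stray tags between them, modeled on the question.
        string sample = string.Join("\n", new[] {
            "<DocumentElement>",
            "  <LOG_x0020_ParityRate>",
            "    <ID_HOTEL>104</ID_HOTEL>",
            "  </LOG_x0020_ParityRate>",
            "</DocumentElement>",
            "</DocumentElement>",
            "<TYPEREQUEST>updateTHROOMPRICE(OK)</TYPEREQUEST>",
            "<DocumentElement>",
            "  <LOG_x0020_ParityRate>",
            "    <ID_HOTEL>105</ID_HOTEL>",
            "  </LOG_x0020_ParityRate>",
            "</DocumentElement>"
        });
        foreach (XElement record in RecordScanner.ReadRecords(new StringReader(sample)))
        {
            Console.WriteLine((int)record.Element("ID_HOTEL")); // prints 104, then 105
        }
    }
}
```

For a real file, the StreamReader returned by File.OpenText(path) can be passed in place of the StringReader, so the file is never loaded into memory as a whole. Each yielded XElement can then be mapped onto a ParityRate object in the same way as in the answer's code.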
