C#: read an XML file that is not correctly formatted
Question
I have to read an XML file that has no root element in order to extract the data it contains. The XML has many elements like these:
<DocumentElement>
<LOG_x0020_ParityRate>
<DATE>12/09/2017 - 00:00</DATE>
<CHANNELNAME>ParityRate</CHANNELNAME>
<SQL>update THROOMDISP set ID_HOTEL = '104', ID_ROOM = '920', NUM = '3', MYDATA = '20171006' where id_hotel =104 and id_room ='920' and MYDATA ='20171006'</SQL>
<ID_HOTEL>104</ID_HOTEL>
<TYPEREQUEST>updateTHROOMDISP(OK)</TYPEREQUEST>
</LOG_x0020_ParityRate>
</DocumentElement><DocumentElement>
<LOG_x0020_ParityRate>
<DATE>12/09/2017 - 00:00</DATE>
<CHANNELNAME>ParityRate</CHANNELNAME>
<SQL>update THROOMDISP set ID_HOTEL = '105', ID_ROOM = '923', NUM = '1', MYDATA = '20171006' where id_hotel =105 and id_room ='923' and MYDATA ='20171006'</SQL>
<ID_HOTEL>105</ID_HOTEL>
<TYPEREQUEST>updateTHROOMDISP(OK)</TYPEREQUEST>
</LOG_x0020_ParityRate>
</DocumentElement><DocumentElement>
<LOG_x0020_ParityRate>
<DATE>12/09/2017 - 00:00</DATE>
<CHANNELNAME>ParityRate</CHANNELNAME>
<SQL>update THROOMDISP set ID_HOTEL = '104', ID_ROOM = '920', NUM = '3', MYDATA = '20171007' where id_hotel =104 and id_room ='920' and MYDATA ='20171007'</SQL>
<ID_HOTEL>104</ID_HOTEL>
<TYPEREQUEST>updateTHROOMDISP(OK)</TYPEREQUEST>
</LOG_x0020_ParityRate>
</DocumentElement><DocumentElement>
I tried reading it as a string, manually adding opening and closing root tags, and parsing it as an XDocument, but it also contains some badly formed tags, like these:
</DocumentElement>
<TYPEREQUEST>updateTHROOMPRICE(OK)</TYPEREQUEST>
These tags don't match any opening tag, so calling XDocument.Parse on the resulting string throws an exception. The file has millions of lines, so I can't read it line by line, or the iteration would last for hours. How can I get rid of all these badly formed tags and parse the document?
Recommended answer
Your XML is simply not well formed, which often happens when XML data is merged together. It has multiple elements at the root level, so use an XmlReader with ConformanceLevel.Fragment, like below:
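using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication4
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";

        static void Main(string[] args)
        {
            // Fragment conformance lets the reader accept several root-level elements.
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.ConformanceLevel = ConformanceLevel.Fragment;

            using (XmlReader reader = XmlReader.Create(FILENAME, settings))
            {
                while (!reader.EOF)
                {
                    try
                    {
                        // Skip ahead to the next log record unless already positioned on one.
                        if (reader.NodeType != XmlNodeType.Element || reader.Name != "LOG_x0020_ParityRate")
                        {
                            reader.ReadToFollowing("LOG_x0020_ParityRate");
                        }
                        if (!reader.EOF)
                        {
                            // Reads the whole record and leaves the reader positioned after it.
                            XElement parityRate = (XElement)XElement.ReadFrom(reader);
                            ParityRate newLog = new ParityRate();
                            // "HH" is the 24-hour clock; the 12-hour "hh" fails on times such as 13:00.
                            // MM/dd is kept from the original answer; switch to dd/MM if the log writes the day first.
                            newLog.date = DateTime.ParseExact((string)parityRate.Element("DATE"), "MM/dd/yyyy - HH:mm", CultureInfo.InvariantCulture);
                            newLog.name = (string)parityRate.Element("CHANNELNAME");
                            newLog.sql = (string)parityRate.Element("SQL");
                            newLog.hotel = (int)parityRate.Element("ID_HOTEL");
                            // Add only fully populated records.
                            ParityRate.logs.Add(newLog);
                        }
                    }
                    catch (XmlException ex)
                    {
                        // After an XmlException (for example an unmatched closing tag) the reader
                        // is in an error state and cannot continue, so stop instead of looping.
                        Console.WriteLine("Stopped at malformed XML: " + ex.Message);
                        break;
                    }
                    catch (Exception ex)
                    {
                        // A record with a missing or unparsable value; the reader is already past it.
                        Console.WriteLine("Skipped record: " + ex.Message);
                    }
                }
            }
        }
    }

    public class ParityRate
    {
        public static List<ParityRate> logs = new List<ParityRate>();
        public DateTime date { get; set; }
        public string name { get; set; }
        public string sql { get; set; }
        public int hotel { get; set; }
    }
}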
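ConformanceLevel.Fragment only relaxes the single-root rule; an XmlReader still throws an XmlException when it reaches a closing tag that has no matching opening tag, like the stray ones shown in the question. One possible workaround, sketched below, is to stream the file as text, collect each LOG_x0020_ParityRate block on its own, and parse every block separately, so anything between records is ignored. This is only a sketch under stated assumptions: RecordScanner and ReadRecords are hypothetical names that are not part of the original answer, and the code assumes the record's opening and closing tags each sit on their own line, as they do in the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class RecordScanner
{
    // Streams the input line by line and yields each record that parses as XML.
    // Assumes <recordName> and </recordName> each appear alone on a line.
    public static IEnumerable<XElement> ReadRecords(TextReader input, string recordName)
    {
        string open = "<" + recordName + ">";
        string close = "</" + recordName + ">";
        StringBuilder buffer = null; // non-null only while inside a record
        string line;
        while ((line = input.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            if (trimmed == open)
            {
                // Start (or restart) a record; a half-collected one is dropped.
                buffer = new StringBuilder();
                buffer.AppendLine(trimmed);
            }
            else if (buffer != null)
            {
                buffer.AppendLine(trimmed);
                if (trimmed == close)
                {
                    XElement record = TryParse(buffer.ToString());
                    buffer = null;
                    if (record != null)
                    {
                        yield return record;
                    }
                }
            }
            // Lines outside a record (wrapper tags, stray tags) are ignored.
        }
    }

    // Returns null instead of throwing when a block is not well formed.
    private static XElement TryParse(string xml)
    {
        try
        {
            return XElement.Parse(xml);
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

It could be called with a StreamReader opened on the file, for example RecordScanner.ReadRecords(new StreamReader(@"c:\temp\test.xml"), "LOG_x0020_ParityRate"), and each returned XElement can then be mapped to a ParityRate exactly as in the answer. Because only one record is held in memory at a time, reading line by line this way should stay practical even for files with millions of lines, though that is worth measuring on the real data.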