Text to XML conversion

Problem description

I have been assigned the task of converting a plain text file into an XML file using C#. The plain text file contains:

##Main
MachineName DomainName Scandate GUID RegID
RMM-LT-417@*@Home.LOCAL@*@03/23/2015 18:48:38@*@31c0841e-f7bf-4de1-9d75-7e9080498e6b-20141216020243430495@*@2853625

##AP_LogicalDrivesInformation
Caption Description DriveType FileSystem FreeSpace UsedSpace Size CFreeSpace CUsedSpace VolumeName VolumeSerialNumber
C@*@Hard Drive@*@Drive Fixed@*@NTFS@*@421002715136@*@79097778176@*@500100493312@*@392.1 GB@*@73.7 GB@*@OS@*@469D6D66
D@*@Hard Drive@*@Drive Fixed@*@NTFS@*@484706508800@*@4765618176@*@489472126976@*@451.4 GB@*@4.4 GB@*@DATAPART@*@7A9EDCCC
E@*@MATSHITA DVD+-RW UJ8E2@*@Cd-Rom@*@@*@0@*@0@*@0@*@0@*@0@*@@*@0


where ##Main and ##AP_LogicalDrivesInformation are the root elements,
MachineName, DomainName, Scandate, GUID and RegID are the nodes, and the line below them holds their values, separated by @*@.
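The splitting rule described above can be sketched in C#. This is a minimal illustration, assuming that field names are separated by spaces and values by the exact string @*@; the local function PairFields is a hypothetical helper, not part of the question's code. Splitting with StringSplitOptions.None matters here: empty fields such as the FileSystem value of the E drive (@*@@*@) are kept, so every value stays aligned with its name.

```csharp
using System;
using System.Collections.Generic;

// Pairs each header name with the field at the same position in a value line.
// Assumes names are separated by spaces and values by "@*@" (hypothetical helper).
List<KeyValuePair<string, string>> PairFields(string headerLine, string valueLine)
{
    string[] names = headerLine.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    // StringSplitOptions.None keeps empty fields such as "@*@@*@",
    // so every value stays aligned with its header name.
    string[] values = valueLine.Split(new[] { "@*@" }, StringSplitOptions.None);
    if (names.Length != values.Length)
        throw new FormatException($"Expected {names.Length} fields but found {values.Length}.");

    var pairs = new List<KeyValuePair<string, string>>();
    for (int i = 0; i < names.Length; i++)
        pairs.Add(new KeyValuePair<string, string>(names[i], values[i].Trim()));
    return pairs;
}

var fields = PairFields(
    "Caption Description DriveType FileSystem FreeSpace UsedSpace Size CFreeSpace CUsedSpace VolumeName VolumeSerialNumber",
    "E@*@MATSHITA DVD+-RW UJ8E2@*@Cd-Rom@*@@*@0@*@0@*@0@*@0@*@0@*@@*@0");

foreach (var f in fields)
    Console.WriteLine($"{f.Key} = '{f.Value}'");
```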

My code for this is:

using System.IO;
using System.Text;
using System.Text.RegularExpressions;

var xml = new StringBuilder();

xml.Append("<Main>\n");

foreach (var line in File.ReadAllLines(@"yourfile.txt"))
{
    // Regex.Replace returns a new string; the result is not assigned, so this call has no effect
    Regex.Replace(line, @"(?<=(?:\r?\n){2}|\A)(?:\r?\n)+", "");

    var line1 = line.Replace("@*@", ";");

    var vals = line1.Split(';');

    // TODO add more fields
    xml.AppendFormat("<MachineName>{0}</MachineName>\n <DomainName>{1}</DomainName>\n <Scandate>{2}</Scandate>\n <GUID>{3}</GUID>\n <RegID>{4}</RegID>\n",
                   vals[0].Trim(), vals[1].Trim(), vals[2].Trim(), vals[3].Trim(), vals[4].Trim());
    // </Main> is appended inside the loop, once per input line
    xml.Append("</Main>\n");
}


But the problem is that no new root node is created for ##AP_LogicalDrivesInformation.

Solution

You just need to check your logic, nothing else. Also note that an XML document is allowed to have only one top-level (root) element (credit to Richard Deeming).
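The single-root rule is also enforced by LINQ to XML. A minimal sketch, using the section names from the question: an XDocument throws an InvalidOperationException when a second top-level element is added, while wrapping both sections in one element (named root here only for illustration) is accepted.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var doc = new XDocument(new XElement("Main"));

bool secondRootRejected = false;
try
{
    // A second top-level element would make the document malformed.
    doc.Add(new XElement("AP_LogicalDrivesInformation"));
}
catch (InvalidOperationException)
{
    secondRootRejected = true;
}
Console.WriteLine($"Second top-level element rejected: {secondRootRejected}");

// Wrapping all sections in one root element produces a well-formed document.
var wrapped = new XDocument(
    new XElement("root",
        new XElement("Main"),
        new XElement("AP_LogicalDrivesInformation")));
Console.WriteLine(wrapped);
```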

I would not use StringBuilder though. I would use string.Format (for very rigid-format XML schema only) or, even better, System.Xml.XmlWriter:
https://msdn.microsoft.com/en-us/library/system.xml.xmlwriter%28v=vs.110%29.aspx.

—SA
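Following the XmlWriter suggestion above, the question's loop can be restructured so that each ##Name line opens a new section element, the following line is read as the header, and every later non-blank line becomes one record, all inside a single root element. This is a sketch under the assumption that the file always has the layout shown in the question; ConvertToXml, root and child are illustrative names, not taken from either answer.

```csharp
using System;
using System.IO;
using System.Xml;

// Converts the "##Section / header line / value lines" format to XML.
// Assumptions: a section starts with "##Name", the next non-blank line lists
// the field names separated by spaces, and each following line holds values
// separated by "@*@". The element names "root" and "child" are illustrative.
string ConvertToXml(TextReader input)
{
    var output = new StringWriter();
    var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
    using (XmlWriter writer = XmlWriter.Create(output, settings))
    {
        writer.WriteStartElement("root");              // the single top-level element
        bool inSection = false;
        string[]? fieldNames = null;                    // header of the current section
        string? line;
        while ((line = input.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue;     // skip blank separator lines

            if (line.StartsWith("##"))
            {
                if (inSection) writer.WriteEndElement();          // close the previous section
                writer.WriteStartElement(line.Substring(2).Trim());
                inSection = true;
                fieldNames = null;                                 // the next line is a header
            }
            else if (fieldNames == null)
            {
                fieldNames = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
            }
            else
            {
                string[] values = line.Split(new[] { "@*@" }, StringSplitOptions.None);
                if (values.Length != fieldNames.Length)
                    throw new FormatException("The number of field names and values differ.");

                writer.WriteStartElement("child");
                for (int i = 0; i < fieldNames.Length; i++)
                    writer.WriteElementString(fieldNames[i], values[i].Trim());
                writer.WriteEndElement();
            }
        }
        if (inSection) writer.WriteEndElement();       // close the last section
        writer.WriteEndElement();                      // close root
    }
    return output.ToString();
}

string sample = string.Join("\n",
    "##Main",
    "MachineName DomainName Scandate GUID RegID",
    "RMM-LT-417@*@Home.LOCAL@*@03/23/2015 18:48:38@*@31c0841e-f7bf-4de1-9d75-7e9080498e6b-20141216020243430495@*@2853625",
    "",
    "##AP_LogicalDrivesInformation",
    "Caption Description DriveType FileSystem FreeSpace UsedSpace Size CFreeSpace CUsedSpace VolumeName VolumeSerialNumber",
    "C@*@Hard Drive@*@Drive Fixed@*@NTFS@*@421002715136@*@79097778176@*@500100493312@*@392.1 GB@*@73.7 GB@*@OS@*@469D6D66",
    "E@*@MATSHITA DVD+-RW UJ8E2@*@Cd-Rom@*@@*@0@*@0@*@0@*@0@*@0@*@@*@0");

string xml = ConvertToXml(new StringReader(sample));
Console.WriteLine(xml);
```

For a real file, the function can be given a StreamReader opened on the input path instead of a StringReader. Because it reads one line at a time, this version never holds the whole file in memory. Empty values are written as self-closing elements, which is equivalent XML.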


This is one possible solution; however, whether it is exactly what you want is difficult for me to say with the little information you have provided.
The resulting XML structure is just a guesstimate and might not suit your needs.

First read the whole contents of the file into a string.
(If the file is very big, this approach might not be the best)
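If the file is very large, File.ReadLines is a streaming alternative to File.ReadAllText: it enumerates the file one line at a time instead of loading it into a single string. The regular expression below needs the complete text, so a streaming approach has to process the lines itself. This small sketch assumes a temporary demo file (scan-demo.txt is an arbitrary name) and only collects the section names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Write a small demo file (in practice this would be the existing input file).
string path = Path.Combine(Path.GetTempPath(), "scan-demo.txt");
File.WriteAllLines(path, new[]
{
    "##Main",
    "MachineName DomainName Scandate GUID RegID",
    "RMM-LT-417@*@Home.LOCAL@*@03/23/2015 18:48:38@*@31c0841e-f7bf-4de1-9d75-7e9080498e6b-20141216020243430495@*@2853625",
    "",
    "##AP_LogicalDrivesInformation",
    "Caption Description DriveType FileSystem FreeSpace UsedSpace Size CFreeSpace CUsedSpace VolumeName VolumeSerialNumber",
    "C@*@Hard Drive@*@Drive Fixed@*@NTFS@*@421002715136@*@79097778176@*@500100493312@*@392.1 GB@*@73.7 GB@*@OS@*@469D6D66",
});

// File.ReadLines enumerates lazily, so the whole file is never loaded at once.
var sections = new List<string>();
foreach (string line in File.ReadLines(path))
{
    if (line.StartsWith("##"))
        sections.Add(line.Substring(2).Trim());
}
File.Delete(path);

Console.WriteLine(string.Join(", ", sections));
```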

string content = File.ReadAllText(<filename>);


Then use this regular expression on the content:

Regex expression = new Regex(@"##(?<parent>\w+)\r\n(?<children>[\w ]+)\r\n((?<values>[\S ]+)(\r\n(?!#)|$))+", RegexOptions.None);


This expression gives you three named groups, of which the 'values' group has one or more captures.
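To see what the three named groups contain, the expression can be run on a short sample. The sketch below uses a slightly modified pattern, which is an assumption rather than the answer's exact expression: \r?\n accepts both Windows and Unix line endings, and \z allows the last value line to end the input without a trailing newline. Because the values group sits inside a repeated construct, Group.Value holds only the last match, while Group.Captures holds every value line.

```csharp
using System;
using System.Text.RegularExpressions;

string content =
    "##Main\n" +
    "MachineName DomainName Scandate GUID RegID\n" +
    "RMM-LT-417@*@Home.LOCAL@*@03/23/2015 18:48:38@*@31c0841e-f7bf-4de1-9d75-7e9080498e6b-20141216020243430495@*@2853625\n" +
    "\n" +
    "##AP_LogicalDrivesInformation\n" +
    "Caption Description DriveType FileSystem FreeSpace UsedSpace Size CFreeSpace CUsedSpace VolumeName VolumeSerialNumber\n" +
    "C@*@Hard Drive@*@Drive Fixed@*@NTFS@*@421002715136@*@79097778176@*@500100493312@*@392.1 GB@*@73.7 GB@*@OS@*@469D6D66\n" +
    "D@*@Hard Drive@*@Drive Fixed@*@NTFS@*@484706508800@*@4765618176@*@489472126976@*@451.4 GB@*@4.4 GB@*@DATAPART@*@7A9EDCCC";

// Variant of the pattern: \r?\n accepts both line-ending styles,
// and \z lets the last value line end the input without a newline.
var expression = new Regex(
    @"##(?<parent>\w+)\r?\n(?<children>[\w ]+)\r?\n((?<values>[\S ]+)(\r?\n(?!#)|\z))+");

foreach (Match m in expression.Matches(content))
{
    // A group inside a repeated construct keeps every capture, not just the last one.
    Console.WriteLine($"{m.Groups["parent"].Value}: {m.Groups["values"].Captures.Count} value line(s)");
}
```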
Then loop through all matches and create the XML structure.
I have used XElement, but XmlDocument can also be used.

XElement xeRoot = new XElement("root");   // Change the name 'root' to whatever suitable
foreach (Match m in expression.Matches(content))
{
    XElement xeParent = new XElement(m.Groups["parent"].Value);

    string[] children = m.Groups["children"].Value.Split(' ');      // Use space as delimiter for the children
    
    foreach (Capture cap in m.Groups["values"].Captures)
    {
        XElement xeChild = new XElement("child");   // Change the name 'child' to whatever suitable
        string[] values = cap.Value.Split(new string[] { "@*@" }, StringSplitOptions.None);

        // Check that the children and values counts are equal
        if (children.Length != values.Length)
            throw new Exception("The number of children and values mismatch.");
      
        for (int i = 0; i < children.Length; i++)
        {
            XElement xeChildValue = new XElement(children[i]);
            xeChildValue.Value = values[i];
            xeChild.Add(xeChildValue);
        }

        xeParent.Add(xeChild);
    }
    
    xeRoot.Add(xeParent);
}
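The same structure can also be built with LINQ to XML functional construction, where Enumerable.Zip pairs each field name with its value. This is an alternative style rather than the answer's code; BuildSection is a hypothetical helper, and child is the same placeholder element name used above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Builds <section><child>...</child>...</section> from a header line and value lines.
// BuildSection is a hypothetical helper for illustration.
XElement BuildSection(string section, string header, string[] valueLines)
{
    string[] names = header.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    return new XElement(section,
        valueLines.Select(line =>
        {
            string[] values = line.Split(new[] { "@*@" }, StringSplitOptions.None);
            if (values.Length != names.Length)
                throw new FormatException("The number of children and values mismatch.");
            // Zip pairs the i-th name with the i-th value.
            return new XElement("child", names.Zip(values, (n, v) => new XElement(n, v)));
        }));
}

XElement main = BuildSection(
    "Main",
    "MachineName DomainName Scandate GUID RegID",
    new[] { "RMM-LT-417@*@Home.LOCAL@*@03/23/2015 18:48:38@*@31c0841e-f7bf-4de1-9d75-7e9080498e6b-20141216020243430495@*@2853625" });

Console.WriteLine(main);
```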


Finally, save the XML data to a file:

XDocument doc = new XDocument();
doc.Add(xeRoot);
doc.Save(@"C:\Temp\test.xml");
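The hard-coded C:\Temp path only works if that folder already exists. A small sketch, assuming a temporary file is acceptable for testing, that saves to a path built with Path.GetTempPath and reads the file back; with default options, XDocument.Save writes an XML declaration like the one in the resulting XML.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

var doc = new XDocument(
    new XElement("root",
        new XElement("Main",
            new XElement("child", new XElement("RegID", "2853625")))));

// Path.GetTempPath returns the current user's temporary folder,
// unlike a hard-coded C:\Temp, which must already exist.
string path = Path.Combine(Path.GetTempPath(), "test.xml");
doc.Save(path);

string[] lines = File.ReadAllLines(path);
XDocument reloaded = XDocument.Load(path);
File.Delete(path);

Console.WriteLine(lines[0]);   // the XML declaration written by Save
```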



Resulting XML

<?xml version="1.0" encoding="utf-8"?>
<root>
  <Main>
    <child>
      <MachineName>RMM-LT-417</MachineName>
      <DomainName>Home.LOCAL</DomainName>
      <Scandate>03/23/2015 18:48:38</Scandate>
      <GUID>31c0841e-f7bf-4de1-9d75-7e9080498e6b-20141216020243430495</GUID>
      <RegID>2853625</RegID>
    </child>
  </Main>
  <AP_LogicalDrivesInformation>
    <child>
      <Caption>C</Caption>
      <Description>Hard Drive</Description>
      <DriveType>Drive Fixed</DriveType>
      <FileSystem>NTFS</FileSystem>
      <FreeSpace>421002715136</FreeSpace>
      <UsedSpace>79097778176</UsedSpace>
      <Size>500100493312</Size>
      <CFreeSpace>392.1 GB</CFreeSpace>
      <CUsedSpace>73.7 GB</CUsedSpace>
      <VolumeName>OS</VolumeName>
      <VolumeSerialNumber>469D6D66</VolumeSerialNumber>
    </child>
    <child>
      <Caption>D</Caption>
      <Description>Hard Drive</Description>
      <DriveType>Drive Fixed</DriveType>
      <FileSystem>NTFS</FileSystem>
      <FreeSpace>484706508800</FreeSpace>
      <UsedSpace>4765618176</UsedSpace>
      <Size>489472126976</Size>
      <CFreeSpace>451.4 GB</CFreeSpace>
      <CUsedSpace>4.4 GB</CUsedSpace>
      <VolumeName>DATAPART</VolumeName>
      <VolumeSerialNumber>7A9EDCCC</VolumeSerialNumber>
    </child>
    <child>
      <Caption>E</Caption>
      <Description>MATSHITA DVD+-RW UJ8E2</Description>
      <DriveType>Cd-Rom</DriveType>
      <FileSystem></FileSystem>
      <FreeSpace>0</FreeSpace>
      <UsedSpace>0</UsedSpace>
      <Size>0</Size>
      <CFreeSpace>0</CFreeSpace>
      <CUsedSpace>0</CUsedSpace>
      <VolumeName></VolumeName>
      <VolumeSerialNumber>0</VolumeSerialNumber>
    </child>
  </AP_LogicalDrivesInformation>
</root>

