Parse an XML file without changing its encoding or its formatting

Problem description

The original XML file is encoded as UTF-8 without a BOM:

<?xml version="1.0" encoding="UTF-8"?>
<some_text>
    <ada/>
    <file/>
    <title><![CDATA[]]></title>
    <code/>
    <parathrhseis/>
</some_text>

I try to set the text of the title element in this function:

Dim myXmlDocument As New XmlDocument()
Dim s As String = "name.xml"

' Nothing to do if the file is missing.
If Not System.IO.File.Exists(s) Then
    Return False
End If

myXmlDocument.Load(s)

Try
    ' Find the <title> element and set the text of its CDATA child.
    For Each child As XmlNode In myXmlDocument.DocumentElement.ChildNodes
        If child.Name = "title" Then
            child.FirstChild.InnerText = "text"
            Exit For
        End If
    Next

    ' Write the modified document back to the same file.
    myXmlDocument.Save(s)
Catch e As Exception
    MsgBox("Error in XmlParsing: " & e.Message)
    Return False
End Try

Return True

The text is written correctly, but the encoding changes to UTF-8 with a BOM, and a space is added before the "/>" of every self-closing element:

<?xml version="1.0" encoding="UTF-8"?>
<some_text>
    <ada /> <- here
    <file /> <- here
    <title><![CDATA[text]]></title>
    <code /> <- here
    <parathrhseis /> <- here
</some_text>

How can I fix these problems?

SOLUTION (with the help of Bradley Uffner)

Dim fileReader As String

Try
    fileReader = My.Computer.FileSystem.ReadAllText("original.xml")

    fileReader = fileReader.Replace("<ada />", "<ada/>")
    fileReader = fileReader.Replace("<file />", "<file/>")
    fileReader = fileReader.Replace("<code />", "<code/>")
    fileReader = fileReader.Replace("<parathrhseis />", "<parathrhseis/>")

    ' File.WriteAllText writes UTF-8 without a BOM by default.
    File.WriteAllText("copy.xml", fileReader)
Catch ex As Exception
    MsgBox("Error: " & ex.Message)
    Return
End Try

Recommended answer

This actually isn't a problem with parsing the file; it's a problem with saving it.

See this post for how to save XML without a BOM: "XDocument: saving XML to file without BOM".

The relevant code is:

Using writer = New XmlTextWriter(".\file.xml", New UTF8Encoding(False))
    doc.Save(writer)
End Using
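
To confirm that a saved file really has no BOM, one option is to inspect its first three bytes. The sketch below is not part of the original answer; the module and function names (BomCheckSketch, HasUtf8Bom) are hypothetical, and it assumes only the standard System.IO file API.

```vbnet
Imports System.IO

Module BomCheckSketch
    ' Hypothetical helper: returns True when the file starts with the
    ' UTF-8 byte order mark (bytes EF BB BF).
    Function HasUtf8Bom(filePath As String) As Boolean
        Dim head(2) As Byte ' three elements: indexes 0 to 2
        Using fs As FileStream = File.OpenRead(filePath)
            Dim count As Integer = fs.Read(head, 0, head.Length)
            Return count = 3 AndAlso
                   head(0) = &HEF AndAlso
                   head(1) = &HBB AndAlso
                   head(2) = &HBF
        End Using
    End Function
End Module
```

Calling HasUtf8Bom(".\file.xml") after the save above would be expected to return False if the writer was created with New UTF8Encoding(False).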

Typically you can control the formatting of the document via the .Settings property of the XmlTextWriter, but I don't see a property that controls the spacing of self-closing elements. You might have better luck post-processing the output: save it to a stream, manually remove any spaces before "/>", and only then write it to the filesystem.
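
The post-processing idea can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the module and helper names (XmlSaveSketch, SaveCompact) are hypothetical, and it assumes the file and element names from the question. It also sets PreserveWhitespace to True before loading so that the original indentation survives the round trip, which is an assumption about what "preserving the file format" should mean here.

```vbnet
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions
Imports System.Xml

Module XmlSaveSketch
    ' Hypothetical helper: serializes the document in memory, removes the
    ' whitespace before "/>", then writes the result as UTF-8 without a BOM.
    Sub SaveCompact(doc As XmlDocument, filePath As String)
        Dim utf8NoBom As New UTF8Encoding(False)
        Dim serialized As String

        Using memory As New MemoryStream()
            Using writer As New XmlTextWriter(memory, utf8NoBom)
                doc.Save(writer)
            End Using
            ' ToArray still works after the writer has closed the stream.
            serialized = utf8NoBom.GetString(memory.ToArray())
        End Using

        ' Caveat: this plain text replace would also rewrite " />" inside
        ' CDATA sections or comments, should the document contain any.
        serialized = Regex.Replace(serialized, "\s+/>", "/>")
        File.WriteAllText(filePath, serialized, utf8NoBom)
    End Sub

    Sub Main()
        Dim doc As New XmlDocument()
        doc.PreserveWhitespace = True ' keep the original line breaks and indentation
        doc.Load("name.xml")

        ' Set the text of the CDATA child of <title>, as in the question.
        Dim title As XmlNode = doc.SelectSingleNode("/some_text/title")
        If title IsNot Nothing AndAlso title.FirstChild IsNot Nothing Then
            title.FirstChild.InnerText = "text"
        End If

        SaveCompact(doc, "name.xml")
    End Sub
End Module
```

Serializing through a MemoryStream rather than a StringWriter is deliberate: the document is handed to an XmlWriter created with the UTF-8 encoding, so the encoding="UTF-8" declaration from the source file is written back unchanged.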
