Keep HTML tags in XML using LINQ to XML

Problem description

I have an XML file from which I am extracting HTML using LINQ to XML. This is a sample of the file:

<?xml version="1.0" encoding="utf-8" ?>
<tips>
    <tip id="0">
        This is the first tip.
    </tip>
    <tip id="1">
        Use <b>Windows Live Writer</b> or <b>Microsoft Word 2007</b> to create and publish content.
    </tip>
    <tip id="2">
        Enter a <b>url</b> into the box to automatically screenshot and index useful webpages.
    </tip>
    <tip id="3">
        Invite your <b>colleagues</b> to the site by entering their email addresses.  You can then share the content with them!
    </tip>
</tips>

I am using the following query to extract a "tip" from the file:

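// tipsXml is the loaded document; nextTipId is the id of the tip to fetch.
Tip tip = (from t in tipsXml.Descendants("tip")
           where t.Attribute("id").Value == nextTipId.ToString()
           select new Tip()
           {
               // Value concatenates the text nodes only, so the <b> markup is lost.
               TipText = t.Value,
               TipId = nextTipId
           }).First();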

The problem is that the HTML elements are being stripped out. I was hoping for something like InnerHtml to use instead of Value, but that doesn't seem to exist.

Any ideas?

Thanks in advance,

Dave

Recommended answer

Call t.ToString() instead of Value. That will return the XML as a string. You may want to use the overload taking SaveOptions to disable formatting. I can't check right now, but I suspect it will include the outer tip element tag (and its attributes), so you would need to strip this off.
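To make the difference between Value and ToString() concrete, here is a minimal sketch. The class name ToStringDemo, the helper OuterXml, and the shortened sample tip are made up for illustration; only XElement.Parse, Value, and ToString(SaveOptions) come from LINQ to XML.

```csharp
using System;
using System.Xml.Linq;

public static class ToStringDemo
{
    // Hypothetical helper: returns the element's full markup, outer tag
    // included, without any added indentation or line breaks.
    public static string OuterXml(XElement element)
    {
        return element.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Shortened stand-in for one <tip> from the question's file.
        XElement tip = XElement.Parse("<tip id=\"1\">Use <b>Word</b> now.</tip>");

        Console.WriteLine(tip.Value);      // Use Word now.
        Console.WriteLine(OuterXml(tip));  // <tip id="1">Use <b>Word</b> now.</tip>
    }
}
```

The second line of output keeps the <b> markup but also carries the enclosing tip tag and its id attribute, which is the part that would still need to be removed.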

Note that if your HTML isn't valid XML, you will end up with an invalid overall XML file.
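As a rough illustration of that risk, the sketch below checks whether a string parses as XML. The class WellFormedCheck and the two sample strings are assumptions for the example; the unclosed <br> stands in for any HTML that is legal in a browser but not well-formed XML.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class WellFormedCheck
{
    // Hypothetical helper: true if the text parses as a well-formed XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // The parser rejects the whole document, not just the offending tip.
            return false;
        }
    }

    public static void Main()
    {
        string closed = "<tips><tip id=\"0\">Line one<br/>Line two</tip></tips>";
        string unclosed = "<tips><tip id=\"0\">Line one<br>Line two</tip></tips>";

        Console.WriteLine(IsWellFormed(closed));    // True
        Console.WriteLine(IsWellFormed(unclosed));  // False
    }
}
```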

Is the format of the XML file completely out of your control? It would be nicer for any HTML inside to be XML-encoded.
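If the file format can be changed, a sketch of what XML-encoding buys you follows. The class EncodedTipDemo, the helper ReadTip, and both sample strings are hypothetical; the CDATA variant is an alternative encoding not mentioned in the answer. In both cases the HTML is stored as plain text, so the existing Value-based query would return it with the tags intact.

```csharp
using System;
using System.Xml.Linq;

public static class EncodedTipDemo
{
    // Hypothetical helper: parses a single element and returns its text content.
    public static string ReadTip(string xml)
    {
        return XElement.Parse(xml).Value;
    }

    public static void Main()
    {
        // HTML stored with entity escapes, so the parser sees only text.
        string escaped = "<tip id=\"1\">Use &lt;b&gt;Word&lt;/b&gt; now.</tip>";

        // The same HTML stored in a CDATA section (an assumption, not from the answer).
        string cdata = "<tip id=\"1\"><![CDATA[Use <b>Word</b> now.]]></tip>";

        Console.WriteLine(ReadTip(escaped));  // Use <b>Word</b> now.
        Console.WriteLine(ReadTip(cdata));    // Use <b>Word</b> now.
    }
}
```

Stored this way, the HTML no longer has to be well-formed XML for the file to load.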

One way of avoiding getting the outer part might be to do something like this (in a separate method called from your query, of course):

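// element is the <tip> XElement; requires System.Text and System.Xml.Linq.
StringBuilder builder = new StringBuilder();
foreach (XNode node in element.Nodes())
{
    // Each child node (text or element) is serialized with its own markup.
    builder.Append(node.ToString());
}
string innerXml = builder.ToString();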

That way you'll get HTML elements with their descendants and interspersed text nodes. Basically it's the equivalent of InnerXml, I strongly suspect.
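Putting the pieces together, the child-node approach could be wired into the original query roughly as follows. This is a sketch under assumptions: the Tip class here only mirrors the two properties used in the question, and TipReader, InnerXml, and GetTip are illustrative names. It uses string.Concat in place of the StringBuilder loop, disables formatting, and trims the surrounding whitespace that the sample file has inside each tip.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Assumed shape of the question's Tip type.
public class Tip
{
    public string TipText { get; set; }
    public int TipId { get; set; }
}

public static class TipReader
{
    // Concatenates the markup of every child node, leaving out the outer tag.
    public static string InnerXml(XElement element)
    {
        return string.Concat(
            element.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    public static Tip GetTip(XDocument tipsXml, int nextTipId)
    {
        return (from t in tipsXml.Descendants("tip")
                // The string cast yields null instead of throwing if id is missing.
                where (string)t.Attribute("id") == nextTipId.ToString()
                select new Tip
                {
                    TipText = InnerXml(t).Trim(),
                    TipId = nextTipId
                }).First();
    }

    public static void Main()
    {
        XDocument tipsXml = XDocument.Parse(
            "<tips><tip id=\"1\">\n    Use <b>Word</b> or <b>Writer</b> now.\n</tip></tips>");

        Console.WriteLine(GetTip(tipsXml, 1).TipText);  // Use <b>Word</b> or <b>Writer</b> now.
    }
}
```

Note that First() still throws if no tip has the requested id, as in the original query.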
