How to remove opening tags if ending tags are not there?


Question




Hi,

I have a small issue with this code:

<ul>
    <li><u><em><strong>Hi    </strong></em></u></li>
    <li><u><em><strong>Hello    </strong></em></u></li>
    <li><u><em><strong>How r u.      </li>
</ul>










In this code, the third list item has no closing tags (</strong></em></u>).

So what I want is to remove these opening tags (<u><em><strong>) from the third list item.



Thanks in advance.

Recommended answer

OK, this is a bit of algorithmic fun. I'll tell you how I'd go about it, but you'll have to figure out the code yourself.

You need to combine a forward-reading parser with recursion, using either a recursive object or a recursive function call. On every opening tag, go one level deeper into the recursion and store the text from that point onward in your new object (or stack parameter). Terminate each level of recursion when you read any closing tag. If the closing tag matches the opening tag, include both tags in the result; otherwise, drop the opening tag, return only the text read at that level, and hand the closing tag up to the level above so it can be matched there.

This way, every piece of text comes back either wrapped in its matching tags or on its own, which is exactly your requirement: an opening tag with no closing tag is removed. Hope that makes some sort of sense...
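The answer above deliberately gives no code, so what follows is not the answerer's implementation. It is a minimal sketch of the recursive scheme under stated assumptions, written in Python for brevity rather than the thread's C#. The names `parse`, `tag_name` and `remove_unclosed_tags` are hypothetical. It assumes simple markup: tags are found with a plain `<[^>]*>` regular expression, text never contains a bare `<`, void elements such as `<br/>` are written self-closing, and comments or `<!DOCTYPE>` are not handled.

```python
import re

TOKEN_RE = re.compile(r"(<[^>]*>)")  # the capture group makes re.split keep the tags


def tag_name(tag):
    # "<strong>" -> "strong", "</LI>" -> "li", "<a href='x'>" -> "a"
    parts = tag.strip("</> ").split()
    return parts[0].lower() if parts else ""


def parse(tokens, i, opener=None):
    """Read tokens from index i until a closing tag ends this level.

    Returns (text, i, closer): the cleaned content of this level, the index
    after the last consumed token, and the closing tag that ended the level
    if it did NOT belong to `opener` (None otherwise).
    """
    out = []
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok.startswith("<") and not tok.startswith("</") and not tok.endswith("/>"):
            inner, i, tok = parse(tokens, i, tok)  # opening tag: go one level deeper
            out.append(inner)
            if tok is None:
                continue  # the child level found its own closing tag
            # otherwise the child handed a closing tag up: treat it as read here
        if tok.startswith("</"):
            if opener is None:
                continue  # stray closing tag at the top level: drop it
            body = "".join(out)
            if tag_name(tok) == tag_name(opener):
                return opener + body + tok, i, None  # matched: keep both tags
            return body, i, tok  # unmatched: drop our opener, pass the closer up
        out.append(tok)  # plain text or a self-closing tag
    # End of input: an opener still waiting here was never closed, so drop it.
    return "".join(out), i, None


def remove_unclosed_tags(html):
    tokens = [t for t in TOKEN_RE.split(html) if t]
    return parse(tokens, 0)[0]
```

For the third list item, `remove_unclosed_tags("<li><u><em><strong>How r u.</li>")` returns `"<li>How r u.</li>"`: the `</li>` that ends the `<strong>` level does not match, so it is passed up through `<em>` and `<u>` until the `<li>` level claims it.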



I worked out a complete solution for you in my free time:

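using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        String HTMLStr = "<ul><li><u><em><strong>Hi    </strong></em></u></li><li><u><em><strong>Hello    </strong></em></u></li>    <li><u><em><strong>How r u.      </li></ul>";

        // Record every tag together with the index where it starts.
        Regex regex = new Regex("<[^>]*>");
        MatchCollection collection = regex.Matches(HTMLStr);
        List<coll> list = new List<coll>();
        foreach (Match match in collection)
        {
            list.Add(new coll() { POS = match.Index, TAG = match.Value });
        }

        // Each pass removes one opening tag from the list together with the first
        // closing tag of the same name that comes AFTER it; there are at most
        // Count / 2 such pairs. Tags are compared with spaces stripped, so this
        // assumes attribute-free tags; anything containing "/" (including
        // self-closing tags such as <br/>) is treated as a closing tag.
        for (int i = 0; i < collection.Count / 2; i++)
        {
            bool temp = false;
            foreach (coll col in list)
            {
                if (!col.TAG.Contains("/"))
                {
                    foreach (coll col1 in list)
                    {
                        if (col1.TAG.Contains("/") && col1.POS > col.POS &&
                            col.TAG.Replace(" ", "") == col1.TAG.Replace("/", "").Replace(" ", ""))
                        {
                            list.Remove(col);
                            list.Remove(col1);
                            temp = true;
                            break; // stop enumerating: the list has just been modified
                        }
                    }
                }
                if (temp)
                    break;
            }
        }

        // Whatever is left in the list has no partner. Remove those tags starting
        // from the END of the string; otherwise each removal would shift the
        // positions of the tags after it.
        foreach (coll col in list.OrderByDescending(c => c.POS))
        {
            HTMLStr = HTMLStr.Remove(col.POS, col.TAG.Length);
        }

        Console.WriteLine(HTMLStr);
    }
}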






Class coll

public class coll
{
    public int POS { get; set; }    // index where the tag starts in HTMLStr
    public string TAG { get; set; } // full tag text, e.g. "<em>" or "</em>"
}
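The C# answer above pairs tags by rescanning the list once per pair and then deletes whatever is left over. A stack gives a nesting-aware pairing in a single pass. The Python sketch below is a swapped-in technique, not the answerer's code, and is written in Python for brevity; `strip_unpaired_tags` and `_name` are hypothetical names. It assumes void elements such as `<br/>` are written self-closing and does not handle comments.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")


def _name(tag):
    # "<em>" -> "em", "</LI>" -> "li", "<a href='x'>" -> "a"
    parts = tag.strip("</> ").split()
    return parts[0].lower() if parts else ""


def strip_unpaired_tags(html):
    """Remove opening tags that are never closed and closing tags never opened."""
    stack = []     # (name, match) for opening tags still waiting for a partner
    unpaired = []  # regex matches whose text will be deleted
    for m in TAG_RE.finditer(html):
        tag = m.group()
        if tag.endswith("/>"):
            continue  # self-closing, e.g. <br/>: never needs a partner
        name = _name(tag)
        if not tag.startswith("</"):
            stack.append((name, m))  # opening tag: wait for its partner
        elif any(n == name for n, _ in stack):
            # Pop until the matching opener; every opener popped on the way
            # was never closed.
            while True:
                n, opener = stack.pop()
                if n == name:
                    break
                unpaired.append(opener)
        else:
            unpaired.append(m)  # closing tag with no opener at all
    unpaired.extend(opener for _, opener in stack)  # still open at end of input
    # Delete from the end backwards so earlier start/end offsets stay valid.
    for m in sorted(unpaired, key=lambda x: x.start(), reverse=True):
        html = html[:m.start()] + html[m.end():]
    return html
```

Deleting the recorded spans in descending order of `start()` is the same fix the C# version needs: deleting from the front would shift every later offset.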


You might want to consider using the HTMLAgilityPack:

HTMLAgilityPack at Codeplex

I don't know for sure whether that library has a method that fixes HTML with missing tags, but you might want to check it out.

