REGEX for only data and end tag


Problem description




I am looking for a regex that gives me the data along with the trailing tag.

For example:

Input:
-----------------
<p>ABC<p>
-----------------
The output would be:
-----------------
ABC<p>

-----------------

It should remove only the first <p> tag, not the second one, and all the text in between should stay the same.

Note that I am looking for a match on

<p>ABC<p> 

and not on

<p>ABC</p>

This is for a specific text file that has irregular <p> tags.

Example:

I have a big XHTML file like this:

<p>scet</p>
<p>sunny </p>
<p>             <!--this tag is to be removed -->
<p>              <!--this tag is to be removed -->
<p>mark</p>
<p>Thomas </p>

It is a complete XHTML file with body, head, and other tags. The only problem is the extra <p> tags. I am expecting output like this:

<p>scet</p>
<p>sunny </p>

<p>mark</p>
<p>Thomas </p>

Solution

This should work. Pass the HTML document in as the string xhtml. Note that each unclosed <p> tag is overwritten with spaces rather than deleted.

    using System;
    using System.Text;

    public static class XHTMLCleanerUpperThingy
    {
        private const string p = "<p>";
        private const string closingp = "</p>";

        public static string CleanUpXHTML(string xhtml)
        {
            StringBuilder builder = new StringBuilder(xhtml);
            int idx = 0;
            int current;

            // visit each <p> tag once, jumping from one tag to the next
            while ((current = xhtml.IndexOf(p, idx, StringComparison.Ordinal)) != -1)
            {
                int idxofnext = xhtml.IndexOf(p, current + p.Length, StringComparison.Ordinal);
                int idxofclose = xhtml.IndexOf(closingp, current, StringComparison.Ordinal);

                // the tag is unclosed if no </p> follows it at all,
                // or if the next <p> tag comes before the next </p> tag
                bool unclosed = idxofclose < 0
                    || (idxofnext >= 0 && idxofnext < idxofclose);

                if (unclosed)
                {
                    // blank the tag out with spaces so the indexes into xhtml stay valid
                    for (int j = 0; j < p.Length; j++)
                    {
                        builder[current + j] = ' ';
                    }
                }

                idx = current + p.Length;
            }

            return builder.ToString();
        }
    }
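
Since the question asked for a regex, here is a minimal sketch of one possible pattern, not part of the original answer. It assumes the tags are written exactly as lowercase <p> and </p> with no attributes, and the class name StrayParagraphTagRemover is made up for illustration. The pattern matches a <p> only when another <p> follows it before any </p>, so a final unclosed <p> is left alone, which is what the "<p>ABC<p>" example asks for.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class StrayParagraphTagRemover
{
    // Matches a <p> that is followed by another <p> before any </p>.
    // (?!</?p>). consumes one character at a time without crossing a <p> or </p> tag;
    // Singleline lets '.' match newlines so the next tag may be on a later line.
    private static readonly Regex StrayOpeningP = new Regex(
        @"<p>(?=(?:(?!</?p>).)*<p>)",
        RegexOptions.Singleline);

    // Deletes every stray opening tag and leaves all other text untouched.
    public static string Remove(string xhtml)
    {
        return StrayOpeningP.Replace(xhtml, string.Empty);
    }
}
```

With this sketch, StrayParagraphTagRemover.Remove("<p>ABC<p>") returns "ABC<p>", and "<p>ABC</p>" comes back unchanged. Unlike the IndexOf version, it deletes the tag instead of overwriting it with spaces. For XHTML with attributes on the tags or mixed case, the pattern would need adjusting, and a real parser is likely the safer choice.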
