Removing Code by using rereplace

Question

Hi, I am using the following code to remove content from a page whose markup I do not know in advance:

I am using regex and cannot use jsoup, so please do not provide any jsoup links or code, because they would be of no use to me here.

<cfset removetitle = rereplacenocase(cfhttp.filecontent, '<title[^>]*>(.+)</title>', "\1")>

Now, in the same way, I want to remove the following things:

1. <base href="http://search.google.com">
2. <link rel="stylesheet" href="mystyle.css">
3. There are 5 tables inside the body, and I want to remove the 2nd table.

Can anyone guide me on this?

Solution

Scott is right, and Leigh was right before when you asked a similar question: jSoup is your best option.

As for a regex solution: it is possible, but there are problems that regex cannot always solve. For instance, if the first or second table contains a nested table, this regex can trip. (Note that text is not required between the tables in the sample below; I'm just demonstrating that things can sit between the tables.)

(If there is always a nested table, regex can handle it, but if there is only sometimes a nested table, in other words if the structure is unknown, it gets a lot messier.)

<cfsavecontent variable="sampledata">
<body>
<table cellpadding="4"></table>stuff
is <table border="5" cellspacing="7"></table>between
<table border="3"></table>the
<table border="2"></table>tables
<table></table>
</body>
</cfsavecontent>

<cfset sampledata = rereplace(sampledata,"(?s)(.*?<table.*?>.*?<\/table>.*?)(<table.*?>.*?<\/table>)(.*)","\1\3","ALL") />
<cfoutput><pre>#htmleditformat(sampledata)#</pre></cfoutput>

What this does:

(?s) sets . to match newlines as well.

(.*?<table.*?>.*?<\/table>.*?) matches everything before the first table, the first table, and everything between it and the second table, and sets it as capture group 1.

(<table.*?>.*?<\/table>) matches the second table and creates capture group 2.

(.*) matches everything after the second table and creates capture group 3.

The third parameter, \1\3, then rebuilds the string from the first and third capture groups, which drops the second table.
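To make the nested-table caveat concrete, here is a minimal sketch that runs the same pattern against made-up sample data in which the second table contains a nested table. The variable names `nesteddata` and `tablepattern` and the `id` attributes are hypothetical and only there to make the output easier to read.

```cfml
<!--- Hypothetical sample: the second table (t2) wraps a nested table. --->
<cfsavecontent variable="nesteddata">
<body>
<table id="t1"></table>
<table id="t2"><tr><td><table id="inner"></table></td></tr></table>
<table id="t3"></table>
</body>
</cfsavecontent>

<!--- Same pattern as in the example above. The lazy .*? in capture group 2
      stops at the first closing table tag it reaches, which here belongs to
      the nested table rather than to t2. --->
<cfset tablepattern = "(?s)(.*?<table.*?>.*?<\/table>.*?)(<table.*?>.*?<\/table>)(.*)" />
<cfset nesteddata = rereplace(nesteddata, tablepattern, "\1\3", "ALL") />
<cfoutput><pre>#htmleditformat(nesteddata)#</pre></cfoutput>
```

With this input, capture group 2 should end at the inner closing tag, so the remaining `</td></tr></table>` of the second table would be left behind as orphaned markup. That is the kind of breakage the pattern cannot guard against when nesting is unpredictable.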

If you control the source document, you can wrap each table in HTML comments like this:

<!-- table1 -->
  <table>...</table>
<!-- /table1 -->

You can then match on those comments in the regex, which gives you a much more regex-friendly document.
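As a rough sketch of that idea, assuming the second table is wrapped in `<!-- table2 -->` and `<!-- /table2 -->` markers (the marker names, the sample markup, and the variable `markeddata` are all hypothetical), the regex only needs to find the two comments and no longer cares what the table contains:

```cfml
<!--- Hypothetical document: each table is wrapped in marker comments,
      and the second table contains a nested table. --->
<cfsavecontent variable="markeddata">
<body>
<!-- table1 -->
<table><tr><td>keep</td></tr></table>
<!-- /table1 -->
<!-- table2 -->
<table><tr><td><table><tr><td>nested</td></tr></table></td></tr></table>
<!-- /table2 -->
</body>
</cfsavecontent>

<!--- Remove everything from the opening marker through the closing marker.
      (?s) lets . span newlines; the lazy .*? stops at the first closing marker. --->
<cfset markeddata = rereplace(markeddata, "(?s)<!-- table2 -->.*?<!-- /table2 -->", "", "ALL") />
<cfoutput><pre>#htmleditformat(markeddata)#</pre></cfoutput>
```

Because the match is anchored on the markers rather than on `<table>` tags, nested tables inside the marked block no longer matter. This only works if the markers are written exactly as the pattern expects, including spacing.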

Still, Scott said it best about not using the proper tool for the task:

That is like telling a carpenter, build me a house, but don't use a hammer.

These tools are created because programmers frequently run into precisely the problem you're having, and so they create a tool, and often freely share it, because it does the job much better.
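If regex really is the only option, items 1 and 2 from the question are the easier part, since `<base>` and `<link>` are single tags with no closing tag and no nested content. The following is a minimal sketch under stated assumptions: the sample markup and the variable name `pagehtml` are hypothetical, and in the asker's case the input would be `cfhttp.filecontent` instead.

```cfml
<!--- Hypothetical sample head section; in practice this would come from cfhttp.filecontent. --->
<cfsavecontent variable="pagehtml">
<head>
<title>Example</title>
<base href="http://search.google.com">
<link rel="stylesheet" href="mystyle.css">
</head>
</cfsavecontent>

<!--- Remove any base tag: match from "<base" up to the next ">".
      \b keeps longer tag names such as "basefont" from matching. --->
<cfset pagehtml = rereplacenocase(pagehtml, "<base\b[^>]*>", "", "ALL") />

<!--- Remove only link tags whose attributes mention "stylesheet",
      leaving other link tags (icons and so on) in place. --->
<cfset pagehtml = rereplacenocase(pagehtml, "<link\b[^>]*stylesheet[^>]*>", "", "ALL") />

<cfoutput><pre>#htmleditformat(pagehtml)#</pre></cfoutput>
```

Both patterns assume no attribute value contains a literal `>`. If one does, `[^>]*` would stop early, which is another case where a real parser is more reliable.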
