regex: selecting everything but img tag


Problem description

I'm trying to select some text using regular expressions leaving all img tags intact.

I've found the following code that selects all img tags:

/<img[^>]+>/g

But given text like this:

This is an untagged text.
<p>this is my paragraph text</p>
<img src="http://example.com/image.png" alt=""/>
<a href="http://example.com/">this is a link</a>

using the pattern above selects only the img tag:

/<img[^>]+>/g #--> using this code will result in:
<img src="http://example.com/image.png" alt=""/>

But I would like a regex that selects everything except the image, like this:

/magical regex/g # --> results in:
This is an untagged text.
<p>this is my paragraph text</p>
<a href="http://example.com/">this is a link</a>

I've also found this code:

/<(?!img)[^>]+>/g

which selects all tags except the img one. However, in some cases I will have untagged text or text between tags, and this pattern does not match that text, so it won't work for my case. :(
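
To illustrate the limitation described above, here is a minimal sketch in JavaScript (an assumption, since the question itself targets Yahoo Pipes) that runs the negative-lookahead pattern against the sample text. The variable names `sample` and `nonImgTags` are made up for illustration.

```javascript
// Sample text from the question, joined with newlines.
const sample = [
  'This is an untagged text.',
  '<p>this is my paragraph text</p>',
  '<img src="http://example.com/image.png" alt=""/>',
  '<a href="http://example.com/">this is a link</a>',
].join('\n');

// Negative lookahead: match any tag whose name does not start with "img".
const nonImgTags = sample.match(/<(?!img)[^>]+>/g);

// Only the tags themselves are matched; the untagged text and the text
// between tags are not part of any match.
console.log(nonImgTags);
// → [ '<p>', '</p>', '<a href="http://example.com/">', '</a>' ]
```

The pattern matches the tag delimiters only, which is why it cannot select "everything but the image" on its own.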

Is there any way to do this? Sorry, I'm really new to regular expressions; I've been struggling for a few days trying to make it work, but I can't.

Thanks in advance


UPDATE:

OK, for those thinking I want to parse the HTML: sorry, I don't. I just want to select text.

Another thing: I'm not using any specific language. I'm using Yahoo Pipes, which only provides regex and some string tools to accomplish the job; it doesn't involve any programming code.

For better understanding, here is how the regex module works in Yahoo Pipes:

http://pipes.yahoo.com/pipes/docs?doc=operators#Regex


UPDATE 2

Fortunately, I'm able to strip the text near the img tag, but only step by step as @Blixt recommended, like:

<(?!img)[^>]+> , replace with "" #-> strips out every tag that is not img
(?s)^[^<]*(.*), replace with $1  #-> removes all the text before the img tag
(?s)^([^>]+>).*, replace with $1 #-> removes all the text after the img tag

The problem with this is that it only catches the first img tag; I would then have to handle the others manually by hard-coding them, so I'm still not sure this is the best solution.
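
The three steps and their limitation can be sketched in JavaScript. This is an assumption for illustration only: Yahoo Pipes applies the rules through its Regex module, not through code, and JavaScript has no inline `(?s)`, so the `s` (dotAll) flag stands in for it. The helper name `keepFirstImg` and the input string are hypothetical.

```javascript
// Hypothetical helper mirroring the three replacement rules above.
function keepFirstImg(html) {
  return html
    .replace(/<(?!img)[^>]+>/g, '') // 1. strip every tag that is not img
    .replace(/^[^<]*(.*)/s, '$1')   // 2. drop the text before the first "<"
    .replace(/^([^>]+>).*/s, '$1'); // 3. drop everything after the first ">"
}

// Made-up input containing two img tags.
const twoImages =
  'intro <img src="a.png"> middle <b>bold</b> <img src="b.png"> end';

console.log(keepFirstImg(twoImages));
// → <img src="a.png">   (the second img tag is lost)

// For contrast, where a global match is available it returns every img tag:
const allImgs = twoImages.match(/<img[^>]+>/g);
console.log(allImgs);
// → [ '<img src="a.png">', '<img src="b.png">' ]
```

Step 3 is anchored at the start of the string and keeps only the text up to the first `>`, which is why the chain can never return more than one tag.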

Solution

The regex you already have for finding the image tags can be used with a replace to get what you want.

Assuming you are using PHP:

// preg_replace replaces every match by default; PCRE has no "g" modifier.
$htmlWithoutIMG = preg_replace('/<img[^>]+>/', '', $html);

If you are using Javascript:

var htmlWithoutIMG = html.replace(/<img[^>]+>/g, '');

This takes your text, finds the <img> tags and replaces them with nothing, i.e. it deletes them from the text, leaving what you want. The < and > characters do not need escaping in these patterns.
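
As a quick check of the replace approach against the sample text from the question, here is a minimal JavaScript sketch. The helper name `removeImgTags` is hypothetical, and two details are additions not found in the answer: the `i` flag (so `<IMG ...>` is also removed) and the optional `\n?` (so no empty line is left where the tag was).

```javascript
// Hypothetical helper: delete every img tag, plus the line break after it.
function removeImgTags(html) {
  // "g" replaces all matches; "i" and "\n?" are assumptions added here.
  return html.replace(/<img[^>]+>\n?/gi, '');
}

// Sample text from the question, joined with newlines.
const sample = [
  'This is an untagged text.',
  '<p>this is my paragraph text</p>',
  '<img src="http://example.com/image.png" alt=""/>',
  '<a href="http://example.com/">this is a link</a>',
].join('\n');

console.log(removeImgTags(sample));
// This is an untagged text.
// <p>this is my paragraph text</p>
// <a href="http://example.com/">this is a link</a>
```

Unlike the step-by-step rules, this handles any number of img tags in one pass, because it removes what is unwanted instead of trying to match everything else.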
