Problem with Ruby Regular Expression

Question

I have this HTML code, all on a single line:

<h3 class='r'><a href="www.google.com">fkdsafjldsajl</a></h3><h3 class='r'><a href="www.google.com">fkdsafjldsajl</a></h3>

Here is a line-friendly version (which I can't use):

<h3 class='r'><a href="www.google.com">fkdsafjldsajl</a></h3>
<h3 class='r'><a href="www.google.com">fkdsafjldsajl</a></h3>

I'm trying to extract just the URLs with this regex:

/<h3 class="r"><a href="(.*)">(.*)<\/a>/

It returns:

www.google.com">fkdsafjldsajl</a></h3><h3 class='r'><a href="www.google.com"

What can I do to make the match stop when it finds a " ?

Recommended answer

Sigh. Regex and HTML are such awkward bedfellows. An HTML parser such as Nokogiri is a better fit:

require 'nokogiri'

html = %q{<h3 class='r'><a href="www.google.com">fkdsafjldsajl</a></h3><h3 class='r'><a href="www.google.com">fkdsafjldsajl</a></h3>}
doc = Nokogiri::HTML(html)
puts doc.css('a').map{ |a| a['href'] }
# >> www.google.com
# >> www.google.com

This will find every link, whether the elements are deeply nested or all on one line.
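
If a parser is not an option, the behaviour in the question can still be explained and worked around with a regex alone. The `(.*)` in the original pattern is greedy, so it runs to the last `">` on the line instead of stopping at the first closing quote. The sketch below is only an illustration, not part of the original answer. It assumes every `href` value is double-quoted and contains no embedded double quotes, and it accepts either quote style around the class name because the sample HTML uses `class='r'` while the original pattern expects `class="r"`. The variable names are hypothetical.

```ruby
html = %q{<h3 class='r'><a href="www.google.com">fkdsafjldsajl</a></h3><h3 class='r'><a href="www.google.com">fkdsafjldsajl</a></h3>}

# Greedy: (.*) consumes as much as it can, so the capture runs
# up to the LAST `">` on the line and swallows both links.
greedy = html[/<a href="(.*)">/, 1]

# Negated character class: [^"]* cannot cross a double quote,
# so each capture ends at the closing quote of its own href.
urls = html.scan(/<h3 class=['"]r['"]><a href="([^"]*)">/).flatten

# Lazy quantifier: (.*?) stops at the first `">` it can reach.
lazy_urls = html.scan(/<a href="(.*?)">/).flatten

puts urls
```

`String#scan` returns one sub-array per match when the pattern has a capture group, hence the `flatten`. The negated character class is usually the safer of the two fixes, since it states directly which character ends the value. Neither form copes with single-quoted or unquoted attributes, attributes in a different order, or extra whitespace, which is why the parser-based answer above is the more robust choice.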
