How do I remove HTTP links with ActiveSupport's "starts_with?" using Nokogiri?
Question
When I try this:
item.css("a").each do |a|
  if !a.starts_with? 'http://'
    a.replace a.content
  end
end
I get:
NoMethodError: undefined method 'starts_with?' for #<Nokogiri::XML::Element:0x1b48a60>
I'm sure there is a cleaner way, but this seems to work:
item.css("a").each do |a|
  unless a["href"].blank?
    if !a["href"].starts_with? 'http://'
      a.replace a.content
    end
  end
end
Recommended answer
The problem is you're trying to use the starts_with? method on an object that doesn't implement it. ActiveSupport adds starts_with? to String, but a here is a Nokogiri node, not a String.
item.css("a").each do |a|
will yield XML nodes in a. Those are Nokogiri objects (Nokogiri::XML::Element), not strings. What you want to do is get at the text you actually need to check, the node's href attribute, which can be accessed like this:
a['href']
So, you want to use something like this:
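item.css("a").each do |a|
  # a is a Nokogiri element; check its href attribute (a String) instead.
  # to_s turns a missing href (nil) into "" so start_with? never fails.
  # (ActiveSupport's starts_with? is an alias for core Ruby's start_with?.)
  unless a['href'].to_s.start_with?('http://')
    a.replace(a.content)
  end
end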
The downside to this is that you iterate over every <a> tag in the document and test each one in Ruby, which can be slow on a big page with lots of links.
An alternate way to go about it is to use XPath's starts-with function:
require 'nokogiri'
item = Nokogiri::HTML('<a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>')
puts item.to_html
Output:
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>
>> <a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>
>> </body></html>
Here's how to do it using XPath:
item.search('//a[not(starts-with(@href, "http://"))]').each do |a|
  a.replace(a.content)
end
puts item.to_html
Which outputs:
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>foo<a href="http://bar">bar</a>
>> </body></html>
The advantage to using XPath to find the nodes is that the matching all runs in compiled C, rather than in Ruby.