How to parse og meta tags using HTTParty in Rails 3
Question
I am trying to parse og meta tags with the HTTParty gem, using this code:
require 'httparty'

link = 'http://www.usatoday.com/story/gameon/2013/01/08/nfl-jets-tony-sparano-fired/1817037/'
# link = 'http://news.yahoo.com/chicago-lottery-winners-death-ruled-homicide-181627271.html'
resp = HTTParty.get(link)
ret_body = resp.body
# title
og_title = ret_body.match(/\<[Mm][Ee][Tt][Aa] property\=\"og:title\"\ content\=\"(.*?)\"\/\>/)
og_title = og_title[1].to_s  # raises NoMethodError when match returned nil
The problem is that it works on some sites (Yahoo!) but not on others (USA Today).
Recommended answer
Don't parse HTML with regular expressions, because they're too fragile for anything but the simplest problems. A tiny change to the HTML can break the pattern, causing you to begin a slow battle of maintaining an ever expanding pattern. It's a war you won't win.
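A small offline illustration of that fragility: the question's own pattern, unchanged, run against four hypothetical spellings of the same tag. These variants are made up for the demo and are not a claim about what USA Today actually served; they only show that markup a browser treats as equivalent can slip past a pattern written for one exact spelling.

```ruby
# The pattern from the question, unchanged. It only matches a tag written
# exactly as: <meta property="og:title" content="..."/>
OG_TITLE_RE = /\<[Mm][Ee][Tt][Aa] property\=\"og:title\"\ content\=\"(.*?)\"\/\>/

# Hypothetical markup variants (not taken from any real site). To an HTML
# parser or a browser, all four declare the same og:title.
VARIANTS = [
  '<meta property="og:title" content="Example"/>',  # self-closing, double quotes
  '<meta property="og:title" content="Example">',   # no trailing slash
  '<meta content="Example" property="og:title" />', # attributes swapped
  "<meta property='og:title' content='Example'>"    # single quotes
]

# String#[] with a regexp and a group index returns that capture, or nil.
RESULTS = VARIANTS.map { |html| [html, html[OG_TITLE_RE, 1]] }

RESULTS.each do |html, title|
  puts "#{title.inspect.ljust(10)} #{html}"
end
```

Only the first variant yields `"Example"`; the other three yield `nil`, which is exactly the case where `og_title[1]` in the question blows up.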
Instead, use an HTML parser. Ruby has Nokogiri, which is excellent. Here's how I'd do what you want:
require 'nokogiri'
require 'httparty'

%w[
  http://www.usatoday.com/story/gameon/2013/01/08/nfl-jets-tony-sparano-fired/1817037/
  http://news.yahoo.com/chicago-lottery-winners-death-ruled-homicide-181627271.html
].each do |link|
  resp = HTTParty.get(link)
  # Parse the whole page into a DOM instead of pattern-matching raw text
  doc = Nokogiri::HTML(resp.body)
  # CSS selector: first <meta> whose property attribute is "og:title"
  puts doc.at('meta[property="og:title"]')['content']
end
Which outputs:
Jets fire offensive coordinator Tony Sparano
Chicago lottery winner's death ruled a homicide