How to parse og meta tags using HTTParty for Rails 3

Question

I am trying to parse og meta tags using the HTTParty gem using this code:

require 'httparty'

link = 'http://www.usatoday.com/story/gameon/2013/01/08/nfl-jets-tony-sparano-fired/1817037/'
# link = 'http://news.yahoo.com/chicago-lottery-winners-death-ruled-homicide-181627271.html'
resp = HTTParty.get(link)
ret_body = resp.body

# title: match returns nil when the page's markup differs from the pattern
og_title = ret_body.match(/\<[Mm][Ee][Tt][Aa] property\=\"og:title\"\ content\=\"(.*?)\"\/\>/)
og_title = og_title[1].to_s

The problem is that it works on some sites (Yahoo!) but not on others (USA Today).

Recommended answer

Don't parse HTML with regular expressions, because they're too fragile for anything but the simplest problems. A tiny change to the HTML can break the pattern, drawing you into a slow battle to maintain an ever-expanding pattern. It's a war you won't win.
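
To make the fragility concrete, here is a minimal sketch that runs the pattern from the question against a few made-up variants of the same tag. The markup strings and the helper name og_title_by_regex are assumptions for illustration; the actual USA Today markup is not shown in the question, so this demonstrates the kind of difference that can break the pattern rather than the confirmed cause.

```ruby
# The pattern from the question, unchanged.
# Returns the captured title, or nil when the pattern does not match.
def og_title_by_regex(html)
  match = html.match(/\<[Mm][Ee][Tt][Aa] property\=\"og:title\"\ content\=\"(.*?)\"\/\>/)
  match && match[1]
end

# Hypothetical variants of one tag; an HTML parser treats all four alike.
variants = {
  'exact form the pattern expects' => '<meta property="og:title" content="Example"/>',
  'space before the closing slash' => '<meta property="og:title" content="Example" />',
  'attributes in the other order'  => '<meta content="Example" property="og:title"/>',
  'single-quoted attributes'       => "<meta property='og:title' content='Example'/>"
}

variants.each do |label, html|
  puts "#{label}: #{og_title_by_regex(html).inspect}"
end
```

Only the first variant yields a title; the other three return nil, and calling [1] on that nil raises NoMethodError.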

Instead, use an HTML parser. Ruby has Nokogiri, which is excellent. Here's how I'd do what you want:

require 'nokogiri'
require 'httparty'

%w[
  http://www.usatoday.com/story/gameon/2013/01/08/nfl-jets-tony-sparano-fired/1817037/
  http://news.yahoo.com/chicago-lottery-winners-death-ruled-homicide-181627271.html
].each do |link|
  resp = HTTParty.get(link)

  doc = Nokogiri::HTML(resp.body)
  puts doc.at('meta[property="og:title"]')['content']
end

Which outputs:

Jets fire offensive coordinator Tony Sparano
Chicago lottery winner's death ruled a homicide
