Get title, content via link in Rails

Views: 73 Published: 2020/5/25 0:29:45 Tags: ruby-on-rails ruby ruby-on-rails-3 parsing web-scraping

Question

I just started learning Rails. Could you help me understand how to parse a single link? A good tutorial would help too.

The question:

When you submit a link on Digg, Facebook, etc., the site parses the link after you attach it and fetches the title, content, and images for that particular URL. Could you help me understand how something similar can be implemented in Rails?

I have looked at feed parsers like feedzirra, but they seem to fetch the complete website feed, not just the single link we are interested in. Or am I making a mistake somewhere?

Thanks so much in advance.

Solution

Looks like you might be looking for something like Pismo: https://github.com/peterc/pismo

require 'pismo'

# Load a Web page (you could pass an IO object or a string with existing HTML data along, as you prefer)
doc = Pismo::Document.new('http://www.rubyinside.com/cramp-asychronous-event-driven-ruby-web-app-framework-2928.html')

doc.title     # => "Cramp: Asychronous Event-Driven Ruby Web App Framework"
doc.author    # => "Peter Cooper"
doc.lede      # => "Cramp (GitHub repo) is a new, asynchronous evented Web app framework by Pratik Naik of 37signals (and the Rails core team). It's built around Ruby's EventMachine library and was designed to use event-driven I/O throughout - making it ideal for situations where you need to handle a large number of open connections (such as Comet systems or streaming APIs.)"
doc.keywords  # => [["cramp", 7], ["controllers", 3], ["app", 3], ["basic", 2], ..., ... ]

One caveat about images: the image extraction only deals with images that have absolute URLs.
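If you collect image references yourself (for example from `img` tags in the fetched HTML), relative paths can be resolved against the page URL before you store or display them. The following is a minimal sketch using only Ruby's standard `uri` library. The helper name `absolutize_image_urls` and the example.com URLs are made up for illustration and are not part of Pismo.

```ruby
require 'uri'

# Hypothetical helper (not part of Pismo): resolve image references found on a
# page against the page's own URL so that every returned entry is absolute.
def absolutize_image_urls(page_url, image_srcs)
  resolved = image_srcs.map do |src|
    begin
      # URI.join handles root-relative, document-relative, and absolute inputs.
      URI.join(page_url, src.strip).to_s
    rescue URI::Error
      nil # skip references that cannot be parsed as URIs
    end
  end
  resolved.compact
end

# Example inputs are assumptions chosen for illustration.
page = 'http://www.example.com/articles/post.html'
srcs = ['/images/logo.png', 'thumb.jpg', 'http://cdn.example.com/a.png']
puts absolutize_image_urls(page, srcs)
```

With these inputs, the root-relative path resolves against the host, the bare filename resolves against the `/articles/` directory, and the already absolute URL is returned unchanged.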
