Parsing div elements with Nokogiri

Views: 110 | Published: 2018/6/25 18:35:45 | Tags: html ruby-on-rails ruby xpath nokogiri


Problem description

The following code successfully extracts tid and term data:

(answered generously by Uri Agassi)
    require 'nokogiri'
    require 'open-uri'

    (1..10).each do |i|
      # Ruby 3.0+ needs URI.open; Kernel#open no longer fetches URLs
      doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
      tids = doc.xpath("//div[contains(concat(' ', @class, ' '), ' thing ')]")
                .collect { |node| node['data-thing-id'] }
      terms = doc.xpath("//div[contains(concat(' ', @class, ' '), ' col_a ')]")
                 .collect { |node| node.text.strip }
      tids.zip(terms).each do |tid, term|
        puts "#{tid} #{term}"
      end
    end
from the following sample HTML:

    <div class="thing text-text" data-thing-id="29966403">
      <div class="thinguser"><i class="ico ico-water ico-blue"></i>
        <div class="status">in 7 days
        </div>
      </div>
      <div class="ignore-ui pull-right"><input type="check box" >
      </div>
      <div class="col_a col text">
        <div class="text">foobar
        </div>
      </div>
      <div class="col_b col text">
        <div class="text">foobar desc
        </div>
      </div>
    </div>
If I wanted to pull the status (the "in 7 days" string) in the same fashion, what's the best way to do that? I can't seem to figure it out.

Would someone be kind enough to explain in detail what the tids and terms assignment lines are actually doing? I don't get it and the Nokogiri documentation doesn't seem to cover this.

Big thanks in advance.

~Chris
Solution
I'm all about using CSS selectors in Nokogiri. Something like this should work:
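    require 'nokogiri'
    require 'open-uri'

    # i comes from the surrounding (1..10) loop in the question
    doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
    # '.status' selects class="status"; a bare 'status' would look for <status> elements.
    # at_css returns the first matching node; content is its text.
    seven_days = doc.at_css('.status').content.strip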
