Parsing div elements with Nokogiri


Problem description

The following code successfully extracts tid and term data:

(answered generously by Uri Agassi)

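require 'nokogiri'
require 'open-uri'

(1..10).each do |i|
  # URI.open (from open-uri) fetches the page; plain Kernel#open no longer opens URLs in Ruby 3+
  doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
  # every <div> whose class list contains the whole word "thing" -> its data-thing-id attribute
  tids  = doc.xpath("//div[contains(concat(' ', @class, ' '), ' thing ')]").map { |node| node['data-thing-id'] }
  # every <div> whose class list contains the whole word "col_a" -> its trimmed text
  terms = doc.xpath("//div[contains(concat(' ', @class, ' '), ' col_a ')]").map { |node| node.text.strip }

  tids.zip(terms).each do |tid, term|
    puts "#{tid} #{term}"
  end
end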

from the following sample HTML:

<div class="thing text-text" data-thing-id="29966403">
  <div class="thinguser"><i class="ico ico-water ico-blue"></i>
    <div class="status">in 7 days
    </div>
  </div>
  <div class="ignore-ui pull-right"><input type="checkbox">
  </div>
  <div class="col_a col text">
    <div class="text">foobar
    </div>
  </div>
  <div class="col_b col text">
    <div class="text">foobar desc
    </div>
  </div>
</div>

If I wanted to pull the status info (the "in 7 days" string) in the same fashion, what's the best way to do that? I can't seem to figure it out.

Would someone be kind enough to explain in detail what the tids and terms assignment lines are actually doing? I don't get it and the Nokogiri documentation doesn't seem to cover this.
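Each of the tids and terms lines does three things. `doc.xpath(...)` returns every `<div>` whose class attribute contains the given class name as a whole word. `collect` then maps each matched node to a value: the `data-thing-id` attribute for tids, and the stripped text for terms. Finally, `zip` pairs the two arrays element by element. The padding in `concat(' ', @class, ' ')` is what makes the match whole-word. The class string `"thing text-text"` becomes `" thing text-text "`, which contains `" thing "`. By contrast, `" thinguser "` does not. The pure-Ruby sketch below mimics that predicate on plain strings so the logic can be checked without fetching a page. The helper `class_includes?` is hypothetical and is not a Nokogiri method. The class list is copied by hand from the sample HTML.

```ruby
# Hypothetical helper mirroring the XPath predicate
#   contains(concat(' ', @class, ' '), ' NAME ')
# Padding both sides with spaces turns a substring test into a whole-word test.
def class_includes?(class_attr, name)
  " #{class_attr} ".include?(" #{name} ")
end

# class attributes of the <div> elements in the sample HTML, in document order
class_attrs = ["thing text-text", "thinguser", "status", "ignore-ui pull-right",
               "col_a col text", "text", "col_b col text", "text"]

padded = class_attrs.select { |c| class_includes?(c, "thing") }
naive  = class_attrs.select { |c| c.include?("thing") } # like contains(@class, 'thing')

puts padded.inspect # ["thing text-text"]
puts naive.inspect  # ["thing text-text", "thinguser"]

# zip pairs the i-th tid with the i-th term, as in the original loop
tids  = ["29966403"]
terms = ["foobar"]
tids.zip(terms).each { |tid, term| puts "#{tid} #{term}" } # 29966403 foobar
```

Without the padding, the `thinguser` div would also match. Since that div has no `data-thing-id`, it would add a stray `nil` to tids.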

Big thanks in advance.

~Chris

Solution

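I'm all about using CSS selectors in Nokogiri. Something like this should work:

doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
# '.status' matches class="status" (a bare 'status' would look for <status> tags); at_css returns the first match
seven_days = doc.at_css('.status').text.strip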
