nokogiri + mechanize css selector by text


Problem description

I am new to Nokogiri and so far most familiar with CSS selectors. I am trying to parse information from a table; below is a sample of the table and the code I'm using. I'm stuck on the appropriate if statement, as it seems to return the whole contents of the table.

Table:

<div class="holder">
  <div class="row">
    <div class="c1">
      <!-- Content I Don't Need -->
    </div>
    <div class="c2">
      <span class="data">
        <!-- Content I Don't Need -->
      </span>
    </div>
  </div>
  ...
  <div class="row">
    <div class="c1">
      SPECIFIC TEXT
    </div>
    <div class="c2">
      <span class="data">
        What I want
      </span>
    </div>
  </div>
</div>

My script (if SPECIFIC TEXT is found in the table, it returns every "div.c2 span.data" value, so I've either misunderstood do loops or if statements):

data = []
page.agent.get(url)
page.search('div.row').each do |row_data|
  if (row_data.search('div.c1:contains("/SPECIFIC TEXT/")').text.strip)
    temp = row_data.search('div.c2 span.data').text.strip
    data << temp
  end
end

Solution

There's no need to stop and insert Ruby logic when you can extract what you need with a single CSS selector.

data = page.search('div.row > div.c1:contains("SPECIFIC TEXT") + div.c2 span.data')

This will include only the elements that match the whole selector (i.e. the span.data elements inside the div.c2 that directly follows the div.c1 containing SPECIFIC TEXT).

Here's where your logic may have gone wrong:

This code

if (row_data.search('div.c1:contains("SPECIFIC TEXT")'...
  temp = row_data.search('div.c2 span.data')...

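first searches the row for the specific text, then, if the condition passes, collects every element matching the second query, which starts from the same row rather than from the matched cell. Note also that .text.strip returns a String, and in Ruby every String (even an empty one) is truthy, so the condition as written passes for every row. The key is the + in the CSS selector above, which selects only the element immediately following (i.e. the next sibling element). I'm making an assumption, of course, that the next element is always what you want.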
