Extracting elements with Nokogiri

Views: 41 Published: 2021/6/8 18:47:04 ruby-on-rails ruby ruby-on-rails-3 xpath nokogiri

Problem description

I was wondering if someone could help with the following. I am using Nokogiri to scrape some data from http://www.bbc.co.uk/sport/football/tables

I would like to get the league table info. So far I've got this:

def get_league_table # Get me Premier League Table
  doc = Nokogiri::HTML(open(FIXTURE_URL))
  table = doc.css('.table-stats')
  teams = table.xpath('following-sibling::*[1]').css('tr.team')
  teams.each do |team|
    position = team.css('.position-number').text.strip
    League.create!(position: position)
  end
end

So I thought I would grab the .table-stats element and then get each row in the table with a class of team. These rows contain all the info I need, such as position number, played, team name, etc.

Once I'm in the tr.team rows, I thought I could loop over them to grab the relevant info.

It's the XPath part I am stuck on (unless I'm approaching the whole thing wrong?): how do I get from .table-stats to the tr.team rows?

Could anyone offer any pointers please?

Thanks

Solution

This is a script I made to parse tables dynamically; I adapted it to your case:

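require 'open-uri'
require 'nokogiri'

url = 'http://www.bbc.co.uk/sport/football/tables'
# URI.open is used because Kernel#open no longer accepts URLs as of Ruby 3.0
doc = Nokogiri::HTML.parse(URI.open(url))
# Select every team row inside any table body
teams = doc.search('tbody tr.team')

# Derive the hash keys from the class attribute of each cell in the first row
keys = teams.first.search('td').map do |k|
  k['class'].gsub('-', '_').to_sym
end

# Build one hash per team by pairing each key with the matching cell's text
hsh = teams.map do |team|
  Hash[keys.zip(team.search('td').map(&:text))]
end

puts hsh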
