Find cell in html table containing a specific icon

Question

I am looking for code that tells me which cell of an html table contains a particular icon. Here is what I am working with:

u <- "http://www.transfermarkt.nl/lionel-messi/leistungsdaten/spieler/28003/saison/2014/plus/1"
doc <- rvest::html(u)
tab <- rvest::html_table(doc, fill = TRUE)[[6]]

The column "Pos." gives the player's position on the field. Some of these cells contain an additional icon. I can confirm that the icons are present on the page as follows:

rvest::html_nodes(doc, ".kapitaenicon-table")

but this doesn't tell me WHERE they are. I would like my code to return that the icon occurs in rows 2, 10, 11 and 27 of the "Pos." column in the table. How can I do that?

Recommended answer

A little bit more rvest and XPath magic can get you the indices:

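library(rvest)
library(magrittr)  # %>% and extract2()
library(XML)       # xpathSApply()

# html() was rvest's page reader at the time; newer rvest releases use read_html().
# xpathSApply() comes from XML, so this relies on the XML-based rvest of that era.
pg <- html("http://www.transfermarkt.nl/lionel-messi/leistungsdaten/spieler/28003/saison/2014/plus/1")

pg %>% 
  html_nodes("table") %>%        # all tables on the page
  extract2(6) %>%                # keep the 6th one
  html_nodes("tbody > tr") %>%   # its body rows
  sapply(function(x) {
    # TRUE when the 8th cell of this row holds the icon span
    length(xpathSApply(x, "./td[8]/span[@class='kapitaenicon-table icons_sprite']")) == 1
  }) %>% which                   # indices of the rows where the test is TRUE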

## [1]  2 10 11 27

That gets the 6th table, extracts its trs, then checks each one for an 8th td containing a span with the right class. If the XPath search finds nothing it returns an empty list, so you can use the length of the result to tell which rows have the icon in that td and which do not.
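For readers on a current setup, the same row-by-row idea can be sketched with xml2 alone. This is only a minimal sketch under assumptions, not the code from the answer: the inline HTML, its two-column layout and the names `html`, `rows`, `has_icon` and `icon_rows` are made up for illustration. Only the `kapitaenicon-table icons_sprite` class is taken from the page.

```r
library(xml2)

# Hypothetical stand-in for the real page: one table, icon in column 2
# of rows 1 and 3 (the real page uses table 6, column 8).
html <- '<table><tbody>
  <tr><td>a</td><td><span class="kapitaenicon-table icons_sprite"></span></td></tr>
  <tr><td>b</td><td>x</td></tr>
  <tr><td>c</td><td><span class="kapitaenicon-table icons_sprite"></span></td></tr>
</tbody></table>'

doc  <- read_html(html)
rows <- xml_find_all(doc, "//table[1]/tbody/tr")

# For each row, search the 2nd cell for the icon span; an empty node set
# (length 0) means the row has no icon.
has_icon <- vapply(rows, function(tr) {
  length(xml_find_all(tr, "./td[2]/span[contains(@class, 'kapitaenicon-table')]")) > 0
}, logical(1))

icon_rows <- which(has_icon)
print(icon_rows)
```

Because the test runs once per tr, the indices stay aligned with the table rows even when a row has fewer cells than expected.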

This:

pg %>% 
  html_nodes(xpath="//table[6]/tbody/tr/td[8]") %>% 
  xmlSApply(xpathApply, "boolean(./span[@class='kapitaenicon-table icons_sprite'])") %>% 
  which

also works and is a bit tighter (and faster). It uses the XPath boolean() function to test for existence. This is handier if you have no other operations to perform on the node(s).
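In later xml2 releases the boolean test can be written with `xml_find_lgl()`, which evaluates an XPath expression that returns a logical. The following is a hedged sketch on made-up data: the inline HTML, the two-column layout and the names `html`, `cells` and `has_icon` are assumptions for illustration.

```r
library(xml2)

# Hypothetical stand-in for the real page: icon in column 2 of rows 1 and 3.
html <- '<table><tbody>
  <tr><td>a</td><td><span class="kapitaenicon-table icons_sprite"></span></td></tr>
  <tr><td>b</td><td>x</td></tr>
  <tr><td>c</td><td><span class="kapitaenicon-table icons_sprite"></span></td></tr>
</tbody></table>'

doc   <- read_html(html)

# One node per row: the 2nd cell of every body row.
cells <- xml_find_all(doc, "//table[1]/tbody/tr/td[2]")

# boolean() is TRUE when the node set selected inside it is non-empty.
has_icon <- xml_find_lgl(cells, "boolean(./span[contains(@class, 'kapitaenicon-table')])")

print(which(has_icon))
```

Two caveats apply to this shape of query. The positions index the selected cells, so they match row numbers only if every row actually has that cell. Also, in XPath `//table[6]` selects tables that are the sixth table child of their parent, whereas `(//table)[6]` selects the sixth table in document order; which one is right depends on how the page nests its tables.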

This is an xml2 version, though I have to believe there is a better way to do this in xml2:

library(xml2)
library(magrittr)

pg2 <- read_html("http://www.transfermarkt.nl/lionel-messi/leistungsdaten/spieler/28003/saison/2014/plus/1")
pg2 %>% 
  xml_find_all("//table[6]/tbody/tr/td[8]") %>% 
  as_list %>% 
  sapply(function(x) {
    inherits(try(xml_find_one(x, "./span"), silent=TRUE), "xml_node")
  }) %>% which
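`xml_find_one()` was later deprecated in xml2 in favour of `xml_find_first()`, which returns an `xml_missing` object instead of raising an error when nothing matches, so the `try()` wrapper is no longer needed. A minimal sketch of that variant, again on made-up data (the inline HTML and the names `html`, `cells`, `has_span` and `span_rows` are assumptions for illustration):

```r
library(xml2)

# Hypothetical stand-in for the real page: a span in column 2 of rows 1 and 3.
html <- '<table><tbody>
  <tr><td>a</td><td><span class="kapitaenicon-table icons_sprite"></span></td></tr>
  <tr><td>b</td><td>x</td></tr>
  <tr><td>c</td><td><span class="kapitaenicon-table icons_sprite"></span></td></tr>
</tbody></table>'

doc   <- read_html(html)
cells <- xml_find_all(doc, "//table[1]/tbody/tr/td[2]")

# xml_find_first() yields an xml_missing object when the cell has no span,
# so testing the class replaces the try()/inherits() pattern.
has_span <- vapply(cells, function(td) {
  !inherits(xml_find_first(td, "./span"), "xml_missing")
}, logical(1))

span_rows <- which(has_span)
print(span_rows)
```

Like the code above it, this checks for any span in the cell; adding a class predicate to the XPath narrows it to the icon.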

Update

For version 0.1.0.9000 of xml2 I had to do the following:

pg2 %>% xml_find_all("//table") %>% 
  as_list %>% 
  extract2(6) %>% 
  xml_find_all("./tbody/tr/td[8]") %>% 
  as_list %>% 
  sapply(function(x) {
    inherits(try(xml_find_one(x, "./span"), silent=TRUE), "xml_node")
  }) %>% which

That should not be necessary, and I've filed a bug report.

Session info -------------------------------------------------------------------------
 setting  value                       
 version  R version 3.2.0 (2015-04-16)
 system   x86_64, darwin13.4.0        
 ui       RStudio (0.99.441)          
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/New_York            

Packages -----------------------------------------------------------------------------
 package    * version date       source        
 curl       * 0.5     2015-02-01 CRAN (R 3.2.0)
 devtools   * 1.7.0   2015-01-17 CRAN (R 3.2.0)
 magrittr     1.5     2014-11-22 CRAN (R 3.2.0)
 Rcpp       * 0.11.5  2015-03-06 CRAN (R 3.2.0)
 rstudioapi * 0.3.1   2015-04-07 CRAN (R 3.2.0)
 xml2         0.1.0   2015-04-20 CRAN (R 3.2.0)
