Using rvest to scrape a website - Selecting html node?

Question

I have a question about my latest rvest scrape.

I want to scrape this page (and some other stocks as well): http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1

I need a list of the Market Cap values, which appear in the first box of the second row. The list should cover approximately 50-100 stocks.

I am using rvest for that.

library(rvest)

html = read_html("http://www.finviz.com/quote.ashx?t=A")

cast = html_nodes(html, "table-dark-row")

The problem is that I cannot get html_nodes to return the node I need. Any idea how to find the correct selector to pass to html_nodes?
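One likely reason the call above returns nothing: in a CSS selector, a bare name such as table-dark-row is a type (tag) selector, so it looks for elements named <table-dark-row>. To match a class, the name needs a leading dot. The sketch below shows the difference on a small, made-up HTML snippet (not the real finviz markup), parsed offline because read_html() also accepts literal HTML in a string.

```r
library(rvest)

# Made-up snippet standing in for the real page: a single table row whose
# class attribute is "table-dark-row".
page <- read_html('<table><tr class="table-dark-row"><td>Market Cap</td><td><b>11.58B</b></td></tr></table>')

# No leading dot: a type selector for <table-dark-row> elements, which do not exist
length(html_nodes(page, "table-dark-row"))   # 0

# Leading dot: a class selector for elements whose class list contains table-dark-row
length(html_nodes(page, ".table-dark-row"))  # 1
```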

I am using Firebug/FireFinder to inspect the webpage.

Answer

I am not sure this is what you want, because I cannot find a list of approximately 50-100 stocks on that page.

But for what it's worth, using SelectorGadget I came up with the CSS selector .table-dark-row:nth-child(2) .snapshot-td2:nth-child(2), which selects the Market Cap value (the first box in the second row of this page: http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1).

> library(rvest)
> 
> html = read_html("http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1")
> 
> cast = html_nodes(html, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")
> cast
{xml_nodeset (1)}
[1] <td width="8%" class="snapshot-td2" align="left">\n  <b>11.58B</b>\n</td>
> 
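How the selector works: :nth-child(2) does not mean "the second element with this class"; it matches an element that is the second child of its parent and also carries the class. So .table-dark-row:nth-child(2) is the second row, and .snapshot-td2:nth-child(2) inside it is that row's second cell. The sketch below checks this offline on a hypothetical two-row table in which each label cell is followed by its value cell; the snapshot-td2-cp class on label cells and the "Label"/"Value" placeholders are assumptions, not copied from finviz.

```r
library(rvest)

# Hypothetical two-row table: each label cell is followed by its value cell.
# Only the Market Cap value is taken from the output above; the rest are placeholders.
page <- read_html('
<table>
  <tr class="table-dark-row">
    <td class="snapshot-td2-cp">Label A</td><td class="snapshot-td2"><b>Value A</b></td>
    <td class="snapshot-td2-cp">Label B</td><td class="snapshot-td2"><b>Value B</b></td>
  </tr>
  <tr class="table-dark-row">
    <td class="snapshot-td2-cp">Market Cap</td><td class="snapshot-td2"><b>11.58B</b></td>
    <td class="snapshot-td2-cp">Label C</td><td class="snapshot-td2"><b>Value C</b></td>
  </tr>
</table>')

# Row: 2nd child of its parent AND class table-dark-row.
# Cell: 2nd child of that row AND class snapshot-td2.
# Cells 1 and 3 have the wrong class and cell 4 is the 4th child, so one cell matches.
cell <- html_nodes(page, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")
html_text(cell)  # "11.58B"
```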

If this is not exactly what you want, use SelectorGadget to locate the element you need.
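A position-based selector like the one above can break if the site adds or reorders rows. A sketch of an alternative: find the cell whose text is the label and take the cell right after it, using an XPath expression passed through the xpath argument of html_nodes. It assumes that each value sits in the <td> immediately following the <td> that holds its label (e.g. "Market Cap"); snapshot_value is a hypothetical helper name, and a label containing a single quote would need different quoting.

```r
library(rvest)

# Hypothetical helper: text of the <td> that directly follows the <td> whose
# text is exactly `label`. Assumes label and value are adjacent cells and that
# `label` contains no single quote (it is pasted into the XPath string).
snapshot_value <- function(page, label) {
  xpath <- sprintf("//td[normalize-space(.) = '%s']/following-sibling::td[1]", label)
  html_text(html_nodes(page, xpath = xpath), trim = TRUE)
}

# Offline check on a made-up row; on the real site `page` would come from
# read_html("http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1").
page <- read_html('<table><tr><td>Market Cap</td><td>\n  <b>11.58B</b>\n</td></tr></table>')
snapshot_value(page, "Market Cap")  # "11.58B"
```

A label that is not on the page gives character(0), so an empty result is the signal that the layout or the label text differs from what was assumed.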

Hope this helps.

Here is the complete solution:

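library(rvest)  # also re-exports the %>% pipe

html <- read_html("http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1")

# Second row of the snapshot table, second cell: the Market Cap value
cast <- html_nodes(html, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")

# Drop the "B" suffix and convert to a number (the result is in billions)
html_text(cast) %>%
    gsub(pattern = "B", replacement = "") %>%
    as.numeric()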
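The gsub() above only handles values in billions: a value with any other suffix (finviz presumably uses M for smaller companies) would become NA. The question also asks for 50-100 stocks rather than one. A sketch covering both, under stated assumptions: the suffixes are K, M, B and T, a missing value is shown as "-", and every quote page has the same layout as the AA page. parse_market_cap and market_caps are hypothetical helper names; the fetch argument exists only so the loop can be tried offline, and delay pauses between requests. The site's markup may have changed since this answer was written, so check the selector on one page first.

```r
library(rvest)

# Convert strings such as "11.58B" or "750M" into plain numbers.
# Assumed suffixes: K, M, B, T; anything unparseable (e.g. "-") becomes NA.
parse_market_cap <- function(x) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9, T = 1e12)
  x <- trimws(x)
  last <- substr(x, nchar(x), nchar(x))
  has_suffix <- last %in% names(multipliers)
  digits <- ifelse(has_suffix, substr(x, 1, nchar(x) - 1), x)
  number <- suppressWarnings(as.numeric(digits))
  number * ifelse(has_suffix, multipliers[last], 1)
}

# Market Cap for several tickers, using the selector from the answer.
# `fetch` defaults to read_html; `delay` is the pause (seconds) after each request.
market_caps <- function(tickers, fetch = read_html, delay = 1) {
  selector <- ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)"
  text <- vapply(tickers, function(ticker) {
    page <- fetch(paste0("http://www.finviz.com/quote.ashx?t=", ticker))
    Sys.sleep(delay)
    value <- html_text(html_nodes(page, selector))
    # Exactly one matching cell is expected; anything else is treated as missing
    if (length(value) == 1) value else NA_character_
  }, character(1))
  # ifelse() inside parse_market_cap drops names, so restore the tickers here
  setNames(parse_market_cap(text), tickers)
}

# Usage (needs network access):
# caps <- market_caps(c("AA", "A"))
```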
