Web scraping with xpathSApply: getting xmlValue

Problem description

For example, I want to extract the price (top right) and the details under "The space" (Accommodates: 2, Bathrooms: 1, etc.) from https://www.airbnb.com/rooms/12949270?guests=1&s=_JaPbz-J

Here is my code for the price:

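library(RSelenium)
library(XML)

# remDr is assumed to be an already-open RSelenium remoteDriver; url is the listing URL
remDr$navigate(url)
doc <- htmlParse(remDr$getPageSource()[[1]])
var <- remDr$findElement('id', 'details')
# vartxt was not defined in the original snippet; reading the element's outer HTML is an assumption
vartxt <- var$getElementAttribute('outerHTML')[[1]]
varxml <- htmlTreeParse(vartxt, useInternalNodes = TRUE)
Price <- xpathApply(varxml, "//div[@class='book-it__price-amount h3 text-special pull-left']", xmlValue)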

But it returns an empty list. Maybe this happens because the class 'book-it__price-amount h3 text-special pull-left' is not the top-level class? If so, how can I correct that? If not, where did I make a mistake?

Recommended answer

The code below works for me. As for sites that forbid scraping: in general, if scraping is not allowed, you take a risk when you use the data for commercial purposes or send GET requests on a regular basis. So it depends on how you are going to use it.

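library(RCurl)
library(XML)

# Download the raw HTML of the listing (note: ssl.verifypeer = FALSE disables certificate checks)
url <- getURL("https://www.airbnb.cz/rooms/12949270?guests=1&s=_JaPbz-J", ssl.verifypeer = FALSE)
# Parse the HTML text into an internal document that XPath queries can run against
url2 <- htmlParse(url)
# Text of the price element, matched on its full class attribute
Price <- xpathSApply(url2, "//div[@class='book-it__price-amount h3 text-special pull-left']", xmlValue)
# Text of every col-md-6 block, which includes the "The space" details
conditions <- xpathSApply(url2, "//div[@class='col-md-6']", xmlValue)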
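On the question of whether nesting matters: an XPath that starts with `//` matches at any depth, so the class does not have to be on a top-level element. A more likely cause of the empty list is that the query ran against only the `details` fragment, which may not contain the price node at all. The sketch below illustrates both points on a small, made-up HTML snippet (the markup is hypothetical and is not Airbnb's real page); it also shows a `contains()`-based match, which is an alternative to the exact `@class` comparison and is less sensitive to changes in the class list.

```r
library(XML)

# Hypothetical page fragment for illustration only (not the real Airbnb markup)
html <- '
<html><body>
  <div id="details">
    <div class="col-md-6">Accommodates: 2</div>
    <div class="col-md-6">Bathrooms: 1</div>
  </div>
  <div id="book-it">
    <div class="wrapper">
      <div class="book-it__price-amount h3 text-special pull-left">$45</div>
    </div>
  </div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# "//" searches the whole tree, so the nested price div is still found
price_exact <- xpathSApply(
  doc,
  "//div[@class='book-it__price-amount h3 text-special pull-left']",
  xmlValue
)

# Match on a single class token instead of the full attribute string
price_loose <- xpathSApply(
  doc,
  "//div[contains(concat(' ', normalize-space(@class), ' '), ' book-it__price-amount ')]",
  xmlValue
)

# Re-parse only the "details" element, as the question's code does
details_node <- getNodeSet(doc, "//div[@id='details']")[[1]]
details_doc <- htmlParse(saveXML(details_node), asText = TRUE)

# The price div lives outside "details", so this query has nothing to match
price_in_details <- xpathSApply(
  details_doc,
  "//div[contains(@class, 'book-it__price-amount')]",
  xmlValue
)

# The "The space" entries are inside "details", so they are found
space <- xpathSApply(details_doc, "//div[@class='col-md-6']", xmlValue)
```

If the real page is laid out similarly, querying the full parsed page source (`doc` in the question's code) rather than the `details` fragment should be enough to reach the price element.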
