Webscraping with R JSON

Question

I want to get the names of the companies using two columns, Region and Name of role-player. I have already found the JSON links on each page, but it didn't work with RJSONIO. It collects the data, but how can I get it into a readable form? Could anybody help? Thanks.

Here is the link

I tried this code from another similar question on Stack Overflow:

library(RJSONIO)
library(RCurl)

# Grab the data
raw_data <- getURL("http://www.milksa.co.za/admin/settings/mis_rest/webservicereceive/GET/index/page:1/regionID:7.json")
# Then convert from JSON into a list in R
data <- fromJSON(raw_data)

length(data)

final_data <- do.call(rbind, data)

head(final_data)

Recommended answer

My personal preference for this is to use the jsonlite package instead of RJSONIO's fromJSON:

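library(jsonlite)
# Parse the JSON string; arrays of records are simplified into data frames
data <- jsonlite::fromJSON(raw_data, simplifyDataFrame = TRUE)
# Pull the two nested fields out into a flat two-column data frame
finalData <- data.frame(Name = data$rolePlayers$RolePlayer$orgName,
                        Region = data$rolePlayers$Region$RegionNameEng,
                        stringsAsFactors = FALSE)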

This gives you the following data frame:

                                   Name       Region
                GoodHope Cheese (Pty) Ltd Western Cape
                       Jay Chem (Pty) Ltd Western Cape
                Coltrade International cc Western Cape
 GC Rieber Compact South Africa (Pty) Ltd Western Cape
                    Latana Cheese Pty Ltd Western Cape
                       Marco Frischknecht Western Cape

A great way to visualize what is in your JSON string, and how to query it, can be found here: Chris Photo JSON viewer

You can just copy and paste the contents of raw_data in there (removing the outer quotation marks). From there it becomes easy to see how to address the nested structure with the $ operator, just as you would with a traditional data frame.
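
If you would rather inspect the structure from within R, here is a minimal offline sketch of the same idea. The JSON string below is made up for illustration: it only mimics the field names used above (rolePlayers, RolePlayer, orgName, Region, RegionNameEng), and the real response from the site may contain more fields or be laid out differently.

```r
library(jsonlite)

# Hypothetical sample that imitates the shape of the real response;
# the organisation names are placeholders, not data from the site.
sample_json <- '{
  "rolePlayers": [
    {"RolePlayer": {"orgName": "Example Dairy A"},
     "Region":     {"RegionNameEng": "Western Cape"}},
    {"RolePlayer": {"orgName": "Example Dairy B"},
     "Region":     {"RegionNameEng": "Western Cape"}}
  ]
}'

parsed <- jsonlite::fromJSON(sample_json, simplifyDataFrame = TRUE)

# Print the nesting so you can see which $ path leads to which field
str(parsed, max.level = 3)

# Address the nested fields with $ and build a flat data frame
result <- data.frame(Name = parsed$rolePlayers$RolePlayer$orgName,
                     Region = parsed$rolePlayers$Region$RegionNameEng,
                     stringsAsFactors = FALSE)
print(result)
```

The str() call plays the same role as the online viewer: it shows that rolePlayers is a data frame whose RolePlayer and Region columns are themselves data frames, which is why two $ steps are needed to reach each field.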
