R: Web scrape Excel spreadsheet URLs to read with openxlsx


Question

I need to read parts of an Excel file into R. I have some existing code, but the authority changed the source. Previously there was a direct URL to the document; now the link to the document can only be reached through a website landing page.

Could someone tell me which package I can use to achieve that? The landing page for the Excel file is: http://www.snamretegas.it/it/business-servizi/dati-operativi-business/8_dati_operativi_bilanciamento_sistema/ There I am looking at the document: "Dati operativi relativi al bilanciamento del sistema post Del. 312/2016/R/gas - Database 2018"

I have added my previous code to give an idea of what I did. As you can see, I only needed read.xlsx for this first step.

Thanks a lot!

  library(ggplot2)
  library(lubridate)
  library(openxlsx)
  library(reshape2)
  library(dplyr)

  Bilres <- read.xlsx(xlsxFile = "http://www.snamretegas.it/repository/file/Info-storiche-qta-gas-trasportato/dati_operativi/2017/DatiOperativi_2017-IT.xlsx",sheet = "Storico_G", startRow = 1, colNames = TRUE)


  # Selecting Column R from Storico_G and stored in variable Bilres_df

  Bilres_df <- data.frame(Bilres$pubblicazione, Bilres$BILANCIAMENTO.RESIDUALE )

  # Converting pubblicazione to date-time format
  Bilres_df$pubblicazione <- ymd_h(Bilres_df$Bilres.pubblicazione)
  Bilreslast=tail(Bilres_df,1)
  Bilreslast=data.frame(Bilreslast)
  Bilreslast$Bilres.BILANCIAMENTO.RESIDUALE <- as.numeric(as.character((Bilreslast$Bilres.BILANCIAMENTO.RESIDUALE)))

Recommended answer

If you copy the URL from the web page, you can first use download.file() to download the spreadsheet as a binary file and then use read.xlsx() to read the data from the local copy. Depending on how frequently the content of the web page changes, you may be better off just copying the URL than parsing it from the page.

oldFile <- "http://www.snamretegas.it/repository/file/Info-storiche-qta-gas-trasportato/dati_operativi/2017/DatiOperativi_2017-IT.xlsx"
newFile <- "http://www.snamretegas.it/repository/file/it/business-servizi/dati-operativi-business/dati_operativi_bilanciamento_sistema/2017/DatiOperativi_2017-IT.xlsx"

# create the target directory if it does not exist yet
if(!dir.exists("./data")) dir.create("./data")

if(!file.exists("./data/downloadedXlsx.xlsx")){
     download.file(newFile,"./data/downloadedXlsx.xlsx",
                   method="curl", #use "curl" for OS X / Linux, "wininet" for Windows
                   mode="wb") # "wb" means "write binary"
} else message("file already downloaded, using the local copy")

library(openxlsx)
Bilres <- read.xlsx(xlsxFile = "./data/downloadedXlsx.xlsx",
                sheet = "Storico_G", startRow = 1, colNames = TRUE)
head(Bilres[,1:3])

...and the output:

> head(Bilres[,1:3])
  pubblicazione aggiornato.il IMMESSO
1 2017_01_01_06      42736.24 1915484
2 2017_01_01_07      42736.28 1915484
3 2017_01_01_08      42736.33 1866326
4 2017_01_01_09      42736.36 1866326
5 2017_01_01_10      42736.41 1866326
6 2017_01_01_11      42736.46 1866326
> 
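
In the output above, pubblicazione comes back as text in a year_month_day_hour pattern, and the aggiornato.il values look like Excel serial date numbers (days counted from 1899-12-30 in the 1900 date system). The following is a minimal base-R sketch of converting both, using values copied from the output. The UTC time zone and the serial-date interpretation are assumptions, not something the source confirms.

```r
# Values copied from the first two rows of the output above
pubblicazione <- c("2017_01_01_06", "2017_01_01_07")
aggiornato    <- c(42736.24, 42736.28)

# Parse the "year_month_day_hour" strings; the UTC time zone is an assumption
pub_time <- as.POSIXct(pubblicazione, format = "%Y_%m_%d_%H", tz = "UTC")

# Treat the numbers as Excel serial dates (assumption): days since 1899-12-30,
# so multiply by 86400 to get seconds from that origin
upd_time <- as.POSIXct(aggiornato * 86400, origin = "1899-12-30", tz = "UTC")

print(pub_time)
print(upd_time)
```

The printed serial values are rounded, so the converted update times are only approximate. openxlsx also provides convertToDateTime() for this kind of conversion, which may be more convenient when the package is already loaded.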

UPDATE: Added logic to skip the download when the file has already been downloaded.
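
If the link does have to be parsed from the landing page rather than copied by hand, one possible approach is to pull every .xlsx link out of the page's HTML and pick the one you need. The sketch below uses only base R. The helper name extract_xlsx_links, the sample HTML, and the example.com URLs are hypothetical and stand in for the real page, whose markup is not shown here.

```r
# Hypothetical helper: return all links ending in .xlsx found in HTML text.
# html may be a single string or a character vector of lines.
extract_xlsx_links <- function(html, base_url = "") {
  html <- paste(html, collapse = "\n")
  # Match href="...xlsx" or href='...xlsx', ignoring case
  pattern <- "href\\s*=\\s*[\"'][^\"']+\\.xlsx[\"']"
  hits <- regmatches(html, gregexpr(pattern, html, ignore.case = TRUE, perl = TRUE))[[1]]
  if (length(hits) == 0) return(character(0))
  # Strip the leading href= and the surrounding quotes
  links <- gsub("^href\\s*=\\s*[\"']|[\"']$", "", hits, ignore.case = TRUE, perl = TRUE)
  # Prefix site-relative links (starting with "/") with the base URL
  relative <- startsWith(links, "/")
  links[relative] <- paste0(base_url, links[relative])
  unique(links)
}

# Made-up HTML standing in for the landing page
page <- paste0(
  '<a href="/files/example_2018.xlsx">Database 2018</a>',
  '<a href="/files/notes.pdf">Notes</a>',
  '<a href="http://www.example.com/files/example_2017.xlsx">Database 2017</a>'
)

links <- extract_xlsx_links(page, base_url = "http://www.example.com")
print(links)
```

For the real site, the HTML could be read with readLines() on the landing page URL and the chosen link passed to download.file() as in the answer above. A regular expression is a fragile way to read HTML, and it will find nothing if the page builds its links with JavaScript, so a proper HTML parser may be the better choice for anything long-lived.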
