如何在R中下载Google工作表中的所有工作表 [英] How to download all sheets in a google sheet in R

查看:73
本文介绍了如何在R中下载Google工作表中的所有工作表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我希望在R中的单个Google工作表中下载所有工作表.我目前正在使用 [maxconway] [1] gsheet 包,该程序包允许我使用其URL下载工作表,但仅适用于单个工作表,由 gid 区分.我要下载的一组Google表格有100多个表格,这使得使用 gsheet 一次一个地下载它们非常不便-有人知道有任何R软件包可以自动执行此操作或以任何方式在一个Google工作表中循环浏览所有工作表?
这是我当前拥有的代码,该代码仅作为小标题下载100多张纸中的第一张:

I'm looking to download all of the sheets in a single google sheet in R. I'm currently using the gsheet package by [maxconway][1], which allows me to download a sheet using its URL, but it only works on individual sheets, which are differentiated by a gid. The set of google sheets I'm trying to download has over 100 sheets, which makes downloading them one by one with gsheet massively inconvenient - does anyone know of any R packages that automate this or of any way to loop through all of the sheets in a single google sheet?
Here is the code I currently have which downloads just the first of over 100 sheets as a tibble:

all_rolls <- gsheet2tbl('https://docs.google.com/spreadsheets/d/1OEg29XbL_YpO0m5JrLQpOPYTnxVsIg8iP67EYUrtRJg/edit#gid=26346344')

> head(all_rolls)
# A tibble: 6 x 14
  Episode Time   Character `Type of Roll` `Total Value` `Natural Value` `Crit?` `Damage Dealt` `# Kills`
    <int> <drtn> <chr>     <chr>          <chr>         <chr>           <chr>   <chr>              <int>
1       1 37'53" Vex'ahlia Intelligence   20            18              <NA>    <NA>                  NA
2       1 41'48" Grog      Persuasion     19            18              <NA>    <NA>                  NA
3       1 43'25" Keyleth   Persuasion     2             2               <NA>    <NA>                  NA
4       1 46'35" Tiberius  Persuasion     12            3               <NA>    <NA>                  NA
5       1 46'35" Tiberius  Persuasion     27            18              <NA>    <NA>                  NA
6       1 46'35" Percy     Assist         21            15              <NA>    <NA>                  NA
# … with 5 more variables: Notes <chr>, `Non-Roll Kills` <chr>, X12 <chr>, X13 <chr>, X14 <chr>

注意:我尝试删除 #gid 字段,但随后它只是下载了第一张表.

Note: I've tried removing the #gid field, but then it just downloads the first sheet.

推荐答案

UPDATE 2021-01-31 :已更新代码以使用新功能替换了 sheets_find()和从 googlesheets4 0.2.0版开始的 sheets_sheets().

UPDATE 2021-01-31: updated code to use new functions that replaced sheets_find() and sheets_sheets() as of googlesheets4 version 0.2.0.

googlesheets4 软件包包括一个列出与帐户的Google云端硬盘关联的所有工作表的功能: sheets_find().从工作表列表中,可以使用工作表ID将工作表读入R.

The googlesheets4 package includes a function to list all sheets associated with an account's Google Drive: sheets_find(). From the list of sheets one can use the sheet IDs to read the sheets into R.

library(googlesheets4)
sheets_auth()
theSheets <- gs4_find()
theSheets

我在Google上的测试帐户有一个Google表格,即《神奇宝贝统计》的电子表格.

My test account on Google has one Google sheet, a spreadsheet of Pokémon Stats.

> theSheets
# A tibble: 1 x 3
  name         id                                           drive_resource   
* <chr>        <chr>                                        <list>           
1 PokemonStats 13rGxY7ScDUl7bFJ9NipO7QUafEACYTH4MagFjcj4pVw <named list [34]>

我们可以使用ID字段下载工作表.

We can use the ID field to download the sheet.

pokemonData <- sheets_read(theSheets$id[1])
head(pokemonData)


> head(pokemonData)
# A tibble: 6 x 13
  Number Name  Type1 Type2 Total    HP Attack Defense SpecialAtk SpecialDef Speed
   <dbl> <chr> <chr> <chr> <dbl> <dbl>  <dbl>   <dbl>      <dbl>      <dbl> <dbl>
1      1 Bulb… Grass Pois…   318    45     49      49         65         65    45
2      2 Ivys… Grass Pois…   405    60     62      63         80         80    60
3      3 Venu… Grass Pois…   525    80     82      83        100        100    80
4      3 Venu… Grass Pois…   625    80    100     123        122        120    80
5      4 Char… Fire  NA      309    39     52      43         60         50    65
6      5 Char… Fire  NA      405    58     64      58         80         65    80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>
> 

一个人可以将向量 theSheets $ id lapply()一起使用,以从Google云端硬盘读取一组工作表,如下所示:

One could use the vector theSheets$id with lapply() to read a group of sheets from Google Drive as follows:

sheetList <- lapply(theSheets$id,sheet_read)

要读取Google表格电子表格中的多个工作表,请在 sheet_read()中添加 sheet = 参数.在这里,我们从神奇宝贝统计资料"电子表格的第二个工作表中读取了神奇宝贝类型".

To read multiple worksheets within a Google Sheets spreadsheet, we add the sheet= argument to sheet_read(). Here we read the Pokémon Types from the second worksheet in the Pokémon Stats spreadsheet.

pokemonTypes <- sheets_read(theSheets$id[1],sheet = 2)
head(pokemonTypes)

...以及输出:

> head(pokemonTypes)
# A tibble: 6 x 1
  Type    
  <chr>   
1 Fire    
2 Grass   
3 Poison  
4 Water   
5 Bug     
6 Fighting
> 

阅读电子表格中的所有工作表

我们可以自动化从单个电子表格中读取多个标签的过程. sheets_sheets()函数可用于此目的.

# technique where we read multiple worksheets by worksheet name
# using functions from googlesheets4 version 0.2.0. 
theSheets <-gs4_find()
# get metadata from first sheet
sheetMetadata <- gs4_get(theSheets$id[1])
# get worksheet tab names
sheetNames <- sheet_names(theSheets$id[1])
sheetNames

这时我们可以看到PokémonStats电子表格中有两个工作表标签.我们将向量 sheetNames lapply()结合使用以读取主电子表格中的所有工作表.

At this point we can see that there are two worksheet tabs in the Pokémon Stats spreadsheet. We use the vector sheetNames with lapply() to read all the worksheets within the main spreadsheet.

theWorksheets <- lapply(sheetNames, function(x){             
     sheets_read(theSheets$id[1],sheet = x) 
})
# use the `names()` function to name the data frames stored in the list
names(theWorksheets) <- sheetNames
lapply(theWorksheets,head)

...以及输出:

> lapply(theWorksheets,head)
$Pokemon
# A tibble: 6 x 13
  Number Name  Type1 Type2 Total    HP Attack Defense SpecialAtk SpecialDef Speed
   <dbl> <chr> <chr> <chr> <dbl> <dbl>  <dbl>   <dbl>      <dbl>      <dbl> <dbl>
1      1 Bulb… Grass Pois…   318    45     49      49         65         65    45
2      2 Ivys… Grass Pois…   405    60     62      63         80         80    60
3      3 Venu… Grass Pois…   525    80     82      83        100        100    80
4      3 Venu… Grass Pois…   625    80    100     123        122        120    80
5      4 Char… Fire  NA      309    39     52      43         60         50    65
6      5 Char… Fire  NA      405    58     64      58         80         65    80
# … with 2 more variables: Generation <dbl>, Legendary <lgl>

$Metadata
# A tibble: 6 x 1
  Type    
  <chr>   
1 Fire    
2 Grass   
3 Poison  
4 Water   
5 Bug     
6 Fighting

> 

此时,可以使用提取操作符的 $ 形式访问单个工作表,如 theWorksheets $ Pokemon theWorksheets $ Metadata .

At this point individual worksheets can be accessed with the $ form of the extract operator, as theWorksheets$Pokemon or theWorksheets$Metadata.

这篇关于如何在R中下载Google工作表中的所有工作表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆