Accessing Google Docs revision history through the API using R?


Question

I want to download and analyse the revision history of one of my Google Docs using R, to find statistics such as who made how many edits.

I see that there are already some ways of accessing Google Docs using R.

Does anyone know upfront (before I go ahead and try to hack my way through) whether, and how, this might be done reasonably easily?

Thanks.

Recommended answer

The googledrive package includes some low-level API functions that can be used for this. For example, here is how to get the list of revisions for one Google Doc:

library(googledrive)
library(tidyverse)

# Replace this with the ID of your own Google Doc.
# This doc is private, so this ID won't work for you.
fileId <- "1s0CPFXnMQjZNts6gYAnkcGXGSAgugTupzMf8YeoCbps"

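# Get the name of the file and some other metadata.
# Note: newer googledrive releases expose differently named low-level helpers
# (for example, request_generate() and request_make()), so check your installed version.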
file <- build_request(
  path = "drive/v3/files/{fileId}",
  method = "GET",
  params = list(
    fileId = fileId,
    fields = "*"
  ),
  token = drive_token()
)
file_ret <-  process_response(make_request(file))

# Now query the Drive API for the URLs and other metadata of all the revisions available to us

req2 <- build_request(
  path = "drive/v2/files/{fileId}/revisions",
  method = "GET",
  params = list(
    fileId = fileId
  ),
  token = drive_token()
)
revs2 <-  process_response(make_request(req2))

# See 
# https://developers.google.com/drive/api/v2/reference/revisions#resource
# for an explanation of each variable that we have here

# tidy revisions into a dataframe
revs2_df <-
  map_df(
    revs2$items,
    `[`,
    c(
      "kind",
      "etag" ,
      "id",
      "selfLink"   ,
      "mimeType"     ,
      "modifiedDate",
      "published"   ,
      "lastModifyingUserName"
    )
  )
# get exportLinks URLs out of its nest
revs2_export_url <- map_df(revs2$items, "exportLinks")
# bind together
revs2_df_bind <- bind_cols(revs2_df, revs2_export_url)

For each revision, the result includes the date and time, the name of the user who made the revision, and URLs for exporting that revision as a downloadable file:

# A tibble: 140 x 16
   kind   etag  id    selfLink mimeType modifiedDate published lastModifyingUs… `application/rt…
   <chr>  <chr> <chr> <chr>    <chr>    <chr>        <lgl>     <chr>            <chr>           
 1 drive… "\"H… 28367 https:/… applica… 2017-09-12T… FALSE     Gayoung Park     https://docs.go…
 2 drive… "\"H… 28487 https:/… applica… 2017-09-12T… FALSE     Gayoung Park     https://docs.go…
 3 drive… "\"H… 28862 https:/… applica… 2017-09-13T… FALSE     Gayoung Park     https://docs.go…
 4 drive… "\"H… 29221 https:/… applica… 2017-09-13T… FALSE     Gayoung Park     https://docs.go…
 5 drive… "\"H… 29258 https:/… applica… 2017-09-13T… FALSE     Gayoung Park     https://docs.go…
 6 drive… "\"H… 29434 https:/… applica… 2017-09-13T… FALSE     Gayoung Park     https://docs.go…
 7 drive… "\"H… 29454 https:/… applica… 2017-09-18T… FALSE     Gayoung Park     https://docs.go…
 8 drive… "\"H… 29603 https:/… applica… 2017-09-18T… FALSE     Gayoung Park     https://docs.go…
 9 drive… "\"H… 30108 https:/… applica… 2017-09-18T… FALSE     Gayoung Park     https://docs.go…
10 drive… "\"H… 30115 https:/… applica… 2017-09-21T… FALSE     Gayoung Park     https://docs.go…
# ... with 130 more rows, and 7 more variables: `application/vnd.oasis.opendocument.text` <chr>,
#   `text/html` <chr>, `application/pdf` <chr>, `application/epub+zip` <chr>,
#   `application/zip` <chr>,
#   `application/vnd.openxmlformats-officedocument.wordprocessingml.document` <chr>,
#   `text/plain` <chr>
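
With the revisions in a data frame like the one above, the original question (how many edits were made by whom) comes down to counting rows per user. The following is a minimal sketch in base R, not the answer author's code. `revs_example` is a small made-up stand-in for `revs2_df_bind`, and the user names and timestamps in it are invented for illustration only.

```r
# Hypothetical stand-in for revs2_df_bind; names and timestamps are made up.
revs_example <- data.frame(
  id = c("101", "102", "103", "104"),
  modifiedDate = c("2017-09-12T08:15:00.000Z", "2017-09-12T09:40:00.000Z",
                   "2017-09-13T11:05:00.000Z", "2017-09-13T16:30:00.000Z"),
  lastModifyingUserName = c("User A", "User B", "User A", "User A"),
  stringsAsFactors = FALSE
)

# Parse the timestamp strings returned by the API into POSIXct (UTC)
revs_example$modified <- as.POSIXct(
  revs_example$modifiedDate, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC"
)

# Count revisions per user, most active first
edits_per_user <- sort(
  table(revs_example$lastModifyingUserName), decreasing = TRUE
)

# Count revisions per user per day
revs_example$day <- as.Date(revs_example$modified)
edits_per_user_day <- as.data.frame(
  table(user = revs_example$lastModifyingUserName, day = revs_example$day),
  stringsAsFactors = FALSE
)
```

On real data, replacing `revs_example` with `revs2_df_bind` should give the same kind of summary, subject to the limitations of the revision history listed at the end of this answer.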

We can then loop over the export URLs to download all the revisions, compare their size, word count, or anything else of interest, and plot the results.

Full code for those plots is here: https://gist.github.com/benmarwick/1feaa2b2f0d7bc5f7e97903b8ff92aed
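
The linked gist has the author's full code. As a rough, independent sketch of the download-and-count step, the functions below fetch each revision as plain text and count its words. The helper names (`count_words`, `fetch_revision_text`, `revision_word_counts`) are hypothetical. The sketch also assumes that the export URLs accept the same OAuth token that googledrive uses, and that httr is installed. Neither assumption is confirmed by the answer above.

```r
# Count whitespace-separated words in one text string
count_words <- function(txt) {
  words <- strsplit(trimws(txt), "[[:space:]]+")[[1]]
  sum(nzchar(words))
}

# Download one revision as text.
# Assumption: the export URL accepts the OAuth token held by googledrive.
fetch_revision_text <- function(url) {
  resp <- httr::GET(url, googledrive::drive_token())
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Word count for every export URL; `fetch` can be swapped out for offline testing
revision_word_counts <- function(urls, fetch = fetch_revision_text) {
  vapply(urls, function(u) count_words(fetch(u)), integer(1), USE.NAMES = FALSE)
}
```

A call such as `revision_word_counts(revs2_df_bind[["text/plain"]])` would then return one word count per revision. It sends one request per revision, so a document with many revisions may take a while and could run into API rate limits.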

Be aware that the Google Drive revision history available through the API has some severe limitations. For example:

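  • When many users make edits simultaneously, we only get the name of the editor who was first active in that session. The others are not captured.
  • When multiple edits occur within a short period, Google merges them into a single revision and we cannot see them individually. The time resolution is therefore poor.
  • Google deletes old edits to save space. We don't know what their rules are for this.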
