Web-locked CSV to data frame in R

Views: 315 | Published: 2017/2/25 0:54:49 | Tags: r, csv, rstudio-server


Problem description

I have a file on a private web server that I am trying to access. I must first go to a site and log in with my credentials; then I can type a URL (there is no link) to access the file, which immediately downloads a CSV file to the computer. I am trying to get that CSV file to load into R automatically, either directly from the web or by having it download automatically and then loading it from my hard drive.

I will be refreshing this data 10-15 times a day, which is why I need it to be automatic rather than downloading it manually every time.

I have tried several packages and have been impressed with Hadley's rvest package, which has proven much easier to use than some tools I have used in the past. I am succeeding in downloading the data:

```r
library(rvest)

# Start a session and follow the "Sign In" link
fs <- html_session("somewebsite.org")
fs.login <- fs %>% follow_link("Sign In")
# Fill in and submit the first form on the login page
login.form <- html_form(fs.login)[[1]]
login.form <- set_values(login.form, userName = "myusername", password = "mypassword")
active.session <- submit_form(fs.login, login.form)
# Request the report URL within the logged-in session
my.data <- jump_to(active.session, "somewebsite.org/report/groups")
```
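Because the data need refreshing 10-15 times a day, the login and report request above can be wrapped in one function and rerun. A minimal sketch, assuming the same site, link text ("Sign In") and form field names (`userName`, `password`) as above; `fetch_report_session` and its arguments are hypothetical names. It keeps the rvest 0.3-era functions used in the question; rvest 1.0 and later renamed them (for example `html_session()` became `session()` and `jump_to()` became `session_jump_to()`).

```r
# Hypothetical helper: log in and request the report in a single call.
# Uses the rvest 0.3-era API from the question (html_session, follow_link,
# html_form, set_values, submit_form, jump_to); newer rvest renamed these.
fetch_report_session <- function(user, pass,
                                 site = "somewebsite.org",
                                 report = "somewebsite.org/report/groups") {
  fs <- rvest::html_session(site)
  fs.login <- rvest::follow_link(fs, "Sign In")
  # Assumes the login form is the first form on the page, as in the question
  login.form <- rvest::html_form(fs.login)[[1]]
  login.form <- rvest::set_values(login.form, userName = user, password = pass)
  active.session <- rvest::submit_form(fs.login, login.form)
  # Returns the session; the downloaded bytes end up in $response$content
  rvest::jump_to(active.session, report)
}
```

Reading credentials from environment variables, e.g. `my.data <- fetch_report_session(Sys.getenv("SITE_USER"), Sys.getenv("SITE_PASS"))` (hypothetical variable names), keeps passwords out of the script, which can then be run on a schedule with `Rscript` from cron or Windows Task Scheduler.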

I have run it with a timer several times, and it takes 27 seconds on average, which suggests it is downloading the file (roughly the same time it takes in Google Chrome). The result is a variable of class session with 7 elements, 43.7 MB in size.

```
my.data

somewebsite/report/groups
Status: 200
Type: text/csv
Size: 45856046
```
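The printed status and type can also be checked programmatically before parsing, which is useful in an unattended refresh. A minimal sketch, assuming that a failed or expired login would come back as something other than an HTTP 200 `text/csv` response (for example an HTML login page); `looks_like_csv` is a hypothetical helper that reads the `status_code` and `headers` fields of the session's `response` element.

```r
# Hypothetical helper: TRUE when an httr-style response looks like the CSV
# download, i.e. HTTP status 200 and a content type starting with "text/csv".
looks_like_csv <- function(resp) {
  type <- resp$headers[["content-type"]]
  identical(as.integer(resp$status_code), 200L) &&
    !is.null(type) && startsWith(type, "text/csv")
}
```

For the download shown above, `looks_like_csv(my.data$response)` should return `TRUE`.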

My question is: how can I access the actual CSV file or data in R?

```
str(my.data)

List of 7
 $ handle  :List of 2
  ..$ handle:Class 'curl_handle' <externalptr>
  ..$ url   : chr "somewebsite.org"
  ..- attr(*, "class")= chr "handle"
 $ config  :List of 7
  ..$ method    : NULL
  ..$ url       : NULL
  ..$ headers   : NULL
  ..$ fields    : NULL
  ..$ options   :List of 1
  .. ..$ autoreferer: int 1
  ..$ auth_token: NULL
  ..$ output    : NULL
  ..- attr(*, "class")= chr "request"
 $ url     : chr "https://somewebsite.org/report/groups"
 $ back    : chr "https://somewebsite.org/report/groups"
 $ forward : chr(0)
 $ response:List of 10
  ..$ url        : chr "https://somewebsite.org/report/groups"
  ..$ status_code: int 200
  ..$ headers    :List of 6
  .. ..$ content-disposition: chr "attachment; filename=\"groups-2016-0318-063749.csv\""
  .. ..$ content-type       : chr "text/csv"
  .. ..$ date               : chr "Fri, 18 Mar 2016 18:37:49 GMT"
  .. ..$ server             : chr "Apache-Coyote/1.1"
  .. ..$ transfer-encoding  : chr "chunked"
  .. ..$ connection         : chr "Close"
  .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
  ..$ all_headers:List of 1
  .. ..$ :List of 3
  .. .. ..$ status : int 200
  .. .. ..$ version: chr "HTTP/1.1"
  .. .. ..$ headers:List of 6
  .. .. .. ..$ content-disposition: chr "attachment; filename=\"groups-2016-0318-063749.csv\""
  .. .. .. ..$ content-type       : chr "text/csv"
  .. .. .. ..$ date               : chr "Fri, 18 Mar 2016 18:37:49 GMT"
  .. .. .. ..$ server             : chr "Apache-Coyote/1.1"
  .. .. .. ..$ transfer-encoding  : chr "chunked"
  .. .. .. ..$ connection         : chr "Close"
  .. .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
  ..$ cookies    :'data.frame': 6 obs. of  7 variables:
  .. ..$ domain    : chr [1:6] "somewebsite.org" "#HttpOnly_.site.org" "signin.site.org" ".site.org" ...
  .. ..$ flag      : logi [1:6] FALSE TRUE FALSE TRUE FALSE TRUE
  .. ..$ path      : chr [1:6] "/" "/" "/" "/" ...
  .. ..$ secure    : logi [1:6] FALSE TRUE FALSE FALSE TRUE TRUE
  .. ..$ expiration: POSIXct[1:6], format: "2017-03-18 12:37:16" NA NA NA ...
  .. ..$ name      : chr [1:6] "fs_experiments" "ObSSOCookie" "TS01289383" "TS01b89640" ...
  .. ..$ value     : chr [1:6] "u%3D-anon-%2Ca%3Dshared-ui%2Cs%3Dac76fc702b255a493a5856b5432b92b4%2Cv%3D0100110011010000000111111111001110101101000000000001100"| __truncated__ "15yUK2dU%2B78GK7o587gtwh3i%2ByORXGD8ne5XJBiGkiHpDAJ3%2F7rQ4Gql6T5DqQIwCg%2FSwSioAMIzzaRxGEFKsCkc%2BGohi1fdWhbR0urah6%2BJikm9lA6"| __truncated__ "01999b7023d69473f53740d0f7f2969d9d79e1a18c7e259f6baf643ce642a330fc0a3604d7" "01999b7023960237ab42ec3f429e5a452fe3559d683a090b19a65cf66ce0c01bc21bdb29bf78f030d36d4eeff4dec21ff185c54b06" ...
  ..$ content    : raw [1:45857717] 69 64 2c 6e ...
  ..$ date       : POSIXct[1:1], format: "2016-03-18 18:37:49"
  ..$ times      : Named num [1:6] 0 0 0.062 0.156 27.425 ...
  .. ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
  ..$ request    :List of 7
  .. ..$ method    : chr "GET"
  .. ..$ url       : chr "https://somewebsite.org/report/groups"
  .. ..$ headers   : Named chr "application/json, text/xml, application/xml, */*"
  .. .. ..- attr(*, "names")= chr "Accept"
  .. ..$ fields    : NULL
  .. ..$ options   :List of 4
  .. .. ..$ useragent    : chr "libcurl/7.43.0 r-curl/0.9.6 httr/1.0.0"
  .. .. ..$ cainfo       : chr "C:/Users/Thisuser/Documents/R/win-library/3.2/httr/cacert.pem"
  .. .. ..$ autoreferer  : int 1
  .. .. ..$ customrequest: chr "GET"
  .. ..$ auth_token: NULL
  .. ..$ output    : list()
  .. .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
  .. ..- attr(*, "class")= chr "request"
  ..$ handle     :Class 'curl_handle' <externalptr>
  ..- attr(*, "class")= chr "response"
 $ html    :<environment: 0x000000001aad2f60>
 - attr(*, "class")= chr "session"
```
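The `content-disposition` header above carries the server's own file name, which is handy when each refresh is saved to disk. A minimal sketch using a base-R regular expression; `disposition_filename` is a hypothetical helper and only handles the simple quoted `filename="..."` form shown here.

```r
# Hypothetical helper: pull the file name out of a header such as
#   attachment; filename="groups-2016-0318-063749.csv"
# Handles only the simple quoted filename="..." form; returns NA otherwise.
disposition_filename <- function(header) {
  m <- regmatches(header, regexec('filename="([^"]+)"', header))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}
```

For the session above this would be called as `disposition_filename(my.data$response$headers[["content-disposition"]])`.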

Solution

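The data are stored in the `content` element of the `response` list item (`my.data$response$content`), as a raw vector of bytes. read_csv from the readr package should be able to read that raw vector directly.

Try the following:

```r
library(httr)
library(readr)

# The CSV bytes are in the response element of the session
read_csv(my.data$response$content)
```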
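If readr is not available, base R can parse the same bytes, and `writeBin()` can save them for the "download to my hard drive" route. A minimal sketch, assuming the file is plain UTF-8 or ASCII text with no embedded NUL bytes; `csv_from_raw` is a hypothetical helper, and the short byte vector stands in for `my.data$response$content`, which for this file begins with the bytes for "id,n".

```r
# Hypothetical helper: turn the raw bytes of a CSV download into a data frame
# with base R only, by decoding them to text and passing that to read.csv().
csv_from_raw <- function(bytes) {
  txt <- rawToChar(bytes)
  read.csv(text = txt, stringsAsFactors = FALSE)
}

# Stand-in for my.data$response$content
bytes <- charToRaw("id,name\n1,alpha\n2,beta\n")
df <- csv_from_raw(bytes)

# Alternatively, write the bytes to disk unchanged and read the file back
path <- file.path(tempdir(), "groups.csv")
writeBin(bytes, path)
df2 <- read.csv(path, stringsAsFactors = FALSE)
```

With the real session this becomes `csv_from_raw(my.data$response$content)`; readr's `read_csv()` is usually faster on a file of this size.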
