Download a xls file from url into a dataframe (Rcurl)?

Question

I'm trying to download the following url into an R dataframe:

http://www.fantasypros.com/nfl/rankings/qb.php/?export=xls

(It's the 'Export' link on the public page: http://www.fantasypros.com/nfl/rankings/qb.php/)

However, I'm not sure how to 'parse' the data? I'm also looking to automate this and perform it weekly, so any thoughts on how to build this into a weekly-access workflow would be greatly appreciated! Have been google searching and scouring stackoverflow for a couple hours now to no avail... :-)

Thanks,

Justin

Attempted code:

library(RCurl)  # provides getURL()
getURL("http://www.fantasypros.com/nfl/rankings/qb.php?export=xls")

This just gives me a string that starts like:

[1] "FantasyPros.com \t \n第 8 周 - QB 排名 \t \n专家共识排名 (ECR) \t \n\n 排名 \t 球员姓名 \tTeam \t Matchup \tBest Rank \t 最差排名 \t Ave Rank \t Std Dev \t\n1\tPeyton Manning\tDEN\t vs. WAS\t1\t5\t1.2105263157895\t0.58877509625419\t\t\n2\tDrew Brees\tNO对比BUF\t1\t7\t2.6287878787879\t1.0899353819483\t\t\n3\tA...

Answer

Welcome to R. It sounds like you love to do your analysis in Excel. That's completely fine, but given that you are asking to crawl data from the web AND are asking about R, I think it's safe to assume you'll find that programming your analyses is the way to go.

That said, what you really want to do is crawl the web. There are tons of examples of how to do this with R, right here on SO. Look for things like "web scraping", "crawling", and "screen scraping".

Ok, dialogue aside. Don't worry about grabbing the data in XL format. You can parse the data directly with R. Most websites use a consistent naming convention, so using a for loop and building the URLs for your datasets will be easy.
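
For example, here is a minimal sketch of building one URL per position (the position slugs other than "qb" are assumptions following the qb.php pattern):

## Build one rankings URL per position (slugs besides "qb" are assumptions)
positions <- c("qb", "rb", "wr", "te")
urls <- sprintf("http://www.fantasypros.com/nfl/rankings/%s.php", positions)
urls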

Below is an example of parsing your page, directly with R, into a data.frame which acts very similar to tabular data in XL.

## load the packages you will need
# install.packages("XML")
library(XML)

## Define the URL -- you could dynamically build this
URL <- "http://www.fantasypros.com/nfl/rankings/qb.php"

## Read the tables from the page into R
tables <- readHTMLTable(URL)

## how many do we have?
length(tables)

## look at the first one
tables[1]
## that's not it

## let's look at the 2nd table
tables[2]

## bring it into a data.frame -- [[2]] extracts the table itself,
## so no extra as.data.frame() wrapping is needed
df <- tables[[2]]
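
To address the weekly part of the question, one minimal sketch is to wrap the scrape in a function and write a dated CSV on each run, then schedule the script externally (e.g. with cron or Windows Task Scheduler). The file-naming scheme here is an assumption:

## Wrap the scrape so it can be re-run on a schedule
fetch_rankings <- function(url = "http://www.fantasypros.com/nfl/rankings/qb.php") {
  tables <- readHTMLTable(url, stringsAsFactors = FALSE)
  tables[[2]]  # the 2nd table held the rankings above
}

## Save a dated snapshot each week (file name is an assumption)
df <- fetch_rankings()
write.csv(df, sprintf("qb-rankings-%s.csv", Sys.Date()), row.names = FALSE)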

If you are using R for the first time, you can install external packages pretty easily with the command install.packages("PackageNameHere"). However, if you are serious about learning R, I would look into using the RStudio IDE. It really flattened the learning curve for me on a ton of levels.
