R: scrape a blog for title, number of comments, and 'likes'

Views: 294 | Posted: 2017/3/6 2:22:12 | Tags: facebook, r, curl, web-scraping

Question

I'm trying to use R to grab some information from a few blogs. The data I'd like to grab is:

1) Date posted
2) Blog Post Title
3) Number of Comments
4) Number of Facebook likes.

This blog (http://www.jamesaltucher.com) has all the fields I'm looking to collect.

Ideally, I'd like a data frame that looks like this:

Post_Date      CommentCount       FB_Likes   Title
2012-12-05          1                 629      The James and Claudia Kripalu Workshop– The Daily Practice: Finding Success From Within
  ...              ...                ...          ...

Is there a way to do this in R? It seems like something that might be doable with RCurl, but I'm not too familiar with HTML, XML, JavaScript, and so on.

So far, this is what I have:

library(RCurl)
library(XML)
xmlTreeParse(getURI("http://www.jamesaltucher.com"))

When I run this, I get errors saying that the opening and closing tags don't match.
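
Mismatch errors of this kind are what a strict XML parser typically reports for real-world HTML, which is rarely well-formed XML. A minimal sketch of the difference, using a made-up HTML string rather than the live page (the string and the variable names are illustrative assumptions):

```r
library(XML)

# Hypothetical snippet: acceptable HTML, but not well-formed XML,
# because the <br> tag is never closed.
page <- "<html><body><p>one<br></p></body></html>"

# Strict XML parsing stops with a parse error on the unclosed tag.
strict <- tryCatch(
  xmlTreeParse(page, asText = TRUE),
  error = function(e) "parse error"
)

# The HTML parser tolerates the same input and builds a usable tree.
doc <- htmlParse(page, asText = TRUE)
paragraphs <- xpathSApply(doc, "//p", xmlValue)
```

For scraping web pages, htmlParse (or htmlTreeParse) is therefore usually the safer entry point than xmlTreeParse.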

NOTE: These are not my blogs, so I don't have admin access to the blogs or their Facebook accounts.

Recommended answer

Getting the Facebook likes is hard; I'd be interested to see a solution for that part. For the titles and dates, the code below works. I clean up the dates with gsub to get a tidy format.

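library(XML)
library(RCurl)

url.link <- 'http://www.jamesaltucher.com/'

# Download the raw page, then parse it with the lenient HTML parser.
blog <- getURL(url.link)
blog <- htmlParse(blog, asText = TRUE, encoding = "UTF-8")

# Post titles: the link inside each article's <h2>.
titles <- xpathSApply(blog, "//*[@class='article']/h2/a", xmlValue)

# Post dates: keep only the text between "on" and "Post" in each text node.
dates <- xpathSApply(blog, "//*[@class='article']/h2/span/text()",
                     function(x) gsub('.*on(.*)Post.*', '\\1', xmlValue(x)))

# Drop the leftover 'Posted by ' text nodes, which carry no date.
dates <- dates[dates != 'Posted by ']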
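
To get from separate vectors to the data frame asked for in the question, one option is to walk the article nodes one at a time so that each row's fields stay aligned. The sketch below is not the answer's method: it runs on a hypothetical HTML string loosely modelled on the XPath expressions above, and the markup, the helper names first_value and to_date, and the "N Comments" link are all assumptions, so the real page will likely need different paths. FB_Likes is left out because the answer gives no way to retrieve it.

```r
library(XML)

# Hypothetical markup; the real blog's structure may differ.
html <- '
<html><body>
  <div class="article">
    <h2><a href="/p1">First post</a>
        <span>Posted on December 5, 2012 | <a href="/p1#comments">1 Comment</a></span></h2>
  </div>
  <div class="article">
    <h2><a href="/p2">Second post</a>
        <span>Posted on November 28, 2012 | <a href="/p2#comments">14 Comments</a></span></h2>
  </div>
</body></html>'

doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
articles <- getNodeSet(doc, "//*[@class='article']")

# First match of a relative XPath under one node, or NA if nothing matches.
first_value <- function(node, path) {
  v <- xpathSApply(node, path, xmlValue)
  if (length(v) == 0) NA_character_ else v[[1]]
}

# Isolate "Month D, YYYY" with gsub, then build an ISO date from the
# built-in month.name constant so the result does not depend on the locale.
to_date <- function(x) {
  cleaned <- gsub(".*on ([A-Za-z]+) ([0-9]+), ([0-9]{4}).*", "\\1 \\2 \\3", x)
  parts <- strsplit(cleaned, " ")[[1]]
  iso <- sprintf("%s-%02d-%02d", parts[3],
                 match(parts[1], month.name), as.integer(parts[2]))
  as.Date(iso, format = "%Y-%m-%d")
}

# One single-row data frame per article, then stack them.
rows <- lapply(articles, function(node) {
  data.frame(
    Post_Date    = to_date(first_value(node, "./h2/span")),
    CommentCount = as.integer(gsub("[^0-9]", "", first_value(node, "./h2/span/a"))),
    Title        = first_value(node, "./h2/a"),
    stringsAsFactors = FALSE
  )
})
posts <- do.call(rbind, rows)
```

Extracting every field relative to its own article node avoids the misalignment that can occur when titles, dates, and counts are collected as independent vectors and one post happens to lack a field.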
