How to get Google search results


Question

I used the following code:

library(XML)
library(RCurl)
getGoogleURL <- function(search.term, domain = '.co.uk', quotes = TRUE)
{
    search.term <- gsub(' ', '%20', search.term)
    if (quotes) search.term <- paste('%22', search.term, '%22', sep = '')
    getGoogleURL <- paste('http://www.google', domain, '/search?q=',
                          search.term, sep = '')
}

getGoogleLinks <- function(google.url)
{
    doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
    html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
    nodes <- getNodeSet(html, "//a[@href][@class='l']")
    return(sapply(nodes, function(x) x <- xmlAttrs(x)[[1]]))
}

search.term <- "cran"
quotes <- FALSE
search.url <- getGoogleURL(search.term=search.term, quotes=quotes)

links <- getGoogleLinks(search.url)

I would like to find all the links returned by my search, but I get the following result:

> links
list()

How can I get the links? In addition, I would like to get the headlines and summaries of the Google results; how can I get those? And finally, is there a way to get the links that reside in the ChillingEffects.org results?

Recommended answer

If you look at the html variable, you can see that the search result links are all nested in <h3 class="r"> tags.
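
The difference between the two selectors can be checked offline. What follows is a minimal sketch, assuming a hand-written HTML fragment (not real Google output) in which the result links sit inside <h3 class="r"> and carry no class attribute of their own. The sample URLs and the names sample_html, old_nodes and new_nodes are placeholders introduced for illustration.

```r
library(XML)

# Hypothetical fragment imitating the structure described above;
# it is not real Google output.
sample_html <- paste0(
  '<html><body>',
  '<h3 class="r"><a href="http://cran.r-project.org/">CRAN</a></h3>',
  '<h3 class="r"><a href="http://example.org/mirror">A CRAN mirror</a></h3>',
  '<a href="http://example.org/other">Unrelated link</a>',
  '</body></html>'
)

doc <- htmlParse(sample_html, asText = TRUE)

# Selector from the question: anchors whose own class attribute is "l".
old_nodes <- getNodeSet(doc, "//a[@href][@class='l']")

# Selector from the answer: anchors nested inside <h3 class="r">.
new_nodes <- getNodeSet(doc, "//h3[@class='r']//a")

length(old_nodes)  # no anchor in the fragment has class "l"
length(new_nodes)  # one match per <h3 class="r"> heading
```

On this fragment the first selector matches nothing and the second matches both result links. Calling sapply on an empty node set returns list(), which is consistent with the output shown in the question.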

Try changing your getGoogleLinks function to:

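getGoogleLinks <- function(google.url) {
   # Fetch the results page, sending an explicit User-Agent header
   doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
   # Parse the HTML, silently ignoring parser errors
   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
   # Select the anchors nested inside <h3 class="r"> headings
   nodes <- getNodeSet(html, "//h3[@class='r']//a")
   # Return the href attribute of each anchor
   return(sapply(nodes, function(x) xmlAttrs(x)[["href"]]))
}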
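
The question also asks for headlines. Since each result link sits inside an <h3> heading, the link text itself can plausibly serve as the headline. The sketch below is one way to collect both in a data frame; extractResults and the sample HTML are hypothetical names and data introduced for illustration, not part of the original answer. The summary text would need its own XPath, which the answer does not identify, so it is left out here.

```r
library(XML)

# Returns one row per result: headline text and link target.
# Assumes every result link is an <a> nested inside <h3 class="r">.
extractResults <- function(html.text) {
  doc <- htmlParse(html.text, asText = TRUE)
  nodes <- getNodeSet(doc, "//h3[@class='r']//a")
  data.frame(
    # Text content of the anchor, used here as the headline
    headline = vapply(nodes, xmlValue, character(1)),
    # href attribute, or NA if an anchor has none
    link = vapply(nodes,
                  function(x) xmlGetAttr(x, "href", default = NA_character_),
                  character(1)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical sample page; not real Google output.
sample_html <- paste0(
  '<html><body>',
  '<h3 class="r"><a href="http://cran.r-project.org/">CRAN</a></h3>',
  '<h3 class="r"><a href="http://example.org/mirror">A CRAN mirror</a></h3>',
  '</body></html>'
)

res <- extractResults(sample_html)
print(res)
```

With a live page, the string returned by getURL could be passed to extractResults in place of sample_html, provided the page still uses the <h3 class="r"> structure.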
