Scrapy Google Search

Views: 43 Published: 2021/7/16 22:15:36 Tags: python web-scraping scrapy

Problem description

I am trying to scrape Google search results, specifically the "People also search for" links.

For example, when you search Google for "Christopher Nolan", Google also shows a "People also search for" panel with images of people related to the search. In this case the panel lists Christian Bale, Emma Thomas, Zack Snyder, and others. I am interested in scraping this data.

I am using the Scrapy framework and wrote a simple scraper, but it returns an empty CSV data file. Below is the code I have so far; your help is appreciated. I hope it is clear what I want to achieve. I used XPath Helper (a Google Chrome extension) to help find the XPath.

My code:

# PyGSSpider (spiders folder)
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from PyGoogleSearch.items import PyGSItem
import sys

class PyGSSpider(CrawlSpider):
    name = "google"
    allowed_domains = ["www.google.com"]
    start_urls = ["https://www.google.com/#q=christopher+nolan"]

    #Extracts Christopher Nolan link     
    rules = [
        Rule(SgmlLinkExtractor(allow=("https://www.google.com/search?q=christpher+noaln&oq=christpher+noaln&aqs")), follow=True),
        Rule(SgmlLinkExtractor(allow=()), callback='parse_item')
    ]
   
    #Parse function for extracting the people also search link.
    def parse_item(self,response):
        self.log('Hi, this is an item page! %s' % response.url)
        sel=Selector(response)
        item=PyGSItem()
        item['peoplealsosearchfor'] = sel.xpath('//div[@id="cnt"]/@href').extract()
       
        return item
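
Two URL details in the spider above may contribute to the empty output, so here is a minimal sketch that checks them using only the standard library. First, everything after `#` in `start_urls` is a URL fragment, which HTTP clients do not send to the server, and Scrapy does not run the page's JavaScript, so the request likely fetches the plain Google home page. Second, `allow=` patterns are regular expressions, so the unescaped `?` and `+` (and the misspelled name) keep the first rule from matching a literal search URL. The `search_url` and `literal` values are illustrative assumptions, not URLs taken from a real crawl.

```python
import re
from urllib.parse import urlencode, urlsplit

# The start URL from the spider: the search term lives in the fragment.
start_url = "https://www.google.com/#q=christopher+nolan"
parts = urlsplit(start_url)
print(repr(parts.query), repr(parts.fragment))

# Assumed alternative: carry the search term in the query string instead.
search_url = "https://www.google.com/search?" + urlencode({"q": "christopher nolan"})
print(search_url)

# The first rule's pattern, used as a regex against a literal URL that
# starts with exactly the same characters (hypothetical "=chrome" suffix).
pattern = "https://www.google.com/search?q=christpher+noaln&oq=christpher+noaln&aqs"
literal = pattern + "=chrome"

# Unescaped, "h?" means "optional h" and "r+" means "one or more r",
# so the pattern does not match its own literal text.
print(re.search(pattern, literal))

# Escaping the metacharacters makes the literal match succeed.
print(re.search(re.escape(pattern), literal) is not None)
```

If the literal URL is really what the rule should match, `re.escape` or a shorter pattern such as `r"/search\?q="` would be one way to express it; which links the rule ought to follow is a separate design decision.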

items.py:

from scrapy.item import Item, Field

class PyGSItem(Item):
    peoplealsosearchfor = Field()
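
A note on the XPath in `parse_item`: `//div[@id="cnt"]/@href` asks for an `href` attribute on the `div` itself, and a `div` normally has none, so the selector returns an empty list even when the element exists. The sketch below illustrates this with the standard library's `xml.etree.ElementTree` on a small, made-up snippet. The markup is an assumption for illustration only; the real Google results page is structured differently, and `cnt` may not be the right container.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a results page.
html = """
<html><body>
  <div id="cnt">
    <a href="/search?q=Christian+Bale">Christian Bale</a>
    <a href="/search?q=Emma+Thomas">Emma Thomas</a>
  </div>
</body></html>
""".strip()

root = ET.fromstring(html)
container = root.find(".//div[@id='cnt']")

# The div carries no href attribute, which mirrors the empty result
# of //div[@id="cnt"]/@href in the spider.
print(container.get("href"))

# Reading href from the anchor elements inside the container works.
links = [a.get("href") for a in container.findall(".//a")]
print(links)
```

In the spider, the corresponding change would be an expression along the lines of `sel.xpath('//div[@id="cnt"]//a/@href')`, assuming the links of interest really are `a` elements under that `div` in the HTML that Scrapy receives.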
