Crawling data from a website with Scrapy 1.5.0 - Python


Problem description

I am trying to crawl data from a website with Scrapy (1.5.0) - Python.

Project directory:

stack/
    scrapy.cfg
    stack/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            stack_spider.py

Here is my items.py:

import scrapy

class StackItem(scrapy.Item):
    title = scrapy.Field()

And here is stack_spider.py:

from scrapy import Spider

from stack.items import StackItem

class StackSpider(Spider):
    name = "stack"
    allowed_domains = ["batdongsan.com.vn"]
    start_urls = [
        "https://batdongsan.com.vn/nha-dat-ban",
    ]

    def parse(self, response):
        # response.xpath() works directly; wrapping the response in
        # Selector() is not needed
        questions = response.xpath('//div[@class="p-title"]/h3')

        for question in questions:
            item = StackItem()
            # extract_first() returns None instead of raising IndexError
            # when a heading has no <a> child
            item['title'] = question.xpath('a/text()').extract_first()
            yield item
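As a quick local sanity check of the XPath itself, independent of Scrapy and of the target site, it can be exercised against a small sample snippet with the standard library. The markup below is hypothetical; the real page layout may differ:

```python
# Check the spider's XPath against sample markup (hypothetical snippet;
# the real batdongsan.com.vn layout may differ).
import xml.etree.ElementTree as ET

SAMPLE = """
<html><body>
  <div class="p-title"><h3><a href="/nha-dat-ban/1">Listing one</a></h3></div>
  <div class="p-title"><h3><a href="/nha-dat-ban/2">Listing two</a></h3></div>
</body></html>
"""

root = ET.fromstring(SAMPLE)
# Same path the spider uses: //div[@class="p-title"]/h3, then a/text()
titles = [a.text for a in root.findall(".//div[@class='p-title']/h3/a")]
print(titles)  # ['Listing one', 'Listing two']
```

If this prints the expected titles, the XPath is sound and the problem lies elsewhere (e.g. the site rejecting the request).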

I don't know why I can't crawl the data; I really need your help. Thanks.

Recommended answer

Set a user agent.

Go to your Scrapy project's settings.py and paste this in:

USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'
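If you would rather not change the project-wide settings, Scrapy also lets a single spider override settings via its `custom_settings` attribute. A minimal sketch (the UA string here is just an example):

```python
# Alternative to editing settings.py: override USER_AGENT for this
# spider only, via custom_settings (sketch; any realistic browser
# UA string will do).
from scrapy import Spider

class StackSpider(Spider):
    name = "stack"
    custom_settings = {
        'USER_AGENT': ('Mozilla/5.0 (X11; Linux x86_64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/51.0.2704.103 Safari/537.36'),
    }
```

Either way, the site stops seeing Scrapy's default user agent, which many sites block outright.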
