Scrapy Image Pipeline: How to rename images?
Question
I have a spider that fetches both data and images. I want to rename each image with the corresponding 'title' that I'm scraping.
Here is my code:
spider1.py
from imageToFileSystemCheck.items import ImagetofilesystemcheckItem
import scrapy

class TestSpider(scrapy.Spider):
    name = 'imagecheck'

    def start_requests(self):
        searchterms=['keyword1','keyword2',]
        for item in searchterms:
            yield scrapy.Request('http://www.example.com/s?=%s' % item,callback=self.parse, meta={'item': item})

    def parse(self,response):
        start_urls=[]
        item = response.meta.get('item')
        for i in range(0,2):
            link=str(response.css("div.tt a.chek::attr(href)")[i].extract())
            start_urls.append(link)
        for url in start_urls:
            print(url)
            yield scrapy.Request(url=url, callback=self.parse_info ,meta={'item': item})

    def parse_info(self, response):
        url=response.url
        title=str(response.xpath('//*[@id="Title"]/text()').extract_first())
        img_url_1=response.xpath("//img[@id='images']/@src").extract_first()
        scraped_info = {
            'url' : url,
            'title' : title,
            'image_urls': [img_url_1]
        }
        yield scraped_info
items.py
import scrapy

class ImagetofilesystemcheckItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()
    pass
pipelines.py
class ImagetofilesystemcheckPipeline(object):
    def process_item(self, item, spider):
        return item
settings.py
BOT_NAME = 'imageToFileSystemCheck'
SPIDER_MODULES = ['imageToFileSystemCheck.spiders']
NEWSPIDER_MODULE = 'imageToFileSystemCheck.spiders'
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = '/home/imageToFileSystemCheck/images/'
ROBOTSTXT_OBEY = True
Can you please help me with the required changes so that Scrapy saves the scraped images in the 'title'.jpg format, where title is the value scraped by the spider?
Recommended answer
Create a spider like this:
import scrapy

class ShopeeSpider(scrapy.Spider):
    _TEMP_IMAGES_STORE = "/home/crawler/scrapers/images"
    custom_settings = {
        'ITEM_PIPELINES': {
            'coszi.pipelines.CustomImagePipeline': 400,
        },
        "IMAGES_STORE": _TEMP_IMAGES_STORE
    }

    def parse(self, response):
        data = {}
        # Map each image URL to the file name it should be saved under.
        data['images'] = {"image_link_here": "image_name_here"}
        # Yield the item so that the image pipeline receives it.
        yield data
Then your pipelines.py should look like this:
import os

import scrapy
from scrapy.pipelines.images import ImagesPipeline

class CustomImagePipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # item['images'] maps each image URL to the file name to save it under.
        if 'images' in item:
            for image_url, img_name in item['images'].items():
                # Skip images already on disk; the item must also carry an 'images_path' key.
                if not os.path.exists(os.path.join(item['images_path'], img_name)):
                    request = scrapy.Request(url=image_url, dont_filter=True)
                    request.meta['img_name'] = img_name
                    request.meta['this_prod_img_folder'] = item['img_name_here']
                    yield request

    def file_path(self, request, response=None, info=None, *, item=None):
        # Build the target path from the names stored in request.meta.
        return os.path.join(info.spider._TEMP_IMAGES_STORE, request.meta['this_prod_img_folder'], request.meta['img_name'])
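For the spider in the question, which already yields 'title' and 'image_urls' on each item, a shorter variant may be enough. The following is a minimal sketch, not the answerer's code: it assumes Scrapy 2.4 or later, where file_path() receives the item as a keyword argument. The names TitleImagesPipeline and title_to_filename are hypothetical, chosen only for illustration.

```python
import re

try:
    from scrapy.pipelines.images import ImagesPipeline
except ImportError:
    # Fallback so the naming helper can be tried without Scrapy installed.
    ImagesPipeline = object


def title_to_filename(title, default="untitled"):
    """Turn a scraped title into a filesystem-safe .jpg name (hypothetical helper)."""
    # Replace every run of characters other than word characters and '-' with '_'.
    safe = re.sub(r"[^\w\-]+", "_", title or "").strip("_")
    return f"{safe or default}.jpg"


class TitleImagesPipeline(ImagesPipeline):
    """Hypothetical pipeline: stores each image as '<title>.jpg' under IMAGES_STORE."""

    def file_path(self, request, response=None, info=None, *, item=None):
        # Assumes Scrapy 2.4+, which passes the item being processed as a keyword.
        return title_to_filename(item.get("title"))
```

To try it, the class would go in the project's pipelines.py and ITEM_PIPELINES in settings.py would point to it in place of 'scrapy.pipelines.images.ImagesPipeline', for example {'imageToFileSystemCheck.pipelines.TitleImagesPipeline': 1} (the module path is an assumption based on the project name in the question). Note that two items with the same title, or one item with several image URLs, would map to the same file name and overwrite each other, so a suffix may be needed in those cases.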