Scrapy image download how to use custom filename
Problem description
For my scrapy project I'm currently using the ImagesPipeline. The downloaded images are stored with the SHA1 hash of their URL as the file name.
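For reference, the default name mentioned above is derived from the request URL alone; a minimal stdlib-only sketch of that derivation (the URL is just an example):

```python
import hashlib

# Scrapy's ImagesPipeline names each image after the SHA1 hex digest of
# its request URL; the stored path then looks like "full/<sha1>.jpg".
url = "http://example.com/images/photo.jpg"
image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
path = "full/%s.jpg" % image_guid
print(path)
```

Because the name depends only on the URL, two scrapes of the same image always map to the same file, but the name carries no information from the rest of the item.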
How can I store the files using my own custom file names instead?
What if my custom file name needs to contain another scraped field from the same item? E.g. use item['desc'] as the file name for the image downloaded from item['image_url']. If I understand correctly, that would involve somehow accessing the other item fields from the image pipeline.
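To make the goal concrete: given the item fields, the pipeline would need to build a name roughly like the one below. A minimal stdlib-only sketch of that naming logic (the field names item['desc'] and item['image_url'] come from the question; the filename_from_item helper and the slug rules are hypothetical):

```python
import os
import re
from urllib.parse import urlparse

def filename_from_item(item):
    """Hypothetical helper: build a file name from item['desc'],
    keeping the extension of item['image_url']."""
    # Reduce the description to a filesystem-safe slug:
    # lowercase, with runs of other characters collapsed to dashes.
    slug = re.sub(r"[^a-z0-9]+", "-", item["desc"].lower()).strip("-")
    # Reuse the extension of the original image URL, defaulting to .jpg.
    ext = os.path.splitext(urlparse(item["image_url"]).path)[1] or ".jpg"
    return slug + ext

item = {"desc": "Red Bicycle, 2010",
        "image_url": "http://example.com/img/1234.png"}
print(filename_from_item(item))  # red-bicycle-2010.png
```

Unlike the SHA1 default, a name built this way is not guaranteed unique, so colliding descriptions would overwrite each other unless you add a disambiguating suffix.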
Any help would be appreciated.
Recommended answer
This was the way I solved the problem in Scrapy 0.10. Check the persist_image method of FSImagesStoreChangeableDirectory; the filename of the downloaded image is the key argument.
import os

# Import paths as of Scrapy 0.x; newer Scrapy versions moved these modules.
from scrapy.conf import settings
from scrapy.exceptions import NotConfigured
from scrapy.contrib.pipeline.images import FSImagesStore, ImagesPipeline


class FSImagesStoreChangeableDirectory(FSImagesStore):

    def persist_image(self, key, image, buf, info, append_path):
        # Store the image under append_path/key instead of the default
        # SHA1-based path.
        absolute_path = self._get_filesystem_path(append_path + '/' + key)
        self._mkdir(os.path.dirname(absolute_path), info)
        image.save(absolute_path)


class ProjectPipeline(ImagesPipeline):

    def __init__(self):
        # Deliberately skips ImagesPipeline.__init__ (which would create
        # the default store) and calls the grandparent's __init__ instead;
        # the store is then set up manually below.
        super(ImagesPipeline, self).__init__()
        store_uri = settings.IMAGES_STORE
        if not store_uri:
            raise NotConfigured
        self.store = FSImagesStoreChangeableDirectory(store_uri)
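Whichever pipeline class you end up with, it still has to be enabled in the project's settings.py. A hedged fragment, assuming the project module is named myproject (the exact ITEM_PIPELINES syntax differs across Scrapy versions; newer releases use a dict mapping class paths to priorities):

```python
# settings.py (fragment) -- the module path "myproject" is an assumption
IMAGES_STORE = "/path/to/images"
ITEM_PIPELINES = {
    "myproject.pipelines.ProjectPipeline": 1,
}
```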