Scrapy convert_image
Question
I use Scrapy to crawl some images, and each image needs part of it cropped off or a watermark added. I overrode the convert_image function in pipelines.py, but it didn't work. The code looks like this:
class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url)

    def convert_image(self, image, size=None):
        # Flatten transparent PNGs onto a white background, then force RGB.
        if image.format == 'PNG' and image.mode == 'RGBA':
            background = Image.new('RGBA', image.size, (255, 255, 255))
            background.paste(image, image)
            image = background.convert('RGB')
        elif image.mode != 'RGB':
            image = image.convert('RGB')

        if size:
            image = image.copy()
            image.thumbnail(size, Image.ANTIALIAS)
        else:
            # Cut off the watermark. TODO: overlay a predefined image instead of cropping.
            x, y = image.size
            if y > 120:
                image = image.crop((0, 0, x, y - 25))

        buf = StringIO()
        try:
            image.save(buf, 'JPEG')
        except Exception as ex:
            raise ImageException("Cannot process image. Error: %s" % ex)
        return image, buf
Any ideas?
Update:
@warwaruk
"How have you decided it didn't work? Any exception or what?" No, there was no exception. I used the code below to override item_completed instead, and it works well:
def item_completed(self, results, item, info):
    image_paths = [x['path'] for ok, x in results if ok]
    if not image_paths:
        raise DropItem("Item contains no images")

    if item['refer'] == 'someurl.com':
        for a in image_paths:
            o_img = os.path.join(self.store.basedir, a)
            if os.path.isfile(o_img):
                image = Image.open(o_img)
                x, y = image.size
                if y > 120:
                    image = image.crop((0, 0, x, y - 35))
                    image.save(o_img, 'JPEG')

    return item
Recommended answer
ImagesPipeline converts images to JPEG (RGB mode) automatically, and there is no switch to turn that off. Although you can modify its implementation, doing so may break its other logic. It is better to use MediaPipeline and just download the files. You can then write a separate application to post-process the image files. That keeps your logic clear and makes Scrapy faster.
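As a rough illustration of the "separate application" idea, here is a minimal sketch of a standalone post-processing script, assuming Pillow is installed and the downloaded files are JPEGs. The names crop_box and strip_watermarks are hypothetical, and the default strip height (25) and height threshold (120) are simply the numbers from the question's code, not anything Scrapy prescribes.

```python
import os


def crop_box(width, height, strip=25, min_height=120):
    """Return the (left, upper, right, lower) box that drops the bottom
    `strip` pixels, or None when the image is not taller than min_height."""
    if height <= min_height:
        return None
    return (0, 0, width, height - strip)


def strip_watermarks(image_dir, strip=25, min_height=120):
    """Crop every JPEG under image_dir in place and return the changed paths.

    Hypothetical helper: it assumes the watermark is a horizontal strip at
    the bottom edge, as in the question.
    """
    # Pillow is imported here so that crop_box stays usable without it.
    from PIL import Image

    changed = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            if not name.lower().endswith(('.jpg', '.jpeg')):
                continue
            path = os.path.join(root, name)
            with Image.open(path) as image:
                box = crop_box(image.width, image.height, strip, min_height)
                if box is None:
                    continue
                cropped = image.crop(box)
                # Make sure the pixel data is in memory before the file closes.
                cropped.load()
            cropped.save(path, 'JPEG')
            changed.append(path)
    return changed
```

After the crawl finishes, something like strip_watermarks('/path/to/images') could be run against the directory the pipeline wrote to (the path here is a placeholder). Keeping the box arithmetic in its own function makes the cropping rule easy to check without touching any files.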