imghdr / python - Can't detect type of some images (image extension)
Problem description
I'm downloading a lot of images from imgur.com with a Python script. Since all my links are in the format http://imgur.com/{id}, I have to force the download by replacing the original URL with http://i.imgur.com/{id}.gif and then save each image without an extension. (I know there is an Imgur API, but I can't use it because it has limitations for this kind of job.)
Now, after downloading the images, I want to use the imghdr module to determine the original extension of each image:
>>> import imghdr
>>> imghdr.what('/images/GrEdc')
'gif'
The problem is that this works only about 80% of the time. The remaining 20% are all identified as None, and after checking some of them I noticed that they are most likely all .jpg images.
Why can't imghdr detect the format? I'm able to open these images with Ubuntu's default image viewer even without an extension, so I don't think they are corrupted.
Recommended answer
Note that as of 2019, this bug has not been fixed. The solution is available at the link from Paul R.
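To see why a particular file comes back as None, it can help to look at its first bytes directly. The sketch below assumes the unrecognized files are JPEGs that start with the usual FF D8 marker but carry no JFIF or Exif tag at offset 6, which appears to be what the stock imghdr check looked for at the time. The helper name describe_jpeg_header and both sample headers are hypothetical and are not taken from the question's files.

```python
def describe_jpeg_header(header: bytes) -> dict:
    """Report which JPEG signatures appear in the first bytes of a file."""
    return {
        # Every JPEG stream begins with the SOI marker FF D8.
        "starts_with_soi": header[:2] == b"\xff\xd8",
        # A check that only looks for a tag at offset 6 misses files without one.
        "tag_at_offset_6": header[6:10] in (b"JFIF", b"Exif"),
    }


# Hypothetical headers for illustration only.
jfif_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01"
bare_header = b"\xff\xd8\xff\xdb\x00C\x00\x08\x06\x06"

print(describe_jpeg_header(jfif_header))
print(describe_jpeg_header(bare_header))
```

To use it on a real file, read the header with open(path, 'rb').read(32) and pass the bytes in. A file that reports the SOI marker but no tag at offset 6 is a likely candidate for the failure described in the question.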
One way to work around the problem is to monkeypatch imghdr by appending extra JPEG tests:
# Monkeypatch bug in imghdr
from imghdr import tests


def test_jpeg1(h, f):
    """JPEG data in JFIF format"""
    if b'JFIF' in h[:23]:
        return 'jpeg'


# First 32 bytes of a JPEG that starts directly with a quantization table
JPEG_MARK = b'\xff\xd8\xff\xdb\x00C\x00\x08\x06\x06' \
            b'\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c\x19\x12\x13\x0f'


def test_jpeg2(h, f):
    """JPEG with small header"""
    if len(h) >= 32 and 67 == h[5] and h[:32] == JPEG_MARK:
        return 'jpeg'


def test_jpeg3(h, f):
    """JPEG data in JFIF or Exif format"""
    if h[6:10] in (b'JFIF', b'Exif') or h[:2] == b'\xff\xd8':
        return 'jpeg'


# Register the extra tests so imghdr.what() tries them after the built-in ones
tests.append(test_jpeg1)
tests.append(test_jpeg2)
tests.append(test_jpeg3)
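The imghdr module was deprecated in Python 3.11 and removed in Python 3.13, so the patch above only applies to older interpreters. As an alternative, here is a minimal sketch of the same kind of check done by hand on the leading bytes. The function names sniff_image_type and sniff_file are hypothetical, only three formats are covered, and the sample file is synthetic. It is not a drop-in replacement for imghdr.

```python
import os
import tempfile
from typing import Optional


def sniff_image_type(header: bytes) -> Optional[str]:
    """Guess an image type from the leading bytes of a file."""
    # JPEG: SOI marker FF D8 followed by the FF that starts the next marker.
    if header[:3] == b"\xff\xd8\xff":
        return "jpeg"
    # PNG: fixed 8-byte signature.
    if header[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    # GIF: either of the two version strings.
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 32 bytes of a file and guess its image type."""
    with open(path, "rb") as fh:
        return sniff_image_type(fh.read(32))


# Demo on a synthetic file with no extension and no JFIF/Exif tag.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "GrEdc")
    with open(sample, "wb") as fh:
        fh.write(b"\xff\xd8\xff\xdb\x00C\x00\x08" + b"\x00" * 24)
    print(sniff_file(sample))  # prints: jpeg
```

Checking only the first three bytes is deliberately permissive. It accepts any stream that begins like a JPEG, which matches the intent of test_jpeg3 above, but it does not validate the rest of the file.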