Is a JPEG with a bogus Huffman table recoverable?
Question
I have a JPEG that cannot be opened in any program:
Opening it in Ubuntu Image Viewer yields:
Passing the photo through convert yields similar results:
$ convert corrupt.jpg out.jpg
convert.im6: Bogus Huffman table definition `corrupt.jpg' @ error/jpeg.c/JPEGErrorHandler/316.
convert.im6: no images defined `out.jpg' @ error/convert.c/ConvertImageCommand/3044.
Running the photo through exiftool yields:
ExifTool Version Number : 9.46
File Name : corrupt.jpg
Directory : .
File Size : 47 kB
File Modification Date/Time : 2015:04:11 01:31:14-07:00
File Access Date/Time : 2018:05:04 10:26:04-07:00
File Inode Change Date/Time : 2018:05:04 10:26:03-07:00
File Permissions : r--------
File Type : JPEG
MIME Type : image/jpeg
Comment : Y�.�.�..2..Q.Q.
Image Width : 640
Image Height : 480
Encoding Process : Baseline DCT, Huffman coding
Bits Per Sample : 8
Color Components : 3
Y Cb Cr Sub Sampling : YCbCr4:2:2 (2 1)
Image Size : 640x480
Un-corrupted photos with similar image contents average 45-48k, so I reckon the photo data itself is inside this JPEG somewhere.
I hosted the photo on S3. You can download it with wget:
wget https://s3.amazonaws.com/jordanarseno.com/corrupt.jpg
I opened the file with hexedit and found the following:
- The photo contents beyond the first few hundred bytes are distributed randomly enough to suggest that the file contains an image. That is, I'm not seeing continuous streams of F's or 0's.
- It does in fact start with the FF D8 file signature, as JPEGs ought to.
- It is followed by FF FE, which is otherwise known as the "Byte-order mark for text file encoded in little-endian 16-bit Unicode Transfer Format".
- Not long after the FF FE, I see bytes whose ASCII representation is: &'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz. Seems rather strange for a JPEG. What is this?
- Likewise, the ASCII string &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz appears about 100 bytes later.
- FF D9 (the JPEG terminator string) is in the file, but characters do appear after this terminator:
FF D9 5C 72 78 E0 7C 94 CD B2 9C FF 00 C4 BF 53 C0 E7 FE 41 D3 9C FF 00 E3 95 7C F1 B6 92 5F 7A 2B EB 54 AF BF E6 30 FD A0 7F CC 3B 53 E9 FF 00 40 F9 FF 00 F8 8A 4D F7 08 30
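The two signature checks in the list above (FF D8 at the start, bytes after FF D9) can be reproduced with a few lines of Python. This is only a rough sketch: the function name is made up for illustration, and it simply looks for the last FF D9 pair anywhere in the file rather than parsing the JPEG structure.

```python
def signature_report(data: bytes):
    """Return (starts_with_soi, offset_of_last_eoi, trailing_byte_count).

    offset_of_last_eoi is -1 (and the trailing count is 0) when no FF D9 is found.
    This is a plain byte search, not a real JPEG parser.
    """
    starts_with_soi = data[:2] == b"\xff\xd8"
    eoi = data.rfind(b"\xff\xd9")
    if eoi < 0:
        return starts_with_soi, -1, 0
    # Everything after the two EOI bytes counts as trailing data.
    return starts_with_soi, eoi, len(data) - (eoi + 2)


if __name__ == "__main__":
    with open("corrupt.jpg", "rb") as fh:
        print(signature_report(fh.read()))
```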
Switching over to Windows and using JPEGsnoop yields:
JPEGsnoop 1.8.0 by Calvin Hass
  http://www.impulseadventure.com/photo/
  -------------------------------------

  Filename: [C:\corrupt.jpg]
  Filesize: [47760] Bytes

Start Offset: 0x00000000
*** Marker: SOI (xFFD8) ***
  OFFSET: 0x00000000

*** Marker: COM (Comment) (xFFFE) ***
  OFFSET: 0x00000002
  Comment length = 36
    Comment=Y.Ò................à.....2..Q.Q...

*** Marker: DQT (xFFDB) ***
  Define a Quantization Table.
  OFFSET: 0x00000028
  Table length = 132
  ----
  Precision=8 bits
  Destination ID=0 (Luminance)
    DQT, Row #0: 3 2 2 3 4 7 9 10
    DQT, Row #1: 2 2 2 3 4 10 10 9
    DQT, Row #2: 2 2 3 4 7 10 12 10
    DQT, Row #3: 2 3 4 5 9 15 14 11
    DQT, Row #4: 3 4 6 10 12 19 18 13
    DQT, Row #5: 4 6 9 11 14 18 19 16
    DQT, Row #6: 8 11 13 15 18 21 21 17
    DQT, Row #7: 12 16 16 17 19 17 18 17
    Approx quality factor = 91.45 (scaling=17.09 variance=0.95)
  ----
  Precision=8 bits
  Destination ID=1 (Chrominance)
    DQT, Row #0: 3 3 4 8 17 17 17 17
    DQT, Row #1: 3 4 4 11 17 17 17 17
    DQT, Row #2: 4 4 10 17 17 17 17 17
    DQT, Row #3: 8 11 17 17 17 17 17 17
    DQT, Row #4: 17 17 17 17 17 17 17 17
    DQT, Row #5: 17 17 17 17 17 17 17 17
    DQT, Row #6: 17 17 17 17 17 17 17 17
    DQT, Row #7: 17 17 17 17 17 17 17 17
    Approx quality factor = 91.44 (scaling=17.11 variance=0.19)

*** Marker: COM (Comment) (xFFFE) ***
  OFFSET: 0x000000AE
  Comment length = 5
    Comment=...

*** Marker: SOF0 (Baseline DCT) (xFFC0) ***
  OFFSET: 0x000000B5
  Frame header length = 17
  Precision = 8
  Number of Lines = 480
  Samples per Line = 640
  Image Size = 640 x 480
  Raw Image Orientation = Landscape
  Number of Img components = 3
    Component[1]: ID=0x01, Samp Fac=0x21 (Subsamp 1 x 1), Quant Tbl Sel=0x00 (Lum: Y)
    Component[2]: ID=0x02, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cb)
    Component[3]: ID=0x03, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cr)

*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x000000C8
  Huffman table length = 418
  ----
  Destination ID = 0
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total):
    Codes of length 02 bits (001 total): 00
    Codes of length 03 bits (005 total): 01 02 03 04 05
    Codes of length 04 bits (001 total): 06
    Codes of length 05 bits (001 total): 07
    Codes of length 06 bits (001 total): 08
    Codes of length 07 bits (001 total): 09
    Codes of length 08 bits (001 total): 0A
    Codes of length 09 bits (001 total): 0B
    Codes of length 10 bits (000 total):
    Codes of length 11 bits (000 total):
    Codes of length 12 bits (000 total):
    Codes of length 13 bits (000 total):
    Codes of length 14 bits (000 total):
    Codes of length 15 bits (000 total):
    Codes of length 16 bits (000 total):
    Total number of codes: 012
  ----
  Destination ID = 1
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total):
    Codes of length 02 bits (003 total): 13 0E 0F
    Codes of length 03 bits (001 total): 10
    Codes of length 04 bits (001 total): 11
    Codes of length 05 bits (001 total): 12
    Codes of length 06 bits (001 total): 12
    Codes of length 07 bits (012 total): 12 0B 0D 13 15 13 11 15 10 11 12 11
    Codes of length 08 bits (016 total): 01 03 03 03 04 04 04 08 04 04 08 11 0B 0A 0B 11
    Codes of length 09 bits (013 total): 11 11 11 11 11 11 11 11 11 11 11 11 11
    Codes of length 10 bits (011 total): 11 11 11 11 11 11 11 11 11 11 11
    Codes of length 11 bits (012 total): 11 11 11 11 11 11 11 11 11 11 11 01
    Codes of length 12 bits (015 total): 01 01 01 01 00 00 00 00 00 00 01 02 03 04 05
    Codes of length 13 bits (012 total): 06 07 08 09 0A 0B 10 00 02 01 03 03
    Codes of length 14 bits (009 total): 02 04 03 05 05 04 04 00 00
    Codes of length 15 bits (010 total): 01 7D 01 02 03 00 04 11 05 12
    Codes of length 16 bits (014 total): 21 31 41 06 13 51 61 07 22 71 14 32 81 91
    Total number of codes: 131
  ----
  Destination ID = 1
  Class = 10 (AC Table)
ERROR: Invalid DHT Class (10). Aborting DHT Load.
ERROR: Expected marker 0xFF, got 0x73 @ offset 0x0000026C. Consider using [Tools->Img Search Fwd/Rev].

*** Searching Compression Signatures ***
  Signature: 01FF5BA518B453CC8F224A4C85505196
  Signature (Rotated): 01D13AFD01FF0B6EC46EA4081D25BB4D
  File Offset: 0 bytes
  Chroma subsampling: 2x1
  EXIF Make/Model: NONE
  EXIF Makernotes: NONE
  EXIF Software: NONE

  Searching Compression Signatures: (3347 built-in, 0 user(*) )

  EXIF.Make / Software        EXIF.Model        Quality        Subsamp Match?
  -------------------------   ---------------   ------------   --------------
  CAM:[NIKON ]                [NIKON D40 ]      [FINE ]        Yes

  Based on the analysis of compression characteristics and EXIF metadata:
  ASSESSMENT: Class 1 - Image is processed/edited
  This may be a new software editor for the database.
  If this file is processed, and editor doesn't appear in list above,
  PLEASE ADD TO DATABASE with [Tools->Add Camera to DB]

*** Additional Info ***
NOTE: Data exists after EOF, range: 0x00000000-0x0000BA90 (47760 bytes)
As a last note, the EXIF.Model identified by JPEGsnoop is incorrect. This photo would have been taken with a VC0706 UART Model: LCF - 23T 0V528
In summary: Is this JPEG recoverable?
Recommended answer
The approach used to get this back was more luck than judgement. I think I can explain, though be aware it involves a hex editor...
The Wikipedia page on the syntax of a JPEG file explains that it is made up of a series of segments, each started by a two-byte marker: 0xFF and another byte to indicate the type of segment.
The hope was that only the Huffman table segment of the file was wrong, as suggested by the error message. Without needing to understand what a Huffman table is, it was enough to see that the same section on Wikipedia gives 0xFF 0xC4 as the marker for a Huffman table segment.
Further down the page, it mentions:
"The JPEG standard provides general-purpose Huffman tables; encoders may also choose to generate Huffman tables..."
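To see the segment layout for yourself without a hex editor, a short Python sketch like the one below can walk the header segments. It assumes each segment after FF D8 is a two-byte marker followed by a big-endian two-byte length that counts itself but not the marker, and it stops at the start-of-scan marker (FF DA). The function name is made up for illustration.

```python
import struct


def list_segments(data: bytes):
    """Return (offset, marker_byte, declared_length) for each header segment.

    Walks from just after SOI until SOS (FF DA) or the end of the data, and
    raises ValueError when the byte at an expected marker position is not 0xFF.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (FF D8)")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(
                f"expected marker 0xFF at offset {pos:#x}, got {data[pos]:#04x}"
            )
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((pos, marker, length))
        if marker == 0xDA:  # SOS: entropy-coded image data follows
            break
        pos += 2 + length  # 2 marker bytes plus the declared length
    return segments
```

On a file whose Huffman segment has a wrong length, a walker like this should fail in the same way JPEGsnoop did above: it lands on a byte that is not 0xFF where the next marker was expected.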
Opening up a few other JPEG files showed what looks like a standard set of 4 consecutive Huffman table segments, each starting with that 0xFF 0xC4 marker. The sample corrupt.jpg, however, had just one Huffman table, running from position 0x00c8 to 0x02bc (shown below).
(Both contain that &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz sequence you mentioned in their Huffman tables. In the corrupt file it appears twice in that single Huffman table; in the 'more conventional' JPEGs it appears in the second and fourth Huffman tables.)
From there, the fixed image is a copy and paste of the standard 4 Huffman tables in place of that range of bytes in corrupt.jpg, so that they now run from 0x00c8 to 0x0278 in the fixed file.
Because the JPEG format is based around scanning for segments between those 0xff markers, you can just swap out the Huffman segments; there are no other pointers in the file to worry about. As you said, the rest of the file looked like a plausible JPEG.
Summary of the steps taken:
- Hex search corrupt.jpg for FF C4 and note the offset
- Hex search for the next FF. If it's another FF C4 (so a second Huffman table), keep going
- Delete the content from the first FF C4 (included) up to but not including the next FF
- Replace it with the 'standard 4 Huffman tables'. These are the bytes in the last sample below, or can be copied from 0x00c8 to 0x0278 in the fixed file
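The steps above can also be scripted. The following is a minimal Python sketch, not the exact procedure used for the fix: the function names are made up, and instead of hard-coding the standard tables it copies every Huffman (FF C4) segment out of a healthy "donor" JPEG that you supply.

```python
import struct


def dht_segments(donor: bytes) -> bytes:
    """Concatenate all DHT segments (markers included) found before SOS in a healthy JPEG."""
    out = bytearray()
    pos = 2  # skip SOI (FF D8)
    while pos + 4 <= len(donor) and donor[pos] == 0xFF:
        marker = donor[pos + 1]
        (length,) = struct.unpack(">H", donor[pos + 2:pos + 4])
        if marker == 0xC4:  # DHT
            out += donor[pos:pos + 2 + length]
        if marker == 0xDA:  # SOS: no more tables to collect
            break
        pos += 2 + length
    return bytes(out)


def splice_huffman(corrupt: bytes, replacement: bytes) -> bytes:
    """Replace everything from the first FF C4 up to the next FF that is not FF C4."""
    start = corrupt.find(b"\xff\xc4")
    if start < 0:
        raise ValueError("no FF C4 marker found")
    end = start + 2
    while True:
        end = corrupt.find(b"\xff", end)
        if end < 0:
            raise ValueError("no marker found after the Huffman table")
        if corrupt[end + 1:end + 2] != b"\xc4":
            break  # first FF that does not start another Huffman table
        end += 2  # another FF C4: keep going, as in step 2
    return corrupt[:start] + replacement + corrupt[end:]
```

The splice deliberately searches for the next FF instead of trusting the segment's declared length, since JPEGsnoop's error above shows that following the declared length does not land on a marker. Two caveats: a plain search for FF C4 could in principle match inside an earlier segment's payload (for example the comment), and a damaged table could contain a stray FF, so it is worth checking the offsets against a hex view before writing the result out.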
The corrupt Huffman table:
0000-00d0: xx xx xx xx xx xx xx xx-ff c4 01 a2-00 00 01 05 !....... ........
0000-00e0: 01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02 ........ ........
0000-00f0: 03 04 05 06-07 08 09 0a-0b 01 00 03-01 01 01 01 ........ ........
0000-0100: 0c 10 0d 0b-0c 0f 0c 09-0a 0e 13 0e-0f 10 11 12 ........ ........
0000-0110: 12 12 0b 0d-13 15 13 11-15 10 11 12-11 01 03 03 ........ ........
0000-0120: 03 04 04 04-08 04 04 08-11 0b 0a 0b-11 11 11 11 ........ ........
0000-0130: 11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11 ........ ........
0000-0140: 11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11 ........ ........
0000-0150: 01 01 01 01-01 00 00 00-00 00 00 01-02 03 04 05 ........ ........
0000-0160: 06 07 08 09-0a 0b 10 00-02 01 03 03-02 04 03 05 ........ ........
0000-0170: 05 04 04 00-00 01 7d 01-02 03 00 04-11 05 12 21 ......}. .......!
0000-0180: 31 41 06 13-51 61 07 22-71 14 32 81-91 a1 08 23 1A..Qa." q.2....#
0000-0190: 42 b1 c1 15-52 d1 f0 24-33 62 72 82-09 0a 16 17 B...R..$ 3br.....
0000-01a0: 18 19 1a 25-26 27 28 29-2a 34 35 36-37 38 39 3a ...%&'() *456789:
0000-01b0: 43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a CDEFGHIJ STUVWXYZ
0000-01c0: 63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a cdefghij stuvwxyz
0000-01d0: 83 84 85 86-87 88 89 8a-92 93 94 95-96 97 98 99 ........ ........
0000-01e0: 9a a2 a3 a4-a5 a6 a7 a8-a9 aa b2 b3-b4 b5 b6 b7 ........ ........
0000-01f0: b8 b9 ba c2-c3 c4 c5 c6-c7 c8 c9 ca-d2 d3 d4 d5 ........ ........
0000-0200: d6 d7 d8 d9-da e1 e2 e3-e4 e5 e6 e7-e8 e9 ea f1 ........ ........
0000-0210: f2 f3 f4 f5-f6 f7 f8 f9-fa 11 00 02-01 02 04 04 ........ ........
0000-0220: 03 04 07 05-04 04 00 01-02 77 00 01-02 03 11 04 ........ .w......
0000-0230: 05 21 31 06-12 41 51 07-61 71 13 22-32 81 08 14 .!1..AQ. aq."2...
0000-0240: 42 91 a1 b1-c1 09 23 33-52 f0 15 62-72 d1 0a 16 B.....#3 R..br...
0000-0250: 24 34 e1 25-f1 17 18 19-1a 26 27 28-29 2a 35 36 $4.%.... .&'()*56
0000-0260: 37 38 39 3a-43 44 45 46-47 48 49 4a-53 54 55 56 789:CDEF GHIJSTUV
0000-0270: 57 58 59 5a-63 64 65 66-67 68 69 6a-73 74 75 76 WXYZcdef ghijstuv
0000-0280: 77 78 79 7a-82 83 84 85-86 87 88 89-8a 92 93 94 wxyz.... ........
0000-0290: 95 96 97 98-99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2 ........ ........
0000-02a0: b3 b4 b5 b6-b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9 ........ ........
0000-02b0: ca d2 d3 d4-d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7 ........ ........
0000-02c0: e8 e9 ea f2-f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx ........ ........
Then the next two bytes are ff dd for the start of the next segment:
0000-02c0: xx xx xx xx-xx xx xx xx-xx xx xx xx-ff dd 00 04 ........ ........
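As a sanity check on the offsets, the arithmetic can be worked through in a few lines of Python. The figures are taken from the JPEGsnoop output (DHT at 0xC8, declared length 418) and from the positions quoted in this answer (0x02bc and 0x0278). The replacement segment lengths 0x1f and 0xb5 are the ones that follow each ff c4 in the fixed dump. Note that the row labels in the hex dumps appear to run 0x10 ahead of these offsets.

```python
DHT_OFFSET = 0x00C8       # "OFFSET: 0x000000C8" in the JPEGsnoop output
DECLARED_LENGTH = 418     # "Huffman table length = 418" (bytes 01 a2)
NEXT_MARKER = 0x02BC      # where this answer places the end of the corrupt table

# The length field counts itself but not the two marker bytes.
declared_end = DHT_OFFSET + 2 + DECLARED_LENGTH
actual_size = NEXT_MARKER - DHT_OFFSET
extra = NEXT_MARKER - declared_end

# Four replacement segments: 2 marker bytes plus the declared length of each.
replacement_size = sum(2 + n for n in (0x1F, 0xB5, 0x1F, 0xB5))
replacement_end = DHT_OFFSET + replacement_size

print(hex(declared_end), actual_size, extra, hex(replacement_end))
# prints: 0x26c 500 80 0x278
```

The declared end of 0x26c is exactly where JPEGsnoop reported "Expected marker 0xFF", and the table actually occupies 80 bytes more than it declares, which is consistent with extra bytes having ended up inside the segment. The replacement ends at 0x278, matching the range given for the fixed file.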
This was replaced with the standard 4 general-purpose Huffman tables instead. Look for the ff c4 markers:
0000-00d0: xx xx xx xx xx xx xx xx-ff c4 00 1f-00 00 01 05 !....... ........
0000-00e0: 01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02 ........ ........
0000-00f0: 03 04 05 06-07 08 09 0a-0b ff c4 00-b5 10 00 02 ........ ........
0000-0100: 01 03 03 02-04 03 05 05-04 04 00 00-01 7d 01 02 ........ .....}..
0000-0110: 03 00 04 11-05 12 21 31-41 06 13 51-61 07 22 71 ......!1 A..Qa."q
0000-0120: 14 32 81 91-a1 08 23 42-b1 c1 15 52-d1 f0 24 33 .2....#B ...R..$3
0000-0130: 62 72 82 09-0a 16 17 18-19 1a 25 26-27 28 29 2a br...... ..%&'()*
0000-0140: 34 35 36 37-38 39 3a 43-44 45 46 47-48 49 4a 53 456789:C DEFGHIJS
0000-0150: 54 55 56 57-58 59 5a 63-64 65 66 67-68 69 6a 73 TUVWXYZc defghijs
0000-0160: 74 75 76 77-78 79 7a 83-84 85 86 87-88 89 8a 92 tuvwxyz. ........
0000-0170: 93 94 95 96-97 98 99 9a-a2 a3 a4 a5-a6 a7 a8 a9 ........ ........
0000-0180: aa b2 b3 b4-b5 b6 b7 b8-b9 ba c2 c3-c4 c5 c6 c7 ........ ........
0000-0190: c8 c9 ca d2-d3 d4 d5 d6-d7 d8 d9 da-e1 e2 e3 e4 ........ ........
0000-01a0: e5 e6 e7 e8-e9 ea f1 f2-f3 f4 f5 f6-f7 f8 f9 fa ........ ........
0000-01b0: ff c4 00 1f-01 00 03 01-01 01 01 01-01 01 01 01 ........ ........
0000-01c0: 00 00 00 00-00 00 01 02-03 04 05 06-07 08 09 0a ........ ........
0000-01d0: 0b ff c4 00-b5 11 00 02-01 02 04 04-03 04 07 05 ........ ........
0000-01e0: 04 04 00 01-02 77 00 01-02 03 11 04-05 21 31 06 .....w.. .....!1.
0000-01f0: 12 41 51 07-61 71 13 22-32 81 08 14-42 91 a1 b1 .AQ.aq." 2...B...
0000-0200: c1 09 23 33-52 f0 15 62-72 d1 0a 16-24 34 e1 25 ..#3R..b r...$4.%
0000-0210: f1 17 18 19-1a 26 27 28-29 2a 35 36-37 38 39 3a .....&'( )*56789:
0000-0220: 43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a CDEFGHIJ STUVWXYZ
0000-0230: 63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a cdefghij stuvwxyz
0000-0240: 82 83 84 85-86 87 88 89-8a 92 93 94-95 96 97 98 ........ ........
0000-0250: 99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2-b3 b4 b5 b6 ........ ........
0000-0260: b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9-ca d2 d3 d4 ........ ........
0000-0270: d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7-e8 e9 ea f2 ........ ........
0000-0280: f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx xx xx xx xx ........ .....(..
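For readers curious about what sits inside each ff c4 segment in the dump above, here is a rough Python sketch of a payload parser. It assumes the usual JPEG layout for a Huffman table definition: one byte holding the table class (high nibble, 0 = DC, 1 = AC) and table ID (low nibble), then 16 counts of codes per bit length, then that many symbol values. The function name and the returned tuple shape are made up for illustration.

```python
def parse_dht_payload(payload: bytes):
    """Parse the bytes that follow a DHT segment's two-byte length field.

    Returns a list of (table_class, table_id, counts, values) tuples and raises
    ValueError on a class/ID outside the valid range or on truncated data.
    """
    tables = []
    pos = 0
    while pos < len(payload):
        table_class = payload[pos] >> 4
        table_id = payload[pos] & 0x0F
        if table_class > 1 or table_id > 3:
            raise ValueError(
                f"invalid table class/ID byte {payload[pos]:#04x} at payload offset {pos}"
            )
        counts = list(payload[pos + 1:pos + 17])
        if len(counts) != 16:
            raise ValueError("truncated code-length counts")
        total = sum(counts)
        values = payload[pos + 17:pos + 17 + total]
        if len(values) != total:
            raise ValueError("truncated symbol values")
        tables.append((table_class, table_id, counts, bytes(values)))
        pos += 17 + total
    return tables
```

Fed the payload of the first replacement segment (the bytes after ff c4 00 1f), this should yield a single DC table with ID 0 and the twelve symbol values 00 to 0b. The long runs of printable characters noted in the question are simply the symbol values of the two AC tables, which happen to include many bytes in the ASCII range. A class nibble above 1, as in JPEGsnoop's "Invalid DHT Class (10)" message, is rejected.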