Is a JPEG with a bogus Huffman table recoverable?


Question

I have a JPEG that cannot be opened in any program:

Opening it in Ubuntu Image Viewer yields:

Passing the photo through convert yields similar results:

$ convert corrupt.jpg out.jpg
convert.im6: Bogus Huffman table definition `corrupt.jpg' @ error/jpeg.c/JPEGErrorHandler/316.
convert.im6: no images defined `out.jpg' @ error/convert.c/ConvertImageCommand/3044.

Running the photo through exiftool yields:

ExifTool Version Number         : 9.46
File Name                       : corrupt.jpg
Directory                       : .
File Size                       : 47 kB
File Modification Date/Time     : 2015:04:11 01:31:14-07:00
File Access Date/Time           : 2018:05:04 10:26:04-07:00
File Inode Change Date/Time     : 2018:05:04 10:26:03-07:00
File Permissions                : r--------
File Type                       : JPEG
MIME Type                       : image/jpeg
Comment                         : Y�.�.�..2..Q.Q.
Image Width                     : 640
Image Height                    : 480
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:2 (2 1)
Image Size                      : 640x480

Uncorrupted photos with similar image contents average 45-48 kB, so I reckon the photo data itself is inside this JPEG somewhere.

I hosted the photo on S3. You can download it with wget:

wget https://s3.amazonaws.com/jordanarseno.com/corrupt.jpg

I opened the file with hexedit and found the following:

  • the photo contents beyond the first few hundred bytes look random enough to suggest the file contains an image. That is, I do not see continuous streams of F0.

  • it does in fact start with the FF D8 file signature, as JPEGs ought to.

  • the next two bytes are FF FE, whose listed meaning as a file signature is: "Byte-order mark for text file encoded in little-endian 16-bit Unicode Transfer Format".

  • not long after the FF FE, I see bytes whose ASCII representation is: &'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz. That seems rather strange for a JPEG. What is this?

  • likewise, the ASCII string &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz appears about 100 bytes later.

  • FF D9 (the JPEG terminator) is in the file, but more bytes do appear after this terminator:

    FF D9 5C 72 78 E0 7C 94 CD B2 9C FF 00 C4 BF 53 C0 E7 FE 41 D3 9C FF 00 E3 95 7C F1 B6 92 5F 7A 2B EB 54 AF BF E6 30 FD A0 7F CC 3B 53 E9 FF 00 40 F9 FF 00 F8 8A 4D F7 08 30
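The check for bytes after FF D9 can also be scripted. The following is a minimal sketch, not part of the original investigation: the helper name find_eoi_offsets is made up for illustration, and it simply reports every offset where the two bytes FF D9 occur, without distinguishing a real end-of-image marker from a coincidental byte pair inside a header segment.

```python
def find_eoi_offsets(data: bytes) -> list:
    """Return every offset at which the byte pair FF D9 occurs.

    Hypothetical helper: it does a plain byte search and does not
    interpret the JPEG structure around each hit.
    """
    offsets = []
    pos = data.find(b"\xff\xd9")
    while pos != -1:
        offsets.append(pos)
        pos = data.find(b"\xff\xd9", pos + 2)
    return offsets


if __name__ == "__main__":
    # Tiny synthetic example: SOI, three filler bytes, EOI, two trailing bytes.
    sample = b"\xff\xd8abc\xff\xd9\x5c\x72"
    for off in find_eoi_offsets(sample):
        trailing = len(sample) - (off + 2)
        print(f"FF D9 at {off:#x}, {trailing} byte(s) follow it")
```

Running it against the real file would mean replacing the synthetic sample with the contents of corrupt.jpg read in binary mode.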

Switching over to Windows and using JPEGsnoop yields:

      JPEGsnoop 1.8.0 by Calvin Hass
        http://www.impulseadventure.com/photo/
        -------------------------------------
      
        Filename: [C:\corrupt.jpg]
        Filesize: [47760] Bytes
      
      Start Offset: 0x00000000
      *** Marker: SOI (xFFD8) ***
        OFFSET: 0x00000000
      
      *** Marker: COM (Comment) (xFFFE) ***
        OFFSET: 0x00000002
        Comment length = 36
          Comment=Y.Ò................à.....2..Q.Q...
      
      *** Marker: DQT (xFFDB) ***
        Define a Quantization Table.
        OFFSET: 0x00000028
        Table length = 132
        ----
        Precision=8 bits
        Destination ID=0 (Luminance)
          DQT, Row #0:   3   2   2   3   4   7   9  10 
          DQT, Row #1:   2   2   2   3   4  10  10   9 
          DQT, Row #2:   2   2   3   4   7  10  12  10 
          DQT, Row #3:   2   3   4   5   9  15  14  11 
          DQT, Row #4:   3   4   6  10  12  19  18  13 
          DQT, Row #5:   4   6   9  11  14  18  19  16 
          DQT, Row #6:   8  11  13  15  18  21  21  17 
          DQT, Row #7:  12  16  16  17  19  17  18  17 
          Approx quality factor = 91.45 (scaling=17.09 variance=0.95)
        ----
        Precision=8 bits
        Destination ID=1 (Chrominance)
          DQT, Row #0:   3   3   4   8  17  17  17  17 
          DQT, Row #1:   3   4   4  11  17  17  17  17 
          DQT, Row #2:   4   4  10  17  17  17  17  17 
          DQT, Row #3:   8  11  17  17  17  17  17  17 
          DQT, Row #4:  17  17  17  17  17  17  17  17 
          DQT, Row #5:  17  17  17  17  17  17  17  17 
          DQT, Row #6:  17  17  17  17  17  17  17  17 
          DQT, Row #7:  17  17  17  17  17  17  17  17 
          Approx quality factor = 91.44 (scaling=17.11 variance=0.19)
      
      *** Marker: COM (Comment) (xFFFE) ***
        OFFSET: 0x000000AE
        Comment length = 5
          Comment=...
      
      *** Marker: SOF0 (Baseline DCT) (xFFC0) ***
        OFFSET: 0x000000B5
        Frame header length = 17
        Precision = 8
        Number of Lines = 480
        Samples per Line = 640
        Image Size = 640 x 480
        Raw Image Orientation = Landscape
        Number of Img components = 3
          Component[1]: ID=0x01, Samp Fac=0x21 (Subsamp 1 x 1), Quant Tbl Sel=0x00 (Lum: Y)
          Component[2]: ID=0x02, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cb)
          Component[3]: ID=0x03, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cr)
      
      *** Marker: DHT (Define Huffman Table) (xFFC4) ***
        OFFSET: 0x000000C8
        Huffman table length = 418
        ----
        Destination ID = 0
        Class = 0 (DC / Lossless Table)
          Codes of length 01 bits (000 total): 
          Codes of length 02 bits (001 total): 00 
          Codes of length 03 bits (005 total): 01 02 03 04 05 
          Codes of length 04 bits (001 total): 06 
          Codes of length 05 bits (001 total): 07 
          Codes of length 06 bits (001 total): 08 
          Codes of length 07 bits (001 total): 09 
          Codes of length 08 bits (001 total): 0A 
          Codes of length 09 bits (001 total): 0B 
          Codes of length 10 bits (000 total): 
          Codes of length 11 bits (000 total): 
          Codes of length 12 bits (000 total): 
          Codes of length 13 bits (000 total): 
          Codes of length 14 bits (000 total): 
          Codes of length 15 bits (000 total): 
          Codes of length 16 bits (000 total): 
          Total number of codes: 012
      
        ----
        Destination ID = 1
        Class = 0 (DC / Lossless Table)
          Codes of length 01 bits (000 total): 
          Codes of length 02 bits (003 total): 13 0E 0F 
          Codes of length 03 bits (001 total): 10 
          Codes of length 04 bits (001 total): 11 
          Codes of length 05 bits (001 total): 12 
          Codes of length 06 bits (001 total): 12 
          Codes of length 07 bits (012 total): 12 0B 0D 13 15 13 11 15 10 11 12 11 
          Codes of length 08 bits (016 total): 01 03 03 03 04 04 04 08 04 04 08 11 0B 0A 0B 11 
      
          Codes of length 09 bits (013 total): 11 11 11 11 11 11 11 11 11 11 11 11 11 
          Codes of length 10 bits (011 total): 11 11 11 11 11 11 11 11 11 11 11 
          Codes of length 11 bits (012 total): 11 11 11 11 11 11 11 11 11 11 11 01 
          Codes of length 12 bits (015 total): 01 01 01 01 00 00 00 00 00 00 01 02 03 04 05 
          Codes of length 13 bits (012 total): 06 07 08 09 0A 0B 10 00 02 01 03 03 
          Codes of length 14 bits (009 total): 02 04 03 05 05 04 04 00 00 
          Codes of length 15 bits (010 total): 01 7D 01 02 03 00 04 11 05 12 
          Codes of length 16 bits (014 total): 21 31 41 06 13 51 61 07 22 71 14 32 81 91 
          Total number of codes: 131
      
        ----
        Destination ID = 1
        Class = 10 (AC Table)
      ERROR: Invalid DHT Class (10). Aborting DHT Load.
      
      ERROR: Expected marker 0xFF, got 0x73 @ offset 0x0000026C. Consider using [Tools->Img Search Fwd/Rev].
      
      *** Searching Compression Signatures ***
      
        Signature:           01FF5BA518B453CC8F224A4C85505196
        Signature (Rotated): 01D13AFD01FF0B6EC46EA4081D25BB4D
        File Offset:         0 bytes
        Chroma subsampling:  2x1
        EXIF Make/Model:     NONE
        EXIF Makernotes:     NONE
        EXIF Software:       NONE
      
        Searching Compression Signatures: (3347 built-in, 0 user(*) )
      
                EXIF.Make / Software        EXIF.Model                            Quality           Subsamp Match?
                -------------------------   -----------------------------------   ----------------  --------------
           CAM:[NIKON                    ] [NIKON D40                          ] [FINE            ] Yes              
      
        Based on the analysis of compression characteristics and EXIF metadata:
      
        ASSESSMENT: Class 1 - Image is processed/edited
      
        This may be a new software editor for the database.
        If this file is processed, and editor doesn't appear in list above,
        PLEASE ADD TO DATABASE with [Tools->Add Camera to DB]
      
      
      *** Additional Info ***
      NOTE: Data exists after EOF, range: 0x00000000-0x0000BA90 (47760 bytes)
      

As a last note, the EXIF.Model identified by JPEGsnoop is incorrect. This photo would have been taken with a VC0706 UART Model: LCF - 23T 0V528

In summary: is this JPEG recoverable?

Recommended answer

The approach used to get this back was more luck than judgement. I think I can explain it, though be aware that it involves a hex editor...

The Wikipedia page on the syntax of a JPEG file explains that it is made up of a series of segments, each started by a two-byte marker: 0xFF followed by another byte that indicates the type of segment.
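To make that segment structure concrete, here is a minimal sketch of a walker that lists the header segments of a JPEG. It is an illustration under stated assumptions, not the tool used for the repair: the function name list_segments is invented here, every segment is assumed to carry a 2-byte big-endian length right after its marker, and fill bytes and standalone markers are not handled. It stops at the SOS marker (FF DA), after which the compressed image data begins.

```python
import struct


def list_segments(data: bytes) -> list:
    """Return (offset, marker_byte, length) for each segment up to SOS.

    Simplified sketch: assumes SOI is followed directly by segments that
    each have a 2-byte big-endian length (which counts itself).
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI (FF D8) signature")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(
                f"expected 0xFF at offset {pos:#x}, got {data[pos]:#04x}"
            )
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((pos, marker, length))
        if marker == 0xDA:  # SOS: entropy-coded data follows, stop here
            break
        pos += 2 + length  # 2 marker bytes plus the length-prefixed payload
    return segments


if __name__ == "__main__":
    # Synthetic file: SOI, a COM segment, a 1-byte dummy DQT, then SOS.
    demo = (b"\xff\xd8" + b"\xff\xfe\x00\x04hi"
            + b"\xff\xdb\x00\x03\x00" + b"\xff\xda\x00\x02")
    for offset, marker, length in list_segments(demo):
        print(f"offset {offset:#06x}  marker FF {marker:02X}  length {length}")
```

When a segment's declared length does not land on the next 0xFF, the walker raises an error, which is the same kind of failure JPEGsnoop reports in the question ("Expected marker 0xFF").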

The hope was that only the Huffman table segment of the file was wrong, as the error message suggested. Without needing to understand what a Huffman table is, it was enough to see that the same section on Wikipedia gives 0xFF 0xC4 as the marker for a Huffman table segment.

Further down the page, it mentions:

  The JPEG standard provides general-purpose Huffman tables; encoders may also choose to generate Huffman tables...

Opening up a few other JPEG files turned up what looks like a standard set of 4 consecutive Huffman table segments, each starting with that 0xFF 0xC4 marker. The sample corrupt.jpg, however, had just one Huffman table segment, running from position 0x00c8 to 0x02bc (shown below).
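For readers who do want to look inside a Huffman table segment, the following is a rough sketch of how its payload is laid out: one byte holding the table class (high nibble, 0 for DC or 1 for AC) and table ID (low nibble), then 16 counts (the number of codes of each bit length from 1 to 16), then that many symbol values. The function name parse_dht and the specific sanity checks are assumptions made for illustration; real decoders apply their own validation.

```python
def parse_dht(payload: bytes) -> list:
    """Parse a DHT payload (the bytes after the 2-byte length field).

    Returns a list of (table_class, table_id, counts, values) tuples.
    Illustrative sketch only; the checks are deliberately simple.
    """
    tables = []
    pos = 0
    while pos < len(payload):
        table_class = payload[pos] >> 4    # 0 = DC, 1 = AC
        table_id = payload[pos] & 0x0F
        if table_class > 1:
            raise ValueError(
                f"invalid DHT class {table_class} at payload offset {pos}"
            )
        counts = list(payload[pos + 1:pos + 17])
        if len(counts) != 16:
            raise ValueError("truncated code-length counts")
        total = sum(counts)
        if total > 256:
            raise ValueError(f"too many codes in one table: {total}")
        values = payload[pos + 17:pos + 17 + total]
        if len(values) != total:
            raise ValueError("truncated symbol values")
        tables.append((table_class, table_id, counts, values))
        pos += 17 + total
    return tables


if __name__ == "__main__":
    # First table from the dumps in this post: class 0, ID 0, 12 codes.
    dc_table = (bytes([0x00])
                + bytes([0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
                + bytes(range(12)))
    for table_class, table_id, counts, values in parse_dht(dc_table):
        print(table_class, table_id, sum(counts), values.hex(" "))
```

A parser along these lines hits the same wall JPEGsnoop did on corrupt.jpg: once the table contents are shifted, the byte where the next class/ID pair should be no longer holds a valid class.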

(Both contain, in their Huffman tables, that &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz sequence you mentioned. In the corrupt file it appears twice in that single Huffman table segment; in the 'more conventional' JPEGs it appears in the second and fourth Huffman tables.)

From there, the fixed image is a copy and paste of the standard 4 Huffman tables in place of that range of bytes in corrupt.jpg. In the fixed file the tables now run from 0x00c8 to 0x0278.
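The end offset 0x0278 can be checked with a little arithmetic. This short sketch takes the four length fields visible after each ff c4 marker in the replacement dump at the end of this answer (00 1f, 00 b5, 00 1f, 00 b5) and adds the two marker bytes that each segment also occupies; the variable names are only for illustration.

```python
START = 0x00C8  # offset of the first FF C4 marker

# Length fields read from the replacement tables' hex dump.
length_fields = [0x001F, 0x00B5, 0x001F, 0x00B5]

# Each segment occupies 2 marker bytes plus the length it declares
# (the declared length already includes its own 2 bytes).
segment_sizes = [2 + n for n in length_fields]
total = sum(segment_sizes)
end = START + total

print(segment_sizes)  # [33, 183, 33, 183]
print(total)          # 432
print(hex(end))       # 0x278
```

So the four standard tables take 432 bytes, and the first byte after them sits at 0x0278, matching the range quoted above.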

Because the JPEG format is based around scanning for segments between those 0xff markers, you can just swap out the Huffman segments; there are no other pointers in the file to worry about. As you said, the rest of the file looked like a plausible JPEG.

Summary of the steps taken:

  • Hex search corrupt.jpg for FF C4 and note the offset
  • Hex search for the next FF. If it's another FF C4 (so a second Huffman table), keep going
  • Delete the content from the first FF C4 (inclusive) up to but not including the next FF
  • Replace it with the 'standard 4 Huffman tables'. These are the bytes in the last sample below, or they can be copied from 0x00c8 to 0x0278 in the fixed file
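The same steps can be sketched in Python instead of a hex editor. This is only an illustration of the manual procedure, not the tool that was actually used: the function name replace_huffman_tables is hypothetical, the replacement bytes are passed in by the caller (for example, copied from a known-good JPEG), and the naive search assumes that the first FF C4 in the file really is the Huffman marker and that no 0xFF byte occurs inside the table data.

```python
def replace_huffman_tables(data: bytes, standard_tables: bytes) -> bytes:
    """Swap the Huffman table segment(s) of a JPEG for replacement bytes.

    Mirrors the manual steps: find the first FF C4, skip over any further
    FF C4 markers, cut up to (not including) the next FF, splice in the
    replacement. Hypothetical helper with deliberately naive searching.
    """
    start = data.find(b"\xff\xc4")
    if start == -1:
        raise ValueError("no FF C4 marker found")
    pos = start + 2
    while True:
        nxt = data.find(b"\xff", pos)
        if nxt == -1:
            raise ValueError("no marker found after the Huffman tables")
        if data[nxt + 1:nxt + 2] == b"\xc4":
            pos = nxt + 2  # another Huffman table segment, keep going
            continue
        end = nxt          # first non-DHT marker: stop before it
        break
    return data[:start] + standard_tables + data[end:]


if __name__ == "__main__":
    # Synthetic example: SOI, two tiny fake DHT segments, then FF DD.
    corrupt = (b"\xff\xd8" + b"\xff\xc4\x00\x03\x11"
               + b"\xff\xc4\x00\x03\x22" + b"\xff\xdd\x00\x04\x00\x10")
    replacement = b"\xff\xc4\x00\x02"
    print(replace_huffman_tables(corrupt, replacement).hex(" "))
```

Work on a copy of the file, and compare the result in a hex editor before trusting it, since the byte search has no understanding of the surrounding segments.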

The corrupt Huffman table:

      0000-00d0:  xx xx xx xx xx xx xx xx-ff c4 01 a2-00 00 01 05  !....... ........
      0000-00e0:  01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02  ........ ........
      0000-00f0:  03 04 05 06-07 08 09 0a-0b 01 00 03-01 01 01 01  ........ ........
      0000-0100:  0c 10 0d 0b-0c 0f 0c 09-0a 0e 13 0e-0f 10 11 12  ........ ........
      0000-0110:  12 12 0b 0d-13 15 13 11-15 10 11 12-11 01 03 03  ........ ........
      0000-0120:  03 04 04 04-08 04 04 08-11 0b 0a 0b-11 11 11 11  ........ ........
      0000-0130:  11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11  ........ ........
      0000-0140:  11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11  ........ ........
      0000-0150:  01 01 01 01-01 00 00 00-00 00 00 01-02 03 04 05  ........ ........
      0000-0160:  06 07 08 09-0a 0b 10 00-02 01 03 03-02 04 03 05  ........ ........
      0000-0170:  05 04 04 00-00 01 7d 01-02 03 00 04-11 05 12 21  ......}. .......!
      0000-0180:  31 41 06 13-51 61 07 22-71 14 32 81-91 a1 08 23  1A..Qa." q.2....#
      0000-0190:  42 b1 c1 15-52 d1 f0 24-33 62 72 82-09 0a 16 17  B...R..$ 3br.....
      0000-01a0:  18 19 1a 25-26 27 28 29-2a 34 35 36-37 38 39 3a  ...%&'() *456789:
      0000-01b0:  43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a  CDEFGHIJ STUVWXYZ
      0000-01c0:  63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a  cdefghij stuvwxyz
      0000-01d0:  83 84 85 86-87 88 89 8a-92 93 94 95-96 97 98 99  ........ ........
      0000-01e0:  9a a2 a3 a4-a5 a6 a7 a8-a9 aa b2 b3-b4 b5 b6 b7  ........ ........
      0000-01f0:  b8 b9 ba c2-c3 c4 c5 c6-c7 c8 c9 ca-d2 d3 d4 d5  ........ ........
      0000-0200:  d6 d7 d8 d9-da e1 e2 e3-e4 e5 e6 e7-e8 e9 ea f1  ........ ........
      0000-0210:  f2 f3 f4 f5-f6 f7 f8 f9-fa 11 00 02-01 02 04 04  ........ ........
      0000-0220:  03 04 07 05-04 04 00 01-02 77 00 01-02 03 11 04  ........ .w......
      0000-0230:  05 21 31 06-12 41 51 07-61 71 13 22-32 81 08 14  .!1..AQ. aq."2...
      0000-0240:  42 91 a1 b1-c1 09 23 33-52 f0 15 62-72 d1 0a 16  B.....#3 R..br...
      0000-0250:  24 34 e1 25-f1 17 18 19-1a 26 27 28-29 2a 35 36  $4.%.... .&'()*56
      0000-0260:  37 38 39 3a-43 44 45 46-47 48 49 4a-53 54 55 56  789:CDEF GHIJSTUV
      0000-0270:  57 58 59 5a-63 64 65 66-67 68 69 6a-73 74 75 76  WXYZcdef ghijstuv
      0000-0280:  77 78 79 7a-82 83 84 85-86 87 88 89-8a 92 93 94  wxyz.... ........
      0000-0290:  95 96 97 98-99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2  ........ ........
      0000-02a0:  b3 b4 b5 b6-b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9  ........ ........
      0000-02b0:  ca d2 d3 d4-d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7  ........ ........
      0000-02c0:  e8 e9 ea f2-f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx  ........ ........
      

The next two bytes are then ff dd, the start of the next segment:

      0000-02c0:  xx xx xx xx-xx xx xx xx-xx xx xx xx-ff dd 00 04  ........ ........
      

This was replaced with the standard 4 general-purpose Huffman tables instead (look for the ff c4 markers):

      0000-00d0:  xx xx xx xx xx xx xx xx-ff c4 00 1f-00 00 01 05  !....... ........
      0000-00e0:  01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02  ........ ........
      0000-00f0:  03 04 05 06-07 08 09 0a-0b ff c4 00-b5 10 00 02  ........ ........
      0000-0100:  01 03 03 02-04 03 05 05-04 04 00 00-01 7d 01 02  ........ .....}..
      0000-0110:  03 00 04 11-05 12 21 31-41 06 13 51-61 07 22 71  ......!1 A..Qa."q
      0000-0120:  14 32 81 91-a1 08 23 42-b1 c1 15 52-d1 f0 24 33  .2....#B ...R..$3
      0000-0130:  62 72 82 09-0a 16 17 18-19 1a 25 26-27 28 29 2a  br...... ..%&'()*
      0000-0140:  34 35 36 37-38 39 3a 43-44 45 46 47-48 49 4a 53  456789:C DEFGHIJS
      0000-0150:  54 55 56 57-58 59 5a 63-64 65 66 67-68 69 6a 73  TUVWXYZc defghijs
      0000-0160:  74 75 76 77-78 79 7a 83-84 85 86 87-88 89 8a 92  tuvwxyz. ........
      0000-0170:  93 94 95 96-97 98 99 9a-a2 a3 a4 a5-a6 a7 a8 a9  ........ ........
      0000-0180:  aa b2 b3 b4-b5 b6 b7 b8-b9 ba c2 c3-c4 c5 c6 c7  ........ ........
      0000-0190:  c8 c9 ca d2-d3 d4 d5 d6-d7 d8 d9 da-e1 e2 e3 e4  ........ ........
      0000-01a0:  e5 e6 e7 e8-e9 ea f1 f2-f3 f4 f5 f6-f7 f8 f9 fa  ........ ........
      0000-01b0:  ff c4 00 1f-01 00 03 01-01 01 01 01-01 01 01 01  ........ ........
      0000-01c0:  00 00 00 00-00 00 01 02-03 04 05 06-07 08 09 0a  ........ ........
      0000-01d0:  0b ff c4 00-b5 11 00 02-01 02 04 04-03 04 07 05  ........ ........
      0000-01e0:  04 04 00 01-02 77 00 01-02 03 11 04-05 21 31 06  .....w.. .....!1.
      0000-01f0:  12 41 51 07-61 71 13 22-32 81 08 14-42 91 a1 b1  .AQ.aq." 2...B...
      0000-0200:  c1 09 23 33-52 f0 15 62-72 d1 0a 16-24 34 e1 25  ..#3R..b r...$4.%
      0000-0210:  f1 17 18 19-1a 26 27 28-29 2a 35 36-37 38 39 3a  .....&'( )*56789:
      0000-0220:  43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a  CDEFGHIJ STUVWXYZ
      0000-0230:  63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a  cdefghij stuvwxyz
      0000-0240:  82 83 84 85-86 87 88 89-8a 92 93 94-95 96 97 98  ........ ........
      0000-0250:  99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2-b3 b4 b5 b6  ........ ........
      0000-0260:  b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9-ca d2 d3 d4  ........ ........
      0000-0270:  d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7-e8 e9 ea f2  ........ ........
      0000-0280:  f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx xx xx xx xx  ........ .....(..
      
