NASM Linux x64 | Encode binary to base64


Problem description

I'm trying to encode a binary file into base64. However, I'm stuck at a few steps, and I'm also not sure whether this is the right way to think about it; see the comments in the code below:

SECTION .bss            ; Section containing uninitialized data

    BUFFLEN equ 6       ; We read the file 6 bytes at a time
    Buff:   resb BUFFLEN    ; Text buffer itself

SECTION .data           ; Section containing initialised data

    B64Str: db "000000"
    B64LEN equ $-B64Str

    Base64: db "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

SECTION .text           ; Section containing code

global  _start          ; Linker needs this to find the entry point!

_start: 
    nop         ; This no-op keeps gdb happy...

; Read a buffer full of text from stdin:
Read:
    mov eax,3       ; Specify sys_read call
    mov ebx,0       ; Specify File Descriptor 0: Standard Input
    mov ecx,Buff        ; Pass offset of the buffer to read to
    mov edx,BUFFLEN     ; Pass number of bytes to read at one pass
    int 80h         ; Call sys_read to fill the buffer
    mov ebp,eax     ; Save # of bytes read from file for later
    cmp eax,0       ; If eax=0, sys_read reached EOF on stdin
    je Done         ; Jump If Equal (to 0, from compare)

; Set up the registers for the process buffer step:
    mov esi,Buff        ; Place address of file buffer into esi
    mov edi,B64Str      ; Place address of line string into edi
    xor ecx,ecx     ; Clear line string pointer to 0


; ------
;   Get 6 bits from input
; ------


; ------
;   Convert to B64 char
; ------

; ------
;   Print the char
; ------

; ------
;   Proceed to the next 6 bits
; ------


; All done! Let's end this party:
Done:
    mov eax,1       ; Code for Exit Syscall
    mov ebx,0       ; Return a code of zero 
    int 80H         ; Make kernel call

So, in text, it should do this:

1) Hex values:

7C AA 78

2) Binary values:

0111 1100 1010 1010 0111 1000

3) Groups of 6 bits:

011111 001010 101001 111000

4) Converted to numbers:

31 10 41 56

5) Each number maps to a letter, digit or symbol:

31 = f
10 = K
41 = p
56 = 4

So the final output is: fKp4

So, my questions are: how do I get the 6 bits, and how do I convert those bits into a char?

Recommended answer

There are two major ways to implement it: either a generic loop capable of picking any 6 bits, or fixed code that handles 24 bits (3 bytes) of input at a time. The fixed version produces exactly 4 base64 characters per group and ends on a byte boundary, so you can read the next 24 bits from offset +3.

Let's say esi points to the source binary data, which is padded with enough zeroes to make reading past the end of the input buffer safe (+3 bytes in the worst case).

And edi points to an output buffer of at least ((input_length+2)/3*4) bytes (integer division), possibly with some extra room, since base64 requires '=' padding in the ending sequence.
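The buffer-size formula can be evaluated with a few basic instructions. This is only a fragment, and it assumes the input length is already in eax; the answer does not name a register for it, so that choice is hypothetical.

```nasm
; Assumption (not in the original answer): eax = input length in bytes.
; Result: eax = (input_length + 2) / 3 * 4, the base64 output size in bytes.
    add   eax,2          ; round up to the next multiple of 3 ...
    xor   edx,edx        ; div uses edx:eax as the dividend, so clear edx
    mov   ecx,3
    div   ecx            ; ... eax = number of 3-byte groups, edx = remainder (unused)
    shl   eax,2          ; each group yields 4 output characters
```

For example, 5 input bytes give (5+2)/3 = 2 groups, so 8 output bytes. Note that div clobbers edx and the fragment also overwrites ecx.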

; convert 3 bytes of input into four B64 characters of output
mov   eax,[esi]  ; read 3 bytes of input
      ; (reads actually 4B, 1 will be ignored)
add   esi,3      ; advance pointer to next input chunk
bswap eax        ; first input byte as MSB of eax
shr   eax,8      ; throw away the 1 junk byte (LSB after bswap)
; produce 4 base64 characters backward (last group of 6 bits is converted first)
; (to keep the 6-bit group extraction simple: "shr eax,6" + "and edx,0x3F")
mov   edx,eax    ; get copy of last 6 bits
shr   eax,6      ; throw away 6bits being processed already
and   edx,0x3F   ; keep only last 6 bits
mov   bh,[Base64+edx]  ; convert 0-63 value into B64 character (4th)
mov   edx,eax    ; get copy of next 6 bits
shr   eax,6      ; throw away 6bits being processed already
and   edx,0x3F   ; keep only last 6 bits
mov   bl,[Base64+edx]  ; convert 0-63 value into B64 character (3rd)
shl   ebx,16     ; make room in ebx for next characters (4th+3rd now in upper 16 bits)
mov   edx,eax    ; get copy of next 6 bits
shr   eax,6      ; throw away 6bits being processed already
and   edx,0x3F   ; keep only last 6 bits
mov   bh,[Base64+edx]  ; convert 0-63 value into B64 character (2nd)
; here eax contains exactly only 6 bits (zero extended to 32b)
mov   bl,[Base64+eax]  ; convert 0-63 value into B64 character (1st)
mov   [edi],ebx  ; store four B64 characters as output
add   edi,4      ; advance output pointer

After the last group of 3 input bytes you must overwrite the end of the output with the proper number of '=' characters, to replace the characters produced from the fake zero bytes. I.e. 1 input byte (8 bits, 2 base64 characters) => output ends with '=='; 2 input bytes (16 bits, 3 base64 characters) => output ends with '='; 3 input bytes => all 24 bits used => 4 valid base64 characters.
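The padding fix-up could look roughly like the fragment below. It is meant to run once, after the conversion code above has stored the final group. It assumes ecx holds input_length mod 3 and that edi points just past the last stored group; the ecx convention and the NoPad label are hypothetical.

```nasm
; Assumptions (not in the original answer):
;   ecx = input_length mod 3 (0, 1 or 2)
;   edi = address just past the last 4-character group already stored
    test  ecx,ecx
    jz    NoPad                 ; length is a multiple of 3: all 4 characters are valid
    mov   byte [edi-1],'='      ; 1 or 2 leftover bytes: 4th character is padding
    cmp   ecx,2
    je    NoPad                 ; 2 leftover bytes: a single '=' is enough
    mov   byte [edi-2],'='      ; 1 leftover byte: 3rd character is padding too
NoPad:
```

The first two characters of the last group are never overwritten, because even a single input byte contributes bits to both of them.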

If you don't want to read the whole file into memory and build the whole output in memory, you can use in/out buffers of limited length, e.g. 900 bytes of input -> 1200 bytes of output, and process the input in 900-byte blocks. Or you can use a 3-byte -> 4-byte in/out buffer and drop the pointer advancing completely (or even drop esi/edi and use fixed memory addresses), since you then have to read the input and write the output separately for every iteration.
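As a minimal sketch of the 3-byte -> 4-byte variant, the program below reads stdin 3 bytes per pass and writes 4 characters per pass. It deliberately differs from the code in this thread in two ways: it uses the native x86-64 syscall interface instead of int 80h, and it uses a small loop instead of the bh/bl packing. The names InBuf, OutBuf, Encode and Write are made up for this sketch. It also assumes every read returns a full 3 bytes until the final one, which holds for regular files but is not guaranteed for pipes or terminals.

```nasm
; b64.asm - sketch only: base64-encode stdin to stdout, 3 bytes per pass
; Assumed build: nasm -f elf64 b64.asm && ld -o b64 b64.o
SECTION .bss
    InBuf:  resb 4              ; 3 input bytes + 1 spare for the 4-byte load
    OutBuf: resb 4              ; 4 base64 characters

SECTION .data
    Base64: db "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

SECTION .text
global _start

_start:
    lea   r12,[rel Base64]      ; table base (syscall preserves r12)
    lea   rbx,[rel OutBuf]      ; output base (syscall preserves rbx)
Read:
    mov   dword [rel InBuf],0   ; pre-zero so a short read is zero-padded
    xor   eax,eax               ; sys_read = 0
    xor   edi,edi               ; fd 0 = stdin
    lea   rsi,[rel InBuf]
    mov   edx,3                 ; ask for one 3-byte group
    syscall                     ; clobbers rax, rcx, r11
    test  rax,rax
    jle   Done                  ; 0 = EOF, negative = error
    mov   r13,rax               ; remember bytes read (1..3)

    mov   eax,[rel InBuf]       ; load 4 bytes, 4th is always zero
    bswap eax                   ; first input byte becomes MSB
    shr   eax,8                 ; eax = 24-bit group
    mov   ecx,3                 ; fill OutBuf[3] down to OutBuf[0]
Encode:
    mov   edx,eax
    and   edx,0x3F              ; lowest 6 bits = table index 0..63
    mov   dl,[r12+rdx]          ; index -> base64 character
    mov   [rbx+rcx],dl
    shr   eax,6                 ; drop the 6 bits just converted
    dec   ecx
    jns   Encode                ; loop while ecx >= 0

    cmp   r13,3
    je    Write                 ; full group: no padding
    mov   byte [rbx+3],'='      ; 1 or 2 bytes read: 4th character is padding
    cmp   r13,2
    je    Write
    mov   byte [rbx+2],'='      ; 1 byte read: 3rd character is padding too
Write:
    mov   eax,1                 ; sys_write = 1
    mov   edi,1                 ; fd 1 = stdout
    mov   rsi,rbx
    mov   edx,4
    syscall
    cmp   r13,3
    je    Read                  ; full group: more input may follow
Done:
    mov   eax,60                ; sys_exit = 60
    xor   edi,edi               ; exit code 0
    syscall
```

To try it with the example from the question, something like `printf '\174\252\170' > in.bin` followed by `./b64 < in.bin` feeds the bytes 7C AA 78 and should print fKp4. One syscall pair per 3 bytes is slow; the 900-byte block variant mentioned above trades a little bookkeeping for far fewer kernel calls.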

Disclaimer: this code is written to be straightforward, not fast. You asked how to extract 6 bits and how to convert a value into a character, so I think sticking to basic x86 asm instructions is best.

I'm not even sure how to make it perform better without profiling the code for bottlenecks and experimenting with other variants. The partial register usage (bh, bl vs ebx) is likely to be costly, so there is very probably a better solution (maybe even a SIMD-optimized version for larger input blocks).

And I didn't debug that code; I just wrote it here in the answer, so proceed with caution and check in a debugger how (and whether) it works.
