NASM Linux x64 | Encode binary to base64

Views: 55 | Published: 2021/11/25 7:25:43 | Tags: linux assembly 64-bit nasm x86-64


Problem description

I'm trying to encode a binary file into base64. However, I'm stuck at a few steps, and I'm also not sure whether this is the right way to think about it; see the comments in the code below:

SECTION .bss            ; Section containing uninitialized data

    BUFFLEN equ 6       ; We read the file 6 bytes at a time
    Buff:   resb BUFFLEN    ; Text buffer itself

SECTION .data           ; Section containing initialised data

    B64Str: db "000000"
    B64LEN equ $-B64Str

    Base64: db "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

SECTION .text           ; Section containing code

global  _start          ; Linker needs this to find the entry point!

_start: 
    nop         ; This no-op keeps gdb happy...

; Read a buffer full of text from stdin:
Read:
    mov eax,3       ; Specify sys_read call
    mov ebx,0       ; Specify File Descriptor 0: Standard Input
    mov ecx,Buff        ; Pass offset of the buffer to read to
    mov edx,BUFFLEN     ; Pass number of bytes to read at one pass
    int 80h         ; Call sys_read to fill the buffer
    mov ebp,eax     ; Save # of bytes read from file for later
    cmp eax,0       ; If eax=0, sys_read reached EOF on stdin
    je Done         ; Jump If Equal (to 0, from compare)

; Set up the registers for the process buffer step:
    mov esi,Buff        ; Place address of file buffer into esi
    mov edi,B64Str      ; Place address of line string into edi
    xor ecx,ecx     ; Clear line string pointer to 0


;;;;;;
; GET 6 bits from input
;;;;;;


;;;;;;
; Convert to B64 char
;;;;;;

;;;;;;
; Print the char
;;;;;;

;;;;;;
; process to the next 6 bits
;;;;;;


; All done! Let's end this party:
Done:
    mov eax,1       ; Code for Exit Syscall
    mov ebx,0       ; Return a code of zero 
    int 80H         ; Make kernel call

So, in words, it should do the following:

1) Hex value :

7C AA 78

2) Binary value :

0111 1100 1010 1010 0111 1000

3) Groups in 6 bits :

011111 001010 101001 111000

4) Convert to numbers :

31 10 41 56

5) Each number maps to a letter, digit or symbol:

31 = f
10 = K
41 = p
56 = 4

So, the final output is: fKp4

So, my questions are: how do I get the 6 bits, and how do I convert those bits into a char?

Solution

EDIT, a few years later:

Recently somebody ran into this example, and while discussing how it works and how to convert it to x64 for 64-bit Linux, I turned it into a fully working example. The source is available here: https://gist.github.com/ped7g/c96a7eec86f9b090d0f33ba36af056c1

There are two main ways to implement this: either a generic loop capable of picking any 6 bits, or fixed code that handles 24 bits (3 bytes) of input at a time. The fixed version produces exactly 4 base64 characters and ends on a byte boundary, so the next 24 bits can be read from offset +3.
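For comparison with the fixed 24-bit code, the generic-loop variant could look roughly like the sketch below. It is not taken from the linked gist. It assumes a 32-bit build (`nasm -f elf32`, `ld -m elf_i386`) on a kernel that accepts `int 80h`, the labels `Input`, `Output` and `INLEN` are made up for illustration, and it only handles input lengths that are a non-zero multiple of 3 (flushing leftover bits and adding '=' is left out).

```nasm
; Assumed build: nasm -f elf32 b64loop.asm && ld -m elf_i386 -o b64loop b64loop.o
SECTION .data
    Base64: db "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    Input:  db 0x7C,0xAA,0x78   ; sample bytes from the question
    INLEN   equ $-Input         ; this sketch assumes a non-zero multiple of 3

SECTION .bss
    Output: resb INLEN/3*4+1    ; 4 chars per 3 input bytes, plus a newline

SECTION .text
global _start
_start:
    mov   esi,Input         ; esi = input pointer
    mov   edi,Output        ; edi = output pointer
    mov   ebp,INLEN         ; ebp = input bytes left
    xor   eax,eax           ; eax = bit accumulator
    xor   ecx,ecx           ; ecx = number of not-yet-encoded bits in eax
.next_byte:
    shl   eax,8             ; make room for 8 new bits
    movzx edx,byte [esi]    ; fetch the next input byte
    or    eax,edx           ; append it below the leftover bits
    inc   esi
    add   ecx,8
.next_group:
    sub   ecx,6             ; ecx = bits that remain after this group
    mov   edx,eax
    shr   edx,cl            ; move the wanted 6 bits to the bottom
    and   edx,0x3F          ; keep only those 6 bits (value 0-63)
    mov   dl,[Base64+edx]   ; convert the value into a B64 character
    mov   [edi],dl
    inc   edi
    cmp   ecx,6
    jae   .next_group       ; every third byte completes two groups
    dec   ebp
    jnz   .next_byte
    mov   byte [edi],10     ; terminate the line with a newline
    inc   edi
    mov   eax,4             ; sys_write
    mov   ebx,1             ; file descriptor 1: standard output
    mov   ecx,Output
    mov   edx,edi
    sub   edx,ecx           ; edx = number of bytes to write
    int   80h
    mov   eax,1             ; sys_exit
    xor   ebx,ebx           ; return code 0
    int   80h
```

With the sample bytes this should print fKp4, the result worked out by hand in the question. The loop is shorter than the fixed version but pays for a variable shift and a branch per character.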

Let's say esi points to the source binary data, which is padded with enough zeroes to make reads past the end of the input buffer safe (+3 bytes in the worst case).

And edi points to an output buffer of at least ((input_length+2)/3*4) bytes, possibly including some padding, as B64 requires for the ending sequence.

; convert 3 bytes of input into four B64 characters of output
mov   eax,[esi]  ; read 3 bytes of input
      ; (reads actually 4B, 1 will be ignored)
add   esi,3      ; advance pointer to next input chunk
bswap eax        ; first input byte as MSB of eax
shr   eax,8      ; throw away the 1 junk byte (LSB after bswap)
; produce 4 base64 characters backward (last group of 6b is converted first)
; (to keep the logic of 6b group extraction simple: "shr eax,6" + "and edx,0x3F")
mov   edx,eax    ; get copy of last 6 bits
shr   eax,6      ; throw away 6bits being processed already
and   edx,0x3F   ; keep only last 6 bits
mov   bh,[Base64+edx]  ; convert 0-63 value into B64 character (4th)
mov   edx,eax    ; get copy of next 6 bits
shr   eax,6      ; throw away 6bits being processed already
and   edx,0x3F   ; keep only last 6 bits
mov   bl,[Base64+edx]  ; convert 0-63 value into B64 character (3rd)
shl   ebx,16     ; make room in ebx for next characters (4th+3rd now in upper 16b)
mov   edx,eax    ; get copy of next 6 bits
shr   eax,6      ; throw away 6bits being processed already
and   edx,0x3F   ; keep only last 6 bits
mov   bh,[Base64+edx]  ; convert 0-63 value into B64 character (2nd)
; here eax contains exactly only 6 bits (zero extended to 32b)
mov   bl,[Base64+eax]  ; convert 0-63 value into B64 character (1st)
mov   [edi],ebx  ; store four B64 characters as output
add   edi,4      ; advance output pointer

After the last group of 3 input bytes, you must overwrite the end of the output with the proper number of '=' characters to fix the fake zeroes that were emitted. That is: 1 byte of input (8 bits, needs 2 B64 chars) => output ends with '=='; 2 bytes of input (16 bits, needs 3 B64 chars) => output ends with '='; 3 bytes of input => all 24 bits used => 4 valid B64 chars.
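The padding fix-up on its own can be sketched as follows. To isolate that step, the example assumes the 3-byte conversion has already run on the 2-byte input "Ma" (zero-padded to 3 bytes) and left "TWEA" in the output, with edi pointing just past the last character. `INLEN`, `Output` and `OUTLEN` are illustrative names, and the 32-bit build commands are an assumption.

```nasm
; Assumed build: nasm -f elf32 b64pad.asm && ld -m elf_i386 -o b64pad b64pad.o
SECTION .data
    INLEN   equ 2               ; hypothetical input length (the 2 bytes "Ma")
    Output: db "TWEA",10        ; conversion result for "Ma",0 plus a newline
    OUTLEN  equ $-Output

SECTION .text
global _start
_start:
    mov   edi,Output+4          ; edi = just past the last B64 char
    mov   eax,INLEN
    xor   edx,edx
    mov   ecx,3
    div   ecx                   ; edx = input length mod 3
    test  edx,edx
    jz    .write                ; multiple of 3: all four chars are real
    mov   byte [edi-1],'='      ; remainder 1 or 2: the last char is fake
    cmp   edx,2
    je    .write
    mov   byte [edi-2],'='      ; remainder 1: the last two chars are fake
.write:
    mov   eax,4                 ; sys_write
    mov   ebx,1                 ; file descriptor 1: standard output
    mov   ecx,Output
    mov   edx,OUTLEN
    int   80h
    mov   eax,1                 ; sys_exit
    xor   ebx,ebx               ; return code 0
    int   80h
```

This should print TWE=, the standard base64 encoding of "Ma". Setting `INLEN` to 1 would overwrite two characters instead of one.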

If you don't want to read the whole file into memory and build the whole output in memory, you can use in/out buffers of limited length, for example 900 B of input -> 1200 B of output, and process the input in 900 B blocks. Or you can use a 3 B -> 4 B in/out buffer and drop the pointer advancing completely (or even drop esi/edi and use fixed memory addresses), since you will then have to load the input and store the output separately for every iteration.
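A minimal sketch of the 900 B -> 1200 B block approach, reading stdin and writing stdout, might look like this. It is an illustration under assumptions rather than the code from the gist: it targets the 32-bit `int 80h` interface used in the question, the names `INCHUNK`, `InBuf` and `OutBuf` are invented, a read error is treated like EOF, and partial writes are not retried. Because 900 is a multiple of 3, only the final block can need '=' padding.

```nasm
; Assumed build: nasm -f elf32 b64cat.asm && ld -m elf_i386 -o b64cat b64cat.o
INCHUNK equ 900                 ; multiple of 3: only the last block gets '='

SECTION .data
    Base64: db "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

SECTION .bss
    InBuf:  resb INCHUNK+4      ; +4 slack for zero padding and 4-byte loads
    OutBuf: resb INCHUNK/3*4

SECTION .text
global _start
_start:
.chunk:
    xor   ebp,ebp               ; ebp = bytes collected in InBuf so far
.read:
    mov   eax,3                 ; sys_read
    xor   ebx,ebx               ; file descriptor 0: standard input
    lea   ecx,[InBuf+ebp]       ; continue after the bytes already read
    mov   edx,INCHUNK
    sub   edx,ebp               ; room left in this block
    int   80h
    test  eax,eax
    jle   .encode               ; 0 = EOF, negative = error: stop reading
    add   ebp,eax
    cmp   ebp,INCHUNK
    jb    .read                 ; short read: keep filling the block
.encode:
    test  ebp,ebp
    jz    .exit                 ; nothing left to encode
    mov   dword [InBuf+ebp],0   ; zero the bytes right after the data
    mov   esi,InBuf
    mov   edi,OutBuf
    lea   ebx,[InBuf+ebp]       ; ebx = end of the input data
.group:
    mov   eax,[esi]             ; read 3 input bytes (plus 1 ignored byte)
    add   esi,3
    bswap eax                   ; first input byte becomes the MSB
    shr   eax,8                 ; eax = 24 input bits
    mov   ecx,4                 ; emit 4 chars, last one first
.char:
    mov   edx,eax
    and   edx,0x3F              ; lowest 6 bits (value 0-63)
    mov   dl,[Base64+edx]       ; convert the value into a B64 character
    mov   [edi+ecx-1],dl
    shr   eax,6
    dec   ecx
    jnz   .char
    add   edi,4
    cmp   esi,ebx
    jb    .group
    sub   esi,ebx               ; 0..2 zero filler bytes were encoded
    jz    .write
    mov   byte [edi-1],'='      ; one '=' per filler byte
    dec   esi
    jz    .write
    mov   byte [edi-2],'='
.write:
    mov   eax,4                 ; sys_write
    mov   ebx,1                 ; file descriptor 1: standard output
    mov   ecx,OutBuf
    mov   edx,edi
    sub   edx,ecx               ; edx = number of chars produced
    int   80h
    cmp   ebp,INCHUNK
    je    .chunk                ; block was full: there may be more input
.exit:
    mov   eax,1                 ; sys_exit
    xor   ebx,ebx               ; return code 0
    int   80h
```

For example, `printf 'Ma' | ./b64cat` should print TWE= (without a trailing newline). The inner read loop matters because `sys_read` on a pipe or terminal may return fewer bytes than requested, and encoding a short block early would put '=' in the middle of the output.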

Disclaimer: this code is written to be straightforward, not performant. You asked how to extract 6 bits and how to convert a value into a character, so I guess sticking to basic x86 asm instructions is best.

I'm not even sure how to make it perform better without profiling the code for bottlenecks and experimenting with other variants. The partial register usage (bh, bl vs ebx) will surely be costly, so there is very likely a better solution (maybe even a SIMD-optimized version for larger input blocks).

Also, I didn't debug that code; I just wrote it here in the answer, so proceed with caution and check in a debugger how (and whether) it works.
