Python ctypes: how to read a byte from a character array passed to NASM

Question

UPDATE: I solved this problem with the help of Mark Tolonen's answer below. Here is the solution (but I'm puzzled by one thing):

I begin by encoding the strings (UTF-8) as shown in Mark Tolonen's answer below:

CA_f1 = (ctypes.c_char_p * len(f1))(*(name.encode() for name in f1))

With optimizations off, I always store rcx into a memory variable on entry, and read it back from memory later in the program when I need the pointer. That works for a single pointer, but it didn't work for accessing the pointer array Mark Tolonen shows below; maybe that's because it's a pointer array, not just a single pointer. It DOES work if I store rcx into r15 on entry; downstream in the program the access then looks like this:

;To access the first char of the first name pair: 

xor rax,rax
mov rdx,qword[r15]
movsx eax,BYTE[rdx]
ret

;To access the second char of the second name pair: 

mov rdx,qword[r15+8]
movsx eax,BYTE[rdx+1]
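
For reference, the same two-step dereference can be reproduced from Python. This is a minimal sketch: the sample names and the helper name `byte_at` are illustrative and not part of the original program. It loads a pointer from the array, then a signed byte from the string that pointer refers to, mirroring `mov rdx,qword[r15+8]` followed by `movsx eax,BYTE[rdx+1]`.

```python
import ctypes

names = ["Margaret Swanson", "John Smith"]  # illustrative sample data
encoded = [n.encode() for n in names]
ca = (ctypes.c_char_p * len(encoded))(*encoded)

def byte_at(array, index, offset):
    """Read pointer number `index` from the array, then the signed byte at `offset`."""
    base = ctypes.addressof(array)
    ptr_size = ctypes.sizeof(ctypes.c_void_p)  # 8 in a 64-bit process
    # Step 1: load the string pointer (like mov rdx, qword [r15 + index*8])
    str_addr = ctypes.c_void_p.from_address(base + index * ptr_size).value
    # Step 2: load one signed byte from that string (like movsx eax, BYTE [rdx + offset])
    return ctypes.c_byte.from_address(str_addr + offset).value

first = byte_at(ca, 0, 0)   # first char of the first name
second = byte_at(ca, 1, 1)  # second char of the second name
print(chr(first), chr(second))
```

The array itself holds only addresses, so reading a character always takes two loads: one for the pointer and one for the byte.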

That's not a problem because I usually store as many variables as possible in registers; sometimes there are not enough registers, so I have to resort to storing some in memory. Now, when processing strings, I will always reserve r15 to hold the pointer passed in rcx if it's a pointer array.

Any insight into why the memory location doesn't work?

**** END OF ANSWER ****

I'm new to string processing in NASM, and I am passing a string from ctypes. The string data is read from a text file (Windows .txt), using the following Python function:

def StrDataRead(fname):
    return_data = []
    with open(fname, encoding="utf8") as f1:
        for item in f1:
            # Strip leading and trailing whitespace, including the line ending
            return_data.append(item.strip())
    return return_data

The .txt file contains a list of first and last names, one name pair per line, separated by carriage return-linefeed characters.

I pass a c_char_p pointer to a NASM dll using ctypes. The pointer is created with this:

CA_f1 = (ctypes.c_char_p * len(f1))()

Visual Studio confirms that it is a pointer to a byte string 50 NAMES long, which may be where the problem lies: I need bytes, not list elements. Then I pass it using this ctypes syntax:

CallName.argtypes = [ctypes.POINTER(ctypes.c_char_p),ctypes.POINTER(ctypes.c_double),ctypes.POINTER(ctypes.c_double)]

UPDATE: before passing the string, now I convert the list to a string like this:

f1_x = ' '.join(f1)

Now VS shows a pointer to a 558 byte string, which is correct, but I still can't read a byte.
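
One possible way to pass the joined names as a single contiguous byte buffer is sketched here under assumptions: the sample names are made up, and `ctypes.create_string_buffer` is swapped in for the `c_char_p` array used in the question. Note that `str.encode()` returns a new `bytes` object rather than changing the string in place, so its result has to be kept.

```python
import ctypes

f1 = ["Margaret Swanson", "John Smith"]  # illustrative sample data
f1_x = " ".join(f1)

# encode() returns a new bytes object; the str itself is unchanged
f1_bytes = f1_x.encode("utf-8")

# A mutable, NUL-terminated char buffer holding a copy of the bytes
buf = ctypes.create_string_buffer(f1_bytes)

# The address a DLL would receive points directly at the characters,
# so a single dereference (mov al, byte [rcx+1]) reaches the second byte.
addr = ctypes.addressof(buf)
second_byte = ctypes.c_ubyte.from_address(addr + 1).value
print(len(buf), chr(second_byte))
```

With a buffer like this, the matching entry in `argtypes` would be `ctypes.c_char_p` rather than `ctypes.POINTER(ctypes.c_char_p)`, since the argument is a pointer to characters and not a pointer to pointers.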

In my NASM program, I test it by reading a random byte into al using the following code:

lea rdi,[rel f1_ptr]
mov rbp,qword [rdi] ; Pointer
xor rax,rax
mov al,byte[rbp+1]

But the return value in rax is 0.
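
One likely explanation for the 0, offered as a sketch rather than a confirmed diagnosis: an array created with `(ctypes.c_char_p * n)()` and no initializers is zero-filled, so every element is a NULL pointer and the raw bytes at the array's address are all zero. The value 50 is simply the name count reported by Visual Studio earlier in the question.

```python
import ctypes

n = 50  # number of names, as reported by Visual Studio
CA_f1 = (ctypes.c_char_p * n)()  # no initializers: every element is NULL

# Elements read back as None because they are NULL pointers
all_null = all(p is None for p in CA_f1)

# The raw memory of the array (what the DLL sees at the address in rcx)
raw = ctypes.string_at(ctypes.addressof(CA_f1), ctypes.sizeof(CA_f1))
print(all_null, set(raw))
```

Reading any byte at a small offset from that address therefore yields 0, regardless of what the Python list contained.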

If I create a local string buffer like this:

name_array: db "Margaret Swanson"

I can read it like this:

mov rdi,name_array
xor rax,rax
mov al,[rdi]

But not from a pointer passed into a dll.

Here's the full code for a simple, reproducible example in NASM. Before passing the data to NASM, I checked random bytes and they are what I expect, so I don't think it's an encoding problem.

[BITS 64]
[default rel]

extern malloc, calloc, realloc, free
global Main_Entry_fn
export Main_Entry_fn
global FreeMem_fn
export FreeMem_fn

section .data align=16
f1_ptr: dq 0
f1_length: dq 0
f2_ptr: dq 0
f2_length: dq 0
data_master_ptr: dq 0

section .text

String_Test_fn:
;______

lea rdi,[rel f1_ptr]
mov rbp,qword [rdi]
xor rax,rax
mov al,byte[rbp+10]
ret

;__________
;Free the memory

FreeMem_fn:
sub rsp,40
call free
add rsp,40
ret

; __________
; Main Entry

Main_Entry_fn:
push rdi
push rbp
mov [f1_ptr],rcx
mov [f2_ptr],rdx

mov [data_master_ptr],r8
lea rdi,[data_master_ptr]
mov rbp,[rdi]
xor rcx,rcx
movsd xmm0,qword[rbp+rcx]
cvttsd2si rax,xmm0
mov [f1_length],rax
add rcx,8
movsd xmm0,qword[rbp+rcx]
cvttsd2si rax,xmm0
mov [f2_length],rax
add rcx,8

call String_Test_fn

pop rbp
pop rdi
ret

UPDATE 2:

In reply to a request, here is a ctypes wrapper to use:

def Read_Data():

    Dir= "[FULL PATH TO DATA]"

    fname1 = Dir + "Random Names.txt"
    fname2 = Dir + "Random Phone Numbers.txt"

    f1 = Trans_02_Data.StrDataRead(fname1)
    f2 = Trans_02_Data.StrDataRead(fname2)
    f2_Int = [  int(numeric_string) for numeric_string in f2]
    StringTest_asm(f1, f2_Int)

def StringTest_asm(f1,f2):

    f1.append("0")

    f1_x = ' '.join(f1)
    f1_x[0].encode(encoding='UTF-8',errors='strict')

    Input_Length_Array = []
    Input_Length_Array.append(len(f1))
    Input_Length_Array.append(len(f2*8))

    length_array_out = (ctypes.c_double * len(Input_Length_Array))(*Input_Length_Array)

    CA_f1 = (ctypes.c_char_p * len(f1_x))() #due to SO research
    CA_f2 = (ctypes.c_double * len(f2))(*f2)
    hDLL = ctypes.WinDLL("C:/NASM_Test_Projects/StringTest/StringTest.dll")
    CallName = hDLL.Main_Entry_fn
    CallName.argtypes = [ctypes.POINTER(ctypes.c_char_p),ctypes.POINTER(ctypes.c_double),ctypes.POINTER(ctypes.c_double)]
    CallName.restype = ctypes.c_int64

    Free_Mem = hDLL.FreeMem_fn
    Free_Mem.argtypes = [ctypes.POINTER(ctypes.c_double)]
    Free_Mem.restype = ctypes.c_int64
    start_time = timeit.default_timer()

    ret_ptr = CallName(CA_f1,CA_f2,length_array_out)

    abc = 1 #Check the value of the ret_ptr, should be non-zero   

Recommended answer

Your name-reading code would return a list of Unicode strings. The following would encode a list of Unicode strings into an array of strings to be passed to a function taking a POINTER(c_char_p):

>>> import ctypes
>>> names = ['Mark','John','Craig']
>>> ca = (ctypes.c_char_p * len(names))(*(name.encode() for name in names))
>>> ca
<__main__.c_char_p_Array_3 object at 0x000001DB7CF5F6C8>
>>> ca[0]
b'Mark'
>>> ca[1]
b'John'
>>> ca[2]
b'Craig'

If ca is passed to your function as the first parameter, the address of that array would be in rcx per x64 calling convention. The following C code and its disassembly shows how the VS2017 Microsoft compiler reads it:

DLL code (test.c)

#define API __declspec(dllexport)

int API func(const char** instr)
{
    return (instr[0][0] << 16) + (instr[1][0] << 8) + instr[2][0];
}

Disassembly (compiled with optimizations to keep it short, with my comments added)

; Listing generated by Microsoft (R) Optimizing Compiler Version 19.00.24215.1

include listing.inc

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC  func
; Function compile flags: /Ogtpy
; File c:\test.c
_TEXT   SEGMENT
instr$ = 8
func    PROC

; 5    :     return (instr[0][0] << 16) + (instr[1][0] << 8) + instr[2][0];

  00000 48 8b 51 08      mov     rdx, QWORD PTR [rcx+8]  ; address of 2nd string
  00004 48 8b 01         mov     rax, QWORD PTR [rcx]    ; address of 1st string
  00007 48 8b 49 10      mov     rcx, QWORD PTR [rcx+16] ; address of 3rd string
  0000b 44 0f be 02      movsx   r8d, BYTE PTR [rdx]     ; 1st char of 2nd string, r8d=4a
  0000f 0f be 00         movsx   eax, BYTE PTR [rax]     ; 1st char of 1st string, eax=4d
  00012 0f be 11         movsx   edx, BYTE PTR [rcx]     ; 1st char of 3rd string, edx=43
  00015 c1 e0 08         shl     eax, 8                  ; eax=4d00
  00018 41 03 c0         add     eax, r8d                ; eax=4d4a
  0001b c1 e0 08         shl     eax, 8                  ; eax=4d4a00
  0001e 03 c2            add     eax, edx                ; eax=4d4a43

; 6    : }

  00020 c3               ret     0
func    ENDP
_TEXT   ENDS
END

Python code (test.py)

from ctypes import *

dll = CDLL('test')
dll.func.argtypes = POINTER(c_char_p),
dll.func.restype = c_int  # restype belongs on the function, not on the DLL object

names = ['Mark','John','Craig']
ca = (c_char_p * len(names))(*(name.encode() for name in names))
print(hex(dll.func(ca)))

Output:

0x4d4a43

Those are the correct ASCII codes for 'M', 'J', and 'C'.
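
The expected value can be cross-checked in pure Python without the DLL. This sketch only repeats the arithmetic of the C expression on the first byte of each encoded name:

```python
names = ['Mark', 'John', 'Craig']
encoded = [name.encode() for name in names]

# Same expression as the C function: first byte of each string, shifted and summed
expected = (encoded[0][0] << 16) + (encoded[1][0] << 8) + encoded[2][0]
print(hex(expected))
```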
