Rust FFI C string handling


Problem description

I'm playing around a bit with the Rust FFI, and now I'm trying to get a C string returned by a C library, and convert it to a Rust string.

My code:

mylib.c

const char* hello(){
    return "Hello World!";
}

main.rs

extern crate libc;
use libc::c_char;

// Link against mylib; the library search path is passed separately,
// for example with `rustc -L .`.
#[link(name = "mylib")]
extern {
    fn hello() -> *const c_char;
}

fn main(){
    //how do I get a str representation of hello() here?
}

Solution

Update 2: the previous update is also not valid anymore for Rust 1.0. It can be found in the revision history of this answer.

The best way to work with C strings in Rust is to use the types from the std::ffi module, namely CStr and CString.

CStr is a dynamically sized type, so it can only be used through a pointer. This makes it very similar to the regular str type. You can construct a &CStr from a *const c_char using the unsafe CStr::from_ptr static method. The method is unsafe because nothing guarantees that the raw pointer you pass is valid, that it really points to a valid C string, and that the string's lifetime is correct.

After that you can get a &[u8] from a &CStr using its to_bytes() or to_bytes_with_nul() methods. Then, if you know that the C string is UTF-8 encoded text, you can get a &str from the &[u8] using the str::from_utf8 function.

Here is an example:

extern crate libc;

use libc::c_char;
use std::ffi::CStr;
use std::str;

extern {
    fn hello() -> *const c_char;
}

fn main() {
    let c_buf: *const c_char = unsafe { hello() };
    let c_str: &CStr = unsafe { CStr::from_ptr(c_buf) };
    let buf: &[u8] = c_str.to_bytes();
    let str_slice: &str = str::from_utf8(buf).unwrap();
    let str_buf: String = str_slice.to_owned();  // if necessary
}
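CStr also provides to_str() and to_string_lossy(), which fold the to_bytes() and str::from_utf8 steps above into a single call. The following is a minimal sketch rather than a definitive implementation: the helper names c_ptr_to_string and c_ptr_to_string_lossy are hypothetical, and to stay self-contained it builds the C string in Rust with CString instead of calling hello() from the C library.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: copies a NUL-terminated C string into an owned String.
/// Returns None for a null pointer or for bytes that are not valid UTF-8.
///
/// Safety: `ptr` must be null or point to a valid NUL-terminated string
/// that stays alive for the duration of the call.
unsafe fn c_ptr_to_string(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    let c_str = unsafe { CStr::from_ptr(ptr) };
    // to_str() validates UTF-8, like to_bytes() followed by str::from_utf8.
    c_str.to_str().ok().map(|s| s.to_owned())
}

/// Hypothetical helper: like the above, but replaces invalid UTF-8
/// sequences with U+FFFD instead of failing.
///
/// Safety: `ptr` must be non-null and point to a valid NUL-terminated string.
unsafe fn c_ptr_to_string_lossy(ptr: *const c_char) -> String {
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_string_lossy().into_owned()
}

fn main() {
    // Stand-in for a string returned by a C library.
    let owned = CString::new("Hello World!").unwrap();
    let text = unsafe { c_ptr_to_string(owned.as_ptr()) };
    println!("{:?}", text);

    // A byte that is not valid UTF-8, to exercise the lossy path.
    let bad = CString::new(vec![0xffu8, 0x41]).unwrap();
    let lossy = unsafe { c_ptr_to_string_lossy(bad.as_ptr()) };
    println!("{:?}", lossy);
}
```

Whether to fail on invalid UTF-8 or to substitute replacement characters depends on what the C API promises about its encoding.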

You need, however, to take into account the lifetime of your *const c_char pointers and who owns them. Depending on the C API, you may need to call a special deallocation function on the string. Hence you need to arrange conversions carefully so that the slices don't outlive the pointer. The fact that CStr::from_ptr returns a &CStr with an arbitrary lifetime helps here (though it is dangerous by itself); for example, you can encapsulate your C string in a structure and provide a Deref conversion so you can use your struct as if it were a string slice:

use libc::c_char;
use std::ffi::CStr;
use std::ops::Deref;
use std::str;

extern {
    fn hello() -> *const c_char;
    fn goodbye(s: *const c_char);
}

struct Greeting {
    message: *const c_char
}

impl Drop for Greeting {
    fn drop(&mut self) {
        unsafe { goodbye(self.message); }
    }
}

impl Greeting {
    fn new() -> Greeting {
        Greeting {
            message: unsafe { hello() }
        }
    }
}

impl Deref for Greeting {
    type Target = str;

    fn deref<'a>(&'a self) -> &'a str {
        let c_str = unsafe { CStr::from_ptr(self.message) };
        let bytes = c_str.to_bytes();
        str::from_utf8(bytes).unwrap()
    }
}
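To try the wrapper without building the C library, the sketch below replaces hello() and goodbye() with Rust stand-ins built on CString::into_raw and CString::from_raw. These stand-ins are an assumption made for illustration only; a real program would declare the C functions in an extern block as shown above, and goodbye() would be whatever deallocation function the C API documents.

```rust
use std::ffi::{CStr, CString};
use std::ops::Deref;
use std::os::raw::c_char;

// Stand-in for the C function hello(), written in Rust so the sketch
// runs without mylib. into_raw() hands ownership of the buffer to the caller.
unsafe fn hello() -> *const c_char {
    CString::new("Hello World!").unwrap().into_raw() as *const c_char
}

// Stand-in for the C deallocation function goodbye().
// from_raw() takes ownership back so the buffer is freed exactly once.
unsafe fn goodbye(s: *const c_char) {
    drop(unsafe { CString::from_raw(s as *mut c_char) });
}

struct Greeting {
    message: *const c_char,
}

impl Greeting {
    fn new() -> Greeting {
        Greeting { message: unsafe { hello() } }
    }
}

impl Drop for Greeting {
    fn drop(&mut self) {
        // Release the string through the matching deallocation function.
        unsafe { goodbye(self.message) }
    }
}

impl Deref for Greeting {
    type Target = str;

    // The returned slice borrows from `self`, so it cannot outlive the wrapper.
    fn deref(&self) -> &str {
        let c_str = unsafe { CStr::from_ptr(self.message) };
        c_str.to_str().expect("stand-in always returns valid UTF-8")
    }
}

fn main() {
    let greeting = Greeting::new();
    // Deref lets str methods be called directly on the wrapper.
    println!("{} ({} bytes)", &*greeting, greeting.len());
    let upper: String = greeting.to_uppercase();
    println!("{}", upper);
} // `greeting` is dropped here, which calls goodbye()
```

Because deref ties the slice's lifetime to &self, the borrow checker rejects any attempt to keep the &str around after the Greeting has been dropped.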

There is another type in this module, called CString. It relates to CStr as String relates to str: CString is the owned version of CStr. It holds the handle to the allocation of the byte data, and dropping a CString frees that memory (essentially, CString wraps a Vec<u8>, and it is the latter that gets dropped). Consequently, it is useful when you want to expose data allocated in Rust as a C string.

Unfortunately, C strings always end with a zero byte and can't contain one inside, while Rust's &[u8]/Vec<u8> are exactly the opposite: they do not end with a zero byte and can contain any number of them inside. This means that going from Vec<u8> to CString is neither error-free nor allocation-free. The CString constructor checks for zeros inside the data you provide, returning an error if it finds any, and appends a zero byte to the end of the byte vector, which may require reallocating it.
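The error returned by CString::new is a NulError, which reports where the zero byte was found and can give the original bytes back. The sketch below shows one possible recovery strategy, truncating at the first zero byte; the helper name cstring_truncating is hypothetical, and truncation is only an assumption about what a caller might want, not a general recommendation.

```rust
use std::ffi::CString;

/// Hypothetical helper: builds a CString, truncating the input at the
/// first interior zero byte instead of failing.
fn cstring_truncating(data: Vec<u8>) -> CString {
    match CString::new(data) {
        Ok(s) => s,
        Err(e) => {
            // Index of the first zero byte in the rejected input.
            let pos = e.nul_position();
            // Recover the original bytes without copying them.
            let mut bytes = e.into_vec();
            bytes.truncate(pos);
            CString::new(bytes).expect("no zero bytes remain before `pos`")
        }
    }
}

fn main() {
    let err = CString::new(vec![1u8, 2, 3, 0, 4]).unwrap_err();
    println!("zero byte at index {}", err.nul_position());

    let ok = CString::new("abc").unwrap();
    // as_bytes() omits the appended terminator; as_bytes_with_nul() includes it.
    println!("{:?} / {:?}", ok.as_bytes(), ok.as_bytes_with_nul());

    let truncated = cstring_truncating(b"ab\0cd".to_vec());
    println!("{:?}", truncated);
}
```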

Like String, which implements Deref<Target=str>, CString implements Deref<Target=CStr>, so you can call methods defined on CStr directly on CString. This matters because the as_ptr() method, which returns the *const c_char needed for C interoperation, is defined on CStr. You can therefore call it directly on CString values, which is convenient.
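The pointer returned by as_ptr() is only valid while the CString it came from is alive, so the CString should be kept in a named binding for as long as C may read the pointer. A minimal sketch of that pattern follows; c_strlen is a hypothetical stand-in, written in Rust, for a C function that reads the string.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that reads a string.
///
/// Safety: `s` must point to a valid NUL-terminated string.
unsafe fn c_strlen(s: *const c_char) -> usize {
    let c_str = unsafe { CStr::from_ptr(s) };
    c_str.to_bytes().len()
}

fn main() {
    // Keep the CString in a named binding so the buffer outlives the pointer.
    let name = CString::new("world").unwrap();

    // as_ptr() is defined on CStr and reached through Deref.
    let ptr: *const c_char = name.as_ptr();
    let len = unsafe { c_strlen(ptr) };
    println!("{}", len);

    // Other CStr methods are available on CString the same way.
    println!("{:?}", name.to_bytes());

    // Pitfall: calling as_ptr() directly on the temporary returned by
    // CString::new(...).unwrap() leaves a dangling pointer, because the
    // temporary CString is dropped at the end of that statement.
}
```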

CString can be created from anything that can be converted into Vec<u8>. String, &str, Vec<u8> and &[u8] are all valid arguments for the constructor function, CString::new(). Naturally, if you pass a byte slice or a string slice, a new allocation is created, while a Vec<u8> or String is consumed.

let c_str_1 = CString::new("hello").unwrap();        // from a &str, creates a new allocation
let c_str_2 = CString::new(&b"world"[..]).unwrap();  // from a &[u8], creates a new allocation
let data: Vec<u8> = b"12345678".to_vec();
let c_str_3 = CString::new(data).unwrap();           // from a Vec<u8>, consumes it

// and now you can obtain a pointer to a valid zero-terminated string
// make sure you don't use it after c_str_2 is dropped
let c_ptr: *const c_char = c_str_2.as_ptr();

// the following will print an error message because the source data
// contains zero bytes
let data: Vec<u8> = vec![1, 2, 3, 0, 4, 5, 0, 6];
match CString::new(data) {
    Ok(c_str_4) => println!("Got a C string: {:p}", c_str_4.as_ptr()),
    Err(e) => println!("Error getting a C string: {}", e)
}
