How to convert Java strings to wide character strings using JNI

Question

Several months ago, I wrote a Java API that uses JNI to wrap a C API. The C API used char strings, and I used GetStringUTFChars to create the C strings from the Java Strings.

I neglected to think through the problems that might arise with non-ASCII characters.

Since then, the creator of the C API has added wide-character equivalents of each of his C functions that take or return strings; the new versions use wchar_t strings. I would like to update my Java API to use these wide-character functions and overcome the issue I have with non-ASCII characters.

Having studied the JNI documentation, I am a little confused about the relative merits of the GetStringChars and GetStringRegion methods.

I am aware that the size of a wchar_t character varies between Windows and Linux, and I am not sure of the most efficient way to create the C strings (and to convert them back to Java strings afterwards).

This is the code I have at the moment, which I think creates a string with two bytes per character:

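int len;
jchar *Src;

/* Number of UTF-16 code units (jchars) in the Java string */
len = (*env)->GetStringLength(env, jSrc);
printf("Length of jSrc is %d\n", len);

/* Room for len jchars plus a terminating zero; free(Src) when done */
Src = (jchar *)malloc((len + 1)*sizeof(jchar));
/* Copy the string's UTF-16 code units into the buffer */
(*env)->GetStringRegion(env, jSrc, 0, len, Src);
Src[len] = '\0';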

However, this will need modifying when the size of a wchar_t differs from that of a jchar.

Recommended answer

Isn't the C API creator willing to take a step back and reimplement it with UTF-8? :) Your work would essentially disappear, needing only GetStringUTFChars/NewStringUTF.

jchar is typedef'd to unsigned short and is equivalent to the JVM char, which is a UTF-16 code unit. So on Windows, where wchar_t is also 2 bytes of UTF-16, the code you presented is essentially all you need: just copy the raw code units and allocate accordingly. Don't forget to free the buffer once you're finished with the C API call. Use the complementary NewString to convert back to a jstring.

The only other wchar_t size I am aware of is 4 bytes (most prominently on Linux), holding UTF-32. And here comes the problem: UTF-32 is not just UTF-16 somehow padded to 4 bytes. Characters outside the Basic Multilingual Plane take two jchars (a surrogate pair) in UTF-16 but a single 4-byte unit in UTF-32, so allocating double the amount of memory is just the beginning. There is a substantial conversion to do, like this one, which seems to be sufficiently free.

But if performance is not that critical for you and you are willing to give up the plain memory copy on Windows, I suggest going from jstring to UTF-8 (which JNI provides natively, with documented functionality; strictly speaking, GetStringUTFChars returns JNI's modified UTF-8) and then from UTF-8 to UTF-16 or UTF-32, depending on sizeof(wchar_t). That way there are no assumptions about the byte order and UTF encoding each platform uses. You seem to care about that: I see that you use sizeof(jchar), which is 2 for most of the visible universe :)
