Java JNI: Passing multibyte characters from Java to C
Question
I'm once again messing around with the Java Native Interface, and I've run into another interesting problem. I'm sending a file path to C via JNI and then doing some I/O. The characters I most often have trouble with are 'äåö'. Here is a short demo program with exactly the same problem:
Java:
public class java {
    private static native void printBytes(String text);

    static {
        System.loadLibrary("dll");
    }

    public static void main(String[] args) {
        printBytes("C:/Users/ä-å-ö/Documents/Bla.txt");
    }
}
C:
#include <stdio.h>
#include <jni.h>
#include "java.h"

JNIEXPORT void JNICALL Java_java_printBytes(JNIEnv *env, jclass cls, jstring text) {
    /* GetStringUTFChars returns modified UTF-8 as const char*, or NULL on failure. */
    const char *text_input = (*env)->GetStringUTFChars(env, text, NULL);
    if (text_input == NULL) {
        return;
    }
    printf("%s\n", text_input);
    (*env)->ReleaseStringUTFChars(env, text, text_input);
}
Output: C:/Users/├ñ-├Ñ-├Â/Documents/Bla.txt
This is NOT my desired result; I would like it to output the same string as in Java.
Recommended answer
You are dealing with platform-specific character encoding issues. The string coming from Java via GetStringUTFChars is in (modified) UTF-8, a multibyte encoding. The standard C printf should be able to pass such a string through, but on Windows/MSVC the bytes are not treated as UTF-8: each byte ends up interpreted as a single character in the console's code page. That is why ä (UTF-8 bytes C3 A4) shows up as ├ñ. On a non-Windows, standard-conforming platform with a UTF-8 terminal I would expect your code to work. ASCII characters work because they have the same single-byte values in UTF-8; characters outside ASCII do not.
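To see why only the non-ASCII characters break, it can help to look at the raw bytes. The following is a minimal sketch, not part of the original question: the class name Utf8Bytes and the method toHex are made up for illustration. It uses standard UTF-8, which should match what GetStringUTFChars produces for these characters (modified UTF-8 differs for U+0000 and supplementary characters).

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Hypothetical helper: returns the UTF-8 bytes of s as space-separated hex.
    public static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("a"));      // prints 61 (same value as ASCII)
        System.out.println(toHex("\u00E4")); // prints C3 A4 (two bytes for ä)
    }
}
```

A console that treats each byte as one character in an OEM code page such as CP850 renders C3 A4 as ├ñ, which matches the output shown in the question.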
Basically you need to either convert your string to wide characters in Java (text.getBytes(Charset.forName("UTF-16LE"))) and pass it to C as a byte array, or convert the multibyte string to wide characters in C after receiving it (MultiByteToWideChar(CP_UTF8, ...)). Then you can use printf("%S") or wprintf(L"%s") to output it.
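The Java-side option could look roughly like the sketch below. The class name WidePath and the helper toUtf16LeZ are hypothetical, and appending a two-byte NUL terminator is my own assumption so that the C side can treat the buffer as a terminated wchar_t string on Windows. The native method would then have to take a byte[] (a jbyteArray in C, readable with GetByteArrayRegion) instead of a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WidePath {
    // Hypothetical helper: encodes s as UTF-16LE (no BOM) and appends
    // two zero bytes as a wide NUL terminator for the native side.
    public static byte[] toUtf16LeZ(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_16LE);
        return Arrays.copyOf(raw, raw.length + 2); // copyOf zero-pads the tail
    }

    public static void main(String[] args) {
        byte[] wide = toUtf16LeZ("C:/Users/\u00E4-\u00E5-\u00F6/Documents/Bla.txt");
        // Two bytes per UTF-16 code unit, plus the two terminator bytes.
        System.out.println(wide.length + " bytes");
    }
}
```

StandardCharsets.UTF_16LE avoids the Charset.forName lookup and produces the same encoding.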
See "Printing UTF-8 strings with printf - wide vs. multibyte string literals" for more information. Also note that the answer there says you have to set Unicode output mode with _setmode (for example _setmode(_fileno(stdout), _O_U16TEXT)) if you want Unicode output on the Windows console.
Also note that I don't believe GetStringUTFChars is guaranteed to return a NUL-terminated string, but it has been too long since I checked. If in doubt, take the byte length from GetStringUTFLength rather than relying on a terminator.