Thai script seems to lose UTF-8 encoding in Java for-each loop

Problem description

I'm trying to develop an application within Android Studio on Windows 10.

PROBLEM: The following string array of Thai words:

String[] myTHarr = {"มาก","เชี่ยว","แน่","ม่อน","บ้าน","พูด","เลื่อย","เมื่อ","ช่ำ","แร่"};

...when processed by the following for-each loop:

for (String s:myTHarr){
  //s = มา� before executing any of the below code:
  byte[] utf8EncodedThaiArr = s.getBytes("UTF-8"); 
  String utf8EncodedThai = new String(utf8EncodedThaiArr); //setting breakpoint here
  // s is still มาà¸�     (I want it to be มาก)
  //do stuff
}

results in s = มา� when attempting to process the first word (none of the other words work either, but that's expected given the first fails).

The Thai script appears correctly in the string array (the declaration was copied straight from Android Studio), the file encoding for the Java file is set to UTF-8 (per here), and the File Encoding Settings were checked as well (per here; the screenshot is not reproduced).

Solution

According to the documentation, the String(byte[]) constructor "Constructs a new String by decoding the specified array of bytes using the platform's default charset."

I'm guessing that the default character set is not UTF-8. So the solution is to specify the encoding for the array of bytes.

String utf8EncodedThai = new String(utf8EncodedThaiArr, "UTF-8"); //setting breakpoint here
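To make the point concrete, here is a minimal sketch of the round trip, assuming Java 7 or later so that java.nio.charset.StandardCharsets is available. The class name Utf8RoundTrip and both helper methods are hypothetical names made up for this illustration. The Thai word is written with Unicode escapes so the example does not depend on how the source file itself is read by the compiler.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not from the question.
class Utf8RoundTrip {

    // Encode to UTF-8 and decode with the same explicitly named charset,
    // so the platform default charset never enters the picture.
    static String roundTrip(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Decode the UTF-8 bytes with a deliberately mismatched charset to show
    // the kind of garbling a non-UTF-8 default charset could produce.
    static String decodeAsLatin1(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The first word of the array in the question, as Unicode escapes.
        String word = "\u0E21\u0E32\u0E01";

        String same = roundTrip(word);
        String garbled = decodeAsLatin1(word);

        System.out.println("round trip preserved the word: " + same.equals(word));
        System.out.println("chars after mismatched decode: " + garbled.length());
    }
}
```

Using the StandardCharsets constant instead of the string "UTF-8" also avoids the checked UnsupportedEncodingException that the string-based overloads declare.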
