How to read a UTF-8 encoded file in Java with Turkish characters

Question

I am trying to read a UTF-8 encoded text file that contains some Turkish characters. I have written an Axis-based web service that reads this file and sends the output back as a string. Somehow I am not able to read the characters properly. The code is very simple:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class TurkishWebService {

    public String generateTurkishString() throws IOException {
        InputStream isr = this.getClass().getResourceAsStream(
                "/" + "turkish.txt");

        BufferedReader in = new BufferedReader(new InputStreamReader(isr,
                "UTF8"));
        String str;

        while ((str = in.readLine()) != null) {
            System.out.println(str);
        }

        in.close();
        return str;
    }

    public String normalString() {
        System.out.println("webService normal text");
        return "webService normal text";
    }

    public static void main(String args[]) throws IOException {
        new TurkishWebService().generateTurkishString();
    }
}

Here are the contents of turkish.txt, just one line:

Assalğçğıİİööşş

This is what I get on standard output:

Assal?τ????÷÷??

Please suggest what I am doing wrong here.

Recommended answer

You appear to be correctly decoding the file data from UTF-8 to UTF-16 strings.

System.out transcodes UTF-16 strings to the default JRE character encoding. If this does not match the encoding used by the device receiving the character data, the output is corrupted. So the console should be set to use the default character encoding, or data corruption occurs. How this is done is device-dependent.
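The transcoding step can be reproduced in isolation. The following is a minimal sketch, not the asker's code: the class name ConsoleEncodingDemo and the helper roundTrip are made up for illustration, and ISO-8859-1 merely stands in for whatever default encoding the JRE happened to use. Characters the target charset cannot represent come back as `?`, which is consistent with the pattern in the question's output.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
public class ConsoleEncodingDemo {

    // Encodes the text into the given charset and decodes it again, mimicking
    // the lossy step that happens when a string is transcoded for output.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Unicode escapes keep this source file independent of its own encoding.
        // The literal spells the line from turkish.txt.
        String turkish = "Assal\u011f\u00e7\u011f\u0131\u0130\u0130\u00f6\u00f6\u015f\u015f";

        System.out.println("Default charset: " + Charset.defaultCharset());

        // ISO-8859-1 has no mapping for U+011F, U+0131, U+0130 or U+015F,
        // so those characters are replaced with '?' during encoding.
        String lossy = roundTrip(turkish, StandardCharsets.ISO_8859_1);
        System.out.println("Unchanged after ISO-8859-1 round trip: " + lossy.equals(turkish));

        // Assumption: the console is configured to interpret its input as UTF-8.
        // If it is not, this line is displayed incorrectly as well.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(turkish);
    }
}
```

Wrapping System.out in a PrintStream with an explicit encoding only moves the problem: the bytes are then well defined, but the console still has to be configured to interpret them with the same encoding.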

If you are using a terminal, java.io.Console (obtained via System.console()) does a better job of determining the device encoding.
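A minimal sketch of that approach follows. The class name ConsoleOutputDemo and the helper printLine are hypothetical, and the fallback to System.out is an assumption about what a caller would want when no console is available.

```java
import java.io.Console;
import java.io.PrintWriter;

// Hypothetical demo class; the names are illustrative only.
public class ConsoleOutputDemo {

    // Prints through java.io.Console when one is available, otherwise falls
    // back to System.out. Returns true if the Console was used.
    static boolean printLine(String text) {
        // System.console() may return null when no interactive terminal is
        // attached, for example when output is redirected or inside some IDEs.
        Console console = System.console();
        if (console != null) {
            PrintWriter writer = console.writer();
            writer.println(text);
            writer.flush();
            return true;
        }
        System.out.println(text);
        return false;
    }

    public static void main(String[] args) {
        // The literal spells the line from turkish.txt using Unicode escapes.
        printLine("Assal\u011f\u00e7\u011f\u0131\u0130\u0130\u00f6\u00f6\u015f\u015f");
    }
}
```

Whether the characters are displayed correctly still depends on the terminal's font and code page; the Console only improves the choice of encoding on the Java side.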

Note: it is better to use try-with-resources, or at least try-finally, to close streams, and to use the standard encoding constants (such as StandardCharsets.UTF_8) where available.
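Applied to the reading code from the question, the note could look roughly like this. It is a sketch under assumptions: the class name Utf8ResourceReader and the method readAll are invented for illustration, and taking an InputStream parameter (rather than loading /turkish.txt directly) is a choice made here so the example runs without the resource file. It also collects the lines before returning, because in the question's code the variable str is always null once the loop has finished.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
public class Utf8ResourceReader {

    // Reads the whole stream as UTF-8 text and returns it, joining lines with '\n'.
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream)
        // even if readLine() throws.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    text.append('\n');
                }
                text.append(line);
                first = false;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file contents: the line from turkish.txt as UTF-8 bytes.
        String original = "Assal\u011f\u00e7\u011f\u0131\u0130\u0130\u00f6\u00f6\u015f\u015f";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String decoded = readAll(new ByteArrayInputStream(utf8));

        // Comparing in memory avoids depending on the console encoding.
        System.out.println("Decoded correctly: " + decoded.equals(original));
    }
}
```

In the web service, the same method could be called with the stream returned by getResourceAsStream("/turkish.txt"), after checking that it is not null.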
