How to read an InputStream with UTF-8?

Question

Hi all,

I'm developing a Java app that calls a PHP script over the internet, which returns an XML response.

The response contains the word "Próximo", but when I parse the XML nodes and read the value into a String variable, I receive it as "Pr&oacute;ximo".

I'm sure the problem is that the Java app uses a different encoding than the PHP script. So I suppose I must set the encoding to match the one used in the PHP XML, which is UTF-8.

This is the code I'm using to get the XML file from the PHP script.

What should I change in this code to set the encoding to UTF-8? (Note that I'm not using a BufferedReader; I'm using an InputStream.)

        InputStream in = null;
        String url = "http://www.myurl.com";
        try {
            // Open an HTTP GET connection to the PHP endpoint
            URL formattedUrl = new URL(url);
            URLConnection connection = formattedUrl.openConnection();
            HttpURLConnection httpConnection = (HttpURLConnection) connection;
            httpConnection.setAllowUserInteraction(false);
            httpConnection.setInstanceFollowRedirects(true);
            httpConnection.setRequestMethod("GET");
            httpConnection.connect();
            if (httpConnection.getResponseCode() == HttpURLConnection.HTTP_OK) {
                in = httpConnection.getInputStream();
            }

            // Parse the response body as XML
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(in);
            doc.getDocumentElement().normalize();
            NodeList myNodes = doc.getElementsByTagName("myNode");
        } catch (Exception e) {
            // The original snippet was cut off before its catch block; this handler is a placeholder
            e.printStackTrace();
        }

Recommended answer

When you get your InputStream, read byte[]s from it. When you create your Strings, pass in the charset name "UTF-8". Example:

byte[] buffer = new byte[contentLength];
int bytesRead = inputStream.read(buffer);
String page = new String(buffer, 0, bytesRead, "UTF-8");

Note that you will probably want to give your buffer some sane size (like 1024) and call inputStream.read(buffer) repeatedly until it returns -1.
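The repeated-read approach described above can be sketched as follows. This is only an illustration: the class name Utf8StreamReader and the helper readAll are hypothetical names, not part of the original answer. The sketch collects all bytes first and decodes them once at the end, because decoding each 1024-byte chunk separately could split a multi-byte UTF-8 character across two chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class Utf8StreamReader {

    // Reads the stream to its end, then decodes all collected bytes as UTF-8 in one step.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int bytesRead;
        // read() returns -1 once the end of the stream is reached
        while ((bytesRead = in.read(buffer)) != -1) {
            collected.write(buffer, 0, bytesRead);
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response body: "Próximo" encoded as UTF-8
        byte[] utf8 = "Pr\u00f3ximo".getBytes(StandardCharsets.UTF_8);
        String page = readAll(new ByteArrayInputStream(utf8));
        System.out.println(page.equals("Pr\u00f3ximo")); // prints true
    }
}
```

In the question's code, the stream returned by httpConnection.getInputStream() would take the place of the ByteArrayInputStream used here.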

@Amir Pashazadeh

Yes, you can also use an InputStreamReader; try changing the parse() line to:

Document doc = db.parse(new InputSource(new InputStreamReader(in, "UTF-8")));
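For context, here is a minimal self-contained sketch of that parse() change. The class name Utf8XmlParse, the helper firstNodeText, and the sample XML are assumptions made up for the example; only the InputSource/InputStreamReader line comes from the answer. An in-memory stream stands in for the HTTP response.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class, named here only for illustration.
class Utf8XmlParse {

    // Parses the stream as UTF-8 XML and returns the text of the first element named tagName.
    // Assumes at least one such element exists; item(0) would be null otherwise.
    static String firstNodeText(InputStream in, String tagName) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The reader decodes the bytes as UTF-8 before the parser sees them
        Document doc = db.parse(new InputSource(new InputStreamReader(in, StandardCharsets.UTF_8)));
        doc.getDocumentElement().normalize();
        return doc.getElementsByTagName(tagName).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Sample document standing in for the PHP response
        String xml = "<root><myNode>Pr\u00f3ximo</myNode></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstNodeText(in, "myNode").equals("Pr\u00f3ximo")); // prints true
    }
}
```

Passing StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException; both forms select the same charset.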
