How to correctly read URL content with UTF-8 chars?

Views: 155. Published: 2018/12/17 10:11:07. Tags: java, url, encode, utf


Question

public class URLReader {
    public static byte[] read(String from, String to, String string) {
        try {
            String text = "http://translate.google.com/translate_a/t?" +
                    "client=o&text=" + URLEncoder.encode(string, "UTF-8") +
                    "&hl=en&sl=" + from + "&tl=" + to + "";

            URL url = new URL(text);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
            String json = in.readLine();
            byte[] bytes = json.getBytes("UTF-8");
            in.close();
            return bytes;
            //return text.getBytes();
        }
        catch (Exception e) {
            return null;
        }
    }
}

and:

public class AbcServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/plain;charset=UTF-8");
        resp.getWriter().println(new String(URLReader.read("pl", "en", "koń")));
    }
}

When I run this, I get:

{"sentences":[{"trans":"end","orig":"koďż˝","translit":"","src_translit":""}],"src":"pl","server_time":30}

So UTF-8 is not handled correctly. But if I return the encoded URL instead:

http://translate.google.com/translate_a/t?client=o&text=ko%C5%84&hl=en&sl=pl&tl=en

and paste it into the browser's URL bar, I get the correct result:

{"sentences":[{"trans":"horse","orig":"koń","translit":"","src_translit":""}],"dict":[{"pos":"noun","terms":["horse"]}],"src":"pl","server_time":76}

Recommended answer

byte[] bytes = json.getBytes("UTF-8");

gives you a UTF-8 byte sequence, so URLReader.read also returns a UTF-8 byte sequence.

But you then decode those bytes without specifying a charset, i.e. new String(URLReader.read("pl", "en", "koń")), so Java decodes them with your platform's default charset, which in your case is not UTF-8.

Try:

new String(URLReader.read("pl", "en", "koń"), "UTF-8")
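To see why the charset argument matters, here is a minimal sketch that needs no network access. The class name CharsetDecodeDemo and the helper decode are made up for illustration, and ISO-8859-1 is only a stand-in for "some default charset that is not UTF-8"; your platform default may be a different one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
final class CharsetDecodeDemo {

    // Decodes the bytes with an explicitly chosen charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "ko\u0144" is "koń"; in UTF-8 the ń is encoded as two bytes (0xC5 0x84).
        byte[] utf8 = "ko\u0144".getBytes(StandardCharsets.UTF_8);

        // Decoding with the same charset restores the original text.
        String right = decode(utf8, StandardCharsets.UTF_8);

        // Decoding with a mismatched charset (assumed here: ISO-8859-1)
        // turns the two bytes of ń into two unrelated characters.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("ko\u0144")); // true
        System.out.println(wrong.equals("ko\u0144")); // false
        System.out.println(wrong.length());           // 4
        System.out.println("Platform default: " + Charset.defaultCharset());
    }
}
```

Passing a Charset constant rather than the string "UTF-8" also avoids the checked UnsupportedEncodingException.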

Update

Here is the code that works fully on my machine:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class URLReader {

    public static byte[] read(String from, String to, String string) {
        try {
            String text = "http://translate.google.com/translate_a/t?"
                    + "client=o&text=" + URLEncoder.encode(string, "UTF-8")
                    + "&hl=en&sl=" + from + "&tl=" + to + "";
            URL url = new URL(text);
            URLConnection conn = url.openConnection();
            // Sending a browser-like User-Agent appears to avoid the 403 error.
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)");
            // Decode the response as UTF-8 while reading it.
            BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
            String json = in.readLine();
            // Re-encode as UTF-8 so the caller receives UTF-8 bytes.
            byte[] bytes = json.getBytes("UTF-8");
            in.close();
            return bytes;
            //return text.getBytes();
        } catch (Exception e) {
            System.out.println(e);
            // Be careful with returning null: a caller that uses the result
            // without a null check will throw a NullPointerException.
            return null;
        }
    }
}
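The code above reads only the first line of the response with readLine(), which is enough for the single-line JSON shown in the question. If a response may span several lines, one possible variant is to read the whole stream while still decoding it as UTF-8. The sketch below is an assumption-laden illustration, not part of the original answer: the class name StreamTextReader and the method readAllUtf8 are hypothetical, and an in-memory stream stands in for conn.getInputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown only to illustrate reading a full UTF-8 stream.
final class StreamTextReader {

    // Reads every character from the stream, decoding the bytes as UTF-8.
    // The stream is closed when reading finishes or fails.
    static String readAllUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a network response: two lines of UTF-8 encoded text.
        byte[] body = "{\"orig\":\"ko\u0144\"}\n{\"second\":\"line\"}"
                .getBytes(StandardCharsets.UTF_8);
        String text = readAllUtf8(new ByteArrayInputStream(body));
        System.out.println(text.contains("ko\u0144")); // true
        System.out.println(text.contains("second"));   // true
    }
}
```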

Don't forget to escape ń as \u0144. The Java compiler may misread non-ASCII characters when the source file's encoding differs from the encoding the compiler assumes, so it is a good idea to write the literal in plain ASCII.

public class AbcServlet extends HttpServlet {

    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/plain;charset=UTF-8");
        byte[] read = URLReader.read("pl", "en", "ko\u0144");
        resp.getOutputStream().write(read);
    }
}
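The escaping and URL-encoding steps can be checked on their own, without calling the translation service. In the minimal sketch below, the class name EscapeDemo and the helper encodeQueryValue are hypothetical; URLEncoder.encode(String, String) is the same call used in URLReader.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the original answer.
final class EscapeDemo {

    // Percent-encodes a query parameter value using UTF-8 bytes.
    static String encodeQueryValue(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The Unicode escape keeps the source file pure ASCII, so the literal
        // does not depend on the encoding the compiler assumes for the file.
        String word = "ko\u0144";

        System.out.println(word.length());          // 3
        System.out.println(encodeQueryValue(word)); // ko%C5%84
    }
}
```

The encoded value matches the text=ko%C5%84 parameter in the URL that returned the correct translation in the question.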
