Why does jsoup not read as UTF-8?


Question

I want jsoup to parse my input as UTF-8, but I can't get it to work. I have tried everything I know and searched on Google.

What I am trying to do:

String tmp_html_content = "Öç";

InputStream is = new ByteArrayInputStream(tmp_html_content.getBytes());
Document doc_tbl = Jsoup.parse(is, "UTF-8", "");
doc_tbl.outputSettings().charset().forName("UTF-8");
doc_tbl.outputSettings().escapeMode(EscapeMode.xhtml);

But doc_tbl is not UTF-8.

Please help me understand why.

Solution

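import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities.EscapeMode;

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World");

        String tmp_html_content = "Öçasasa";

        // getBytes() with no argument encodes with the JVM's default charset,
        // so the charset passed to Jsoup.parse below has to match that default.
        InputStream is = new ByteArrayInputStream(tmp_html_content.getBytes());
        try {
            Document doc_tbl = Jsoup.parse(is, "ISO-8859-9", "");
            // charset(String) sets the output charset; charset().forName(...)
            // only looks up a Charset and discards the result.
            doc_tbl.outputSettings().charset("UTF-8");
            doc_tbl.outputSettings().escapeMode(EscapeMode.xhtml);
            String htmlString = doc_tbl.toString();
            System.out.println(htmlString);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}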

Output:

Hello World Öçasasa
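
The solution above works only when the charset name given to Jsoup.parse matches the charset that getBytes() used to produce the bytes. As a rough illustration of that dependency, here is a minimal sketch using only the JDK, with no jsoup involved. The class name CharsetRoundTrip and the helper roundTrip are hypothetical names chosen for this example, and the sample text is written with Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Hypothetical helper: encode text with one charset, decode the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // O-umlaut and c-cedilla followed by "asasa", the sample text from the answer.
        String text = "\u00D6\u00E7asasa";
        Charset latin5 = Charset.forName("ISO-8859-9");

        // Same charset on both sides: the text survives.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
        System.out.println(text.equals(roundTrip(text, latin5, latin5)));

        // Different charsets on each side: the non-ASCII characters are corrupted.
        System.out.println(text.equals(roundTrip(text, latin5, StandardCharsets.UTF_8)));
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8, latin5)));
    }
}
```

Applied to the jsoup code, this suggests naming the charset explicitly on both sides, for example tmp_html_content.getBytes(StandardCharsets.UTF_8) together with Jsoup.parse(is, "UTF-8", ""), so the result no longer depends on the JVM's default charset.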
