How do I get parsed HTML special characters using JSoup?


Question

I am using JSoup to get the H1 tag value from a web page. The tag contains the following HTML:

Hexyl β-D-glucopyranoside

When I use the .text() method I get the following (note the ?). I assume this is because it cannot work out the HTML for the "β" character. How do I get the value as it is rendered on the web page?

Hexyl ?-D-glucopyranoside

Do I need to do some kind of conversion after I have picked up the text I want?

Here is my code.

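        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.jsoup.nodes.Entities;

        String check = "<title>Hexyl &#946;-D-glucopyranoside &#8805;98.0% (TLC) | &#8805; &#8805;</title>";
        Document doc3 = Jsoup.parse(check);
        doc3.outputSettings().escapeMode(Entities.EscapeMode.base); // default

        // With a UTF-8 output charset, the characters are written as-is.
        doc3.outputSettings().charset("UTF-8");
        System.out.println("UTF-8: " + doc3.html());
        //doc3.outputSettings().charset("ISO 8859-1");
        // With an ASCII output charset, characters it cannot encode are written as entities.
        doc3.outputSettings().charset("ASCII");
        System.out.println("ASCII: " + doc3.html());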

-----Output at console-----

    UTF-8: <html>
    <head>
    <title>Hexyl ?-D-glucopyranoside ?98.0% (TLC) | ? ? </title>
     </head>
    <body></body>
    </html>
    ASCII: <html>
    <head>
    <title>Hexyl &#946;-D-glucopyranoside &#8805;98.0% (TLC) | &#8805; &#8805;</title>
     </head>
    <body></body>
    </html>

Recommended answer

It looks like the IDE you're using is set to the wrong character encoding.

It has nothing to do with your code: I ran it and it works fine (the special characters are printed). If you're using Eclipse, go to the run configuration settings for that particular project, click the 'Common' tab, and choose UTF-8.
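To illustrate why the text can be correct in memory and still show up as ? on the console, here is a minimal sketch that uses only the JDK (no JSoup). The class name ConsoleEncodingDemo and the helper roundTrip are hypothetical names introduced for this example, and US-ASCII stands in for whatever non-Unicode encoding the console might be using; the actual encoding on your machine may differ.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleEncodingDemo {

    // Hypothetical helper: encodes the text the way a console using the given
    // charset would, then decodes it again to show what would be displayed.
    public static String roundTrip(String text, Charset charset) {
        // String.getBytes(Charset) replaces unmappable characters with the
        // charset's replacement bytes, which is '?' for US-ASCII.
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The string as it exists in memory after parsing: beta is U+03B2.
        String title = "Hexyl \u03B2-D-glucopyranoside";

        // A charset that cannot represent beta degrades it to '?'.
        System.out.println("US-ASCII: " + roundTrip(title, StandardCharsets.US_ASCII));

        // UTF-8 can represent it, so the text survives unchanged.
        System.out.println("UTF-8:    " + roundTrip(title, StandardCharsets.UTF_8));

        // Writing through a PrintStream with an explicit encoding is one way
        // to stop depending on the platform default. The terminal or IDE
        // console must still be set to UTF-8 to display the bytes correctly.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(title);
    }
}
```

If the first line prints a ? where the β should be and the second does not, the string itself is intact and no conversion of the parsed text is needed; the loss happens only when the characters are encoded for output.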
