Jsoup having problems with special HTML symbols, &lsquo; &mdash; etc

Question

I have some HTML (a String) that I am putting through Jsoup so that I can add something to all href and src attributes, and that part works fine. However, I'm noticing that for some special HTML characters, Jsoup converts the entity, say &ldquo;, into the actual character (“). I output the value before and after parsing, and I see that change.

Before:

THIS &mdash; IS A &ldquo;TEST&rdquo;. 5 &gt; 4. trademark: &#153;

After:

THIS — IS A "TEST". 5 &gt; 4. trademark: ?

What the heck is going on? I was specifically converting those special characters to their HTML entities before doing any Jsoup work, precisely to avoid this. The quotes changed to the actual quote characters, the greater-than stayed the same, and the trademark changed into a question mark. Aaaaaaa.

FYI, my Jsoup code is doing:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document document = Jsoup.parse(fileHtmlStr);
// some stuff
String modifiedFileHtmlStr = document.html();

Thanks for any help!

Recommended answer

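The code below produces output similar to the input markup. It switches the escape mode to the extended entity set, so that more characters are written as named entities, and it sets the output charset to ASCII, so that characters outside ASCII, such as the TM sign, are escaped instead of being written as raw characters that a system without Unicode support cannot display.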

Output:

<p>THIS &mdash; IS A &ldquo;TEST&rdquor;&period; 5 &gt; 4&period; trademark&colon; &#x99;</p>

Code:

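import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

Document doc = Jsoup.parse(
    "<p>THIS &mdash; IS A &ldquo;TEST&rdquo;. 5 &gt; 4. trademark: &#153;</p>");

Document.OutputSettings settings = doc.outputSettings();

// Do not re-indent or reflow the markup on output.
settings.prettyPrint(false);
// Write characters as named entities from the extended set where one exists.
settings.escapeMode(Entities.EscapeMode.extended);
// Escape any character the ASCII charset cannot represent.
settings.charset("ASCII");

String modifiedFileHtmlStr = doc.html();

System.out.println(modifiedFileHtmlStr);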
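
The question mark in the "After" output is consistent with a character being written in a charset that cannot represent it. The following is a minimal sketch of that idea using only the Java standard library. It is not Jsoup's implementation: the class `EscapeSketch` and the method `escapeFor` are hypothetical names introduced for illustration. It shows that encoding the trademark sign to ASCII without escaping yields `?`, while checking `CharsetEncoder.canEncode` first lets the character be replaced by a numeric entity.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EscapeSketch {

    /**
     * Hypothetical helper: returns the text with every code point that the
     * named charset cannot encode replaced by a hexadecimal numeric entity.
     */
    public static String escapeFor(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (encoder.canEncode(ch)) {
                out.append(ch); // representable: write as-is
            } else {
                // not representable: write a numeric character reference
                out.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String tm = "\u2122"; // the trademark sign

        // Encoding straight to ASCII: the unmappable character becomes '?'.
        byte[] raw = tm.getBytes(StandardCharsets.US_ASCII);
        System.out.println((char) raw[0]);

        // Escaping first keeps the information in an ASCII-only string.
        System.out.println(escapeFor("trademark: " + tm, "US-ASCII"));
    }
}
```

With a charset that can represent the character, such as UTF-8, `escapeFor` leaves the text unchanged, which is why the choice of output charset decides whether the TM sign is escaped.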
