jsoup: different result after updating from 1.7.3 to 1.8.1, how to avoid this?
Problem description
After updating from jsoup 1.7.3 to 1.8.1 I get different results. In 1.7.3 the title attribute was returned escaped, the same as the input; in 1.8.1 the escaped br inside the attribute is converted into a tag. Is there a way to avoid this behaviour?
String content = "<a href=\"javascript:openObj('Classifier_UUID')\" title=\"Test&lt;br&gt;Test\">Test<br>Test</a>";
Document document = Jsoup.parseBodyFragment(content);
document.outputSettings().charset(Charset.forName("ASCII")); //$NON-NLS-1$
System.out.println(document.body().html());
Result:
// 1.7.3 <a href="javascript:openObj('Classifier_UUID')" title="Test&lt;br&gt;Test">Test<br />Test</a>
// 1.8.1 <a href="javascript:openObj('Classifier_UUID')" title="Test<br>Test">Test<br>Test</a>
Recommended answer
It's a bit late, but this may help others.
I upgraded from jsoup 1.7.2 to 1.11.3 and saw the same behaviour: the escaping is no longer implicit.
The following code did the trick for me:
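import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
import org.apache.commons.lang3.StringEscapeUtils; // assumed: Apache Commons Lang 3

// Strip all tags; s is the input string
String cleanText = Jsoup.clean(s, Whitelist.none());
// &, < and > are escaped by the .clean call, so unescape them first to avoid double escaping
String cleanUnencodedText = StringEscapeUtils.unescapeHtml3(cleanText);
String cleanEncodedText = StringEscapeUtils.escapeHtml3(cleanUnencodedText);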
As you can see, I first had to unescape the cleaned text, because &, < and > are escaped by the Jsoup.clean call.
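To show why the unescape step comes first, here is a minimal stdlib-only sketch. The helpers escapeBasic and unescapeBasic are hypothetical stand-ins written for this illustration (they are not jsoup or Commons Lang APIs) and handle only &, < and >. The point is that escaping text that is already escaped doubles the entities, while unescaping first gives a stable result.

```java
public class EscapeRoundTrip {
    // Hypothetical helper: escapes only &, < and >.
    // & is replaced first so the entities added afterwards are not escaped again.
    static String escapeBasic(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Hypothetical helper: inverse of escapeBasic.
    // &amp; is handled last so "&amp;lt;" becomes "&lt;" rather than "<".
    static String unescapeBasic(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // Text that has already been escaped once, e.g. by a sanitizer.
        String alreadyEscaped = "Tom &amp; Jerry &lt;3";

        // Escaping again without unescaping doubles the entities.
        String doubled = escapeBasic(alreadyEscaped);

        // Unescaping first and then escaping returns the single-escaped form.
        String normalized = escapeBasic(unescapeBasic(alreadyEscaped));

        System.out.println(doubled);
        System.out.println(normalized);
    }
}
```

In the answer's code, unescapeHtml3 and escapeHtml3 play the roles of these two helpers, with a much larger entity table.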
You can use unescapeHtml4 and escapeHtml4 instead of the HTML 3 versions. I had to support the old HTML version because, for example, the HTML 4 variant escapes € as &euro;.
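The difference between the two variants comes down to which entity table is used. The sketch below illustrates this with two tiny, hypothetical tables (HTML3 and HTML4 here are illustrative maps, heavily truncated, not the real library tables) and a hypothetical escapeWith helper. It assumes only what the answer states: the euro sign has a named entity in the HTML 4 set but not in the HTML 3 set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityTables {
    // Hypothetical, heavily truncated entity tables for illustration only.
    static final Map<String, String> HTML3 = new LinkedHashMap<>();
    static final Map<String, String> HTML4 = new LinkedHashMap<>();

    static {
        HTML3.put("&", "&amp;");
        HTML3.put("<", "&lt;");
        HTML3.put(">", "&gt;");
        HTML3.put("\u00E9", "&eacute;"); // e with acute accent
        HTML4.putAll(HTML3);
        HTML4.put("\u20AC", "&euro;");   // euro sign, only in the HTML 4 table
    }

    // Hypothetical helper: replaces each character that has an entry in the table
    // and leaves every other character unchanged.
    static String escapeWith(Map<String, String> table, String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            String ch = String.valueOf(s.charAt(i));
            out.append(table.getOrDefault(ch, ch));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String price = "5 \u20AC & up";
        System.out.println(escapeWith(HTML3, price)); // euro sign stays a literal character
        System.out.println(escapeWith(HTML4, price)); // euro sign becomes &euro;
    }
}
```

If the consumer of the output only understands the older entity set, keeping the euro sign as a literal character (and choosing an output encoding that can represent it) avoids emitting an entity it cannot resolve.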