Jsoup.clean without adding HTML entities

Views: 395 | Posted: 2018/6/19 20:06:59 | Tags: java, html, jsoup, html-entities


Question

I'm cleaning some text from unwanted HTML tags (such as <script>) by using
String clean = Jsoup.clean(someInput, Whitelist.basicWithImages());
The problem is that it replaces, for instance, å with &aring; (which causes trouble for me, since it's not "pure XML").

For example
Jsoup.clean("hello å <script></script> world", Whitelist.basicWithImages())
yields
"hello &aring; world"
but I would like
"hello å world"
Is there a simple way to achieve this? (That is, something simpler than converting &aring; back to å in the result.)
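To see concretely why &aring; is not "pure XML": XML predefines only five entities (&amp;, &lt;, &gt;, &quot; and &apos;), so an XML parser rejects &aring; unless a DTD declares it, while a literal å is fine. The sketch below uses only the JDK; the helper name isWellFormedXml and the wrapping <root> element are illustrative assumptions, not part of jsoup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlEntityCheck {
    // Hypothetical helper: wraps the fragment in an assumed <root> element and
    // reports whether a standard JDK XML parser accepts it as well-formed.
    static boolean isWellFormedXml(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            return true;
        } catch (Exception e) {
            // Undeclared entities such as &aring; end up here as a SAXParseException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("hello &aring; world")); // HTML named entity
        System.out.println(isWellFormedXml("hello \u00e5 world"));  // literal å
        System.out.println(isWellFormedXml("hello &amp; world"));   // predefined XML entity
    }
}
```

The first call fails because "aring" is referenced but never declared; the other two parse normally.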
Solution
You can configure Jsoup's escaping mode: using EscapeMode.xhtml gives you output without HTML named entities such as &aring; (only the XML-safe &lt;, &gt;, &amp; and &quot; are used).

Here's a complete snippet that accepts str as input, and cleans it using Whitelist.simpleText():
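import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities.EscapeMode;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

// Parse str into a Document
Document doc = Jsoup.parse(str);
// Clean the document.
doc = new Cleaner(Whitelist.simpleText()).clean(doc);
// Adjust escape mode
doc.outputSettings().escapeMode(EscapeMode.xhtml);
// Get back the string of the body.
str = doc.body().html();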
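For illustration only, here is a stdlib-only sketch of what restricted, XHTML-style escaping amounts to. It is not jsoup's code: the helper escapeXhtmlStyle is hypothetical, and it assumes the rule is simply "escape &, <, > and the double quote, and keep every other character literally". jsoup's real escaper handles more cases, for example non-breaking spaces and characters the output charset cannot encode.

```java
public class XhtmlStyleEscape {
    // Hypothetical helper (not part of jsoup): escapes only the characters XML
    // itself cares about and passes everything else, including å, through as-is.
    static String escapeXhtmlStyle(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                // Non-ASCII characters such as \u00e5 are copied literally,
                // so no HTML-only entity like &aring; is ever produced.
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXhtmlStyle("hello \u00e5 & <world>"));
    }
}
```

Because å passes through untouched, the result stays well-formed XML as long as it is written in an encoding that can represent it, such as UTF-8.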
