I'm looking for a Java HTML encoder


Question

I'm looking for a Java HTML encoder with a whitelist feature to encode < to &lt; and a few other things. By whitelist feature I mean the ability to keep tags such as <b> and <i> while encoding everything else. Sort of what SO does.

I looked at ESAPI, but the usage documentation is completely lacking. I wasn't able to find anything on how to call the API (I'm not a Java developer).

Another name that came up was HTMLTidy, but I read that it encodes using Unicode instead of the standard &lt; or &gt;, which would break one of my requirements.

Any ideas?

PS: My intention is to allow my users to input anything they want, and to secure it by encoding anything that is outside my whitelist or potentially harmful.

Solution

I understand that you want the same approach as SO uses. You need two DB columns for the input: one for the raw data (so that it can be redisplayed in the editor) and one for the whitelisted/markdowned data (so that it can be displayed in output). I imagine that they have another column indicating the version of the whitelisted/markdowned data. The current version is stored in a configurable application-wide variable. Whenever the whitelisted/markdowned data is queried and the version has changed, it is re-whitelisted/markdowned from the raw data.
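To make the versioning idea concrete, here is a minimal in-memory sketch. The class name StoredPost, its fields, and the display method are hypothetical names of mine, not anything SO is known to use. In a real application the three fields would be the three DB columns, and the sanitizer would be whatever whitelisting step you choose.

```java
import java.util.function.UnaryOperator;

// Hypothetical model of one stored post; each field stands in for a DB column.
final class StoredPost {
    private final String raw;        // column 1: exactly what the user typed
    private String sanitized;        // column 2: display-ready HTML
    private int sanitizedVersion;    // column 3: sanitizer version that produced column 2

    StoredPost(String raw) {
        this.raw = raw;
        this.sanitizedVersion = -1;  // nothing sanitized yet
    }

    // Raw text for redisplay in the editor.
    String raw() {
        return raw;
    }

    // Returns the display HTML, re-sanitizing from the raw text only when
    // the stored copy is missing or was produced by a different version.
    String display(int currentVersion, UnaryOperator<String> sanitizer) {
        if (sanitized == null || sanitizedVersion != currentVersion) {
            sanitized = sanitizer.apply(raw);
            sanitizedVersion = currentVersion;
        }
        return sanitized;
    }
}
```

Because the raw text is never overwritten, bumping the application-wide version after a whitelist change is enough to have every post re-sanitized the next time it is displayed.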

The markdown part aside, you just want to whitelist the data. In Java you can use Jsoup for this. Note that it does not encode unwanted HTML tags; it just removes them. The Jsoup Whitelist API offers several predefined whitelists and also allows you to customize them. Here's an example of how you can use it with Whitelist#basic():

String whitelistedHtml = Jsoup.clean(rawHtml, Whitelist.basic());
// ...

Per its javadoc, this whitelist allows the following HTML tags:

a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul
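If you really need unwanted tags encoded rather than removed, one possible approach using only the JDK is to escape everything first and then restore the whitelisted tags. A rough sketch, assuming a hypothetical WhitelistEncoder class and a whitelist of bare tags without attributes (this is my own illustration, not part of Jsoup or any other library):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: encodes all HTML, then restores whitelisted bare tags.
final class WhitelistEncoder {
    // Assumed whitelist; only tags without attributes are ever restored.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("b", "i", "em", "strong"));

    // Matches an already-escaped bare tag such as &lt;b&gt; or &lt;/b&gt;.
    private static final Pattern ESCAPED_TAG =
            Pattern.compile("&lt;(/?)([a-zA-Z][a-zA-Z0-9]*)&gt;");

    // Escapes the five characters that are significant in HTML.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes everything, then turns whitelisted escaped tags back into real tags.
    static String encode(String raw) {
        Matcher m = ESCAPED_TAG.matcher(escape(raw));
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String replacement = ALLOWED.contains(name)
                    ? "<" + m.group(1) + name + ">"
                    : m.group();  // not whitelisted: leave it encoded
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this, `<b>bold</b> <script>alert(1)</script>` comes out as `<b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;`. Tags that carry attributes are never restored, so something like `<b onclick=...>` stays encoded, but the result is not guaranteed to be balanced HTML.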

