Proper usage of JTidy to purify HTML

Question

I am trying to use JTidy (jtidy-r938.jar) to sanitize an input HTML string, but I seem to have problems getting the default settings right. Often strings such as "hello world" end up as "helloworld" after tidying. I wanted to show what I'm doing here, and any pointers would be really appreciated:

Assume that rawHtml is the String containing the input (real world) HTML. This is what I'm doing:

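        // Needs: org.w3c.tidy.Tidy, java.io.ByteArrayOutputStream,
        //        java.io.PrintStream, java.io.StringReader
        Tidy tidy = new Tidy();
        tidy.setPrintBodyOnly(true);   // emit only the contents of <body>

        // Collect the tidied markup in memory
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(baos);

        tidy.parse(new StringReader(rawHtml), ps);
        return baos.toString("UTF8");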

First off, does anything look fundamentally wrong with the above code? I seem to be getting weird results with this.

For example, consider the following input:

<p class="MsoNormal" style="text-autospace:none;"><font color="black"><span style="color:black;">??? </span></font><b><font color="#7f0055"><span style="color:#7f0055; font-weight:bold;">private</span> </font></b><font color="black"><span style="color:black;">String parseDescription</span></font><font>

The output is:

<p class="MsoNormal" style="text-autospace:none;"><font color=
"black"><span style="color:black;">&nbsp;&nbsp;&nbsp;</span></font>
<b><font color="#7F0055"><span style=
"color:#7f0055; font-weight:bold;">private</span></font></b><font
color="black"><span style="color:black;">String
parseDescription</span></font> </p>

So,

"public String parseDescription" becomes "publicString parseDescription"

Thanks in advance!

Recommended answer

Well, this seems to be a bug in JTidy. For the exact file that triggers the problem, see:

http://sourceforge.net/tracker/?func=detail&aid=2985849&group_id=13153&atid=113153

Thanks to everyone who helped!
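
Since the merged words come from the tidying step itself, one way to notice the problem before the cleaned markup is used is to compare the visible words before and after tidying. The following is only a minimal sketch using the Java standard library: `TidyWhitespaceCheck`, `words` and `sameWords` are hypothetical names, not part of JTidy, and the regex-based tag stripping is deliberately crude (it assumes no `>` inside attribute values). The sample strings are shortened stand-ins for the input and output shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a JTidy API): detects words merged by a tidy pass. */
class TidyWhitespaceCheck {

    /**
     * Crude visible-text extraction. Tags are removed without inserting a
     * space, so words are separated only by whitespace present in the markup.
     */
    static List<String> words(String html) {
        String text = html
                .replaceAll("<[^>]*>", "")   // drop tags (assumes no '>' in attributes)
                .replace("&nbsp;", " ")      // treat non-breaking spaces as spaces
                .replace('\u00a0', ' ')
                .trim();
        if (text.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(text.split("\\s+")));
    }

    /** True when both documents contain the same sequence of visible words. */
    static boolean sameWords(String before, String after) {
        return words(before).equals(words(after));
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the raw and tidied markup from the question.
        String before = "<b><span>private</span> </b><span>String parseDescription</span>";
        String after = "<b><span>private</span></b><span>String\nparseDescription</span>";

        System.out.println(words(before));            // [private, String, parseDescription]
        System.out.println(words(after));             // [privateString, parseDescription]
        System.out.println(sameWords(before, after)); // false: two words were merged
    }
}
```

Because both strings go through the same extraction, the check only reports differences introduced between them; line wrapping added by the tidy pass does not count as a change, since any run of whitespace is treated as a single separator.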
