HtmlCleaner: handling Spanish characters

Published 2018/6/20 15:37:22. Tags: java, html, htmlcleaner


Question


I am using the HtmlCleaner library to parse/convert HTML files in Java.

It seems that it is not able to handle Spanish characters like 'ÁáÉéÍíÑñÓóÚúÜü'.

Is there any property I can set in HtmlCleaner to handle this, or is there another solution? Here's the code I'm using to invoke it:

    import org.htmlcleaner.CleanerProperties;
    import org.htmlcleaner.HtmlCleaner;
    import org.htmlcleaner.TagNode;
    CleanerProperties props = new CleanerProperties();
    props.setRecognizeUnicodeChars(true);
    java.io.File file = new java.io.File("C:\\example.html");
    TagNode tagNode = new HtmlCleaner(props).clean(file);

Answer
HtmlCleaner uses the JVM's default character set unless you specify one. On Windows this is typically Cp1252 (windows-1252), not UTF-8, which is probably where it's going wrong.
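The following stdlib-only sketch shows why the default charset matters. The class and method names are hypothetical and exist only for illustration. It encodes the Spanish characters from the question as UTF-8 bytes, which is how they would sit in a UTF-8 HTML file. It then decodes those bytes once with UTF-8 and once with windows-1252. The windows-1252 decode produces mojibake such as "Ã¡" for "á", the same kind of corruption a parser sees when it reads with the wrong default charset.

The sketch also prints `Charset.defaultCharset()`. From JDK 18 onward the default is UTF-8 (JEP 400), so the printed value depends on your JVM and platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode raw bytes with the given charset, the way a reader does when it
    // is handed a charset (or silently falls back to the JVM default).
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String spanish = "ÁáÉéÍíÑñÓóÚúÜü";
        // Bytes as they would be stored in a UTF-8 encoded HTML file.
        byte[] utf8Bytes = spanish.getBytes(StandardCharsets.UTF_8);

        // The charset the JVM uses when none is specified.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Decoding with the matching charset round-trips cleanly.
        System.out.println("As UTF-8:        " + decodeAs(utf8Bytes, StandardCharsets.UTF_8));

        // Decoding with windows-1252 turns each two-byte UTF-8 sequence into
        // two characters, e.g. "Ã¡" instead of "á".
        System.out.println("As windows-1252: " + decodeAs(utf8Bytes, Charset.forName("windows-1252")));
    }
}
```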

You can do any of the following:

- specify -Dfile.encoding=UTF-8 on your JVM start line

- use the HtmlCleaner.clean() overload that accepts a character set name:
      TagNode tagNode = new HtmlCleaner(props).clean(file, "UTF-8");
  (if you've got Google Guava in the project, you can use Charsets.UTF_8.name() for the constant; on Java 7+ StandardCharsets.UTF_8.name() does the same without an extra dependency)

- use the HtmlCleaner.clean() overload that accepts a Reader, passing an InputStreamReader that you've already constructed with the correct character set.
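The reader-based option mentioned above can be sketched with the standard library alone. The helpers `openWithCharset` and `readAll` are hypothetical names and are not part of HtmlCleaner. The sketch wraps the file's InputStream in an InputStreamReader with an explicit charset, so decoding no longer depends on the JVM default.

The HtmlCleaner call appears only as a comment. It assumes the reader-accepting `clean` overload the answer describes, so check the exact signature in the HtmlCleaner version you use.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetReader {
    // Open a file as a Reader that decodes with an explicit charset rather
    // than the JVM default. A Reader like this is what you would hand to
    // HtmlCleaner's reader-accepting clean(...) overload.
    static Reader openWithCharset(Path path, Charset charset) throws IOException {
        return new InputStreamReader(Files.newInputStream(path), charset);
    }

    // Read the whole file through openWithCharset, to show the decoded text.
    static String readAll(Path path, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = openWithCharset(path, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 HTML file to a temporary location for the demo.
        Path file = Files.createTempFile("example", ".html");
        Files.write(file, "<p>ÁáÉéÍíÑñÓóÚúÜü</p>".getBytes(StandardCharsets.UTF_8));

        // Hypothetical HtmlCleaner usage (needs the library; props as in the question):
        //   try (Reader r = openWithCharset(file, StandardCharsets.UTF_8)) {
        //       TagNode tagNode = new HtmlCleaner(props).clean(r);
        //   }
        System.out.println(readAll(file, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Passing a `Charset` object instead of a charset name string has a side benefit. The `InputStreamReader(InputStream, Charset)` constructor does not declare the checked `UnsupportedEncodingException` that the String-based constructor does.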
