Gradle/Eclipse: Different behavior of German "Umlaute" when using equality?


Question

I am experiencing weird behavior with German "Umlaute" (ä, ö, ü, ß) when using Java's equality checks (either directly or indirectly). Everything works as expected when running, debugging or testing from Eclipse: input containing "Umlaute" is treated as equal or not equal, as expected.

However, when I build the application using Spring Boot and run it, these equality checks fail for words that contain "Umlaute", e.g. "Nationalität".

Input is retrieved from a webpage via Jsoup and content of a table is extracted for some keywords. The encoding of the page is UTF-8 and I have handling in place for Jsoup to convert it if this is not the case. The encoding of the source files is UTF-8 as well.

    Connection connection = Jsoup.connect(url)
                .header("accept-language", "de-de, de, en")
                .userAgent("Mozilla/5.0")
                .timeout(10000)
                .method(Method.GET);
    Response response = connection.execute();
    if(logger.isDebugEnabled())
        logger.debug("Encoding of response: " +response.charset());
    Document doc;
    if(response.charset().equalsIgnoreCase("UTF-8"))
    {
        logger.debug("Response has expected charset");
        doc = Jsoup.parse(response.body(), baseURL);
    }
    else
    {
        logger.debug("Response doesn't have expected charset and is converted");
        doc = Jsoup.parse(new String(response.bodyAsBytes(), "UTF-8"), baseURL);
    }

    logger.debug("Encoding of document: " +doc.charset());
    if(!doc.charset().equals(Charset.forName("UTF-8")))
    {
        logger.debug("Changing encoding of document from " +doc.charset());
        doc.updateMetaCharsetElement(true);
        doc.charset(Charset.forName("UTF-8"));
        logger.debug("Changed encoding of document to: " +doc.charset());
    }
    return doc;

Example log output (from the deployed app) when reading content:

Encoding of response: utf-8
Response has expected charset
Encoding of document: UTF-8

Example input:

<tr><th>Nationalität:</th>     <td> [...] </td>    </tr>

Example code that fails for words containing ä, ö, ü or ß but works fine for other words:

Element header = row.select("th").first();
String text = header.ownText();
if("Nationalität:".equals(text))
{
 // goes here in eclipse
}
else
{
 // and here in deployed spring boot app
}

Is there any difference between running from Eclipse and running a built and deployed app that I am missing? Where else could this behavior come from, and how can it be resolved?
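One quick way to compare the two environments is to log the JVM's default charset at startup in both. The following is only a diagnostic sketch; the class name `EncodingInfo` is made up for illustration, and it only reports runtime settings, not the encoding the compiler used.

```java
import java.nio.charset.Charset;

public class EncodingInfo {

    // Builds a one-line summary of the encoding-related runtime settings.
    public static String describe() {
        return "default charset=" + Charset.defaultCharset()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run this once from Eclipse and once from the packaged app,
        // then compare the two lines.
        System.out.println(describe());
    }
}
```

If the two environments report different values, the launch configuration is a candidate cause; if they match, the difference is more likely introduced at build time.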

As far as I can see this is not (directly) an encoding issue since the input shows "Umlaute" correctly... Since this is not reproducible when debugging, I am having a hard time figuring out what exactly goes wrong.

Edit: While the input looks fine in the logs (i.e. diacritics show up correctly), I realized that it does not look correct in the console: <th>Nationalität:</th>

I am currently using a Normalizer, as suggested by Mirko, like this: Normalizer.normalize(input, Form.NFC); (I also tried it with NFD). How do the (Spring Boot) console and the (logback) log output differ?
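Because the console and the log file may render the same string differently, it can help to print code points instead of characters. The sketch below uses a hypothetical helper `codePoints` (not part of the original code) and also shows the one case that `Normalizer` does fix: a precomposed "ä" versus "a" followed by a combining diaeresis.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.stream.Collectors;

public class UmlautDebug {

    // Renders each code point as U+XXXX so the result does not depend on
    // how the console or the log appender encodes non-ASCII characters.
    public static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of the source encoding.
        String composed = "\u00e4";     // precomposed "ä"
        String decomposed = "a\u0308";  // "a" + combining diaeresis

        System.out.println(codePoints(composed));    // U+00E4
        System.out.println(codePoints(decomposed));  // U+0061 U+0308
        System.out.println(composed.equals(decomposed)); // false
        System.out.println(
                composed.equals(Normalizer.normalize(decomposed, Form.NFC))); // true
    }
}
```

Logging `codePoints(text)` next to `codePoints("Nationalität:")` in the deployed app would show whether the mismatch is in the scraped input or in the compiled literal. If normalization does not make the two equal, the difference is not a composed/decomposed issue.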

Recommended answer

I think I tracked this down to the build of the standalone app being the culprit. As described above, when running from Eclipse all is fine, the problem only occurred when I ran the standalone Spring Boot app.

This is being built with Gradle. In my build.gradle I have

compileJava.options.encoding = 'UTF-8'

in order to force UTF-8 being used for encoding. This should (usually) be enough. I however also use AspectJ (via gradle-aspectj plugin) which apparently breaks this behavior (involuntarily?) and results in a default encoding to be used instead of the one explicitly defined. In order to solve this I added

compileAspect {
  additionalAjcArgs = ['encoding' : 'UTF-8']
}

to my build.gradle which passes the encoding option on to the ajc compiler. This seems to have fixed the problem for the regular build.
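To illustrate why a wrong compiler encoding breaks `equals`, the following sketch simulates a string literal from a UTF-8 source file being decoded with another charset. The helper `misread` is hypothetical, and ISO-8859-1 is only an assumed stand-in for whatever default encoding ajc actually picked up on the build machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingDemo {

    // Simulates a compiler reading UTF-8 source bytes with a different
    // charset (illustration only, not what ajc does internally).
    public static String misread(String literal, Charset assumed) {
        byte[] utf8Bytes = literal.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    public static void main(String[] args) {
        // Unicode escape so this demo itself is encoding-independent.
        String expected = "Nationalit\u00e4t:";
        String compiledWrong = misread(expected, StandardCharsets.ISO_8859_1);

        System.out.println(expected.length());              // 13
        System.out.println(compiledWrong.length());         // 14 ("ä" became two chars)
        System.out.println(expected.equals(compiledWrong)); // false
        System.out.println(
                expected.equals(misread(expected, StandardCharsets.UTF_8))); // true
    }
}
```

In this scenario the input from Jsoup is correct and the literal in the compiled class is the damaged side, which would also explain why normalizing the input has no effect.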

However, the problem still occurs when tests are run from Gradle. I have not yet been able to find out what needs to be done there and why the above configuration is not enough. This is now tracked in a separate question.
