String.format for double-width characters


Problem description

Java's String.format does not appear to be aware of double-width characters, such as Japanese or Chinese:

System.out.println(String.format("%1$9s: %2$20s : %3$20s\n", "field", "expected", "actual"));
System.out.println(String.format("%1$9s: %2$20s : %3$20s\n", "surface", "駆け", "駆け"));

The output is not aligned correctly:

    field:             expected :               actual
  surface:                   駆け :                   駆け

Is there a correct way to format double-width characters with String.format? If not, is there an alternative method or library which is capable of doing this correctly?

Solution

There is no issue with Java's String.format() since it can't "know" how you want to render the text, or the font that will be used. Its role is purely to assemble a formatted string of text to be subsequently displayed. The visual appearance of that formatted text is controlled (primarily) by the display font, and the developer must explicitly set the formatting accordingly.
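A quick check (not part of the original answer) illustrates the point: the width in a format specifier counts Java characters, not display columns. The class name FormatWidthDemo is made up for this sketch.

```java
public class FormatWidthDemo {
    public static void main(String[] args) {
        // Both values are padded to 6 chars, regardless of how wide the glyphs render.
        String latin = String.format("%6s", "ab");
        String cjk = String.format("%6s", "駆け");

        System.out.println(latin.length()); // 6
        System.out.println(cjk.length());   // 6

        // Each string holds 4 spaces plus 2 characters. With a font whose CJK glyphs
        // are two columns wide, the second one occupies 8 columns on screen, not 6.
    }
}
```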

A simple solution would be to use a font that renders both Latin and CJK characters with glyphs of constant width, but I couldn't find one. See Unicode Standard Annex #11, "East Asian Width", for more details:

For a traditional East Asian fixed pitch font, this width translates to a display width of either one half or a whole unit width. A common name for this unit width is "Em". While an Em is customarily the height of the letter "M", it is the same as the unit width in East Asian fonts, because in these fonts the standard character cell is square. In contrast, the character width for a fixed-pitch Latin font like Courier is generally 3/5 of an Em.

I'm guessing that there might not be any monospace font displaying CJK characters and Latin characters with the same width simply because it would look very strange. For example, imagine the two Latin characters "li" occupying the same width as the two Japanese characters "駆け". So even if you use a monospaced font to render both Latin and CJK characters, although the characters for each language are monospaced, the widths for each language are probably still different.

Google has a very helpful site for evaluating their fonts, which allows you to:

  • Filter the fonts by language: Japanese, Chinese, etc.
  • View a large number of characters being rendered. For example this page for Noto Sans JP shows:
    • The Japanese glyphs are wider than the Latin glyphs.
    • The Japanese glyphs are fixed width, whereas the Latin glyphs are not.
  • Enter any text you wish, and apply it to all selected fonts for comparison. For example, this screen shot shows how the Latin glyphs for AEIOUY look alongside some Japanese glyphs using different fonts. Note that the width of the Latin glyphs is always smaller, though by varying amounts, depending on the font being used and the specific glyph to be rendered.

Here's a possible solution to your alignment problem:

  • With the Kosugi Maru font (middle of top row in the screen shot above), Japanese characters seem to be exactly twice as wide as Latin characters, so use that font to render the output.
  • When rendering the formatted text, the leading spaces must be reduced by one for each Japanese character to be displayed to ensure column alignment (since Japanese glyphs are twice as wide).

So in the code reduce the number of leading spaces by the number of Japanese glyphs to be rendered:

    System.out.println("* The display font is named MotoyaLMaru, created by installing Google font KosugiMaru-Regular.ttf.");
    System.out.println("* With this font Japanese glyphs seem to be twice the width of Latin glyphs.");
    System.out.println("* Downloaded from https://fonts.google.com/specimen/Kosugi+Maru?selection.family=Kosugi+Maru");
    System.out.println(" ");
    System.out.println(String.format("%1$9s: %2$20s : %3$20s\n", "field", "expected", "actual"));
    System.out.println(String.format("%1$9s: %2$18s : %3$18s\n", "surface", "駆け", "駆け")); // 18, not 20!
    System.out.println(String.format("%1$9s: %2$12s : %3$12s\n", "1234567", "川土空田天生花草", "川土空田天生花草")); // 12, not 20!

This is the output from running that code in NetBeans on Windows 10, showing the columns properly aligned:

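Notes:

  • In this example the format strings are hard-coded to ensure column alignment, but it would be straightforward to build the format string dynamically based on the number of Japanese characters to be rendered.
  • See also "Monospace font that supports both English and Japanese".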
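One way to avoid hard-coding the widths is to count the wide characters in each value and subtract that count from the column width before building the format specifier. The following is only a sketch under some assumptions: the class and method names (WidthAwarePad, isWide, padLeft) are invented here, the display font is assumed to render wide glyphs at exactly twice the Latin width (as Kosugi Maru appears to), and "wide" is approximated as any character in the Han, Hiragana, Katakana or Hangul scripts. That approximation misses fullwidth punctuation and similar characters, and the arithmetic assumes characters in the Basic Multilingual Plane; a library that exposes the Unicode East_Asian_Width property would be more accurate.

```java
import java.lang.Character.UnicodeScript;

public class WidthAwarePad {

    // Assumption: characters in these scripts render two columns wide in the chosen font.
    static boolean isWide(int codePoint) {
        UnicodeScript script = UnicodeScript.of(codePoint);
        return script == UnicodeScript.HAN
                || script == UnicodeScript.HIRAGANA
                || script == UnicodeScript.KATAKANA
                || script == UnicodeScript.HANGUL;
    }

    // Right-aligns s in a field of the given number of display columns by
    // shrinking the format width by one for every wide character.
    static String padLeft(String s, int columns) {
        int wide = (int) s.codePoints().filter(WidthAwarePad::isWide).count();
        int width = Math.max(1, columns - wide); // a width of 0 is not a valid specifier
        return String.format("%" + width + "s", s);
    }

    static void printRow(String a, String b, String c) {
        System.out.println(padLeft(a, 9) + ": " + padLeft(b, 20) + " : " + padLeft(c, 20));
    }

    public static void main(String[] args) {
        printRow("field", "expected", "actual");
        printRow("surface", "駆け", "駆け");                        // effective width 18
        printRow("1234567", "川土空田天生花草", "川土空田天生花草"); // effective width 12
    }
}
```

For the sample values this yields the same effective widths as the hard-coded version (18 and 12 instead of 20). Whether the columns actually line up still depends on the font used to display the output.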
