MATLAB: how to display UTF-8-encoded text read from file?


Problem description

The gist of my question is:


How can I display Unicode characters in Matlab's GUI (OS X) so that they are properly rendered?

Details:


I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:

>> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
>> [x, x, x, enc] = fopen(fid); enc

enc =

UTF-8

>> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
>> tbl{1}{1}

ans =

ÎÎÎÎÎΠΣΦΩαβγδεζηθικλμνξÏÏÏÏÏÏÏÏÏÏ
>> 
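A likely reading of the garbled output above (an interpretation, not something the question states) is that the UTF-8 bytes are being displayed as if each byte were a separate ISO-8859-1 character. Every Greek letter takes two bytes in UTF-8. The first byte (0xCE or 0xCF) shows up as 'Î' or 'Ï', and the second byte often falls into the invisible C1 control range. Here is a minimal sketch of that mapping. It builds the letter from its code point, so the script itself stays ASCII:

```matlab
% U+0393 GREEK CAPITAL LETTER GAMMA, built from its code point (915)
capGamma = char(915);

% UTF-8 encodes U+0393 as the two bytes 0xCE 0x93
bytes = double(unicode2native(capGamma, 'UTF-8'))   % 206 147

% Decoding those bytes as ISO-8859-1 maps each byte to one character:
% 206 is 'Î' (U+00CE) and 147 is an invisible C1 control character
misread = native2unicode(uint8(bytes), 'ISO-8859-1');
double(misread)                                      % 206 147
```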


As it happens, if I paste the string directly into the MATLAB GUI, the pasted string is displayed properly. This shows that the GUI is not fundamentally incapable of displaying these characters. However, once MATLAB reads the string in from the file, it no longer displays it correctly. For example:

>> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'

pasted =

ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>> 

Thanks!

Recommended answer


I present below my findings after doing some digging... Consider these test files:

a.txt

ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

b.txt

தமிழ்

First, we read the file:

% open the file in binary mode and read it as a list of bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')';             % read bytes as a uint8 row vector
fclose(fid);

% decode the bytes as a UTF-8 (Unicode) string
str = native2unicode(b,'UTF-8');
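You can confirm that this byte-level read-and-decode step recovers the right characters without needing the original a.txt. One way is a round trip through a temporary file. This is a hypothetical self-check that assumes `tempname` points to a writable location. The five Greek capitals are built from their code points:

```matlab
% Known test string: 'ΓΔΘΛΞ', built from code points
expected = char([915 916 920 923 926]);
tmp = [tempname '.txt'];

% Write its UTF-8 bytes to the temporary file
fid = fopen(tmp, 'w');
fwrite(fid, unicode2native(expected, 'UTF-8'), 'uint8');
fclose(fid);

% Read the raw bytes back, as in the answer above
fid = fopen(tmp, 'r');            % MATLAB opens files in binary mode by default
b = fread(fid, '*uint8')';        % raw bytes as a uint8 row vector
fclose(fid);
delete(tmp);

% Decode the bytes as UTF-8 and compare with the original
str = native2unicode(b, 'UTF-8');
isequal(str, expected)            % true if every character was recovered
```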


If you try to print the string, you get a bunch of nonsense:

>> str
str =


Nonetheless, str does hold the correct string. We can check the Unicode code point of each character. As you can see, they all lie outside the ASCII range (the last two are the non-printable CR-LF line ending):

>> double(str)
ans =
  Columns 1 through 13
   915   916   920   923   926   928   931   934   937   945   946   947   948
  Columns 14 through 26
   949   950   951   952   953   954   955   956   957   958   960   961   962
  Columns 27 through 35
   963   964   965   966   967   968   969    13    10
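Before you display the text, you can drop the trailing line ending and list the non-ASCII characters. The sketch below uses a stand-in string: three letters plus CR-LF chosen for illustration, not read from a.txt:

```matlab
% Stand-in for the string read above: three Greek capitals plus CR-LF
str = [char([915 916 920]) char([13 10])];

% deblank removes trailing whitespace, which includes CR (13) and LF (10)
str = deblank(str);

% Code points above 127 are outside ASCII; list them for inspection
nonAscii = double(str(double(str) > 127))   % 915 916 920
```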


Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:

figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)


One trick I found is to use the embedded Java capability:

% Java Swing: show the string in a JLabel inside a JFrame
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);
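If 'Arial Unicode MS' is not installed, Java may quietly fall back to another font. It can therefore help to ask the font itself whether it covers the string. Here is a minimal sketch using the Java AWT methods `Font.canDisplayUpTo` and `Font.getFamily`. The sample string is illustrative:

```matlab
% Same font as in the JLabel example above
font = java.awt.Font('Arial Unicode MS', java.awt.Font.PLAIN, 30);

% Family actually in use; this may be a fallback such as 'Dialog'
% if the requested font is not installed
family = char(font.getFamily())

% -1 means the font has glyphs for every character; otherwise this is
% the zero-based index of the first character it cannot render
sample = char([915 916 920 945 946 947]);   % 'ΓΔΘαβγ'
idx = font.canDisplayUpTo(sample)
```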


As I was preparing to write the above, I found an alternative solution: use the undocumented DefaultCharacterSet feature to set the character set to UTF-8 (on my machine, it is ISO-8859-1 by default):

feature('DefaultCharacterSet','UTF-8');
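DefaultCharacterSet is undocumented and affects the whole session, so it may be worth remembering the previous value. The sketch below assumes that calling `feature` with only the name returns the current setting. This comes from undocumented-feature write-ups, not official documentation. The final line restores the old value once you are done displaying Unicode text:

```matlab
% Assumption: feature('DefaultCharacterSet') with no value returns the
% current character set as a string, e.g. 'ISO-8859-1'
oldCharset = feature('DefaultCharacterSet');

% Switch to UTF-8 for displaying Unicode text
feature('DefaultCharacterSet', 'UTF-8');

% ... display Unicode text here ...

% Restore the original character set when finished
feature('DefaultCharacterSet', oldCharset);
```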


Now, with a suitable font (you can change the font used in the Command Window under Preferences > Font), we can print the string at the prompt. Note that DISP is still incapable of printing Unicode:

>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

>> disp(str)
Î"Î"ΘΛΞΠΣΦΩαβγδεζηθικλμνξπÏςστυφχψω


And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):

uicontrol('Style','text', 'String',str, ...
    'Units','normalized', 'Position',[0 0 1 1], ...
    'FontName','Arial Unicode MS', 'FontSize',30)


Unfortunately, TEXT, TITLE, XLABEL, etc. still show garbage.
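For Greek specifically, there is a workaround that uses a different technique from the ones above. It does not help for scripts such as Tamil. The idea is to write the letters as TeX commands, which the default 'tex' interpreter of TEXT, TITLE and XLABEL renders with its own symbol glyphs. A sketch:

```matlab
figure
% TeX commands instead of Unicode characters; 'tex' is the default
% interpreter, so no extra options are needed
h = title('\Gamma\Delta\Theta\Lambda\Xi\Pi\Sigma\Phi\Omega');
xlabel('\alpha\beta\gamma\delta\epsilon\zeta\eta\theta\iota\kappa')
text(0.1, 0.5, '\lambda\mu\nu\xi\pi\rho\sigma\tau\upsilon\phi\chi\psi\omega')
```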


As a side note: It is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++, with files encoded as UTF-8 without BOM.
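One way to sidestep the editor problem is to keep .m files pure ASCII and build non-ASCII strings from their Unicode code points. Here is a sketch for the two test strings. The Greek code points come from the double(str) output above, and the Tamil ones are those of தமிழ்:

```matlab
% Code points from the double(str) output above (omicron, 959, is absent)
greek = char([915 916 920 923 926 928 931 934 937, 945:958, 960:969]);

% தமிழ்: TA, MA, vowel sign I, LLLA, virama
tamil = char([2980 2990 3007 2996 3021]);
```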

