MATLAB: how to display UTF-8-encoded text read from file?

Question

The gist of my question is:

How can I display Unicode characters in Matlab's GUI (OS X) so that they are properly rendered?

Details:

I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:

>> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
>> [x, x, x, enc] = fopen(fid); enc

enc =

UTF-8

>> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
>> tbl{1}{1}

ans =

ÎÎÎÎÎΠΣΦΩαβγδεζηθικλμνξÏÏÏÏÏÏÏÏÏÏ
>> 
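The garbled output above has the typical look of UTF-8 bytes being shown one byte per character: each Greek letter is two bytes in UTF-8, and the lead bytes 0xCE and 0xCF render as `Î` and `Ï` in Latin-1-style charsets. A minimal sketch that reproduces this effect, assuming ISO-8859-1 as the "wrong" charset (the actual charset on the asker's machine is not stated). The string is built from code points so the script itself stays ASCII-only:

```matlab
% Gamma, Delta, Theta, pi, omega built from their Unicode code points
greek = char([915 916 920 960 969]);

% Encode as UTF-8: every Greek letter becomes 2 bytes
bytes = unicode2native(greek, 'UTF-8');

% Decode those bytes with a single-byte charset (assumed ISO-8859-1):
% each byte turns into its own character, producing mojibake
garbled = native2unicode(bytes, 'ISO-8859-1');

fprintf('%d characters -> %d bytes\n', numel(greek), numel(bytes));
disp(double(garbled));   % lead bytes 206 (Î) and 207 (Ï) alternate with continuation bytes
```

Decoding the same bytes with `'UTF-8'` instead gives back the original five characters, which is why the choice of charset, not the data, is what matters here.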

As it happens, if I paste the string directly into the MATLAB GUI, it is displayed properly, which shows that the GUI is not fundamentally incapable of displaying these characters; but once MATLAB reads the string from the file, it no longer displays it correctly. For example:

>> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'

pasted =

ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

>> 

Thanks!

Recommended answer

After some digging, here is what I found. Consider these two test files:

a.txt

ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

b.txt

தமிழ்
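To reproduce the experiment without typing non-ASCII text into an m-file, a sketch like the following can create equivalent test files from code points. It writes into the current folder; the CR-LF ending on a.txt matches the trailing `13 10` in the `double(str)` output shown later, while giving b.txt the same ending is an assumption:

```matlab
% Test strings built from Unicode code points (keeps this script ASCII-only)
greekText = char([915 916 920 923 926 928 931 934 937 945:958 960:969]);
tamilText = char(hex2dec({'0BA4'; '0BAE'; '0BBF'; '0BB4'; '0BCD'})');
crlf = char([13 10]);

files = {'a.txt', 'b.txt'};
texts = {[greekText crlf], [tamilText crlf]};

for k = 1:numel(files)
    fid = fopen(files{k}, 'w');                              % binary mode: bytes are written as-is
    fwrite(fid, unicode2native(texts{k}, 'UTF-8'), 'uint8'); % store the UTF-8 encoding
    fclose(fid);
end
```

Each Greek letter takes 2 bytes and each Tamil code point 3 bytes in UTF-8, so a.txt ends up as 68 bytes and b.txt as 17 bytes.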

First, we read the file:

%# open file in binary mode, and read a list of bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')';   %# read raw bytes as a uint8 row vector
fclose(fid);

%# decode as unicode string
str = native2unicode(b,'UTF-8');
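Reading raw bytes and decoding them with `native2unicode` is one way to get the text in; the question's `fopen(..., 'UTF-8')` call should produce the same characters, which would mean the data is decoded correctly and only the display is at fault. A self-contained sketch that compares the two approaches on a temporary file (the file name from `tempname` and the sample text are arbitrary choices for illustration):

```matlab
% Sample text built from code points: Gamma Delta Theta alpha omega
expected = char([915 916 920 945 969]);

% Write it to a temporary file as UTF-8 bytes followed by CR-LF
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, [unicode2native(expected, 'UTF-8') uint8([13 10])], 'uint8');
fclose(fid);

% Approach 1 (this answer): read raw bytes, then decode explicitly
fid = fopen(fname, 'r');
b = fread(fid, '*uint8')';
fclose(fid);
viaBytes = native2unicode(b, 'UTF-8');

% Approach 2 (the question): let fopen decode through its encoding argument
fid = fopen(fname, 'r', 'n', 'UTF-8');
viaFopen = fgetl(fid);
fclose(fid);
delete(fname);

% Drop line-ending characters before comparing
viaBytes(viaBytes == 13 | viaBytes == 10) = [];
viaFopen(viaFopen == 13 | viaFopen == 10) = [];
fprintf('same characters: %d\n', isequal(viaBytes, viaFopen));
```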

If you try to print the string, you get a bunch of nonsense:

>> str
str =

Nonetheless, str does hold the correct string. We can check the Unicode code point of each character; as you can see, they all lie outside the ASCII range (the last two are the non-printable CR-LF line ending):

>> double(str)
ans =
  Columns 1 through 13
   915   916   920   923   926   928   931   934   937   945   946   947   948
  Columns 14 through 26
   949   950   951   952   953   954   955   956   957   958   960   961   962
  Columns 27 through 35
   963   964   965   966   967   968   969    13    10
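The decimal codes above are easier to look up in Unicode charts when printed in the usual `U+XXXX` notation. A small sketch of that check, using a short hypothetical string with the same CR-LF ending instead of the full file contents:

```matlab
% Example: a decoded string with a trailing CR-LF, as read from a.txt
str = [char([915 916 945 969]) char([13 10])];

body = str;
body(body == 13 | body == 10) = [];          % drop the line ending

codes = double(body);                        % char -> numeric code points
labels = strtrim(sprintf('U+%04X ', codes)); % format string is reused for each code
allNonAscii = all(codes > 127);              % true when no character is plain ASCII

disp(labels);   % prints U+0393 U+0394 U+03B1 U+03C9
```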

Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:

figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)

One trick I found is to use the embedded Java capability:

%# Java Swing
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);

As I was preparing to write the above, I found an alternative solution. We can use the undocumented DefaultCharacterSet feature to set the character set to UTF-8 (on my machine, the default is ISO-8859-1):

feature('DefaultCharacterSet','UTF-8');
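Because DefaultCharacterSet is undocumented and changes global state, it may be worth remembering the previous value and restoring it afterwards. A minimal sketch, assuming that calling `feature('DefaultCharacterSet')` with a single argument returns the current setting as a character vector (widely reported, but not officially documented and possibly release-dependent):

```matlab
% Remember the current setting so it can be restored later
oldCharset = feature('DefaultCharacterSet');

% Switch to UTF-8 (undocumented; behavior may differ between releases)
feature('DefaultCharacterSet', 'UTF-8');
currentCharset = feature('DefaultCharacterSet');
fprintf('charset is now %s (was %s)\n', currentCharset, oldCharset);

% ... work with Unicode strings in the Command Window here ...

% Restore the original setting
feature('DefaultCharacterSet', oldCharset);
```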

Now, with a suitable font (you can change the Command Window font under Preferences > Font), we can print the string at the prompt (note that DISP still cannot print Unicode):

>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

>> disp(str)
Î"Î"ΘΛΞΠΣΦΩαβγδεζηθικλμνξπÏςστυφχψω

And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):

uicontrol('Style','text', 'String',str, ...
    'Units','normalized', 'Position',[0 0 1 1], ...
    'FontName','Arial Unicode MS', 'FontSize',30)

Unfortunately, TEXT, TITLE, XLABEL, etc. still show garbage:

As a side note: It is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++, with files encoded as UTF-8 without BOM.
