MATLAB: how to display UTF-8-encoded text read from file?
Question
The gist of my question is this:
How can I display Unicode characters in MATLAB's GUI (OS X) so that they are properly rendered?
Details:
I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:
>> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
>> [x, x, x, enc] = fopen(fid); enc
enc =
UTF-8
>> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
>> tbl{1}{1}
ans =
ÎÎÎÎÎΠΣΦΩαβγδεζηθικλμνξÏÏÏÏÏÏÏÏÏÏ
>>
As it happens, if I paste the string directly into the MATLAB GUI, the pasted string is displayed properly. This shows that the GUI is not fundamentally incapable of displaying these characters, but once MATLAB reads the string in from the file, it no longer displays it correctly. For example:
>> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'
pasted =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>>
Thanks!
Recommended answer
I present below my findings after doing some digging... Consider these test files:
a.txt
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
b.txt
தமிழ்
First, we read the file:
%# open file in binary mode, and read a list of bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')'; %'# read bytes
fclose(fid);
%# decode as unicode string
str = native2unicode(b,'UTF-8');
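The read-and-decode step above can be tried without the original test files. The following is a minimal sketch, not part of the original answer: it writes three Greek letters to a temporary file as UTF-8 bytes and reads them back the same way. The temporary file name and the variable names are assumptions chosen for illustration.

```matlab
% Sketch: write a small UTF-8 file, then read it back as raw bytes and decode.
% The file name comes from tempname; nothing here depends on a.txt existing.
fname = [tempname '.txt'];

% Build the test string from code points (Gamma, Delta, Theta) so that this
% script does not depend on the encoding of the m-file itself.
original = char([915 916 920]);

% Encode to UTF-8 bytes and write them out unchanged
fid = fopen(fname, 'w');
assert(fid ~= -1, 'Could not open %s for writing', fname);
fwrite(fid, unicode2native(original, 'UTF-8'), 'uint8');
fclose(fid);

% Read the raw bytes back and decode them as UTF-8
fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open %s for reading', fname);
b = fread(fid, '*uint8')';            % row vector of bytes
fclose(fid);
str = native2unicode(b, 'UTF-8');     % decoded char vector

delete(fname);                        % remove the temporary file
```

Each of these letters takes two bytes in UTF-8, so b has six elements while str has three characters.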
If you try to print the string, you get a bunch of nonsense:
>> str
str =
Nonetheless, str does hold the correct string. We can check the Unicode code point of each character; as you can see, they are all outside the ASCII range (the last two are the non-printable CR-LF line ending):
>> double(str)
ans =
Columns 1 through 13
915 916 920 923 926 928 931 934 937 945 946 947 948
Columns 14 through 26
949 950 951 952 953 954 955 956 957 958 960 961 962
Columns 27 through 35
963 964 965 966 967 968 969 13 10
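To see why the decoded code points matter, it can help to compare them with what you get when each byte is treated as a character by itself, which is roughly what produces output like the garbage shown in the question. This is a small sketch with hand-written bytes (the UTF-8 encoding of Γ and Δ); the variable names are illustrative only.

```matlab
% UTF-8 bytes for Gamma (U+0393) and Delta (U+0394): CE 93 CE 94
bytes = uint8([206 147 206 148]);

% Proper decoding: two bytes collapse into one character each
str   = native2unicode(bytes, 'UTF-8');
codes = double(str);                 % code points 915 and 916

% Naive conversion: every byte becomes its own character (mojibake)
raw      = char(bytes);
rawCodes = double(raw);              % 206 147 206 148

% Encoding the decoded string again gives back the original bytes
roundtrip = unicode2native(str, 'UTF-8');
```

If double(str) shows values such as 206 or 207 instead of values in the 900s, the bytes were never decoded as UTF-8 and the problem is in the reading step, not in the display.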
Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all of these fail:
figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)
One trick I found is to use the embedded Java capability:
%# Java Swing
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);
As I was preparing to write the above, I found an alternative solution. We can use the undocumented DefaultCharacterSet feature and set the charset to UTF-8 (on my machine, it is ISO-8859-1 by default):
feature('DefaultCharacterSet','UTF-8');
Now with a proper font (you can change the font used in the Command Window from Preferences > Font), we can print the string at the prompt (note that DISP is still incapable of printing Unicode):
>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>> disp(str)
Î"Î"ΘΛΞΠΣΦΩαβγδεζηθικλμνξπÏςστυφχψω
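Because the setting is global, it may be worth restoring the previous value once you are done. The sketch below assumes that calling feature('DefaultCharacterSet') with a single argument returns the current setting; the feature is undocumented, so this behavior is an assumption that may not hold in every release.

```matlab
% Remember the current character set (assumption: the one-argument call
% returns the current value; this feature is undocumented)
oldCharset = feature('DefaultCharacterSet');

% Switch to UTF-8 for the duration of the work
feature('DefaultCharacterSet', 'UTF-8');

% Decode some UTF-8 bytes (Gamma, Delta) while the setting is active
str = native2unicode(uint8([206 147 206 148]), 'UTF-8');

% Put the previous setting back
feature('DefaultCharacterSet', oldCharset);
restored = feature('DefaultCharacterSet');
```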
And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):
uicontrol('Style','text', 'String',str, ...
'Units','normalized', 'Position',[0 0 1 1], ...
'FontName','Arial Unicode MS', 'FontSize',30)
Unfortunately, TEXT, TITLE, XLABEL, etc. still show garbage.
As a side note: it is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++, with files encoded as UTF-8 without BOM.