How do you get Matlab to write the BOM (byte order markers) for UTF-16 text files?

Problem description

I am creating UTF16 text files with Matlab, which I am later reading in using Java. In Matlab, I open a file called fileName and write to it as follows:

fid = fopen(fileName, 'w','n','UTF16-LE');
fprintf(fid,"Some stuff.");

In Java, I can read the text file using the following code:

FileInputStream fileInputStream = new FileInputStream(fileName);
Scanner scanner = new Scanner(fileInputStream, "UTF-16LE"); 
String s = scanner.nextLine();

Here is the hex output:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13
00000000  73 00 6F 00 6D 00 65 00 20 00 73 00 74 00 75 00 66 00 66 00  s.o.m.e. .s.t.u.f.f.

The above approach works fine. However, I want to write the file as UTF-16 with a BOM so that I don't have to worry about big- or little-endian byte order. In Matlab, I've coded:

fid = fopen(fileName, 'w','n','UTF16');
fprintf(fid,"Some stuff.");

In Java, I change the code to:

FileInputStream fileInputStream = new FileInputStream(fileName);
Scanner scanner = new Scanner(fileInputStream, "UTF-16");
String s = scanner.nextLine();

In this case, the string s is garbled because Matlab does not write the BOM. If I add the BOM manually, the Java code reads the following file correctly.

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15
00000000  FF FE 73 00 6F 00 6D 00 65 00 20 00 73 00 74 00 75 00 66 00 66 00  ÿþs.o.m.e. .s.t.u.f.f.

How can I get Matlab to write out the BOM? I know I could write the BOM out separately, but I'd rather have Matlab do it automatically.
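The difference between the two files can be reproduced on the Java side alone. When decoding, Java's "UTF-16" charset uses a leading BOM to choose the byte order and falls back to big-endian when there is none. The sketch below feeds the bytes from the two hex dumps above into that decoder; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDecoding {
    // Bytes from the first hex dump: little-endian UTF-16, no BOM.
    static final byte[] WITHOUT_BOM = {
        0x73, 0x00, 0x6F, 0x00, 0x6D, 0x00, 0x65, 0x00, 0x20, 0x00,
        0x73, 0x00, 0x74, 0x00, 0x75, 0x00, 0x66, 0x00, 0x66, 0x00
    };

    // Bytes from the second hex dump: the same data preceded by FF FE.
    static final byte[] WITH_BOM = {
        (byte) 0xFF, (byte) 0xFE,
        0x73, 0x00, 0x6F, 0x00, 0x6D, 0x00, 0x65, 0x00, 0x20, 0x00,
        0x73, 0x00, 0x74, 0x00, 0x75, 0x00, 0x66, 0x00, 0x66, 0x00
    };

    // Decode with the generic "UTF-16" charset, as the Scanner in the question does.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // The BOM tells the decoder the data is little-endian; the BOM itself is dropped.
        System.out.println(decodeUtf16(WITH_BOM)); // prints "some stuff"

        // Without a BOM the decoder assumes big-endian, so every byte pair is swapped.
        System.out.println(decodeUtf16(WITHOUT_BOM).equals("some stuff")); // prints "false"
    }
}
```

This is why the BOM-less file only reads correctly when the reader names "UTF-16LE" explicitly.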

Addendum

I selected the answer below from Amro because it exactly solves the question I posed.

One key discovery for me was the difference between the Unicode Standard and a UTF (Unicode transformation format) (see http://unicode.org/faq/utf_bom.html). The Unicode Standard provides unique identifiers (code points) for characters. UTFs provide mappings of every code point "to a unique byte sequence." Since all but a handful of the characters I am using are in the first 128 code points, I'm going to switch to UTF-8, as Romeo suggests. UTF-8 is supported by both Matlab (so the warning shown below won't need to be suppressed) and Java, and for my application it will generate smaller text files.
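To make the size argument concrete, here is a small Java sketch that encodes the same string with three UTFs and reports the byte counts. The class and helper names are illustrative only. Note that Java's own "UTF-16" encoder emits a big-endian BOM (FE FF), which differs from the FF FE shown in the hex dump above but carries the same information.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes the text occupies in the given encoding.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "Some stuff."; // 11 characters, all in the first 128 code points

        // One byte per character for code points below 128.
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));    // prints 11

        // Two bytes per character, no BOM.
        System.out.println(encodedLength(text, StandardCharsets.UTF_16LE)); // prints 22

        // Two bytes per character plus a two-byte BOM.
        System.out.println(encodedLength(text, StandardCharsets.UTF_16));   // prints 24
    }
}
```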

I suppress the Matlab warning

Warning: The encoding 'UTF-16LE' is not supported.

with

warning off MATLAB:iofun:UnsupportedEncoding;

Solution

Try the following code (I am using UNICODE2NATIVE and NATIVE2UNICODE functions to do the conversions):

% convert the string to UTF-16 bytes (unicode2native prepends the BOM)
str = 'Some stuff.';
b = unicode2native(str,'UTF-16');
% write the raw bytes in binary mode so nothing is re-encoded
fid = fopen('utf16.txt','wb');
fwrite(fid, b, 'uint8');
fclose(fid);

We can even check the hex values of the bytes written (first two being the BOM):

>> cellstr(dec2hex(b))'
ans = 
  Columns 1 through 10
    'FF'    'FE'    '53'    '00'    '6F'    '00'    '6D'    '00'    '65'    '00'
  Columns 11 through 20
    '20'    '00'    '73'    '00'    '74'    '00'    '75'    '00'    '66'    '00'
  Columns 21 through 24
    '66'    '00'    '2E'    '00'

>> char(b)
ans =
ÿþS o m e   s t u f f . 

Now we can read the created file using MATLAB's own methods:

% read bytes and convert back to Unicode string
fid = fopen('utf16.txt', 'rb');
b = fread(fid, '*uint8')';
fclose(fid);
str = native2unicode(b,'UTF-16')

Or use Java methods directly if you prefer:

scanner = java.util.Scanner(java.io.FileInputStream('utf16.txt'), 'UTF-16');
str = scanner.nextLine()
scanner.close()

Both should read the string correctly.
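If the Java reader has to accept files written both with and without a BOM, one option is to look at the first two bytes before choosing a charset. The following is only a sketch under that assumption: `chooseCharset` and `readText` are hypothetical helpers, not part of any library, and the fallback charset is whatever byte order the BOM-less files are known to use.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf16FileReader {
    // Hypothetical helper: use "UTF-16" when a BOM is present, else the fallback.
    static Charset chooseCharset(byte[] data, Charset fallback) {
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            boolean littleEndianBom = (b0 == 0xFF && b1 == 0xFE);
            boolean bigEndianBom = (b0 == 0xFE && b1 == 0xFF);
            if (littleEndianBom || bigEndianBom) {
                return StandardCharsets.UTF_16; // this decoder consumes the BOM
            }
        }
        return fallback;
    }

    // Hypothetical helper: read a whole file as text using the chosen charset.
    static String readText(Path file, Charset fallback) throws IOException {
        byte[] data = Files.readAllBytes(file);
        return new String(data, chooseCharset(data, fallback));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("utf16", ".txt");
        byte[] body = "Some stuff.".getBytes(StandardCharsets.UTF_16LE);

        // Case 1: little-endian bytes without a BOM, as in the first Matlab attempt.
        Files.write(file, body);
        System.out.println(readText(file, StandardCharsets.UTF_16LE)); // prints "Some stuff."

        // Case 2: the same bytes preceded by the FF FE BOM.
        byte[] withBom = new byte[body.length + 2];
        withBom[0] = (byte) 0xFF;
        withBom[1] = (byte) 0xFE;
        System.arraycopy(body, 0, withBom, 2, body.length);
        Files.write(file, withBom);
        System.out.println(readText(file, StandardCharsets.UTF_16LE)); // prints "Some stuff."

        Files.delete(file);
    }
}
```

Reading the whole file into memory keeps the sketch short; for large files a streaming variant would be more appropriate.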
