Reading from UTF-16 encoded text file, þÿ is prepended on the front


Question

I'm outputting a byte array to a text file using the following method:

// try-with-resources closes the stream even if write() throws
try (FileOutputStream fos = new FileOutputStream(filePath + ".8102")) {
    fos.write(concatenatedIVCipherMAC);
} catch (IOException e) {
    e.printStackTrace();
}

This writes UTF-16 encoded data to the file, for example:

¢¬6î)ªÈP~m˜LïiƟê•Àe»/#Ó ö¹¥‘þ²XhÃ&¼lG:Öé )GU3«´DÃ{+í—Ã]íò

However, when I read it back in, I get þÿ prepended to the front of the data, e.g.:

þÿ¢¬6î)ªÈP~m˜LïiƟê•Àe»/?#Ó ö¹¥‘þ²XhÃ&¼lG:Öé )GU3«´DÃ{+í—Ã]íò

This is the method I'm using to read in the file:

private String getFilesContents()
{
    String fileContents = "";
    Scanner sc = null;

    try {
        sc = new Scanner(file, "UTF-16");
        System.out.println("Can read file: "+file.canRead());
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    while(sc.hasNextLine()){
        fileContents += sc.nextLine();                     
    }
    sc.close();
    return fileContents;
}

I then call byte[] contentsOfFile = fileContents.getBytes("UTF-16"); to convert the String into a byte array.

A quick Google told me that þÿ represents the byte order but is it Java putting that there or Windows? How can I avoid having the þÿ prepended at the start of the data I'm reading in? I was thinking of just ignoring the first two bytes but if it is Windows then this will obviously break the program on other platforms.

Edit: changed "append" to "prepend".

Recommended answer

The file is the IV+data+MAC. It's not meant to be readable text? Should I be doing something differently?

Yes. You shouldn't be trying to treat it as text anywhere.

If you really need to convert arbitrary binary data into text, use Base64 to convert it. Other than that, stick to byte arrays, InputStream and OutputStream.
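As a minimal sketch of that advice: the class below writes and reads the raw bytes without any charset, and uses Base64 only where a text form is needed. The class name BinaryFileRoundTrip, its helper methods, and the sample bytes are hypothetical, chosen for illustration; only the variable names concatenatedIVCipherMAC and contentsOfFile come from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

public class BinaryFileRoundTrip {

    // Write the bytes exactly as given; no charset is involved.
    static void writeBytes(Path path, byte[] data) throws IOException {
        Files.write(path, data);
    }

    // Read the whole file back as bytes, again without decoding.
    static byte[] readBytes(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    // Only when a text representation is required: encode as Base64.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Reverse of toText: decode Base64 text back into the original bytes.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real IV + ciphertext + MAC (arbitrary sample bytes).
        byte[] concatenatedIVCipherMAC =
                {(byte) 0xA2, (byte) 0xAC, 0x36, (byte) 0xEE, 0x0A, 0x00, (byte) 0xFF};

        Path path = Files.createTempFile("example", ".8102");
        try {
            writeBytes(path, concatenatedIVCipherMAC);
            byte[] contentsOfFile = readBytes(path);

            // Both comparisons hold because the bytes never pass through a charset.
            System.out.println(Arrays.equals(concatenatedIVCipherMAC, contentsOfFile));
            System.out.println(Arrays.equals(concatenatedIVCipherMAC,
                    fromText(toText(contentsOfFile))));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Because nothing is decoded, bytes such as 0x0A (a line feed) or 0x00 survive the round trip, which a Scanner-based read that calls nextLine() would not guarantee.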

I don't know exactly why you're supposedly getting extra characters, but the fact that you haven't got real text to start with suggests that it's not really worth diagnosing that side. Just start handling binary data as binary data instead.
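For readers who are still curious about the þÿ, one likely source (an observation that is not part of the answer itself) is the getBytes("UTF-16") call in the question: when encoding, Java's UTF-16 charset emits a big-endian byte order mark, 0xFE 0xFF, and those two bytes display as þÿ when viewed as ISO-8859-1. The sketch below illustrates this; the class name Utf16BomDemo and the toHex helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {

    // Format bytes as upper-case hex pairs separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "UTF-16" encoding prepends a big-endian byte order mark.
        System.out.println(toHex("A".getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41

        // "UTF-16BE" encodes the same character without a byte order mark.
        System.out.println(toHex("A".getBytes(StandardCharsets.UTF_16BE))); // 00 41

        // The two BOM bytes, decoded as ISO-8859-1, are the characters þ and ÿ.
        String bomAsLatin1 = new String(new byte[] {(byte) 0xFE, (byte) 0xFF},
                StandardCharsets.ISO_8859_1);
        System.out.println("\u00FE\u00FF".equals(bomAsLatin1));             // true
    }
}
```

If that is what is happening, the mark comes from Java rather than from Windows, so it would appear on every platform. Skipping two bytes would still only hide the symptom; the data is not text in the first place.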

Have a look at Guava ...
