How to detect end of string in byte array to string conversion?


Problem description

I receive a string from a socket in a byte array that looks like this:

[128,5,6,3,45,0,0,0,0,0]

The size given by the network protocol is the total length of the string (including the zeros), so 10 in my example.

If I simply do this:

String myString = new String(myBuffer); 

I end up with 5 incorrect characters at the end of the string. The conversion doesn't seem to detect the end-of-string character (0).

To get the correct size and the correct string, I do this:

int sizeLabelTmp = 0;
// Iterate over the 10 bytes to find the real length of the string
for(int j = 0; j<(sizeLabel); j++) {
    byte charac = datasRec[j];
    if(charac == 0)
        break;
    sizeLabelTmp ++;
}
// Create a temp byte array to make a correct conversion
byte[] label    = new byte[sizeLabelTmp];
for(int j = 0; j<(sizeLabelTmp); j++) {
    label[j] = datasRec[j];
}
String myString = new String(label);

Is there a better way to handle this problem?

Thanks

Recommended answer

0 isn't an "end of string character". It's just a byte. Whether or not it only comes at the end of the string depends on what encoding you're using (and what the text can be). For example, if you used UTF-16, every other byte would be 0 for ASCII characters.
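To make the UTF-16 point concrete, here is a small sketch that encodes a two-letter ASCII string as UTF-16 and prints the raw bytes. The class and method names are made up for illustration, and the big-endian variant without a byte-order mark is an arbitrary choice to keep the output short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf16ZeroBytes {
    // Encodes the text as UTF-16 big-endian (no byte-order mark) and returns the raw bytes.
    static byte[] encodeUtf16(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // 'H' is 0x0048 and 'i' is 0x0069, so each character carries a zero high byte.
        System.out.println(Arrays.toString(encodeUtf16("Hi"))); // prints [0, 72, 0, 105]
    }
}
```

With data like this, stopping at the first 0 byte would yield an empty string, which is why the encoding has to be known before 0 can be treated as a terminator.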

If you're sure that the first 0 indicates the end of the string, you can use something like the code you've given, but I'd rewrite it as:

int size = 0;
while (size < data.length)
{
    if (data[size] == 0)
    {
        break;
    }
    size++;
}

// Specify the appropriate encoding as the last argument
String myString = new String(data, 0, size, "UTF-8");

I strongly recommend that you don't just use the platform default encoding - it's not portable, and may well not allow for all Unicode characters. However, you can't just decide arbitrarily - you need to make sure that everything producing and consuming this data agrees on the encoding.
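The scan for the first 0 and the explicit encoding can be combined into one self-contained helper, roughly as follows. This is only a sketch: the class name, method name, and sample bytes are hypothetical, and UTF-8 is assumed purely for the demonstration. It uses the `String` constructor that takes a `Charset`, which, unlike the overload taking a charset name, does not throw the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ZeroTerminated {
    // Decodes the bytes before the first 0 byte, or the whole array if no 0 is present.
    // Only valid when the agreed encoding never produces a 0 byte inside the text.
    static String decode(byte[] data, Charset charset) {
        int size = 0;
        while (size < data.length && data[size] != 0) {
            size++;
        }
        return new String(data, 0, size, charset);
    }

    public static void main(String[] args) {
        // Sample buffer: "Hello" followed by zero padding up to the protocol's fixed size of 10.
        byte[] data = {72, 101, 108, 108, 111, 0, 0, 0, 0, 0};
        System.out.println(decode(data, StandardCharsets.UTF_8)); // prints Hello
    }
}
```

Passing the charset as a parameter keeps the decision about the encoding visible at the call site instead of hiding it in the helper.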

If you're in control of the protocol, it would be much better if you could introduce a length prefix before the string, to indicate how many bytes are in the encoded form. That way you'd be able to read exactly the right amount of data (without "over-reading") and you'd be able to tell if the data was truncated for some reason.
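One possible shape for such a length-prefixed format is sketched below. The details are assumptions made for illustration, not part of the original protocol: a 4-byte big-endian length (what `DataOutputStream.writeInt` produces), UTF-8 for the payload, and hypothetical class and method names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixed {
    // Writes a 4-byte big-endian byte count followed by the UTF-8 encoded text.
    static void write(DataOutputStream out, String text) throws IOException {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(encoded.length);
        out.write(encoded);
    }

    // Reads the byte count, then exactly that many bytes.
    // readFully throws EOFException if the stream ends early, which exposes truncated data.
    static String read(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative length prefix: " + length);
        }
        byte[] encoded = new byte[length];
        in.readFully(encoded);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer stands in for the socket streams.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(new DataOutputStream(buffer), "Hello");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(read(in)); // prints Hello
    }
}
```

In real code the same `read` method would wrap the socket's input stream, and a sensible upper bound on `length` would be worth adding before allocating the array.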
