How to detect end of string in byte array to string conversion?
Question
I receive a string from a socket as a byte array that looks like this:
[128,5,6,3,45,0,0,0,0,0]
The size given by the network protocol is the total length of the string (including the zeros), so in my example it is 10.
If I just do this:
String myString = new String(myBuffer);
I end up with 5 incorrect characters at the end of the string. The conversion doesn't seem to detect the end-of-string character (0).
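To make the symptom concrete, here is a minimal sketch of what the plain conversion does with a zero-padded buffer. It assumes a hypothetical 10-byte field holding "Hello" plus five 0 bytes (the question's own bytes are not used because 128 is not a valid Java `byte` literal or an ASCII character), and it names the charset explicitly instead of relying on the platform default.

```java
import java.nio.charset.StandardCharsets;

public class TrailingZeros {
    // Decodes the whole buffer, like new String(myBuffer) does,
    // except that the charset is explicit instead of the platform default.
    static String decodeWholeBuffer(byte[] buffer) {
        return new String(buffer, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical 10-byte field: "Hello" followed by five 0 padding bytes
        byte[] buffer = {'H', 'e', 'l', 'l', 'o', 0, 0, 0, 0, 0};
        String s = decodeWholeBuffer(buffer);
        // Each 0 byte is decoded as a U+0000 character, not treated as a terminator
        System.out.println(s.length());          // prints 10
        System.out.println(s.charAt(5) == '\0'); // prints true
    }
}
```

The five "incorrect characters" are therefore U+0000 characters that `String` keeps like any other character.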
To get the correct size and the correct string, I do this:
int sizeLabelTmp = 0;
// Iterate over the 10 bytes to get the real size of the string
for (int j = 0; j < sizeLabel; j++) {
    byte charac = datasRec[j];
    if (charac == 0)
        break;
    sizeLabelTmp++;
}
// Create a temp byte array to make a correct conversion
byte[] label = new byte[sizeLabelTmp];
for (int j = 0; j < sizeLabelTmp; j++) {
    label[j] = datasRec[j];
}
String myString = new String(label);
Is there a better way to handle this problem?
Thanks
Recommended answer
0 isn't an "end of string character". It's just a byte. Whether it only appears at the end of the string depends on which encoding you're using (and what the text can contain). For example, if you used UTF-16, every other byte would be 0 for ASCII characters.
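The UTF-16 point can be checked directly. This small sketch (the class and helper names are illustrative only) counts the 0 bytes that the same two-character string produces under UTF-16LE and under UTF-8; UTF-16LE is used so that no byte-order mark is added.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Zeros {
    // Counts how many 0 bytes appear in an encoded byte array
    static int countZeroBytes(byte[] encoded) {
        int zeros = 0;
        for (byte b : encoded) {
            if (b == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    public static void main(String[] args) {
        // UTF-16LE encodes each ASCII character as two bytes, the second
        // of which is 0: "Hi" becomes 0x48 0x00 0x69 0x00
        byte[] utf16 = "Hi".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(utf16.length);          // prints 4
        System.out.println(countZeroBytes(utf16)); // prints 2

        // UTF-8 encodes the same text as 0x48 0x69, with no 0 bytes at all
        byte[] utf8 = "Hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(countZeroBytes(utf8));  // prints 0
    }
}
```

With UTF-16 data, a scan for the first 0 would stop after a single byte and cut the string short.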
If you're sure that the first 0 indicates the end of the string, you can use something like the code you've given, but I'd rewrite it as:
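// Requires: import java.nio.charset.StandardCharsets;
int size = 0;
while (size < data.length)
{
    if (data[size] == 0)
    {
        break;
    }
    size++;
}
// Specify the appropriate encoding as the last argument; the Charset overload
// avoids the checked UnsupportedEncodingException of new String(..., "UTF-8")
String myString = new String(data, 0, size, StandardCharsets.UTF_8);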
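Packaged as a reusable, self-contained helper, the same idea might look like the sketch below. The method name `decodeUpToFirstZero` is hypothetical, and the approach is only safe for encodings such as ASCII, ISO-8859-1 or UTF-8, where a 0 byte can only come from the character U+0000.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NulTerminated {
    // Decodes bytes up to (not including) the first 0 byte, or the whole
    // array if it contains no 0. Assumes an encoding in which a 0 byte can
    // only represent U+0000 (e.g. ASCII, ISO-8859-1, UTF-8; not UTF-16).
    static String decodeUpToFirstZero(byte[] data, Charset charset) {
        int size = 0;
        while (size < data.length && data[size] != 0) {
            size++;
        }
        return new String(data, 0, size, charset);
    }

    public static void main(String[] args) {
        byte[] field = {'H', 'e', 'l', 'l', 'o', 0, 0, 0, 0, 0};
        String s = decodeUpToFirstZero(field, StandardCharsets.UTF_8);
        System.out.println(s + " (" + s.length() + " chars)"); // prints Hello (5 chars)
    }
}
```

Using the offset/length constructor of `String` avoids the temporary copy that the question's second loop builds.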
I strongly recommend that you don't just use the platform default encoding - it's not portable, and may well not allow for all Unicode characters. However, you can't just decide arbitrarily - you need to make sure that everything producing and consuming this data agrees on the encoding.
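To see why producer and consumer must agree, this sketch (class and method names are illustrative) encodes "é" as UTF-8 and decodes the resulting bytes with two different charsets. A UTF-8 reader gets "é" back, while an ISO-8859-1 reader turns the two bytes 0xC3 0xA9 into the two characters "Ã©".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Encodes text with the writer's charset, then decodes the bytes
    // with the reader's charset, as two ends of a connection would
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // e with acute accent; UTF-8 bytes 0xC3 0xA9

        // Both sides agree on UTF-8: the text survives unchanged
        System.out.println(roundTrip(eAcute, StandardCharsets.UTF_8, StandardCharsets.UTF_8)
                .equals(eAcute)); // prints true

        // Reader assumes ISO-8859-1: each byte becomes its own character
        System.out.println(roundTrip(eAcute, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)
                .equals("\u00C3\u00A9")); // prints true
    }
}
```

No exception is thrown in the mismatched case, which is what makes this kind of bug easy to miss.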
If you're in control of the protocol, it would be much better if you could introduce a length prefix before the string, to indicate how many bytes are in the encoded form. That way you'd be able to read exactly the right amount of data (without "over-reading") and you'd be able to tell if the data was truncated for some reason.
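A length-prefixed format could be sketched as follows. The specific choices here are assumptions for illustration, not part of the answer: a 4-byte big-endian `int` prefix (what `DataOutputStream.writeInt` writes), UTF-8 for the text, and the hypothetical method names `writeString`/`readString`. In real code the streams would wrap `socket.getOutputStream()` and `socket.getInputStream()` instead of in-memory byte arrays.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LengthPrefixed {
    // Writes a 4-byte big-endian byte count, then the UTF-8 bytes of the text
    static void writeString(DataOutputStream out, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
        out.flush();
    }

    // Reads exactly the announced number of bytes; readFully throws
    // EOFException if the stream ends first, i.e. the data was truncated
    static String readString(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative length prefix: " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeString(new DataOutputStream(buffer), "Hello");
        byte[] wire = buffer.toByteArray(); // 4-byte prefix + 5 bytes of text

        System.out.println(readString(new DataInputStream(new ByteArrayInputStream(wire)))); // prints Hello

        // Drop the last byte to simulate a truncated message
        byte[] truncated = Arrays.copyOf(wire, wire.length - 1);
        try {
            readString(new DataInputStream(new ByteArrayInputStream(truncated)));
        } catch (EOFException e) {
            System.out.println("truncated message detected");
        }
    }
}
```

Because the reader consumes exactly `length` bytes, it never reads into the next message, and a short read surfaces as an `EOFException` rather than as silently shortened text.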