UTF-8 string delimiter

Question

I am parsing a binary protocol that has UTF-8 strings interspersed among raw bytes. This particular protocol prefaces each UTF-8 string with a short (two bytes) indicating the length of the UTF-8 string that follows. That gives a maximum string length of 2^16 - 1 = 65,535, which is more than adequate for this particular application.
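
A minimal sketch of reading such a field, assuming the two-byte prefix is an unsigned big-endian count of bytes (the question states neither the byte order nor whether the count is in bytes or characters). The function name `read_prefixed_utf8` and the message layout in the usage lines are hypothetical, chosen only to show a string sitting between raw bytes.

```python
import struct


def read_prefixed_utf8(buf: bytes, offset: int) -> tuple[str, int]:
    """Read one length-prefixed UTF-8 string starting at `offset`.

    Assumes an unsigned big-endian 16-bit length counting bytes.
    Returns the decoded string and the offset just past it.
    """
    if offset + 2 > len(buf):
        raise ValueError("truncated length prefix")
    # ">H" = big-endian unsigned short (assumed byte order).
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string body")
    # Invalid UTF-8 raises UnicodeDecodeError, a subclass of ValueError.
    return buf[start:end].decode("utf-8"), end


# Hypothetical message: 1 raw type byte, a prefixed string, then 4 raw bytes.
msg = b"\x07" + b"\x00\x06" + "héllo".encode("utf-8") + b"\xde\xad\xbe\xef"
msg_type = msg[0]
text, pos = read_prefixed_utf8(msg, 1)
raw = msg[pos:pos + 4]
print(msg_type, text, raw.hex())  # prints: 7 héllo deadbeef
```

Because the reader returns the offset just past the string, parsing of the surrounding raw fields can continue from there without scanning for a terminator.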

My question is, is this a standard way of delimiting UTF-8 strings?

Recommended answer

I wouldn't call that delimiting; it's more like "length prefixing". Some people call these Pascal strings, because in the early days Pascal was one of the popular languages that stored strings that way in memory.
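
The writer side of length prefixing can be sketched as follows, again assuming an unsigned big-endian 16-bit prefix that counts bytes; `write_prefixed_utf8` and `MAX_PREFIXED_LEN` are hypothetical names, not part of any protocol named here. It shows two properties that distinguish a length prefix from a delimiter: the count must be taken after encoding, and no byte value has to be reserved as a terminator.

```python
import struct

MAX_PREFIXED_LEN = 0xFFFF  # largest value an unsigned 16-bit length can hold


def write_prefixed_utf8(text: str) -> bytes:
    """Encode `text` as UTF-8 preceded by its byte length (assumed format)."""
    # Encode first: the prefix counts UTF-8 bytes, which can exceed len(text).
    data = text.encode("utf-8")
    if len(data) > MAX_PREFIXED_LEN:
        raise ValueError(f"{len(data)} bytes exceeds the {MAX_PREFIXED_LEN}-byte limit")
    # ">H" = big-endian unsigned short (assumed byte order).
    return struct.pack(">H", len(data)) + data


# "héllo" is 5 characters but 6 bytes, so the prefix is 0x0006.
print(write_prefixed_utf8("héllo"))  # b'\x00\x06h\xc3\xa9llo'

# No byte is reserved as a terminator, so even U+0000 can appear in the body.
print(write_prefixed_utf8("a\x00b"))  # b'\x00\x03a\x00b'
```

With a delimiter, the reader would have to scan every byte for the terminator and the terminator could never appear in the data; with a prefix, it reads two bytes and knows exactly where the string ends.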

I don't think there's a formal standard specifically for just that, as it's a rather obvious way of storing UTF-8 strings (or any byte strings, for that matter). It is, however, defined over and over as part of many standards that deal with messages containing strings; MQTT, for instance, prefixes each UTF-8 encoded string with a two-byte length field.