How to read a UTF-8 string given its length in characters in plain C89?


Question

I'm writing a custom, cross-platform, minimalistic TCP server in plain C89. (But I will also accept a POSIX-specific answer.)

The server works with UTF-8 strings, but never looks inside them. It treats all strings as immutable binary blobs.

But now I need to accept UTF-8 strings from a client that does not know how to calculate their size in bytes. The client can only transmit the string length in characters. (Update: The client is in JavaScript, and "length in characters" is, in fact, whatever String.length returns. I assume it counts actual UTF-8 characters, not something else.)

I do not want to add heavy dependencies to my tiny server. Is there a robust and neat way to read this datagram? (For the sake of this question, let's say that it is read from a FILE *.)

U<CRLF>       ; data type marker (actually read by dispatching code)
<SIZE><CRLF>  ; UTF-8 string size in characters
<DATA><CRLF>  ; data blob

For example:

U
7
Юникод!
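
To make the gap between the two sizes concrete: the 7 characters in the example above occupy 13 bytes in UTF-8 (six 2-byte Cyrillic letters plus one ASCII byte). A minimal C89 sketch of that count follows. The helper names `utf8_seq_len` and `utf8_bytes_for_chars` are hypothetical, and the code assumes "character" means one Unicode code point and that the input is NUL-terminated.

```c
#include <stddef.h>

/* Number of bytes in a UTF-8 sequence, judged from its lead byte.
   Returns 0 for a byte that cannot start a sequence. */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                  /* ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;                                   /* continuation or invalid byte */
}

/* Bytes occupied by the first nchars characters of the NUL-terminated
   string s. Returns 0 if the string is too short or malformed
   (and also, trivially, when nchars is 0). */
size_t utf8_bytes_for_chars(const char *s, size_t nchars)
{
    size_t bytes = 0;
    while (nchars > 0) {
        int len, i;
        if (s[bytes] == '\0') return 0;         /* ran out of input */
        len = utf8_seq_len((unsigned char)s[bytes]);
        if (len == 0) return 0;
        for (i = 1; i < len; i++) {
            /* every continuation byte must look like 10xxxxxx */
            if (((unsigned char)s[bytes + i] & 0xC0) != 0x80) return 0;
        }
        bytes += (size_t)len;
        nchars--;
    }
    return bytes;
}
```

Called on the example payload with a count of 7, `utf8_bytes_for_chars` returns 13.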

Update:

One batch of data can contain more than one datagram, so approximate reads would not work. I need to read the exact number of characters.

And the actual UTF-8 data may contain any characters, so I can't pick a character as a terminator; I don't want to mess with escaping it in the data.

Recommended answer

This looks like exactly the thing I need. I wish I had found it earlier:

http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
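
For readers who only need to pull an exact number of characters off a FILE *, here is a rough C89 sketch of the idea. It is not the table-driven DFA decoder from the link above, and it is less strict: it checks only the lead-byte range and the form of the continuation bytes, so some invalid sequences (for example overlong 3-byte forms and encoded surrogates) would slip through. The function name `read_utf8_chars` and its signature are hypothetical. It also assumes the transmitted count is in Unicode code points. If the count comes from JavaScript's length, which counts UTF-16 code units, each 4-byte UTF-8 sequence would have to count as two.

```c
#include <stdio.h>
#include <stddef.h>

/* Reads exactly nchars UTF-8 characters from fp into buf (bufsize bytes).
   On success, NUL-terminates buf and returns the number of bytes stored
   (not counting the NUL). Returns -1 on EOF, on a malformed sequence,
   or if buf is too small. */
long read_utf8_chars(FILE *fp, unsigned long nchars, char *buf, size_t bufsize)
{
    size_t used = 0;
    while (nchars > 0) {
        int c = getc(fp);
        int len, i;
        if (c == EOF) return -1;
        /* The lead byte determines the length of the sequence. */
        if (c < 0x80) len = 1;
        else if (c >= 0xC2 && c <= 0xDF) len = 2;
        else if (c >= 0xE0 && c <= 0xEF) len = 3;
        else if (c >= 0xF0 && c <= 0xF4) len = 4;
        else return -1;
        /* Keep room for this sequence and the terminating NUL. */
        if (used + (size_t)len + 1 > bufsize) return -1;
        buf[used++] = (char)c;
        for (i = 1; i < len; i++) {
            c = getc(fp);
            /* Continuation bytes must look like 10xxxxxx. */
            if (c == EOF || (c & 0xC0) != 0x80) return -1;
            buf[used++] = (char)c;
        }
        nchars--;
    }
    if (used + 1 > bufsize) return -1;
    buf[used] = '\0';
    return (long)used;
}
```

After a successful call the stream is positioned at the first byte after the last character, so the trailing <CRLF> and any following datagram in the same batch can be read next.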
