Does Lua support Unicode?

Question

Based on the link below, I'm confused as to whether the Lua programming language supports Unicode.

http://lua-users.org/wiki/LuaUnicode

It appears that it does, but with limitations. What I don't understand is whether those limitations are significant or not a big deal.

Recommended answer

You can certainly store Unicode strings in Lua, as UTF-8. You can use these as you would any string.

However, Lua provides little built-in support for higher-level "unicode-aware" operations on such strings, e.g., counting string length in characters, converting lower case to upper case, etc. (Lua 5.3 added a small standard utf8 library, with utf8.len, utf8.char, utf8.codepoint, utf8.codes and utf8.offset, which covers counting and iterating over code points but not case conversion.) Whether this lack matters to you really depends on what you intend to do with these strings.

Possible approaches, depending on your use:

  1. If you just want to input/output/store strings, and generally use them as "whole units" (for table indexing, etc.), you may not need any special handling at all. In this case, you just treat these strings as binary blobs.

Due to UTF-8's clever design, some types of string manipulation can be done on strings containing UTF-8 and will yield the correct result without any special care.

For instance, you can append strings, split them apart before or after ASCII characters, and so on. As an example, if you have the string "開発.txt" and search it for "." using string.find(string_var, ".", 1, true) (the final true requests a plain-text search; without it, "." is a pattern that matches any character), and then split it with the normal string.sub function into "開発" and ".txt", the resulting strings will be correct UTF-8 strings even though you're not using any kind of "unicode-aware" algorithm. This works because every byte of a multi-byte UTF-8 sequence has its high bit set, so an ASCII byte can never appear in the middle of a non-ASCII character.
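The split described above can be sketched as follows. This is a minimal illustration that assumes the source file is saved as UTF-8; split_at_dot is a hypothetical helper name, not something Lua provides.

```lua
-- Split a UTF-8 file name at its first ASCII "." using only
-- byte-oriented string functions.
-- split_at_dot is a hypothetical helper, not part of Lua.
local function split_at_dot(name)
  -- The fourth argument `true` asks for a plain-text search; without it,
  -- "." would be treated as a pattern that matches any single byte.
  local dot = string.find(name, ".", 1, true)
  if not dot then
    return name, ""          -- no dot: everything is the base name
  end
  return string.sub(name, 1, dot - 1), string.sub(name, dot)
end

local base, ext = split_at_dot("開発.txt")
print(base, ext)
```

Both pieces remain valid UTF-8 because the cut falls on an ASCII byte, which never occurs inside a multi-byte sequence.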

Similarly, you can do case conversion on only the ASCII characters in a string (the bytes with the high bit zero) and treat the rest of the string as binary without corrupting it.
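One possible way to do this ASCII-only conversion is sketched below. ascii_upper is a hypothetical helper name; the code uses an explicit byte range rather than string.upper so that the result does not depend on the current C locale, which is a design choice of this sketch rather than something the answer prescribes.

```lua
-- Upper-case only the ASCII letters a-z; every other byte, including
-- all bytes of multi-byte UTF-8 sequences, passes through unchanged.
-- ascii_upper is a hypothetical helper, not part of Lua.
local function ascii_upper(s)
  -- "[a-z]" matches only bytes 0x61..0x7A; subtracting 32 maps each
  -- one to the corresponding byte in 0x41..0x5A ("A".."Z").
  -- The outer parentheses drop gsub's second result (the match count).
  return (string.gsub(s, "[a-z]", function(c)
    return string.char(string.byte(c) - 32)
  end))
end

print(ascii_upper("開発.txt"))
```

Non-ASCII letters such as "é" are left in lower case, which is exactly the limitation described above.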

Some UTF-8-aware operations are so simple that it's easy to just write your own functions for them.

For instance, to calculate the length of a string in Unicode characters, just count the number of bytes with the high bit zero (ASCII characters) and the number of bytes whose top two bits are 11 ("leading bytes" of non-ASCII characters); the length is the sum of those two counts.
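The counting rule above might look like this in plain Lua. It is a sketch that assumes the input is already valid UTF-8 (no validation is done), and it counts code points, not user-perceived characters; utf8_length is a hypothetical name.

```lua
-- Count the code points in a string that is assumed to be valid UTF-8.
-- utf8_length is a hypothetical helper, not part of Lua.
local function utf8_length(s)
  local count = 0
  for i = 1, #s do
    local b = string.byte(s, i)
    -- b < 0x80  : high bit zero, an ASCII character
    -- b >= 0xC0 : top two bits 11, the leading byte of a sequence
    -- 0x80..0xBF are continuation bytes and are skipped
    if b < 0x80 or b >= 0xC0 then
      count = count + 1
    end
  end
  return count
end

-- #s is the length in bytes; utf8_length(s) is the length in characters.
print(#"開発.txt", utf8_length("開発.txt"))  -- 10 bytes, 6 characters
```

On Lua 5.3 and later, the standard utf8.len function performs the same count and also checks that the input is well formed.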

For more complex operations, such as case conversion on non-ASCII characters, you'll probably have to use a Lua Unicode library, such as those listed on the (previously mentioned) Lua-users Unicode page.
