Lua gmatch odd characters (Slovak alphabet)

Views: 166 Published: 2020/5/4 6:54:21 Tags: string unicode lua coronasdk lua-patterns


Question

I am trying to extract the characters from a Slovak word stored in a string. For example, the Slovak word for "turtle" is "KORYTNAČKA". However, when I iterate over the letters, the "Č" character is skipped:

local str = "KORYTNAČKA"
for c in str:gmatch("%a") do print(c) end
--result: K,O,R,Y,T,N,A,K,A

I have been reading this page, and I also tried passing the string itself as a character set, but that produces something odd:

local str = "KORYTNAČKA"
for c in str:gmatch("["..str.."]") do print(c) end
--result: K,O,R,Y,T,N,A,Ä,Œ,K,A

Does anyone know how to solve this?

Recommended answer

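Lua strings are 8-bit clean: they can hold any byte values, and the string library treats every byte as one character. The pattern "%a" therefore matches a single byte that the current locale classifies as a letter. In UTF-8, "Č" is encoded as two bytes (0xC4 0x8C), neither of which is a letter in the default "C" locale, so it is skipped and the result is not what you expected.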
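To make the byte-level view concrete, here is a minimal sketch that inspects the string directly. It assumes the text is UTF-8 and that the program runs in the default "C" locale; "Č" is written as explicit decimal byte escapes so the example does not depend on how the source file is saved.

```lua
-- "Č" spelled as its two UTF-8 bytes (196, 140), an assumption that the
-- original string was UTF-8 encoded.
local str = "KORYTNA\196\140KA"

print(#str)             -- length in bytes, not in characters
print(str:byte(8, 9))   -- the two bytes that encode "Č"

-- Collect what "%a" matches: one byte at a time, letters only.
local letters = {}
for c in str:gmatch("%a") do
    letters[#letters + 1] = c
end
print(table.concat(letters, ","))
```

The word has 10 characters but the string holds 11 bytes, and neither byte of "Č" counts as a letter here, so it never shows up in the "%a" matches.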

The pattern "["..str.."]" appears to work because both bytes of a multibyte character end up in the set. It still matches one byte at a time, though, so "Č" comes back as two separate one-byte matches, which your output displays as "Ä" and "Œ".
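A small sketch of why the set-based pattern gives 11 results instead of 10, again assuming UTF-8 input with "Č" written as explicit byte escapes. It prints the numeric value of each match rather than the match itself, so the console encoding does not affect what you see.

```lua
-- Assumed UTF-8 encoding of "KORYTNAČKA"; \196\140 is "Č".
local str = "KORYTNA\196\140KA"

-- Every match of a character set is exactly one byte long.
local pieces = {}
for c in str:gmatch("[" .. str .. "]") do
    pieces[#pieces + 1] = c:byte()
end

print(#pieces)
print(table.concat(pieces, " "))
```

The values 196 and 140 appear as two separate matches, which is the split "Č" from the question.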

If the string is UTF-8, you can match one whole UTF-8 byte sequence (one character) at a time with the pattern "[\0-\x7F\xC2-\xF4][\x80-\xBF]*": a lead byte followed by any continuation bytes. In Lua 5.2, which supports "\x" escapes, that looks like this:

local str = "KORYTNAČKA"
for c in str:gmatch("[\0-\x7F\xC2-\xF4][\x80-\xBF]*") do 
    print(c) 
end

In Lua 5.1 (the version Corona SDK uses), hexadecimal escapes are not available and "\0" cannot appear in a pattern, so use decimal escapes and "%z" instead:

local str = "KORYTNAČKA"
for c in str:gmatch("[%z\1-\127\194-\244][\128-\191]*") do 
    print(c) 
end
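If you need the characters in a table rather than printed, the same idea can be wrapped in a helper. This is only a sketch: utf8_chars and UTF8_CHAR are hypothetical names, the input is assumed to be valid UTF-8, and the pattern deliberately leaves out the NUL byte (it starts at \1 and uses decimal escapes only) so that it does not rely on "%z" or "\x", which differ between Lua versions.

```lua
-- Lead byte (ASCII except NUL, or 194..244) followed by continuation bytes.
-- Hypothetical constant name; NUL is omitted on purpose for portability.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: split a UTF-8 string into a table of characters.
local function utf8_chars(s)
    local chars = {}
    for ch in s:gmatch(UTF8_CHAR) do
        chars[#chars + 1] = ch
    end
    return chars
end

-- "KORYTNAČKA" with "Č" as explicit UTF-8 bytes.
local chars = utf8_chars("KORYTNA\196\140KA")
print(#chars)       -- number of characters
print(#chars[8])    -- byte length of the 8th character ("Č")
```

Each table entry is a complete character, so "Č" stays together as a single two-byte string. The pattern does not validate its input; malformed byte sequences are not detected.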

For details about this pattern, see the question "Equivalent pattern to "[\0-\x7F\xC2-\xF4][\x80-\xBF]*" in Lua 5.1".
