Lua gmatch odd characters (Slovak alphabet)
Question
I am trying to extract the characters from a string of a word in Slovak. For example, the word for "TURTLE" is "KORYTNAČKA". However, it skips over the "Č" character when I try to extract it from the string:
local str = "KORYTNAČKA"
for c in str:gmatch("%a") do print(c) end
--result: K,O,R,Y,T,N,A,K,A
I am reading this page and I have also tried just pasting in the string itself as a set, but it comes up with something weird:
local str = "KORYTNAČKA"
for c in str:gmatch("["..str.."]") do print(c) end
--result: K,O,R,Y,T,N,A,Ä,Œ,K,A
Anyone know how to solve this?
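Recommended answer
Lua is 8-bit clean: a string can hold any byte values, and the string library treats every byte as one character. The pattern "%a" therefore matches a single byte that counts as a letter in the current locale. In UTF-8, "Č" is encoded as two bytes, and neither byte is a letter in the default locale, so both are skipped and the result is not what you expected.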
The pattern "["..str.."]" only appears to work. A non-ASCII character in UTF-8 consists of more than one byte, and putting the string in a set adds each of those bytes to the set individually. The two bytes of "Č" are then matched and printed one at a time, which is why two odd characters show up instead of one.
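To make the byte-level behaviour concrete, here is a small sketch. It writes "Č" with explicit decimal byte escapes so that it does not depend on the encoding of the source file, and it assumes the default "C" locale; the variable names are only illustrative.

```lua
-- "Č" (U+010C) encoded in UTF-8 is the two bytes 0xC4 0x8C (196, 140).
local c_caron = "\196\140"

print(#c_caron)             -- length in bytes, not in characters
print(c_caron:byte(1, -1))  -- the individual byte values

-- Assumption: default "C" locale, where neither byte counts as a letter,
-- so %a only picks up the surrounding ASCII letters.
local letters = {}
for ch in ("A" .. c_caron .. "K"):gmatch("%a") do
  letters[#letters + 1] = ch
end
print(table.concat(letters, ","))
```

Under those assumptions the length is 2 and only "A" and "K" are collected, which mirrors the missing "Č" in the question.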
If the string is UTF-8, you can use the pattern "[\0-\x7F\xC2-\xF4][\x80-\xBF]*" to match a single UTF-8 byte sequence in Lua 5.2. The first set matches an ASCII byte or the lead byte of a multi-byte sequence, and the second set matches any continuation bytes that follow:
local str = "KORYTNAČKA"
for c in str:gmatch("[\0-\x7F\xC2-\xF4][\x80-\xBF]*") do
print(c)
end
Lua 5.1 (the version Corona SDK uses) has no "\x" escapes, and its patterns cannot contain an embedded zero byte, so write the bytes in decimal and use "%z" in place of "\0":
local str = "KORYTNAČKA"
for c in str:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
print(c)
end
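If the characters are needed for more than printing, the same loop can be wrapped in a helper that returns them as a table. This is only a sketch: utf8_chars and UTF8_CHAR are hypothetical names, not part of Lua or Corona SDK, and the input is assumed to be valid UTF-8.

```lua
-- The Lua 5.1 form of the pattern: one ASCII or lead byte,
-- followed by any continuation bytes.
local UTF8_CHAR = "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: split a UTF-8 string into a table of characters.
-- Assumes valid UTF-8 input; it does not validate the byte sequences.
local function utf8_chars(s)
  local chars = {}
  for ch in s:gmatch(UTF8_CHAR) do
    chars[#chars + 1] = ch
  end
  return chars
end

local word = "KORYTNA\196\140KA"  -- "KORYTNAČKA" with Č as explicit bytes
local chars = utf8_chars(word)
print(#word, #chars)              -- byte count vs. character count
print(chars[8])                   -- the two-byte sequence for Č
```

Here #word counts 11 bytes while #chars counts 10 characters, with the eighth entry holding both bytes of "Č".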
For details about this pattern, see the question "Equivalent pattern to "[\0-\x7F\xC2-\xF4][\x80-\xBF]*" in Lua 5.1".