Lua gmatch odd characters (Slovak alphabet)


Question

I am trying to extract the characters from a Slovak word stored in a string. For example, the word for "turtle" is "KORYTNAČKA". However, the "Č" character is skipped when I iterate over the string:

local str = "KORYTNAČKA"
for c in str:gmatch("%a") do print(c) end
--result: K,O,R,Y,T,N,A,K,A

I have been reading this page, and I also tried pasting the string itself into the pattern as a set, but that produces something odd:

local str = "KORYTNAČKA"
for c in str:gmatch("["..str.."]") do print(c) end
--result: K,O,R,Y,T,N,A,Ä,Œ,K,A

Does anyone know how to solve this?

Recommended answer

Lua strings are 8-bit clean: they can hold any byte values, and the string library treats each byte as one character. The class "%a" therefore tests a single byte at a time. In UTF-8, "Č" is encoded as two bytes (0xC4 0x8C), and neither byte counts as a letter in the default "C" locale, so the character is skipped.
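To make the byte-level view concrete, here is a minimal sketch that dumps the bytes of the word. It assumes the text is UTF-8; "Č" is written with decimal escapes so the result does not depend on how the source file is saved. The names `str` and `bytes` are only for illustration.

```lua
-- "KORYTNAČKA" with "Č" spelled as its two UTF-8 bytes (0xC4 0x8C = 196 140).
local str = "KORYTNA\196\140KA"

-- The length operator counts bytes, not letters.
print(#str)  -- 11

-- Collect the numeric value of every byte in the string.
local bytes = {}
for i = 1, #str do
  bytes[#bytes + 1] = str:byte(i)
end
print(table.concat(bytes, " "))  -- 75 79 82 89 84 78 65 196 140 75 65
```

The ten-letter word occupies eleven bytes, and the two values above 127 are the ones "%a" rejects.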

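The pattern "["..str.."]" appears to work because the set contains every byte of the string, including the two bytes of "Č", so each of those bytes matches. They are still matched one at a time, which is why the output shows two stray characters instead of one "Č": "Ä" and "Œ" are how the bytes 0xC4 and 0x8C display in a single-byte encoding such as Windows-1252.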
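A small sketch can confirm that the set pattern yields one match per byte rather than one per letter. The helper `count_matches` is a hypothetical name introduced here, and the word is again written with decimal escapes on the assumption that it is UTF-8.

```lua
-- "KORYTNAČKA" with "Č" as the UTF-8 bytes 196 140.
local str = "KORYTNA\196\140KA"

-- Count how many times `pattern` matches in `s`.
local function count_matches(s, pattern)
  local n = 0
  for _ in s:gmatch(pattern) do
    n = n + 1
  end
  return n
end

-- The set holds every byte of the word, so every byte matches on its own.
local set_count = count_matches(str, "[" .. str .. "]")
print(set_count)  -- 11, the byte count, not the 10 letters
```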

If the string is UTF-8, you can use the pattern "[\0-\x7F\xC2-\xF4][\x80-\xBF]*" to match a single UTF-8 byte sequence in Lua 5.2. It matches one lead byte (an ASCII byte, or 0xC2-0xF4 for a multi-byte sequence) followed by any continuation bytes (0x80-0xBF):

local str = "KORYTNAČKA"
for c in str:gmatch("[\0-\x7F\xC2-\xF4][\x80-\xBF]*") do 
    print(c) 
end

In Lua 5.1 (the version Corona SDK uses), hexadecimal escapes are not available and a pattern cannot contain an embedded zero, so write the bytes in decimal and use %z for the NUL byte:

local str = "KORYTNAČKA"
for c in str:gmatch("[%z\1-\127\194-\244][\128-\191]*") do 
    print(c) 
end

For details about this pattern, see Equivalent pattern to "[\0-\x7F\xC2-\xF4][\x80-\xBF]*" in Lua 5.1.
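The two loops above differ only in how the byte ranges are spelled, so they can be folded into one helper. The following is a sketch, not a definitive implementation: `utf8_chars` and `UTF8_CHAR` are hypothetical names, the version check through `_VERSION` is an assumption about how you want to pick the NUL spelling, and the input is assumed to be valid UTF-8 (no validation is done).

```lua
-- Lua 5.1 patterns cannot contain an embedded zero, so use %z there;
-- later versions accept "\0" directly inside a pattern.
local nul = (_VERSION == "Lua 5.1") and "%z" or "\0"

-- One lead byte (ASCII or 194-244) followed by continuation bytes (128-191).
local UTF8_CHAR = "[" .. nul .. "\1-\127\194-\244][\128-\191]*"

-- Split a UTF-8 string into a table of characters (each a short string).
local function utf8_chars(s)
  local chars = {}
  for c in s:gmatch(UTF8_CHAR) do
    chars[#chars + 1] = c
  end
  return chars
end

-- "KORYTNAČKA" with "Č" as the UTF-8 bytes 196 140.
local chars = utf8_chars("KORYTNA\196\140KA")
print(#chars)                   -- 10
print(chars[8] == "\196\140")   -- true: "Č" comes back as one item
```

Decimal escapes are used throughout because they are accepted by both Lua 5.1 and 5.2, which keeps the pattern identical apart from the NUL byte.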
