How can I detect non-western characters?
I want to disallow certain UTF-8 input (server-side), e.g. eastern languages, where example input might be "伊".
However, I do want to continue supporting other Latin or "Latin-like" characters, such as the Welsh ŵ and ŷ, so checking against Latin-1 is not possible.
What are my options? (if language specific, PHP preferred)
Thanks very much.
Reasoning: browser support for a lot of non-western characters is often missing (e.g. on a different browser I just see a box in the question above), so for things like display names it is sometimes appropriate to restrict input even if it is not appropriate for message bodies.
Just do
preg_match('/[^\\p{Common}\\p{Latin}]/u', $string)
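where $string is a UTF-8 string. This returns 1 if the string contains at least one character outside the Latin and Common scripts, and 0 otherwise. (preg_match() can also return false on error, for example when the input is not valid UTF-8.)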
Example:
var_dump(preg_match('/[^\\p{Common}\\p{Latin}]/u', 'sf..ŷaás??')); //int(0)
var_dump(preg_match('/[^\\p{Common}\\p{Latin}]/u', 'sf..ŷݤaás??')); //int(1)
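For server-side validation it may be handy to wrap the check in a function. The following is only a minimal sketch: the names is_latin_only and find_non_latin are hypothetical helpers, not part of PHP or of the answer above, and throwing an exception on invalid UTF-8 is an assumption about how you would want to handle that case.

```php
<?php
// Hypothetical helper: true when $s contains only characters
// from the Latin and Common Unicode scripts.
function is_latin_only(string $s): bool
{
    $result = preg_match('/[^\p{Common}\p{Latin}]/u', $s);
    if ($result === false) {
        // preg_match() returns false on error, e.g. when $s is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result === 0;
}

// Hypothetical helper: the distinct characters that would be rejected,
// which can be useful for building an error message.
function find_non_latin(string $s): array
{
    if (preg_match_all('/[^\p{Common}\p{Latin}]/u', $s, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return array_values(array_unique($matches[0]));
}

var_dump(is_latin_only('ŵ and ŷ')); // bool(true)
var_dump(is_latin_only('伊'));      // bool(false)
var_dump(find_non_latin('abc伊'));
```

One caveat, as far as I know: combining accents (such as U+0301) belong to the Inherited script, so input in decomposed form would be rejected by this pattern. If that matters, adding \p{Inherited} to the character class or normalizing the input first is probably worth considering.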