How to use a regular expression to validate Chinese input?


Problem description

I need to treat the following kind of Chinese input as invalid in client-side validation:

The input is invalid when English letters, Chinese characters, and spaces, in any mix, have a total length >= 10.

For example, "你的a你的a你的a你" and "你的 你的 你的 你" (length 10) are invalid, but "你的a你的a你的a" (length 9) is OK.

I use JavaScript for client-side validation and Java for server-side validation, so a regular expression that works in both would be ideal.

Can anyone give some hints on how to write this rule as a regular expression?

Recommended answer

From "What's the complete range for Chinese characters in Unicode?", the CJK Unicode ranges are:

Block                                   Range       Comment
--------------------------------------- ----------- ----------------------------------------------------
CJK Unified Ideographs                  4E00-9FFF   Common
CJK Unified Ideographs Extension A      3400-4DBF   Rare
CJK Unified Ideographs Extension B      20000-2A6DF Rare, historic
CJK Unified Ideographs Extension C      2A700-2B73F Rare, historic
CJK Unified Ideographs Extension D      2B740-2B81F Uncommon, some in current use
CJK Unified Ideographs Extension E      2B820-2CEAF Rare, historic
CJK Compatibility Ideographs            F900-FAFF   Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement 2F800-2FA1F Unifiable variants
CJK Symbols and Punctuation             3000-303F

You probably want to allow code points from the Unicode blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A.

This regex matches a string of 0 to 9 characters, each of which is a space, an ideographic space (U+3000), an ASCII letter (A-Z or a-z), or a code point in those two CJK blocks.

/^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/

You can also add more blocks to the character class if you need them.
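As a rough sketch of what adding a block could look like, the snippet below extends the character class with CJK Unified Ideographs Extension B from the table above. The names `basicPattern`, `extendedPattern`, and `rare` are made up for this illustration, and the choice of Extension B is only an example. It assumes an engine that supports the `u` flag (ES2015 or later), which is needed because code points above U+FFFF are stored as two UTF-16 code units in JavaScript strings.

```javascript
// Original pattern from the answer (Basic Multilingual Plane blocks only).
const basicPattern = /^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/;

// Hypothetical extension: also allow CJK Unified Ideographs Extension B.
// The `u` flag enables \u{...} escapes and makes {0,9} count code points
// instead of UTF-16 code units.
const extendedPattern =
    /^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF\u{20000}-\u{2A6DF}]{0,9}$/u;

// First code point of Extension B, used here only as a test character.
const rare = "\u{20000}";

console.log(rare.length);                           // 2 (UTF-16 code units)
console.log(basicPattern.test("你的" + rare));       // false
console.log(extendedPattern.test("你的" + rare));    // true
console.log(extendedPattern.test(rare.repeat(9)));  // true
console.log(extendedPattern.test(rare.repeat(10))); // false
```

Note that `String.prototype.length` still reports code units, so a separate length check on such input would disagree with the regex. The Java side would likely need its own escape syntax for supplementary code points; that part is not shown here.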

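// Returns true when the text is at most 9 characters long and every character
// is a space, an ASCII letter, U+3000, or a code point in the two CJK blocks.
// Note: despite the name, a 10-character input is rejected.
function has10OrLessCJK(text) {
    return /^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/.test(text);
}

// Shows "Valid" or "Invalid" in the element with id "valid".
function checkValidation(value) {
    var valid = document.getElementById("valid");
    if (has10OrLessCJK(value)) {
        valid.innerText = "Valid";
    } else {
        valid.innerText = "Invalid";
    }
}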

<input type="text" 
       style="width:100%"
       oninput="checkValidation(this.value)"
       value="你的a你的a你的a">

<div id="valid">
    Valid
</div>
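To check the pattern without a browser, the three sample strings from the question can be run through it directly. This is only a small sketch: `pattern`, `samples`, and `results` are names chosen for the illustration, and the regex is copied unchanged from the answer.

```javascript
// Pattern copied from the answer: at most 9 characters, each one a space,
// an ASCII letter, U+3000, or a code point in the two CJK blocks.
const pattern = /^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/;

// Sample inputs taken from the question.
const samples = [
    "你的a你的a你的a你", // 10 characters: should be rejected
    "你的 你的 你的 你", // 10 characters: should be rejected
    "你的a你的a你的a",   // 9 characters: should be accepted
];

// Collect the length and the verdict for each sample.
const results = samples.map((text) => ({
    text,
    length: text.length,
    valid: pattern.test(text),
}));

for (const r of results) {
    console.log(`${r.text} (length ${r.length}): ${r.valid ? "Valid" : "Invalid"}`);
}
```

Because the pattern is a whitelist, input containing anything else (digits or punctuation, for example) is rejected regardless of its length, which may or may not be what the original rule intends.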
