Remove all except the Chinese characters with regex?

Views: 1516  Published: 2020/7/3 1:51:00  Tags: php, regex

Question

I have a string that is a sentence written in Chinese.

It contains Chinese characters along with filler such as spaces, commas, and exclamation marks, all encoded in UTF-8.

With a latin1 string, I could use preg_replace with a character class such as [a-zA-Z] to clean it and remove the filler.

How can I keep only the Chinese "alphabet" characters in the string while removing all the filler?

Recommended answer

According to this document, here are the Unicode ranges of Chinese characters:

Table 12-2. Blocks Containing Han Ideographs

Block                                     Range         Comment
CJK Unified Ideographs                    4E00–9FFF     Common
CJK Unified Ideographs Extension A        3400–4DBF     Rare
CJK Unified Ideographs Extension B        20000–2A6DF   Rare, historic
CJK Unified Ideographs Extension C        2A700–2B73F   Rare, historic
CJK Unified Ideographs Extension D        2B740–2B81F   Uncommon, some in current use
CJK Compatibility Ideographs              F900–FAFF     Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement   2F800–2FA1F   Unifiable variants
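
If the rarer blocks matter, the ranges in the table can be combined into a single negated character class. The following is a minimal sketch, assuming PHP 7.1 or later; the names HAN_RANGES, han_range_pattern, and keep_han_ranges are hypothetical helpers made up for illustration and are not part of the original answer.

```php
<?php
// Code point ranges copied from Table 12-2 above (hexadecimal).
const HAN_RANGES = [
    [0x4E00, 0x9FFF],   // CJK Unified Ideographs
    [0x3400, 0x4DBF],   // Extension A
    [0x20000, 0x2A6DF], // Extension B
    [0x2A700, 0x2B73F], // Extension C
    [0x2B740, 0x2B81F], // Extension D
    [0xF900, 0xFAFF],   // CJK Compatibility Ideographs
    [0x2F800, 0x2FA1F], // CJK Compatibility Ideographs Supplement
];

// Hypothetical helper: builds a pattern that matches runs of characters
// lying outside every listed range.
function han_range_pattern(array $ranges): string
{
    $class = '';
    foreach ($ranges as [$from, $to]) {
        // \x{HHHH} is the PCRE syntax for a code point written in hex.
        $class .= sprintf('\x{%X}-\x{%X}', $from, $to);
    }
    return '/[^' . $class . ']+/u';
}

// Hypothetical helper: strips everything that is not in the listed ranges.
// preg_replace returns null on error (for example, invalid UTF-8 input),
// so this sketch falls back to an empty string in that case.
function keep_han_ranges(string $text): string
{
    return preg_replace(han_range_pattern(HAN_RANGES), '', $text) ?? '';
}

echo keep_han_ranges('你好，世界! Hello 123'), PHP_EOL; // prints 你好世界
```

The u modifier is what lets PCRE treat both the pattern and the subject as UTF-8, so code points above FFFF (Extension B and later) are matched as single characters.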

You can use the common range like this:

preg_replace('/[^\x{4E00}-\x{9FFF}]+/u', '', $string);

or

preg_replace('/\P{Han}+/u', '', $string);

where \P is the negation of \p.

Han is one of many Unicode script names that can be used with \p{...} and \P{...}.
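
To see how the two approaches differ, here is a small runnable sketch, assuming PHP 7 or later. The function names keep_by_range and keep_by_script are hypothetical and exist only for this illustration. Note that PCRE writes a code point as \x{HHHH}; it does not accept the \uHHHH form.

```php
<?php
// Hypothetical helper: keeps only the common block U+4E00 to U+9FFF.
function keep_by_range(string $s): string
{
    // preg_replace returns null on error, hence the fallback.
    return preg_replace('/[^\x{4E00}-\x{9FFF}]+/u', '', $s) ?? '';
}

// Hypothetical helper: keeps every character whose Unicode script is Han.
function keep_by_script(string $s): string
{
    return preg_replace('/\P{Han}+/u', '', $s) ?? '';
}

$string = '你好，世界! Hello, world 123。';

echo keep_by_range($string), PHP_EOL;  // prints 你好世界
echo keep_by_script($string), PHP_EOL; // prints 你好世界

// U+3400 belongs to Extension A, outside the common block.
$rare = "\u{3400}";
var_dump(keep_by_range($rare) === '');     // bool(true): the range drops it
var_dump(keep_by_script($rare) === $rare); // bool(true): the script keeps it
```

The script property is shorter and also covers the extension blocks, while the explicit range gives exact control over which blocks are kept.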
