PHP Regular expression - Remove all non-alphanumeric characters


Problem description

I use PHP.

My string looks like this:

This is a string-test width åäö and some über+strange characters: _like this?

Question

Is there a way to remove non-alphanumeric characters and replace them with a space? Here are some non-alphanumeric characters:

  • -
  • +
  • :
  • _
  • ?

I've read many threads about this, but the solutions in them don't handle characters from other languages, like this one:

preg_replace("/[^A-Za-z0-9 ]/", '', $string);

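Requirements

  • My list of non-alphanumeric characters might not be complete.
  • My content contains characters from different languages, such as åäöü. There may be many more.
  • The non-alphanumeric characters should be replaced with a space. Otherwise the words would be glued together.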

Recommended answer

You can try this:

preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);

\p{L} stands for all alphabetic characters (whatever the alphabet).

\p{N} stands for numbers.

With the u modifier, the pattern and the subject string are treated as UTF-8, so matching works on Unicode characters rather than on single bytes.
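As a minimal sketch, here is the pattern above applied to the sample string from the question. The variable names are illustrative, and the source file is assumed to be saved as UTF-8:

```php
<?php
// Sample string from the question; assumes this file is saved as UTF-8.
$string = 'This is a string-test width åäö and some über+strange characters: _like this?';

// Replace every run of characters that are neither letters nor numbers
// (spaces included) with a single space.
$clean = preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);

// preg_replace() returns null on error, e.g. when the subject is not valid UTF-8.
if ($clean === null) {
    throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
}

// The trailing "?" becomes a trailing space, hence the trim().
echo trim($clean), PHP_EOL;
// This is a string test width åäö and some über strange characters like this
```

Note that the run ": _" collapses into one space because the quantifier consumes the colon, the space and the underscore together.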

Or this:

preg_replace('~\P{Xan}++~u', ' ', $string);

\p{Xan} matches Unicode letters and digits.

\P{Xan} matches everything that is not a Unicode letter or digit. (Be careful: that includes whitespace, which you can preserve with ~[^\p{Xan}\s]++~u.)
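To make the whitespace caveat concrete, the following sketch compares the two patterns on a short, made-up input (the input string and variable names are illustrative, not from the question):

```php
<?php
// Illustrative input; assumes UTF-8 source encoding.
$string = "foo: _bar\nbaz";

// Whitespace is itself non-alphanumeric, so the whole run ": _" and the
// newline each collapse into one space.
$collapsed = preg_replace('~\P{Xan}++~u', ' ', $string);

// Excluding \s from the match keeps the original space and newline;
// only ":" and "_" are replaced, each by its own space.
$preserved = preg_replace('~[^\p{Xan}\s]++~u', ' ', $string);

var_dump($collapsed === 'foo bar baz');    // bool(true)
var_dump($preserved === "foo   bar\nbaz"); // bool(true)
```

The second variant keeps line breaks intact, at the cost of leaving runs of several spaces where punctuation sat next to existing whitespace.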

If you want a more specific set of allowed letters, you must replace \p{L} with ranges from the Unicode table.

Example:

preg_replace('~[^a-zÀ-ÖØ-öÿŸ\d]++~ui', ' ', $string);
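A small sketch of how the explicit ranges differ from \p{L}, using a made-up input that mixes Latin and Greek letters (the input and variable names are illustrative; the file is assumed to be UTF-8 with precomposed accented characters):

```php
<?php
// Illustrative input; assumes UTF-8 source encoding.
$string = 'café Ωmega 42';

// \p{L} accepts letters from any script, so only the spaces match
// (and they are replaced by spaces, leaving the string unchanged).
$anyLetter = preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);

// The explicit ranges cover Latin letters only, so the Greek Ω is treated
// like punctuation and merged with the preceding space.
$latinOnly = preg_replace('~[^a-zÀ-ÖØ-öÿŸ\d]++~ui', ' ', $string);

var_dump($anyLetter === 'café Ωmega 42'); // bool(true)
var_dump($latinOnly === 'café mega 42');  // bool(true)
```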

Why use a possessive quantifier (++) here?

~\P{Xan}+~u gives the same result as ~\P{Xan}++~u. The difference is that with the first, the engine records each backtracking position (which we don't need), whereas with the second it doesn't (as in an atomic group). The result is a small performance gain.
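A quick way to check the equivalence claim on the sample string from the question is sketched below. It only compares the results and does not measure the performance difference; the variable names are illustrative:

```php
<?php
// Sample string from the question; assumes this file is saved as UTF-8.
$string = 'This is a string-test width åäö and some über+strange characters: _like this?';

// Greedy quantifier: backtracking positions may be recorded.
$greedy = preg_replace('~\P{Xan}+~u', ' ', $string);

// Possessive quantifier: the matched run is never given back.
$possessive = preg_replace('~\P{Xan}++~u', ' ', $string);

var_dump($greedy === $possessive); // bool(true)
```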

I think it's good practice to use possessive quantifiers and atomic groups whenever possible.

However, the PCRE regex engine automatically makes a quantifier possessive in obvious situations (example: a+b => a++b), unless this optimization has been disabled with the option PCRE_NO_AUTO_POSSESS. (http://www.pcre.org/pcre.txt)

More information about possessive quantifiers and atomic groups here (possessive quantifiers) and here (atomic groups) or here.
