Matching any Unicode whitespace characters in a string with PHP regex
Question
I want to split a text message into an array at every space. It had been working fine until I received this text message. Here are the few lines of code that process the text string:
$str = 'T bw4 05/09/19 07:51 am BW6N 499.803';
$cleanStr = iconv("UTF-8", "ISO-8859-1", $str);
$strArr = preg_split('/[\s\t]/', $cleanStr);
var_dump($strArr);
var_dump yields this result:
array:6 [
0 => "T"
1 => b"bw4 05/09/19"
2 => "07:51"
3 => "am"
4 => "BW6N"
5 => "499.803"
]
Item #1 in the array (1 => b"bw4 05/09/19") is not correct. I cannot figure out what the letter "b" in front of the array value is. Also, the string was not split at the space(s) between "bw4" and "05/09/19". Any suggestions on how to better achieve the string splitting are greatly appreciated. Here is the original string: https://3v4l.org/2L35M and here is an image of the result from my localhost: http://prntscr.com/jjbvny
Recommended answer
To match one or more Unicode whitespace characters, you may use
'~\s+~u'
Your '/[\s\t]/' pattern matches only a single whitespace character (\s) or a tab (\t); the \t is redundant, since \s already matches tabs. More importantly, because the u modifier is missing, \s cannot match the \u00A0 characters (hard spaces, i.e. no-break spaces) that follow bw4.
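The difference the u modifier makes can be seen in a minimal sketch like the one below. It assumes PHP 7.0 or later (for the "\u{00A0}" escape) and the default C locale; the sample string is a shortened version of the one in the question, with the no-break space written explicitly so it is visible in the source.

```php
<?php
// Shortened sample string; the no-break space (U+00A0) is written as an
// explicit escape so that it cannot be confused with an ordinary space.
$str = "T bw4\u{00A0}05/09/19 07:51 am";

// Without the u modifier the subject is processed byte by byte, and \s does
// not match the UTF-8 bytes of U+00A0, so "bw4" and the date stay joined.
$byteMode = preg_split('/\s+/', $str);

// With the u modifier the pattern and subject are treated as UTF-8, and \s
// also matches Unicode whitespace such as U+00A0.
$utf8Mode = preg_split('/\s+/u', $str);

echo count($byteMode), "\n"; // number of pieces in byte mode
echo count($utf8Mode), "\n"; // number of pieces in UTF-8 mode
```

Under those assumptions the second split should produce one more element than the first, because only it breaks the string at the no-break space.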
So, use
$str = 'T bw4 05/09/19 07:51 am BW6N 499.803';
$strArr = preg_split('/\s+/u', $str);
print_r($strArr);
See the PHP demo, which yields
Array
(
[0] => T
[1] => bw4
[2] => 05/09/19
[3] => 07:51
[4] => am
[5] => BW6N
[6] => 499.803
)
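If the splitting is needed in more than one place, it could be wrapped in a small helper. The sketch below is only an illustration: the function name splitOnUnicodeWhitespace is hypothetical and not part of the original answer. It adds the PREG_SPLIT_NO_EMPTY flag so that leading or trailing whitespace does not produce empty elements, and it checks for a false return value, which preg_split gives when the u modifier is used on input that is not valid UTF-8 (for example, a string already converted to ISO-8859-1 with iconv, as in the question).

```php
<?php
/**
 * Hypothetical helper: split a UTF-8 string on runs of Unicode whitespace.
 *
 * @param string $text UTF-8 encoded input.
 * @return string[] Non-empty tokens in their original order.
 * @throws InvalidArgumentException If preg_split fails (e.g. invalid UTF-8).
 */
function splitOnUnicodeWhitespace(string $text): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that would otherwise
    // appear when the input starts or ends with whitespace.
    $parts = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // With the u modifier, preg_split returns false on malformed UTF-8.
    if ($parts === false) {
        throw new InvalidArgumentException(
            'preg_split failed, error code ' . preg_last_error()
        );
    }

    return $parts;
}

// Usage: the no-break space is written explicitly as "\u{00A0}".
print_r(splitOnUnicodeWhitespace(" T bw4\u{00A0}05/09/19 07:51 am "));
```

Keeping the input in UTF-8 and dropping the iconv call, as the answer's code does, is what allows the u modifier to work here.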