Extract words from string with preg_match_all
Question
I'm not good with regex, but I want to use it to extract words from a string.
The words I need should be at least 4 characters long, and the provided string can be UTF-8.
Example string:
Sus azahares presentan gruesos pétalos blancos teñidos de rosa o violáceo en la parte externa, con numerosos estambres (20-40).
Desired output:
Array(
[0] => azahares
[1] => presentan
[2] => gruesos
[3] => pétalos
[4] => blancos
[5] => teñidos
[6] => rosa
[7] => violáceo
[8] => parte
[9] => externa
[10] => numerosos
[11] => estambres
)
Recommended answer
This works if the words to look for are UTF-8 encoded, at least 4 characters long (as per the spec), and consist of alphabetic characters from ISO-8859-15 (which is fine for Spanish, and also for English, German, French, etc.):
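// Letters: ASCII a-z/A-Z, the UTF-8 bytes for U+00C0-U+00FF except × and ÷ (lead byte \xC3), and Œ œ Š š Ÿ Ž ž (lead byte \xC5).
$n_words = preg_match_all('/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){4,}/', $str, $match_arr);
// $match_arr[0] holds the full matches, i.e. the extracted words.
$word_arr = $match_arr[0];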
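As a possible alternative to the byte-level pattern, PCRE's Unicode letter class `\p{L}` together with the `/u` modifier can express the same idea more compactly. This is only a sketch, not part of the original answer: it assumes PHP's PCRE was built with Unicode property support and that the input is valid UTF-8 in precomposed (NFC) form. It also matches letters from any script, not only the ISO-8859-15 repertoire. The variable names `$count` and `$words` are illustrative.

```php
<?php
// Sample string taken from the question.
$str = 'Sus azahares presentan gruesos pétalos blancos teñidos de rosa o violáceo en la parte externa, con numerosos estambres (20-40).';

// \p{L} matches any Unicode letter; {4,} requires at least four in a row.
// The /u modifier makes PCRE treat both pattern and subject as UTF-8.
$count = preg_match_all('/\p{L}{4,}/u', $str, $matches);

// $matches[0] holds the full matches, i.e. the extracted words.
$words = $matches[0];

print_r($words);
```

For the sample string this should yield the twelve words listed in the desired output. If the input is not valid UTF-8, `preg_match_all` returns `false` when `/u` is used, so checking the return value before using `$matches` is advisable.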