How can I split a sentence into words and punctuation marks?
Question
For example, I want to split this sentence:
I am a sentence.
Into an array with 5 parts: "I", "am", "a", "sentence", and ".".
I'm currently using preg_split after trying explode, but I can't seem to find something suitable.
This is what I have tried:
$sentence = explode(" ", $sentence);
/*
returns array(4) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(9) "sentence."
}
*/
And this:
$sentence = preg_split("/[.?!\s]/", $sentence);
/*
returns array(5) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence"
[4]=>
string(0) ""
}
*/
How can this be done?
Recommended answer
You can split on word boundaries:
$sentence = preg_split("/(?<=\w)\b\s*/", 'I am a sentence.');
The lookbehind (?<=\w) requires a word character immediately before the split point, \b requires that point to be a word boundary, and \s* consumes any whitespace that follows. The string is therefore split at the end of each word, and the whitespace after it (if any) is discarded.
Output:
array(5) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence"
[4]=>
string(1) "."
}
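Note that the split above only happens at the end of a word, so punctuation followed by a space in the middle of a sentence (for example the comma in "Hello, world!") stays attached to the next word. As a rough alternative sketch, and not part of the original answer, the tokens can be matched instead of split. The helper name split_words_and_punctuation is hypothetical, and the pattern assumes ASCII text where every non-word, non-space character counts as one punctuation token.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the original answer).
// Instead of splitting, it collects every token that is either a run of
// word characters or a single punctuation character.
function split_words_and_punctuation(string $sentence): array
{
    // \w+      one or more word characters (a word)
    // [^\w\s]  one character that is neither a word character nor whitespace
    preg_match_all('/\w+|[^\w\s]/', $sentence, $matches);

    // $matches[0] holds the full-pattern matches in order of appearance.
    return $matches[0];
}

$parts = split_words_and_punctuation('I am a sentence.');
// $parts is ["I", "am", "a", "sentence", "."]
```

With this approach an apostrophe is treated as punctuation too, so a word like "don't" comes back as three tokens; adjust the pattern if contractions should stay whole.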