Merging two Regular Expressions to Truncate Words in Strings
Question
I'm trying to come up with a function that truncates a string to whole words (if possible; otherwise it should truncate to characters):
function Text_Truncate($string, $limit, $more = '...')
{
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));
    // strlen(utf8_decode(...)) counts characters rather than bytes
    if (strlen(utf8_decode($string)) > $limit)
    {
        // First pass: cut at the last whitespace within the first $limit characters
        $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)~su', '$1', $string);
        if (strlen(utf8_decode($string)) > $limit)
        {
            // Second pass: no usable whitespace, so cut at exactly $limit characters
            $string = preg_replace('~^(.{' . intval($limit) . '}).*~su', '$1', $string);
        }
        $string .= $more;
    }
    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}
Here are some tests:
// Iñtërnâtiônàlizætiøn and then the quick brown fox... (49 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');
// Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_... (50 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');
They both work as is; however, if I drop the second preg_replace(), I get the following:
Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died....
I can't use substr() because it only works at the byte level, and I don't have access to mb_substr() at the moment. I've made several attempts to join the second regex with the first one, but without success.
Please help S.M.S., I've been struggling with this for almost an hour.
Sorry, I've been awake for 40 hours and shamelessly missed this:
$string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)?~su', '$1', $string);
Still, if someone has a more optimized regex (or one that ignores the trailing space), please share:
"Iñtërnâtiônàlizætiøn and then "
"Iñtërnâtiônàlizætiøn_and_then_"
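I still can't get rid of the trailing whitespace. Can anyone help me out?
Okay, none of my edits actually worked; RegexBuddy misled me. I should probably leave this for another day and get some sleep now. Closing up for today.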
Recommended answer
Perhaps I can give you a happy morning after a long night of RegExp nightmares:
'~^(.{1,' . intval($limit) . '}(?<=\S)(?=\s)|.{'.intval($limit).'}).*~su'
Broken down:
^            # start of string
(            # begin capture group 1
  .{1,x}     # match 1 to x characters
  (?<=\S)    # lookbehind: the match must end with a non-whitespace character
  (?=\s)     # lookahead: the next character must be whitespace
|            # otherwise try this:
  .{x}       # take exactly x characters anyway
)            # end capture group
.*           # match the rest of the string (since you were using replace)
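To see how the single pattern slots into the function from the question, here is a minimal sketch. It assumes the rest of Text_Truncate stays as the asker wrote it, except that the character count is taken with preg_match_all() instead of strlen(utf8_decode()). That swap is my own assumption (utf8_decode() is deprecated in recent PHP versions), not part of the original answer.

```php
<?php
// Sketch only: the question's function with the combined regex swapped in.
function Text_Truncate($string, $limit, $more = '...')
{
    $limit  = intval($limit);
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

    // Assumption: count characters by matching "." in UTF-8 mode,
    // in place of the original strlen(utf8_decode($string)).
    if (preg_match_all('~.~su', $string, $chars) > $limit) {
        // Alternative 1: up to $limit chars, ending on a word boundary.
        // Alternative 2: exactly $limit chars when no boundary is available.
        $pattern = '~^(.{1,' . $limit . '}(?<=\S)(?=\s)|.{' . $limit . '}).*~su';
        $string  = preg_replace($pattern, '$1', $string) . $more;
    }

    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}

// Usage: decode the entities again to inspect the truncated text.
$spaced = Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped over the lazy dog', 50);
$joined = Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_over_the_lazy_dog', 50);
echo html_entity_decode($spaced, ENT_QUOTES, 'UTF-8'), "\n";
echo html_entity_decode($joined, ENT_QUOTES, 'UTF-8'), "\n";
```

Because the lookbehind requires the captured text to end on a non-whitespace character, the first result should stop right after "fox" with no trailing space, while the second falls through to the fixed-length alternative.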
You could always add |$ to the end of (?=\s), giving (?=\s|$), but since your code was already checking that the string length was longer than $limit, I didn't feel that case would be necessary.
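As a rough illustration of when the |$ variant would matter, the sketch below applies both patterns directly to a string that is already shorter than the limit, that is, without the length check the function performs first. The variable names are made up for the example.

```php
<?php
$limit = 50;

// Pattern from the answer, and the same pattern with |$ added to the lookahead.
$withoutEnd = '~^(.{1,' . $limit . '}(?<=\S)(?=\s)|.{' . $limit . '}).*~su';
$withEnd    = '~^(.{1,' . $limit . '}(?<=\S)(?=\s|$)|.{' . $limit . '}).*~su';

$short = 'hello world';

// Without |$ the lookahead cannot succeed at the end of the string,
// so the engine backtracks to the previous word boundary.
$a = preg_replace($withoutEnd, '$1', $short);

// With |$ the end of the string counts as a valid stopping point.
$b = preg_replace($withEnd, '$1', $short);

var_dump($a); // string(5) "hello"
var_dump($b); // string(11) "hello world"
```

So the extra alternative only becomes relevant if the regex is ever used without the surrounding length check.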