How to truncate a string in PHP to the word closest to a certain number of characters?


Question

I have a PHP code snippet that pulls a block of text from a database and sends it to a widget on a web page. The original text can be a lengthy article or just a sentence or two, but the widget can't display more than, say, 200 characters. I could use substr() to chop the text at 200 characters, but that would cut words off in the middle. What I really want is to cut the text at the end of the last word before the 200-character limit.

Solution

Use the wordwrap() function. It splits the text into multiple lines so that the maximum line width is the one you specify, breaking at word boundaries. After wrapping, you simply take the first line:

substr($string, 0, strpos(wordwrap($string, $your_desired_width), "\n"));
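Running the one-liner on a short sample makes the mechanism visible. This is a minimal sketch; the sample text and the width of 10 are arbitrary values chosen for illustration.

```php
<?php
$string = "The quick brown fox";
$your_desired_width = 10;

// wordwrap() replaces the space before "brown" with "\n",
// giving "The quick\nbrown fox".
$wrapped = wordwrap($string, $your_desired_width);

// Keep everything before the first newline, i.e. the first wrapped line.
$first_line = substr($string, 0, strpos($wrapped, "\n"));

echo $first_line, "\n"; // prints "The quick"
```

Taking substr() of the original $string works because, with the default single-character "\n" break and no cutting, wordwrap() only swaps spaces for newlines, so character positions in the wrapped and original strings line up.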

One thing this one-liner doesn't handle is text that is already shorter than the desired width: wordwrap() then inserts no newline, strpos() returns false, and substr() ends up returning an empty string. To handle this edge case, do something like:

if (strlen($string) > $your_desired_width) 
{
    $string = wordwrap($string, $your_desired_width);
    $string = substr($string, 0, strpos($string, "\n"));
}
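The guarded version can be packaged as a reusable function. In this sketch the name truncateAtWord() is hypothetical, and strpos() is checked for false explicitly instead of relying on substr() silently treating false as 0 (under declare(strict_types=1), passing false there would throw a TypeError). When wordwrap() inserts no break, for example because the text is one word longer than the width, it returns an empty string, which is what the guarded snippet produces in that case.

```php
<?php
// Hypothetical helper wrapping the guarded wordwrap() approach above.
function truncateAtWord(string $string, int $width): string
{
    // Text that already fits is returned unchanged.
    if (strlen($string) <= $width) {
        return $string;
    }

    $wrapped = wordwrap($string, $width);
    $pos = strpos($wrapped, "\n");

    // No newline: wordwrap() inserted no line break
    // (e.g. one word longer than $width), so return '' like the original code.
    if ($pos === false) {
        return '';
    }

    // Everything before the first newline is the first wrapped line.
    return substr($wrapped, 0, $pos);
}

echo truncateAtWord("The quick brown fox", 10), "\n"; // The quick
echo truncateAtWord("short", 10), "\n";               // short
```

Like the original snippet, it still cuts early when the text already contains a newline: truncateAtWord("1 3\n5 7 9 11 14", 10) returns "1 3".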


The above solution has the problem of cutting the text prematurely if it contains a newline before the actual cut point. Here is a version that solves this problem:

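function tokenTruncate($string, $your_desired_width) {
  // Split into words, keeping each whitespace run as its own part
  // (PREG_SPLIT_DELIM_CAPTURE) so newlines are preserved rather than used as cut points.
  // A limit of -1 means "no limit"; passing null is deprecated as of PHP 8.1.
  $parts = preg_split('/([\s\n\r]+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
  $parts_count = count($parts);

  // Add parts until the running length exceeds the desired width.
  $length = 0;
  $last_part = 0;
  for (; $last_part < $parts_count; ++$last_part) {
    $length += strlen($parts[$last_part]);
    if ($length > $your_desired_width) { break; }
  }

  // Keep only the parts that fit.
  return implode(array_slice($parts, 0, $last_part));
}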

Also, here is the PHPUnit test class used to test the implementation:

use PHPUnit\Framework\TestCase;

class TokenTruncateTest extends TestCase {
  public function testBasic() {
    $this->assertEquals("1 3 5 7 9 ",
      tokenTruncate("1 3 5 7 9 11 14", 10));
  }

  public function testEmptyString() {
    $this->assertEquals("",
      tokenTruncate("", 10));
  }

  public function testShortString() {
    $this->assertEquals("1 3",
      tokenTruncate("1 3", 10));
  }

  public function testStringTooLong() {
    $this->assertEquals("",
      tokenTruncate("toooooooooooolooooong", 10));
  }

  public function testContainingNewline() {
    $this->assertEquals("1 3\n5 7 9 ",
      tokenTruncate("1 3\n5 7 9 11 14", 10));
  }
}

EDIT:

Special UTF-8 characters such as 'à' are not handled correctly. Add the 'u' modifier at the end of the regex to handle them:

$parts = preg_split('/([\s\n\r]+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
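The u modifier makes the regex treat the input as UTF-8, but strlen() still counts bytes, so the width is measured in bytes rather than characters ('à' is two bytes in UTF-8). If the width should count characters, one option is to measure each part with mb_strlen(). This sketch assumes the mbstring extension is available; the name tokenTruncateMb() and the choice to return '' for invalid UTF-8 input are hypothetical.

```php
<?php
// Hypothetical multibyte-aware variant of tokenTruncate(); requires ext-mbstring.
function tokenTruncateMb(string $string, int $your_desired_width): string
{
    // Same split as above; the u modifier treats pattern and subject as UTF-8.
    $parts = preg_split('/([\s\n\r]+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

    // With /u, preg_split() returns false if $string is not valid UTF-8.
    // Returning '' in that case is an assumption, not part of the answer.
    if ($parts === false) {
        return '';
    }

    $length = 0;
    $last_part = 0;
    $parts_count = count($parts);
    for (; $last_part < $parts_count; ++$last_part) {
        // Count characters rather than bytes.
        $length += mb_strlen($parts[$last_part], 'UTF-8');
        if ($length > $your_desired_width) {
            break;
        }
    }

    return implode(array_slice($parts, 0, $last_part));
}

echo tokenTruncateMb("àà àà àà", 5), "\n"; // àà àà
```

With strlen() in place of mb_strlen(), the same call would stop after the first word ("àà "), because each "àà" is four bytes.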
