Match and replace emoticons in string - what is the most efficient way?

Question

Wikipedia defines a lot of possible emoticons people can use. I want to match this list against words in a string. I now have this:

$string = "Lorem ipsum :-) dolor :-| samet";
$emoticons = array(
  '[HAPPY]' => array(' :-) ', ' :) ', ' :o) '), //etc...
  '[SAD]'   => array(' :-( ', ' :( ', ' :-| ')
);
foreach ($emoticons as $emotion => $icons) {
  $string = str_replace($icons, " $emotion ", $string);
}
echo $string;

Output:

Lorem ipsum [HAPPY] dolor [SAD] samet

So in principle this works. However, I have two questions:

  1. As you can see, I'm putting spaces around each emoticon in the array, such as ' :-) ' instead of ':-)'. This makes the array less readable in my opinion. Is there a way to store the emoticons without the spaces, but still match them against $string with spaces around them (and as efficiently as the code does now)?

Or is there perhaps a way to put the emoticons in one variable, and explode on spaces to check against $string? Something like

$emoticons = array( '[HAPPY]' => ">:] :-) :) :o) :] :3 :c) :> =] 8) =) :} :^)", '[SAD]' => ":'-( :'( :'-) :')" //etc...

  2. Is str_replace the most efficient way of doing this?

I'm asking because I need to check millions of strings, so I'm looking for the most efficient way to save processing time :)

Recommended answer

If the $string in which you want to replace emoticons is provided by a visitor to your site (that is, it is user input such as a comment), then you should not rely on there being a space before or after the emoticon. Also, there are at least a couple of emoticons that are very similar but different, like :-) and :-)). So I think you will get a better result if you define your emoticon array like this:

$emoticons = array(
    ':-)' => '[HAPPY]',
    ':)' => '[HAPPY]',
    ':o)' => '[HAPPY]',
    ':-(' => '[SAD]',
    ':(' => '[SAD]',
    // ...
);

Once you have filled in all the find/replace definitions, you should reorder this array so that there is no chance of :-)) being partially replaced by the entry for :-). I believe sorting the array by key length, longest first, will be enough. This matters only if you are going to use str_replace(); strtr() tries the longest keys first automatically!
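The ordering point can be illustrated with a minimal sketch. The ':-))' => '[VERY_HAPPY]' entry and the variable names $viaStrReplace and $viaStrtr are assumptions made up for this example; they are not part of the answer's array.

```php
<?php
// Hypothetical sample map; the ':-))' => '[VERY_HAPPY]' entry is assumed for illustration.
$emoticons = array(
    ':-)'  => '[HAPPY]',
    ':-))' => '[VERY_HAPPY]',
    ':-('  => '[SAD]',
);

// Sort by key length, longest first, so ':-))' is handled before ':-)'.
uksort($emoticons, function ($a, $b) {
    return strlen($b) - strlen($a);
});

$string = "Lorem :-)) ipsum :-) dolor :-(";

// str_replace() applies the search/replace pairs one after another, in array order.
$viaStrReplace = str_replace(array_keys($emoticons), array_values($emoticons), $string);

// strtr() tries the longest matching key first, whatever the array order is.
$viaStrtr = strtr($string, $emoticons);

echo $viaStrReplace, "\n";
echo $viaStrtr, "\n";
```

Without the uksort() call, str_replace() would process ':-)' first and leave a stray ')' behind the tag, while strtr() would be unaffected.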

If you are concerned about performance, you can look at comparisons of strtr vs str_replace, but I suggest running your own tests (you may get different results depending on your $string length and find/replace definitions).
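A rough way to run such a test yourself is sketched below. The sample string, the emoticon map, and the iteration count are all assumptions chosen for illustration, so the timings only mean something once you substitute your own data.

```php
<?php
// Assumed sample data; substitute real strings and the full emoticon list.
$emoticons = array(
    ':-))' => '[VERY_HAPPY]',
    ':-)'  => '[HAPPY]',
    ':)'   => '[HAPPY]',
    ':-('  => '[SAD]',
    ':('   => '[SAD]',
);
$string = str_repeat("Lorem ipsum :-) dolor :-( sit :) amet :-)) ", 50);
$iterations = 2000; // arbitrary; raise it for steadier numbers

// Keys are already ordered longest first, which str_replace() needs.
$search  = array_keys($emoticons);
$replace = array_values($emoticons);

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $a = str_replace($search, $replace, $string);
}
$strReplaceSeconds = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $b = strtr($string, $emoticons);
}
$strtrSeconds = microtime(true) - $start;

printf("str_replace: %.4f s\nstrtr:       %.4f s\n", $strReplaceSeconds, $strtrSeconds);
```

Both loops produce the same output string here, so the comparison is like for like; which function comes out ahead may differ with the PHP version and the data.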

The easiest approach applies if your "find definitions" don't contain trailing spaces:

$string = strtr( $string, $emoticons ); // the longest keys are tried first
// build a regex alternation such as "HAPPY|SAD" from the unique replacement tags
$tags = str_replace( '][', '|', trim( join( array_unique( $emoticons ) ), '[]' ) );
$string = preg_replace( '/\s*\[(' . $tags . ')\]\s*/', '[$1]', $string ); // stripping white space around word-styled emoticons
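As for the question's idea of keeping the emoticons as space-separated strings, one possible sketch is to explode each list once and flatten it into the emoticon => tag map used above. The $definitions name and the shortened lists are assumptions for this example.

```php
<?php
// Space-separated definitions, as proposed in the question (shortened sample).
$definitions = array(
    '[HAPPY]' => ":-) :) :o) :] =)",
    '[SAD]'   => ":-( :( :'-( :'(",
);

// Flatten into the emoticon => tag map that strtr() expects.
$emoticons = array();
foreach ($definitions as $tag => $icons) {
    foreach (explode(' ', $icons) as $icon) {
        $emoticons[$icon] = $tag;
    }
}

$string = "Lorem ipsum :-) dolor :'( samet";
$result = strtr($string, $emoticons);
echo $result, "\n";
```

The map only has to be built once, so when millions of strings are processed the explode() cost is negligible next to the replacements themselves.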
