Match and replace emoticons in string - what is the most efficient way?
Question
Wikipedia defines a lot of possible emoticons people can use. I want to match this list to words in a string. I now have this:
$string = "Lorem ipsum :-) dolor :-| samet";
$emoticons = array(
    '[HAPPY]' => array(' :-) ', ' :) ', ' :o) '), //etc...
    '[SAD]' => array(' :-( ', ' :( ', ' :-| ')
);
foreach ($emoticons as $emotion => $icons) {
    $string = str_replace($icons, " $emotion ", $string);
}
echo $string;
Output:
Lorem ipsum [HAPPY] dolor [SAD] samet
So in principle this works. However, I have two questions:
- As you can see, I'm putting spaces around each emoticon in the array, such as ' :-) ' instead of ':-)'. This makes the array less readable in my opinion. Is there a way to store the emoticons without the spaces, but still match them against $string with spaces around them (and as efficiently as the code does now)?
Or is there perhaps a way to put the emoticons in one variable, and explode on space to check against $string? Something like
$emoticons = array(
    '[HAPPY]' => ">:] :-) :) :o) :] :3 :c) :> =] 8) =) :} :^)",
    '[SAD]' => ":'-( :'( :'-) :')" //etc...
);
- Is str_replace the most efficient way of doing this?
I'm asking because I need to check millions of strings, so I'm looking for the most efficient way to save processing time :)
Recommended answer
If the $string in which you want to replace emoticons is provided by a visitor to your site (I mean user input such as a comment), then you should not rely on there being a space before or after the emoticon. Also, there are at least a couple of emoticons that are very similar but different, like :-) and :-)). So I think you will get a better result if you define your emoticon array like this:
$emoticons = array(
    ':-)' => '[HAPPY]',
    ':)' => '[HAPPY]',
    ':o)' => '[HAPPY]',
    ':-(' => '[SAD]',
    ':(' => '[SAD]',
    // ...
);
Once you have filled in all the find/replace definitions, you should reorder this array so that there is no chance of :-) being replaced inside :-)). I believe sorting the array by the length of its keys (the emoticons), longest first, will be enough. This applies if you are going to use str_replace(); strtr() tries the longest match first automatically!
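The ordering concern can be sketched as follows. This is a minimal illustration, not a definitive implementation: the ':-))' entry and its '[VERY_HAPPY]' tag are assumptions added for the example and do not appear in the list above.

```php
<?php
// Hypothetical map for illustration; ':-))' and '[VERY_HAPPY]' are assumptions.
$emoticons = array(
    ':-)'  => '[HAPPY]',
    ':-))' => '[VERY_HAPPY]',
    ':-('  => '[SAD]',
);

$string = 'Lorem :-)) ipsum :-) dolor :-(';

// str_replace() applies the pairs in array order, so with this unsorted map
// ':-)' is replaced first and eats the start of ':-))'.
$unsorted = str_replace(array_keys($emoticons), array_values($emoticons), $string);

// Sort by key length, longest first, before calling str_replace().
$sorted = $emoticons;
uksort($sorted, function ($a, $b) {
    return strlen($b) - strlen($a);
});
$withSort = str_replace(array_keys($sorted), array_values($sorted), $string);

// strtr() with an array tries the longest key first, so no sorting is needed.
$withStrtr = strtr($string, $emoticons);

echo $unsorted, "\n";  // Lorem [HAPPY]) ipsum [HAPPY] dolor [SAD]
echo $withSort, "\n";  // Lorem [VERY_HAPPY] ipsum [HAPPY] dolor [SAD]
echo $withStrtr, "\n"; // Lorem [VERY_HAPPY] ipsum [HAPPY] dolor [SAD]
```

Note that str_replace() also rescans the output of earlier replacements, so a tag that itself contained an emoticon could be altered by a later pair; strtr() does not rescan replaced text.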
If you are concerned about performance, you can check strtr vs str_replace, but I suggest doing your own testing (you may get different results depending on your $string length and find/replace definitions).
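A rough way to run such a test yourself might look like the sketch below. The sample sentence, the count of 10000 strings, and the variable names are all assumptions chosen for illustration; real timings depend on your own data and PHP version, so no numbers are given here.

```php
<?php
// Synthetic input; sizes and contents are assumptions, not measurements.
$emoticons = array(
    ':-)' => '[HAPPY]',
    ':)'  => '[HAPPY]',
    ':o)' => '[HAPPY]',
    ':-(' => '[SAD]',
    ':('  => '[SAD]',
);
$strings = array_fill(0, 10000, 'Lorem ipsum :-) dolor :-( sit amet :o) consectetur');

// Extract the search/replace lists once, outside the timed loop.
$search  = array_keys($emoticons);
$replace = array_values($emoticons);

// Time str_replace() over all strings.
$start = microtime(true);
foreach ($strings as $s) {
    $a = str_replace($search, $replace, $s);
}
$strReplaceSeconds = microtime(true) - $start;

// Time strtr() over the same strings.
$start = microtime(true);
foreach ($strings as $s) {
    $b = strtr($s, $emoticons);
}
$strtrSeconds = microtime(true) - $start;

printf("str_replace: %.4f s\nstrtr:       %.4f s\n", $strReplaceSeconds, $strtrSeconds);
```

Running the loop several times on a representative sample of your real strings should give a more trustworthy comparison than a single pass.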
The easiest way will be if your "find definitions" don't contain trailing spaces:
$string = strtr($string, $emoticons);
// Build an alternation such as "HAPPY|SAD" from the unique replacement tags.
$tags = str_replace('][', '|', trim(join('', array_unique($emoticons)), '[]'));
// Strip white space around the word-styled emoticons.
$string = preg_replace('/\s*\[(' . $tags . ')\]\s*/', '[$1]', $string);
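The question also asks whether the emoticons can be kept as one space-separated string per tag. One possible sketch, assuming a hypothetical helper named buildEmoticonMap() (not part of PHP or of the original answer), expands such lists into the flat find/replace map that strtr() takes:

```php
<?php
// Hypothetical helper: expands "tag => space-separated emoticons" into
// the flat "emoticon => tag" map that strtr() expects.
function buildEmoticonMap(array $groups)
{
    $map = array();
    foreach ($groups as $tag => $list) {
        // Split on runs of whitespace and drop empty pieces.
        foreach (preg_split('/\s+/', $list, -1, PREG_SPLIT_NO_EMPTY) as $icon) {
            $map[$icon] = $tag;
        }
    }
    return $map;
}

$groups = array(
    '[HAPPY]' => ':-) :) :o)',
    '[SAD]'   => ':-( :( :-|',
);

// Build the map once, then reuse it for every string.
$map = buildEmoticonMap($groups);

$result = strtr('Lorem ipsum :-) dolor :-| samet', $map);
echo $result, "\n"; // Lorem ipsum [HAPPY] dolor [SAD] samet
```

Because the map is built once up front, the cost of exploding the lists is not paid again for each of the millions of strings.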