PHP: get 10 words around a search phrase
Question
I am trying to do the following:
Grab the 5 words before the search phrase (or Y, if only Y words are there) and the 5 words after it (or Y, if only Y words are there) from a block of text. By "words" I mean words or numbers, whatever is in the block of text.
For example:
The block of text: "Welcome to Stack Overflow! Visit your user page to set your name and email."
If you were to search for "visit your", it would return: "Welcome to Stack Overflow! Visit your user page to set your"
I tried this:
$preg_safe = str_replace(" ", "\s", preg_quote($search));
$pattern = "/(\w*\S\s+){0,8}\S*\b($preg_safe)\b\S*(\s\S+){0,8}/ix";
if (preg_match_all($pattern, $full_text, $matches))
{
    $result = str_replace(strtolower($search), "<span class='searched-for'>$search</span>", strtolower($matches[0][0]));
}
else
{
    $result = false;
}
It works if the search phrase is in English, but I need it to work in other languages as well. For example, it doesn't work for a Hebrew search phrase.
I tried changing the pattern to:
$pattern = "(*UTF8)/(\w*\S\s+){0,8}\S*\b($preg_safe)\b\S*(\s\S+){0,8}/i";
But that didn't work.
How can I make it work for other languages?
////////////////// EDIT //////////
As enrico.bacis suggested, I changed the pattern to:
$pattern = "/(\w\p{Hebrew}*\S\s+){0,20}\S*\b($preg_safe)\b\S*(\s\S+){0,20}/ixu";
Now it works for English and Hebrew search phrases, but the resulting text is cut off when it contains a special character (' for example).
How can I make the pattern return the text around the search phrase even if it contains special characters?
Recommended answer
Your problem is that \w does not match Hebrew characters: by default, \w is just a shortcut for a so-called "word" character, [A-Za-z0-9_].
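The difference can be checked with a small sketch. The Hebrew sample word is an assumption chosen for illustration (it is not from the question), the file is assumed to be saved as UTF-8, and the first check assumes PHP's default "C" locale.

```php
<?php
// Hypothetical sample word ("shalom"); not taken from the original question.
$word = 'שלום';

// Without the u modifier the subject is read byte by byte, and \w covers only
// [A-Za-z0-9_] (assuming the default "C" locale), so the Hebrew bytes fail.
$asciiOnly = preg_match('/^\w+$/', $word);

// With the u modifier and an explicit \p{Hebrew} class, the whole word matches.
$withHebrew = preg_match('/^[\w\p{Hebrew}]+$/u', $word);

var_dump($asciiOnly, $withHebrew);
```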
To make the regex also capture Hebrew characters, you only need to make two changes:
Add u to the modifiers to handle UTF-8 characters (so your modifiers will be /ixu).
Replace every occurrence of \w in your pattern with [\w\p{Hebrew}].
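Put together, the two changes might look like the sketch below. The helper name context_around() and its $maxWords parameter are hypothetical, introduced only for illustration. The sketch also passes the "/" delimiter to preg_quote() and drops the x modifier, since the pattern contains no literal whitespace. Both are assumptions on top of the answer, not part of it.

```php
<?php
// Sketch only: context_around() is a hypothetical helper, not from the original post.
// Returns the matched snippet, or false when the phrase is not found.
function context_around($search, $fullText, $maxWords = 5)
{
    // Escape regex metacharacters (and the "/" delimiter), then let each space
    // in the phrase match any run of whitespace.
    $pregSafe = str_replace(' ', '\s+', preg_quote($search, '/'));

    // Change 2: every \w from the question's pattern becomes [\w\p{Hebrew}].
    $word = '[\w\p{Hebrew}]';
    $n = (int) $maxWords;

    // Change 1: the u modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/(?:' . $word . '*\S\s+){0,' . $n . '}\S*\b(?:' . $pregSafe . ')\b\S*(?:\s+\S+){0,' . $n . '}/iu';

    return preg_match($pattern, $fullText, $m) === 1 ? $m[0] : false;
}

// The example text and search phrase from the question.
$text = 'Welcome to Stack Overflow! Visit your user page to set your name and email.';
echo context_around('visit your', $text), "\n";
```

Because the leading group still follows the question's \w*\S\s+ shape, this sketch does not address the follow-up about words containing characters such as '.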
You can also check here for more answers on this topic.