Matching whole words while ignoring affixes of words using regex

Question

I am learning a new language and I have created a DB with approx. 2500 words and 2500 examples of the words. I created a PHP/MySQL web UI that basically shows pictures for each word, and when you click one it plays the audio of the word. There is also a context menu that triggers a pop-up div, which matches and displays all examples where the word occurs.

I have been using REGEXP '[[:<:]]$word[[:>:]]', but there are several prefixes/suffixes that I want to filter out because they do not add any real meaning to the word (like the suffix -ing in English). One way I have gotten around this is by putting a hyphen in the word where the affix starts, so the regex still matches the word, but this isn't completely true to how the language handles the spelling. There are also combinations of words that I do not want to filter because the meaning is completely different. Without getting into specifics, here are some pseudo-examples, with the matched word written as just "WORD", the prefixes and suffixes I want to filter written as pre1, pre2, ... and suf1, suf2, ..., and the stuff I do not want to filter written as xxx:

1. Xxx xxx WORDsuf1 xxx xxx xxx.
2. Xxx xxx WORDsuf2 xxx xxx xxx.
3. Xxx xxx pre1WORDsuf1 xxx xxx xxx.
4. Xxx xxx WORD xxx xxx xxx.
5. Xxx xxx pre1WORD xxx xxx xxx.
6. Xxx xxx pre2WORDxxx xxx xxx xxx.
7. Xxx xxx xxxWORDxxx xxx xxx xxx.
8. Xxx xxx pre1WORDxxxsuf1 xxx xxx xxx.
9. Xxx xxx pre1xxxWORDsuf1 xxx xxx xxx.
10. Xxx xxx xxxWORDxxx xxx xxx xxx.
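As a baseline, the ten pseudo-examples can be run against a plain whole-word pattern. This is a minimal sketch in Python, whose `re` module is assumed here as a stand-in for MySQL's `REGEXP`; for ASCII text like this, `\b` should play the part of the `[[:<:]]`/`[[:>:]]` boundary markers. The names `EXAMPLES` and `matching_examples` are hypothetical.

```python
import re

# The ten pseudo-example sentences from the question, in order (1..10).
EXAMPLES = [
    "Xxx xxx WORDsuf1 xxx xxx xxx.",
    "Xxx xxx WORDsuf2 xxx xxx xxx.",
    "Xxx xxx pre1WORDsuf1 xxx xxx xxx.",
    "Xxx xxx WORD xxx xxx xxx.",
    "Xxx xxx pre1WORD xxx xxx xxx.",
    "Xxx xxx pre2WORDxxx xxx xxx xxx.",
    "Xxx xxx xxxWORDxxx xxx xxx xxx.",
    "Xxx xxx pre1WORDxxxsuf1 xxx xxx xxx.",
    "Xxx xxx pre1xxxWORDsuf1 xxx xxx xxx.",
    "Xxx xxx xxxWORDxxx xxx xxx xxx.",
]


def matching_examples(pattern):
    """Return the 1-based numbers of the examples in which `pattern` finds a match."""
    return [n for n, sentence in enumerate(EXAMPLES, start=1) if pattern.search(sentence)]


# Plain whole-word match: the Python analogue of REGEXP '[[:<:]]WORD[[:>:]]'.
plain = re.compile(r"\bWORD\b")
print(matching_examples(plain))  # [4]
```

Only example 4 matches: any affix glued to WORD removes the word boundary the pattern requires, so the affixed forms in 1, 2, 3 and 5 are missed.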

In the examples above, I want to match 1, 2, 3, 4 and 5, but I do not want to match 6, 7, 8, 9 or 10. I started by just adding OR clauses, for example:

REGEXP '[[:<:]]$word[[:>:]]|[[:<:]]$word$suffix[[:>:]]'

This works fine for one exception but with multiple exceptions it gets messy.
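The OR-clause approach can be generated rather than typed by hand, which also shows why it gets messy: with p prefixes and s suffixes there are (p + 1)(s + 1) spelled-out branches. A minimal Python sketch, assuming hypothetical affix lists `PREFIXES` and `SUFFIXES` in place of the real pre1/pre2 and suf1/suf2, and a hypothetical helper `or_clause_pattern`:

```python
import itertools
import re

# Hypothetical affix lists standing in for the question's pre1/pre2 and suf1/suf2.
PREFIXES = ["pre1", "pre2"]
SUFFIXES = ["suf1", "suf2"]


def or_clause_pattern(word, prefixes, suffixes):
    """Spell out every prefix/suffix combination as its own OR branch,
    the way the REGEXP 'A|B|...' approach does by hand."""
    branches = [
        r"\b" + re.escape(pre + word + suf) + r"\b"
        # "" stands for "no prefix" / "no suffix".
        for pre, suf in itertools.product([""] + prefixes, [""] + suffixes)
    ]
    return "|".join(branches)


pattern = or_clause_pattern("WORD", PREFIXES, SUFFIXES)
branch_count = pattern.count("|") + 1
print(branch_count)  # 9
```

Each new prefix adds a branch for every suffix option (and vice versa), so the alternation grows with the product of the two counts.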

Admittedly, I'm pretty inexperienced with regex, and most of what I manage to work out comes from simple examples that I have to read up on. Can this be done with a short and efficient regex?

Recommended answer

Is this what you are looking for?

(\b(pre1|pre2)?WORD(suf1|suf2)?\b)
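One way to confirm the pattern does what the question asks is to run it over the ten pseudo-examples. This Python sketch uses `re.search` as a stand-in for the PHP/MySQL matching (an assumption, not the answerer's code); `ANSWER_PATTERN` and `EXAMPLES` are hypothetical names.

```python
import re

# The answer's pattern: optional prefix group, the word, optional suffix group,
# wrapped in word boundaries so nothing else can be attached on either side.
ANSWER_PATTERN = re.compile(r"(\b(pre1|pre2)?WORD(suf1|suf2)?\b)")

# The ten pseudo-example sentences from the question, in order (1..10).
EXAMPLES = [
    "Xxx xxx WORDsuf1 xxx xxx xxx.",
    "Xxx xxx WORDsuf2 xxx xxx xxx.",
    "Xxx xxx pre1WORDsuf1 xxx xxx xxx.",
    "Xxx xxx WORD xxx xxx xxx.",
    "Xxx xxx pre1WORD xxx xxx xxx.",
    "Xxx xxx pre2WORDxxx xxx xxx xxx.",
    "Xxx xxx xxxWORDxxx xxx xxx xxx.",
    "Xxx xxx pre1WORDxxxsuf1 xxx xxx xxx.",
    "Xxx xxx pre1xxxWORDsuf1 xxx xxx xxx.",
    "Xxx xxx xxxWORDxxx xxx xxx xxx.",
]

matched = [n for n, sentence in enumerate(EXAMPLES, start=1) if ANSWER_PATTERN.search(sentence)]
print(matched)  # [1, 2, 3, 4, 5]
```

The `?` after each group makes the prefix and suffix optional, and the outer `\b` anchors reject a candidate when anything other than the listed affixes is attached, which is what excludes 6 through 10.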

If you want the whole line as the match, try the regex below and take it from the matched group at index 1:

(.*(\b(pre1|pre2)?WORD(suf1|suf2)?\b).*)
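A small Python sketch of the whole-line variant, assuming `re` as a stand-in for PCRE; the sample lines are taken from the question's examples and `LINE_PATTERN` is a hypothetical name. Group 1 holds the full line and group 2 the affixed word.

```python
import re

# Whole-line variant: group 1 is the entire line, group 2 the affixed word.
LINE_PATTERN = re.compile(r"(.*(\b(pre1|pre2)?WORD(suf1|suf2)?\b).*)")

line = "Xxx xxx pre1WORDsuf1 xxx xxx xxx."
m = LINE_PATTERN.search(line)
print(m.group(1))  # Xxx xxx pre1WORDsuf1 xxx xxx xxx.
print(m.group(2))  # pre1WORDsuf1

# A line from the "do not match" set yields no match at all.
print(LINE_PATTERN.search("Xxx xxx pre2WORDxxx xxx xxx xxx."))  # None
```

Because the leading `.*` is greedy, group 2 holds the last qualifying occurrence if WORD appears more than once on the same line.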

Use preg_match_all to get all the matched groups.
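When the word and its affixes come from the database, the pattern can be assembled from lists instead of being hard-coded, with each piece escaped (in PHP, `preg_quote` does that escaping). The sketch below is in Python and uses `re.finditer` as a rough analogue of `preg_match_all`, not the same API; `build_affix_regex` and its arguments are hypothetical names.

```python
import re


def build_affix_regex(word, prefixes, suffixes):
    """Hypothetical helper: compile a word-boundary pattern of the form
    optional prefix, word, optional suffix, from plain lists.

    re.escape keeps any regex metacharacters in the word or affixes literal.
    Group 1 captures the prefix and group 2 the suffix (None when absent).
    """
    prefix_alt = "|".join(re.escape(p) for p in prefixes)
    suffix_alt = "|".join(re.escape(s) for s in suffixes)
    return re.compile(rf"\b({prefix_alt})?{re.escape(word)}({suffix_alt})?\b")


pattern = build_affix_regex("WORD", ["pre1", "pre2"], ["suf1", "suf2"])

text = (
    "Xxx xxx pre1WORDsuf1 xxx xxx xxx.\n"
    "Xxx xxx WORD xxx xxx xxx.\n"
    "Xxx xxx pre2WORDxxx xxx xxx xxx.\n"
)

# Collect every match with its prefix/suffix groups, similar in spirit to preg_match_all.
hits = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(hits)  # [('pre1WORDsuf1', 'pre1', 'suf1'), ('WORD', None, None)]
```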
