regex match first and last word or any word

Views: 211 Posted: 2017/11/4 21:49:09 Tags: python regex file file-io

Problem description

I have a huge file with a list of data such as this:

    #fabulous 7.526 2301 2
    #excellent 7.247 2612 3
    #superb 7.199 1660 2
    #perfection 7.099 3004 4
    #terrific 6.922 629 1

I have a file containing a list of sentences like this:

    Terrific Theo Walcott is still shit, watch Rafa and Johnny deal with him on Saturday.
    its not that I'm a GSP fan, fabulous
    Iranian general says Israel's Iron Dome can't deal with their missiles
    with J Davlar 11th. Main rivals are team Poland.
I want to check with regex the following:

1. Whether the first word of each sentence matches any word in the data file. For example, whether Terrific, its, Iranian and with occur in the file.

2. Whether the last word of each sentence matches any word in the data file. For example, whether Saturday, fabulous, missiles and Poland occur in the file.

3. Whether the 2- or 3-character prefix or suffix of each word in a sentence matches the 2- or 3-character prefix or suffix of any word in the file. For example, whether Ter, its, Ira or wi matches the 2- or 3-character prefix of any word in the file. The same applies to suffixes.

I am new to regex. This is what I came up with, but it does not give the result I expect (term2.lower() is the first column in the file):

    wordanalysis["trail"] = found if re.match(sentence[-1],term2.lower()) else not(found)
    wordanalysis["lead"] = found if re.match(sentence[0],term2.lower()) else not(found)

Solution
Update: Following a suggestion by @justhalf, there is no need to use regex to split the words. Remove the .lower() calls (and the re.I flag) if you want a case-sensitive match.

This pattern matches the first word and the last word of each sentence in your list (excluding any punctuation or trailing whitespace):

    (^\s?\w+\b|(\b\w+)[\.?!\s]*$)

Matches:

    MATCH 1   1. Terrific
    MATCH 2   1. Saturday.   2. Saturday
    MATCH 3   1. its
    MATCH 4   1. fabulous    2. fabulous
    MATCH 5   1. Iranian
    MATCH 6   1. missiles    2. missiles
    MATCH 7   1. with
    MATCH 8   1. Poland.     2. Poland
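As a rough illustration of how the pattern behaves, the sketch below runs it over the four sample sentences from the question. It assumes the sentences are held in one multi-line string and that re.MULTILINE is set so that ^ and $ apply to each line; the helper name first_and_last_words is made up for this example and is not part of the original answer.

```python
import re

# Sample sentences from the question, one per line.
SENTENCES = """Terrific Theo Walcott is still shit, watch Rafa and Johnny deal with him on Saturday.
its not that I'm a GSP fan, fabulous
Iranian general says Israel's Iron Dome can't deal with their missiles
with J Davlar 11th. Main rivals are team Poland."""

# re.MULTILINE makes ^ and $ match at every line boundary, not only at the
# start and end of the whole string.
PATTERN = re.compile(r"(^\s?\w+\b|(\b\w+)[\.?!\s]*$)", re.MULTILINE)


def first_and_last_words(text):
    """Return (first_words, last_words) found by PATTERN in text (hypothetical helper)."""
    first_words, last_words = [], []
    for match in PATTERN.finditer(text):
        if match.group(2) is not None:
            # Second alternative: group 2 is the last word without punctuation.
            last_words.append(match.group(2))
        else:
            # First alternative: group 1 is the first word (plus optional leading space).
            first_words.append(match.group(1).strip())
    return first_words, last_words


firsts, lasts = first_and_last_words(SENTENCES)
print(firsts)  # ['Terrific', 'its', 'Iranian', 'with']
print(lasts)   # ['Saturday', 'fabulous', 'missiles', 'Poland']
```

Group 2 is only set when the second alternative matches, which is how the sketch tells last words apart from first words.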
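Implementation:

    import re
    import string

    # One sentence per line; the data file is read as a single string.
    with open("sentences.txt") as f:
        sentences = f.read().splitlines()
    with open("data.txt") as f:
        data = f.read()

    for line in sentences:
        words = line.strip().split()
        if not words:
            continue  # skip blank lines
        first = words[0].lower()
        # Strip punctuation such as the trailing "." from the last word.
        last = words[-1].translate(str.maketrans("", "", string.punctuation)).lower()
        # re.escape keeps characters in the word from being read as regex syntax.
        if re.search(re.escape(first), data, re.I):
            print("Found " + first + " in data.txt")
        if re.search(re.escape(last), data, re.I):
            print("Found " + last + " in data.txt")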
This probably isn't the most elegant way of doing it, but you get the idea.

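Code is tested and works, output is:

    Found Terrific in data.txt
    Found fabulous in data.txt

Note that with the .lower() calls from the update, the first line reads "Found terrific in data.txt" instead.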
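One caveat: calling re.search on the whole data file is a substring search, so a first word such as its would also be reported if the file contained a longer word like bits. If whole-word matches are wanted, one possible alternative is a set lookup without any regex. The sketch below is only an illustration under stated assumptions: each data row is on its own line, the term is the first column, and it carries a leading #. The names load_terms and check_first_and_last are hypothetical.

```python
import string


def load_terms(data_text):
    """Collect the first column of each data row, lowercased, without the leading '#'."""
    terms = set()
    for row in data_text.splitlines():
        columns = row.split()
        if columns:
            terms.add(columns[0].lstrip("#").lower())
    return terms


def check_first_and_last(sentence, terms):
    """Return (first_found, last_found) for one sentence, comparing whole words."""
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    if not words:
        return False, False
    return words[0] in terms, words[-1] in terms


# Sample rows from the question (assumed layout: one entry per line).
DATA = """#fabulous 7.526 2301 2
#excellent 7.247 2612 3
#superb 7.199 1660 2
#perfection 7.099 3004 4
#terrific 6.922 629 1"""

terms = load_terms(DATA)
print(check_first_and_last(
    "Terrific Theo Walcott is still shit, watch Rafa and Johnny deal with him on Saturday.",
    terms))  # (True, False)
print(check_first_and_last("its not that I'm a GSP fan, fabulous", terms))  # (False, True)
```

Building the set once also avoids rescanning a huge data file for every sentence.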
The implementation above does not cover your third criterion; try it out and see whether it works for you so far.
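For the third criterion, one possible reading is: take the 2- and 3-character prefixes and suffixes of every term in the data file, then check whether each word in a sentence shares a prefix or a suffix with that collection. The sketch below follows that reading only; the question leaves the exact rule open. All function names are hypothetical, and the term list is the sample from the question with the leading # already removed.

```python
import string


def affixes(word, lengths=(2, 3)):
    """Return (prefixes, suffixes) of the given lengths for one word."""
    word = word.strip(string.punctuation).lower()
    prefixes = {word[:n] for n in lengths if len(word) >= n}
    suffixes = {word[-n:] for n in lengths if len(word) >= n}
    return prefixes, suffixes


def build_affix_index(terms, lengths=(2, 3)):
    """Collect every prefix and suffix that occurs among the data-file terms."""
    all_prefixes, all_suffixes = set(), set()
    for term in terms:
        prefixes, suffixes = affixes(term, lengths)
        all_prefixes |= prefixes
        all_suffixes |= suffixes
    return all_prefixes, all_suffixes


def affix_matches(sentence, term_prefixes, term_suffixes):
    """Map each word of the sentence to (prefix_match, suffix_match)."""
    result = {}
    for raw in sentence.split():
        word = raw.strip(string.punctuation).lower()
        if not word:
            continue  # token was only punctuation
        prefixes, suffixes = affixes(word)
        result[word] = (bool(prefixes & term_prefixes),
                        bool(suffixes & term_suffixes))
    return result


# Sample terms from the question (assumed to be parsed from the first column).
TERMS = ["fabulous", "excellent", "superb", "perfection", "terrific"]
term_prefixes, term_suffixes = build_affix_index(TERMS)
print(affix_matches("Terrific Theo Walcott", term_prefixes, term_suffixes))
# {'terrific': (True, True), 'theo': (False, False), 'walcott': (False, False)}
```

Precomputing the prefix and suffix sets keeps each per-word check to a set intersection rather than a scan of the data file.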
