Regex to match text between commas


Question

I'm going nuts trying to get a regex to detect keyword spam in user input. Usually there is some normal text at the start and the keyword spam at the end, separated by commas or other characters.

What I need is a regex that counts the keywords, so that the text can be flagged for a human to check.

The text usually looks like this:

[random text, with commas, dots and all]

keyword1, keyword2, keyword3, keyword4, keyword5,
Keyword6, keyword7, keyword8...

I've tried several regexes to count the matches:

- This one only matches every other keyword:

[,-](\w|\s)+[,-]

- This one also matches the random text:

(?:([^,-]*)(?:[^,-]|$))

Can anyone tell me a regex that does this? Or should I take a different approach?

Thanks!

Recommended answer

I think the difficulty is that the random text can also contain commas.

If the keywords are all on one line, and that line is the last line of the text as a whole, trim the whole text to remove the newline characters at the end. Then take the text from the last newline character to the end. This should be the string containing the keywords. Once you have this part singled out, you can explode the string on commas and count the parts.

<?php
$string = " some gibberish, some more gibberish, and random text

keyword1, keyword2, keyword3

";

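// Trim once and reuse the result, so the offset from strrpos() refers to the same string passed to substr().
$text = trim($string);
$lastEOL = strrpos($text, PHP_EOL);
// Without any line break, treat the whole text as the keyword line.
$keywordLine = $lastEOL === false ? $text : substr($text, $lastEOL + strlen(PHP_EOL));
// Trim each part and drop empty ones (e.g. after a trailing comma).
$keywords = array_filter(array_map('trim', explode(',', $keywordLine)), 'strlen');

echo "Number of keywords: " . count($keywords);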

I know it is not a regex, but I hope it helps nevertheless.
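If a regex is still preferred for the counting step, the sketch below may help. It assumes the keyword line has already been singled out as described above; the sample string and variable names are made up for illustration. It also shows why the first pattern from the question only finds every other keyword: each match consumes the closing delimiter, so that delimiter cannot open the next match. A lookahead, or simply splitting on the delimiter, avoids this.

```php
<?php
// Sample keyword line (assumed to be already separated from the random text).
$keywordLine = "keyword1, keyword2, keyword3, keyword4, keyword5";

// Pattern from the question: the closing [,-] is consumed by each match,
// so it is no longer available as the opening delimiter of the next keyword.
$consuming = preg_match_all('/[,-](\w|\s)+[,-]/', $keywordLine);

// Variant with a lookahead: the closing delimiter is checked but not consumed.
// ^ and $ let the first and last keyword match without a surrounding delimiter.
$lookahead = preg_match_all('/(?:^|[,-])[\w\s]+(?=[,-]|$)/', $keywordLine);

// Alternative: split on the delimiter and count the non-empty parts.
$parts = preg_split('/\s*[,-]\s*/', $keywordLine, -1, PREG_SPLIT_NO_EMPTY);

echo $consuming, PHP_EOL;     // 2
echo $lookahead, PHP_EOL;     // 5
echo count($parts), PHP_EOL;  // 5
```

Note that, as in the question's patterns, a hyphen is treated as a delimiter here, so a hyphenated keyword would be counted as two.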

The only way to solve this is to find something that separates the random text from the keywords and that does not occur within the keywords themselves. If the keywords can contain a line break, a single line break cannot serve as the separator. But what about two consecutive line breaks? Or some other sequence of characters?

$string = " some gibberish, some more gibberish, and random text

keyword1, keyword2, keyword3,
keyword4, keyword5, keyword6,
keyword7, keyword8, keyword9

";

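// Trim once so the offset from strrpos() refers to the same string passed to substr().
$text = trim($string);
$separator = PHP_EOL . PHP_EOL; // 2 end of lines after random text
$lastBreak = strrpos($text, $separator);
// Without a blank line, fall back to treating the whole text as the keyword block.
$keywordBlock = $lastBreak === false ? $text : substr($text, $lastBreak + strlen($separator));
// Trim each part and drop empty ones (e.g. after a trailing comma).
$keywords = array_filter(array_map('trim', explode(',', $keywordBlock)), 'strlen');

echo "Number of keywords: " . count($keywords);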

(edit: added an example with the keywords spread over several lines; a long shot)
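To turn the count into a flag for human review, the idea could be wrapped roughly as follows. This is only a sketch under assumptions: the function names and the limit of 5 keywords are hypothetical, and it assumes the keyword block is the last part of the text after a blank line. It uses `\R` in `preg_split()` so that `\n`, `\r\n` and `\r` line endings are all handled, regardless of the value of `PHP_EOL` on the server.

```php
<?php
// Hypothetical helper: returns the keywords found in the last blank-line-separated block.
function trailingKeywords(string $text): array
{
    // Split into blocks separated by one or more blank lines; \R matches \n, \r\n or \r.
    $blocks = preg_split('/\R\s*\R/', trim($text));
    $lastBlock = end($blocks);
    // Split on commas, trim whitespace, and drop empty parts (e.g. after a trailing comma).
    $parts = array_map('trim', explode(',', $lastBlock));
    return array_values(array_filter($parts, 'strlen'));
}

// Hypothetical rule: flag the text when the last block holds more than $limit keywords.
function looksLikeKeywordSpam(string $text, int $limit = 5): bool
{
    return count(trailingKeywords($text)) > $limit;
}

$spam = "some gibberish, some more gibberish, and random text\n\n"
      . "keyword1, keyword2, keyword3,\nkeyword4, keyword5, keyword6,\n"
      . "keyword7, keyword8, keyword9\n";
$normal = "Just a short message, nothing else.";

echo count(trailingKeywords($spam)), PHP_EOL;  // 9
var_dump(looksLikeKeywordSpam($spam));         // bool(true)
var_dump(looksLikeKeywordSpam($normal));       // bool(false)
```

When the text contains no blank line, the whole text is treated as the last block, so a long message with many commas could be flagged as well. Since the result only marks the text for a human to check, that may be acceptable, but the limit would need tuning against real input.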
