Regex won't match whitespace character with [\r\n\t\f\s]


Problem description

This is likely a very simple fix but I can't figure it out!

I'm trying to match (up to) 3 capitalized words in a row given the following text.

Russell Lake West. The match should include all 3 words.

This regex will match the first 2 words but not the third (demo here):

(([A-Z][a-z]+)\s{0,2}([A-Z][a-z]+)?\s{0,2}([A-Z][a-z]+)?)

This regex will match all 3 words, but I had to copy/paste the whitespace between Lake and West for it to work (demo here):

(([A-Z][a-z'-]+)\s{0,2}([A-Z][a-z'-]+)? \s{0,2}([A-Z][a-z'-]+)?)

                                       ^ pasted it here

So I assumed that maybe the whitespace isn't being treated as whitespace, but perhaps a newline character or similar, so I tried this (demo here):

[\r\n\t\f\s]West

But it doesn't recognize any of those characters before West, thus returning no results.
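One way to see what the "apparent whitespace" really is: print the code point of every non-letter character in the text. The sketch below is a hypothetical helper (`FindHiddenSpace` and `nonLetterCodePoints` are illustrative names), and it assumes the sample contains a non-breaking space (U+00A0) between "Lake" and "West", as the pasted character in the demo did.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FindHiddenSpace {
    // Returns the code points of every non-letter character in the text,
    // so separators that look like ordinary spaces can be told apart.
    static List<Integer> nonLetterCodePoints(String text) {
        return text.codePoints()
                .filter(cp -> !Character.isLetter(cp))
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumption: a non-breaking space (U+00A0) sits between "Lake" and "West".
        String text = "Russell Lake\u00A0West";
        for (int cp : nonLetterCodePoints(text)) {
            // Prints the hex code point and its Unicode character name.
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

The first separator is reported as U+0020 (an ordinary space) and the second as U+00A0, which is the character the regex fails on.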

Why can't regex101 or Java recognize this apparent whitespace between Lake and West? What's a reliable way to handle this?

Solution

There are many kinds of spaces. The one in your demo is a non-breaking space (U+00A0, decimal 160). It doesn't belong to \s (the whitespace character class), which in Java covers only ASCII whitespace by default; a non-breaking space is, as the name says, a space at which text should not be split into separate parts such as lines.
BTW, in Java \s already covers [ \t\n\x0B\f\r], so adding \r\n\t\f to the same class changes nothing.
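To check which characters each class accepts, the sketch below tests a handful of control and space characters against `\s` and `\p{Zs}` using `Pattern.matches`. The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

public class WhitespaceClasses {
    // True if the single character c matches the given regex character class.
    static boolean matches(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        // Space, tab, newline, vertical tab, form feed, carriage return, non-breaking space.
        char[] samples = {' ', '\t', '\n', '\u000B', '\f', '\r', '\u00A0'};
        for (char c : samples) {
            System.out.printf("U+%04X  \\s=%-5b  \\p{Zs}=%b%n",
                    (int) c, matches("\\s", c), matches("\\p{Zs}", c));
        }
    }
}
```

The last row (U+00A0) shows `\s` as false and `\p{Zs}` as true, while the control characters \t, \n, \x0B, \f and \r match `\s` but not `\p{Zs}`.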

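To match it, you can use the \p{Zs} class (the Unicode "space separator" category), which includes both the ordinary space and U+00A0.
You can also combine \s and \p{Zs} in one class, [\p{Zs}\s], written as "[\\p{Zs}\\s]" inside a Java string literal.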
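Applied to the question, a minimal sketch based on the second regex: each `\s{0,2}` is widened to `[\p{Zs}\s]{0,2}` (the redundant outer group is dropped). The class and method names are hypothetical, and the sample text assumes U+00A0 between "Lake" and "West".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreeCapitalizedWords {
    // Up to three capitalized words; separators may be any \s character
    // or any Unicode space separator such as U+00A0.
    static final Pattern WORDS = Pattern.compile(
            "([A-Z][a-z'-]+)[\\p{Zs}\\s]{0,2}([A-Z][a-z'-]+)?[\\p{Zs}\\s]{0,2}([A-Z][a-z'-]+)?");

    // Returns the first run of up to three capitalized words, or null if none.
    static String firstMatch(String text) {
        Matcher m = WORDS.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumption: a non-breaking space separates "Lake" and "West".
        String text = "Russell Lake\u00A0West.";
        System.out.println(firstMatch(text));
    }
}
```

With the original `\s{0,2}`, the same input stops after "Russell Lake". Alternatively, compiling with `Pattern.UNICODE_CHARACTER_CLASS` (Java 7+) makes `\s` itself match Unicode whitespace, which includes U+00A0.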
