Find lowercase immediately followed by uppercase

Views: 139 Published: 2018/5/28 19:45:00 Tags: regex text grep textwrangler

Problem description

My text is as follows:
<font size=+2 color=#F07500><b> [ba]</font></b>
<ul><li><font color =#0B610B> Word word wordWord word.<br></font></li></ul>
<ul><li><font color =#F07500> Word word word.<br></font></li></ul>
<ul><li><font color =#0B610B> Word word word wordWord.<br></font></li></ul>
<ul><li><font color =#0B610B> WordWord.<br></font></li></ul>
<br><font color =#E41B17><b>UPPERCASE LETTERS</b></font> 
<ul><li><font color =#0B610B> Word word wordWord word.<br></font><br><font color =#E41B17><b>PhD and dataBase</b></font> </li></ul>
<font color =#0B610B> Word word word.<br></font></li></ul><dd><font color =#F07500>     »» Word wordWord word.<br></font>
There is a lowercase letter immediately followed by an uppercase letter in each of the <font color =#0B610B>...</font> elements. For example:
<font color =#0B610B> Word word wordWord word.<br></font>
I want to correct this error by splitting them as follows (i.e., adding a colon and a space between them):
<font color =#0B610B> Word word word: Word word.<br></font>
So far, I have been using:
(<font color =#0B610B\b[^>]*>)(.*?</font>)
to select each instance of <font color =#0B610B>...</font>, and it works fine, finding the instances one at a time.

But when I use: 
(<font color =#0B610B\b[^>]*>)(.*?[a-z])([A-Z].*?</font>)
it does find matches, but it selects everything between <font color =#0B610B> and </font> on a line regardless of other font-color tags, and so replaces other, unwanted instances.

I want it to find and replace the error within each specific pair of tags, <font color =#0B610B>...</font>, rather than grabbing everything that starts with <font color =#0B610B> and ends with </font>.

Are there any regular expressions to solve this problem? Many thanks in advance.
Solution
In general, regex is not a good idea for parsing HTML (if it's a once-off you might be OK). 

I think this might be the reason your regex is not working.
Can you give an example of a case in which your regex fails? 

One case I can think of is when there is no match for [a-z][A-Z] within a <font color =#0B610B>...</font> pair, but there is one in a neighbouring <font>...</font> pair. For example:
<font color =#0B610B>word word</font><font color =#000000>word wordWord</font>
In this case, the only valid match runs across both elements: <font color =#0B610B>word word</font><font color =#000000>word word is consumed up to the lowercase letter, and Word</font> is the rest of the match. So this is what the regex matches (if it can match, it will!).
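The over-match described above can be reproduced with a short sketch. It uses Python's `re` module as a stand-in for the editor's grep dialect (an assumption, since the question does not name the engine), and the input string is a hypothetical one modeled on the example, written with the question's `color =` spacing so the pattern applies.

```python
import re

# The question's second pattern, unchanged.
PATTERN = re.compile(r"(<font color =#0B610B\b[^>]*>)(.*?[a-z])([A-Z].*?</font>)")

# Hypothetical input: the green element has no lowercase-uppercase pair,
# but the neighbouring element does.
text = (
    "<font color =#0B610B>word word</font>"
    "<font color =#000000>word wordWord</font>"
)

match = PATTERN.search(text)

# Print each capture group to see how far the lazy ".*?" had to stretch.
for number in (1, 2, 3):
    print(number, repr(match.group(number)))
```

Group 2 ends up holding the first closing tag and the second opening tag, which is why a replacement built from these groups edits the wrong element.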

I can think of a crude workaround, but I wouldn't recommend it unless this task is a once-off, because using regex for HTML is always prone to such errors. This regex is also pretty inefficient. Try (untested):
(<font color =#0B610B\b[^>]*>)(([^<]|<(?!/font))*?[a-z])([A-Z].*?</font>)
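It says: look for the <font color =#0B610B> tag, followed by any run of characters in which each one is either an angle bracket < not followed by /font or any character other than <, followed in turn by [a-z][A-Z].
So it tries to make sure that the match doesn't cross a </font> boundary. Note that the extra parentheses add a capture group, so the part starting at the uppercase letter is now group 4 rather than group 3.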
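As a rough check of this workaround, here is a minimal sketch in Python's `re` module (again an assumption about the engine). Two things are not from the answer: the inner alternation is made non-capturing so the groups stay numbered 1 to 3, and the replacement string `\1\2: \3` and the helper name `add_colons` are illustrative choices based on the output the question asks for.

```python
import re

# The answer's pattern with the inner alternation made non-capturing, so the
# groups are 1 (opening tag), 2 (text up to the lowercase letter) and
# 3 (uppercase letter through the closing tag).
TEMPERED = re.compile(
    r"(<font color =#0B610B\b[^>]*>)"
    r"((?:[^<]|<(?!/font))*?[a-z])"
    r"([A-Z].*?</font>)"
)


def add_colons(html: str) -> str:
    """Insert ': ' at the first lowercase-uppercase boundary in each green element."""
    return TEMPERED.sub(r"\1\2: \3", html)


# Two lines taken from the question's sample text. In the first line only the
# orange element contains "wordWord", so that line should stay untouched.
sample = (
    "<font color =#0B610B> Word word word.<br></font></li></ul>"
    "<dd><font color =#F07500>     »» Word wordWord word.<br></font>\n"
    "<ul><li><font color =#0B610B> Word word wordWord word.<br></font></li></ul>"
)

fixed = add_colons(sample)
print(fixed)
```

Only the first boundary inside each matching element is changed, so an element with several such errors would need the substitution applied repeatedly.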
