Word Wrapping with Regular Expressions


Question

EDIT FOR CLARITY - I know there are ways to do this in multiple steps, or by using LINQ or vanilla C# string manipulation. I am using a single regex call because I wanted practice with complex regex patterns. - END EDIT

I am trying to write a single regular expression that will perform word wrapping. It's extremely close to the desired output, but I can't quite get it to work.

Regex.Replace(text, @"(?<=^|\G)(.{1,20}(\s|$))", "$1\r\n", RegexOptions.Multiline)

This is correctly wrapping words for lines that are too long, but it's adding a line break when there already is one.

Input

"This string is really long. There are a lot of words in it.\r\nHere's another line in the string that's also very long."



Expected Output

"This string is \r\nreally long. There \r\nare a lot of words \r\nin it.\r\nHere's another line \r\nin the string that's \r\nalso very long."



Actual Output

"This string is \r\nreally long. There \r\nare a lot of words \r\nin it.\r\n\r\nHere's another line \r\nin the string that's \r\nalso very long.\r\n"


Note the double "\r\n" between sentences where the input already had a line break and the extra "\r\n" that was put at the end.

Perhaps there's a way to conditionally apply different replacement patterns? I.E. If the match ends in "\r\n", use replace pattern "$1", otherwise, use replace pattern "$1\r\n".
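
One way to apply a replacement conditionally is to pass a callback instead of a replacement string. The sketch below is in Python rather than C#, purely as an illustration; in .NET the analogous tool would be the `Regex.Replace` overload that takes a `MatchEvaluator`. The function name `wrap` and the simplified pattern (no `\G` anchor, which Python's `re` module lacks) are assumptions of this sketch, not part of the question's code.

```python
import re

WIDTH = 20  # wrap width taken from the question's pattern


def wrap(text, width=WIDTH):
    """Wrap at whitespace, adding a break only where the input has none."""
    # Up to `width` characters followed by one whitespace character or the end.
    pattern = re.compile(r".{1,%d}(?:\s|$)" % width)

    def replacement(match):
        chunk = match.group(0)
        # The chunk already ends in a line break, or it is the final chunk
        # of the text: return it unchanged.
        if chunk.endswith("\n") or match.end() == len(text):
            return chunk
        # Otherwise this is a wrap point, so append a line break.
        return chunk + "\r\n"

    return pattern.sub(replacement, text)


if __name__ == "__main__":
    sample = ("This string is really long. There are a lot of words in it."
              "\r\nHere's another line in the string that's also very long.")
    print(repr(wrap(sample)))
```

For the input shown above, this should reproduce the expected output rather than the actual one. Note that, unlike a pattern anchored with `\G`, this simplified version leaves words longer than the wrap width unbroken.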

Here's a similar question, about wrapping a string with no white space, that I used as a starting point: "Regular expression to find unbroken text and insert space".

Recommended Answer

This was quick-tested in Perl.

Edit - This regex code simulates the word wrap used (good or bad) in MS-Windows Notepad.exe

 # MS-Windows  "Notepad.exe Word Wrap" simulation
 # ( N = 16 )
 # ============================
 # Find:     @"(?:((?>.{1,16}(?:(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,16})(?:\r?\n)?|(?:\r?\n|$))"
 # Replace:  @"$1\r\n"
 # Flags:    Global     

 # Note - Through trial and error, it appears Notepad accepts an extra whitespace
 # (possibly in the N+1 position) to help alignment. This does not matter because their viewport hides it.
 # There is no trimming of any whitespace, so the wrapped buffer could be reconstituted by inserting/detecting a
 # wrap point code which is different than a linebreak.
 # This regex works on un-wrapped source, but could probably be adjusted to produce/work on wrapped buffer text.
 # To reconstitute the source all that is needed is to remove the wrap code which is probably just an extra "\r".

 (?:
      # -- Words/Characters 
      (                       # (1 start)
           (?>                     # Atomic Group - Match words with valid breaks
                .{1,16}                 #  1-N characters
                                        #  Followed by one of 4 prioritized, non-linebreak whitespace
                (?:                     #  break types:
                     (?<= [^\S\r\n] )        # 1. - Behind a non-linebreak whitespace
                     [^\S\r\n]?              #      ( optionally accept an extra non-linebreak whitespace )
                  |  (?= \r? \n )            # 2. - Ahead a linebreak
                  |  $                       # 3. - EOS
                  |  [^\S\r\n]               # 4. - Accept an extra non-linebreak whitespace
                )
           )                       # End atomic group
        |  
           .{1,16}                 # No valid word breaks, just break on the N'th character
      )                       # (1 end)
      (?: \r? \n )?           # Optional linebreak after Words/Characters
   |  
      # -- Or, Linebreak
      (?: \r? \n | $ )        # Stand alone linebreak or at EOS
 )
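
As a rough cross-check, the same pattern can be tried in Python. This is a sketch under assumptions, not the answer's tested code: the name `notepad_wrap` is hypothetical, and the atomic group `(?>...)` is written as a plain non-capturing group so that it also runs on Python versions before 3.11. Since nothing after that group can fail, this should not change what is matched.

```python
import re

# Python rendering of the answer's pattern for N = 16.
# Assumption: (?>...) is replaced by (?:...) for pre-3.11 compatibility.
NOTEPAD_WRAP = re.compile(
    r"(?:((?:.{1,16}(?:(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,16})"
    r"(?:\r?\n)?|(?:\r?\n|$))"
)


def notepad_wrap(text):
    """Emit group 1 plus a line break for every match, like "$1\\r\\n"."""
    # Group 1 is unset when only a bare line break (or end of text) matched,
    # so fall back to an empty string in that case.
    return NOTEPAD_WRAP.sub(lambda m: (m.group(1) or "") + "\r\n", text)


if __name__ == "__main__":
    source = "EDIT FOR CLARITY - I know there are ways to do this in multiple steps"
    for line in notepad_wrap(source).split("\r\n"):
        print(repr(line))
```

For input that contains no line breaks, removing every inserted "\r\n" from the result gives back the original text, which is consistent with the note above that no whitespace is trimmed. Each output line holds at most 16 characters plus the one optional extra whitespace.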

Test case: the wrap width N is 16. The output matches Notepad's at this width and at a variety of other widths.

 $/ = undef;

 $string1 = <DATA>;

 $string1 =~ s/(?:((?>.{1,16}(?:(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,16})(?:\r?\n)?|(?:\r?\n|$))/$1\r\n/g;

 print $string1;

 __DATA__
 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
 bbbbbbbbbbbbbbbbEDIT FOR CLARITY - I                    know there are  ways to do this in   multiple steps, or using LINQ or vanilla C#
 string manipulation. 

 The reason I am using a single regex call, is because I wanted practice. with complex
 regex patterns. - END EDIT
 pppppppppppppppppppUf

Output >>

 hhhhhhhhhhhhhhhh
 hhhhhhhhhhhhhhh
 bbbbbbbbbbbbbbbb
 EDIT FOR CLARITY 
 - I              
       know there 
 are  ways to do 
 this in   
 multiple steps, 
 or using LINQ or 
 vanilla C#
 string 
 manipulation. 

 The reason I am 
 using a single 
 regex call, is 
 because I wanted 
 practice. with 
 complex
 regex patterns. 
 - END EDIT
 pppppppppppppppp
 pppUf
