Code golf: "Color highlighting" of repeated text


Problem description

(Thanks to greg0ire below for helping with key concepts)

The challenge: Build a program that finds all repeated substrings and "tags" them with color attributes (effectively highlighting them in XML).

Rules:

  1. This should only be done for substrings of length 2 or more.
  2. Substrings are just strings of consecutive characters, which may include non-alphabetic characters. Note that spaces and other punctuation do not delimit substrings.
  3. Character casing cannot be ignored.
  4. The "highlight" should be done by tagging the substring in XML. Your tagging should be of the form <TAG#>theSubstring</TAG#> where # is a positive number unique to that substring and identical substrings.
  5. The priority of the algorithm is to find the longest substring, not how many times it matches within the text.
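
The rules above can be turned into a small checker for candidate outputs. The following is a minimal sketch in Python, not part of the original challenge: the name `check_tagging` is hypothetical, and it assumes tags are never nested and that the input text does not itself contain literal `<TAG#>` markers. It checks rules 1 and 4 and confirms that removing the tags gives back the input unchanged.

```python
import re

# Matches <TAGn>body</TAGn>; the backreference \1 forces the closing
# number to equal the opening one. The body is matched lazily.
TAG_RE = re.compile(r"<TAG(\d+)>(.*?)</TAG\1>", re.DOTALL)


def check_tagging(original, tagged, min_len=2):
    """Return True if `tagged` is a plausible tagging of `original`.

    Assumptions (not from the challenge text): tags are not nested and
    `original` contains no literal <TAGn> markers.
    """
    seen = {}  # tag number -> the substring it marks
    for number, body in TAG_RE.findall(tagged):
        if int(number) < 1 or len(body) < min_len:
            return False  # rule 1: length 2 or more; rule 4: positive number
        if seen.setdefault(number, body) != body:
            return False  # rule 4: one number, one substring
    if len(set(seen.values())) != len(seen):
        return False  # rule 4: identical substrings share one number
    # Stripping every tag must reproduce the input exactly.
    return TAG_RE.sub(lambda m: m.group(2), tagged) == original
```

For instance, `check_tagging("abab", "<TAG1>ab</TAG1><TAG1>ab</TAG1>")` is accepted, while a tagging that gives the two copies of `ab` different numbers is rejected. The checker does not verify rule 5 (longest substring first), since that depends on the search strategy rather than on the shape of the output.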

Note: The order of the tagging shown in the example below is not important. It's just used by the OP for clarity.

Example input:

LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook.


A partially correct output (the OP may not have tagged every repeat in this example):

<TAG1>LoremIpsum</TAG1>issimply<TAG2>dummytext</TAG2>of<TAG5>the</TAG5><TAG3>print</TAG3>ingand<TAG4>type</TAG4>setting<TAG6>industry</TAG6>.<TAG1>LoremIpsum</TAG1>hasbeen<TAG5>the</TAG5><TAG6>industry</TAG6>'sstandard<TAG2>dummytext</TAG2>eversince<TAG5>the</TAG5>1500s,whenanunknown<TAG3>print</TAG3>ertookagalleyof<TAG4>type</TAG4>andscrambledittomakea<TAG4>type</TAG4>specimenbook.


Your code should be able to handle edge cases, such as the following:

Example input 2:

hello!TAG!</hello.TAG.</

Example output 2:

<TAG1>hello</TAG1>!<TAG2>TAG</TAG2>!<TAG3></</TAG3><TAG1>hello</TAG1>.<TAG2>TAG</TAG2>.<TAG3></</TAG3>


The winner:

  • Most elegant solution wins (judged by others' comments and upvotes)
  • Bonus points/consideration for solutions utilizing shell scripting

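Minor clarifications:

  • Input can be hardcoded or read from a file
  • The criterion remains "elegance". It is a bit vague, but it also covers plain character/line count. Comments and/or votes from others also show how the SO community views the challenge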

Recommended answer

Perl 206, 189, 188, 199, 157 chars

excluding original string and last print.

 # New algorithm that produces correct outputs for many cases



    push@z,q/LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook/;

    push@z,q/oktooktobookokm/;
    push@z,q!dino1</dino2</!;
    push@z,q!dino1TAG2dino3TAG!;

    ## loop for tests doesn't count
    for $z(@z) {
    print "input : $z\n";
    $i=0;@r=();
    #### begin count
    $c=127;$l=length($_=$z);while($l>1){if(/(.{$l}).*\1/){push@r,$1;++$c;s/$1/chr($c)/eg}else{$l--}}$z=$_;map{++$i;$x=chr(127+$i);$z=~s:$x:<TAG$i>$_</TAG$i>:g}@r
    #### end count 157 chars
    ## output doesn't count
    ;print "output : $z\n","="x80,"\n"
    }

__END__
perl tags2.pl
input : LoremIpsumissimplydummytextoftheprintingandtypesettingindustry.LoremIpsumhasbeentheindustry'sstandarddummytexteversincethe1500s,whenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbook
output : <TAG1>LoremIpsum</TAG1>i<TAG11>ss</TAG11><TAG12>im</TAG12>ply<TAG2>dummytext</TAG2><TAG6>oft</TAG6><TAG13>he</TAG13><TAG4>print</TAG4><TAG7>ing</TAG7><TAG8>and</TAG8><TAG5>types</TAG5>e<TAG14>tt</TAG14><TAG7>ing</TAG7><TAG3>industry</TAG3>.<TAG1>LoremIpsum</TAG1>hasbe<TAG15>en</TAG15><TAG9>the</TAG9><TAG3>industry</TAG3>'<TAG11>ss</TAG11>t<TAG8>and</TAG8>ard<TAG2>dummytext</TAG2>ev<TAG16>er</TAG16>since<TAG9>the</TAG9>1500s,w<TAG13>he</TAG13>nanunknown<TAG4>print</TAG4><TAG16>er</TAG16>t<TAG10>ook</TAG10>agal<TAG17>le</TAG17>y<TAG6>oft</TAG6>y<TAG18>pe</TAG18><TAG8>and</TAG8>scramb<TAG17>le</TAG17>di<TAG14>tt</TAG14>omakea<TAG5>types</TAG5><TAG18>pe</TAG18>c<TAG12>im</TAG12><TAG15>en</TAG15>b<TAG10>ook</TAG10>
================================================================================
input : oktooktobookokm
output : <TAG1>okto</TAG1><TAG1>okto</TAG1>bo<TAG2>ok</TAG2><TAG2>ok</TAG2>m
================================================================================
input : dino1</dino2</
output : <TAG1>dino</TAG1>1<TAG2></</TAG2><TAG1>dino</TAG1>2<TAG2></</TAG2>
================================================================================
input : dino1TAG2dino3TAG
output : <TAG1>dino</TAG1>1<TAG2>TAG</TAG2>2<TAG1>dino</TAG1>3<TAG2>TAG</TAG2>
================================================================================
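
For readers who find the golfed Perl hard to follow, here is an ungolfed Python sketch of the same longest-first idea. It is an illustration under stated assumptions, not a translation of the answer's code: the names `find_repeat` and `tag_repeats` are hypothetical, and instead of swapping matches for placeholder characters it keeps the text as a list of tagged and untagged pieces, so a later match can never span an earlier tag.

```python
def find_repeat(pieces, length):
    """Return the first untagged substring of `length` characters that
    occurs again later without overlapping, or None if there is none."""
    plain = [s for s, tag in pieces if tag is None]
    for k, s in enumerate(plain):
        for i in range(len(s) - length + 1):
            cand = s[i:i + length]
            # Second occurrence later in the same piece (non-overlapping)...
            if s.find(cand, i + length) != -1:
                return cand
            # ...or anywhere in a later untagged piece.
            if any(cand in later for later in plain[k + 1:]):
                return cand
    return None


def tag_repeats(text, min_len=2):
    """Tag repeated substrings, longest first (hypothetical helper)."""
    pieces = [(text, None)]   # (substring, tag number or None)
    next_tag = 1
    length = len(text) // 2   # two non-overlapping copies must fit
    while length >= min_len:
        cand = find_repeat(pieces, length)
        if cand is None:
            length -= 1       # nothing repeats at this length; go shorter
            continue
        new_pieces = []
        for s, tag in pieces:
            if tag is not None:
                new_pieces.append((s, tag))
                continue
            # Split the untagged piece around every copy of the candidate.
            for n, part in enumerate(s.split(cand)):
                if n:
                    new_pieces.append((cand, next_tag))
                if part:
                    new_pieces.append((part, None))
        pieces = new_pieces
        next_tag += 1
    return "".join(
        s if tag is None else f"<TAG{tag}>{s}</TAG{tag}>" for s, tag in pieces
    )


print(tag_repeats("oktooktobookokm"))
# <TAG1>okto</TAG1><TAG1>okto</TAG1>bo<TAG2>ok</TAG2><TAG2>ok</TAG2>m
```

On the three short test inputs shown above this sketch produces the same output as the Perl program. It may differ on the long Lorem Ipsum input, because untagged pieces are searched separately here rather than as one string with placeholders.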
