找出一组字符串中最长的公共起始子串 [英] Find the longest common starting substring in a set of strings

查看:20
本文介绍了找出一组字符串中最长的公共起始子串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

针对相对微不足道的问题提出最优雅的 JavaScript、Ruby 或其他解决方案是一项挑战.

This is a challenge to come up with the most elegant JavaScript, Ruby or other solution to a relatively trivial problem.

这个问题是最长公共子串问题的一个更具体的案例.我只需要在数组中找到最长的公共 starting 子字符串.这大大简化了问题.

This problem is a more specific case of the Longest common substring problem. I need to only find the longest common starting substring in an array. This greatly simplifies the problem.

例如,[interspecies, interstelar, interstate] 中最长的子串是inters".但是,我不需要在 [specifics, terrific] 中找到ific".

For example, the longest substring in [interspecies, interstelar, interstate] is "inters". However, I don't need to find "ific" in [specifics, terrific].

我通过在 JavaScript 中快速编写解决方案作为我的关于类似 shell 的 tab-completion 的答案 (此处为测试页面).这是该解决方案,稍作调整:

I've solved the problem by quickly coding up a solution in JavaScript as a part of my answer about shell-like tab-completion (test page here). Here is that solution, slightly tweaked:

function common_substring(data) {
  var i, ch, memo, idx = 0
  do {
    memo = null
    for (i=0; i < data.length; i++) {
      ch = data[i].charAt(idx)
      if (!ch) break
      if (!memo) memo = ch
      else if (ch != memo) break
    }
  } while (i == data.length && idx < data.length && ++idx)

  return (data[0] || '').slice(0, idx)
}

代码在此 Gist 中可用以及在 Ruby 中的类似解决方案.您可以将 gist 克隆为 git repo 以进行试用:

This code is available in this Gist along with a similar solution in Ruby. You can clone the gist as a git repo to try it out:

$ git clone git://gist.github.com/257891.git substring-challenge

我对这些解决方案不太满意.我有一种感觉,它们可能会以更优雅和更少执行复杂性的方式解决 - 这就是我发布此挑战的原因.

I'm not very happy with those solutions. I have a feeling they might be solved with more elegance and less execution complexity—that's why I'm posting this challenge.

我将接受我认为最优雅或最简洁的解决方案作为答案.例如,这是我想出的一个疯狂的 Ruby hack——在 String 上定义 & 运算符:

I'm going to accept as an answer the solution I find the most elegant or concise. Here is for instance a crazy Ruby hack I come up with—defining the & operator on String:

# works with Ruby 1.8.7 and above
class String
  def &(other)
    difference = other.to_str.each_char.with_index.find { |ch, idx|
      self[idx].nil? or ch != self[idx].chr
    }
    difference ? self[0, difference.last] : self
  end
end

class Array
  def common_substring
    self.inject(nil) { |memo, str| memo.nil? ? str : memo & str }.to_s
  end
end

首选 JavaScript 或 Ruby 解决方案,但您可以展示其他语言的巧妙解决方案,只要您解释发生了什么.请仅使用标准库中的代码.

Solutions in JavaScript or Ruby are preferred, but you can show off clever solution in other languages as long as you explain what's going on. Only code from standard library please.

我选择了 JavaScript 排序解决方案 by kennebec 作为答案",因为它让我感到意外和天才.如果我们不考虑实际排序的复杂度(假设它被语言实现无限优化),解决方案的复杂度只是比较两个字符串.

I've chosen the JavaScript sorting solution by kennebec as the "answer" because it struck me as both unexpected and genius. If we disregard the complexity of actual sorting (let's imagine it's infinitely optimized by the language implementation), the complexity of the solution is just comparing two strings.

其他出色的解决方案:

  • "regex greed" by FM takes a minute or two to grasp, but then the elegance of it hits you. Yehuda Katz also made a regex solution, but it's more complex
  • commonprefix in Python — Roberto Bonvallet used a feature made for handling filesystem paths to solve this problem
  • Haskell one-liner is short as if it were compressed, and beautiful
  • the straightforward Ruby one-liner

感谢参与!从评论中可以看出,我学到了很多东西(甚至是关于 Ruby 的).

Thanks for participating! As you can see from the comments, I learned a lot (even about Ruby).

推荐答案

这是一个品味问题,但这是一个简单的 javascript 版本:它对数组进行排序,然后只查看第一项和最后一项.

It's a matter of taste, but this is a simple javascript version: It sorts the array, and then looks just at the first and last items.

//数组中最长的公共起始子串

//longest common starting substring in an array

function sharedStart(array){
    var A= array.concat().sort(), 
    a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;
    while(i<L && a1.charAt(i)=== a2.charAt(i)) i++;
    return a1.substring(0, i);
}

演示

sharedStart(['interspecies', 'interstelar', 'interstate'])  //=> 'inters'
sharedStart(['throne', 'throne'])                           //=> 'throne'
sharedStart(['throne', 'dungeon'])                          //=> ''
sharedStart(['cheese'])                                     //=> 'cheese'
sharedStart([])                                             //=> ''
sharedStart(['prefix', 'suffix'])                           //=> ''

这篇关于找出一组字符串中最长的公共起始子串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆