子串递归算法不起作用 [英] Substring recursive algorithm not working

查看:52
本文介绍了子串递归算法不起作用的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是我的第一个 C++ 课程的编程学生,最近我们被鼓励编写一个简单的递归函数来查找给定字符串中子字符串的第一次出现.如果找到,则返回索引.如果未找到子字符串,index_of() 函数应返回 -1.我们鼓励使用将索引作为其参数之一的辅助函数,这就是我所尝试的.

I'm a programming student in my first C++ class, and recently we were encouraged to write a simple recursive function to find the first occurrence of a substring in a given string. If found, it returns the index. If the substring is not found, the index_of() function should return -1. We are encouraged to use a helper function that takes the index as one of its parameters, and this is what I've tried.

例如:

int index_of("Mississippi", "sip"); // this would return a 6

这本应该是一个简单的练习,帮助我们理解递归,不会上交.我的教授说我们实际的递归作业会涉及更多,这就是为什么我很想了解这个简单的用法的递归.

This is supposed to be a simple exercise to help us understand recursion and won't be turned in. My professor stated that our actual assignment with recursion will be much more involved, which is why I really want to understand this simple use of recursion.

我已经使用 C 风格的字符串和指针成功地做到了这一点,但没有使用 C++ std::string 对象.我在我的程序中做错了什么?我的教授说我们应该可以轻松地在 5 分钟内写出这个,但我已经为此苦苦挣扎了两个小时.这是我到目前为止所做的:

I've done this successfully using C-style strings and pointers, but not with C++ std::string objects. What am I doing wrong in my program? My professor stated we should easily be able to write this in 5 mins, but I've been struggling with it for two hours. Here's what I've done so far:

int index_of(string s, string t)
{
    int index = 0;

    if (s[index] == NULL)
        return -1;
    else if (starts_with(s, t, ++index))
    {
        return index;
    }
    else 
        return index;
}

bool starts_with(string s, string t, int index)
{
    if (t[index] == NULL)
        return true;
    if ( s[index] == NULL || t[0] != s[index])
        return false;
    return starts_with(s, t, ++index);
}

正如所写,这个函数总是返回一个 index 为 1.

As written, this function always returns an index of 1.

推荐答案

int index_of(string s, string t)
{
    int index = 0;

    if (s[index] == NULL)

句号.这不是 C++ 字符串的工作方式,如果你想使用它们,你必须解决这个问题.即使使用 C 风格的字符串,也不要使用 NULL 来表示 ASCII 空字符.它们共享一个名称但具有不同的用途,您不应该使用 NULL 来表示整数零(字符是整数类型,空字符是它们的零值).使用 '\0' 或仅使用 if (s[index]).

Full stop. This isn't how C++'s strings work and you must fix this if you want to use them. Even with C-style strings, don't use NULL to mean the ASCII null character. They share a name but have different purposes, and you should not use NULL to mean integer zero (chars are integer types and the null character is their zero value). Use '\0' or just if (s[index]).

但是,除非您知道索引有效,否则不允许索引 std::string.为此,请将索引与 s.size() 进行比较(并确保它大于或等于 0).即便如此,您在这里真正要测试的是 s 是否为空,并且它有一个特殊的方法来做到这一点:

However, you aren't allowed to index a std::string unless you know the index is valid. To do that, compare the index against s.size() (and make sure it's greater than or equal to 0). Even so, what you are really testing here is if s is empty, and it has a special method to do that:

    if (s.empty())


继续:


Continuing:

    else if (starts_with(s, t, ++index))

增量和减量内部表达式,尤其是这里,可能会让初学者感到困惑,没有任何优势.它们的主要优点是代码简洁明了,但您必须先了解代码的主要部分,即使如此,有经验的程序员有时也会受益于稍微详细一点.

Increment and decrement inside expressions, especially as here, can be confusing for the beginner with no advantage. The main advantage of them is code that is succinct and clear, but you have to already understand the main part of the code first, and even then experienced programmers sometimes benefit from being a tiny bit more verbose.

有趣的是,Go 的创造者,他们也参与了早期 C 的历史,甚至将增量从表达式变成了语句,我相信清晰度是很大一部分原因.

Anecdotally, Go's creators, who were also involved in early C history, even turned increment from an expression into a statement, and I believe clarity is a large part of the reason.

你想用这个签名实现一个函数:

You want to implement a function with this signature:

int index_of(string haystack, string needle);
// returns the index of needle in haystack, if found
// otherwise returns -1

我特意将那些带有签名的注释包含在内:它们是此函数的公共接口的一部分.更好的参数名称也会增加清晰度.

I include those comments with the signature on purpose: they are part of the public interface for this function. Better parameter names also increase clarity.

确定您需要考虑的情况:

Identify the cases you need to consider:

  • 针是空的(您可以通过多种方式处理)
  • 干草堆是空的:返回 -1
  • 此时我们知道haystack和needle都不是空的
  • 剩下的两种情况是算法的关键:
    • haystack的第一个字符与needle的第一个字符不匹配
    • 匹配第一个字符

    当第一个字符匹配时,你有两种子情况:

    And when there is a match of the first character, you have two sub-cases:

    • needle 中没有更多字符:找到匹配
    • 还有更多字符:继续检查

    我已经将这些编写为递归算法,它接收新副本";每个字符串(和子字符串)而不是使用索引.但是,您可以通过更改第一个字符"来转换为使用索引.到当前字符",对于空"也类似.使适应.在这种情况下,您将需要使用 两个 索引(并且尝试只使用一个索引可能是您迄今为止的主要绊脚石),除非您有一个帮助函数来比较子字符串(尽管我'我不确定你的教授是否对这个评论有单独的意图).

    I've written these as a recursive algorithm which receives "new copies" of each string (and substring) instead of using indices. However, you can transform to use indices by changing "first character" to "current character", and similarly for the "empty" conditions. You will want to use two indices in that case (and trying to only use one may have been a major stumbling block for you so far), unless you have a helping function to compare substrings (though I'm unsure if your professor had a separate intention with this comment).

    将上述散文直接翻译成代码:

    A direct translation of the above prose into code:

    int index_of(string haystack, string needle) {
      if (needle.empty()) return 0;
      // this implementation considers empty substrings to occur at the start of any
      // string, even an empty haystack; you could also make it an error to call
      // index_of when needle is empty, or just return -1
    
      if (haystack.empty()) return -1;
    
      assert(!needle.empty() && !haystack.empty()); // I wouldn't normally include
      // this, since we just checked these conditions, but this is the "at this
      // point we know both haystack and needle are not empty" that I mentioned
    
      if (haystack[0] != needle[0]) {
        // mark A, see below
        int index = index_of(haystack.substr(1), needle);
        return index != -1 ? index + 1 : index;
      }
    
      if (needle.length() == 1) return 0; // found complete match
      // note the way I chose to handle needle.empty() above makes this unnecessary
    
      // mark B, see below    
      // partial match (of the first character), continue matching
      int index = index_of(haystack.substr(1), needle.substr(1)); // strip first
      return index == 0 ? 0 : -1;
      // must check index == 0 exactly, if -1 then we must return that, and if not 0
      // then we've found a "broken" needle, which isn't a real match
    }
    

    断针注释暗示了该代码的效率有多低,因为它将递归调用分为两类:必须在 1(分割成子串后为 0),在标记 B 处匹配,并且可以在任何地方匹配,在标记处答:我们可以用一个辅助函数来改进它,为此我将使用 std::string 的 operator== 重载(在 haystack 的子串上操作).这产生了经典的naive strstr"的递归等价物:

    The broken needle comment hints at how inefficient that code is, as it bifurcates the recursive calls into two categories: must match at 1 (which is 0 after slicing into substrings), at mark B, and can match anywhere, at mark A. We can improve this with a helper function, and I'll use std::string's operator== overload (operating on a substring of haystack) for that. This yields the recursive equivalent of the classical "naive strstr":

    int index_of(string haystack, string needle) {
      if (needle.empty()) return 0;
      if (haystack.empty()) return -1;
      if (haystack.substr(0, needle.length()) == needle()) {
        return 0;
      }
      int index = index_of(haystack.substr(1), needle);
      if (index != -1) index++;
      return index;
    }
    

    并且当使用 string::compare 作为 helper 的 haystack 索引时,因此不需要针索引:

    And when using an index for haystack with string::compare as the helper so a needle index isn't required:

    // might not be exposed publicly, but could be
    int index_of(string const& haystack, int haystack_pos, string const& needle) {
      // would normally use string const& for all the string parameters in this
      // answer, but I've mostly stuck to the prototype you already have
    
      // shorter local name, keep parameter name the same for interface clarity
      int& h = haystack_pos;
    
      // preconditions:
      assert(0 <= h && h <= haystack.length());
    
      if (needle.empty()) return h;
      if (h == haystack.length()) return -1;
      if (haystack.compare(h, needle.length(), needle) == 0) {
        return h;
      }
      return index_of(haystack, h+1, needle);
    }
    
    int index_of(string haystack, string needle) {
      // sets up initial values or the "context" for the common case
      return index_of(haystack, 0, needle);
    }
    

    注意这个版本是尾递归的,但这仍然是一个简单的算法和更高级的算法存在.

    Notice this version is tail-recursive, but this is still a naive algorithm and more advanced ones exist.

    如果我有更多的时间,我会写一封较短的信.
    — 西塞罗

    If I had more time, I would have written a shorter letter.
        — Cicero

    您说过这对您有很大帮助,但是,即使使用我刚刚包含的其他示例,对我来说似乎也有所欠缺.在我看来,子字符串搜索不是一个好的递归练习,这可能就是原因.

    You've said this is helped you a lot, but, even with the additional examples I just included, it seems lacking to me. Substring-search is not a good recursion exercise, in my opinion, and that could be why.

    这篇关于子串递归算法不起作用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆