字符串搜索使用符号表中的罗伯特·Sedwick书 [英] string search using symbol tables in Robert Sedwick book

查看:141
本文介绍了字符串搜索使用符号表中的罗伯特·Sedwick书的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对罗伯特Sedwick阅读关于算法的C ++。在第12章有关的符号表下面的文本采取了。

该计划假定Item.cxx定义为char *数据重新presentation在项目,重载运算符&LT字符串键;使用STRCMP,使用STRNCMP重载运算符==,和转换操作符的项目为char *(见正文)。主程序读取从指定的文件的文本字符串,并使用一个符号表建立从由起始于在文本串的每个字符定义的字符串的索引。然后,它从标准输入读取查询字符串,并打印在查询的文本中找到(或找不到打印)的位置。用BST符号表实现中,搜索速度快,即使对于巨大字符串

 的#include< iostream.h时>
#包括< fstream.h>
#包括Item.cxx
#包括ST.cxx
静态字符文本[MAXN]
INT主(INT ARGC,字符* argv的[])
  {INT N = 0;图表;
    ifstream的主体; corpus.open(* ++ argv的);
    而(N< MAXN和放大器;&安培; corpus.get(t))的文本[N +] = T;
    文[N] = 0;
    ST<项目,重点> ST(MAXN);
    的for(int i = 0; I&n种;我++)st.insert(安培;文[I]);
    焦炭查询[MAXQ]项目X,V(查询);
    而(cin.getline(查询,MAXQ))
      如果((X = st.search(v.key()))。空())
           COUT<< 找不到:<<查询<< ENDL;
      否则COUT<<的X文字<< :&其中;&其中;查询<< ENDL; //这里的问题。
  }
 

  

以上程序读取一系列从标准输入查询,用途   搜索,以确定每个查询是否是在文本,并打印出   查询的第一次出现的文本位置。如果符号   表与BSTS实施,那么我们预计搜索将   涉及约2N LNñ比较。例如,一旦该指数是   建,我们可以找到一个文本的短语组成的约1   万字(如白鲸)约30串   比较。本申请是一样的索引,因为C   字符串的指针索引到一个字符数组:如果x点   文本[I]中,然后将两个指针之间的差,X文本,是   等于i。

我上面的文字问题是

  1. 如何上面的程序工作,以找到字符串中的符号表中笔者只存储字符?是否有程序错误。

  2. 正如在下面科芒特里认为如果x指向文本的两个指针之间[Ⅰ],然后差值,X文本,等于i。如何笔者总结的?

  3. 文本,我们可以发现在由约100万字符(如白鲸)约30字符串比较文本任何短语。在这里,笔者conlcuded我们能够找到词组在给定的文本字符串30 comparisions。因为我们需要2NlogN comparisions?

4.如果任何一个有机会获得预定如何笔者绘制的BST图12.11。索引的文本字符串?这是怎么联系到上述程序的实例。

谢谢!

解决方案

我没有文字,但我通过谷歌图书访问最相关的部分。

  1. 这是事实,文本是一个字符数组。然而,当搜索树 ST 而建,有什么插入的字符指针( st.insert(安培;文[I]))。你可以把的char * 为C字符串类型。所以搜索树运行在弦,而不是字符。

  2. X 指向正文[I] 意味着 X - 文= =我是C指针运算。

     字符文本[1];
    INT I = 0;
    字符* X =&放大器;文[I]
    INT II = X  - 文本;
    断言(我==二);
     

  3. 2N LN N是一个错字。如果你是指回12.6财产,你会看到,它应为2 LN N档,对于N等于一百万,为27.63,或约30。

  4. 我无法通过谷歌图书访问的身影12.11,所以我不能回答这个问题。

I am reading about Algorithms in C++ by Robert Sedwick. In chapter 12 about symbol tables following text is taken.

This program assumes that Item.cxx defines a char* data representation for string keys in items, an overloaded operator< that uses strcmp, an overloaded operator== that uses strncmp, and a conversion operator from Item to char* (see text). The main program reads a text string from a specified file and uses a symbol table to build an index from the strings defined by starting at each character in the text string. Then, it reads query strings from standard input, and prints the position where the query is found in the text (or prints not found). With a BST symbol-table implementation, the search is fast, even for huge strings.

#include <iostream.h>
#include <fstream.h>
#include "Item.cxx"
#include "ST.cxx"
static char text[maxN];
int main(int argc, char *argv[])
  { int N = 0; char t;
    ifstream corpus; corpus.open(*++argv);
    while (N < maxN && corpus.get(t)) text[N++] = t;
    text[N] = 0;
    ST<Item, Key> st(maxN);
    for (int i = 0; i < N; i++) st.insert(&text[i]);
    char query[maxQ]; Item x, v(query);
    while (cin.getline(query, maxQ))
      if ((x = st.search(v.key())).null())
           cout << "not found: " << query << endl;
      else cout << x-text << ": " << query << endl; // Question here.
  }

Above program reads a series of queries from standard input, uses search to determine whether each query is in the text, and prints out the text position of the first occurrence of the query. If the symbol table is implemented with BSTs, then we expect that the search will involve about 2N ln N comparisons. For example, once the index is built, we could find any phrase in a text consisting of about 1 million characters (such as Moby Dick) with about 30 string comparisons. This application is the same as indexing, because C string pointers are indices into a character array: If x points to text[i], then the difference between the two pointers, x-text, is equal to i.

My question on above text is

  1. How can above program work to find string in symbol table as author is storing only char? Is there is bug in program.

  2. As mentioned in below commentry that "If x points to text[i], then the difference between the two pointers, x-text, is equal to i." How author concluded this?

  3. Text "we could find any phrase in a text consisting of about 1 million characters (such as Moby Dick) with about 30 string comparisons." Here author conlcuded that we can find phrase in given text with 30 string comparisions. as we required 2NlogN comparisions?

4.If any one have access to book how author has drawn BST Figure 12.11. Example of indexing a text string"? And how this is linked to above program.

Thanks!

解决方案

I do not have the text, but I accessed most of the relevant parts through google books.

  1. It is true that text is an array of characters. However, when the search tree st is built, what is inserted are character pointers (st.insert(&text[i])). You can think of char* as the type of a C string. So the search tree operates on strings, not characters.

  2. x points to text[i] implies x - text == i is C pointer arithmetic.

    char text[1];
    int i = 0;
    char* x = &text[i];
    int ii = x - text;
    assert(i == ii);
    

  3. "2N ln N" is a typo. If you refer back to "Property 12.6" you will see that it should read "2 ln N" which, for N equals a million, is 27.63, or about 30.

  4. I could not access figure 12.11 through google books, so I cannot answer this one.

这篇关于字符串搜索使用符号表中的罗伯特·Sedwick书的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆