Most efficient way to match (a few) strings in C?

Question

Our system needs to accept user input from a terminal and match against a few known keyword strings (maybe 10).

We don't have the space/computrons to do regexp etc., code needs to be tiny & quick.

Now, the nasty way to do this is:

   // str is null-terminated, assume we know it's safe/sane here
   if(!strncmp(str,"hello",5))
   {
      do_hello();
   }
   else if(!strncmp(str,"world",5))
   {
      do_world();
   }
   else
   {
      meh(); // Wasn't a match
   }

So, after a bit of googling & reading I'm being convinced that a nicer way is to pre-compute the hash of the various matches as an int, and then just use a case statement:

// Assume hash() stops at the terminating NUL
switch(hash(str))
{
   case HASH_OF_HELLO:
      do_hello();
      break;

   case HASH_OF_WORLD:
      do_world();
      break;

   default:
      meh();
      break;
}

We can compute the *HASH_OF_match* at compile time. This seems potentially a faster / more elegant way to pick a string from a relatively small set.

So - does this seem reasonable? / Is there a glaring problem with doing this? / Anyone got a more elegant way of doing it?

As a footnote, this is the nicest-looking hash algorithm I've seen this afternoon ;), credited to Dan Bernstein; it looks up to the job at hand.

unsigned int
get_hash(const char* s)
{
    unsigned int hash = 0;
    int c;

    while((c = *s++))
    {
        // hash = hash * 33 ^ c 
        hash = ((hash << 5) + hash) ^ c;
    }

    return hash;
}
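
To make the switch-on-hash idea concrete, here is a minimal sketch, not a definitive implementation. It assumes a fixed-width variant of the function above (called `get_hash32` here, a hypothetical name) so that the constants do not depend on the width of `unsigned int`, and it returns hypothetical `enum keyword` codes in place of calling `do_hello()`, `do_world()` and `meh()`. The two constants were worked out offline for this exact loop and would have to be regenerated if the keywords or the hash change. Each case also confirms the match with `strcmp`, a common guard that is not part of the original snippet, so an unrelated string that happens to hash to the same value is not accepted.

```c
#include <stdint.h>
#include <string.h>

/* Same loop as get_hash() above, but on uint32_t so the constants
   below do not depend on the width of unsigned int. */
uint32_t get_hash32(const char *s)
{
    uint32_t hash = 0;
    unsigned char c;

    while ((c = (unsigned char)*s++) != 0) {
        hash = ((hash << 5) + hash) ^ c;   /* (hash * 33) ^ c */
    }
    return hash;
}

/* Precomputed offline with get_hash32(); regenerate if keywords change. */
#define HASH_OF_HELLO 120084546u   /* get_hash32("hello") */
#define HASH_OF_WORLD 140033218u   /* get_hash32("world") */

/* Hypothetical result codes standing in for do_hello()/do_world()/meh(). */
enum keyword { KW_NONE = 0, KW_HELLO = 1, KW_WORLD = 2 };

enum keyword classify(const char *str)
{
    switch (get_hash32(str)) {
    case HASH_OF_HELLO:
        /* The hash only narrows the choice; strcmp rules out collisions. */
        return strcmp(str, "hello") == 0 ? KW_HELLO : KW_NONE;
    case HASH_OF_WORLD:
        return strcmp(str, "world") == 0 ? KW_WORLD : KW_NONE;
    default:
        return KW_NONE;
    }
}
```

With the confirmation step, a hit costs one hash pass plus one `strcmp`, and a miss usually costs only the hash pass.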

Solution

The problem with hashing is that an arbitrary string entered by the user may generate the same hash as one of your matches and you'll execute the wrong stuff. For a search set as small as 10 I'd just stick to the if-else approach. Or use a string array and function pointer array (assuming all functions have the same signature) to select the function to execute.

char const *matches[10] = {"first", "second", ..., "tenth"};
void (*fn[10])(void) = {&do_first, &do_second, ..., &do_tenth};
int i;

for( i = 0; i < 10; ++i ) {
  if( strcmp( str, matches[i] ) == 0 ) {
    (*fn[i])();
    break;  /* keywords are distinct, so stop at the first match */
  }
}
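
A compilable variant of the same table-driven idea might look like the sketch below. It is only an illustration under stated assumptions: the handlers are hypothetical stubs that record which one ran in a `last_command` variable, only two keywords are listed, and the string and its function pointer are paired in one `struct command` row (a small departure from the two parallel arrays above) so that they cannot drift out of step.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handlers: each one just records that it ran. */
int last_command = 0;
void do_hello(void) { last_command = 1; }
void do_world(void) { last_command = 2; }
void meh(void)      { last_command = -1; }

/* One row per keyword keeps each string next to its handler. */
struct command {
    const char *name;
    void (*handler)(void);
};

static const struct command commands[] = {
    { "hello", do_hello },
    { "world", do_world },
};

/* Linear scan over the table; returns 1 on a match, 0 otherwise. */
int dispatch(const char *str)
{
    size_t i;

    for (i = 0; i < sizeof commands / sizeof commands[0]; ++i) {
        if (strcmp(str, commands[i].name) == 0) {
            commands[i].handler();
            return 1;
        }
    }
    meh();   /* not a match */
    return 0;
}
```

Calling `dispatch("hello")` runs `do_hello()` and returns 1, while an unknown string such as `"helloo"` falls through to `meh()` and returns 0. Adding a keyword means adding one row to `commands`.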
