C: What is the best and fastest way to concatenate strings

Problem description

I currently concatenate strings in C using the strcat() function from the string.h library.

I thought about it and came to the conclusion that it must be a very expensive function, because before it starts to concatenate, it has to iterate over the char array until it finds the '\0' char.

For example, if I concatenate the string "horses" 1000 times using strcat(), I'll have to pay (1 + 2 + 3 + ... + 1000) * strlen("horses") = (1000*1001)/2 * 6 = 3003000
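To double-check the arithmetic above, here is a small C sketch of the cost model the question uses. It only counts the characters strcat() walks over (scanning for the existing '\0' plus copying the new text); it does not time real calls, and all function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Cost model from the question (an illustration, not a benchmark): one unit
   per character strcat() walks over, whether scanning for the existing '\0'
   or copying new text. The terminating '\0' is ignored, as in the question. */

/* Naive loop: strcat(dest, piece) rescans everything appended so far. */
static unsigned long cost_repeated_strcat(unsigned long piece_len,
                                          unsigned long times)
{
    unsigned long total = 0;
    unsigned long current_len = 0;      /* length of dest before each call */
    for (unsigned long i = 0; i < times; i++) {
        total += current_len;           /* walk to the existing '\0' */
        total += piece_len;             /* copy the new piece */
        current_len += piece_len;
    }
    return total;
}

/* Tracked length: strcat(dest + dest_len, piece) finds the '\0' at once,
   so only the copy is paid for. */
static unsigned long cost_tracked_length(unsigned long piece_len,
                                         unsigned long times)
{
    return piece_len * times;
}

/* Reproduce the question's numbers for "horses" appended 1000 times. */
static void check_question_numbers(void)
{
    unsigned long len = (unsigned long)strlen("horses");   /* 6 */
    assert(cost_repeated_strcat(len, 1000) == (1000UL * 1001 / 2) * 6);
    assert(cost_repeated_strcat(len, 1000) == 3003000UL);
    assert(cost_tracked_length(len, 1000) == 6000UL);
}
```

The quadratic term comes entirely from the `current_len` rescan; dropping it leaves the linear copy cost.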

I thought about a non-standard way: maintaining an integer with the string's length, and then passing strcat() a pointer to the end of the string:

strcat(dest + dest_len, "string");

In this case, I'll pay only 1000 * strlen("horses") = 1000 * 6 = 6000.
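The length-tracking idea can be sketched as a small helper. This is an illustration under stated assumptions: `append_tracked` is a hypothetical name, and the `cap` parameter (total size of `dest`) is added here only to guard against overflow.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for the question's approach: the caller keeps the
   current length in *dest_len, so strcat() starts at the existing '\0'
   instead of scanning from the start of dest. `cap` is the total size of
   dest in bytes, an addition for this sketch. */
static void append_tracked(char *dest, size_t *dest_len, size_t cap,
                           const char *src)
{
    size_t src_len = strlen(src);
    assert(*dest_len + src_len < cap);   /* room for src plus the '\0' */
    strcat(dest + *dest_len, src);       /* dest[*dest_len] is already '\0' */
    *dest_len += src_len;                /* keep the cached length in sync */
}
```

Since `src_len` is already known, `memcpy(dest + *dest_len, src, src_len + 1)` would do the same job without strcat() checking for the '\0' at all; strcat() is kept here only to mirror the question.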

6000 is much lower than 3003000, so this can matter a great deal for performance if you do a lot of such concatenations.

Is there a more standard way to do it that is better than my solution?

Solution

Joel Spolsky, in his Back to Basics article, describes the problem of inefficient string concatenation with strcat as Shlemiel the painter's algorithm (read the article; it's quite good). As an example of inefficient code, he gives the following, which runs in O(n²) time:

char bigString[1000];     /* I never know how much to allocate... */
bigString[0] = '\0';
strcat(bigString,"John, ");
strcat(bigString,"Paul, ");
strcat(bigString,"George, ");
strcat(bigString,"Joel ");

It's not really a problem to walk over the first string the first time; since we've already got to walk over the second string, the runtime of one strcat is linear in the length of the result. Multiple strcats are problematic, though, because we walk over the previously concatenated results again and again. He provides this alternative:

How do we fix this? A few smart C programmers implemented their own mystrcat as follows:

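char* mystrcat( char* dest, const char* src )
{
     while (*dest) dest++;                  /* walk to the end of dest */
     while ((*dest++ = *src++) != '\0')     /* copy src, including its '\0' */
          ;
     return --dest;                         /* step back onto the new '\0' */
}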

What have we done here? At very little extra cost we're returning a pointer to the end of the new, longer string. That way the code that calls this function can decide to append further without rescanning the string:

char bigString[1000];     /* I never know how much to allocate... */
char *p = bigString;
bigString[0] = '\0';
p = mystrcat(p,"John, ");
p = mystrcat(p,"Paul, ");
p = mystrcat(p,"George, ");
p = mystrcat(p,"Joel ");

This is, of course, linear in performance, not n-squared, so it doesn't suffer from degradation when you have a lot of stuff to concatenate.
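Applied to the question's own example (appending "horses" 1000 times), the pointer-returning approach might look like the sketch below. The `mystrcat` logic follows the version quoted above; the `const` qualifier, the hypothetical `repeat_append` helper, and its `cap` parameter are assumptions added for illustration.

```c
#include <assert.h>
#include <string.h>

/* Same logic as mystrcat above: append src to dest and return a pointer
   to the new terminating '\0'. */
static char *mystrcat(char *dest, const char *src)
{
    while (*dest) dest++;                  /* find the current end */
    while ((*dest++ = *src++) != '\0')     /* copy, including the '\0' */
        ;
    return --dest;                          /* back up onto the '\0' */
}

/* Hypothetical helper: append `piece` `times` times into buf (size cap).
   Because p always points at the '\0', the first loop in mystrcat exits
   immediately, so each call costs only about strlen(piece). */
static size_t repeat_append(char *buf, size_t cap, const char *piece,
                            unsigned times)
{
    size_t piece_len = strlen(piece);
    char *p = buf;
    buf[0] = '\0';
    for (unsigned i = 0; i < times; i++) {
        assert((size_t)(p - buf) + piece_len < cap);  /* keep room for '\0' */
        p = mystrcat(p, piece);
    }
    return (size_t)(p - buf);              /* final length, no strlen() needed */
}
```

A side benefit of returning the end pointer is that the final length falls out as `p - buf` without another scan.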

Of course, that is what you can do if you want to use standard C strings. The alternative you describe, caching the length of the string and using a special concatenation function (e.g., calling strcat with slightly different arguments), is a sort of variation on Pascal strings, which Joel also mentioned:

The designers of Pascal were aware of this problem and "fixed" it by storing a byte count in the first byte of the string. These are called Pascal Strings. They can contain zeros and are not null terminated. Because a byte can only store numbers between 0 and 255, Pascal strings are limited to 255 bytes in length, but because they are not null terminated they occupy the same amount of memory as ASCIZ strings. The great thing about Pascal strings is that you never have to have a loop just to figure out the length of your string. Finding the length of a string in Pascal is one assembly instruction instead of a whole loop. It is monumentally faster.
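The length-prefix idea described in the quote can be sketched in C roughly as follows. This is an illustration, not a standard library facility: the `pstring` type, `PSTR_MAX`, and the function names are hypothetical, and the one-byte length keeps the 255-byte limit the quote mentions.

```c
#include <assert.h>
#include <string.h>

/* Pascal-style string: byte 0 holds the length (0..255) and the characters
   follow, with no terminating '\0'. Names here are hypothetical. */
#define PSTR_MAX 255

typedef unsigned char pstring[PSTR_MAX + 1];

/* O(1): the length is read from the first byte, no loop needed. */
static size_t pstr_len(const unsigned char *s)
{
    return s[0];
}

/* Append n bytes; returns 0 on success, or -1 (leaving s unchanged) if the
   result would exceed 255 bytes. Cost depends only on n, not on s. */
static int pstr_append(unsigned char *s, const char *src, size_t n)
{
    size_t len = s[0];
    if (len + n > PSTR_MAX)
        return -1;
    memcpy(s + 1 + len, src, n);          /* write right after the old end */
    s[0] = (unsigned char)(len + n);
    return 0;
}
```

Because the data is not '\0'-terminated, such a buffer cannot be passed directly to the str* functions; that trade-off is what the quote is weighing.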

For a long time, if you wanted to put a Pascal string literal in your C code, you had to write:

char* str = "\006Hello!";

Yep, you had to count the bytes by hand, yourself, and hardcode it into the first byte of your string. Lazy programmers would do this, and have slow programs:

char str[] = "*Hello!";    /* an array, so its first byte may be overwritten */
str[0] = strlen(str) - 1;
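As an aside, C can compute that length byte at compile time: `sizeof` applied to a character array initialized from a literal counts the trailing '\0', so no strlen() loop is needed. A minimal sketch, assuming a C11 compiler (for `_Static_assert`); the macro name `DECLARE_PASCAL_STRING` is hypothetical and the literal must be at most 255 characters.

```c
#include <assert.h>

/* Hypothetical macro: declares an unsigned char array holding a placeholder
   byte followed by `lit`, then stores the text length in that first byte.
   sizeof(name) is a compile-time constant equal to 1 + strlen(lit) + 1,
   so the length is sizeof(name) - 2 and no loop runs at run time.
   The array still ends in '\0'; that byte is simply not counted. */
#define DECLARE_PASCAL_STRING(name, lit)                                  \
    unsigned char name[] = "*" lit;                                       \
    _Static_assert(sizeof("*" lit) - 2 <= 255, "Pascal string too long"); \
    name[0] = (unsigned char)(sizeof name - 2)
```

Usage inside a function: `DECLARE_PASCAL_STRING(greeting, "Hello!");` leaves `greeting[0] == 6`. This only works when the argument is a string literal, since it relies on literal concatenation with `"*"`.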
