How's my code?

Question

The following code is the heart of a program that I wrote to extract
HTML tags from a webpage. How efficient is my code? Is there still a
possible way to optimize it? Am I using everything as per the
textbook? I am just apprehensive about whether this may break or cause
a memory leak. Is there any chance of that?

#define TOKN_SIZE 256

void tagfinder() {

char ch, *tokens;
int i, j ,len;
i=j=0;
//scan buffer holds the webpage as a string.
len = strlen(scan_buffer);

while(i < len) {
ch = scan_buffer[i++];
if(ch == '<') {
tokens = malloc(TOKN_SIZE*sizeof(char));
j=0;
while(ch != '>') {
ch = scan_buffer[i++];
if(j >= TOKN_SIZE)
tokens = realloc(tokens, (j+TOKN_SIZE) * sizeof(char));
if(ch != '>') {
tokens[j++] = ch;
tokens[j] = '\0';
}

}// end of while(ch != '>')
printf("%s\n",tokens);
free(tokens);
}//end of if(ch == '<')
}//end of while(len > 0)

}

Recommended answers

saraca means ashoka tree wrote:
The following code is the heart of a program that I wrote to extract
HTML tags from a webpage. How efficient is my code? Is there still a
possible way to optimize it? Am I using everything as per the
textbook? I am just apprehensive about whether this may break or cause
a memory leak. Is there any chance of that?
[code snipped; see up-thread]




Before worrying about efficiency, worry about correctness.
You use malloc() and realloc() without checking for failure,
the way you use realloc() will cause a memory leak if realloc()
ever fails, the insertion of '\0' can run off the end of your
allocated region, an empty tag "<>" will leave you with a
non-string lacking the terminal '\0', and a '<' without a
matching '>' will send your code completely off the rails.

Once you've fixed these five bugs (and any others I didn't
happen to spot in your badly-indented code), you can start
measuring the performance of your program to see whether any
efficiency improvements are needed. Keep in mind that if it
takes you one hour to improve the speed by one millisecond,
you must run the program 3.6 million times just to break even.

If efficiency improvements are needed (as they well may be;
your code as it stands is far from tight), here are four
suggestions. Note that the C language itself has no notion of
"efficiency," so the actual effect of these suggestions will
vary from platform to platform. As a practical matter, all
four are likely to improve matters, but this is not guaranteed.
Again, you must measure.

Suggestion #1: Learn how to use the strchr() function,
because it can probably locate the '<' and '>' characters
faster than you can. Don't reinvent the wheel.
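
A minimal sketch of what Suggestion #1 might look like, assuming the page is held in a NUL-terminated string. The name count_tags and its parameter are hypothetical and not part of the original program; only strchr() itself is standard C. Note that an unmatched '<' simply ends the scan instead of reading past the end of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts complete "<...>" pairs in a NUL-terminated
 * buffer, letting strchr() do the character scanning. */
size_t count_tags(const char *buf)
{
    size_t count = 0;
    const char *open = buf;

    while ((open = strchr(open, '<')) != NULL) {
        const char *close = strchr(open + 1, '>');
        if (close == NULL)   /* unmatched '<': stop rather than run off the end */
            break;
        count++;
        open = close + 1;    /* resume just past the closing '>' */
    }
    return count;
}
```

Calling count_tags(scan_buffer) would then walk the whole page with two strchr() calls per tag.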

Suggestion #2: If all you need to do is print out the
substrings between '<' and '>', print them directly from the
source buffer and get rid of the malloc() and realloc() calls.
Learn how to use the "%.*s" format specification, or learn how
to use fwrite().
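
Suggestion #2 could be sketched roughly as follows, again assuming a NUL-terminated buffer. The function name print_tags and the choice to return a count are assumptions made for illustration. The "%.*s" precision argument must be an int, so the sketch casts the length; for tags that could exceed INT_MAX characters, fwrite() would be the safer choice.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints the text between each '<' and '>' on its own
 * line, straight from the source buffer, with no allocation at all.
 * Returns the number of tags printed. */
int print_tags(FILE *out, const char *buf)
{
    int printed = 0;
    const char *open = buf;

    while ((open = strchr(open, '<')) != NULL) {
        const char *start = open + 1;
        const char *close = strchr(start, '>');
        if (close == NULL)   /* unmatched '<': nothing more to print */
            break;
        /* The int precision limits how many characters "%.*s" prints. */
        fprintf(out, "%.*s\n", (int)(close - start), start);
        printed++;
        open = close + 1;
    }
    return printed;
}
```

An empty tag "<>" prints an empty line here, because the precision is 0 and no terminator is ever needed.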

Suggestion #3: If your real program needs to store the
substrings somewhere instead of just printing them out, don't
allocate memory until you've located the closing '>' and know
how much space you'll need. This avoids wasting memory when
you get a short substring, and avoids the overhead of realloc()
when you get a long one.

Suggestion #4: Learn how to use the memcpy() function,
because it can probably copy characters from the big string
to your destination area faster than you can. (It will
almost certainly do better than your current practice of
storing most destination positions twice!) Don't reinvent
the wheel.
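
Suggestions #3 and #4 fit together naturally: locate the closing '>' first, allocate exactly once, then copy with memcpy(). The sketch below is one way that might look. The name dup_first_tag and the convention that the caller frees the result are assumptions, not something taken from the original program.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed, NUL-terminated copy of the first
 * tag's contents (without the angle brackets), or NULL if buf holds no
 * complete tag or the allocation fails. The caller must free() the result. */
char *dup_first_tag(const char *buf)
{
    const char *open = strchr(buf, '<');
    if (open == NULL)
        return NULL;

    const char *start = open + 1;
    const char *close = strchr(start, '>');
    if (close == NULL)
        return NULL;

    size_t len = (size_t)(close - start);   /* known before allocating */
    char *copy = malloc(len + 1);           /* +1 for the terminator */
    if (copy == NULL)
        return NULL;

    memcpy(copy, start, len);               /* each byte is stored once */
    copy[len] = '\0';                       /* terminator written once */
    return copy;
}
```

Because the length is known up front, there is no realloc() path and therefore no way to leak on a failed reallocation.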

--
Er*********@sun.com




On Tue, 21 Dec 2004, saraca means ashoka tree wrote:

The following code is the heart of a program that I wrote to extract
html tags from a webpage. How efficient is my code? Is there still
possible way to optimize the code[?]

Of course.

Am I using everything as per the text book[?] I am just apprehensive
whether this may break or may cause a memory leak. Any chance for it[?]




Re-post your code with some indentation, and maybe someone will take
the trouble to look at it. Right now, it's completely unreadable.
(If your problem is that you're trying to post hard tabs to Usenet...
don't! I recommend
http://www.contrib.andrew.cmu.edu/~a...ftware/detab.c
, for obvious reasons. ;) Run 'detab -R -4 myprogram.c' and re-post.)

Make sure the text you're posting actually compiles, by the way.
What you posted in this message doesn't compile in either C90 or C99,
the languages discussed in this newsgroup. I strongly recommend you
don't use '//'-style comments in C code you intend to show anyone;
they tend to do Bad Things like

// this is a long comment that overflows the line and turns into a
syntax error

and every so often (though less and less frequently, thankfully) we see

file://this comment was mangled by a Windows-based news client

-Arthur


saraca means ashoka tree wrote:
The following code is the heart of a program that I wrote to extract
html tags from a webpage. How efficient is my code?

Better would be to not use malloc at all, and just remember
a pointer to the token start, and its length (or perhaps,
the offset into scan_buffer of the start and end).

The only reason to malloc would be if you needed to
destroy scan_buffer but keep the tokens, or if you
needed to pass the token to a function that cannot
handle a length-counted string (printf is not one of
those functions).
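
One possible shape for the "pointer plus length" idea, sketched under the assumption that scan_buffer stays alive and unmodified for as long as the tokens are used. The struct tag_span type and the next_tag function are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token type: a view into the scanned buffer, not a copy. */
struct tag_span {
    const char *start;   /* first character after '<' */
    size_t len;          /* number of characters before '>' */
};

/* Finds the next complete tag at or after *cursor. On success it fills *out,
 * advances *cursor past the closing '>' and returns 1. It returns 0 when no
 * complete tag remains, leaving *cursor and *out unchanged. */
int next_tag(const char **cursor, struct tag_span *out)
{
    const char *open = strchr(*cursor, '<');
    if (open == NULL)
        return 0;

    const char *close = strchr(open + 1, '>');
    if (close == NULL)
        return 0;

    out->start = open + 1;
    out->len = (size_t)(close - out->start);
    *cursor = close + 1;
    return 1;
}
```

A caller could loop with `while (next_tag(&p, &span))` and hand each span to printf("%.*s") or fwrite() without allocating anything.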

#define TOKN_SIZE 256

void tagfinder() {

char ch, *tokens;
int i, j ,len;
i=j=0;
//scan buffer holds the webpage as a string.
len = strlen(scan_buffer);

You forgot to include stdlib.h and string.h.

while(i < len) {
ch = scan_buffer[i++];
if(ch == '<') {
tokens = malloc(TOKN_SIZE*sizeof(char));

sizeof(char) is always 1.
You need to check malloc's return value. It will return NULL
if you have run out of memory.

j=0;
while(ch != '>') {
ch = scan_buffer[i++];
if(j >= TOKN_SIZE)
tokens = realloc(tokens, (j+TOKN_SIZE) * sizeof(char));

You need to check realloc's return value.
Also, if you run out of memory then realloc will return NULL,
and assigning that NULL to tokens leaks the original buffer. So to
avoid leaks in the case of a memory shortage you need to do
something like:

temp = realloc(.......);
if (!temp) { free(tokens); exit(EXIT_FAILURE); }
tokens = temp;

This is also bad design because you realloc one char
at a time. So if your token was 1024 in length then
you will end up doing 1024 - TOKN_SIZE allocations.
You could at least increase the allocation size by
TOKN_SIZE each time.

Even better would be to count the length of the token
before you do any allocations at all. Then you only
need to allocate once (a memory allocation is orders
of magnitude slower than scanning a token twice).
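
If a growable buffer is still wanted, the realloc() advice above might be sketched as the helper below. It is an illustration only: grow_buffer is a hypothetical name, and it doubles the capacity rather than adding TOKN_SIZE each time, which is a swapped-in growth policy and not what the reply proposes. The temporary pointer means a failed realloc() leaves the original block intact for the caller to free.

```c
#include <stdlib.h>

/* Hypothetical helper: makes sure *buf can hold at least `needed` bytes,
 * doubling the capacity so long tokens need few reallocations.
 * Returns 1 on success. On failure it returns 0 and leaves *buf and *cap
 * untouched, so the caller still owns the original block. */
int grow_buffer(char **buf, size_t *cap, size_t needed)
{
    size_t new_cap = (*cap != 0) ? *cap : 16;

    while (new_cap < needed) {
        if (new_cap > (size_t)-1 / 2)   /* doubling would overflow size_t */
            return 0;
        new_cap *= 2;
    }
    if (new_cap == *cap)
        return 1;                       /* already large enough */

    char *tmp = realloc(*buf, new_cap); /* never assign straight to *buf */
    if (tmp == NULL)
        return 0;

    *buf = tmp;
    *cap = new_cap;
    return 1;
}
```

A caller would check the return value before every write that might exceed the current capacity, and free the buffer itself if the call reports failure.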
if(ch != '>') {
tokens[j++] = ch;
tokens[j] = '\0';
}

You overflowed 'tokens'. For example, if j == TOKN_SIZE-1
then tokens[j++]=ch sets the last character to ch,
and then tokens[j]=0 writes past the end of the buffer.

This is inefficient anyway because you write a \0 every
time. You should only write the \0 once, after the
token is finished.
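
Both points can be seen in a small bounded copy loop, sketched here for a fixed-size destination. The name copy_tag_bounded and the decision to truncate over-long tags silently are assumptions made for the example. The loop always keeps one byte in reserve for the terminator, and the terminator is stored exactly once, after the loop.

```c
#include <stddef.h>

/* Hypothetical helper: copies characters from src, up to but not including
 * the first '>' (or the end of the string), into dst, which has room for
 * dst_size bytes. Truncates instead of overflowing and returns the number
 * of characters stored, not counting the terminator. */
size_t copy_tag_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t j = 0;

    if (dst_size == 0)
        return 0;        /* no room even for the terminator */

    /* j + 1 < dst_size keeps the last byte free for '\0'. */
    while (src[j] != '\0' && src[j] != '>' && j + 1 < dst_size) {
        dst[j] = src[j];
        j++;
    }
    dst[j] = '\0';       /* written once, after the token is finished */
    return j;
}
```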

}// end of while(ch != '>')
printf("%s\n",tokens);
free(tokens);
}//end of if(ch == '<')
}//end of while(len > 0)

}





