理论问题 [英] Theoretical Problem

查看:65
本文介绍了理论问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经完成了一项任务,即开发一个程序,坐在一个基于cgi

的c程序旁边(我知道这是offtopic但我的问题仅涉及

代码的标准c部分。


基本上程序需要做的是,打开一个文本文件,包含

数百甚至数千项目,并删除重复的项目。

此外,如果行长度超过一定数量的字符,我希望

删除它。


我正在考虑使用malloc创建一个动态生成的数组

来加载元素,然后循环遍历所有删除重复的

项目。


考虑到我处理的文件大小,这似乎有点资源。有没有人对如何更好地处理

手头的任务有什么建议?


提前谢谢

Mick

I have been given the task, of developing a program to sit next to a cgi
based c program (I know this is offtopic but my question does only refer
to the standard c part of the code).

Basically what the program needs to do, is open a text file, containing
hundreds maybe even thousands of items, and remove duplicate items.
Also if the line length exceds a certian amount of characters, I wish to
remove it.

I was thinking of using malloc to create a dynamically generated array
to load the elements into, then loop through them all removing duplicate
items.

This seems a little resource heavy considering the size of the file I am
dealing with. Does anyone have any suggestions on how to better handle
the task at hand?

Thanks in advance
Mick

推荐答案

具体化< ma ********** @ privacy.net>写道:
Materialised <ma**********@privacy.net> writes:
基本上该程序需要做的是打开一个包含数百甚至数千个项目的文本文件,并删除重复的项目。另外如果行长超过了一定数量的字符,我希望
删除它。
Basically what the program needs to do, is open a text file,
containing hundreds maybe even thousands of items, and remove
duplicate items.
Also if the line length exceds a certian amount of characters, I wish
to remove it.




我认为你最好结合现有的工具来做到这一点。

例如,在大多数操作系统上你可以使用`sort<输入| uniq -c>

输出''或类似的东西,以删除重复项。

-

有些人*是*傲慢,其他人阅读常见问题解答。

--Chris Dollin



I think you''d be best off combining existing tools to do this.
For example, on most OSes you could use `sort < input | uniq -c >
output'', or something similar, to do the removal of duplicates.
--
"Some people *are* arrogant, and others read the FAQ."
--Chris Dollin


物化< ma ****** ****@privacy.net>写道:
Materialised <ma**********@privacy.net> wrote:
有没有人对如何更好地处理手头的任务有任何建议?
Does anyone have any suggestions on how to better handle
the task at hand?




假设你不能使用现有的工具,正如Ben建议的那样,我不能认为

更好的方法来解决你的问题。


我建议使用动态数据结构当你在每个项目中读取
时,你插入项目并保持一切排序。


当你插入时,你也可以检查看看是否是一个

重复并中止插入。


这样,你只需要对数据进行一次传递即可删除

重复,一旦完成,你就可以把所有东西都写回来。


至于数据结构,我可能会建议使用红黑树。
< br $> b $ b -

== Eric Gorr ========= http://www.ericgorr.net ========= ICQ:9293199 ===

因此,对智能的考虑总是包括

福利和伤害。 - 孙子

==侮辱,就像暴力一样,是无能的最后避难所... ===



Assuming you cannot use existing tools, as Ben suggested, I cannot think
of a better method to solve your problem.

I would suggest using a dynamic data structure such that as you read in
each item, you insert the item and keep everything sorted.

As you do the insert then, you can also check to see if it is a
duplicate and abort the insert.

This way, you only have to do one pass over the data to remove the
duplicates and once complete you can just write everything back out.

As for the data structure, I might suggest using a red-black tree.

--
== Eric Gorr ========= http://www.ericgorr.net ========= ICQ:9293199 ===
"Therefore the considerations of the intelligent always include both
benefit and harm." - Sun Tzu
== Insults, like violence, are the last refuge of the incompetent... ===


例如************* @ verizon.net (Eric)写道:
eg*************@verizon.net (Eric) writes:
我建议使用动态数据结构,当你在每个项目中读到时,你插入项目并保持一切排序。

当您执行插入操作时,您还可以检查它是否重复并中止插入。

这样,您只需要对数据进行一次传递即可删除
重复,一旦完成,你可以把所有东西都写回来。

至于数据结构,我可能会建议使用红黑树。
I would suggest using a dynamic data structure such that as you read in
each item, you insert the item and keep everything sorted.

As you do the insert then, you can also check to see if it is a
duplicate and abort the insert.

This way, you only have to do one pass over the data to remove the
duplicates and once complete you can just write everything back out.

As for the data structure, I might suggest using a red-black tree.




我不能想到为什么红黑树会优于散列

表。哈希表适用于查找重复项。

-

欢迎来到未定义行为的奇妙世界,其中恶魔

是鼻子并且DeathStation用户很紧张。 --Daniel Fox



I can''t think why a red-black tree would be superior to a hash
table here. Hash tables work fine for finding duplicates.
--
"Welcome to the wonderful world of undefined behavior, where the demons
are nasal and the DeathStation users are nervous." --Daniel Fox


这篇关于理论问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆