An interview question.


Problem description

I was asked to compress a text file in such a way that every word is written
with the number of times that word is repeated. Something like
this:

If this is the original file:

---------------------------------------
AAAA BBB CCCC AAAA CCCC AAAA........
the result will be:
---------------------------------------
3 AAAA
1 BBB
2 CCCC

I appreciate any hints.

Matt

Recommended answer

Matt wrote on 27/07/04 :
I was asked to compress a text file in a way that every word be written
with the number of times that such a word is repeated. Something like
this:

If this is the original file:

---------------------------------------
AAAA BBB CCCC AAAA CCCC AAAA........

the result will be:
---------------------------------------
3 AAAA
1 BBB
2 CCCC




A classical word count program. What exactly is your C-question?

--
Emmanuel
The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html

"C is a sharp tool"

"Matt" <ma********@hotmail.com> wrote
I was asked to compress a text file in a way that every word be written
with the number of times that such a word is repeated. Something like
this:

If this is the original file:

---------------------------------------
AAAA BBB CCCC AAAA CCCC AAAA........
the result will be:
---------------------------------------
3 AAAA
1 BBB
2 CCCC

I appreciate any hints.



Firstly you want to query your spec. Usually by "compression" we mean a
reversible process by which the original text is represented in a more
compact format. What you have been given is more like a concordance program.

What you have been given is a typical general assignment.

First you need a definition of what a word is. Probably any sequence of
alphabetical characters or apostrophes separated by whitespace or
non-apostrophe punctuation will do.
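
As a rough illustration of that definition, here is a minimal sketch that pulls words out of a string. The names is_word_char and next_word are made up for this example, and truncating over-long words is an assumption made only to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* A character belongs to a word if it is a letter or an apostrophe. */
static int is_word_char(int c)
{
    return isalpha((unsigned char)c) || c == '\'';
}

/*
 * Copy the next word found at or after *cursor into out (capacity cap,
 * counting the terminating '\0') and advance *cursor past that word.
 * Returns 1 if a word was found, 0 if the end of the string was reached.
 * Assumption: words longer than cap - 1 characters are truncated.
 */
static int next_word(const char **cursor, char *out, size_t cap)
{
    const char *p = *cursor;
    size_t n = 0;

    assert(cap > 0);

    while (*p != '\0' && !is_word_char(*p))
        p++;                        /* skip whitespace and punctuation */
    if (*p == '\0') {
        *cursor = p;
        return 0;
    }
    while (is_word_char(*p)) {
        if (n + 1 < cap)
            out[n++] = *p;          /* characters that do not fit are dropped */
        p++;
    }
    out[n] = '\0';
    *cursor = p;
    return 1;
}
```

Called repeatedly on "don't stop, AAAA!" this yields "don't", "stop" and "AAAA", then returns 0. Whether digits, hyphens or upper/lower case should matter is a question for whoever set the task.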

You need to go through the input file, inputting lines and pulling out each
word. Beware that some text files contain extremely long lines, because
newline is used only as a paragraph marker.
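
One way to sidestep the long-line problem is not to read lines at all, but to read the stream a character at a time with getc. The sketch below does that; read_word is a hypothetical name, and the word buffer size is whatever the caller chooses (longer words are truncated here, which is an assumption, not a requirement of the task).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Read the next word (letters and apostrophes) from fp into out, which
 * has room for cap bytes including the terminating '\0'.
 * Returns 1 if a word was read, 0 at end of file.
 * Line length never matters because no line buffer is used.
 */
static int read_word(FILE *fp, char *out, size_t cap)
{
    int c;
    size_t n = 0;

    assert(fp != NULL && cap > 0);

    /* Skip everything that cannot start a word. */
    while ((c = getc(fp)) != EOF && !(isalpha(c) || c == '\''))
        ;
    if (c == EOF)
        return 0;

    /* Collect word characters; the separator that ends the word is consumed. */
    do {
        if (n + 1 < cap)
            out[n++] = (char)c;
    } while ((c = getc(fp)) != EOF && (isalpha(c) || c == '\''));

    out[n] = '\0';
    return 1;
}
```

A caller would typically loop with while (read_word(fp, buf, sizeof buf)) and hand each word to the dictionary.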

As you pull out each word, you need to check whether it is in the
dictionary. If it is, the count is incremented; if not, the word is added.
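
The simplest form of that dictionary is an array searched from the front. The sketch below assumes fixed limits (MAX_WORDS and MAX_LEN are arbitrary values picked for illustration) and uses invented names, struct dict and dict_add.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 1000   /* hypothetical limit on distinct words */
#define MAX_LEN   32     /* hypothetical limit on word length, with '\0' */

struct entry {
    char word[MAX_LEN];
    unsigned long count;
};

struct dict {
    struct entry items[MAX_WORDS];
    size_t used;         /* number of entries in use */
};

/*
 * Count one occurrence of word: increment it if already present,
 * otherwise append it with a count of 1.
 * Returns 1 on success, 0 if the table is full or the word is too long.
 */
static int dict_add(struct dict *d, const char *word)
{
    size_t i;

    assert(d != NULL && word != NULL);

    for (i = 0; i < d->used; i++) {          /* linear search */
        if (strcmp(d->items[i].word, word) == 0) {
            d->items[i].count++;
            return 1;
        }
    }
    if (d->used == MAX_WORDS || strlen(word) >= MAX_LEN)
        return 0;
    strcpy(d->items[d->used].word, word);
    d->items[d->used].count = 1;
    d->used++;
    return 1;
}
```

Feeding it AAAA BBB CCCC AAAA CCCC AAAA leaves three entries with counts 3, 1 and 2, in order of first appearance, which matches the output format in the question.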

The interesting part comes when you need to optimise the program. If the
dictionary is an unsorted list searched from the start, each lookup is O(N),
so processing N words is O(N^2) overall. However if you can keep the
dictionary sorted alphabetically, each lookup is O(log N) and the whole run
reduces to O(N log N). You can do this using a red-black tree. An alternative
strategy is to use a hash table to store the words.
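
To make the hash table alternative concrete, here is a minimal sketch using separate chaining (it is not the red-black tree). The table size, the djb2-style hash function and the names count_word and lookup_count are all assumptions chosen for illustration; freeing the table is left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024    /* hypothetical fixed table size */

struct node {
    struct node *next;   /* next word in the same bucket */
    unsigned long count;
    char word[];         /* C99 flexible array member */
};

static struct node *buckets[NBUCKETS];

/* djb2-style string hash; any reasonable string hash would do. */
static unsigned long hash(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Count one occurrence; returns the new count, or 0 if malloc fails. */
static unsigned long count_word(const char *word)
{
    struct node **slot, *n;

    assert(word != NULL);

    slot = &buckets[hash(word) % NBUCKETS];
    for (n = *slot; n != NULL; n = n->next)
        if (strcmp(n->word, word) == 0)
            return ++n->count;

    n = malloc(sizeof *n + strlen(word) + 1);
    if (n == NULL)
        return 0;
    strcpy(n->word, word);
    n->count = 1;
    n->next = *slot;     /* push onto the front of the chain */
    *slot = n;
    return 1;
}

/* Returns how often word has been counted so far, 0 if never. */
static unsigned long lookup_count(const char *word)
{
    const struct node *n;

    for (n = buckets[hash(word) % NBUCKETS]; n != NULL; n = n->next)
        if (strcmp(n->word, word) == 0)
            return n->count;
    return 0;
}
```

With a decent hash and a table that is not badly overloaded, each lookup only walks a short chain. The price is that the words come out in no particular order, so the output has to be sorted separately if alphabetical order is wanted.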

You need to ask "what running time is acceptable?" to know whether or not
they are looking for optimisation.


Matt wrote:

I was asked to compress a text file in a way that every word be written
with the number of times that such a word is repeated. Something like
this:

If this is the original file:

---------------------------------------
AAAA BBB CCCC AAAA CCCC AAAA........

the result will be:
---------------------------------------
3 AAAA
1 BBB
2 CCCC

I appreciate any hints.

Matt




You might want to read the chapter on compression in "The Turing Omnibus"
by A.K. Dewdney, for a popular introduction to the gist of this algorithm.

--
Julian V. Noble
Professor Emeritus of Physics
jv*@lessspamformother.virginia.edu
^^^^^^^^^^^^^^^^^^
http://galileo.phys.virginia.edu/~jvn/

"For there was never yet philosopher that could endure the toothache
patiently." -- Wm. Shakespeare, Much Ado about Nothing. Act v. Sc. 1.

