Printing the most frequent words in a file (string) in Objective-C


Problem description

I'm new to Objective-C and need help solving this:

Write a function that takes two parameters:

  • 1 a string representing a text document, and

  • 2 an integer giving the number of items to return.

Implement the function so that it returns a list of strings ordered by word frequency, with the most frequently occurring word first. Use your best judgement to decide how words are separated. Your solution should run in O(n) time, where n is the number of characters in the document. Implement this function as you would for a production/commercial system. You may use any standard data structures.

What I have tried so far (work in progress):

    #import <Foundation/Foundation.h>

    // Function work in progress:
    // - (NSString *)wordFrequency:(int)itemsToReturn inDocument:(NSString *)textDocument;

    int main(int argc, const char *argv[]) {
        @autoreleasepool {
            // Get the desktop directory (where the text document is)
            NSURL *desktopDirectory = [[NSFileManager defaultManager] URLForDirectory:NSDesktopDirectory
                                                                             inDomain:NSUserDomainMask
                                                                    appropriateForURL:nil
                                                                               create:NO
                                                                                error:nil];

            // Create the full path to the file
            NSURL *fullPath = [desktopDirectory URLByAppendingPathComponent:@"document.txt"];

            // Load the string
            NSString *content = [NSString stringWithContentsOfURL:fullPath
                                                         encoding:NSUTF8StringEncoding
                                                            error:nil];
            // Optional confirmation: check that the file is there and print its content
            // NSLog(@"The string is: %@", content);

            // Create an array with the words contained in the string
            NSArray *myWords = [content componentsSeparatedByString:@" "];

            // Optional confirmation: print the content of the array
            // NSLog(@"array: %@", myWords);

            // Build an NSCountedSet from the array, pair each word with its count,
            // then sort the pairs in descending order of count.
            NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:myWords];
            NSMutableArray *dictArray = [NSMutableArray array];
            [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
                [dictArray addObject:@{@"word": obj,
                                       @"count": @([countedSet countForObject:obj])}];
            }];

            NSLog(@"Words sorted by count: %@",
                  [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]]);
        }
        return 0;
    }

Solution

This is a classic job for map-reduce. I am not very familiar with Objective-C, but as far as I know these concepts are easy to implement in it.

The first map-reduce step counts the number of occurrences.
It basically groups elements by word and then counts each group.

    map(text):
        for each word in text:
            emit(word, 1)
    reduce(word, list<number>):
        emit(word, sum(list))

An alternative to map-reduce is to iterate over the text once and maintain a hash map that serves as a histogram, counting the number of occurrences of each word.
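The hash-map histogram described above could be sketched in Objective-C roughly as follows. This is a minimal sketch, not a definitive implementation: `WordHistogram` is a hypothetical helper name, `NSCountedSet` stands in for the hash map, and lowercasing words and splitting with `NSStringEnumerationByWords` are assumptions about how words should be separated.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a histogram of word occurrences in `text`.
// Assumption: words are compared case-insensitively, so each word is
// lowercased before it is counted.
static NSCountedSet *WordHistogram(NSString *text) {
    NSCountedSet *histogram = [NSCountedSet set];
    // Let Foundation find word boundaries instead of splitting on spaces,
    // so punctuation and newlines do not end up inside the words.
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *word, NSRange wordRange,
                                       NSRange enclosingRange, BOOL *stop) {
        [histogram addObject:word.lowercaseString];
    }];
    return histogram;
}
```

For example, `[WordHistogram(@"The cat and the hat.") countForObject:@"the"]` evaluates to 2, because "The" and "the" are folded into the same entry.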

Once you have a list of words and their occurrence counts, all that remains is to get the top k of them. This is nicely explained in this thread: Store the largest 5000 numbers from a stream of numbers.
Here, the 'comparator' is the number of occurrences of each word, as calculated in the previous step.

The basic idea is to use a min-heap and store the first k elements in it.
Then iterate over the remaining elements: if the new element is bigger than the top (the minimal element in the heap), remove the top and replace it with the new element.

At the end, the heap contains the k largest elements. A heap is only partially ordered, but popping its elements one at a time yields them in ascending order, so reversing that sequence gives the most frequent word first.
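The min-heap procedure above might look like this in Objective-C. Foundation has no ready-made heap class, so this sketch keeps a binary min-heap by hand in an `NSMutableArray`; `SiftDown` and `TopWordsUsingHeap` are hypothetical helper names, and the input is assumed to be an `NSCountedSet` that already holds the word counts.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: restores the min-heap property starting at index `i`.
// `heap` stores words; a word's key is its count in `counts`.
static void SiftDown(NSMutableArray *heap, NSCountedSet *counts, NSUInteger i) {
    NSUInteger n = heap.count;
    while (YES) {
        NSUInteger smallest = i, left = 2 * i + 1, right = 2 * i + 2;
        if (left < n && [counts countForObject:heap[left]] <
                        [counts countForObject:heap[smallest]]) smallest = left;
        if (right < n && [counts countForObject:heap[right]] <
                         [counts countForObject:heap[smallest]]) smallest = right;
        if (smallest == i) return;
        [heap exchangeObjectAtIndex:i withObjectAtIndex:smallest];
        i = smallest;
    }
}

// Hypothetical helper: returns up to `k` words, most frequent first.
static NSArray *TopWordsUsingHeap(NSCountedSet *counts, NSUInteger k) {
    if (k == 0) return @[];
    NSMutableArray *heap = [NSMutableArray arrayWithCapacity:k];
    for (NSString *word in counts) {            // visits each distinct word once
        if (heap.count < k) {
            // Heap not full yet: append the word and sift it up.
            [heap addObject:word];
            NSUInteger i = heap.count - 1;
            while (i > 0) {
                NSUInteger parent = (i - 1) / 2;
                if ([counts countForObject:heap[parent]] <=
                    [counts countForObject:heap[i]]) break;
                [heap exchangeObjectAtIndex:i withObjectAtIndex:parent];
                i = parent;
            }
        } else if ([counts countForObject:word] >
                   [counts countForObject:heap[0]]) {
            // More frequent than the current minimum: replace the top.
            heap[0] = word;
            SiftDown(heap, counts, 0);
        }
    }
    // Pop the minimum repeatedly; this yields ascending order by count.
    NSMutableArray *ascending = [NSMutableArray arrayWithCapacity:heap.count];
    while (heap.count > 0) {
        [ascending addObject:heap[0]];
        heap[0] = heap.lastObject;
        [heap removeLastObject];
        SiftDown(heap, counts, 0);
    }
    // Reverse so that the most frequent word comes first.
    return [[ascending reverseObjectEnumerator] allObjects];
}
```

The heap never holds more than k words, so each insertion or replacement costs O(log k). Words with equal counts may come out in either order, since the sketch does not define a tie-breaker.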

The complexity of this approach is O(n log k), where k is the number of items to return.

To achieve O(n + k log k), you can use a selection algorithm instead of the min-heap to get the top k elements, and then sort only those k elements.
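One possible selection algorithm is quickselect, sketched below in Objective-C. `TopWordsUsingSelection` is a hypothetical helper name, the input is assumed to be an `NSCountedSet` of word counts, and the middle-element pivot and alphabetical tie-break are illustrative choices. Quickselect is linear only in expectation, and this simple partition can degrade when many words share the same count, so treat it as a sketch rather than a production implementation.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns up to `k` words, most frequent first,
// using quickselect followed by a sort of only the selected k words.
static NSArray *TopWordsUsingSelection(NSCountedSet *counts, NSUInteger k) {
    NSMutableArray *words = [[counts allObjects] mutableCopy];
    if (k > words.count) k = words.count;
    if (k == 0) return @[];

    // Rearrange `words` so that the k most frequent occupy indices 0..k-1.
    NSUInteger target = k - 1, lo = 0, hi = words.count - 1;
    while (lo < hi) {
        // Assumption: middle element as pivot; a randomized pivot is a
        // common alternative. Move the pivot to the end of the range.
        [words exchangeObjectAtIndex:lo + (hi - lo) / 2 withObjectAtIndex:hi];
        NSUInteger pivotCount = [counts countForObject:words[hi]];
        NSUInteger store = lo;
        for (NSUInteger i = lo; i < hi; i++) {
            // Words strictly more frequent than the pivot go to the front.
            if ([counts countForObject:words[i]] > pivotCount) {
                [words exchangeObjectAtIndex:i withObjectAtIndex:store];
                store++;
            }
        }
        [words exchangeObjectAtIndex:store withObjectAtIndex:hi];
        if (store == target) break;          // pivot landed on the k-th slot
        if (store < target) lo = store + 1;  // continue in the right part
        else hi = store - 1;                 // continue in the left part
    }

    // Sort only the selected k words: O(k log k).
    NSArray *top = [words subarrayWithRange:NSMakeRange(0, k)];
    return [top sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        NSUInteger ca = [counts countForObject:a];
        NSUInteger cb = [counts countForObject:b];
        if (ca != cb) return ca > cb ? NSOrderedAscending : NSOrderedDescending;
        return [a compare:b];                // ties: alphabetical order
    }];
}
```

Only the final `subarrayWithRange:` slice is sorted, which is where the k log k term comes from; the partitioning loop never fully orders the remaining words.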
