How do I extract keywords used in text?

Views: 236 Published: 2018/8/2 12:56:38 Tags: text indexing keyword data-mining

Problem description

How do I data-mine a pile of text to get keywords ranked by usage (for example, "Jacob Smith" or "fence")?

Is there software that already does this, even semi-automatically? If it can filter out simple words like "the", "and", and "or", I could get to the topics more quickly.

Recommended answer

The general algorithm would look like this:


- Obtain Text
- Strip punctuation, special characters, etc.
- Strip "simple" words
- Split on Spaces
- Loop Over Split Text
    - Add word to Array/HashTable/Etc if it doesn't exist;
       if it does, increment counter for that word
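
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the `STOP_WORDS` set and the `count_words` function are hypothetical names chosen here, the stop-word list is deliberately tiny, and the sketch filters "simple" words after splitting rather than before, which is easier to do word by word.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"the", "and", "or", "a", "an", "of", "to", "in", "is", "on"}

def count_words(text, stop_words=STOP_WORDS):
    """Return a Counter mapping each non-stop word to its number of occurrences."""
    # Lowercase, then replace anything that is not a letter, digit,
    # apostrophe, or whitespace with a space (strips punctuation).
    cleaned = re.sub(r"[^a-z0-9'\s]", " ", text.lower())
    counts = Counter()
    # split() with no arguments splits on any run of whitespace.
    for word in cleaned.split():
        if word not in stop_words:
            # Counter starts missing keys at 0, so this both adds and increments.
            counts[word] += 1
    return counts

counts = count_words("The fence, the gate, and the fence post.")
print(counts.most_common(2))
```

A `Counter` plays the role of the "Array/HashTable" in the list: a missing word is treated as zero, so adding a new word and incrementing an existing one are the same operation. Note that this sketch counts single words only, so a two-word name such as "Jacob Smith" would be counted as two separate words.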

The end result is a frequency count of all words in the text. You can then divide each count by the total number of words to get that word's frequency as a percentage. Any further processing is up to you.
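
As a rough sketch of the percentage step, assuming the counts are already available as a `Counter`: the function name `frequency_percentages` and the sample counts are made up for illustration, and "total number of words" is taken here to mean the total of the counted words, which is one possible reading.

```python
from collections import Counter

def frequency_percentages(counts):
    """Convert raw word counts into percentages of the total counted words."""
    total = sum(counts.values())
    if total == 0:
        # No words counted, so there is nothing to divide by.
        return {}
    return {word: 100.0 * n / total for word, n in counts.items()}

# Example counts, made up for illustration.
counts = Counter({"fence": 2, "gate": 1, "post": 1})
percentages = frequency_percentages(counts)

# Print words from most to least frequent.
for word, pct in sorted(percentages.items(), key=lambda item: item[1], reverse=True):
    print(f"{word}: {pct:.1f}%")
```

If stop words were removed before counting, these percentages are relative to the remaining words, not to the full original text.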

You will also want to look into stemming, which reduces words to their root form. For example, going => go and cars => car.
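
To make the idea concrete, here is a deliberately naive suffix-stripping sketch. The function `naive_stem` and its three-suffix rule set are assumptions made for this example only; an established algorithm such as the Porter stemmer handles far more cases and would be the more likely choice in practice.

```python
def naive_stem(word):
    """Strip a few common English suffixes. A toy rule set, not a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Require at least two characters to remain so that short words
        # such as "is" or "sing" are left unchanged.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["going", "cars", "fence"]])
```

Applying a function like this to each word before counting makes "car" and "cars" share one counter. Rules this crude also produce mistakes (for example, "this" becomes "thi"), which is why real stemmers use longer, ordered rule lists.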

Algorithms like this are common in spam filters, keyword indexing, and the like.
