"Stop words" list for English?

Problem description

I'm generating some statistics for some English-language text and I would like to skip uninteresting words such as "a" and "the".

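  • Where can I find a list of these uninteresting words?
  • Is a list of these words the same as a list of the most frequently used words in English?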

Update: these are apparently called "stop words", not "skip words".

Recommended answer

The magic word to put into Google is "stop words". This turns up a reasonable-looking list.

MySQL also has a built-in list of stop words, but it is far too comprehensive for my taste. For example, at our university library we ran into problems because "third" in "third world" was treated as a stop word.
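
To make the trade-off concrete, here is a minimal Python sketch of counting words while skipping stop words. The `STOP_WORDS` set, the `word_counts` helper, and the sample sentence are all hypothetical and only for illustration; the set is deliberately tiny and is not MySQL's list or any published one. The second call shows how an overly broad list (one that includes "third") silently drops a meaningful word.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list for illustration only;
# a real list would come from a published source.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def word_counts(text, stop_words=STOP_WORDS):
    """Count lowercased words in text, skipping any found in stop_words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


sample = "The third world is a term used in the study of the world economy."

# With the small list, "third" survives and "world" is counted twice.
counts = word_counts(sample)

# With a broader list that also contains "third", the phrase
# "third world" loses its first word.
overzealous = word_counts(sample, STOP_WORDS | {"third"})
```

Passing the stop list as a parameter makes it easy to trim or swap a list that turns out to be too aggressive for a given corpus.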
