True definition of an English word?


Problem description

What would be the best definition of an English word?

What cases count as an English word beyond plain \w+? Some definitions include \w+-\w+ or \w+'\w+; some exclude cases like \b[0-9]+\b. But I haven't seen any general consensus on these cases. Is there a formal definition? Can anyone clarify?
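To make the cases in the question concrete, here is a minimal Python sketch comparing plain \w+ with a pattern that also accepts internal hyphens and apostrophes, and optionally drops all-digit tokens. The names PLAIN, EXTENDED, NUMBER, and tokenize are made up for illustration, and the combined pattern is only one possible reading of the variants listed above, not an agreed definition of a word.

```python
import re

# Plain \w+ from the question: splits on hyphens and apostrophes.
PLAIN = re.compile(r"\w+")

# One possible extension (an assumption, not a standard): runs of \w
# joined by single internal hyphens or apostrophes.
EXTENDED = re.compile(r"\w+(?:[-']\w+)*")

# Tokens made only of ASCII digits, mirroring \b[0-9]+\b in the question.
NUMBER = re.compile(r"[0-9]+")


def tokenize(text, keep_numbers=False):
    """Return EXTENDED matches, optionally dropping all-digit tokens."""
    tokens = EXTENDED.findall(text)
    if keep_numbers:
        return tokens
    return [t for t in tokens if not NUMBER.fullmatch(t)]


sample = "The well-known O'Brien didn't pay 42 dollars."
plain_tokens = PLAIN.findall(sample)
word_tokens = tokenize(sample)

print(plain_tokens)
print(word_tokens)
```

On this sample, plain \w+ breaks "well-known", "O'Brien", and "didn't" into pieces and keeps "42", while the extended pattern keeps the three compound forms whole and the filter drops the number. Neither result is "correct" in general; they simply encode different choices.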

(Edited to broaden the question so that it does not depend on regexp only.)

Recommended answer

I really don't think a regex is going to help you here. The problem with English text (or text in any language, for that matter) is context. Without it, you can't be sure whether what lies between the word boundaries is a word, a number, a random collection of characters, etc. For NLP, I think you are going to be selecting a subset of the language and looking for specific words rather than trying to extract all 'words' from a string.
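As a rough illustration of "selecting a subset of the language and looking for specific words", the sketch below checks candidate tokens against a small fixed vocabulary instead of trying to decide what counts as a word in general. VOCABULARY and find_known_words are hypothetical names, and the word list itself is invented for the example.

```python
import re

# Hypothetical vocabulary: the subset of English this application cares about.
VOCABULARY = {"invoice", "refund", "payment", "overdue"}


def find_known_words(text, vocabulary=VOCABULARY):
    """Return the tokens of text that appear in the vocabulary, in order."""
    # Candidate tokens are lowercase letter runs; anything not in the
    # vocabulary (IDs, numbers, fragments) is simply ignored.
    candidates = re.findall(r"[a-z]+", text.lower())
    return [word for word in candidates if word in vocabulary]


hits = find_known_words("Payment for invoice A7x9 is OVERDUE, see ref#42.")
print(hits)
```

Here the regex only proposes candidates; the vocabulary decides what is accepted, so strings such as "A7x9" or "ref#42" never need to be classified as words or non-words.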
