Identify an English word as a thing or product?

Question

I want to write a program with the following objective: identify whether a word/phrase represents a thing/product. For example:

1) "A glove comprising at least an index finger receptacle, a middle finger receptacle.." <- be able to identify "glove" as a thing/product.
2) "In a window regulator, especially for automobiles, in which the window is connected to a drive..." <- be able to identify "regulator" as a thing.

Doing this tells me that the text is talking about a thing/product. By contrast, the following text talks about a process instead of a thing/product: "An extrusion coating process for the production of flexible packaging films of nylon coated substrates consisting of the steps of..."

I have millions of such texts, so doing it manually is not feasible. So far, using NLTK + Python, I have been able to identify some specific cases that use very similar keywords, but I have not been able to do the same for the kinds of text in the examples above. Any help will be appreciated!

Recommended answer

What you want to do is actually pretty difficult. It is a sort of (very specific) semantic labelling task. The possible solutions are:

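  • Create your own tagging algorithm: create training data, test, evaluate, and finally tag your data
  • Use an existing knowledge base (a dictionary) to extract semantic labels for each target word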

The first option is a complex research project in itself. Do it if you have the time and resources.
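To give a feel for the train/test/evaluate loop behind the first option, here is a minimal sketch. It assumes a bag-of-words Naive Bayes classifier, which is a technique chosen here for illustration and not something the answer prescribes. The `NaiveBayes` class, the `accuracy` helper, and the `TRAIN`/`TEST` examples are all hypothetical; the texts marked "invented" are not from the question, and a real project would need far more labelled data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of documents
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, examples):
    """Fraction of held-out examples whose predicted label matches the gold label."""
    hits = sum(model.predict(text) == label for text, label in examples)
    return hits / len(examples)


# Hypothetical toy data: a few lines echo the question, the rest are invented.
TRAIN = [
    ("A glove comprising at least an index finger receptacle", "thing"),
    ("In a window regulator, especially for automobiles", "thing"),
    ("A bracket comprising a base plate and a clamp", "thing"),  # invented
    ("An extrusion coating process consisting of the steps of", "process"),
    ("A method for producing coated films comprising the steps of", "process"),  # invented
    ("A process for the production of packaging films", "process"),
]
TEST = [
    ("A helmet comprising a shell and a liner", "thing"),  # invented
    ("A process for coating substrates consisting of the steps of", "process"),  # invented
]

if __name__ == "__main__":
    clf = NaiveBayes()
    clf.fit(TRAIN)
    print("held-out accuracy:", accuracy(clf, TEST))
```

The hard part of this option is not the classifier but building a labelled set large and varied enough to cover millions of texts, which is why it amounts to a research project.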

The second option will only give you the labels that are available in the knowledge base, and these might not match what you need. I would give it a try with Python, NLTK and WordNet (an interface is already available); you might be able to use synset hypernyms for your problem.
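The hypernym idea can be sketched as a walk up the "is a kind of" chain until a known root is reached. The `classify` function, the root sets, and the `TOY_HYPERNYMS` table below are hypothetical and stand in for real WordNet data so the example runs without any downloads. The `wordnet_parents` adapter shows how the same walk might be wired to NLTK's WordNet interface, assuming `nltk` and its `wordnet` corpus are installed.

```python
from collections import deque

# Hypothetical toy hypernym table (term -> more general terms); NOT real WordNet data.
TOY_HYPERNYMS = {
    "glove": ["handwear"],
    "handwear": ["clothing"],
    "clothing": ["artifact"],
    "regulator": ["device"],
    "device": ["artifact"],
    "extrusion_coating": ["coating_process"],
    "coating_process": ["process"],
}

# Assumed root concepts; which roots count as "thing" or "process" is a design choice.
THING_ROOTS = {"artifact"}
PROCESS_ROOTS = {"process"}


def toy_parents(term):
    """Look up the direct hypernyms of a term in the toy table."""
    return TOY_HYPERNYMS.get(term, [])


def classify(term, parents_of, thing_roots, process_roots):
    """Breadth-first walk up the hypernym graph; label by the first root reached."""
    seen = {term}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        if node in thing_roots:
            return "thing"
        if node in process_roots:
            return "process"
        for parent in parents_of(node):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return "unknown"  # no known root found, e.g. the term is not in the knowledge base


def wordnet_parents(synset_name):
    """Adapter for NLTK's WordNet (needs nltk plus the 'wordnet' corpus).

    Takes a synset name such as 'artifact.n.01' and returns the names of
    its direct hypernym synsets.
    """
    from nltk.corpus import wordnet as wn  # imported lazily so the toy demo needs no nltk
    return [h.name() for h in wn.synset(synset_name).hypernyms()]


if __name__ == "__main__":
    for term in ("glove", "regulator", "extrusion_coating", "banana"):
        print(term, "->", classify(term, toy_parents, THING_ROOTS, PROCESS_ROOTS))
```

With real WordNet you would pass `wordnet_parents` and synset names as the roots (for example `artifact.n.01` for things), and you would first have to pick the head noun of each text and one of its senses, e.g. from `wn.synsets("glove", pos=wn.NOUN)`. Both steps are outside this sketch, and the right set of roots would need tuning against your data.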
