Identify an English word as a thing or product?
Question
The goal is to write a program that can identify whether a word or phrase represents a thing/product. For example: 1) "A glove comprising at least an index finger receptacle, a middle finger receptacle.." should identify "glove" as a thing/product. 2) "In a window regulator, especially for automobiles, in which the window is connected to a drive..." should identify "regulator" as a thing. Doing this tells me that the text is talking about a thing/product. In contrast, the following text talks about a process rather than a thing/product: "An extrusion coating process for the production of flexible packaging films of nylon coated substrates consisting of the steps of..."
I have millions of such texts, so doing this manually is not feasible. So far, using NLTK and Python, I have been able to identify some specific cases that use very similar keywords, but I have not been able to do the same for the kinds of text shown in the examples above. Any help will be appreciated!
Recommended answer
What you want to do is actually quite difficult. It is a (very specific) kind of semantic labelling task. The possible solutions are:
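- Create your own tagging algorithm: build training data, test, evaluate, and finally tag your data
- Use an existing knowledge base (a dictionary) to extract semantic labels for each target word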
The first option is a complex research project in itself. Do it if you have the time and resources.
The second option will only give you the labels that are available in the knowledge base, and these might not match what you need. I would give it a try with Python, NLTK and WordNet (an interface is already available in NLTK); you might be able to use synset hypernyms for your problem.
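As a rough illustration of the hypernym idea, here is a minimal sketch. It assumes the head noun ("glove", "regulator", "process") has already been extracted from the text, and that the WordNet corpus has been downloaded with nltk.download("wordnet"). The marker synsets (artifact.n.01 for things; process.n.06, act.n.02 and event.n.01 for processes), the function names and the max_senses parameter are my own assumptions for illustration, not something WordNet or NLTK prescribes. You would probably need to tune them on your data.

```python
# Marker synsets are an assumption made for this sketch; adjust to your data.
THING_MARKERS = {"artifact.n.01"}
PROCESS_MARKERS = {"process.n.06", "act.n.02", "event.n.01"}


def ancestor_names(word, max_senses=1):
    """Collect synset names for the first noun senses of `word` and all
    of their hypernyms (hypothetical helper, not part of NLTK)."""
    # Imported lazily so the pure labelling logic works without NLTK.
    from nltk.corpus import wordnet as wn

    names = set()
    for synset in wn.synsets(word, pos=wn.NOUN)[:max_senses]:
        # Each path runs from the WordNet root down to the synset itself.
        for path in synset.hypernym_paths():
            names.update(s.name() for s in path)
    return names


def label_from_ancestors(names):
    """Map a set of synset names to 'thing', 'process' or 'unknown'."""
    is_thing = bool(names & THING_MARKERS)
    is_process = bool(names & PROCESS_MARKERS)
    if is_thing and not is_process:
        return "thing"
    if is_process and not is_thing:
        return "process"
    # No marker found, or conflicting markers across senses.
    return "unknown"


def classify(word, max_senses=1):
    """Label a head noun using its WordNet hypernyms."""
    return label_from_ancestors(ancestor_names(word, max_senses))
```

Calling classify("glove") or classify("process") then returns one of the three labels. Only the first noun sense is used by default because many words are ambiguous ("regulator" can also be a person), so raising max_senses trades coverage for more "unknown" results. Words missing from WordNet also come back as "unknown".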