How do I approach this named-entity classification task?

Question

I have asked a related question, but this one is more general. I have taken a large corpus and annotated some words with their named entities. In my case they are domain-specific, and I call them Entity, Action, and Incident. I want to use these as a seed for extracting more named entities. For example, take the following sentence:

When the robot had a technical glitch, the object was thrown but was later caught by another robot.

It is tagged as:

When the (robot)/Entity had a (technical glitch)/Incident, the (object)/Entity was (thrown)/Action but was later (caught)/Action by (another robot)/Entity.

Given examples like this, is there any way I can train a classifier to recognize new named entities? For instance, given a sentence like this:

The nanobot had a bug and so it crashed into the wall.

it should be tagged like this:

The (nanobot)/Entity had a (bug)/Incident and so it (crashed)/Action into the (wall)/Entity.

Of course, I am aware that 100% accuracy is not possible, but I would be interested in any formal approaches to this. Any suggestions?

Recommended answer

This is not named-entity recognition at all, since none of the labeled parts are names, so the feature sets for NER systems won't help you (English NER systems tend to rely on capitalization quite strongly and will prefer nouns). This is a kind of information extraction/semantic interpretation. I suspect this is going to be quite hard in a machine learning setting because your annotation is really inconsistent:

When the (robot)/Entity had a (technical glitch)/Incident, the (object)/Entity was (thrown)/Action but was later (caught)/Action by another robot.

Why is "another robot" not annotated?
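
One rough way to surface this kind of inconsistency automatically is to check whether the head word of a labeled phrase also appears somewhere in the unlabeled text of the same sentence. The sketch below assumes the `(phrase)/Label` notation used in the question and treats the last word of each phrase as its head, which is a crude heuristic; the function names are hypothetical and only for illustration.

```python
import re

# Matches one inline annotation such as "(technical glitch)/Incident".
ANNOTATION = re.compile(r"\(([^()]+)\)/(\w+)")

def parse(annotated):
    """Return the labeled phrases and the text left over once they are removed."""
    labeled = [(m.group(1), m.group(2)) for m in ANNOTATION.finditer(annotated)]
    leftover = ANNOTATION.sub(" ", annotated)
    return labeled, leftover

def unlabeled_repeats(annotated):
    """List (head word, label) pairs whose head also occurs without a label."""
    labeled, leftover = parse(annotated)
    leftover_words = set(re.findall(r"[a-z]+", leftover.lower()))
    hits = []
    for phrase, label in labeled:
        head = phrase.lower().split()[-1]  # assumption: last word is the head
        if head in leftover_words and (head, label) not in hits:
            hits.append((head, label))
    return hits

sentence = (
    "When the (robot)/Entity had a (technical glitch)/Incident, "
    "the (object)/Entity was (thrown)/Action but was later "
    "(caught)/Action by another robot."
)
print(unlabeled_repeats(sentence))  # [('robot', 'Entity')]
```

A hit does not prove an error, since the same word can legitimately be an Entity in one place and not in another, but it gives a short list of spots worth reviewing by hand.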

If you want to solve this kind of problem, you'd better start out with some regular expressions, maybe matched against POS-tagged versions of the string.
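
As a minimal sketch of that suggestion, the code below joins a POS-tagged sentence into a `word/TAG` string and runs a few regular expressions over it. The Penn Treebank tags are hand-assigned here so the example has no dependencies; in practice they would come from a tagger such as NLTK's `nltk.pos_tag`. The three rules (the object of "had a" is an Incident, a non-auxiliary past-tense verb or participle is an Action, any remaining noun phrase is an Entity) are illustrative assumptions, not rules given in the answer.

```python
import re

def tok(tag):
    """Regex for one 'word/TAG' token whose tag matches the pattern `tag`."""
    return r"[^\s/]+/(?:" + tag + r")(?!\S)"

# Noun phrase: optional adjectives followed by one or more nouns.
NP = f"(?:{tok('JJ')} )*{tok('NNS?')}(?: {tok('NNS?')})*"

# Hypothetical rules, applied in order; earlier rules win on overlap.
RULES = [
    ("Incident", re.compile(rf"(?<!\S)had/VBD an?/DT ({NP})")),
    ("Action", re.compile(rf"(?<!\S)(?!(?:had|was|were|been)/)({tok('VB[DN]')})")),
    ("Entity", re.compile(rf"(?<!\S)({NP})")),
]

def extract(tagged_pairs):
    """Return (phrase, label) pairs in sentence order."""
    text = " ".join(f"{word}/{tag}" for word, tag in tagged_pairs)
    claimed = []  # (start, end, phrase, label)
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            start, end = m.span(1)
            # Skip spans already labeled by an earlier rule.
            if any(start < ce and cs < end for cs, ce, _, _ in claimed):
                continue
            phrase = " ".join(t.rsplit("/", 1)[0] for t in m.group(1).split())
            claimed.append((start, end, phrase, label))
    return [(phrase, label) for _, _, phrase, label in sorted(claimed)]

# Hand-tagged version of the second example sentence (tags are an assumption).
TAGGED = [
    ("The", "DT"), ("nanobot", "NN"), ("had", "VBD"), ("a", "DT"),
    ("bug", "NN"), ("and", "CC"), ("so", "RB"), ("it", "PRP"),
    ("crashed", "VBD"), ("into", "IN"), ("the", "DT"), ("wall", "NN"),
    (".", "."),
]

for phrase, label in extract(TAGGED):
    print(f"({phrase})/{label}")
```

On this sentence the rules reproduce the labels requested in the question. They leave determiners out of the noun phrase, so a phrase like "another robot" would come out as just "robot", and rules this narrow will need extending as more sentence shapes turn up.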
