Using Conditional Random Fields for Named Entity Recognition


Question

What is a Conditional Random Field? How exactly does a Conditional Random Field identify proper names as a person, organization, or place in structured or unstructured text?

For example: This product is ordered by StackOverFlow Inc.

What does a Conditional Random Field do to identify StackOverFlow Inc. as an organization?

Solution

A CRF is a discriminative, batch-trained tagging model, in the same general family as a Maximum Entropy Markov Model (MEMM).
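For reference, the standard linear-chain CRF (the variant usually used for tagging tasks such as NER) defines the probability of a whole tag sequence y for a sentence x as shown below. The f_k are feature functions that look at the previous tag, the current tag, and the sentence; the λ_k are their learned weights; y_0 is a fixed start state. The normalizer Z(x) sums over every possible tag sequence for the sentence. This global normalization is the main difference from a Maximum Entropy Markov Model, which normalizes separately at each position.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```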

A full explanation is book-length.

A short explanation is as follows:

  1. Humans annotate 200-500K words of text, marking the entities.
  2. Humans select a set of features that they hope indicate entities: for example, capitalization, or whether the word was seen in the training set with a particular tag.
  3. A training procedure counts all the occurrences of the features.
  4. The meat of the CRF algorithm searches the space of all possible models (that is, all weightings of the features) that fit those counts, to find a pretty good one.
  5. At runtime, a decoder (probably a Viterbi decoder) looks at a sentence and decides what tag to assign to each word.
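To make steps 2 and 5 concrete, here is a minimal Python sketch of a linear-chain model tagging the question's example sentence with BIO tags (B-ORG begins an organization, I-ORG continues it, O is outside any entity). The feature names, the small suffix list, and every weight are hypothetical, hand-set stand-ins for what training in step 4 would learn. A real system learns many thousands of weights from the annotated corpus. The decoder compares unnormalized scores. That is enough to find the best sequence because the normalizer is the same for every candidate tagging of a given sentence.

```python
from typing import Dict, List, Tuple

TAGS = ["O", "B-ORG", "I-ORG"]
ORG_SUFFIXES = {"Inc.", "Corp.", "Ltd."}  # hypothetical tiny gazetteer


def token_features(tokens: List[str], i: int) -> List[str]:
    """Step 2: a hypothetical hand-chosen feature set for token i."""
    word = tokens[i]
    feats = ["bias"]
    if word[:1].isupper():
        feats.append("is_capitalized")
    if i == 0:
        feats.append("is_first")
    if word in ORG_SUFFIXES:
        feats.append("is_org_suffix")
    if i + 1 < len(tokens) and tokens[i + 1] in ORG_SUFFIXES:
        feats.append("next_is_org_suffix")
    return feats


# Hypothetical weights standing in for what training (step 4) would learn.
# Unlisted (feature, tag) or (prev_tag, tag) pairs have weight 0.
EMIT: Dict[Tuple[str, str], float] = {
    ("bias", "O"): 1.0,
    ("is_first", "O"): 1.5,  # sentence-initial capitals are usually not names
    ("is_capitalized", "B-ORG"): 1.0,
    ("next_is_org_suffix", "B-ORG"): 2.0,
    ("is_org_suffix", "I-ORG"): 3.0,
}
TRANS: Dict[Tuple[str, str], float] = {
    ("START", "I-ORG"): -5.0,  # an entity cannot begin with a continuation tag
    ("O", "I-ORG"): -5.0,
    ("B-ORG", "I-ORG"): 1.0,
    ("I-ORG", "I-ORG"): 0.5,
}


def emit_score(feats: List[str], tag: str) -> float:
    """Sum of the weights of the active features paired with this tag."""
    return sum(EMIT.get((f, tag), 0.0) for f in feats)


def viterbi(tokens: List[str]) -> List[str]:
    """Step 5: return the highest-scoring tag sequence for the sentence."""
    feats = [token_features(tokens, i) for i in range(len(tokens))]
    # best[t][y] = best score of any tag prefix that ends with tag y at position t
    best = [{y: TRANS.get(("START", y), 0.0) + emit_score(feats[0], y) for y in TAGS}]
    back: List[Dict[str, str]] = []  # back[t - 1][y] = best previous tag for y at t
    for t in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for y in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + TRANS.get((p, y), 0.0))
            scores[y] = best[-1][prev] + TRANS.get((prev, y), 0.0) + emit_score(feats[t], y)
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Start from the best final tag and follow the back-pointers to the start.
    tag = max(TAGS, key=lambda y: best[-1][y])
    path = [tag]
    for ptrs in reversed(back):
        tag = ptrs[tag]
        path.append(tag)
    return path[::-1]


tokens = ["This", "product", "is", "ordered", "by", "StackOverFlow", "Inc."]
print(list(zip(tokens, viterbi(tokens))))
```

With these weights, the decoder tags StackOverFlow as B-ORG, Inc. as I-ORG, and every other word as O. The next_is_org_suffix feature pulls StackOverFlow toward B-ORG, and the B-ORG to I-ORG transition rewards extending the entity onto Inc. Neither word would be labeled an organization on its own evidence alone. The tag sequence is scored as a whole.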

The hard parts of this are feature selection and the search algorithm in step 4.
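The "fit the counts" idea in step 4 has a precise form in the usual maximum-likelihood training of a linear-chain CRF. Write f_k for the feature functions and λ_k for their weights. Take N annotated sentences (x^(i), y^(i)) and the conditional log-likelihood L = Σ_i log p(y^(i) | x^(i)). The gradient with respect to each weight is the feature's count in the annotated data minus its count expected under the current model. At the optimum, the two match. Optimizers such as L-BFGS or stochastic gradient descent follow this gradient, typically with an added regularization term. The pairwise marginals in the second term are computed with the forward-backward algorithm rather than by enumerating tag sequences.

```latex
\frac{\partial \mathcal{L}}{\partial \lambda_k}
  = \sum_{i=1}^{N} \sum_{t=1}^{T_i}
      f_k\big(y^{(i)}_{t-1}, y^{(i)}_{t}, \mathbf{x}^{(i)}, t\big)
  \;-\; \sum_{i=1}^{N} \sum_{t=1}^{T_i} \sum_{y,\, y'}
      p\big(y_{t-1} = y,\; y_t = y' \mid \mathbf{x}^{(i)}\big)\,
      f_k\big(y, y', \mathbf{x}^{(i)}, t\big)
```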
