Using Conditional Random Fields for Named Entity Recognition

Views: 136 Published: 2020/5/9 1:53:10 Tags: metadata named-entity-recognition information-extraction crf

Question

What is a Conditional Random Field? How exactly does a Conditional Random Field identify proper names as a person, organization, or place in structured or unstructured text?

For example: This product is ordered by StackOverFlow Inc.

What does a Conditional Random Field do to identify StackOverFlow Inc. as an organization?

Answer

A CRF is a discriminative, batch, tagging model, in the same general family as a Maximum Entropy Markov model.
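As a rough illustration of what "discriminative tagging model" means here, a linear-chain CRF is commonly written as a conditional distribution over tag sequences given the words. The notation below is standard textbook notation and not part of the original answer: x is the word sequence of length T, y is the tag sequence, f_k are feature functions, \lambda_k are their learned weights, and Z(x) normalizes over all candidate tag sequences y'.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

The model scores the whole tag sequence at once and only models the tags given the words, never the words themselves.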

A full explanation is book-length.

A short explanation is as follows:

1. Humans annotate 200-500K words of text, marking the entities.
2. Humans select a set of features that they hope indicate entities, such as capitalization or whether the word was seen in the training set with a tag.
3. A training procedure counts all the occurrences of the features.
4. The meat of the CRF algorithm searches the space of all possible models that fit the counts to find a pretty good one.
5. At runtime, a decoder (probably a Viterbi decoder) looks at a sentence and decides what tag to assign to each word.
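The feature-selection step above might look something like the following minimal sketch in Python. Everything in it is an assumption made for illustration: the function names, the particular features, and the tiny `KNOWN_ORG_WORDS` set, which stands in for "words seen in the training set with an organization tag". A real system would derive that set from annotated data and use far more features.

```python
# Hypothetical stand-in for words that carried an ORG tag in the training data.
KNOWN_ORG_WORDS = {"inc.", "corp.", "ltd."}


def word_features(tokens, i):
    """Return a dict of hand-picked features for the token at position i."""
    word = tokens[i]
    feats = {
        "lower": word.lower(),
        "is_capitalized": word[:1].isupper(),
        # Capital letters after the first character, as in "StackOverFlow".
        "has_inner_capital": any(c.isupper() for c in word[1:]),
        "seen_with_org_tag": word.lower() in KNOWN_ORG_WORDS,
    }
    # Neighbouring words give the model some context around each token.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens):
    """Build one feature dict per token in the sentence."""
    return [word_features(tokens, i) for i in range(len(tokens))]


tokens = "This product is ordered by StackOverFlow Inc.".split()
features = sentence_features(tokens)
```

For the example sentence, the token "StackOverFlow" gets capitalization features and the context word "by", while "Inc." matches the hypothetical organization word list. Training then decides how much weight each of these features deserves.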

The hard parts of this are feature selection and the search algorithm in step 4.
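The runtime decoder mentioned above can be sketched as a small Viterbi implementation. This is an illustration under stated assumptions, not a trained CRF: the per-word scores (`emissions`) and tag-to-tag scores (`transitions`) are set by hand, whereas a real CRF would compute them from learned feature weights. The B-ORG / I-ORG / O tag names follow the common BIO convention, which the original answer does not specify.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions:   one dict per word, mapping tag -> score for that word
    transitions: dict mapping (previous_tag, current_tag) -> score
    Missing entries are treated as a score of 0.0.
    """
    # best[t][tag] is the best score of any tag sequence ending in `tag` at word t.
    best = [{tag: emissions[0].get(tag, 0.0) for tag in tags}]
    back = []  # back[t - 1][tag] is the previous tag on that best sequence
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[t - 1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[t - 1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t].get(cur, 0.0)
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


tokens = "This product is ordered by StackOverFlow Inc.".split()
tags = ["O", "B-ORG", "I-ORG"]

# Hand-set scores for illustration only; a trained model would supply these.
emissions = [{"O": 2.0} for _ in tokens[:5]]
emissions.append({"B-ORG": 1.5, "O": 1.0})               # StackOverFlow
emissions.append({"I-ORG": 2.0, "B-ORG": 0.5, "O": 0.5})  # Inc.
transitions = {
    ("O", "I-ORG"): -5.0,      # an entity cannot continue straight after O
    ("B-ORG", "I-ORG"): 1.0,
    ("I-ORG", "I-ORG"): 0.5,
}

path = viterbi(emissions, transitions, tags)
```

With these hand-set scores the decoder tags "StackOverFlow" as B-ORG and "Inc." as I-ORG, and every other word as O. The transition scores are what let the decision for one word influence its neighbours, which is the main thing a sequence model adds over tagging each word independently.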
