Using Conditional Random Fields for Named Entity Recognition
Question
What is a Conditional Random Field (CRF)? How exactly does a Conditional Random Field identify proper names as persons, organizations, or places in structured or unstructured text?
For example: This product is ordered by StackOverFlow Inc.
What does a Conditional Random Field do to identify StackOverFlow Inc. as an organization?
Answer
A CRF is a discriminative, batch-trained tagging model, in the same general family as the Maximum Entropy Markov Model (MEMM).
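For reference, this is the textbook form of the linear-chain CRF usually used for tagging. It models the probability of an entire tag sequence $\mathbf{y}$ given the sentence $\mathbf{x}$ directly, which is what "discriminative" means here. The notation ($f_k$ for feature functions, $\lambda_k$ for their learned weights, $Z(\mathbf{x})$ for the normalizer, $y_0$ for a fixed start state) is the conventional one and is not taken from this answer:

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```

Unlike an MEMM, which normalizes probabilities locally at each word, a CRF normalizes once over all possible tag sequences through $Z(\mathbf{x})$.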
A full explanation is book-length.
A short explanation is as follows:
1. Humans annotate 200-500K words of text, marking the entities.
2. Humans select a set of features that they hope will indicate entities: things like capitalization, or whether the word was seen in the training set with a tag.
3. A training procedure counts all the occurrences of the features in the annotated text.
4. The meat of the CRF algorithm searches the space of all possible models that fit the counts to find a pretty good one.
5. At runtime, a decoder (probably a Viterbi decoder) looks at a sentence and decides what tag to assign to each word.
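To make the runtime decoding step concrete for the question's example, the sketch below scores tags for "This product is ordered by StackOverFlow Inc." with a handful of hand-chosen features and runs Viterbi decoding over a linear chain. It is a minimal illustration under several assumptions: the BIO tag set (`B-ORG`, `I-ORG`, `O`), the feature names, the corporate-suffix list and all the weights are hypothetical and set by hand rather than learned, and the sentence is simply split on whitespace. Because the normalizer is the same for every tag sequence of a given sentence, the decoder only needs unnormalized scores.

```python
from typing import Dict, List, Tuple

TAGS = ["O", "B-ORG", "I-ORG"]  # hypothetical BIO tags: Begin/Inside an organization, Outside
CORP_SUFFIXES = {"Inc.", "Corp.", "Ltd."}  # hypothetical suffix list

# Hypothetical hand-set weights; a trained CRF would learn these from annotated text.
EMISSION_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("bias", "O"): 1.0,
    ("sentence_start", "O"): 1.0,
    ("capitalized", "B-ORG"): 1.0,
    ("inner_capitals", "B-ORG"): 1.0,
    ("next_is_corp_suffix", "B-ORG"): 2.0,
    ("corp_suffix", "I-ORG"): 3.0,
}
TRANSITION_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("START", "I-ORG"): -10.0,  # an entity should not start with an Inside tag
    ("O", "I-ORG"): -10.0,
    ("B-ORG", "I-ORG"): 1.0,
}


def token_features(tokens: List[str], t: int) -> List[str]:
    """Hand-chosen observation features that fire for token t."""
    word = tokens[t]
    feats = ["bias"]
    if t == 0:
        feats.append("sentence_start")
    if word[:1].isupper():
        feats.append("capitalized")
    if any(c.isupper() for c in word[1:]):
        feats.append("inner_capitals")
    if word in CORP_SUFFIXES:
        feats.append("corp_suffix")
    if t + 1 < len(tokens) and tokens[t + 1] in CORP_SUFFIXES:
        feats.append("next_is_corp_suffix")
    return feats


def emission(tokens: List[str], t: int, tag: str) -> float:
    """Sum of weights of the (feature, tag) pairs active at position t."""
    return sum(EMISSION_WEIGHTS.get((f, tag), 0.0) for f in token_features(tokens, t))


def transition(prev: str, tag: str) -> float:
    return TRANSITION_WEIGHTS.get((prev, tag), 0.0)


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence; unnormalized scores suffice for the argmax."""
    if not tokens:
        return []
    # best[t][y]: score of the best tag path over tokens[0..t] that ends in tag y
    best: List[Dict[str, float]] = [
        {y: transition("START", y) + emission(tokens, 0, y) for y in TAGS}
    ]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in TAGS:
            prev = max(TAGS, key=lambda p: best[t - 1][p] + transition(p, y))
            best[t][y] = best[t - 1][prev] + transition(prev, y) + emission(tokens, t, y)
            back[t][y] = prev
    # Follow the back-pointers from the best final tag.
    y = max(TAGS, key=lambda tag: best[-1][tag])
    path = [y]
    for t in range(len(tokens) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


tokens = "This product is ordered by StackOverFlow Inc.".split()
print(list(zip(tokens, viterbi(tokens))))
# prints: [('This', 'O'), ('product', 'O'), ('is', 'O'), ('ordered', 'O'), ('by', 'O'), ('StackOverFlow', 'B-ORG'), ('Inc.', 'I-ORG')]
```

In this sketch the `B-ORG` decision for `StackOverFlow` comes from summed feature weights (capitalization, inner capitals, a following corporate suffix), and the transition weights make `I-ORG` on `Inc.` reachable only from `B-ORG`.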
The hard parts of this are feature selection and the search algorithm in step 4.
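The counting and search steps can be illustrated at toy scale. The sketch below trains a linear-chain CRF by plain gradient ascent on the conditional log-likelihood; the gradient for each feature weight is that feature's empirical count in the annotated data minus its expected count under the current model, so the weights stop changing when the two counts agree (up to the L2 penalty), which is one way to read "models that fit the counts". Assumptions to flag: the three-sentence corpus, the two-tag `O`/`ORG` scheme, the feature names and the hyperparameters are all made up for illustration, and expected counts are computed by enumerating every tag sequence, which is exponential in sentence length. Practical CRF toolkits compute the same expectations with the forward-backward algorithm and usually optimize with quasi-Newton methods such as L-BFGS.

```python
import itertools
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

TAGS = ("O", "ORG")  # two-tag scheme keeps brute-force enumeration small
CORP_SUFFIXES = {"Inc.", "Corp.", "Ltd."}
Corpus = List[Tuple[List[str], Tuple[str, ...]]]

# Hypothetical toy corpus standing in for the human-annotated text.
CORPUS: Corpus = [
    ("Orders from Acme Inc. arrived".split(), ("O", "O", "ORG", "ORG", "O")),
    ("Globex Corp. ships today".split(), ("ORG", "ORG", "O", "O")),
    ("We thank Initech Inc.".split(), ("O", "O", "ORG", "ORG")),
]


def token_features(tokens: Sequence[str], t: int) -> List[str]:
    """Hypothetical hand-chosen observation features for token t."""
    word = tokens[t]
    feats = ["bias"]
    if word[:1].isupper():
        feats.append("capitalized")
    if word in CORP_SUFFIXES:
        feats.append("corp_suffix")
    if t + 1 < len(tokens) and tokens[t + 1] in CORP_SUFFIXES:
        feats.append("next_is_corp_suffix")
    return feats


def sequence_features(tokens: Sequence[str], tags: Sequence[str]) -> Counter:
    """Count every (feature, tag) and (previous tag, tag) pair that fires."""
    counts: Counter = Counter()
    prev = "START"
    for t, tag in enumerate(tags):
        for f in token_features(tokens, t):
            counts[("emit", f, tag)] += 1
        counts[("trans", prev, tag)] += 1
        prev = tag
    return counts


def score(weights: Dict, counts: Counter) -> float:
    return sum(weights.get(k, 0.0) * n for k, n in counts.items())


def sequence_distribution(weights: Dict, tokens: Sequence[str]):
    """All tag sequences with their feature counts, probabilities and log Z.
    Brute force: exponential in sentence length, fine only for toy inputs."""
    seqs = list(itertools.product(TAGS, repeat=len(tokens)))
    feats = [sequence_features(tokens, s) for s in seqs]
    scores = [score(weights, c) for c in feats]
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    probs = [math.exp(s - log_z) for s in scores]
    return seqs, feats, probs, log_z


def log_likelihood(weights: Dict, corpus: Corpus) -> float:
    total = 0.0
    for tokens, tags in corpus:
        log_z = sequence_distribution(weights, tokens)[3]
        total += score(weights, sequence_features(tokens, tags)) - log_z
    return total


def train(corpus: Corpus, epochs: int = 300, lr: float = 0.05, l2: float = 0.01) -> Dict:
    """Gradient ascent on the L2-regularized conditional log-likelihood."""
    weights: Dict = {}
    for _ in range(epochs):
        grad: Counter = Counter()
        for tokens, tags in corpus:
            grad.update(sequence_features(tokens, tags))  # empirical counts
            _, feats, probs, _ = sequence_distribution(weights, tokens)
            for counts, p in zip(feats, probs):
                for k, n in counts.items():
                    grad[k] -= p * n  # minus the model's expected counts
        for k, g in grad.items():
            weights[k] = weights.get(k, 0.0) + lr * (g - l2 * weights.get(k, 0.0))
    return weights


def predict(weights: Dict, tokens: Sequence[str]) -> List[str]:
    seqs, _, probs, _ = sequence_distribution(weights, tokens)
    best = max(range(len(seqs)), key=lambda i: probs[i])
    return list(seqs[best])


weights = train(CORPUS)
print("log-likelihood before:", round(log_likelihood({}, CORPUS), 3))
print("log-likelihood after: ", round(log_likelihood(weights, CORPUS), 3))
print(predict(weights, "This product is ordered by StackOverFlow Inc.".split()))
```

With only three training sentences, the prediction for the question's sentence demonstrates the mechanics, not accuracy.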