Using Stanford NER to extract addresses from a text document?


Question

I have been looking at Stanford NER and am thinking of using its Java APIs to extract postal addresses from a text document. The document can be any document that contains a postal address section, e.g. utility bills or electricity bills.

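So my idea is:

  1. Define a postal address as a named entity, built from LOCATION and other primitive named entities.
  2. Define segmentation and other sub-processes.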

I am trying to find an example pipeline for this (what are the detailed steps required?). Has anyone done this before? Suggestions welcome.

Recommended answer

To be clear: all credit goes to Raj Vardhan (and John Bauer), who had an exchange about this on the [java-nlp-user] mailing list.

Raj Vardhan wrote about his plan to work on "finding street address in a sentence":

Here is an approach I have thought of:

  1. Find the event anchor in the sentence.
  2. From that event node, select the outgoing edges in the SemanticGraph whose relations are such as "prep-in" or "prep-at".
  3. If the dependent in that relation has the POS tag NNP:

a) Find the outgoing edges from the dependent's node with relations such as "nn".

b) Connect all such nodes in increasing order of occurrence in the sentence.

c) Print the resulting value as the location where the event occurred.

This obviously rests on certain assumptions, such as a direct dependency between the event anchor and the location in the sentence.
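The traversal in the quoted plan can be sketched in plain Java without any parser. This is only an illustration under stated assumptions: `Token`, `Edge`, `EventLocationSketch` and `findLocation` are hypothetical names, not Stanford CoreNLP classes; the dependency edges are supplied by hand rather than read from a real SemanticGraph; and step 1 (finding the event anchor) is assumed to have happened already, so the anchor is passed in. The relation labels are spelled as in the plan above; the actual strings depend on the parser and dependency scheme in use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Toy stand-in for a parsed sentence. None of these types come from CoreNLP.
final class EventLocationSketch {

    /** One word of the sentence: 1-based position, surface form, POS tag. */
    static final class Token {
        final int index;
        final String word;
        final String pos;

        Token(int index, String word, String pos) {
            this.index = index;
            this.word = word;
            this.pos = pos;
        }
    }

    /** One typed dependency edge from a governor to a dependent. */
    static final class Edge {
        final Token governor;
        final Token dependent;
        final String relation;

        Edge(Token governor, Token dependent, String relation) {
            this.governor = governor;
            this.dependent = dependent;
            this.relation = relation;
        }
    }

    /** Returns the location phrase attached to the event anchor, if the plan's conditions hold. */
    static Optional<String> findLocation(Token eventAnchor, List<Edge> edges) {
        for (Edge edge : edges) {
            // Step 2: keep only outgoing edges of the anchor labelled "prep-in" or "prep-at".
            boolean fromAnchor = edge.governor == eventAnchor;
            boolean locative = edge.relation.equals("prep-in") || edge.relation.equals("prep-at");
            if (!fromAnchor || !locative) {
                continue;
            }
            // Step 3: the dependent must be tagged as a proper noun.
            Token head = edge.dependent;
            if (!head.pos.equals("NNP")) {
                continue;
            }
            // Step 3a: collect the head's "nn" (noun compound modifier) dependents.
            List<Token> parts = new ArrayList<>();
            parts.add(head);
            for (Edge inner : edges) {
                if (inner.governor == head && inner.relation.equals("nn")) {
                    parts.add(inner.dependent);
                }
            }
            // Step 3b: join the words in order of occurrence in the sentence.
            parts.sort(Comparator.comparingInt(t -> t.index));
            StringBuilder phrase = new StringBuilder();
            for (Token t : parts) {
                if (phrase.length() > 0) {
                    phrase.append(' ');
                }
                phrase.append(t.word);
            }
            // Step 3c: report the phrase as the event's location.
            return Optional.of(phrase.toString());
        }
        return Optional.empty();
    }
}
```

For a hand-built sentence such as "The meeting was held at Stanford University", with "held" as the anchor, a "prep-at" edge from "held" to "University" and an "nn" edge from "University" to "Stanford", `findLocation` yields `Stanford University`. A multi-line postal address on a bill usually has no verb to anchor on, so this kind of traversal would at best complement the NER-based idea in the question.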

I am not sure whether this will help you, but I wanted to mention it just in case. Again, any credit should go to Raj Vardhan (and John Bauer).
