How to automatically detect acronym meaning / expansion
Question
How can you detect the meaning (the expansion) of an acronym using NLP / Information Extraction (IE) methods?
We want to detect in free text whether a word or its acronym is used, and map both to the same entity / token.
Most papers available online are about medical acronyms, and they do not provide a library to accomplish this task.
Any ideas?
Recommended answer
Reading your question and the comments, I understand that you want to create a mapping from an acronym to its expansion.
Assuming you have a collection of text documents in which both the acronym and its expansion occur, you can apply an algorithm to extract (acronym, expansion) pairs.
"A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text" by A.S. Schwartz and M.A. Hearst does exactly this by looking at patterns. The Java implementation is available here.
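To give a feel for the pattern-based approach, here is a minimal Python sketch in the spirit of that algorithm. It is not the authors' Java implementation: it only handles the "long form (short form)" pattern, it does not split the text into sentences first, and the function names (`is_valid_short_form`, `find_best_long_form`, `extract_pairs`, `normalize`) are my own, chosen for illustration. The last helper is an assumed, simplistic way to map an expansion and its acronym to the same token, as asked in the question.

```python
import re


def is_valid_short_form(short_form):
    """Heuristic filter: 2-10 characters, at most two words,
    at least one letter, and an alphanumeric first character."""
    return (
        2 <= len(short_form) <= 10
        and len(short_form.split()) <= 2
        and any(c.isalpha() for c in short_form)
        and short_form[0].isalnum()
    )


def find_best_long_form(short_form, candidate):
    """Match the short form's characters right to left inside the candidate.
    The first short-form character must match at the start of a word."""
    l_index = len(candidate) - 1
    for s_index in range(len(short_form) - 1, -1, -1):
        ch = short_form[s_index].lower()
        if not ch.isalnum():
            continue
        # Move left until the character matches (and, for the first
        # short-form character, until it sits at a word start).
        while (l_index >= 0 and candidate[l_index].lower() != ch) or (
            s_index == 0 and l_index > 0 and candidate[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
    # Extend the match back to the beginning of the word it starts in.
    start = candidate.rfind(" ", 0, l_index + 1) + 1
    return candidate[start:]


def extract_pairs(text):
    """Return {acronym: expansion} for 'expansion (acronym)' patterns."""
    pairs = {}
    for match in re.finditer(r"\(([^()]+)\)", text):
        short_form = match.group(1).strip()
        if not is_valid_short_form(short_form):
            continue
        words_before = text[: match.start()].split()
        # Limit the search window, following the bound used in the paper.
        max_words = min(len(short_form) + 5, len(short_form) * 2)
        candidate = " ".join(words_before[-max_words:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form and len(long_form) > len(short_form):
            pairs[short_form] = long_form
    return pairs


def normalize(text, pairs):
    """Replace each expansion (with or without its parenthesised acronym)
    by the acronym, so both surface forms map to the same token."""
    for short_form, long_form in pairs.items():
        pattern = (
            r"\b" + re.escape(long_form) + r"\b"
            + r"(\s*\(" + re.escape(short_form) + r"\))?"
        )
        text = re.sub(pattern, lambda m, s=short_form: s, text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    doc = ("Natural language processing (NLP) is related to "
           "information extraction (IE).")
    pairs = extract_pairs(doc)
    print(pairs)
    print(normalize("We use natural language processing daily.", pairs))
```

In practice you would run `extract_pairs` over the whole document collection, keep the most frequent expansion per acronym, and only then normalize new text; ambiguous acronyms with several expansions need extra disambiguation that this sketch does not attempt.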
I applied this algorithm to the English Wikipedia; you can see the results here. I also applied it to a collection of Portuguese news articles; the results are here.