How to extract corporate bond information using machine learning
Question
I am working on a project where I need to extract corporate bond information from unstructured emails. After a lot of research, I found that machine learning can be used for information extraction. I tried the OpenNLP NER (named entity recognizer), but I am not sure whether I picked the right library for this problem: I am getting results, but they are not up to the mark.
Could someone please suggest a library or algorithm, i.e. how I can parse and extract the data from these emails? I am planning to explore Naïve Bayes, n-grams, or support vector machines, but I am not sure whether they will help. Any suggestions are welcome.
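Naïve Bayes and SVMs do not work on raw text directly; for a task like this they are usually framed as token classification (is each token part of the bond description or not?), which means turning every token into features first. The sketch below is a minimal, hypothetical Python illustration of that step, assuming whitespace tokenisation. The names `word_shape` and `token_features`, the shape scheme (uppercase -> `X`, lowercase -> `x`, digit -> `d`) and the one-token window are all assumptions, not part of OpenNLP or any particular library.

```python
def word_shape(token: str) -> str:
    """Map a token to a coarse shape: uppercase -> X, lowercase -> x, digit -> d.

    Punctuation is kept as-is, so "ABC" -> "XXX", "2.5" -> "d.d", "05/06" -> "dd/dd".
    """
    out = []
    for ch in token:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def token_features(tokens: list[str], i: int) -> dict[str, str]:
    """Features for tokens[i]: its own shape, its neighbours' shapes, and a shape bigram."""
    shape = word_shape(tokens[i])
    prev_shape = word_shape(tokens[i - 1]) if i > 0 else "<BOS>"
    next_shape = word_shape(tokens[i + 1]) if i + 1 < len(tokens) else "<EOS>"
    return {
        "shape": shape,
        "prev_shape": prev_shape,
        "next_shape": next_shape,
        # a simple n-gram (bigram) feature over shapes
        "bigram": prev_shape + "|" + shape,
    }


if __name__ == "__main__":
    tokens = "[/] Trading 10mm ABC 2.5 19 05/06 mkt can use 50mm".split()
    for i, tok in enumerate(tokens):
        print(tok, token_features(tokens, i))
```

The resulting feature dicts would still have to be vectorised and paired with hand-labelled emails before a Naïve Bayes or SVM classifier could learn from them; the labelled data is the part this sketch does not provide.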
Example 1:
[/] Trading 10mm ABC 2.5 19 05/06 mkt can use 50mm
---> here I want to extract "ABC 2.5 19"
Example 2:
XYZ 6.5 15 10-2B 106-107 B3 AAA- 1.646MM 2x2
---> here I want to extract "XYZ 6.5 15"
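Both target strings follow the same visible pattern (an uppercase ticker, a coupon, and a two-digit maturity year), so a plain regular expression is a useful baseline to compare any ML model against. This is a minimal Python sketch; the pattern `BOND_RE` and the function `extract_bonds` are hypothetical and were inferred only from the two example lines, so real emails with other ticker, coupon or date formats will likely need more rules.

```python
import re

# Hypothetical pattern inferred from the two examples only:
#   ticker   = 2 to 5 uppercase letters (ABC, XYZ)
#   coupon   = 1-2 digits with an optional decimal part (2.5, 6.5)
#   maturity = a two-digit year (19, 15)
BOND_RE = re.compile(
    r"\b(?P<ticker>[A-Z]{2,5})\s+"
    r"(?P<coupon>\d{1,2}(?:\.\d+)?)\s+"
    r"(?P<maturity>\d{2})\b"
)


def extract_bonds(line: str) -> list[str]:
    """Return every 'TICKER COUPON YY' triple found in one line of text."""
    return [
        " ".join(m.group("ticker", "coupon", "maturity"))
        for m in BOND_RE.finditer(line)
    ]


if __name__ == "__main__":
    print(extract_bonds("[/] Trading 10mm ABC 2.5 19 05/06 mkt can use 50mm"))  # ['ABC 2.5 19']
    print(extract_bonds("XYZ 6.5 15 10-2B 106-107 B3 AAA- 1.646MM 2x2"))  # ['XYZ 6.5 15']
```

Because tokens such as `AAA-` or `1.646MM` are not followed by whitespace and a number in the expected positions, they do not match; that strictness is what keeps a rule-based baseline precise, at the cost of recall on unseen formats.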
Recommended answer
In Perl, you can use Marpa::R2, a general BNF parser.
This gist extracts info from your examples.
Hope this helps.
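The answer's approach is to describe the line format as a grammar instead of learning it. Marpa::R2 is a Perl module and is not used here; the following is a much simpler Python analogue of the same idea, assuming whitespace tokenisation. Each token is first classified into a hypothetical terminal (`TICKER`, `COUPON`, `YEAR`), and the rule `bond ::= TICKER COUPON YEAR` is then matched over the sequence of terminals. All names and patterns are illustrative assumptions.

```python
import re

# Hypothetical terminal classes; the first pattern that fully matches a token wins.
TERMINALS = [
    ("TICKER", re.compile(r"[A-Z]{2,5}")),
    ("COUPON", re.compile(r"\d{1,2}\.\d+")),
    ("YEAR", re.compile(r"\d{2}")),
]

# Grammar rule (BNF-style): bond ::= TICKER COUPON YEAR
BOND_RULE = ("TICKER", "COUPON", "YEAR")


def classify(token: str) -> str:
    """Return the first terminal whose pattern matches the whole token, else OTHER."""
    for name, pattern in TERMINALS:
        if pattern.fullmatch(token):
            return name
    return "OTHER"


def parse_bonds(text: str) -> list[str]:
    """Scan the terminal sequence and collect every span that matches BOND_RULE."""
    tokens = text.split()
    kinds = [classify(t) for t in tokens]
    n = len(BOND_RULE)
    found = []
    i = 0
    while i + n <= len(tokens):
        if tuple(kinds[i:i + n]) == BOND_RULE:
            found.append(" ".join(tokens[i:i + n]))
            i += n  # consume the matched tokens
        else:
            i += 1
    return found


if __name__ == "__main__":
    print(parse_bonds("[/] Trading 10mm ABC 2.5 19 05/06 mkt can use 50mm"))
    print(parse_bonds("XYZ 6.5 15 10-2B 106-107 B3 AAA- 1.646MM 2x2"))
```

One design caveat: this sketch gives each token exactly one class, so a token such as `15` that could be read as either a coupon or a year is resolved only by the order of `TERMINALS`. A general parser like Marpa can handle ambiguous grammars and let the rules decide, which is the main reason to reach for one once the message formats become more varied.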