Html Annotator, Html Converter in UIMA Ruta

Question

Can anyone briefly explain the Html annotator, Html converter and TEIViewWriter with some examples? I want to create annotations in the initial view.

Waiting for answers.

Main script:

 PACKAGE uima.ruta.example;
 SCRIPT uima.ruta.example.Html;
 Document{-> EXEC(Html)};
 WORDLIST JOURNALNAMELIST='JournalName.txt';
 WORDLIST CITYPUBLIST='CITYPUB.txt';
 DECLARE JOURNALNAME;
 DECLARE CITYPUB;
 Document{ -> MARKFAST(JOURNALNAME, JOURNALNAMELIST)};
 Document{ -> MARKFAST(CITYPUB, CITYPUBLIST)};
 DECLARE Reference;
 "<a name=para(.+?)>(.+?)</a>"-> 2=Reference;
 DECLARE FirstToken, LastToken;

 BLOCK(InRef) Reference{}
 {
 ANY{POSITION(Reference,1) -> MARK(FirstToken)};
 Document{-> MARKLAST(LastToken)};
 }
 DECLARE FIRSTWORD;
 FirstToken PERIOD CW {->MARK(FIRSTWORD)};

Html script:

 PACKAGE uima.ruta.example;
 ENGINE utils.HtmlAnnotator;
 ENGINE utils.HtmlConverter;
 ENGINE utils.HtmlViewWriter;
 TYPESYSTEM utils.HtmlTypeSystem;
 TYPESYSTEM utils.SourceDocumentInformation;
 Document{-> EXEC(HtmlAnnotator)};
 Document { -> CONFIGURE(HtmlConverter, "inputView" = "_InitialView","outputView" = "plain"),
 EXEC(HtmlConverter)};
 Document{ -> CONFIGURE(HtmlViewWriter, "inputView" = "plain","outputView" = "_InitialView", "output" = "E:/ruta-2.4.0-source-release/ruta-2.4.0/example-projects/TextRulerExample/output"),
 EXEC(HtmlViewWriter)};

Sample HTML input file (manually converted to HTML by changing the extension):

<html>
<head>
 <meta http-equiv=Content-Type content="text/html; charset=windows-1252">
 <meta name=Generator content="Microsoft Word 14 (filtered)">
 <style>
 <!--
/* Font Definitions */
 @font-face
 {font-family:Calibri;
 panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
 {margin-top:0in;
 margin-right:0in;
 margin-bottom:10.0pt;
 margin-left:0in;
 line-height:115%;
 font-size:11.0pt;
 font-family:"Calibri","sans-serif";}
span.DAZZLEFN
 {mso-style-name:DAZZLEFN;}
span.DAZZLELN
 {mso-style-name:DAZZLELN;
 color:#92D050;}
.MsoChpDefault
 {font-family:"Calibri","sans-serif";}
.MsoPapDefault
 {margin-bottom:10.0pt;
 line-height:115%;}
@page WordSection1
 {size:8.5in 11.0in;
 margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
 {page:WordSection1;}
-->
</style>

</head>

<body lang=EN-US>

<div class=WordSection1>

<p class=MsoNormal><a name=para0>REFERENCES</a></p>

 <p class=MsoNormal><a name=para1>1. Lawrence RA. A        review of the
 medical benefits and contraindications to breastfeeding in the United    States
 [Internet] . Arlington (VA): National Center for Education in Maternal and
 Child Health; 1997 Oct [cited 2000 Apr 24]. p. 40. Available from:
 www.ncemch.org/pubs/PDFs/Welcometojungle.pdf.</a></p>

 <p class=MsoNormal><a name=para2>2. Shishido A.  Retraction notice:
 Effect of platinum compounds on murine lymphocyte mitogenesis [Retraction of
 Alsabti EA, Ghalib ON, Salem MH. In: Jpn J Med Biol 1979 Apr; 32(2):53-65].      Jpn
 J Med Sci Biol 1980 Aug;33(4):235-237.</a></p>

 <p class=MsoNormal><a name=para3>3. Leist TP,  Zinkernagel RM.
 Effects of treatment with IL-2 receptor specific monoclonal antibody in mice
 [letter] [Retraction of Leist TP, Kohler M, Eppler M, Zinkernagel RM. In: J
 Immunol 1989 Jul 15; 143(2): 628-32]. J Immunol 1990 Apr 1;144(7):2847.</a>  </p>

 <p class=MsoNormal><a name=para4>4. Alsabti EA, Ghalib     ON, Salem MH.
 Effect of platinum compounds on murine lymphocyte mitogenesis [Retracted by
 Shishido A. In: Jpn J Med Sci Biol 1980 Aug; 33(4):235-7]. Jpn J Med Sci  Biol
 1979 Apr;32(2):53-65.</a></p>

 <p class=MsoNormal><a name=para5>5. Tidy JA, Parry GC, Ward P,
 Coleman DV, Peto J, Malcolm AD, Farrell PJ. High rate of papillomavirus type 16
 infection in cytologically normal cervices [letter] [Retracted by Tidy J,
 Farrell PJ. In: Lancet 1989 Dec 23-30:2(8678-8679):1535]. Lancet 1989 Feb   25;1(8635):434.</a></p>

 <p class=MsoNormal><a name=para6>6. Magni F, Rossoni G,  Berti F.
 BN-52021 protects guinea-pig from heard anaphylaxis. Pharm Res Commun 1988
 Dec;20 Suppl 5:75-78.</a></p>

 <p class=MsoNormal><a name=para7>7. Garvia EE, DeHaven ED. An
 experimental analysis of response acquisition and elimination with positive
 reinforcers. Behav Neuropsychiatry 1975 a April-1976 May;7(1-12):71-78.</a>  </p>

 <p class=MsoNormal><a name=para8>8. Mueller FO,   Schindler RD. Annual
 survey of football injury research 1931-1985. [place unknown]: American
 Football Coaches Assn; 1986. 24 p.</a></p>

 <p class=MsoNormal><a name=para9>9. Stern, Michael P.   National
 Institute of Arthritis, Diabetes, and Digestive and Kidney Diseases.   Diabetes
 in America: diabetes data compiled 1984.. [Bethesda (MD)]: The Institute; 1985
 Aug. Diabetes in Hispanic Americans. Chapter 9. (NIH publication; no. 86- 1468).</a></p>

 <p class=MsoNormal><a name=para10>10. Vivian, Valerie L,      editor. Child
 abuse and neglect: a medical community response. 1st AMA National   Conference on
 Child Abuse and Neglect; 1984 March 30-June 31; Chicago. Chicago: American
 Medical Association; 1985. 256 p.</a></p>

 <p class=MsoNormal><a name=para11>11. Popper, Hans, et al.,   editors.
 Structural carbohydrates in the liver: proceedings of the 34th Falk   Symposium;
 1982 oct 12-19; Basil, Switzerland.Boston: MTB Press; 1983. 701 p.</a></p>

 <p class=MsoNormal><a name=para12></a>&nbsp;</p>

 </div>

 </body>

 </html>

Recommended answer

Note that your example script does not contain the mentioned TEIViewWriter. The problem is the same, however.

Unfortunately, the example script has an error:

Document{ -> CONFIGURE(ViewWriter, "inputView" = "plain",...

should read

Document{ -> CONFIGURE(HtmlViewWriter, "inputView" = "plain",

... then the NPE is gone. There could be another exception if the input text cannot be parsed by the HtmlParser, which results in a missing Sofa in the XMI file. Wrapping the text in an enclosing element could help here.

The files HtmlConverter.ruta and TEIConverter.ruta are indeed good examples of these components. The HtmlAnnotator creates annotations for HTML and XML tags/elements. The HtmlConverter removes all HTML/XML tags, stores the resulting text in a new view and recalculates the offsets of the annotations. The TEIViewWriter is just a ViewWriter with a specific type system, which copies a specific view to a new CAS and stores it. Together, these components are able to convert a TEI/HTML/XML text to plain text with annotations for the XML markup.
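
To make the idea of "removing tags and recalculating offsets" more concrete, here is a small conceptual sketch in Python. It is not UIMA Ruta code and not how the HtmlConverter is implemented. The names TagStripper and convert are hypothetical and exist only for this illustration. The sketch records one span per element (similar in spirit to what the HtmlAnnotator provides) and expresses each span in offsets of the tag-free text (similar in spirit to the converted view).

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects tag-free text plus one span per element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []   # pieces of the plain text
        self.length = 0        # current length of the plain text
        self.open_tags = []    # stack of (tag, begin offset in plain text)
        self.annotations = []  # finished spans: (tag, begin, end)

    def handle_starttag(self, tag, attrs):
        # Remember where this element starts in the plain text.
        self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the nearest matching open element; stray end tags are ignored
        # and unclosed inner elements are discarded without a span.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                begin = self.open_tags[i][1]
                del self.open_tags[i:]
                self.annotations.append((tag, begin, self.length))
                break

    def handle_data(self, data):
        self.text_parts.append(data)
        self.length += len(data)


def convert(html):
    """Return (plain_text, spans) with spans given in plain-text offsets."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.text_parts), parser.annotations


plain, spans = convert("<p class=MsoNormal><a name=para1>1. Lawrence RA.</a></p>")
print(plain)  # 1. Lawrence RA.
print(spans)  # [('a', 0, 15), ('p', 0, 15)]
```

In the input string the a element starts at character 33, but in the plain text it starts at 0. That shift is the offset recalculation described above.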

The documentation contains more information, e.g., about the configuration parameters.

Disclaimer: I am a developer of UIMA Ruta.
