Html Annotator, Html Converter in UIMA Ruta
Question
Can anyone briefly explain the HtmlAnnotator, the HtmlConverter and the TEIViewWriter, with some examples? I want to create annotations in the initial view.
Waiting for answers.
Main script:
PACKAGE uima.ruta.example;
SCRIPT uima.ruta.example.Html;
Document{-> EXEC(Html)};
WORDLIST JOURNALNAMELIST='JournalName.txt';
WORDLIST CITYPUBLIST='CITYPUB.txt';
DECLARE JOURNALNAME;
DECLARE CITYPUB;
Document{ -> MARKFAST(JOURNALNAME, JOURNALNAMELIST)};
Document{ -> MARKFAST(CITYPUB, CITYPUBLIST)};
DECLARE Reference;
"<a name=para(.+?)>(.+?)</a>"-> 2=Reference;
DECLARE FirstToken, LastToken;
BLOCK(InRef) Reference{}
{
ANY{POSITION(Reference,1) -> MARK(FirstToken)};
Document{-> MARKLAST(LastToken)};
}
DECLARE FIRSTWORD;
FirstToken PERIOD CW {->MARK(FIRSTWORD)};
Html script:
PACKAGE uima.ruta.example;
ENGINE utils.HtmlAnnotator;
ENGINE utils.HtmlConverter;
ENGINE utils.HtmlViewWriter;
TYPESYSTEM utils.HtmlTypeSystem;
TYPESYSTEM utils.SourceDocumentInformation;
Document{-> EXEC(HtmlAnnotator)};
Document { -> CONFIGURE(HtmlConverter, "inputView" = "_InitialView","outputView" = "plain"),
EXEC(HtmlConverter)};
Document{ -> CONFIGURE(HtmlViewWriter, "inputView" = "plain","outputView" = "_InitialView", "output" = "E:/ruta-2.4.0-source-release/ruta-2.4.0/example-projects/TextRulerExample/output"),
EXEC(HtmlViewWriter)};
Sample HTML input file (manually converted to HTML by changing the extension):
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=Generator content="Microsoft Word 14 (filtered)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin-top:0in;
margin-right:0in;
margin-bottom:10.0pt;
margin-left:0in;
line-height:115%;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
span.DAZZLEFN
{mso-style-name:DAZZLEFN;}
span.DAZZLELN
{mso-style-name:DAZZLELN;
color:#92D050;}
.MsoChpDefault
{font-family:"Calibri","sans-serif";}
.MsoPapDefault
{margin-bottom:10.0pt;
line-height:115%;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
-->
</style>
</head>
<body lang=EN-US>
<div class=WordSection1>
<p class=MsoNormal><a name=para0>REFERENCES</a></p>
<p class=MsoNormal><a name=para1>1. Lawrence RA. A review of the
medical benefits and contraindications to breastfeeding in the United States
[Internet] . Arlington (VA): National Center for Education in Maternal and
Child Health; 1997 Oct [cited 2000 Apr 24]. p. 40. Available from:
www.ncemch.org/pubs/PDFs/Welcometojungle.pdf.</a></p>
<p class=MsoNormal><a name=para2>2. Shishido A. Retraction notice:
Effect of platinum compounds on murine lymphocyte mitogenesis [Retraction of
Alsabti EA, Ghalib ON, Salem MH. In: Jpn J Med Biol 1979 Apr; 32(2):53-65]. Jpn
J Med Sci Biol 1980 Aug;33(4):235-237.</a></p>
<p class=MsoNormal><a name=para3>3. Leist TP, Zinkernagel RM.
Effects of treatment with IL-2 receptor specific monoclonal antibody in mice
[letter] [Retraction of Leist TP, Kohler M, Eppler M, Zinkernagel RM. In: J
Immunol 1989 Jul 15; 143(2): 628-32]. J Immunol 1990 Apr 1;144(7):2847.</a> </p>
<p class=MsoNormal><a name=para4>4. Alsabti EA, Ghalib ON, Salem MH.
Effect of platinum compounds on murine lymphocyte mitogenesis [Retracted by
Shishido A. In: Jpn J Med Sci Biol 1980 Aug; 33(4):235-7]. Jpn J Med Sci Biol
1979 Apr;32(2):53-65.</a></p>
<p class=MsoNormal><a name=para5>5. Tidy JA, Parry GC, Ward P,
Coleman DV, Peto J, Malcolm AD, Farrell PJ. High rate of papillomavirus type 16
infection in cytologically normal cervices [letter] [Retracted by Tidy J,
Farrell PJ. In: Lancet 1989 Dec 23-30:2(8678-8679):1535]. Lancet 1989 Feb 25;1(8635):434.</a></p>
<p class=MsoNormal><a name=para6>6. Magni F, Rossoni G, Berti F.
BN-52021 protects guinea-pig from heard anaphylaxis. Pharm Res Commun 1988
Dec;20 Suppl 5:75-78.</a></p>
<p class=MsoNormal><a name=para7>7. Garvia EE, DeHaven ED. An
experimental analysis of response acquisition and elimination with positive
reinforcers. Behav Neuropsychiatry 1975 a April-1976 May;7(1-12):71-78.</a> </p>
<p class=MsoNormal><a name=para8>8. Mueller FO, Schindler RD. Annual
survey of football injury research 1931-1985. [place unknown]: American
Football Coaches Assn; 1986. 24 p.</a></p>
<p class=MsoNormal><a name=para9>9. Stern, Michael P. National
Institute of Arthritis, Diabetes, and Digestive and Kidney Diseases. Diabetes
in America: diabetes data compiled 1984.. [Bethesda (MD)]: The Institute; 1985
Aug. Diabetes in Hispanic Americans. Chapter 9. (NIH publication; no. 86- 1468).</a></p>
<p class=MsoNormal><a name=para10>10. Vivian, Valerie L, editor. Child
abuse and neglect: a medical community response. 1st AMA National Conference on
Child Abuse and Neglect; 1984 March 30-June 31; Chicago. Chicago: American
Medical Association; 1985. 256 p.</a></p>
<p class=MsoNormal><a name=para11>11. Popper, Hans, et al., editors.
Structural carbohydrates in the liver: proceedings of the 34th Falk Symposium;
1982 oct 12-19; Basil, Switzerland.Boston: MTB Press; 1983. 701 p.</a></p>
<p class=MsoNormal><a name=para12></a> </p>
</div>
</body>
</html>
Recommended answer
Note that your example script does not contain the mentioned TEIViewWriter. The problem is the same, however.
Unfortunately, the example script has an error:
The line
Document{ -> CONFIGURE(ViewWriter, "inputView" = "plain",...
should read
Document{ -> CONFIGURE(HtmlViewWriter, "inputView" = "plain",
... then the NPE is gone. There could be another exception if the input text cannot be parsed by the HtmlParser, which results in a missing Sofa in the XMI file. Wrapping the text in could help here.
The files HtmlConverter.ruta and TEIConverter.ruta are indeed good examples of these components.
The HtmlAnnotator creates annotations for HTML and XML tags/elements. The HtmlConverter removes all HTML/XML tags, stores the resulting text in a new view and recalculates the offsets of the annotations. The TEIViewWriter is just a ViewWriter with a specific type system, which copies a specific view to a new CAS and stores it. Together, these components can convert a TEI/HTML/XML text to plain text with annotations for the XML markup.
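To make the "removes all tags and recalculates the offsets" step more concrete, here is a minimal conceptual sketch in plain Python. It is not the UIMA Ruta implementation and uses none of its API: the function names (`strip_tags_with_offsets`, `convert`), the span tuples and the naive tag regex are assumptions made purely for illustration, and the real HtmlConverter presumably handles much more than this.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (begin, end, label), offsets into the source text

# Naive tag pattern for illustration only; not a real HTML parser.
TAG = re.compile(r"<[^>]*>")


def strip_tags_with_offsets(html: str) -> Tuple[str, List[int]]:
    """Return the tag-free text and a table mapping each source offset
    to the corresponding offset in the tag-free text."""
    plain: List[str] = []
    mapping = [0] * (len(html) + 1)
    pos = 0
    for match in TAG.finditer(html):
        # Characters before the tag are copied one-to-one.
        for i in range(pos, match.start()):
            mapping[i] = len(plain)
            plain.append(html[i])
        # Every character inside the tag collapses to the current plain offset.
        for i in range(match.start(), match.end()):
            mapping[i] = len(plain)
        pos = match.end()
    # Copy whatever follows the last tag.
    for i in range(pos, len(html)):
        mapping[i] = len(plain)
        plain.append(html[i])
    mapping[len(html)] = len(plain)
    return "".join(plain), mapping


def convert(html: str, spans: List[Span]) -> Tuple[str, List[Span]]:
    """Strip the markup and move the given spans onto the plain text."""
    plain, mapping = strip_tags_with_offsets(html)
    return plain, [(mapping[b], mapping[e], label) for b, e, label in spans]


if __name__ == "__main__":
    source = "<p><a name=para1>1. Lawrence RA.</a></p>"
    begin = source.index("<a")
    end = source.index("</a>") + len("</a>")
    text, moved = convert(source, [(begin, end, "A")])
    for b, e, label in moved:
        print(label, repr(text[b:e]))
```

In this sketch a span that covered a whole `<a>...</a>` element in the source ends up covering only the element's text in the tag-free output. That is the general idea behind keeping the markup as annotations while working on plain text in a separate view.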
The documentation contains more information, e.g., about the configuration parameters.
Disclaimer: I am a developer of UIMA Ruta.