CPU usage too high while running Ruta script

Problem description

CPU usage is too high while running my Ruta script, so I plan to use a GPU. Do I need to take any additional steps to run the script on a GPU machine? Or else, is there an alternative way to reduce the CPU usage?

Sample script:

PACKAGE uima.ruta.example;

ENGINE utils.PlainTextAnnotator;
TYPESYSTEM utils.PlainTextTypeSystem;

WORDLIST EditorMarkerList = 'EditorMarker.txt';
WORDLIST EnglishStopWordList = 'EnglishStopWords.txt';
WORDLIST FirstNameList = 'FirstNames.txt';
WORDLIST JournalVolumeMarkerList = 'JournalVolumeMarker.txt';
WORDLIST MonthList = 'Months.txt';
WORDLIST PagesMarkerList = 'PagesMarker.txt';
WORDLIST PublisherList = 'Publishers.txt';

DECLARE EditorMarker, EnglishStopWord, FirstName, JournalVolumeMarker, Month, PagesMarker, PublisherInd;
Document{ -> MARKFAST(EditorMarker, EditorMarkerList)};
Document{ -> MARKFAST(EnglishStopWord,EnglishStopWordList)};
Document{ -> MARKFAST(FirstName, FirstNameList)};
Document{ -> MARKFAST(JournalVolumeMarker, JournalVolumeMarkerList)};
Document{ -> MARKFAST(Month, MonthList)};
Document{ -> MARKFAST(PagesMarker, PagesMarkerList)};
Document{ -> MARKFAST(PublisherInd, PublisherList)};


DECLARE Reference;
Document{-> EXEC(PlainTextAnnotator, {Line, Paragraph})};
Document{-> RETAINTYPE(SPACE, BREAK)};
Line{-REGEXP("CORA:.*") -> MARK(Reference)};
Reference{-> TRIM(SPACE, BREAK)};
Document{-> RETAINTYPE};

DECLARE LParen, RParen;
SPECIAL{REGEXP("[(]") -> MARK(LParen)};
SPECIAL{REGEXP("[)]") -> MARK(RParen)};

DECLARE YearInd;
NUM{REGEXP("19..|20..") -> MARK(YearInd, 1, 2)} SW?{REGEXP("a|b|c|d", true)};
Document{-> RETAINTYPE(SPACE)};
CAP YearInd{-> UNMARK(YearInd)};
Document{-> RETAINTYPE};


DECLARE NameLinker;
W{-PARTOF(NameLinker), REGEXP("and", true) -> MARK(NameLinker)};
COMMA{-PARTOF(NameLinker) -> MARK(NameLinker)};
SEMICOLON{-PARTOF(NameLinker) -> MARK(NameLinker)};
SPECIAL{-PARTOF(NameLinker), REGEXP("&") -> MARK(NameLinker)};

DECLARE FirstNameInd, FirstNameInitial, SingleChar;
CW{-PARTOF(FirstNameInitial), REGEXP(".")} SPECIAL{-PARTOF(FirstNameInitial), REGEXP("-")} CW{REGEXP(".") -> MARK(FirstNameInitial,1,2,3,4)} PERIOD;
SPECIAL{-PARTOF(FirstNameInitial), REGEXP("-")} CW{REGEXP(".") -> MARK(FirstNameInitial,1,2,3)} PERIOD;
CW{-PARTOF(FirstNameInitial), REGEXP(".") -> MARK(FirstNameInitial,1,2)} PERIOD;
CW{-PARTOF(FirstNameInitial), REGEXP(".") -> MARK(FirstNameInitial)} COMMA;
CW{-PARTOF(FirstNameInitial), REGEXP(".") -> MARK(SingleChar)};

DECLARE Quote, QuotedStuff;
SPECIAL[1,2]{REGEXP("[\"'´`‘’"]"), -PARTOF(Quote) -> MARK(Quote)};
Document{-> RETAINTYPE(SPACE)};
W Quote{-> UNMARK(Quote)} W;
Document{-> RETAINTYPE};
BLOCK(InRef) Reference{}{
    Quote ANY+{-PARTOF(Quote) -> MARK(QuotedStuff, 1, 2, 3)} Quote;
}

DECLARE InInd;
W{REGEXP("In", true)-> MARK(InInd)};

DECLARE FirstToken, LastToken;
BLOCK(InRef) Reference{}{
    ANY{POSITION(Reference,1) -> MARK(FirstToken)};
    Document{-> MARKLAST(LastToken)};
}


DECLARE NumPeriod, NumComma, NumColon;
Document{-> RETAINTYPE(SPACE, BREAK)};
NUM PERIOD{-> MARKONCE(NumPeriod)} NUM;
NUM COMMA{-> MARKONCE(NumComma)} NUM;
NUM COLON{-> MARKONCE(NumColon)} NUM;
Document{-> RETAINTYPE};
DECLARE PeriodSep, CommaSep, ColonSep;
PERIOD{-PARTOF(FirstNameInitial), -PARTOF(NumPeriod), -PARTOF(FirstToken) -> MARKONCE(PeriodSep)};
COMMA{-PARTOF(FirstNameInitial), -PARTOF(NumComma), -PARTOF(FirstToken) -> MARKONCE(CommaSep)};
COLON{-PARTOF(FirstNameInitial), -PARTOF(NumColon), -PARTOF(FirstToken) -> MARKONCE(ColonSep)};

Recommended answer

I have no experience running Ruta on a GPU, and I do not know whether it would bring any advantage over a parallelized process with multiple CPUs.

Ruta has become more and more imperative, with the consequence that you can write fast rules but also slow ones, depending on how much care you take.

Loosely speaking, each rule is an iterator over a specific type of annotation. If you have many iterators over general types, you get many index operations in UIMA. Index operations are where most of the time is spent, so they should be reduced, e.g., by reducing the number of annotations or by selecting better iterators/rules.

Your rule example offers many potential options for improving the runtime (this is only a first iteration of optimization):

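Each MARKFAST causes two nested iterators over RutaBasic (all atomic text spans), so the seven MARKFAST rules iterate over the complete document seven times. Instead, compile the word lists into an mtwl and use the TRIE action. Here is an example of how to do it: ruta-german-novel-with-dkpro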
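A minimal sketch of the TRIE variant, assuming the seven word lists have been compiled into a single multi tree word list. The file name RefLists.mtwl is hypothetical, and the string keys are assumed to match the names of the source lists as stored in the mtwl. The five trailing arguments are ignoreCase, ignoreLength, edit, distance and ignoreChars; the values below are placeholders and should be chosen to mirror the matching behavior you currently get from MARKFAST:

```ruta
// Hypothetical compiled dictionary containing all seven word lists.
WORDLIST RefLists = 'RefLists.mtwl';

DECLARE EditorMarker, EnglishStopWord, FirstName, JournalVolumeMarker, Month, PagesMarker, PublisherInd;

// A single dictionary lookup pass instead of seven MARKFAST passes.
// Each key names a source list inside the mtwl and maps it to a type.
Document{-> TRIE("EditorMarker.txt" = EditorMarker,
    "EnglishStopWords.txt" = EnglishStopWord,
    "FirstNames.txt" = FirstName,
    "JournalVolumeMarker.txt" = JournalVolumeMarker,
    "Months.txt" = Month,
    "PagesMarker.txt" = PagesMarker,
    "Publishers.txt" = PublisherInd,
    RefLists, false, 0, false, 0, "")};
```

This would replace the seven WORDLIST declarations and the seven MARKFAST rules at the top of the script.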

You have several duplicate starting anchors in consecutive rules, e.g., lines 32 and 33 of the script (the LParen and RParen rules). You can iterate once over SPECIAL with a BLOCK or with inlined rules: SPECIAL->{Document{REGEXP("[(]") -> MARK(LParen)};Document{REGEXP("[)]") -> MARK(RParen)};}; You can even combine this with the other similar rules by iterating once over ANY and classifying all tokens in a single pass.
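The same idea written as a BLOCK, as a rough sketch. The block name ClassifySpecial is made up, and pulling the "&" rule for NameLinker into the block assumes that its result does not depend on the rules that sit between these lines in the original script:

```ruta
DECLARE LParen, RParen, NameLinker;

// One iteration over SPECIAL; inside the block, Document refers to the
// current SPECIAL annotation, so each rule only checks that one token.
BLOCK(ClassifySpecial) SPECIAL{} {
    Document{REGEXP("[(]") -> MARK(LParen)};
    Document{REGEXP("[)]") -> MARK(RParen)};
    Document{-PARTOF(NameLinker), REGEXP("&") -> MARK(NameLinker)};
}
```

The Quote rule is left out on purpose, because it matches SPECIAL[1,2] rather than a single SPECIAL token.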

Your rules do not apply dynamic anchoring, i.e., you do not specify the starting anchor of the rule match. The rule in line 58 of the script (W Quote{-> UNMARK(Quote)} W;), for example, needs to iterate over all words. This is not necessary, since you can iterate over only the Quote annotations with W @Quote{-> UNMARK(Quote)} W; which is much faster. Several rules can be optimized this way.
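As a sketch, here are two rules from the script rewritten with an explicit anchor. This assumes that YearInd and Quote annotations are much rarer than CAP and W tokens, which is what makes starting the match there cheaper:

```ruta
// Start matching at each YearInd instead of at every CAP token.
Document{-> RETAINTYPE(SPACE)};
CAP @YearInd{-> UNMARK(YearInd)};
Document{-> RETAINTYPE};

// Start matching at each Quote instead of at every word.
Document{-> RETAINTYPE(SPACE)};
W @Quote{-> UNMARK(Quote)} W;
Document{-> RETAINTYPE};
```

The rule elements and actions are unchanged; only the element where matching starts is different.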

If you have the same iterator in several rules but also a sequential dependency between them, as in lines 49-53 of the script (the FirstNameInitial and SingleChar rules), you should use the FOREACH block. There, you can iterate over the CW annotations once and apply several rules anchored on each CW.
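A rough sketch of what this could look like for the initials rules. The variable name cw is arbitrary, and the sketch assumes that applying all five rules to one CW before moving on to the next gives the same annotations as running each rule over the whole document in turn; that is worth checking on a sample document:

```ruta
DECLARE FirstNameInd, FirstNameInitial, SingleChar;

// Iterate once over CW; every rule is anchored on the current cw.
FOREACH(cw) CW{}{
    cw{-PARTOF(FirstNameInitial), REGEXP(".")} SPECIAL{-PARTOF(FirstNameInitial), REGEXP("-")} CW{REGEXP(".") -> MARK(FirstNameInitial,1,2,3,4)} PERIOD;
    SPECIAL{-PARTOF(FirstNameInitial), REGEXP("-")} cw{REGEXP(".") -> MARK(FirstNameInitial,1,2,3)} PERIOD;
    cw{-PARTOF(FirstNameInitial), REGEXP(".") -> MARK(FirstNameInitial,1,2)} PERIOD;
    cw{-PARTOF(FirstNameInitial), REGEXP(".") -> MARK(FirstNameInitial)} COMMA;
    cw{-PARTOF(FirstNameInitial), REGEXP(".") -> MARK(SingleChar)};
}
```

The rules are kept in their original order inside the block, so a CW that was already marked as FirstNameInitial is still skipped by the later rules.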

Some conditions are really slow. For example, you should avoid POSITION (line 69 of the script, the FirstToken rule) and replace it with the MARKFIRST action.
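One possible rewrite of the FirstToken/LastToken block, as a sketch. It assumes that the first basic token of each Reference is the token you want marked, which should hold here because the script trims SPACE and BREAK from Reference earlier:

```ruta
DECLARE FirstToken, LastToken;

// No POSITION check for every token inside a Reference; the actions
// mark the first and last basic token of each Reference directly.
Reference{-> MARKFIRST(FirstToken), MARKLAST(LastToken)};
```

This also removes the need for the surrounding BLOCK, since both actions operate on the matched Reference itself.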

As Renaud mentioned, the Ruta Workbench provides profiling functionality. It displays how long each part of your script (rule, block) took, and also which language elements (conditions, actions) required most of the time. That gives you a good indication of which parts are worth optimizing.

DISCLAIMER: I am a developer of UIMA Ruta
