In LanguageTool, how do you create a dictionary and use it for spell checking?


Problem description

How do you create a dictionary for spell checking with LanguageTool? I'm not a Java programmer, and this is the first time I have seen LT.

Solution

Hello, here is my experience creating a dictionary for spell checking with LanguageTool. I hope it helps.

Part 1: How to create the dictionary

You need:

• A .txt file with the dictionary inside

• An .info file that tells LT how to build the output file (one may already be present in the LT directory).

• LanguageTool standalone version

• Java 8

At the end of this section, you will have:

• A .dict file, i.e. your dictionary in a form that LT can read

  1. Install the latest version of LT: https://languagetool.org/download/snapshots/?C=M;O=D
  2. Make sure your .txt file has the right format (a) and encoding (b):
     a. one word per line
     b. UTF-8 encoding
  3. On the command line, run:
     java -cp languagetool.jar org.languagetool.tools.SpellDictionaryBuilder fr_FR -i <path of the dictionary file> -info <path of the .info file> -o <path of the output file>

where:

i. fr_FR is the code of the dictionary's language

ii. -i is the input file (your .txt)

iii. -info is the .info file that belongs to the dictionary. You can create it by following these instructions (http://wiki.languagetool.org/hunspell-support, "Configuring the dictionary" section) or use the .info file already present, if there is one, in \org\languagetool\resource\yourlanguage

iv. -o is the location where you want to save the .dict output file
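The preparation steps above can be sketched in Java. This is only an illustration under stated assumptions: the class name DictionaryInputSketch and both helper methods are hypothetical and not part of LanguageTool, and the file names passed in main are placeholders. It cleans a raw word list into one word per line in UTF-8, then prints the SpellDictionaryBuilder command from step 3 without running it.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of LanguageTool.
public class DictionaryInputSketch {

    /** Trims each line, drops empty lines and duplicates, keeps first-seen order. */
    static List<String> normalize(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return new ArrayList<>(words);
    }

    /** Assembles the command shown in step 3; it does not execute it. */
    static List<String> builderCommand(String languageCode, Path txt, Path info, Path dict) {
        return Arrays.asList(
                "java", "-cp", "languagetool.jar",
                "org.languagetool.tools.SpellDictionaryBuilder",
                languageCode,
                "-i", txt.toString(),
                "-info", info.toString(),
                "-o", dict.toString());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: DictionaryInputSketch <raw.txt> <clean.txt>");
            return;
        }
        Path raw = Paths.get(args[0]);
        Path clean = Paths.get(args[1]);

        // Reading as UTF-8 should fail with an exception if the file is not valid UTF-8.
        List<String> words = normalize(Files.readAllLines(raw, StandardCharsets.UTF_8));
        Files.write(clean, words, StandardCharsets.UTF_8);

        // Placeholder file names; adjust them to your language and paths.
        List<String> command = builderCommand(
                "fr_FR", clean, Paths.get("fr_FR.info"), Paths.get("fr_FR.dict"));
        System.out.println(String.join(" ", command));
    }
}
```

Printing the command instead of launching it keeps the sketch safe to run anywhere; you can paste the printed line into a terminal in the LT directory.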


Part 2: How to integrate the dictionary on LT for spell checking

You need:

• JDK 1.8 (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)

• Maven (https://maven.apache.org/download.cgi)

• A Java IDE (IntelliJ IDEA from JetBrains, Eclipse, etc.)

• The .info file and the .dict file (see Part 1)

• The LanguageTool project from GitHub (https://github.com/languagetool-org/languagetool)

  1. Add the JDK and Maven bin directories to your PATH (more info: https://maven.apache.org/install.html)
  2. Copy the .info and .dict files created in Part 1 to \languagetool-master\languagetool-language-modules\YourLanguage\src\main\resources\org\languagetool\resource\YourLanguage\hunspell
  3. In your IDE, open the Java file named after the language of your dictionary (e.g. French.java):

a. In YourLanguage.java, replace HunspellNoSuggestionRule with MorfologikYourLanguageSpellerRule:

  @Override
  public List<Rule> getRelevantRules(ResourceBundle messages) throws IOException {
    return Arrays.asList(
        new CommaWhitespaceRule(messages),
        new DoublePunctuationRule(messages),
        new GenericUnpairedBracketsRule(messages,
            Arrays.asList("[", "(", "{" /*"«", "‘"*/),
            Arrays.asList("]", ")", "}"
                /*"»", French dialog can contain multiple sentences. */
                /*"’" used in "d’arm" and many other words */)),
        new MorfologikYourLanguageSpellerRule(messages, this),
        new UppercaseSentenceStartRule(messages, this),
        new MultipleWhitespaceRule(messages, this),
        new SentenceWhitespaceRule(messages),
        // specific to French:
        new CompoundRule(messages),
        new QuestionWhitespaceRule(messages)
    );
  }

b. Create the new file MorfologikYourLanguageSpellerRule.java in \languagetool-master\languagetool-language-modules\YourLanguage\src\main\java\org\languagetool\rules\YourLanguage:

/* LanguageTool, a natural language style checker
 * Copyright (C) 2012 Marcin Miłkowski (http://www.languagetool.org)
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301
 * USA
 */

package org.languagetool.rules.fr; // replace "fr" with the package of your language

import java.io.IOException;
import java.util.ResourceBundle;

import org.languagetool.Language;
import org.languagetool.rules.spelling.morfologik.MorfologikSpellerRule;

public final class MorfologikYourLanguageSpellerRule extends MorfologikSpellerRule {

    /* Replace CODEOFYOURLANGUAGE with your language code, e.g. FR_FR for French */
    public static final String RULE_ID = "MORFOLOGIK_RULE_CODEOFYOURLANGUAGE";

    /* Location of the .dict file you copied in step 2 */
    private static final String RESOURCE_FILENAME = "PATH TO YOUR .DICT FILE";

    /* The constructor name must match the class name, otherwise the file does not compile */
    public MorfologikYourLanguageSpellerRule(ResourceBundle messages,
                                             Language language) throws IOException {
        super(messages, language);
    }

    @Override
    public String getFileName() {
        return RESOURCE_FILENAME;
    }

    @Override
    public String getId() {
        return RULE_ID;
    }
}

c. Go to \languagetool-master\ on the command line and run: mvn package

d. Find the build output in \languagetool-master\languagetool-standalone\target\LanguageTool-3.4-SNAPSHOT\LanguageTool-3.4-SNAPSHOT (the version number in the path depends on the snapshot you built).
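The copy in step 2 and the value of RESOURCE_FILENAME in step 3b are easy to get wrong, so here is a minimal sketch of how the two paths relate. The class DictionaryInstallSketch and its methods are hypothetical helpers, not part of LanguageTool. The resource name format ("/fr/hunspell/fr_FR.dict", relative to org/languagetool/resource) is an assumption based on how the existing Morfologik rules in the LT source appear to reference their dictionaries; check a rule that ships with LT before relying on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of LanguageTool.
public class DictionaryInstallSketch {

    /** Target directory from step 2, resolved against the checkout root. */
    static Path hunspellDir(Path checkout, String lang) {
        return checkout
                .resolve("languagetool-language-modules")
                .resolve(lang)
                .resolve(Paths.get("src", "main", "resources",
                        "org", "languagetool", "resource", lang, "hunspell"));
    }

    /** Assumed RESOURCE_FILENAME value: relative to org/languagetool/resource. */
    static String resourceName(String lang, String dictFileName) {
        return "/" + lang + "/hunspell/" + dictFileName;
    }

    /** Copies the given files (the .info and .dict) into the hunspell directory. */
    static void install(Path checkout, String lang, Path... files) throws IOException {
        Path target = hunspellDir(checkout, lang);
        Files.createDirectories(target);
        for (Path file : files) {
            Files.copy(file, target.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 4) {
            System.out.println(
                    "usage: DictionaryInstallSketch <checkout> <lang> <file.info> <file.dict>");
            return;
        }
        Path checkout = Paths.get(args[0]);
        String lang = args[1];
        Path info = Paths.get(args[2]);
        Path dict = Paths.get(args[3]);

        install(checkout, lang, info, dict);
        System.out.println("Copied to: " + hunspellDir(checkout, lang));
        System.out.println("RESOURCE_FILENAME (assumed): "
                + resourceName(lang, dict.getFileName().toString()));
    }
}
```

Here lang stands for the module directory name (YourLanguage in the steps above, e.g. fr), which is distinct from the fr_FR code passed to SpellDictionaryBuilder.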
