How to find invalid Link Grammar tokens?

Problem description

I'd like to use the Link Grammar Python3 bindings for a simple grammar checker. While the linkage API is relatively well-documented, there doesn't seem to be a way to access all tokens that prevent linkages.

This is what I have so far:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from linkgrammar import Sentence, ParseOptions, Dictionary, __version__
print('Link Grammar Version:', __version__)

# Load the dictionary and the parse options once, not once per sentence.
lgdict = Dictionary()
po = ParseOptions()

for sentence in ['This is a valid sample sentence.', 'I Can Has Cheezburger?']:
    sent = Sentence(sentence, lgdict, po)
    linkages = sent.parse()
    # A sentence counts as valid if at least one linkage was found.
    if len(linkages) > 0:
        print('Valid:', sentence)
    else:
        print('Invalid:', sentence)

(I used link-grammar-5.4.3 for my tests.)

When I analyzed the invalid sample sentence using the Link Parser command line tool, I got the following output:

linkparser> I Can Has Cheezburger?
No complete linkages found.
Found 1 linkage (1 had no P.P. violations) at null count 1
    Unique linkage, cost vector = (UNUSED=1 DIS= 0.10 LEN=7)

    +------------------Xp------------------+
    +------------->Wa--------------+       |
    |            +---G--+-----G----+       |
    |            |      |          |       |
LEFT-WALL [I] Can[!] Has[!] Cheezburger[!] ?

How do I get all potentially invalid tokens marked with [!] or [?] with Python3?

Recommended answer

See bindings/python-examples/sentence-check.py for how this is done. It is better to look at the latest repo version (the current one is here), as this demo program had a bug in 5.4.3.

Specifically, the word list is extracted as follows:

words = list(linkage.words())

Unlinked words are wrapped in []. Words with a bracketed marker such as [!] appended are guessed ones. For example, [!] means that the word has been classified by a regex (one that appears in the file 4.0.regex) and that this classification has then been looked up in the dictionary. If you set the parse option display_morphology to True, the name of the classifying regex appears after the !.

Here is the full legend of the word output format:

 [word]            Null-linked word
 word[!]           word classified by a regex
 word[!REGEX_NAME] word classified by REGEX_NAME (turn on by morphology=1)
 word[~]           word generated by a spell guess (unknown original word)
 word[&]           word run-on separated by a spell guess
 word[?]           word is unknown (looked up in the dict as UNKNOWN-WORD)
 word.POS          word found in the dictionary as word.POS
 word.#CORRECTION  word is probably a typo - got linked as CORRECTION

For dictionaries that support morphology (turn on by morphology=1):
 word=             A prefix morpheme
 =word             A suffix morpheme
 word.=            A stem
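The legend above can be turned into a small filter over the word list. What follows is only a sketch: the names `flag_tokens`, `GUESS_KINDS`, `GUESS_RE` and `NULL_RE` are made up for illustration and are not part of the Link Grammar bindings, and the code assumes the tokens are formatted exactly as in the legend and in the linkparser output from the question. It works on plain strings, so it can be tried without the bindings installed.

```python
import re

# Guess markers from the legend, keyed by the character right after "[".
GUESS_KINDS = {
    '!': 'classified by a regex',
    '~': 'spell guess',
    '&': 'run-on separated by a spell guess',
    '?': 'unknown word',
}

# word[!], word[!REGEX_NAME], word[~], word[&] or word[?]
GUESS_RE = re.compile(r'^(?P<word>.+)\[(?P<mark>[!~&?])(?P<detail>[^\]]*)\]$')
# [word], i.e. a null-linked word
NULL_RE = re.compile(r'^\[(?P<word>.+)\]$')


def flag_tokens(words):
    """Return (index, word, kind, detail) for every flagged token.

    `words` is a list of strings in the output format of the legend
    (assumed here to be what list(linkage.words()) yields).
    """
    flagged = []
    for index, token in enumerate(words):
        guess = GUESS_RE.match(token)
        if guess:
            kind = GUESS_KINDS[guess.group('mark')]
            flagged.append((index, guess.group('word'), kind, guess.group('detail')))
            continue
        null = NULL_RE.match(token)
        if null:
            flagged.append((index, null.group('word'), 'null-linked', ''))
    return flagged


if __name__ == '__main__':
    # Tokens copied from the linkparser output in the question.
    sample = ['LEFT-WALL', '[I]', 'Can[!]', 'Has[!]', 'Cheezburger[!]', '?']
    for item in flag_tokens(sample):
        print(item)
```

With the bindings, the input would presumably be `list(linkage.words())` as shown earlier. Note that a linkage containing null-linked words is probably only returned when `max_null_count` in `ParseOptions` is raised above its default; that is an assumption worth checking against sentence-check.py.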

It may be useful to match the output words to the original sentence words, especially in case of spell corrections or when morphology is turned on. The demo program sentence-check.py mentioned above does that when you call it with -p; see the code under if arg.position:.
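To give a rough idea of what such a matching involves, here is a simplified stand-in that uses plain string search. It is not what sentence-check.py does under `if arg.position:`, and the helpers `bare_word` and `locate_tokens` are hypothetical names. The sketch assumes that tokens appear in sentence order and that lowercasing does not change string lengths; spell-corrected words and morphemes will generally not be found and are reported as `None`.

```python
import re

# Trailing guess marker such as [!], [!REGEX_NAME], [~], [&] or [?].
_MARKER_RE = re.compile(r'\[[!~&?][^\]]*\]$')
# Trailing dictionary subscript such as ".n" in "sentence.n";
# a lone "." (the period token) is left untouched.
_SUBSCRIPT_RE = re.compile(r'(?<=[^.])\.[^.\s\[\]]*$')


def bare_word(token):
    """Strip null-link brackets, guess markers and subscripts from a token."""
    if len(token) > 2 and token.startswith('[') and token.endswith(']'):
        token = token[1:-1]
    token = _MARKER_RE.sub('', token)
    return _SUBSCRIPT_RE.sub('', token)


def locate_tokens(sentence, words):
    """Pair each token with its (start, end) character span, or None."""
    haystack = sentence.lower()  # the parser may lowercase the first word
    spans = []
    cursor = 0
    for token in words:
        bare = bare_word(token).lower()
        start = haystack.find(bare, cursor) if bare else -1
        if start < 0:
            # Walls, corrected words and the like have no match in the text.
            spans.append((token, None))
        else:
            end = start + len(bare)
            spans.append((token, (start, end)))
            cursor = end
    return spans


if __name__ == '__main__':
    text = 'I Can Has Cheezburger?'
    tokens = ['LEFT-WALL', '[I]', 'Can[!]', 'Has[!]', 'Cheezburger[!]', '?']
    for token, span in locate_tokens(text, tokens):
        print(token, span)
```

Searching forward from a moving cursor keeps repeated words from being matched to the same position twice, at the cost of failing on tokens that come back out of order.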

In the case of your demo sentence I Can Has Cheezburger?, only the word I has no linkage; the other words have been classified as capitalized words and linked as proper nouns (the G link type).

You can find more information on the link types in summarize-links.
