How to find invalid Link Grammar tokens?
Question
I'd like to use the Link Grammar Python3 bindings for a simple grammar checker. While the linkage API is relatively well documented, there doesn't seem to be a way to access all tokens that prevent linkages.
Here is what I have so far:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from linkgrammar import Sentence, ParseOptions, Dictionary, __version__

print('Link Grammar Version:', __version__)

for sentence in ['This is a valid sample sentence.', 'I Can Has Cheezburger?']:
    sent = Sentence(sentence, Dictionary(), ParseOptions())
    linkages = sent.parse()
    if len(linkages) > 0:
        print('Valid:', sentence)
    else:
        print('Invalid:', sentence)
(I used link-grammar-5.4.3 for my tests.)
When I analyzed the invalid sample sentence using the Link Parser command line tool, I got the following output:
linkparser> I Can Has Cheezburger?
No complete linkages found.
Found 1 linkage (1 had no P.P. violations) at null count 1
Unique linkage, cost vector = (UNUSED=1 DIS= 0.10 LEN=7)
    +------------------Xp------------------+
    +------------->Wa--------------+       |
    |            +---G--+-----G----+       |
    |            |      |          |       |
LEFT-WALL [I] Can[!] Has[!] Cheezburger[!] ?
How do I get all potentially invalid tokens marked with [!] or [?] with Python3?
Recommended answer
See how it is done in bindings/python-examples/sentence-check.py. It is better to look at the latest version in the repository, as this demo program had a bug in 5.4.3.
Specifically, the word list is extracted as follows:
words = list(linkage.words())
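The linkage object in that line has to come from a parse that tolerates unlinked words. The script in the question reports no linkages at all for the second sentence, while the command-line tool finds one "at null count 1", which suggests that null links are not allowed by the default parse options. A minimal sketch, assuming the ParseOptions keyword arguments min_null_count and max_null_count are available in your version of the bindings; first_linkage_words is a hypothetical helper name and not part of linkgrammar:

```python
def first_linkage_words(text, lang='en', max_null_count=999):
    """Return the word list of the first linkage found for `text`.

    Hypothetical helper: returns an empty list if the parser finds
    no linkage even with null-linked words allowed.
    """
    # Imported lazily so the helper can be defined without the bindings.
    from linkgrammar import Sentence, ParseOptions, Dictionary

    # Assumption: a non-zero max_null_count lets the parser return
    # linkages in which some words are left unlinked.
    options = ParseOptions(min_null_count=0, max_null_count=max_null_count)

    # Loading the dictionary is slow; reuse it if you check many sentences.
    sent = Sentence(text, Dictionary(lang), options)
    linkages = sent.parse()

    linkage = next(iter(linkages), None)
    return list(linkage.words()) if linkage is not None else []
```

Calling first_linkage_words('I Can Has Cheezburger?') should then give a list of token strings in the bracket notation explained next.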
Unlinked words are wrapped in []. Words that have a bracketed marker appended to them (for example word[!]) are guessed ones. For example, [!] means that the word has been classified by a regex (that appears in the file 4.0.regex) and this classification has then been looked up in the dictionary. If you set the parse option display_morphology to True, the name of the classifying regex appears after the !.
Here is the full legend of the word output format:
[word]              Null-linked word
word[!]             word classified by a regex
word[!REGEX_NAME]   word classified by REGEX_NAME (turn on by morphology=1)
word[~]             word generated by a spell guess (unknown original word)
word[&]             word run-on separated by a spell guess
word[?]             word is unknown (looked up in the dict as UNKNOWN-WORD)
word.POS            word found in the dictionary as word.POS
word.#CORRECTION    word is probably a typo - got linked as CORRECTION

For dictionaries that support morphology (turn on by morphology=1):
word=               A prefix morpheme
=word               A suffix morpheme
word.=              A stem
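The legend can be turned into a small classifier for the strings returned by linkage.words(). The following is only a sketch: classify_token, suspect_tokens and the label names are hypothetical and not part of the bindings, and the patterns are derived from the legend alone, so morpheme forms and any format not listed there are simply passed through as 'ok'.

```python
import re

# word[!], word[!REGEX_NAME], word[~], word[&], word[?]
_MARKER = re.compile(r'^(?P<word>.+?)\[(?P<mark>[!~&?])(?P<name>[^\]]*)\]$')

# Hypothetical label names for the markers in the legend.
_LABELS = {'!': 'regex-guess', '~': 'spell-guess', '&': 'run-on', '?': 'unknown'}


def classify_token(token):
    """Return (word, label, detail) for one entry of linkage.words()."""
    if token in ('LEFT-WALL', 'RIGHT-WALL'):
        return token, 'wall', ''
    # [word]: null-linked (a bare '[' or ']' punctuation token is left alone)
    if len(token) > 2 and token.startswith('[') and token.endswith(']'):
        return token[1:-1], 'null-linked', ''
    match = _MARKER.match(token)
    if match:
        return match.group('word'), _LABELS[match.group('mark')], match.group('name')
    # word.#CORRECTION: probable typo, linked as CORRECTION
    if '.#' in token:
        word, _, correction = token.partition('.#')
        return word, 'typo', correction
    return token, 'ok', ''


def suspect_tokens(words):
    """Keep only tokens that were left unlinked, guessed, or corrected."""
    classified = (classify_token(token) for token in words)
    return [item for item in classified if item[1] not in ('ok', 'wall')]


if __name__ == '__main__':
    words = ['LEFT-WALL', '[I]', 'Can[!]', 'Has[!]', 'Cheezburger[!]', '?']
    for word, label, detail in suspect_tokens(words):
        print(word, label, detail)
```

For a grammar checker you would probably treat 'null-linked' and 'unknown' as errors and the guess labels as warnings, but that policy is a design choice and not something the parser prescribes.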
It may be useful to match the output words to the original sentence words, especially in case of spell corrections or when morphology is turned on. The demo program sentence-check.py does that when you call it with -p; see the code under if arg.position:.
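If you cannot use the demo program's approach, a rough approximation is to search for each output token in the original sentence from left to right. This is explicitly not what sentence-check.py does; it is a plain string search under the assumptions that tokens appear in sentence order and differ from the input at most by letter case, markers, and a .POS subscript. locate_tokens is a hypothetical name, and spell-corrected or morphologically split tokens will come back as None or, in unlucky cases, match a later occurrence.

```python
import re


def _strip_markers(token):
    """Remove null-link brackets and guess markers such as [!] or [?]."""
    if len(token) > 2 and token.startswith('[') and token.endswith(']'):
        token = token[1:-1]
    return re.sub(r'\[[!~&?][^\]]*\]$', '', token)


def locate_tokens(sentence, words):
    """Return a list of (token, (start, end)) pairs; span is None if not found."""
    haystack = sentence.lower()  # assumes lowercasing keeps string length
    spans = []
    cursor = 0
    for token in words:
        bare = _strip_markers(token)
        candidates = [bare]
        # Also try the token without a trailing .POS subscript (is.v -> is).
        if '.' in bare[1:]:
            candidates.append(bare.rsplit('.', 1)[0])
        span = None
        for candidate in candidates:
            start = haystack.find(candidate.lower(), cursor) if candidate else -1
            if start != -1:
                span = (start, start + len(candidate))
                cursor = span[1]
                break
        spans.append((token, span))
    return spans


if __name__ == '__main__':
    words = ['LEFT-WALL', '[I]', 'Can[!]', 'Has[!]', 'Cheezburger[!]', '?']
    for token, span in locate_tokens('I Can Has Cheezburger?', words):
        print(token, span)
```

The wall tokens have no counterpart in the sentence and are reported with a span of None.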
In the case of your demo sentence I Can Has Cheezburger?, only the word I has no linkage; the other words have been classified as capitalized words and got linked as proper nouns (the G link type).
You can find more information on the link types in summarize-links.