Context free grammar with feature structure in Python
I am trying to generate sentences from a defined grammar with Python. To avoid agreement problems, I used feature structures.
This is the code I have written so far:
>>> from __future__ import print_function
>>> import nltk
>>> from nltk.featstruct import FeatStruct
>>> from nltk import grammar, parse
>>> from nltk.parse.generate import generate
>>> from nltk import CFG
>>> g = """
% start DP
DP-> D[AGR=[NUM='sg', PERS=3, GND='m']] N[AGR=[NUM='sg', GND='m']]
D[AGR=[NUM='sg', PERS=3, GND='f']] -> 'une' | 'la'
D[AGR=[NUM='sg', PERS=3, GND='m']] -> 'un' | 'le'
D[AGR=[NUM='pl', PERS=3]] -> 'des' | 'les'
N[AGR=[NUM='sg', GND='m']] -> 'garçon'
N[AGR=[NUM='pl', GND='m']] -> 'garçons'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
"""
>>> for sentence in generate(grammar, n=30):
...     print(''.join(sentence))
This is the output I am getting:
unegarçon
unegarçons
unefille
unefilles
lagarçon
lagarçons
lafille
lafilles
ungarçon
ungarçons
unfille
unfilles
legarçon
legarçons
lefille
lefilles
desgarçon
desgarçons
desfille
desfilles
lesgarçon
lesgarçons
lesfille
lesfilles
whereas I am supposed to get output like this:
un garçon
le garçon
The problems I have are:
- The agreement is not working: I am getting sentences that do not respect the agreement.
- There is no space between the two words in the sentence.
What is it that I can't see?
Let's solve the easy part of the question first.
Q2. There is no space between the two words in the sentence.
You're close when it comes to the printing =)
The problem lies in how you're using the str.join function: the string you call it on is the separator placed between the items.
>>> list_of_str = ['a', 'b', 'c']
>>> ''.join(list_of_str)
'abc'
>>> ' '.join(list_of_str)
'a b c'
>>> '|'.join(list_of_str)
'a|b|c'
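Applied to the question, a minimal sketch could look like the following. The token lists are written by hand to mimic what the generator yields, and to_text is a hypothetical helper name, not part of NLTK.

```python
# Hand-written token lists, shaped like the lists of words that
# nltk.parse.generate.generate yields (an assumption for illustration).
phrases = [['un', 'garçon'], ['le', 'garçon']]


def to_text(tokens):
    """Join word tokens with a single space between them."""
    return ' '.join(tokens)


for tokens in phrases:
    print(to_text(tokens))
```

So the only change needed in the question's loop is to call join on ' ' instead of ''.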
Q1. The agreement is not working: I am getting sentences that do not respect the agreement.
First warning sign
To produce a feature structure grammar with agreement, there should be a rule whose right-hand side (RHS) contains something like D[AGR=?a] N[AGR=?a], e.g.
NP -> D[AGR=?a] N[AGR=?a]
With that missing, there's no real agreement rule in the grammar; see http://www.nltk.org/howto/featgram.html
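To make the role of the shared variable ?a concrete, here is a rough pure-Python sketch of what "both AGR values must unify" means. It only handles flat dictionaries and is not how NLTK implements it (NLTK's FeatStruct also supports nesting and variables); the unify function and the sample entries are assumptions made for illustration.

```python
def unify(left, right):
    """Merge two flat feature dicts; return None if a shared feature clashes."""
    merged = dict(left)
    for name, value in right.items():
        if name in merged and merged[name] != value:
            return None  # e.g. GND='f' vs GND='m': agreement fails
        merged[name] = value
    return merged


# AGR values of three lexical entries, written as plain dicts.
un = {'NUM': 'sg', 'GND': 'm'}
une = {'NUM': 'sg', 'GND': 'f'}
garcon = {'NUM': 'sg', 'GND': 'm'}

print(unify(un, garcon))   # compatible: a common value for ?a exists
print(unify(une, garcon))  # GND clashes: no common value for ?a
```

An underspecified entry such as {'NUM': 'pl'} unifies with either gender, which is why a single plural entry can combine with both masculine and feminine nouns.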
Now comes the gotcha!
If we look at the nltk.parse.generate code carefully, it merely yields all possible combinations of the terminals and does not seem to take the feature structures into account: https://github.com/nltk/nltk/blob/develop/nltk/parse/generate.py
(I think that's a bug, not a feature, so raising an issue on the NLTK repository would be good.)
So if we do this, it'll print all possible combinations of the terminals (ignoring agreement):
from nltk import grammar, parse
from nltk.parse.generate import generate
# If person is always 3rd, we can skip the PERSON feature.
g = """
DP -> D[AGR=?a] N[AGR=?a]
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
"""
# Use a separate name so the imported grammar module is not shadowed.
french_grammar = grammar.FeatureGrammar.fromstring(g)
print(list(generate(french_grammar, n=30)))
[out]:
[['un', 'garcon'], ['un', 'fille'], ['une', 'garcon'], ['une', 'fille']]
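The observed behaviour is equivalent to a plain Cartesian product over the terminals of D and N. The sketch below reproduces the same list without NLTK; it only illustrates the effect and is not the code used in nltk.parse.generate.

```python
from itertools import product

# Terminals of the toy grammar, grouped by category (features ignored).
determiners = ['un', 'une']
nouns = ['garcon', 'fille']

# Every determiner is paired with every noun; no agreement check happens.
combinations = [list(pair) for pair in product(determiners, nouns)]
print(combinations)
```

Half of the four pairs violate gender agreement, which is why some filtering step is needed after generation.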
But if we try to parse valid and invalid sentences, the agreement rule kicks in:
from nltk import grammar, parse
from nltk.parse.generate import generate
g = """
DP -> D[AGR=?a] N[AGR=?a]
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
"""
# Use a separate name so the imported grammar module is not shadowed.
french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)
trees = parser.parse('une garcon'.split())  # Invalid sentence.
print("Parses for 'une garcon':", list(trees))
trees = parser.parse('un garcon'.split())  # Valid sentence.
print("Parses for 'un garcon':", list(trees))
[out]:
Parses for 'une garcon': []
Parses for 'un garcon': [Tree(DP[], [Tree(D[AGR=[GND='m', NUM='sg']], ['un']), Tree(N[AGR=[GND='m', NUM='sg']], ['garcon'])])]
To enforce the agreement rule at generation time, one possible solution is to parse each generated token sequence and keep only the parseable ones, e.g.
from nltk import grammar, parse
from nltk.parse.generate import generate
g = """
DP -> D[AGR=?a] N[AGR=?a]
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
"""
# Use a separate name so the imported grammar module is not shadowed.
french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)
for tokens in list(generate(french_grammar, n=30)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        print(' '.join(first_parse.leaves()))
    except StopIteration:
        continue
[out]:
un garcon
une fille
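The generate-then-filter idea can be factored into a small helper. In the sketch below, keep_parseable and toy_parse are hypothetical names: toy_parse is a stand-in so the example runs without NLTK, and with NLTK one would presumably pass parser.parse and the output of generate instead.

```python
def keep_parseable(candidates, parse):
    """Yield each token list for which parse(tokens) produces at least one tree."""
    for tokens in candidates:
        # next(..., None) returns None instead of raising StopIteration.
        if next(iter(parse(tokens)), None) is not None:
            yield tokens


# Hypothetical stand-in for parser.parse: accepts only agreeing pairs.
AGREEING = {('un', 'garcon'), ('une', 'fille')}


def toy_parse(tokens):
    """Return an iterator with one dummy tree if the tokens agree, else empty."""
    return iter(['tree']) if tuple(tokens) in AGREEING else iter([])


candidates = [['un', 'garcon'], ['un', 'fille'],
              ['une', 'garcon'], ['une', 'fille']]
for tokens in keep_parseable(candidates, toy_parse):
    print(' '.join(tokens))
```

Using next with a default avoids the try/except StopIteration block while doing the same check.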
I guess the goal is to produce the second-to-last column of the table.
Without the prepositions:
from nltk import grammar, parse
from nltk.parse.generate import generate
g = """
DP -> D[AGR=?a] N[AGR=?a]
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
D[AGR=[NUM='sg', GND='m']] -> 'le'
D[AGR=[NUM='sg', GND='f']] -> 'la'
D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'
"""
# Use a separate name so the imported grammar module is not shadowed.
french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)
valid_productions = set()
for tokens in list(generate(french_grammar, n=30)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue
for np in sorted(valid_productions):
    print(np)
[out]:
la fille
le garcon
les filles
les garcons
un garcon
une fille
Now to include the preposition
The TOP (aka START) of the grammar has to have more than one branch. Currently the DP -> D[AGR=?a] N[AGR=?a] rule is at the TOP; to allow for a PP construction, we have to add something like PHRASE -> DP | PP and make the PHRASE non-terminal the new TOP, e.g.
from nltk import grammar, parse
from nltk.parse.generate import generate
g = """
PHRASE -> DP | PP
DP -> D[AGR=?a] N[AGR=?a]
PP -> P[AGR=?a] N[AGR=?a]
P[AGR=[NUM='sg', GND='m']] -> 'du' | 'au'
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'
D[AGR=[NUM='sg', GND='m']] -> 'le'
D[AGR=[NUM='sg', GND='f']] -> 'la'
D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'
"""
french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)
valid_productions = set()
for tokens in list(generate(french_grammar, n=100)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue
for np in sorted(valid_productions):
    print(np)
[out]:
au garcon
du garcon
la fille
le garcon
les filles
les garcons
un garcon
une fille
To get everything in the table:
from nltk import grammar, parse
from nltk.parse.generate import generate
g = """
PHRASE -> DP | PP
DP -> D[AGR=?a] N[AGR=?a]
PP -> P[AGR=[GND='m', NUM='sg']] N[AGR=[GND='m', NUM='sg']]
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg', DEF='d']] N[AGR=[GND='f', NUM='sg']]
PP -> P[AGR=[GND=?a, NUM='pl']] N[AGR=[GND=?a, NUM='pl']]
P[AGR=[NUM='sg', GND='m']] -> 'du' | 'au'
P[AGR=[NUM='sg', GND='f']] -> 'de' | 'à'
P[AGR=[NUM='pl']] -> 'des' | 'aux'
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
D[AGR=[NUM='sg', GND='m', DEF='i']] -> 'un'
D[AGR=[NUM='sg', GND='f', DEF='i']] -> 'une'
D[AGR=[NUM='sg', GND='m', DEF='d']] -> 'le'
D[AGR=[NUM='sg', GND='f', DEF='d']] -> 'la'
D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'
"""
french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)
valid_productions = set()
for tokens in list(generate(french_grammar, n=100000)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue
for np in sorted(valid_productions):
    print(np)
[out]:
au garcon
aux filles
aux garcons
de la fille
des filles
des garcons
du garcon
la fille
le garcon
les filles
les garcons
un garcon
une fille
à la fille
Beyond the table
It's also possible to produce de|à un(e) garcon|fille, i.e.
- de un garcon
- de une fille
- à un garcon
- à une fille
I'm not sure whether they're valid French phrases, but if they are, you can underspecify the feminine singular PP rule by removing the DEF feature, i.e. change:
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg', DEF='d']] N[AGR=[GND='f', NUM='sg']]
to:
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg']] N[AGR=[GND='f', NUM='sg']]
and then add an additional rule to produce the masculine singular indefinite PP:
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='m', NUM='sg', DEF='i']] N[AGR=[GND='m', NUM='sg']]
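The effect of dropping DEF from the rule can be sketched with plain dictionaries. The matches and select helpers below are hypothetical and simplified: they require every feature named by the rule to be present in the entry, whereas a real unifier would also accept an entry that leaves a feature unspecified.

```python
def matches(rule_spec, lexical_entry):
    """True if every feature the rule names has the same value in the entry."""
    return all(lexical_entry.get(name) == value
               for name, value in rule_spec.items())


# AGR values of the four singular determiners from the grammar above.
determiners = {
    'un': {'NUM': 'sg', 'GND': 'm', 'DEF': 'i'},
    'une': {'NUM': 'sg', 'GND': 'f', 'DEF': 'i'},
    'le': {'NUM': 'sg', 'GND': 'm', 'DEF': 'd'},
    'la': {'NUM': 'sg', 'GND': 'f', 'DEF': 'd'},
}


def select(rule_spec):
    """Return the determiners compatible with the rule, sorted by word."""
    return sorted(word for word, feats in determiners.items()
                  if matches(rule_spec, feats))


with_def = {'NUM': 'sg', 'GND': 'f', 'DEF': 'd'}
without_def = {'NUM': 'sg', 'GND': 'f'}

print(select(with_def))     # only the definite feminine determiner
print(select(without_def))  # both feminine singular determiners
```

The less a rule specifies, the more lexical entries it admits, which is what lets the relaxed rule produce "de une fille" alongside "de la fille".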
TL;DR
from nltk import grammar, parse
from nltk.parse.generate import generate
g = """
PHRASE -> DP | PP
DP -> D[AGR=?a] N[AGR=?a]
PP -> P[AGR=[GND='m', NUM='sg']] N[AGR=[GND='m', NUM='sg']]
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg']] N[AGR=[GND='f', NUM='sg']]
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='m', NUM='sg', DEF='i']] N[AGR=[GND='m', NUM='sg']]
PP -> P[AGR=[GND=?a, NUM='pl']] N[AGR=[GND=?a, NUM='pl']]
P[AGR=[NUM='sg', GND='m']] -> 'du' | 'au'
P[AGR=[NUM='sg', GND='f']] -> 'de' | 'à'
P[AGR=[NUM='pl']] -> 'des' | 'aux'
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
D[AGR=[NUM='sg', GND='m', DEF='i']] -> 'un'
D[AGR=[NUM='sg', GND='f', DEF='i']] -> 'une'
D[AGR=[NUM='sg', GND='m', DEF='d']] -> 'le'
D[AGR=[NUM='sg', GND='f', DEF='d']] -> 'la'
D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'
"""
french_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)
valid_productions = set()
for tokens in list(generate(french_grammar, n=100000)):
    parsed_tokens = parser.parse(tokens)
    try:
        first_parse = next(parsed_tokens)  # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue
for np in sorted(valid_productions):
    print(np)
[out]:
au garcon
aux filles
aux garcons
de la fille
de un garcon
de une fille
des filles
des garcons
du garcon
la fille
le garcon
les filles
les garcons
un garcon
une fille
à la fille
à un garcon
à une fille