Context free grammar with feature structure in Python


Problem description

I am trying to generate sentences from a defined grammar with Python. To avoid agreement problems, I used feature structures.

This is the code I have so far:

from __future__ import print_function
import nltk
from nltk.featstruct import FeatStruct
from nltk import grammar, parse
from nltk.parse.generate import generate
from nltk import CFG

g = """
% start DP
DP -> D[AGR=[NUM='sg', PERS=3, GND='m']] N[AGR=[NUM='sg', GND='m']]
D[AGR=[NUM='sg', PERS=3, GND='f']] -> 'une' | 'la'
D[AGR=[NUM='sg', PERS=3, GND='m']] -> 'un' | 'le'
D[AGR=[NUM='pl', PERS=3]] -> 'des' | 'les'
N[AGR=[NUM='sg', GND='m']] -> 'garçon'
N[AGR=[NUM='pl', GND='m']] -> 'garçons'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
N[AGR=[NUM='pl', GND='f']] -> 'filles'
"""

# Build the feature grammar from the string before generating from it.
grammar = grammar.FeatureGrammar.fromstring(g)

for sentence in generate(grammar, n=30):
    print(''.join(sentence))

This is the output I am getting:

unegarçon
unegarçons
unefille
unefilles
lagarçon
lagarçons
lafille
lafilles
ungarçon
ungarçons
unfille
unfilles
legarçon
legarçons
lefille
lefilles
desgarçon
desgarçons
desfille
desfilles
lesgarçon
lesgarçons
lesfille
lesfilles

I expected output like this instead:

un garçon
le garçon

The problems I have are:

  1. The agreement is not working; I am getting sentences that do not respect the agreement.

  2. There is no space between the two words in the sentence.

What am I missing?

Solution

Let's solve the easy part of the question first.

Q2. There is no space between the two words in the sentence.

You're close when it comes to the printing =)

The problem lies in how you're using the str.join function.

>>> list_of_str = ['a', 'b', 'c']
>>> ''.join(list_of_str)
'abc'
>>> ' '.join(list_of_str)
'a b c'
>>> '|'.join(list_of_str)
'a|b|c'


Q1. The agreement is not working; the sentences do not respect the agreement

First warning sign

To produce a feature structure grammar with agreement, there should be a rule that contains something like D[AGR=?a] N[AGR=?a] on the right-hand side (RHS), e.g.

NP -> D[AGR=?a] N[AGR=?a] 

Without it, there is no real agreement rule in the grammar; see http://www.nltk.org/howto/featgram.html
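
To make the role of the shared variable ?a concrete, here is a small pure-Python sketch of the check it expresses. It is not NLTK's implementation: the unify helper and the flat dictionaries are illustrative assumptions, and real feature structures can be nested.

```python
def unify(fs1, fs2):
    """Merge two flat feature dicts, or return None if a shared key conflicts."""
    merged = dict(fs1)
    for key, value in fs2.items():
        if key in merged and merged[key] != value:
            return None  # conflicting values: unification fails
        merged[key] = value
    return merged

# AGR values of three lexical entries, mirroring the grammar rules.
un = {'NUM': 'sg', 'GND': 'm'}
une = {'NUM': 'sg', 'GND': 'f'}
garcon = {'NUM': 'sg', 'GND': 'm'}

# D[AGR=?a] N[AGR=?a] can only succeed when both AGR values unify.
print(unify(un, garcon))   # compatible features
print(unify(une, garcon))  # GND values clash
```

Because both occurrences of ?a must be bound to the same value, a determiner and a noun with clashing gender or number cannot be combined by such a rule.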

Now comes the gotcha!

If we look at the nltk.parse.generate code carefully, it merely yields all possible combinations of the terminals and does not seem to take the feature structures into account: https://github.com/nltk/nltk/blob/develop/nltk/parse/generate.py

(I think that's a bug, not a feature, so raising an issue in the NLTK repository would be good.)

So if we do this, it prints all combinations of possible terminals (regardless of agreement):

from nltk import grammar, parse
from nltk.parse.generate import generate

# If person is always 3rd, we can skip the PERSON feature.
g = """
DP -> D[AGR=?a] N[AGR=?a] 
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'

"""

grammar =  grammar.FeatureGrammar.fromstring(g)

print(list(generate(grammar, n=30)))

[out]:

[['un', 'garcon'], ['un', 'fille'], ['une', 'garcon'], ['une', 'fille']]
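
The output above is what a plain Cartesian product over the terminals of D and N would give. The following sketch reproduces that effect with itertools.product; it is an analogy for the observed behaviour under that assumption, not the actual code in nltk.parse.generate.

```python
from itertools import product

# Terminals reachable from each category, with the features ignored.
determiners = ['un', 'une']
nouns = ['garcon', 'fille']

# Every determiner is paired with every noun, whether they agree or not.
combinations = [list(pair) for pair in product(determiners, nouns)]
print(combinations)
```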

But if we try to parse valid and invalid sentences, the agreement rule kicks in:

from nltk import grammar, parse
from nltk.parse.generate import generate

g = """
DP -> D[AGR=?a] N[AGR=?a] 
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'

"""

grammar =  grammar.FeatureGrammar.fromstring(g)

parser = parse.FeatureEarleyChartParser(grammar)

trees = parser.parse('une garcon'.split()) # Invalid sentence.
print ("Parses for 'une garcon':", list(trees)) 

trees = parser.parse('un garcon'.split()) # Valid sentence.
print ("Parses for 'un garcon':", list(trees)) 

[out]:

Parses for 'une garcon': []
Parses for 'un garcon': [Tree(DP[], [Tree(D[AGR=[GND='m', NUM='sg']], ['un']), Tree(N[AGR=[GND='m', NUM='sg']], ['garcon'])])]

To enforce the agreement rule at generation time, one possible solution is to parse each generated production and keep only the parseable ones, e.g.

from nltk import grammar, parse
from nltk.parse.generate import generate

g = """
DP -> D[AGR=?a] N[AGR=?a] 
N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'
D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'

"""

grammar =  grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(grammar)

for tokens in list(generate(grammar, n=30)):
    parsed_tokens = parser.parse(tokens)
    try: 
        first_parse = next(parsed_tokens) # Check if there's a valid parse.
        print(' '.join(first_parse.leaves()))
    except StopIteration:
        continue

[out]:

un garcon
une fille


I guess the goal is to produce the second-to-last column of the table.

Without the prepositions:

from nltk import grammar, parse
from nltk.parse.generate import generate

g = """
DP -> D[AGR=?a] N[AGR=?a] 

N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'

N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'

D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'

D[AGR=[NUM='sg', GND='m']] -> 'le'
D[AGR=[NUM='sg', GND='f']] -> 'la'

D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'


"""

grammar =  grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(grammar)

valid_productions = set()

for tokens in list(generate(grammar, n=30)):
    parsed_tokens = parser.parse(tokens)
    try: 
        first_parse = next(parsed_tokens) # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue

for np in sorted(valid_productions):
    print(np)

[out]:

la fille
le garcon
les filles
les garcons
un garcon
une fille

Now to include the preposition

The TOP (aka START) of the grammar has to have more than one branch. Currently the DP -> D[AGR=?a] N[AGR=?a] rule is at the TOP. To allow for a PP construction, we have to add something like PHRASE -> DP | PP and make the PHRASE non-terminal the new TOP, e.g.

from nltk import grammar, parse
from nltk.parse.generate import generate

g = """

PHRASE -> DP | PP 

DP -> D[AGR=?a] N[AGR=?a] 
PP -> P[AGR=?a] N[AGR=?a] 

P[AGR=[NUM='sg', GND='m']] -> 'du' | 'au'

N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'

N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'

D[AGR=[NUM='sg', GND='m']] -> 'un'
D[AGR=[NUM='sg', GND='f']] -> 'une'

D[AGR=[NUM='sg', GND='m']] -> 'le'
D[AGR=[NUM='sg', GND='f']] -> 'la'

D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'

"""

french_grammar =  grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)

valid_productions = set()

for tokens in list(generate(french_grammar, n=100)):
    parsed_tokens = parser.parse(tokens)
    try: 
        first_parse = next(parsed_tokens) # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue

for np in sorted(valid_productions):
    print(np)

[out]:

au garcon
du garcon
la fille
le garcon
les filles
les garcons
un garcon
une fille


To get everything in the table:

from nltk import grammar, parse
from nltk.parse.generate import generate

g = """

PHRASE -> DP | PP 

DP -> D[AGR=?a] N[AGR=?a] 
PP -> P[AGR=[GND='m', NUM='sg']] N[AGR=[GND='m', NUM='sg']]
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg', DEF='d']] N[AGR=[GND='f', NUM='sg']]
PP -> P[AGR=[GND=?a, NUM='pl']] N[AGR=[GND=?a, NUM='pl']]


P[AGR=[NUM='sg', GND='m']] -> 'du' | 'au'
P[AGR=[NUM='sg', GND='f']] -> 'de' | 'à'
P[AGR=[NUM='pl']] -> 'des' | 'aux'


N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'

N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'

D[AGR=[NUM='sg', GND='m', DEF='i']] -> 'un'
D[AGR=[NUM='sg', GND='f', DEF='i']] -> 'une'

D[AGR=[NUM='sg', GND='m', DEF='d']] -> 'le'
D[AGR=[NUM='sg', GND='f', DEF='d']] -> 'la'

D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'



"""

french_grammar =  grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)

valid_productions = set()

for tokens in list(generate(french_grammar, n=100000)):
    parsed_tokens = parser.parse(tokens)
    try: 
        first_parse = next(parsed_tokens) # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue

for np in sorted(valid_productions):
    print(np)

[out]:

au garcon
aux filles
aux garcons
de la fille
des filles
des garcons
du garcon
la fille
le garcon
les filles
les garcons
un garcon
une fille
à la fille


Beyond the table

It's also possible to produce de|à un(e) garcon|fille, i.e.

  • de un garcon
  • de une fille
  • à un garcon
  • à une fille

I'm not sure whether these are valid French phrases, but if they are, you can underspecify the feminine singular PP rule by removing the DEF feature, changing:

PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg', DEF='d']] N[AGR=[GND='f', NUM='sg']]

to:

PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg']] N[AGR=[GND='f', NUM='sg']]

and then add an additional rule to produce the masculine singular indefinite PP:

PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='m', NUM='sg', DEF='i']] N[AGR=[GND='m', NUM='sg']]
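
The effect of dropping DEF from the rule can be illustrated without NLTK. In this sketch the compatible helper and the flat dictionaries are hypothetical simplifications: the helper only checks that every feature the rule mentions has a matching value in the lexical entry, so a rule that omits DEF accepts both definite and indefinite determiners.

```python
def compatible(rule_features, entry_features):
    """True if no feature named by the rule conflicts with the lexical entry."""
    return all(entry_features.get(key, value) == value
               for key, value in rule_features.items())

# AGR values of the feminine singular determiners in the grammar.
la = {'GND': 'f', 'NUM': 'sg', 'DEF': 'd'}
une = {'GND': 'f', 'NUM': 'sg', 'DEF': 'i'}

with_def = {'GND': 'f', 'NUM': 'sg', 'DEF': 'd'}  # rule that still names DEF
without_def = {'GND': 'f', 'NUM': 'sg'}           # underspecified rule

print(compatible(with_def, la), compatible(with_def, une))
print(compatible(without_def, la), compatible(without_def, une))
```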

TL;DR

from nltk import grammar, parse
from nltk.parse.generate import generate

g = """

PHRASE -> DP | PP 

DP -> D[AGR=?a] N[AGR=?a] 
PP -> P[AGR=[GND='m', NUM='sg']] N[AGR=[GND='m', NUM='sg']]
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='f', NUM='sg']] N[AGR=[GND='f', NUM='sg']]
PP -> P[AGR=[GND='f', NUM='sg']] D[AGR=[GND='m', NUM='sg', DEF='i']] N[AGR=[GND='m', NUM='sg']]
PP -> P[AGR=[GND=?a, NUM='pl']] N[AGR=[GND=?a, NUM='pl']]


P[AGR=[NUM='sg', GND='m']] -> 'du' | 'au'
P[AGR=[NUM='sg', GND='f']] -> 'de' | 'à'
P[AGR=[NUM='pl']] -> 'des' | 'aux'


N[AGR=[NUM='sg', GND='m']] -> 'garcon'
N[AGR=[NUM='sg', GND='f']] -> 'fille'

N[AGR=[NUM='pl', GND='m']] -> 'garcons'
N[AGR=[NUM='pl', GND='f']] -> 'filles'

D[AGR=[NUM='sg', GND='m', DEF='i']] -> 'un'
D[AGR=[NUM='sg', GND='f', DEF='i']] -> 'une'

D[AGR=[NUM='sg', GND='m', DEF='d']] -> 'le'
D[AGR=[NUM='sg', GND='f', DEF='d']] -> 'la'

D[AGR=[NUM='pl', GND='m']] -> 'les'
D[AGR=[NUM='pl', GND='f']] -> 'les'



"""

french_grammar =  grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(french_grammar)

valid_productions = set()

for tokens in list(generate(french_grammar, n=100000)):
    parsed_tokens = parser.parse(tokens)
    try: 
        first_parse = next(parsed_tokens) # Check if there's a valid parse.
        valid_productions.add(' '.join(first_parse.leaves()))
    except StopIteration:
        continue

for np in sorted(valid_productions):
    print(np)

[out]:

au garcon
aux filles
aux garcons
de la fille
de un garcon
de une fille
des filles
des garcons
du garcon
la fille
le garcon
les filles
les garcons
un garcon
une fille
à la fille
à un garcon
à une fille
