Get gender from noun using NLTK with German corpora
Question
I'm experimenting with NLTK. My question is whether the library can detect the grammatical gender of a noun in German. I want this information in order to determine whether a text is written in a gender-neutral way. See here for more information: https://en.wikipedia.org/wiki/Gender_neutrality_in_languages_with_grammatical_gender
The code below tags my sentence, but I can't see any information about the gender of "Mitarbeiter". My code so far:
import nltk

sentence = """Der Mitarbeiter geht."""
# Tokenize, then POS-tag; the tags returned here carry no gender information
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
print(tagged[0:6])
So far I haven't found any tools or scripts that accomplish this. Maybe there is also a better approach to my task.
Recommended answer
I don't believe NLTK can do that out of the box for German. However, there are freely available morphological taggers for German that can do it for you, for example RFTagger:
http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/
It gives output like this:
Das PRO.Dem.Subst.-3.Nom.Sg.Neut
ist VFIN.Sein.3.Sg.Pres.Ind
ein ART.Indef.Nom.Sg.Masc
Testsatz N.Reg.Nom.Sg.Masc
. SYM.Pun.Sent
However, it is not written in Python, so you would have to call it using subprocess. Another option would be to obtain a corpus in which nouns are annotated with their German grammatical gender, such as the Tiger corpus:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.en.html
and train an NLTK tagger to recognize the genders, but I would expect RFTagger to be the quicker and more accurate solution.
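As a rough illustration of the subprocess route, here is a minimal sketch. It assumes the tagger prints one token and one dot-separated tag per line, as in the sample output above, and that nouns carry a tag starting with `N`. The helper names `run_tagger` and `parse_noun_genders` are made up for this example, and the command passed to `run_tagger` is a placeholder: it is assumed to read text from stdin and write tagged lines to stdout, which may not match how your RFTagger installation is invoked, so check its documentation for the actual command line.

```python
import subprocess

# Gender labels as they appear in the sample tagger output above.
GENDERS = {"Masc", "Fem", "Neut"}


def run_tagger(command, text):
    """Run an external tagger and return its stdout.

    `command` is a hypothetical argument list supplied by the caller;
    it is assumed to read from stdin and write to stdout.
    """
    completed = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return completed.stdout


def parse_noun_genders(tagger_output):
    """Return (token, gender) pairs for every noun line in the output."""
    results = []
    for line in tagger_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        token, tag = parts
        fields = tag.split(".")
        if fields[0] != "N":
            continue  # keep nouns only
        # Pick the first field that is a known gender label, if any.
        gender = next((f for f in fields if f in GENDERS), None)
        results.append((token, gender))
    return results


sample_output = """Das PRO.Dem.Subst.-3.Nom.Sg.Neut
ist VFIN.Sein.3.Sg.Pres.Ind
ein ART.Indef.Nom.Sg.Masc
Testsatz N.Reg.Nom.Sg.Masc
. SYM.Pun.Sent"""

noun_genders = parse_noun_genders(sample_output)
print(noun_genders)
```

With a working tagger command, the two helpers would be chained as `parse_noun_genders(run_tagger(command, text))`. Keeping the parsing separate from the process call means the parser can be tested on saved output without the tagger installed.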