Converting Readability formula into python function

Question

I was given this formula called FRES (Flesch reading-ease test) that is used to measure the readability of a document:

FRES = 206.835 - 1.015 × (total words / total sentences) - 84.6 × (total syllables / total words)

My task is to write a Python function that returns the FRES of a text, so I need to convert this formula into a Python function.
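Once the three counts are known, the formula itself is a one-liner. A minimal sketch (the name fres_from_counts is hypothetical, not part of the task), assuming Python 3 so that / is true division even when both operands are integers:

```python
def fres_from_counts(num_words: int, num_sents: int, num_syllables: int) -> float:
    """Flesch reading-ease score computed from the three raw counts."""
    words_per_sentence = num_words / num_sents        # average sentence length
    syllables_per_word = num_syllables / num_words    # average word length
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Worked example: 100 words, 5 sentences, 150 syllables
# 206.835 - 1.015 * 20 - 84.6 * 1.5 = 206.835 - 20.3 - 126.9 = 59.635
print(round(fres_from_counts(100, 5, 150), 3))  # prints 59.635
```

The arithmetic is the easy part; what actually decides the score is how words, sentences and syllables are counted.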

I have re-implemented my code based on an answer I got. Here is what I have so far and the result it gives me:

import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

import re
from itertools import chain
from nltk.corpus import gutenberg
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

def compute_fres(text):
    """Return the FRES of a text.
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> compute_fres(emma) # doctest: +ELLIPSIS
    99.40...
    """

    for filename in gutenberg.fileids():
        sents = gutenberg.sents(filename)
        words = gutenberg.words(filename)
        num_sents = len(sents)
        num_words = len(words)
        num_syllables = sum(count_syllables(w) for w in words)
        score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    return score

After running the code this is the result message I got:

Failure

Expected :99.40...

Actual   :92.84866041488623

File "C:/Users/PycharmProjects/a1/a1.py", line 60, in a1.compute_fres
Failed example:
    compute_fres(emma) # doctest: +ELLIPSIS

Expected:
    99.40...
Got:
    92.84866041488623
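For reference, here is a small illustration (not part of the task) of how doctest's +ELLIPSIS directive compares output, calling doctest.OutputChecker directly: the trailing "..." matches any text, so the test passes only if the printed value starts with exactly "99.40".

```python
import doctest

checker = doctest.OutputChecker()
want = "99.40...\n"  # expected output as written in the docstring

# With ELLIPSIS, "..." matches anything, so only the "99.40" prefix must agree.
print(checker.check_output(want, "99.4012345\n", doctest.ELLIPSIS))         # True
print(checker.check_output(want, "92.84866041488623\n", doctest.ELLIPSIS))  # False
# Without the flag, "..." is compared literally and the check fails.
print(checker.check_output(want, "99.4012345\n", 0))                        # False
```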

My function is supposed to pass the doctest and return 99.40..., and I'm not allowed to edit the syllable-counting function, since it came with the task:

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))
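It helps to see what this regex actually counts before deciding how to tokenize. A quick check using the task's function unchanged (the sample tokens are arbitrary choices for illustration): each match is a run of vowels followed by at least one non-vowel, so a trailing vowel group is never counted, "y" is treated as a consonant, and punctuation tokens score zero.

```python
import re

# The task's syllable counter, unchanged.
VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

for token in ["readability", "Emma", "the", ",", "rhythm"]:
    # Show the count next to the substrings the regex matched.
    print(token, count_syllables(token), VC.findall(token))
# readability 4 ['ead', 'ab', 'il', 'ity']
# Emma 1 ['Emm']
# the 0 []
# , 0 []
# rhythm 0 []
```

Because punctuation tokens contribute words but no syllables, whether they are kept in the word count shifts the score noticeably.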

This question has been very tricky, but at least now the code gives me a result instead of an error message. I'm not sure why it's giving me a different result, though.

Any help will be very appreciated. Thank you.

Answer

BTW, there's the textstat library.

import textstat
from nltk.corpus import gutenberg

for filename in gutenberg.fileids():
    # Score the raw text of each book, not its file name.
    print(filename, textstat.flesch_reading_ease(gutenberg.raw(filename)))

If you're bent on coding up your own, you first have to

  • decide whether punctuation counts as a word
  • define how to count the number of syllables in a word
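To show how much the first decision matters, here is a sketch that scores one hand-tokenized sample sentence twice: once with punctuation tokens counted as words, and once keeping only tokens that contain a letter or digit. The token list is written out by hand, assuming punctuation is split off the way NLTK's word tokenizers do; the helper name fres is hypothetical.

```python
import re

# The task's syllable counter, unchanged.
VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

def fres(words, num_sents):
    # Flesch reading ease over an already-tokenized list of words.
    num_syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / num_sents) - 84.6 * (num_syllables / len(words))

# One sentence, punctuation split into separate tokens.
tokens = ["Emma", "Woodhouse", ",", "handsome", ",", "clever", ",", "and", "rich", "."]
# Alternative: keep only tokens containing at least one letter or digit.
alnum_tokens = [t for t in tokens if any(c.isalnum() for c in t)]

print(round(fres(tokens, 1), 3))        # 120.545 (10 "words", 9 syllables)
print(round(fres(alnum_tokens, 1), 3))  # 73.845  (6 words, 9 syllables)
```

Same text, same syllable total, yet the scores differ by almost 47 points, which is why the tokenization has to match whatever produced the expected value.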

If punctuation counts as a word and syllables are counted by the regex in your question, then:

import re
from itertools import chain
from nltk.corpus import gutenberg

# Same pattern and case-insensitive flag as the count_syllables given with the task.
VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def num_syllables_per_word(word):
    return len(VC.findall(word))

for filename in gutenberg.fileids():
    sents = gutenberg.sents(filename)
    words = gutenberg.words(filename) # i.e. list(chain(*sents))
    num_sents = len(sents)
    num_words = len(words)
    num_syllables = sum(num_syllables_per_word(w) for w in words)
    score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    print(filename, score)
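Note that in the question's compute_fres, the loop over gutenberg.fileids() never reads the text argument: it recomputes score for every Gutenberg file, so the function returns the score of whichever file comes last, regardless of the text passed in. Below is a sketch of a compute_fres(text) that scores only its argument. To keep it runnable without NLTK data, the default tokenizers are hypothetical regex stand-ins; for the assignment you would pass nltk.sent_tokenize and nltk.word_tokenize instead. Whether that reproduces the expected 99.40... depends on the tokenizers and has not been verified here.

```python
import re
from typing import Callable, List

# The task's syllable counter, unchanged.
VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

def simple_sent_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in: split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def simple_word_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in: runs of word characters/apostrophes, or single punctuation marks.
    return re.findall(r"[\w']+|[^\w\s]", text)

def compute_fres(text: str,
                 sent_tokenize: Callable[[str], List[str]] = simple_sent_tokenize,
                 word_tokenize: Callable[[str], List[str]] = simple_word_tokenize) -> float:
    """Return the Flesch reading-ease score of `text` itself, not of the corpus."""
    sents = sent_tokenize(text)
    words = [w for s in sents for w in word_tokenize(s)]
    if not sents or not words:
        raise ValueError("text must contain at least one word")
    num_syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sents))
            - 84.6 * (num_syllables / len(words)))

print(round(compute_fres("The cat sat. The dog ran!"), 3))  # 160.475
```

With NLTK installed and its punkt data downloaded, the call would look like compute_fres(emma, nltk.sent_tokenize, nltk.word_tokenize). Passing the tokenizers in as parameters keeps the formula separate from the "what is a word/sentence" decision, which is where the results actually diverge.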
