Flesch-Kincaid readability test in Python


Question

I need help with a problem I'm having. I need to write a function that returns the FRES (Flesch reading-ease score) of a text, given the formula:

FRES = 206.835 - 1.015 * (total words / total sentences) - 84.6 * (total syllables / total words)

In other words, my task is to turn this formula into a Python function.

This is the code I have, carried over from a previous question:

import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

from itertools import chain
from nltk.corpus import gutenberg
def compute_fres(text):
    """Return the FRES of a text.
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> compute_fres(emma) # doctest: +ELLIPSIS
    99.40...
    """

    for filename in gutenberg.fileids():
        sents = gutenberg.sents(filename)
        words = gutenberg.words(filename)
        num_sents = len(sents)
        num_words = len(words)
        num_syllables = sum(count_syllables(w) for w in words)
        score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    return score

This is the result I get:

Failure
Expected :99.40...

Actual   :92.84866041488623

**********************************************************************
File "C:/Users/PycharmProjects/a1/a1.py", line 60, in a1.compute_fres
Failed example:
    compute_fres(emma) # doctest: +ELLIPSIS
Expected:
    99.40...
Got:
    92.84866041488623

My task is to pass the doctest and get 99.40... as the result. I'm also not allowed to change the following code, since it was given to me with the task itself:

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

I feel like I'm getting close, but I'm not sure why I get a different result. Any help would be much appreciated.

Recommended answer

The three num_* variables are all of type int (integer). In Python 2, as in many other programming languages, dividing two integers with / gives an integer result, rounded down: for example, 14 / 5 produces 2, not 2.8. In Python 3, / already performs true division, and // is the operator that rounds down.

Change the calculation to

score = 206.835 - 1.015 * (float(num_words) / num_sents) - 84.6 * (num_syllables / float(num_words))

When one of the operands in a division is a float, the other is silently converted to a float as well, and floating-point division is performed instead of integer division. Try float(14) / 5, which gives 2.8.
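The difference between the two kinds of division can be sketched as follows. The helper name fres_from_counts and the counts passed to it are made up for illustration and are not part of the assignment; the constants are the ones from the question's code.

```python
def fres_from_counts(num_words, num_sents, num_syllables):
    """Flesch reading-ease score from three counts (hypothetical helper)."""
    if num_words == 0 or num_sents == 0:
        raise ValueError("need at least one word and one sentence")
    # float() forces floating-point division on Python 2; on Python 3 it is
    # harmless, because / on two ints already returns a float.
    words_per_sentence = float(num_words) / num_sents
    syllables_per_word = float(num_syllables) / num_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


print(14 // 5)        # floor division, rounds down to an int
print(float(14) / 5)  # floating-point division
# Made-up counts: 100 words, 10 sentences, 150 syllables.
print(fres_from_counts(100, 10, 150))
```

With these made-up counts the score works out to roughly 69.785 (206.835 - 10.15 - 126.9).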

Additionally, your regular expression VC does not include 'y' among the vowels, and it does not count a group of vowels at the end of a word as a syllable. Both errors undercount the number of syllables: for example, count_syllables("myrtle") returns 0.
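To make the undercount visible, here is a small comparison. count_syllables is copied from the question; VOWEL_GROUP and count_vowel_groups are hypothetical names for one possible alternative that counts every run of vowels and treats 'y' as a vowel. It is only an illustration, since the assignment does not allow changing count_syllables, and it is still a rough heuristic (a silent final 'e', for instance, is counted as a syllable).

```python
import re

# Pattern from the question: a vowel group followed by a non-vowel group.
VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

# Hypothetical alternative: every maximal run of vowels, with 'y' included.
VOWEL_GROUP = re.compile('[aeiouy]+', re.I)

def count_vowel_groups(word):
    return len(VOWEL_GROUP.findall(word))

# Print both counts side by side for a few sample words.
for word in ("myrtle", "tree", "banana"):
    print(word, count_syllables(word), count_vowel_groups(word))
```

For "myrtle" the original pattern finds nothing, because 'y' is not treated as a vowel and the final 'e' has no consonant after it, while the alternative counts two groups.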
