Displaying y-axis in percentage format when plotting conditional frequency distribution

Problem description

When plotting a conditional frequency distribution for a set of words in a text corpus, the y-axis is displayed as counts, not percentages.

I am following the code outlined in "Natural Language Processing with Python" by Steven Bird, Ewan Klein & Edward Loper to display the frequency distribution of word lengths for different languages of the UDHR in a Jupyter notebook.

import nltk
from nltk.corpus import udhr

languages = ['Chickasaw', 'English', 'German_Deutsch', 'Greenlandic_Inuktikut', 'Hungarian_Magyar', 'Ibibio_Efik']
cfd = nltk.ConditionalFreqDist((lang, len(word)) for lang in languages
                                                 for word in udhr.words(lang + '-Latin1'))
cfd.plot(cumulative=True)

I expect the y-axis to display cumulative percentages (as in the book), but instead it shows cumulative counts. Please advise on how to turn the y-axis into cumulative percentages.

Recommended answer

Here is a solution that will provide the output you are looking for:

import nltk
nltk.download('udhr')
import pandas as pd
from nltk.corpus import udhr

languages = ['Chickasaw', 'English', 'German_Deutsch', 'Greenlandic_Inuktikut', 'Hungarian_Magyar', 'Ibibio_Efik']

cfd = nltk.ConditionalFreqDist(
    (lang, len(word))
    for lang in languages
    for word in udhr.words(lang + '-Latin1'))

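def plot_freq(lang):
    # Longest word length observed for this language
    max_length = max(len(word) for word in udhr.words(lang + '-Latin1'))

    # Relative frequency (0 to 1) of each word length, including lengths that never occur
    freq_by_length = {i: cfd[lang].freq(i) for i in range(max_length + 1)}

    series = pd.Series(freq_by_length, name=lang)

    # The cumulative sum turns the relative frequencies into a cumulative distribution
    series.cumsum().plot(legend=True, title='Cumulative Distribution of Word Lengths')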

Then we can use this new function to plot all the languages provided in the example:

for lang in languages:
    plot_freq(lang)
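
If the goal is a y-axis that runs from 0 to 100 rather than from 0 to 1, one option is to convert the raw counts to cumulative percentages before plotting. The following is a minimal sketch under stated assumptions, not part of the original answer: the helper name cumulative_percentages and the toy counts are hypothetical, and the function relies only on the input being a mapping from sample to count (nltk.FreqDist is a collections.Counter subclass, so cfd[lang] should fit).

```python
from collections import Counter
from itertools import accumulate


def cumulative_percentages(counts):
    """Map each sample to the cumulative share (0 to 100) of all observations.

    `counts` is any mapping of sample -> count. Samples are processed in
    sorted order, so for word lengths the result runs from shortest to longest.
    """
    total = sum(counts.values())
    if total == 0:
        # Nothing observed, so there is no distribution to report
        return {}
    samples = sorted(counts)
    running = accumulate(counts[s] for s in samples)
    return {s: 100.0 * c / total for s, c in zip(samples, running)}


# Toy word-length counts standing in for cfd[lang] (hypothetical data)
toy_counts = Counter({1: 2, 2: 3, 3: 5})
result = cumulative_percentages(toy_counts)
print(result)  # {1: 20.0, 2: 50.0, 3: 100.0}
```

Passing the resulting dictionary to pd.Series(...).plot() should then draw the curve on a 0 to 100 scale, since the values themselves are already percentages.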

The examples discussed in this thread are taken from Chapter 2 of the NLTK book.
