Quick implementation of character n-grams for a word


Problem description

I wrote the following code to compute character bigrams; its output is shown right below. My question is: how do I get an output that excludes the trailing single character (i.e. 't')? And is there a quicker, more efficient method for computing character n-grams?

>>> b = 'student'
>>> y = []
>>> for x in range(len(b)):
...     n = b[x:x+2]
...     y.append(n)
...
>>> y
['st', 'tu', 'ud', 'de', 'en', 'nt', 't']

Here is the result I would like to get: ['st', 'tu', 'ud', 'de', 'en', 'nt']

Thanks in advance for your suggestions.

Recommended answer

To generate bigrams:

In [8]: b='student'

In [9]: [b[i:i+2] for i in range(len(b)-1)]
Out[9]: ['st', 'tu', 'ud', 'de', 'en', 'nt']
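
The stray 't' in the question's output comes from Python truncating a slice that runs past the end of a string instead of raising an error, so the last iteration produces a one-character slice. Stopping the range at len(b)-1, as above, avoids that. As an alternative, the following minimal sketch pairs each character with its successor using zip; it is shown only as another way to write the same thing, with no claim that it is faster.

```python
b = 'student'

# b[1:] is one character shorter than b, and zip stops at the shorter
# input, so no lone trailing character is ever produced.
bigrams = [first + second for first, second in zip(b, b[1:])]

print(bigrams)  # ['st', 'tu', 'ud', 'de', 'en', 'nt']
```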

To generalize to a different n:

In [10]: n=4

In [11]: [b[i:i+n] for i in range(len(b)-n+1)]
Out[11]: ['stud', 'tude', 'uden', 'dent']
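
The same comprehension can be wrapped in a small reusable function. This is only a sketch: the name char_ngrams and the check on n are assumptions added for illustration and are not part of the original answer.

```python
def char_ngrams(word, n):
    """Return the character n-grams of word as a list (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # When n > len(word) the range is empty, so the result is [].
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_ngrams('student', 2))  # ['st', 'tu', 'ud', 'de', 'en', 'nt']
print(char_ngrams('student', 4))  # ['stud', 'tude', 'uden', 'dent']
print(char_ngrams('st', 4))       # []
```

A word of length L yields L - n + 1 n-grams, which is why the range stops at len(word) - n + 1.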
