How to one-hot-encode sentences at the character level?

Published: 2020/5/18 0:43:04. Tags: python, pandas, numpy, nlp, one-hot-encoding

Question

I would like to convert a sentence into an array of one-hot vectors, one per character, where each vector is the one-hot representation of that character over the alphabet. It would look like the following:

"hello"  # h=7, e=4, l=11, o=14 (zero-based positions in the alphabet)

would become

[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

Unfortunately, OneHotEncoder from sklearn does not take a string as input.
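
One common workaround, sketched here as an illustration rather than taken from the original thread, is to map each character to its integer position in the alphabet yourself and then select rows of an identity matrix with NumPy. The helper name `one_hot_eye` is hypothetical, and in this sketch any character missing from the alphabet raises a `KeyError`.

```python
import string

import numpy as np


def one_hot_eye(text, alphabet=string.ascii_lowercase):
    """Hypothetical helper: one-hot encode `text` by indexing an identity matrix."""
    # Map each alphabet character to its column index, e.g. 'a' -> 0, 'h' -> 7.
    index = {ch: i for i, ch in enumerate(alphabet)}
    # Integer code for every character of the input (KeyError if not in alphabet).
    codes = [index[ch] for ch in text]
    # Row i of the identity matrix is exactly the one-hot vector for column i.
    return np.eye(len(alphabet), dtype=int)[codes]


encoded = one_hot_eye("hello")
print(encoded.shape)  # (5, 26)
print(encoded.argmax(axis=1).tolist())  # [7, 4, 11, 11, 14]
```

Indexing `np.eye` with a list of codes returns one row per character, so the result has shape `(len(text), len(alphabet))`.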

Recommended answer

Just compare each letter of the input string with every character of a given alphabet:

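import string

def string_vectorizer(strng, alphabet=string.ascii_lowercase):
    # One row per character of strng, one column per character of alphabet;
    # a cell is 1 where the two characters match, else 0.
    vector = [[1 if letter == char else 0 for letter in alphabet]
              for char in strng]
    return vector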
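
The question is about sentences, but with the default alphabet a space, a capital letter, or a punctuation mark matches no column and silently becomes an all-zero row. A minimal sketch, assuming it is acceptable to lowercase the input and to give the space its own column (both are choices made here, not part of the original answer); the function is repeated so the snippet runs on its own:

```python
import string


def string_vectorizer(strng, alphabet=string.ascii_lowercase):
    # Same comparison-based encoder as in the answer.
    return [[1 if letter == char else 0 for letter in alphabet] for char in strng]


sentence = "Hello World"
# Assumption: treat upper- and lowercase alike, and add a column for the space.
alphabet = string.ascii_lowercase + " "
rows = string_vectorizer(sentence.lower(), alphabet)

print(len(rows), len(rows[0]))  # 11 27
print(rows[5].index(1))  # 26, i.e. the space column
```

Without the lowercasing, the row for "H" would be all zeros under the default alphabet.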

Note that with a custom alphabet (e.g. "defbcazk"), the columns are ordered as the characters appear in that alphabet string.
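
To illustrate the column ordering, here is a short sketch using the example alphabet "defbcazk" (the function is repeated so the snippet is self-contained). Column j corresponds to the j-th character of that string, so "d" lands in column 0 and "a" in column 5:

```python
import string


def string_vectorizer(strng, alphabet=string.ascii_lowercase):
    # Column j of every row corresponds to alphabet[j].
    return [[1 if letter == char else 0 for letter in alphabet] for char in strng]


custom = "defbcazk"
rows = string_vectorizer("bad", custom)
for char, row in zip("bad", rows):
    print(char, row)
# b [0, 0, 0, 1, 0, 0, 0, 0]
# a [0, 0, 0, 0, 0, 1, 0, 0]
# d [1, 0, 0, 0, 0, 0, 0, 0]
```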

Output of string_vectorizer('hello'):

[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
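
If a NumPy array is more convenient than nested lists (the question is tagged numpy), the same character-by-character comparison can be done in a single broadcast step. This is a sketch; the name `string_vectorizer_np` is hypothetical:

```python
import string

import numpy as np


def string_vectorizer_np(strng, alphabet=string.ascii_lowercase):
    # Shape (len(strng), 1): one input character per row.
    chars = np.array(list(strng)).reshape(-1, 1)
    # Shape (1, len(alphabet)): one alphabet character per column.
    letters = np.array(list(alphabet)).reshape(1, -1)
    # Broadcasting compares every input character with every letter at once.
    return (chars == letters).astype(int)


out = string_vectorizer_np("hello")
print(out.shape)  # (5, 26)
print(out.argmax(axis=1).tolist())  # [7, 4, 11, 11, 14]
```

`out.tolist()` gives the same nested list as the output shown above.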
