How to convert multiple pandas text columns into tensors?

Views: 16 Published: 2022/2/21 22:59:46 machine-learning deep-learning nlp data-preprocessing


Question

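Hi, I am working on the Key Point Analysis task shared by IBM; here is the link. The given dataset contains more than one column of text. Can anyone tell me how to convert the text columns into tensors and assign them back into the same DataFrame, given that it also holds other columns of data?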

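Problem: I have not seen this kind of data before. It has multiple text columns, and I do not know how to convert all of them into tensors and then apply a model. Most datasets I have worked with have a single text column, with the remaining columns as labels, for example movie reviews or toxic comment classification.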

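# REPLACE_BY_SPACE_RE, BAD_SYMBOLS_RE and STOPWORDS are assumed to be defined
# elsewhere; the question does not show them.
def clean_text(text):
    """
    text: a string
    return: modified initial string
    """
    text = text.lower()  # lowercase text
    text = REPLACE_BY_SPACE_RE.sub(' ', text)  # replace matched symbols with a space
    text = BAD_SYMBOLS_RE.sub('', text)  # delete matched symbols
    text = text.replace('x', '')  # note: removes every 'x' character
    # text = re.sub(r'\W+', '', text)
    # drop stopwords
    text = ' '.join(word for word in text.split() if word not in STOPWORDS)
    return text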
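A cleaning function of this kind can be applied to several text columns in turn while the other columns stay untouched. The sketch below is only an illustration: the regular expressions, the small stopword set, the column name `key_point`, and the sample rows are all hypothetical stand-ins for definitions the question does not show, and the `'x'` removal step is left out.

```python
import re

import pandas as pd

# Hypothetical stand-ins; the question does not show the real definitions.
REPLACE_BY_SPACE_RE = re.compile(r"[/(){}\[\]|@,;]")
BAD_SYMBOLS_RE = re.compile(r"[^0-9a-z #+_]")
STOPWORDS = {"the", "a", "an", "is", "are", "of"}


def clean_text(text):
    """Lowercase, strip symbols, and drop stopwords from one string."""
    text = text.lower()
    text = REPLACE_BY_SPACE_RE.sub(" ", text)
    text = BAD_SYMBOLS_RE.sub("", text)
    return " ".join(w for w in text.split() if w not in STOPWORDS)


# Made-up sample data with two text columns and one label column.
df = pd.DataFrame({
    "args": ["The uniforms are expensive!", "Bullying is a problem (really)"],
    "key_point": ["Uniforms cost too much.", "Uniforms reduce bullying"],
    "label": [1, 0],
})

# Clean every text column in place; "label" is not modified.
text_cols = ["args", "key_point"]
for col in text_cols:
    df[col] = df[col].apply(clean_text)
```

Keeping the list of text columns in one variable makes it easy to reuse the same list later when tokenizing.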

Recommended answer

If I understood your question correctly, you would do the following:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
# DF is the pandas DataFrame that holds the dataset
DF["args"] = DF["args"].apply(lambda x: tokenizer(x)['input_ids'])

This converts each sentence in the column into a list of token IDs.
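The same idea extends to several text columns, but token lists of different lengths have to be padded before they can form a rectangular tensor. The sketch below is one possible approach and not part of the original answer: a simple whitespace vocabulary stands in for `RobertaTokenizer` so that it runs offline, and the column name `key_point`, the sample rows, `max_len`, and the helper names `build_vocab` and `encode_column` are all hypothetical.

```python
import numpy as np
import pandas as pd

PAD_ID = 0  # reserved ID used to pad short sequences
UNK_ID = 1  # reserved ID for words missing from the vocabulary


def build_vocab(df, text_cols):
    """Map every whitespace-separated word in the text columns to an integer ID."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for col in text_cols:
        for text in df[col]:
            for word in str(text).lower().split():
                vocab.setdefault(word, len(vocab))
    return vocab


def encode_column(series, vocab, max_len):
    """Encode one text column as a (num_rows, max_len) integer array."""
    out = np.full((len(series), max_len), PAD_ID, dtype=np.int64)
    for row, text in enumerate(series):
        ids = [vocab.get(w, UNK_ID) for w in str(text).lower().split()][:max_len]
        out[row, :len(ids)] = ids  # remaining positions stay PAD_ID
    return out


# Made-up sample data with two text columns and one label column.
df = pd.DataFrame({
    "args": ["School uniforms reduce bullying", "Uniforms are expensive"],
    "key_point": ["Uniforms reduce bullying", "Uniforms cost too much"],
    "label": [1, 1],
})

text_cols = ["args", "key_point"]
vocab = build_vocab(df, text_cols)

# One padded array per text column; the label column becomes its own array.
encoded = {col: encode_column(df[col], vocab, max_len=6) for col in text_cols}
labels = df["label"].to_numpy()
```

Each array can then be handed to a framework, for example with `torch.from_numpy`. The padded arrays are kept next to the DataFrame instead of inside its cells, since a DataFrame cell is not a natural home for a tensor. With a Hugging Face tokenizer, calling it on a list of strings with `padding=True` and `return_tensors="pt"` should produce padded tensors directly.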
