How to encode a string in tf.data.Dataset?

Views: 38 | Published: 2021/9/5 20:04:10 | Tags: python, tensorflow, tokenize

Problem description

I am trying to encode a string in a TensorFlow dataset so that I can use it to train a pretrained RoBERTa model. The training_dataset is a TensorFlow dataset made from a pandas DataFrame that looks like this:

I built the tf.data.Dataset from this DataFrame with:

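import tensorflow as tf

# Columns holding the three answer options; train_split is the DataFrame described above.
features = ['OptionA', 'OptionB', 'OptionC']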

training_dataset = (
    tf.data.Dataset.from_tensor_slices(
        (
            tf.cast(train_split[features].values, tf.string),
            tf.cast(train_split['Answer'].values, tf.int32)
        )
    )
)

Now I want to encode the three columns OptionA, OptionB and OptionC using a RobertaTokenizer, which is defined by:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

I tried:

training_dataset = training_dataset.map(lambda x: tokenizer.encode(x))

But this gave me the error "TypeError: <lambda>() takes 1 positional argument but 2 were given". I am not sure how to deal with this, or how to state that I only want the first three columns to be encoded.

Any help would be greatly appreciated!

Recommended answer

Each element of training_dataset is a (features, label) pair, but the function you pass to map accepts only one argument. Give it two parameters and pass the label through unchanged. Try:

training_dataset = training_dataset.map(lambda x, y: (tokenizer.encode(x), y))
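To see why the one-argument lambda raises the TypeError, here is a minimal plain-Python sketch of the unpacking behaviour. It does not use TensorFlow: map_elements is a hypothetical stand-in that, like Dataset.map, unpacks each tuple element into positional arguments, and fake_encode is a hypothetical placeholder for tokenizer.encode that simply returns string lengths.

```python
# Plain-Python stand-in for how Dataset.map calls its function.
# map_elements is hypothetical; it is not part of TensorFlow.
def map_elements(fn, elements):
    results = []
    for element in elements:
        if isinstance(element, tuple):
            # Tuple elements are unpacked into separate positional arguments.
            results.append(fn(*element))
        else:
            results.append(fn(element))
    return results


def fake_encode(options):
    # Hypothetical placeholder for tokenizer.encode: one integer per string.
    return [len(option) for option in options]


# Each row mirrors one dataset element: (three option strings, answer label).
rows = [
    (["red", "green", "blue"], 0),
    (["cat", "dog", "bird"], 2),
]

# One-argument lambda: fails because each row arrives as two arguments.
try:
    map_elements(lambda x: fake_encode(x), rows)
    one_arg_error = None
except TypeError as exc:
    one_arg_error = str(exc)

# Two-argument lambda: encodes the features and passes the label through.
encoded = map_elements(lambda x, y: (fake_encode(x), y), rows)
```

The sketch only illustrates the argument count. Whether tokenizer.encode accepts the tensors that the real map passes in is a separate question that it does not address.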
