ValueError:TextEncodeInput必须是UNION[TextInputSequence,Tuple[InputSequence,InputSequence]]-标记化BERT/Distilbert错误 [英] ValueError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]] - Tokenizing BERT / Distilbert Error

本文介绍了ValueError:TextEncodeInput必须是UNION[TextInputSequence,Tuple[InputSequence,InputSequence]]-标记化BERT/Distilbert错误的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

def split_data(path):
  df = pd.read_csv(path)
  return train_test_split(df , test_size=0.1, random_state=100)

train, test = split_data(DATA_DIR)
train_texts, train_labels = train['text'].to_list(), train['sentiment'].to_list() 
test_texts, test_labels = test['text'].to_list(), test['sentiment'].to_list() 

train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=0.1, random_state=100)

from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased

train_encodings = tokenizer(train_texts, truncation=True, padding=True)
valid_encodings = tokenizer(valid_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)

当我尝试使用BERT标记器从数据帧中拆分时,我们遇到错误。

推荐答案

我遇到了相同的错误。问题是我的列表中没有,例如:

from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-german-cased')

# create test dataframe
texts = ['Vero Moda Damen Übergangsmantel Kurzmantel Chic Business Coatigan SALE',
         'Neu Herren Damen Sportschuhe Sneaker Turnschuhe Freizeit 1975 Schuhe Gr. 36-46',
         'KOMBI-ANGEBOT Zuckerpaste STRONG / SOFT / ZUBEHÖR -Sugaring Wachs Haarentfernung',
         None]

labels = [1, 2, 3, 1]

d = {'texts': texts, 'labels': labels} 
test_df = pd.DataFrame(d)

因此,在将Dataframe列转换为List之前,我删除了所有None行。

test_df = test_df.dropna()
texts = test_df["texts"].tolist()
texts_encodings = tokenizer(texts, truncation=True, padding=True)

这对我很有效。

这篇关于ValueError:TextEncodeInput必须是UNION[TextInputSequence,Tuple[InputSequence,InputSequence]]-标记化BERT/Distilbert错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆