The size of tensor a (707) must match the size of tensor b (512) at non-singleton dimension 1


Problem description

I am trying to do text classification using a pretrained BERT model. I trained the model on my dataset, and I am now in the testing phase. I know that BERT can only take 512 tokens, so I wrote an if condition to check the length of each test sentence in my dataframe. If it is longer than 512, I split the sentence into sequences of 512 tokens each, and then run the tokenizer's encode. The length of each sequence is 512; however, after tokenizer encoding the length becomes 707 and I get this error:

The size of tensor a (707) must match the size of tensor b (512) at non-singleton dimension 1

Here is the code I used for the previous steps:

import math

import numpy as np
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)

pred = []
if len(test_sentence_in_df.split()) > 512:
    # Split the long sentence into chunks of at most 512 whitespace-separated words
    n = math.ceil(len(test_sentence_in_df.split()) / 512)
    for i in range(n):
        if i == (n - 1):
            print(i)
            test_sentence = ' '.join(test_sentence_in_df.split()[i * 512:])
        else:
            print("i in else", str(i))
            test_sentence = ' '.join(test_sentence_in_df.split()[i * 512:(i + 1) * 512])
            # print(len(test_sentence.split()))  # here the length is 512
        tokenized_sentence = tokenizer.encode(test_sentence)
        input_ids = torch.tensor([tokenized_sentence]).cuda()
        print(len(tokenized_sentence))  # here the length is 707
        with torch.no_grad():
            output = model(input_ids)
            label_indices = np.argmax(output[0].to('cpu').numpy(), axis=2)
        pred.append(label_indices)

print(pred)
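
For reference, a minimal check (reusing the tokenizer and test_sentence from the snippet above; the counts are the ones reported in the question, not re-verified) to see the mismatch is to compare the whitespace word count with the encoded token count:

# Compare raw word count with the number of token ids produced by the tokenizer.
words = test_sentence.split()
token_ids = tokenizer.encode(test_sentence)
print(len(words), len(token_ids))  # the question reports 512 words but 707 token ids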

Recommended answer

This is because BERT uses WordPiece tokenization. When a word is not in the vocabulary, it is split into word pieces; for example, if the word playing is not in the vocabulary, it can be split into play and ##ing. This increases the number of tokens in a given sentence after tokenization. You can pass certain parameters to get fixed-length tokenization:

tokenized_sentence = tokenizer.encode(test_sentence, padding=True, truncation=True, max_length=50, add_special_tokens=True)
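
As a sketch only (this is not part of the original answer): for the 512-token limit in the question, max_length would typically be set to the model's limit rather than 50, and return_tensors='pt' avoids the manual torch.tensor wrapping. Something along these lines could replace the encode call inside the loop above:

# Hedged sketch: truncate each chunk to BERT's 512-token limit, which includes
# the [CLS]/[SEP] special tokens the tokenizer adds.
input_ids = tokenizer.encode(
    test_sentence,
    add_special_tokens=True,
    truncation=True,
    max_length=512,
    return_tensors='pt'   # returns a tensor of shape (1, seq_len)
).cuda()

with torch.no_grad():
    output = model(input_ids)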
