TensorFlow 2.0: save the preprocessing tokenizer for NLP into TensorFlow Serving


Question

I have trained a TensorFlow 2.0 Keras model to do some natural language processing.

Basically, I take the titles of different news articles and predict which category they belong to. To do that I have to tokenize the sentences and then pad the arrays with zeros so they all have the same length that I defined:

    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    max_words = 1500
    tokenizer = Tokenizer(num_words=max_words)
    tokenizer.fit_on_texts(x.values)
    X = tokenizer.texts_to_sequences(x.values)
    X = pad_sequences(X, maxlen=32)

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense, Embedding, LSTM, GRU, InputLayer

    numero_clases = 5

    modelo_sentimiento = Sequential()
    modelo_sentimiento.add(InputLayer(input_tensor=tokenizer.texts_to_sequences, input_shape=(None, 32)))
    modelo_sentimiento.add(Embedding(max_words, 128, input_length=X.shape[1]))
    modelo_sentimiento.add(LSTM(256, dropout=0.2, recurrent_dropout=0.2, return_sequences=True))
    modelo_sentimiento.add(LSTM(256, dropout=0.2, recurrent_dropout=0.2))

    modelo_sentimiento.add(Dense(numero_clases, activation='softmax'))
    # f1_m, precision_m and recall_m are custom metric functions defined elsewhere
    modelo_sentimiento.compile(loss='categorical_crossentropy', optimizer='adam',
                               metrics=['acc', f1_m, precision_m, recall_m])
    print(modelo_sentimiento.summary())

Now that it is trained, I want to deploy it, for example with TensorFlow Serving, but I don't know how to save this preprocessing (the tokenizer) into the server, the way a scikit-learn pipeline would. Is that possible here, or do I have to save the tokenizer, do the preprocessing myself, and then call the trained model to predict?

Answer

Unfortunately, you won't easily be able to do something as elegant as an sklearn Pipeline with Keras models (at least, not that I'm aware of). Of course, you could create your own transformer that implements the preprocessing you need, but given my experience trying to incorporate custom objects into sklearn pipelines, I don't think it's worth the effort.

What you can do is save the tokenizer along with its metadata:

import pickle

with open('tokenizer_data.pkl', 'wb') as handle:
    pickle.dump(
        {'tokenizer': tokenizer, 'num_words': max_words, 'maxlen': 32}, handle)

Then load it back when you want to use it:

import pickle

with open("tokenizer_data.pkl", 'rb') as f:
    data = pickle.load(f)
    tokenizer = data['tokenizer']
    num_words = data['num_words']
    maxlen = data['maxlen']
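In other words, the tokenizer stays on the client side: you tokenize and pad incoming titles yourself and send the padded sequences to TF Serving's REST predict endpoint. Below is a minimal sketch of that request-building step, assuming the default left-padding/left-truncation behavior of Keras `pad_sequences`; the model name in the URL and the example token IDs are hypothetical, and in real code the sequence would come from `tokenizer.texts_to_sequences`:

```python
import json

def pad_sequence(seq, maxlen):
    # Mimics Keras pad_sequences defaults: keep the last `maxlen` tokens
    # and left-pad with zeros up to `maxlen`.
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

def build_predict_request(sequences, maxlen):
    # Body for TF Serving's REST predict API:
    #   POST http://<host>:8501/v1/models/<model_name>:predict
    padded = [pad_sequence(s, maxlen) for s in sequences]
    return json.dumps({"instances": padded})

# Example: one tokenized title of length 3, padded to length 8.
body = build_predict_request([[5, 12, 7]], maxlen=8)
```

You would then POST `body` to the serving endpoint and read the class probabilities from the `"predictions"` field of the JSON response.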

