How to tokenize input text in Android Studio to process in an NLP model?

Views: 33 | Published: 2021/9/5 19:25:24 | Tags: android, tensorflow, tensorflow2.0, tf.keras, tensorflow-lite

Question

When I created my NLP model, I used the Keras tokenizer to tokenize my training data, so every word in the training data has a number associated with it. Now I want to run the model in an Android app, so I converted it to the tflite format. When the user gives me a text input in the app, I need to convert it into an array of numbers using the same tokens I used for the training data. I am unable to do so because the tflite file contains only the model and not the tokenizer. How can I do this?

Recommended answer

You need to migrate the vocabulary of tokenized words from Python to Android. Use the word_index attribute of your fitted tf.keras.preprocessing.text.Tokenizer. It is a dict that maps each word to its integer index (indices start at 1), and you need to export it as a JSON file.

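import json

# `tokenizer` is the fitted Keras Tokenizer that was used on the training data.
# word_index maps each word (str) to its integer index (int).
with open('android/word_dict.json', 'w') as file:
    json.dump(tokenizer.word_index, file)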

Now, we parse the JSON file in Android and build a HashMap<String, Integer> from it.

Take the input String from the user and tokenize it the same way the Keras Tokenizer did (with its default settings: lowercase the text, strip punctuation, split on spaces).
Next, look up the index of each word in the HashMap.
Store these Integers in an int[], which is the input for our model.
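
The lookup is easy to get subtly wrong, so it can help to have a small reference to compare the Android output against. The sketch below is plain Python rather than Android code. It assumes the Tokenizer was created with its default settings (lowercasing, the default filters string, splitting on spaces) and that the model was trained on sequences left-padded with 0 to a fixed length. The function name text_to_indices, the parameters max_len and oov_index, and the tiny word_index are hypothetical, made up for illustration; in practice word_index would be loaded from the exported JSON file.

```python
# Characters the Keras Tokenizer strips by default (its `filters` argument).
KERAS_DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def text_to_indices(text, word_index, max_len, oov_index=None):
    """Convert raw text to a fixed-length list of word indices.

    Hypothetical helper mirroring default Keras preprocessing:
    lowercase, replace filtered characters with spaces, split on spaces,
    look each word up, then left-pad (or left-truncate) to max_len.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")

    # Lowercase and turn every filtered character into a space.
    table = str.maketrans({ch: " " for ch in KERAS_DEFAULT_FILTERS})
    cleaned = text.lower().translate(table)
    words = [w for w in cleaned.split(" ") if w]

    indices = []
    for word in words:
        idx = word_index.get(word)
        if idx is not None:
            indices.append(idx)
        elif oov_index is not None:
            # Only applies if the Tokenizer was built with an oov_token.
            indices.append(oov_index)
        # Otherwise the unknown word is dropped.

    # Assumption: training used 'pre' padding and 'pre' truncation
    # (the pad_sequences defaults), with 0 as the padding value.
    indices = indices[-max_len:]
    return [0] * (max_len - len(indices)) + indices


# Hypothetical vocabulary standing in for the contents of word_dict.json.
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

print(text_to_indices("The movie was GREAT!", word_index, max_len=6))
# prints [0, 0, 1, 2, 3, 4]
```

The Android side needs to reproduce the same three decisions: how the text is cleaned, what happens to words missing from the HashMap, and how the int[] is padded to the model's input length. If the Tokenizer was configured differently during training (custom filters, an oov_token, num_words, or post-padding), the port has to follow that configuration instead.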

I have discussed the whole process in this blog post: Text Classification in Android with TensorFlow
