Highlighting important words in a sentence using Deep Learning


Problem description

I am trying to highlight the important words in the IMDB dataset that ultimately contributed to the sentiment analysis prediction.

The dataset is as follows:

X_train - a review as a string.

Y_train - 0 or 1

Now, after using GloVe embeddings to embed the X_train values, I can feed them to a neural network.

Now my question is: how can I highlight the most important words, probability-wise, just like deepmoji.mit.edu does?

What I have tried:

  1. I tried splitting the input sentences into bi-grams and training a 1D CNN on them. Later, when we want to find the important words of X_test, we just split X_test into bi-grams and find their probabilities. It works, but it is not accurate. (A rough sketch of this idea is shown after this list.)

  2. I tried using a prebuilt Hierarchical Attention Network and succeeded. I got what I wanted, but I couldn't figure out every line and every concept in the code. It's like a black box to me.
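(A rough, hypothetical sketch of the bi-gram scoring idea from item 1, assuming a trained Keras classifier, here called model, that maps a variable-length sequence of word indices to a sentiment probability; the names are illustrative only:)

import numpy as np

def score_bigrams(model, review_indices):
    """Score each bi-gram of a review by how far the model's prediction is from 0.5."""
    scores = []
    for i in range(len(review_indices) - 1):
        bigram = np.array([review_indices[i:i + 2]])   # shape (1, 2)
        p = float(model.predict(bigram)[0, 0])         # P(positive sentiment)
        scores.append((i, abs(p - 0.5)))               # distance from "neutral"
    # Bi-grams whose prediction is far from 0.5 are treated as more important
    return sorted(scores, key=lambda t: t[1], reverse=True)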

I know how a neural net works, and I can code one in numpy with manual backpropagation from scratch. I have a detailed understanding of how an LSTM works and what the forget, update, and output gates actually output. But I still couldn't figure out how to extract the attention weights and how to shape the data as a 3D array (what is the timestep in our 2D data?).
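(For context on the 3D-array question: the timestep is simply the word position. A minimal sketch, assuming Keras's Tokenizer and pad_sequences and some made-up example texts:)

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

texts = ["the movie was great", "the movie was terrible"]   # made-up examples
tok = Tokenizer(num_words=1000)
tok.fit_on_texts(texts)
seqs = tok.texts_to_sequences(texts)    # lists of word indices
X = pad_sequences(seqs, maxlen=10)      # 2D array: (samples, timesteps)
print(X.shape)                          # (2, 10) - each word position is one timestep
# An Embedding(1000, 10) layer then turns this into a 3D tensor of shape
# (samples, timesteps, embedding_dim) = (2, 10, 10), which is what an LSTM consumes.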

So, any kind of guidance is welcome.

Recommended answer

Here is a version with Attention (not Hierarchical), but you should be able to figure out how to make it work with a hierarchy too - if not, I can help out as well. The trick is to define 2 models and use one for training (model) and the other to extract the attention values (model_with_attention_output):

# Tensorflow 1.9; Keras 2.2.0 (latest versions)
# should be backwards compatible upto Keras 2.0.9 and tf 1.5
from keras.models import Model
from keras.layers import *
from keras import backend as K
import numpy as np

dictionary_size=1000

def create_models():
  #Get a sequence of indexes of words as input:
  # Keras supports dynamic input lengths if you provide (None,) as the 
  #  input shape
  inp = Input((None,))
  #Embed words into vectors of size 10 each:
  # Output shape is (None,10)
  embs = Embedding(dictionary_size, 10)(inp)
  # Run LSTM on these vectors and return output on each timestep
  # Output shape is (None,5)
  lstm = LSTM(5, return_sequences=True)(embs)
  ##Attention Block
  #Transform each timestep into 1 value (attention_value) 
  # Output shape is (None,1)
  attention = TimeDistributed(Dense(1))(lstm)
  #By running softmax on axis 1 we force attention_values
  # to sum up to 1. We are effectively assigning a "weight" to each timestep
  # Output shape is still (None,1) but each value changes
  attention_vals = Softmax(axis=1)(attention)
  # Multiply the encoded timestep by the respective weight
  # I.e. we are scaling each timestep based on its weight
  # Output shape is (None,5): (None,5)*(None,1)=(None,5)
  scaled_vecs = Multiply()([lstm,attention_vals])
  # Sum up all scaled timesteps into 1 vector 
  # i.e. obtain a weighted sum of timesteps
  # Output shape is (5,) : Observe the time dimension got collapsed
  context_vector = Lambda(lambda x: K.sum(x,axis=1))(scaled_vecs)
  ##Attention Block over
  # Get the output out
  out = Dense(1,activation='sigmoid')(context_vector)

  model = Model(inp, out)
  model_with_attention_output = Model(inp, [out, attention_vals])
  model.compile(optimizer='adam',loss='binary_crossentropy')
  return model, model_with_attention_output

model,model_with_attention_output = create_models()


model.fit(np.array([[1,2,3]]), np.array([1]), batch_size=1)
print('Attention over each word: ', model_with_attention_output.predict(np.array([[1,2,3]]), batch_size=1)[1])

The output will be a numpy array with the attention value for each word - the higher the value, the more important the word was.

You might want to replace lstm with embs in the multiplication to get more interpretable results, but it will lead to worse performance...
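To actually highlight words, you can pair each attention value with its token. A minimal sketch, assuming the two models above and a hypothetical index_to_word lookup built from your tokenizer (not part of the answer code):

# Rank the words of one review by their attention weights.
review = np.array([[1, 2, 3]])                        # word indices, shape (1, timesteps)
probs, attn = model_with_attention_output.predict(review, batch_size=1)
attn = attn.squeeze()                                 # shape (timesteps,); weights sum to 1
ranking = sorted(zip(review[0], attn), key=lambda t: t[1], reverse=True)
for idx, weight in ranking:
    # index_to_word is a hypothetical {index: word} dict from your tokenizer
    print(index_to_word.get(idx, '?'), round(float(weight), 3))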
