TensorFlow using LSTMs for generating text

Question

I would like to use TensorFlow to generate text, and I have been modifying the LSTM tutorial code (https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks) to do this. However, my initial solution seems to generate nonsense, and it does not improve even after training for a long time; I fail to see why. The idea is to start with a zero matrix and then generate one word at a time.

This is the code to which I've added the two functions below: https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/rnn/ptb/ptb_word_lm.py

The generator looks as follows:

def generate_text(session, m, eval_op):
    state = m.initial_state.eval()
    x = np.zeros((m.batch_size, m.num_steps), dtype=np.int32)

    output = str()
    for i in xrange(m.batch_size):
        for step in xrange(m.num_steps):
            try:
                # Run the batch.
                # targets have to be set, but m is the validation model,
                # so it should not train the neural network.
                cost, state, _, probabilities = session.run(
                    [m.cost, m.final_state, eval_op, m.probabilities],
                    {m.input_data: x, m.targets: x, m.initial_state: state})

                # Sample a word id and add it to the matrix and output.
                word_id = sample(probabilities[0, :])
                output = output + " " + reader.word_from_id(word_id)
                x[i][step] = word_id

            except ValueError as e:
                print("ValueError: " + str(e))

    print(output)

I have added the variable "probabilities" to the PTBModel; it is simply a softmax over the logits.

self._probabilities = tf.nn.softmax(logits)

And the sampling function:

def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))
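
For reference, here is a quick standalone check of this helper on a hypothetical four-word distribution (lower temperatures make the sampling greedier):

probs = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical vocabulary distribution
for t in (0.5, 1.0, 2.0):
    print(t, sample(probs, temperature=t))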

Answer

I have been working toward the exact same goal, and just got it to work. You have many of the right modifications here, but I think you've missed a few steps.

First, for generating text you need to create a different version of the model which represents only a single timestep. The reason is that we need to sample each output y before we can feed it into the next step of the model. I did this by making a new config which sets num_steps and batch_size both equal to 1.

class SmallGenConfig(object):
  """Small config. for generation"""
  init_scale = 0.1
  learning_rate = 1.0
  max_grad_norm = 5
  num_layers = 2
  num_steps = 1 # this is the main difference
  hidden_size = 200
  max_epoch = 4
  max_max_epoch = 13
  keep_prob = 1.0
  lr_decay = 0.5
  batch_size = 1
  vocab_size = 10000

I also added an output_probs tensor to the model with these lines:

self._output_probs = tf.nn.softmax(logits)

@property
def output_probs(self):
  return self._output_probs

Then, there are a few differences in my generate_text() function. The first one is that I load saved model parameters from disk using the tf.train.Saver() object. Note that we do this after instantiating the PTBModel with the new config from above.

def generate_text(train_path, model_path, num_sentences):
  gen_config = SmallGenConfig()

  with tf.Graph().as_default(), tf.Session() as session:
    initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                gen_config.init_scale)    
    with tf.variable_scope("model", reuse=None, initializer=initializer):
      m = PTBModel(is_training=False, config=gen_config)

    # Restore variables from disk.
    saver = tf.train.Saver() 
    saver.restore(session, model_path)
    print("Model restored from file " + model_path)

The second difference is that I get the lookup table from ids to word strings (I had to write this function, see the code below).

    words = reader.get_vocab(train_path)

I set up the initial state the same way you do, but then I set up the initial token in a different manner. I want to use the "end of sentence" token so that I'll start my sentence with the right types of words. I looked through the word index and found that <eos> happens to have index 2 (deterministically, given how the vocabulary is built), so I just hard-coded that in. Finally, I wrap it in a 1x1 NumPy matrix so that it is the right type for the model inputs.

    state = m.initial_state.eval()
    x = 2 # the id for '<eos>' from the training set
    input = np.matrix([[x]])  # a 2D numpy matrix 

Finally, here's the part where we generate sentences. Note that we tell session.run() to compute the output_probs and the final_state. And we give it the input and the state. In the first iteration the input is <eos> and the state is the initial_state, but on subsequent iterations we give as input our last sampled output, and we pass the state along from the last iteration. Note also that we use the words list to look up the word string from the output index.

    text = ""
    count = 0
    while count < num_sentences:
      output_probs, state = session.run([m.output_probs, m.final_state],
                                   {m.input_data: input,
                                    m.initial_state: state})
      x = sample(output_probs[0], 0.9)
      if words[x]=="<eos>":
        text += ".\n\n"
        count += 1
      else:
        text += " " + words[x]
      # now feed this new word as input into the next iteration
      input = np.matrix([[x]]) 

Then all we have to do is print out the text we accumulated.

    print(text)
  return

And that's it for the generate_text() function.
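
Putting it together, a call looks like this (the paths are only examples; they depend on where your PTB data and checkpoint actually live):

generate_text("simple-examples/data/ptb.train.txt", "/tmp/model.ckpt", num_sentences=5)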

Finally, let me show you the function definition for get_vocab(), which I put in reader.py.

def get_vocab(filename):
  data = _read_words(filename)

  counter = collections.Counter(data)
  count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

  words, _ = list(zip(*count_pairs))

  return words
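
get_vocab() relies on _read_words() and the collections module, both of which are already part of the tutorial's reader.py. In case your copy differs, _read_words() is essentially the following (a sketch matching the tutorial's behavior):

import collections
import tensorflow as tf

def _read_words(filename):
  # read the whole file, treating newlines as end-of-sentence tokens
  with tf.gfile.GFile(filename, "r") as f:
    return f.read().replace("\n", "<eos>").split()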

The last thing you need to do is save the model after training it, which looks like this:

save_path = saver.save(session, "/tmp/model.ckpt")
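
For context, this call belongs at the end of main() in ptb_word_lm.py, once the training loop has finished; a rough sketch (assuming `session` is the tutorial's training session):

# at the end of main(), after the final epoch
saver = tf.train.Saver()
save_path = saver.save(session, "/tmp/model.ckpt")
print("Model saved in file: " + save_path)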

And that's the model that you'll load from disk later when generating text.

There was one more problem: I found that sometimes the probability distribution produced by the TensorFlow softmax function didn't sum to exactly 1.0, and when the sum was larger than 1.0, np.random.multinomial() would throw an error. So I had to write my own sampling function, which looks like this:

import random
import numpy as np  # already imported in the tutorial code

def sample(a, temperature=1.0):
  # sample an index from a probability array, after temperature scaling
  a = np.log(a) / temperature
  a = np.exp(a) / np.sum(np.exp(a))
  r = random.random()  # range: [0,1)
  total = 0.0
  for i in range(len(a)):
    total += a[i]
    if total > r:
      return i
  return len(a)-1  # fallback if rounding keeps the running total below r
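
If you'd rather lean on numpy's built-in sampling instead, casting to float64 and renormalizing should sidestep the same rounding problem; a sketch of that alternative:

def sample_choice(a, temperature=1.0):
  # work in float64 and renormalize so the probabilities sum to 1.0
  a = np.asarray(a, dtype=np.float64)
  a = np.log(a) / temperature
  a = np.exp(a) / np.sum(np.exp(a))
  a = a / a.sum()  # guard against residual rounding error
  return np.random.choice(len(a), p=a)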

When you put all this together, the small model was able to generate some pretty cool sentences for me. Good luck.
