Textsum - Incorrect decode results compared to ref file

Problem Description

This issue is seen when performing training against my own dataset which was converted to binary via data_convert_example.py. After a week of training I get decode results that don't make sense when comparing the decode and ref files.
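
One quick way to sanity-check the conversion is to read the binary back and eyeball a few article/abstract pairs. Below is a rough sketch, assuming the usual textsum record layout (an 8-byte length prefix followed by a serialized tf.Example carrying 'article' and 'abstract' byte features); 'data/data-0' is just a placeholder path, and data_convert_example.py's binary_to_text command should do much the same job.

# Rough sanity check of the binary produced by data_convert_example.py.
# Assumes an 8-byte length prefix followed by a serialized tf.Example with
# 'article' and 'abstract' byte features; the file path is a placeholder.
import struct

from tensorflow.core.example import example_pb2

with open('data/data-0', 'rb') as reader:
    while True:
        len_bytes = reader.read(8)
        if not len_bytes:
            break
        str_len = struct.unpack('q', len_bytes)[0]
        example = example_pb2.Example.FromString(reader.read(str_len))
        article = example.features.feature['article'].bytes_list.value[0]
        abstract = example.features.feature['abstract'].bytes_list.value[0]
        print(abstract[:80])  # eyeball that each abstract matches its article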

If anyone has been successful and gotten results similar to what is posted in the Textsum readme using their own data, I would love to know what has worked for you: environment, TF build, number of articles.

I currently have not had any luck with 0.11, but have gotten some results with 0.9; however, the decode results are similar to those shown below, and I have no idea where they are even coming from.

I am currently running Ubuntu 16.04, TF 0.9, CUDA 7.5, and cuDNN 4. I tried TF 0.11 but was dealing with other issues, so I went back to 0.9. It does seem that the decode results are being generated from valid articles, but the reference file and decode file indices have NO correlation.

If anyone can provide any help or direction, it would be greatly appreciated. Otherwise, should I figure anything out, I will post here.

A few final questions. Regarding the vocab file referenced: does it need to be sorted by word frequency at all? I never did anything along those lines when generating it and just wasn't sure if this would throw something off as well.

Finally, when generating the data I assumed that the training articles should be broken down into smaller batches, so I separated the articles into multiple files of 100 articles each, named data-0, data-1, etc. Was this a correct assumption on my part? I also kept all the vocab in one file, which has not seemed to throw any errors.

Is that assumption correct as well?
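
For concreteness, here is a rough sketch of the sharding scheme described above (write_shards is a hypothetical helper, not something from the textsum repo; it writes the same length-prefixed tf.Example records that data_convert_example.py produces):

# Split (article, abstract) text pairs into shards of 100 examples named
# data-0, data-1, ... using the length-prefixed tf.Example layout textsum reads.
# Hypothetical helper, shown only to illustrate the sharding described above.
import struct

from tensorflow.core.example import example_pb2


def write_shards(pairs, shard_size=100, prefix='data'):
    for shard_start in range(0, len(pairs), shard_size):
        shard_name = '%s-%d' % (prefix, shard_start // shard_size)
        with open(shard_name, 'wb') as writer:
            for article, abstract in pairs[shard_start:shard_start + shard_size]:
                ex = example_pb2.Example()
                ex.features.feature['article'].bytes_list.value.append(article.encode('utf-8'))
                ex.features.feature['abstract'].bytes_list.value.append(abstract.encode('utf-8'))
                ex_str = ex.SerializeToString()
                writer.write(struct.pack('q', len(ex_str)))
                writer.write(ex_str)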

Below are some ref and decode results, which you can see are quite odd and seem to have no correlation.

Decode:

output=Wild Boy Goes About How I Can't Be Really Go For Love 
output=State Department defends the campaign of Iran
output=John Deere sails profit - Business Insider  
output=to roll for the Perseid meteor shower
output=Man in New York City in Germany

Reference:

output=Battle Chasers: Nightwar Combines Joe Mad's Stellar Art With Solid RPG Gameplay
output=Obama Meets a Goal That Could Literally Destroy America
output=WOW! 10 stunning photos of presidents daughter Zahra Buhari   
output=Koko the gorilla jams out on bass with Flea from Red Hot Chili Peppers  
output=Brenham police officer refused service at McDonald's

Recommended Answer

Going to answer this one myself. It seems the issue here was the lack of training data. In the end I did end up sorting my vocab file; however, it seems this is not necessary. The reason this was done was to allow the end user to limit the vocab to something like 200k words should they wish.
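
A rough sketch of that kind of trimming is below (made-up file names; the one-"word count"-per-line vocab format and the exact special-token list should be checked against data.py in your checkout):

# Sort a "word count" vocab file by frequency and keep the top 200k entries.
# File names are made up; verify the special tokens against data.py.
import collections

MAX_WORDS = 200000
SPECIAL_TOKENS = ['<UNK>', '<PAD>', '<s>', '</s>', '<d>', '</d>', '<p>', '</p>']

counts = collections.Counter()
with open('vocab_raw') as f:        # unsorted vocab, one "word count" per line
    for line in f:
        pieces = line.split()
        if len(pieces) == 2:
            counts[pieces[0]] += int(pieces[1])

with open('vocab', 'w') as out:
    for tok in SPECIAL_TOKENS:      # always keep the special tokens
        out.write('%s %d\n' % (tok, counts.pop(tok, 1)))
    for word, count in counts.most_common(MAX_WORDS):
        out.write('%s %d\n' % (word, count))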

The biggest reason for the problems above was simply the lack of data. When I ran the training in the original post, I was working with 40k+ articles. I thought this was enough, but it clearly wasn't, and that became even more evident once I got deeper into the code and gained a better understanding of what was going on. In the end I increased the number of articles to over 1.3 million, trained for about a week and a half on my 980 GTX, got the average loss down to about 1.6 to 2.2, and was seeing MUCH better results.

I am learning this as I go, but I stopped at the above average loss because some reading I did stated that when you run "eval" against your "test" data, the average loss should be close to what you are seeing in training; when the two are far apart, it is a sign you may be over-fitting. Again, take this with a grain of salt, as I am still learning, but it seems logical to me.

One last note that I learned the hard way: make sure you upgrade to the latest 0.11 TensorFlow version. I originally trained using 0.9, but when I went to figure out how to export the model for TensorFlow, I found that there was no export.py file in that repo. When I upgraded to 0.11, I found that the checkpoint file structure seems to have changed, and I needed to take another two weeks to train. So I would recommend just upgrading, as they have resolved a number of the problems I was seeing during the RC. I still did have to set is_tuple=false, but that aside, everything has worked out well. Hope this helps someone.
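
For clarity, the is_tuple=false mentioned above refers to the state_is_tuple argument on the LSTM cells built in seq2seq_attention_model.py. A rough sketch of the relevant construction (the exact arguments may differ in your checkout):

# Keep the non-tuple LSTM state when building the cells so the model's state
# handling still works on TF 0.11. Paraphrased; check seq2seq_attention_model.py
# in your own checkout for the exact arguments.
import tensorflow as tf

num_hidden = 256  # stands in for hps.num_hidden in the real model

cell = tf.nn.rnn_cell.LSTMCell(
    num_hidden,
    initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=123),
    state_is_tuple=False)  # the "is_tuple=false" setting mentioned above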
