Gensim fasttext 无法获取最新的训练损失 [英] Gensim fasttext cannot get latest training loss

查看:63
本文介绍了Gensim fasttext 无法获取最新的训练损失的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

fasttext 中的 get_latest_training_loss 函数似乎只返回 0.gensim 4.1.04.0.0不起作用.

It seems that the get_latest_training_loss function in fasttext returns only 0. Both gensim 4.1.0 and 4.0.0 do not work.

from gensim.models.callbacks import CallbackAny2Vec
from pprint import pprint as print
from gensim.models.fasttext import FastText
from gensim.test.utils import datapath

class callback(CallbackAny2Vec):
    '''Callback to print loss after each epoch.'''

    def __init__(self):
        self.epoch = 0

    def on_epoch_end(self, model):
        loss = model.get_latest_training_loss()
        print('Loss after epoch {}: {}'.format(self.epoch, loss))
        self.epoch += 1

# Set file names for train and test data
corpus_file = datapath('lee_background.cor')

model = FastText(vector_size=100, callbacks=[callback()])

# build the vocabulary
model.build_vocab(corpus_file=corpus_file)

# train the model
model.train(
    corpus_file=corpus_file, epochs=model.epochs,
    total_examples=model.corpus_count, total_words=model.corpus_total_words,
    callbacks=model.callbacks, compute_loss=True,
)

print(model)

'Loss after epoch 0: 0.0'
'Loss after epoch 1: 0.0'
'Loss after epoch 2: 0.0'
'Loss after epoch 3: 0.0'
'Loss after epoch 4: 0.0'

如果目前 FastText 不支持 get_latest_training_loss,这里的文档需要删除:

If currently FastText does not support get_latest_training_loss, the documentation here needs to be removed:

https://radimrehurek.com/gensim/models/fasttext.html#gensim.models.fasttext.FastText.get_latest_training_loss

我已经在三种不同的环境中尝试过这个,但它们都不起作用.

I have tried this in three different environments and neither of them works.

第一个环境:

[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform; print(platform.platform())
Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-glibc2.17
>>> import sys; print("Python", sys.version)
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48)
[GCC 9.3.0]
>>> import struct; print("Bits", 8 * struct.calcsize("P"))
Bits 64
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.21.2
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.7.1
>>> import gensim; print("gensim", gensim.__version__)
gensim 4.1.0
>>> from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
FAST_VERSION 0

第二环境:

Python 3.9.5 (default, May 18 2021, 12:31:01)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform; print(platform.platform())
macOS-10.16-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.9.5 (default, May 18 2021, 12:31:01)
[Clang 10.0.0 ]
>>> import struct; print("Bits", 8 * struct.calcsize("P"))
Bits 64
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.20.3
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.7.1
>>> import gensim; print("gensim", gensim.__version__)
gensim 4.1.0
>>> from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
FAST_VERSION 0

第三种环境:

Python 3.9.5 (default, May 18 2021, 12:31:01)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform; print(platform.platform())
macOS-10.16-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.9.5 (default, May 18 2021, 12:31:01)
[Clang 10.0.0 ]
>>> import struct; print("Bits", 8 * struct.calcsize("P"))
Bits 64
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.20.3
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.7.1
>>> import gensim; print("gensim", gensim.__version__)
/Users/jinhuawang/miniconda3/lib/python3.9/site-packages/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
  warnings.warn(msg)
gensim 4.0.0
>>> from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
FAST_VERSION 0

推荐答案

确实,损失跟踪从未在 Gensim 的 FastText 模型中实现,至少在 4.1.0 版(2021 年 8 月)中).

Indeed, loss-tracking hasn't ever been implemented in Gensim's FastText model, at least through release 4.1.0 (August 2021).

该方法的文档出现错误,因为从 Word2Vec 超类继承的方法没有被覆盖,以防止超类方法工作的默认假设.

The docs for that method appear in error, due to the inherited method from the Word2Vec superclass not being overriden to prevent the default assumption that superclass methods work.

有一个长期悬而未决的问题来填补空白&修复 Gensim 的损失跟踪中的问题(对于 Word2Vec 来说,这也有些错误和不完整).但是,目前我认为没有任何贡献者正在研究它,&它没有被任何即将发布的版本优先考虑.可能需要有人自愿挺身而出.解决问题.

There is a long-open issue to fill the gaps & fix the problems in Gensim's loss-tracking (which is also somewhat buggy & incomplete for Word2Vec). But, at the moment I don't think any contributor is working on it, & it hasn't been prioritized for any upcoming release. It may require someone to volunteer to step forward & fix things.

这篇关于Gensim fasttext 无法获取最新的训练损失的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆