spaCy: errors attempting to load serialized Doc
Problem description
I am trying to serialize/deserialize spaCy documents (setup is Windows 7, Anaconda) and am getting errors. I haven't been able to find any explanations. Here is a snippet of code and the error it generates:
import spacy
nlp = spacy.load('en')
text = 'This is a test.'
doc = nlp(text)
fout = 'test.spacy' # <-- according to the API for Doc.to_disk(), this needs to be a directory (but for me, spaCy writes a file)
doc.to_disk(fout)
doc.from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-7-aa22bf1b9689>", line 1, in <module>
doc.from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 806, in spacy.tokens.doc.Doc.from_bytes
ValueError: [E033] Cannot load into non-empty Doc of length 5.
I have also tried creating a new Doc object and loading from that, as shown in the example ("Example: Saving and loading a document") in the spaCy docs, which results in a different error:
from spacy.tokens import Doc
from spacy.vocab import Vocab
new_doc = Doc(Vocab()).from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-16-4d99a1199f43>", line 1, in <module>
Doc(Vocab()).from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 838, in spacy.tokens.doc.Doc.from_bytes
File "stringsource", line 646, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 347, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
As pointed out in the replies, the path provided should be a directory. However, the first code snippet creates a file. Changing this to a non-existing directory path doesn't help as spaCy still creates a file. Attempting to write to an existing directory causes an error too:
fout = 'data'
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-8-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'data'
Python has no problem writing at this location via standard file operations (open/read/write).
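The traceback above passes through pathlib's open, which suggests that this spaCy version opens the given path as a single file rather than creating a directory. The following is a minimal standard-library sketch of that behavior, with no spaCy involved; the helper name try_open_for_writing and the temporary paths are made up for illustration. It shows that opening an existing directory for writing fails (PermissionError on Windows, IsADirectoryError on POSIX systems), while a path that does not exist yet is simply created as a file:

```python
import tempfile
from pathlib import Path


def try_open_for_writing(path):
    """Open `path` in binary write mode; return the error class name, or None on success."""
    try:
        with Path(path).open("wb"):
            return None
    except OSError as err:
        return type(err).__name__


with tempfile.TemporaryDirectory() as tmp:
    # An existing directory, like 'data' in the question.
    existing_dir = Path(tmp) / "data"
    existing_dir.mkdir()
    dir_error = try_open_for_writing(existing_dir)

    # A path that does not exist yet, like 'test.spacy' in the question.
    new_path = Path(tmp) / "test.spacy"
    file_error = try_open_for_writing(new_path)
    created_a_file = new_path.is_file()

print(dir_error, file_error, created_a_file)
```

If this matches what spaCy does internally, the PermissionError says nothing about access rights at that location; it only means that a directory was opened as if it were a file.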
Trying with a Path object yields the same results:
from pathlib import Path
import os
fout = Path(os.path.join(os.getcwd(), 'data'))
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-17-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Username\\workspace\\data'
Any ideas why this might be happening?
Recommended answer
According to the spaCy documentation at https://spacy.io/api/doc, the argument passed to doc.to_disk(fout) must be a path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.
Try changing fout to a directory; it might do the trick.
Examples from the spaCy documentation:
For doc.to_disk:
doc.to_disk('/path/to/doc')
And for doc.from_disk:
from spacy.tokens import Doc
from spacy.vocab import Vocab
doc = Doc(Vocab()).from_disk('/path/to/doc')
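The two documentation examples can be combined into one round trip. The sketch below is not a confirmed fix for the errors in the question. It assumes spaCy is installed, uses spacy.blank("en") instead of spacy.load('en') to avoid needing a downloaded model, and writes to a hypothetical path inside a temporary directory that does not exist beforehand. The helper name round_trip is made up for illustration. It loads into a new, empty Doc that shares the original vocab, because error E033 in the question states that from_disk cannot load into a non-empty Doc:

```python
import tempfile
from pathlib import Path


def round_trip(text):
    """Serialize a Doc to disk and load it back into a fresh, empty Doc."""
    # Imported inside the function so the sketch can be read without spaCy installed.
    import spacy
    from spacy.tokens import Doc

    nlp = spacy.blank("en")  # assumption: tokenizer-only English pipeline
    doc = nlp(text)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "test.spacy"  # hypothetical path, not created in advance
        doc.to_disk(path)
        # Load into a new Doc sharing nlp.vocab, not into the populated `doc`.
        restored = Doc(nlp.vocab).from_disk(path)
        return restored.text
```

Calling round_trip('This is a test.') should return the original text when the installed spaCy version serializes correctly. If it raises the "buffer source array is read-only" error from the question instead, the problem most likely lies in the installed versions rather than in the path.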