TypeError: expected bytes, str found in custom python function

Problem description

I am using a new bioinformatics tool called Giggle and I have installed the python wrapper on my system. Even though the scenario is quite specific, I think the problem is quite general. This function:

index = Giggle.create("index", "HMEC_hg19_BroadHMM_ALL.bed")

should create an index based on several (or in this case one) .bed file. The bed files look like this:

chr1    10000   10600   15_Repetitive/CNV   0   .   10000   10600   245,245,245
chr1    10600   11137   13_Heterochrom/lo   0   .   10600   11137   245,245,245
chr1    11137   11737   8_Insulator 0   .   11137   11737   10,190,254
chr1    11737   11937   11_Weak_Txn 0   .   11737   11937   153,255,102
chr1    11937   12137   7_Weak_Enhancer 0   .   11937   12137   255,252,4
chr1    12137   14537   11_Weak_Txn 0   .   12137   14537   153,255,102
chr1    14537   20337   10_Txn_Elongation   0   .   14537   20337   0,176,80

It is basically a large tab delimited file containing genomic intervals and their corresponding chromosome. When running the above command I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "giggle/giggle.pyx", line 25, in giggle.giggle.Giggle.create
TypeError: expected bytes, str found

I have no clue why this is happening and I have tried converting the files to other types of encoding but nothing worked. The code snippet to which the error refers is as follows:

def create(self, char *path, char *glob):
    giggle_bulk_insert(to_bytes(glob), to_bytes(path), 1)
    return Giggle(path)

I am using Python 3.6 on the Windows Subsystem for Linux on Windows 10.

Recommended answer

The problem is that in Python 3 strings are Unicode strings, not byte strings as they were in Python 2. When you install giggle and run your code with Python 2, everything works fine. In Python 3 you can instead use either of the following calls:

index = Giggle.create("index".encode('utf-8'), "HMEC_hg19_BroadHMM_ALL.bed".encode('utf-8'))

index = Giggle.create(b"index", b"HMEC_hg19_BroadHMM_ALL.bed")

Both calls pass explicit byte strings. This worked for me, up to the point where giggle complained that the .bed file was incorrectly formatted (I probably messed up the format when copying).
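To avoid repeating `.encode('utf-8')` at every call site, the conversion can be wrapped in a small helper. This is a minimal sketch: `as_bytes` is a hypothetical name, not part of python-giggle, and UTF-8 is an assumed encoding for the file paths.

```python
def as_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a str.

    Hypothetical helper, not part of python-giggle. UTF-8 is an assumption.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got " + type(value).__name__)


# Convert both arguments before handing them to the wrapper.
path = as_bytes("index")
glob = as_bytes("HMEC_hg19_BroadHMM_ALL.bed")
# index = Giggle.create(path, glob)  # requires the giggle package
```

Values that are already `bytes` pass through unchanged, so the helper is safe to apply to arguments of either type.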

Update: Another issue comes up when calling it as described above:

File type not supported 'HMEC_hg19_BroadHMM_ALL.bed'

This is caused by the underlying giggle library only accepting .bed.gz files, as can be seen in python-giggle/lib/giggle/src/file_read.c:

if ( (strlen(i->file_name) > 7) &&
    strcmp(".bed.gz", file_name + strlen(i->file_name) - 7) == 0) {
    i->type = BED;
}

So I am assuming that the Readme at the python-giggle site is not correct in claiming that you can call it with .bed files.

I tested it with one of the files provided in python-giggle\lib\giggle\test\data and it ran without an error.
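If you only have a plain .bed file, one way to get past the file-name check is to write a compressed copy with the .bed.gz suffix. The sketch below uses only the Python standard library; `gzip_bed` and `looks_like_bed_gz` are hypothetical helper names. It is an assumption that plain gzip output is enough: the check quoted above only looks at the file name, and the library may additionally expect sorted or block-compressed (bgzip) input, which this does not produce.

```python
import gzip
import shutil


def gzip_bed(src_path, dst_path=None):
    """Write a gzip-compressed copy of src_path and return the new path.

    Hypothetical helper. Produces plain gzip, not bgzip (assumption that
    this is acceptable to the library).
    """
    if dst_path is None:
        dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def looks_like_bed_gz(file_name):
    """Mirror the C check: longer than 7 characters and ends in .bed.gz."""
    return len(file_name) > 7 and file_name.endswith(".bed.gz")
```

For example, `gzip_bed("HMEC_hg19_BroadHMM_ALL.bed")` would write `HMEC_hg19_BroadHMM_ALL.bed.gz` next to the original, and that name (as bytes) could then be passed to `Giggle.create`.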
