How can I compute a count Morgan fingerprint as a numpy.array?

Question

I would like to use RDKit to generate count Morgan fingerprints and feed them to a scikit-learn model (in Python). However, I don't know how to generate the fingerprint as a NumPy array. When I use

from rdkit import Chem
from rdkit.Chem import AllChem
m = Chem.MolFromSmiles('c1cccnc1C')
fp = AllChem.GetMorganFingerprint(m, 2, useCounts=True)

I get a UIntSparseIntVect that I need to convert. The only thing I found was cDataStructs (see: http://rdkit.org/docs/source/rdkit.DataStructs.cDataStructs.html), but it does not currently support UIntSparseIntVect.

Recommended answer

This may be a little late, but these methods work for me.

If you want the bits (0 and 1):

import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('c1cccnc1C')
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
array = np.zeros((0, ), dtype=np.int8)  # resized in place by ConvertToNumpyArray
DataStructs.ConvertToNumpyArray(fp, array)

And back to a fingerprint:

bitstring = "".join(array.astype(str))
fp2 = DataStructs.cDataStructs.CreateFromBitString(bitstring)
assert list(fp.GetOnBits()) == list(fp2.GetOnBits())

If you want the counts:

fp3 = AllChem.GetHashedMorganFingerprint(mol, 2, nBits=1024)
array = np.zeros((0,), dtype=np.int8)
DataStructs.ConvertToNumpyArray(fp3, array)
print(array.nonzero())

Output:

(array([ 19,  33,  64, 131, 175, 179, 356, 378, 428, 448, 698, 707, 726,
   842, 849, 889]),)

And back to a fingerprint (not sure this is the best way to do it):

def numpy_2_fp(array):
    fp = DataStructs.cDataStructs.UIntSparseIntVect(len(array))
    for ix, value in enumerate(array):
        fp[ix] = int(value)
    return fp

fp4 = numpy_2_fp(array)
assert fp3.GetNonzeroElements() == fp4.GetNonzeroElements()
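
For the unhashed UIntSparseIntVect from the question, one possible route is to take the {feature_id: count} dictionary that GetNonzeroElements() returns and fold it into a fixed-length array yourself. The sketch below is only an illustration: the helper name sparse_counts_to_array, the modulo folding, and the example IDs are all assumptions, and the result is not guaranteed to match what GetHashedMorganFingerprint produces.

```python
import numpy as np

def sparse_counts_to_array(nonzero, n_bits=1024, dtype=np.int32):
    """Fold a {feature_id: count} dict into a dense count vector.

    Hypothetical helper: feature IDs are folded with a simple modulo,
    so IDs that collide have their counts added together.
    """
    arr = np.zeros(n_bits, dtype=dtype)
    for feature_id, count in nonzero.items():
        arr[feature_id % n_bits] += count
    return arr

# Made-up sparse counts standing in for the dict you would get from
# AllChem.GetMorganFingerprint(mol, 2, useCounts=True).GetNonzeroElements()
example = {5: 1, 1029: 3, 2050: 2}
vec = sparse_counts_to_array(example, n_bits=1024)
```

Using a wider dtype such as np.int32 here avoids overflow if a count ever exceeds what np.int8 can hold. Vectors built this way for several molecules could then be stacked with np.vstack into a feature matrix for a scikit-learn model.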
