Segmentation fault 11, python hash with lists, hashing 1 million objects

Problem description

When I try to create and hash objects from a file containing one million songs, I get a weird segmentation fault after about 12000 successful hashes.

Anyone have any idea why this:

Segmentation fault: 11

happens when I run the program?

I have these classes for hashing the objects:

class Node():
    def __init__(self, key, value = None):
        self.key = key
        self.value = value

    def __str__(self):
        return str(self.key) + " : " + str(self.value)

class Hashtable():
    def __init__(self, hashsize, hashlist = [None]):
        self.hashsize = hashsize*2
        self.hashlist = hashlist*(self.hashsize)

    def __str__(self):
        return self.hashlist

    def hash_num(self, name):
        result = 0
        name_list = list(name)
        for letter in name_list:
            result = (result*self.hashsize + ord(letter))%self.hashsize
        return result

    def check(self, num):
        if self.hashlist[num] != None:
            num = (num + 11**2)%self.hashsize  # check here, a lot!
            chk_num = self.check(num)  # this too
            return chk_num  # learn this
        else:
            return num

    def check_atom(self, num, name):
        if self.hashlist[num].key == name:
            return num
        else:
            num = (num + 11**2)%self.hashsize
            chk_num = self.check_atom(num, name)  # read here
            return chk_num  # read this

    def put(self, name, new_atom):
        node = Node(name)
        node.value = new_atom
        num = self.hash_num(name)
        chk_num = self.check(num)
        print(chk_num)
        self.hashlist[chk_num] = node

    def get(self, name):
        num = self.hash_num(name)
        chk_num = self.check_atom(num, name)
        atom = self.hashlist[chk_num]
        return atom.value

And I call the functions in this code:

from time import *
from hashlist import *
import sys

sys.setrecursionlimit(1000000000)

def lasfil(filnamn, h):
    with open(filnamn, "r", encoding="utf-8") as fil:
        for rad in fil:
            data = rad.split("<SEP>")
            artist = data[2].strip()
            song = data[3].strip()
            h.put(artist, song)

def hitta(artist, h):
    try:
        start = time()
        print(h.get(artist))
        stop = time()
        tidhash = stop - start
        return tidhash
    except AttributeError:
        pass

h = Hashtable(1000000)
lasfil("write.txt", h)

Solution

The reason you're getting a segmentation fault is this line:

sys.setrecursionlimit(1000000000)

I assume you added it because you received a RuntimeError: maximum recursion depth exceeded. Raising the recursion limit doesn't allocate any more memory for the call stack, it just defers the aforementioned exception. If you set it too high, the interpreter runs out of stack space and accesses memory that doesn't belong to it, causing random errors (likely segfaults, but in theory anything is possible).
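
A quick way to see this (run it standalone, since it takes down the interpreter process): with the limit raised, CPython recurses past the end of its C stack instead of raising RecursionError.

import sys

sys.setrecursionlimit(1000000000)

def recurse(n):
    # with the huge limit, Python never raises RecursionError;
    # the C stack overflows first and the process typically segfaults
    return recurse(n + 1)

recurse(0)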

The real solution is to not use unbounded recursion. For things like balanced search trees, where the recursion depth is limited to a few dozen levels, it's okay, but you can't replace long loops with recursion.
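
In this code, that means turning the recursive probing in check and check_atom into loops. A minimal sketch, keeping the question's probing scheme and attributes unchanged (including the original behavior of probing forever when the table is full or a key is absent):

    def check(self, num):
        # keep probing until a free slot turns up
        # (like the recursive version, this never terminates on a full table)
        while self.hashlist[num] is not None:
            num = (num + 11**2) % self.hashsize
        return num

    def check_atom(self, num, name):
        # keep probing until the key matches; hitting an empty (None) slot
        # raises AttributeError, exactly as the recursive version does
        while self.hashlist[num].key != name:
            num = (num + 11**2) % self.hashsize
        return num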

Also, unless this is an exercise in creating hash tables, you should just use the built-in dict. If it is an exercise in creating hash tables, consider this a hint that something about your hash table sucks: it indicates a probe length of at least 1000, more likely several thousand. It should be a few dozen at most, ideally in the single digits.
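
For what it's worth, the huge probe lengths have a visible cause in this particular table: in hash_num, result * self.hashsize is a multiple of self.hashsize, so the modulo cancels that term and the whole function degenerates to ord(name[-1]) % self.hashsize, i.e. every name hashes to the code point of its last character. A quick check with the question's parameters:

hashsize = 1000000 * 2  # what Hashtable(1000000) stores in self.hashsize

def hash_num(name):
    # verbatim copy of the question's hash function
    result = 0
    for letter in name:
        result = (result * hashsize + ord(letter)) % hashsize
    return result

# all three collapse to the last character's code point
print(hash_num("Metallica"))  # 97  == ord("a")
print(hash_num("Abba"))       # 97  == ord("a")
print(hash_num("Queen"))      # 110 == ord("n")

With a million entries crammed into the few dozen buckets that typical last letters produce, probe chains in the thousands (and the blown recursion limit) follow directly. Multiplying by a constant unrelated to the modulus (31 is a classic choice) avoids the cancellation.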
