Optimizing the initialization in Python

Question

I am looking to optimize the small piece of code below:

def update_users_genre_lang_score(cursor):
    cursor.execute("select user_id,playDuration,lang,genre from sd_archive_track_clicks where playDuration > 15 and user_id!=0 and genre!=0 and lang!=0 and lang <21 and genre <24 and playDate > '2016-10-01' order by playDate desc")
    db.commit()
    numrows = int(cursor.rowcount)
    tracks_played= cursor.fetchall()

    genre_list=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
    lang_list=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
    #initialization part
    user_genre_score = {}
    user_lang_score = {}

    for track in tracks_played:
        user_genre_score[track['user_id']]={}
        user_lang_score[track['user_id']]={}
        for genre in genre_list:
            user_genre_score[track['user_id']][genre]=0
        for lang in lang_list:
            user_lang_score[track['user_id']][lang]=0

    #initialization part end
    for track in tracks_played:
        user_genre_score[track['user_id']][track['genre']]=int(user_genre_score[track['user_id']][track['genre']]) + 1
        user_lang_score[track['user_id']][track['lang']]=int(user_lang_score[track['user_id']][track['lang']]) + 1

Is there any way I can optimize the initialization step?

Recommended answer

You may get some speedup by building the default dicts once and copying them into each user's record. Here is sample code with some comments:

def update_users_genre_lang_score(cursor):
    # you are asking for a lot of stuff but only using a little. Is this
    # stuff consumed in this function?
    cursor.execute("select user_id,playDuration,lang,genre from sd_archive_track_clicks where playDuration > 15 and user_id!=0 and genre!=0 and lang!=0 and lang <21 and genre <24 and playDate > '2016-10-01' order by playDate desc")
    # what is the commit for?
    # db.commit()
    numrows = int(cursor.rowcount)
    tracks_played= cursor.fetchall()
    #print tracks_played

    genre_list=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
    genre_default = {genre:0 for genre in genre_list}
    lang_list=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
    lang_default = {lang:0 for lang in lang_list}

    #initialization part

    user_genre_score = {}
    user_lang_score = {}
    for track in tracks_played:
        user_id = track['user_id']
        user_genre_score[user_id]=genre_default.copy()
        user_lang_score[user_id]=lang_default.copy()

    #initialization part end

    # this seems like an expensive way to initialize to 1 instead of 0...
    # am i missing something?!
    for track in tracks_played:
        user_genre_score[track['user_id']][track['genre']] += 1
        user_lang_score[track['user_id']][track['lang']] += 1
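
The copy-based idea can be tried without a database. The following is a minimal sketch, not the answer's exact code: the helper name `count_scores_with_copy` and the sample rows are hypothetical, and it assumes the cursor returns dict-like rows, as in the question. It also copies the template only the first time a user is seen and counts in the same pass.

```python
def count_scores_with_copy(tracks_played, genre_ids=range(1, 24), lang_ids=range(1, 21)):
    """Count plays per user and per genre/language, starting every counter at 0."""
    # Build each zero-filled template once; every user gets a shallow copy.
    genre_default = dict.fromkeys(genre_ids, 0)
    lang_default = dict.fromkeys(lang_ids, 0)

    user_genre_score = {}
    user_lang_score = {}
    for track in tracks_played:
        user_id = track['user_id']
        # Copy the templates only the first time this user appears.
        if user_id not in user_genre_score:
            user_genre_score[user_id] = genre_default.copy()
            user_lang_score[user_id] = lang_default.copy()
        user_genre_score[user_id][track['genre']] += 1
        user_lang_score[user_id][track['lang']] += 1
    return user_genre_score, user_lang_score


# Hypothetical rows shaped like the dict rows in the question.
sample_rows = [
    {'user_id': 7, 'genre': 3, 'lang': 1},
    {'user_id': 7, 'genre': 3, 'lang': 2},
    {'user_id': 9, 'genre': 5, 'lang': 1},
]
genre_scores, lang_scores = count_scores_with_copy(sample_rows)
print(genre_scores[7][3])  # 2
print(genre_scores[9][3])  # 0
```

Because the values are plain integers, a shallow `copy()` is enough here; nested mutable values would need `copy.deepcopy` instead.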

Update

You could instead initialize with collections.defaultdict so that entries are created on demand the first time you touch them. This saves you from re-initializing a user's entry every time that user_id appears in the rows.

import collections

def update_users_genre_lang_score(cursor):
    cursor.execute("select user_id,playDuration,lang,genre from sd_archive_track_clicks where playDuration > 15 and user_id!=0 and genre!=0 and lang!=0 and lang <21 and genre <24 and playDate > '2016-10-01' order by playDate desc")
    # what is the commit for?
    # db.commit()
    numrows = int(cursor.rowcount)
    tracks_played= cursor.fetchall()
    #print tracks_played

    #initialization part

    # this creates a two level nested dict ending in an integer count 
    # that generates items dynamically
    user_genre_score = collections.defaultdict(lambda: collections.defaultdict(int))
    user_lang_score = collections.defaultdict(lambda: collections.defaultdict(int))

    #initialization part end

    for track in tracks_played:
        user_genre_score[track['user_id']][track['genre']] += 1
        user_lang_score[track['user_id']][track['lang']] += 1
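
One difference from the copy-based version is that only the genres and languages a user actually played end up in the result. The sketch below runs the same counting on hypothetical in-memory rows and then converts the result to plain dicts; the function name `count_scores` and the sample rows are assumptions for illustration, and the conversion step is optional.

```python
import collections


def count_scores(tracks_played):
    """Count plays per user and per genre/language using nested defaultdicts."""
    user_genre_score = collections.defaultdict(lambda: collections.defaultdict(int))
    user_lang_score = collections.defaultdict(lambda: collections.defaultdict(int))
    for track in tracks_played:
        user_genre_score[track['user_id']][track['genre']] += 1
        user_lang_score[track['user_id']][track['lang']] += 1
    # Convert to plain dicts so that later lookups of unknown keys
    # raise KeyError instead of silently creating entries.
    plain_genre = {uid: dict(scores) for uid, scores in user_genre_score.items()}
    plain_lang = {uid: dict(scores) for uid, scores in user_lang_score.items()}
    return plain_genre, plain_lang


# Hypothetical rows shaped like the dict rows in the question.
sample_rows = [
    {'user_id': 7, 'genre': 3, 'lang': 1},
    {'user_id': 7, 'genre': 3, 'lang': 2},
    {'user_id': 9, 'genre': 5, 'lang': 1},
]
genre_scores, lang_scores = count_scores(sample_rows)
print(genre_scores)  # {7: {3: 2}, 9: {5: 1}}
print(lang_scores)   # {7: {1: 1, 2: 1}, 9: {1: 1}}
```

If downstream code expects a zero for every genre and language, read values with `scores.get(genre, 0)` or keep the copy-based version.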

How it works

Fair warning: defaultdict can be confusing at first. With a plain dict, accessing a non-existent key raises KeyError. A defaultdict instead calls the factory function you supplied, stores the result under that key, and returns it. Calling int() with no arguments returns 0.

>>> int() 
0

So if we make int the factory, you get 0 the first time you access a new key:

>>> d1 = collections.defaultdict(int)
>>> d1
defaultdict(<class 'int'>, {})
>>> d1['user1']
0
>>> d1
defaultdict(<class 'int'>, {'user1': 0})

If you increment a new key, Python first gets the item, which triggers the initialization, and then adds to it:

>>> d1['user2'] += 1
>>> d1
defaultdict(<class 'int'>, {'user1': 0, 'user2': 1})

But you need two levels of dicts, so have the outer defaultdict create an inner defaultdict(int) for each new key:

>>> d2 = collections.defaultdict(lambda:collections.defaultdict(int))
>>> d2['user1']
defaultdict(<class 'int'>, {})
>>> d2['user1']['genre1']
0
>>> d2
defaultdict(<function <lambda> at 0x7efedf493bf8>, {'user1': defaultdict(<class 'int'>, {'genre1': 0})})
>>> d2['user1']['genre2'] += 1
>>> d2
defaultdict(<function <lambda> at 0x7efedf493bf8>, {'user1': defaultdict(<class 'int'>, {'genre1': 0, 'genre2': 1})})
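
The session above also shows a side effect worth keeping in mind: merely reading a missing key with `[]` inserts it. A small sketch of how to look up values without creating entries, using the hypothetical keys 'user2' and 'user3':

```python
import collections

scores = collections.defaultdict(lambda: collections.defaultdict(int))
scores['user1']['genre2'] += 1

# Indexing a missing key calls the factory and stores the result.
_ = scores['user2']['genre1']
print('user2' in scores)    # True

# .get() and the "in" operator do not call the factory,
# so they leave the dict unchanged.
print(scores.get('user3'))  # None
print('user3' in scores)    # False
```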
