Flask-WhooshAlchemy with existing database

Question

How can I get Flask-WhooshAlchemy to create the .seg files for an already existing database filled with records? By calling:

with app.app_context():
    whooshalchemy.whoosh_index(app, MappedClass)

I can get the .toc file, but the .seg files are only created once I insert a record directly via the Flask-WhooshAlchemy interface. Thus all already existing records will never be included in a Whoosh search.
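A toy model may help explain the symptom. As far as we can tell, Flask-WhooshAlchemy adds documents to the index from a hook that fires when rows are committed through the SQLAlchemy session, so rows that were already in the table are never passed to it. The Python 3 sketch below is not the extension's code; `ToyTable`, `ToyIndex` and `on_commit` are hypothetical names used only to illustrate that behavior and why a one-off batch pass over the existing rows fixes it.

```python
from typing import Callable, Dict, List, Optional


class ToyIndex:
    """Hypothetical stand-in for a search index: maps row id -> indexed text."""

    def __init__(self) -> None:
        self.docs: Dict[int, str] = {}

    def on_commit(self, row_id: int, text: str) -> None:
        # Only called for rows that pass through the (toy) commit hook
        self.docs[row_id] = text

    def search(self, word: str) -> List[int]:
        # Ids of indexed rows whose text contains `word` as a whole token
        return sorted(i for i, t in self.docs.items() if word in t.split())


class ToyTable:
    """Hypothetical table that notifies an index hook on insert, if one is attached."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}
        self.index_hook: Optional[Callable[[int, str], None]] = None

    def insert(self, row_id: int, text: str) -> None:
        self.rows[row_id] = text
        if self.index_hook is not None:
            self.index_hook(row_id, text)


table = ToyTable()
table.insert(1, "flask whoosh tutorial")  # row that existed before indexing was set up

index = ToyIndex()
table.index_hook = index.on_commit  # like enabling the extension on an existing database
table.insert(2, "whoosh batch indexing")  # new row: indexed through the hook

print(index.search("whoosh"))  # [2]: row 1 is invisible to search

# Batch-index the rows already in the table, which is what the answer's script does
for row_id, text in table.rows.items():
    index.on_commit(row_id, text)

print(index.search("whoosh"))  # [1, 2]
```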

Solution

Here is a script that indexes an existing database. FWIW, Whoosh refers to that as "batch indexing".

This is a little rough, but it works:

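#!/usr/bin/env python2

import os
import sys

from app import app  # your Flask application instance; adjust to your project layout
from models import YourModel as Model
from flask_whooshalchemy import whoosh_index  # flask.ext.* imports were removed in Flask 1.0

# Unbuffered stdout so the progress output appears immediately (Python 2 only)
sys.stdout  = os.fdopen(sys.stdout.fileno(), 'w', 0)
atatime     = 512  # rows fetched per database round trip, and the progress interval

with app.app_context():
    index       = whoosh_index(app, Model)  # opens (or creates) this model's Whoosh index
    searchable  = Model.__searchable__
    print 'counting rows...'
    total       = int(Model.query.order_by(None).count())
    done        = 0
    print 'total rows: {}'.format(total)
    # Batch writer: limitmb applies per process; multisegment skips the final merge
    writer = index.writer(limitmb=10000, procs=16, multisegment=True)
    for p in Model.query.yield_per( atatime ):
        # Index each searchable column as unicode, as Flask-WhooshAlchemy does on commit
        record = dict([(s, unicode(getattr(p, s))) for s in searchable])
        record.update({'id' : unicode(p.id)}) # id is mandatory, or whoosh won't work
        writer.add_document(**record)
        done += 1
        if done % atatime == 0:
            print 'c {}/{} ({}%)'.format(done, total, round((float(done)/total)*100,2) ),

    # Guard against an empty table (total == 0) before computing the final percentage
    pct = round((float(done)/total)*100, 2) if total else 100.0
    print '{}/{} ({}%)'.format(done, total, pct)
    writer.commit()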
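The record-building step inside the loop can be pulled out into a small, testable function. This is a Python 3 sketch with hypothetical names (`build_record`, `FakeRow`). It assumes, as the script does, that the primary key is called `id`, and it converts every value to text, which is what Flask-WhooshAlchemy appears to do itself when it indexes committed rows (an assumption; check the version you use). Reading attributes with `getattr()` instead of the instance `__dict__` lets the ORM load values that are not in the instance dict yet, such as expired attributes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


def build_record(obj: object, searchable: Iterable[str], pk: str = "id") -> Dict[str, str]:
    """Build the keyword arguments for writer.add_document() for one row.

    Every searchable attribute and the primary key is converted to text
    (str() here stands in for Python 2's unicode()).
    """
    record = {name: str(getattr(obj, name)) for name in searchable}
    record[pk] = str(getattr(obj, pk))  # the primary key identifies the document
    return record


@dataclass
class FakeRow:
    """Hypothetical stand-in for a mapped model instance."""

    id: int
    title: str
    body: str
    __searchable__ = ["title", "body"]  # plain class attribute, not a dataclass field


row = FakeRow(id=7, title="Flask", body="Whoosh search")
print(build_record(row, FakeRow.__searchable__))
# {'title': 'Flask', 'body': 'Whoosh search', 'id': '7'}
```

In the real script, each returned dict would be passed as `writer.add_document(**record)`.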

You may want to play with the script's parameters:

  • atatime - the number of records to pull from the database at once
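  • limitmb - "max" megabytes of memory for the indexing pool; with procs, this limit applies to each process, so total usage can approach limitmb × procs
  • procs - number of processes (cores) to use in parallel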
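`atatime` plays two roles in the script: it is the batch size passed to SQLAlchemy's `yield_per()`, and it sets how often progress is printed. The stdlib-only sketch below mimics both over a plain iterator with hypothetical helpers (`batched`, `progress`); it is not SQLAlchemy's implementation, only an illustration of fetching rows in fixed-size chunks. `progress` also reports 100% for an empty table instead of dividing by zero.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` rows, similar to fetching with yield_per(size)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def progress(done: int, total: int) -> str:
    """Format 'done/total (pct%)' like the script, safe for an empty table."""
    pct = round(done / total * 100, 2) if total else 100.0
    return "{}/{} ({}%)".format(done, total, pct)


atatime = 512
rows = range(1300)  # stand-in for Model.query
done = 0
for chunk in batched(rows, atatime):
    done += len(chunk)  # the real script calls writer.add_document() per row here
    print(progress(done, len(rows)))
```

A larger `atatime` means fewer database round trips but more rows held in memory at once.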

I used this to index around 360,000 records on an 8-core AWS instance. It took about 4 minutes, most of which was waiting for the (single-threaded) commit().
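As a rough check on those figures (taking "about 4 minutes" as 240 s and counting the commit in the total), the average rate works out to:

$$\frac{360\,000\ \text{records}}{240\ \text{s}} = 1\,500\ \text{records/s}$$

Since most of that time was spent in `commit()`, the `add_document()` loop itself ran faster than this average.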
