如何使用SQLAlchemy for Scrapy创建pg_trgm索引? [英] How to create a pg_trgm index using SQLAlchemy for Scrapy?

查看:85
本文介绍了如何使用SQLAlchemy for Scrapy创建pg_trgm索引?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Scrapy从网络论坛中抓取数据。我正在使用SQLAlchemy将这些数据存储在PostgreSQL数据库中。该表和列创建良好,但是,我无法让SQLAlchemy在其中一列上创建索引。我正在尝试使用gin创建一个三字母组合索引(pg_trgm)。

I am using Scrapy to scrape data from a web forum. I am storing this data in a PostgreSQL database using SQLAlchemy. The table and columns create fine, however, I am not able to have SQLAlchemy create an index on one of the columns. I am trying to create a trigram index (pg_trgm) using gin.

将创建该索引的Postgresql代码为:

The Postgresql code that would create this index is:

CREATE INDEX description_idx ON table USING gin (description gin_trgm_ops);

我添加到我的models.py文件中的SQLAlchemy代码是:

The SQLAlchemy code I have added to my models.py file is:

desc_idx = Index('description_idx', text("description gin_trgm_ops"), postgresql_using='gin')

我已将此行添加到我的models.py中,但是当我检入postgresql时,从未创建过索引。

I have added this line to my models.py but when I check in postgresql, the index was never created.

下面是我的完整model.py和pipelines.py文件。

Below are my full models.py and pipelines.py files. Am I going about this all wrong??

任何帮助将不胜感激!

from sqlalchemy import create_engine, Column, Integer, String, DateTime, Index, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine.url import URL
import settings

DeclarativeBase = declarative_base()
def db_connect():
return create_engine(URL(**settings.DATABASE))

def create_forum_table(engine):
    DeclarativeBase.metadata.create_all(engine)    


class forumDB(DeclarativeBase):
    __tablename__ = "table"

    id = Column(Integer, primary_key=True)
    title = Column('title', String)
    desc = Column('description', String, nullable=True)
    desc_idx = Index('description_idx', text("description gin_trgm_ops"), postgresql_using='gin')



pipelines.py



pipelines.py

from scrapy.exceptions import DropItem
from sqlalchemy.orm import sessionmaker
from models import forumDB, db_connect, create_forum_table


class ScrapeforumToDB(object):
def __init__(self):
    engine = db_connect()
    create_forum_table(engine)
    self.Session = sessionmaker(bind=engine)

def process_item(self, item, spider):
    session = self.Session()
    forumitem = forumDB(**item)

    try:
        session.add(forumitem)
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

    return item


推荐答案

由于索引定义使用 text 表达式,它没有引用 Table table,该表具有由声明类 forumDB 隐式创建。将其与使用作为表达式或它的某些派生进行比较,就像这样:

Since the Index definition uses text expression it has no references to the Table "table", which has been implicitly created by the declarative class forumDB. Compare that to using a Column as expression, or some derivative of it, like this:

Index('some_index_idx', forumDB.title)

在以上定义中,索引

在您的情况下,这意味着 Table 表不知道存在这样的索引。将其添加为声明性类的属性是错误的方法。它应该传递给隐式创建的 Table 实例。属性 __ table_args __ 就是这样:

What this means in your case is that the Table "table" has no idea that such an index exists. Adding it as an attribute of the declarative class is the wrong way to do it. It should be passed to the implicitly created Table instance. The attribute __table_args__ is just for that:

class forumDB(DeclarativeBase):

    __tablename__ = "table"
    # Note: This used to use `text('description gin_trgm_ops')` instead of the
    # `postgresql_ops` parameter, which should be used.
    __table_args__ = (
        Index('description_idx', "description",
              postgresql_ops={"description": "gin_trgm_ops"},
              postgresql_using='gin'),
    )

    id = Column(Integer, primary_key=True)
    title = Column('title', String)
    desc = Column('description', String, nullable=True)

修改后,调用 create_forum_table(engine )结果:

With the modification in place, a call to create_forum_table(engine) resulted in:

> \d "table"
                                 Table "public.table"
   Column    |       Type        |                     Modifiers                      
-------------+-------------------+----------------------------------------------------
 id          | integer           | not null default nextval('table_id_seq'::regclass)
 title       | character varying | 
 description | character varying | 
Indexes:
    "table_pkey" PRIMARY KEY, btree (id)
    "description_idx" gin (description gin_trgm_ops)

这篇关于如何使用SQLAlchemy for Scrapy创建pg_trgm索引?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆