How to insert relational data (many to many) in SQLAlchemy by means of API queries in Python

Question

EDIT: I have posted a shorter question (http://stackoverflow.com/questions/15902754/insert-relational-data-in-sql-alchemy-with-api-queries-in-python) because I think this one is too long, sorry.

First of all, I am a newcomer to databases, programming languages and so on, so sorry if this question is not proper or specific enough. Any help or guidance would be much appreciated.

The context is the following: I am querying an existing database through its API in order to retrieve the information I need to design my own database.

The point of this database is, for example, to let a user enter a gene and find out where in the organism it is over-expressed (UP) or under-expressed (DOWN), and in which experiments that type of expression has been seen.

For the time being, I am just querying the existing database and parsing the JSON result to obtain, for each organism part, all the genes that are over- or under-expressed (and, for each gene, the experiments where that type of expression has been reported).

(In the brain)

Gene 1

Experiment1     UP
Experiment2     UP
Experiment3     UP
Experiment4     DOWN

Gene 2

Experiment5     DOWN
Experiment2     DOWN
Experiment3     DOWN
Experiment8     UP
Experiment9     DOWN

The tables I think I will need are: "genes", "organs", "experiments" and "type of expression" (plus "genes2experiments2organs").

Take into account that one gene can be expressed in more than one organism_part and can have different types of expression across more than one experiment, and one experiment can cover more than one gene (many-to-many relationships).

What I would like to know first is how to add relational data, and whether my attempt is going in the right direction or whether I should change the schema/idea of the database.

My first attempt is this:

###########################################
# DATABASE DEFINITION
###########################################

from sqlalchemy import create_engine, Column, Integer, String, Date, ForeignKey, Table, Float
from sqlalchemy.orm import sessionmaker, relationship, backref
from sqlalchemy.ext.declarative import declarative_base
import requests

Base = declarative_base()

Genes2experiments2organs = Table('genes2experiments2organs', Base.metadata,
  Column('gene_id', String, ForeignKey('genes.id')),
  Column('experiment_id', String, ForeignKey('experiments.id')),
  Column('organ_id', String, ForeignKey('organs.id'))
)

class Genes(Base):
    __tablename__ = 'genes'
    id = Column(String(45), primary_key=True)
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<genes(id:'%s')>" % (self.id)

class Experiments(Base):
    __tablename__ = 'experiments'
    id = Column(String(45), primary_key=True)
    experiments = relationship("Experiments", secondary=Genes2experiments2organs, backref="genes")
    organs = relationship("Organs", secondary=Genes2experiments2organs, backref="genes")
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<experiments(id:'%s')>" % (self.id)

class Organs(Base):
    __tablename__ = 'organs'
    id = Column(String(45), primary_key=True)
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<organs(id:'%s')>" % (self.id)

class Expression_type(Base):
    __tablename__ = 'expression_type'
    id = Column(String(45), primary_key=True)
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<expression_type(id:'%s')>" % (self.id)

#####################################################
# INSERTING DATA
#####################################################

def setUp():
    global Session
    engine=create_engine('mysql://root:password@localhost/db_name?charset=utf8', pool_recycle=3600,echo=False)
    Session=sessionmaker(bind=engine)

def add_data():   ## I am just adding genes without taking into account the other related data to these genes.....
    session=Session()
    for i in range(0,1000,200):
        request= requests.get('http://www.ebi.ac.uk/gxa/api/v1',params={"updownInOrganism_part":"brain","rows":200,"start":i})
        result = request.json
        for item in result['results']:
            gene_to_add = item['gene']['ensemblGeneId']           
    session.commit()
    session.close()       


setUp()
add_data()
session=Session()
genes=session.query(Genes).all()
print "List of genes introduced:"
for gene in genes:
    print gene.id
session.close()

So, with this code I just populate the "genes" table, without taking into account the relationships with the other data I will have to include in the database. What is the procedure for adding relational data? And is there a way to avoid inserting duplicate genes, for instance, when populating the table by means of API queries?

By the way, as you can see, I didn't put in all the many-to-many relationships (secondary), just those for the "genes" table, because I am not sure whether I am going in the right direction or completely wrong. Thank you.

Recommended answer

This should do what you are looking for ...

from sqlalchemy import (Column, create_engine, Integer, ForeignKey, Unicode,
                        Enum)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship

Base = declarative_base()

class Gene(Base):
    __tablename__ = 'gene'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

    def __init__(self, name):
        self.name = name

class Experiment(Base):
    __tablename__ = 'experiment'

    id = Column(Integer, primary_key=True)

class Organ(Base):
    __tablename__ = 'organ'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

    def __init__(self, name):
        self.name = name

class Measurement(Base):
    __tablename__ = 'measurement'

    id = Column(Integer, primary_key=True)
    experiment_id = Column(Integer, ForeignKey(Experiment.id))
    gene_id = Column(Integer, ForeignKey(Gene.id))
    organ_id = Column(Integer, ForeignKey(Organ.id))

    # Add your measured values here
    expression = Column(Enum('UP', 'DOWN'))
    # ...

    experiment = relationship(Experiment, backref='measurements')
    gene = relationship(Gene, backref='measurements')
    organ = relationship(Organ, backref='measurements')

    def __repr__(self):
        return 'Experiment %d: %s, %s, %s' % (self.experiment.id,
                         self.gene.name, self.organ.name, self.expression)

if __name__ == '__main__':
    engine = create_engine('sqlite://')
    session = sessionmaker(engine)()
    Base.metadata.create_all(engine)

    #
    # Creating the data
    #

    x = Gene('Gene X')
    y = Gene('Gene Y')
    z = Gene('Gene Z')

    heart = Organ('Heart')
    lungs = Organ('Lungs')
    brain = Organ('Brain')

    session.add_all([x, y, z, heart, lungs, brain])
    session.commit()

    experiment_1 = Experiment()
    experiment_1.measurements.extend(
            [Measurement(gene_id=x.id, organ_id=heart.id, expression='UP'),
             Measurement(gene_id=x.id, organ_id=lungs.id, expression='UP'),
             Measurement(gene_id=x.id, organ_id=brain.id, expression='DOWN'),
             Measurement(gene_id=y.id, organ_id=brain.id, expression='UP'),
             Measurement(gene_id=z.id, organ_id=brain.id, expression='DOWN')])

    experiment_2 = Experiment()
    experiment_2.measurements.extend(
            [Measurement(gene_id=y.id, organ_id=lungs.id, expression='UP'),
             Measurement(gene_id=y.id, organ_id=lungs.id, expression='UP'),
             Measurement(gene_id=y.id, organ_id=brain.id, expression='UP'),
             Measurement(gene_id=x.id, organ_id=brain.id, expression='UP'),
             Measurement(gene_id=z.id, organ_id=heart.id, expression='UP')])

    session.add_all([experiment_1, experiment_2])
    session.commit()

    #
    # Querying the data
    #

    print('All measurements in the first experiment')
    experiment = session.query(Experiment).filter(Experiment.id == 1).one()
    for measurement in experiment.measurements:
        print(measurement)
    print('')

    print('All measurements of Gene X')
    gene_x = session.query(Gene).filter(Gene.name == 'Gene X').one()
    for measurement in gene_x.measurements:
        print(measurement)
    print('')

    print('All measurements of the brain')
    the_brain = session.query(Organ).filter(Organ.name == 'Brain').one()
    for measurement in the_brain.measurements:
        print(measurement)
    print('')
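
The question also asks how to avoid inserting duplicate genes when the rows come from repeated API queries. One common approach is a "get or create" helper: look the row up by its unique column and only insert it when it is missing. The following is a minimal sketch under stated assumptions, not part of the original answer. It reuses the Gene/Organ/Experiment/Measurement layout above, but the `accession` column on Experiment, the helper names `get_or_create`, `load_records` and `expression_of`, and the sample tuples in `RECORDS` are all hypothetical; the real API response would have to be parsed into tuples of this shape first. It imports `declarative_base` from `sqlalchemy.orm`, which assumes SQLAlchemy 1.4 or later.

```python
from sqlalchemy import Column, Enum, ForeignKey, Integer, Unicode, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Gene(Base):
    __tablename__ = 'gene'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)


class Organ(Base):
    __tablename__ = 'organ'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)


class Experiment(Base):
    __tablename__ = 'experiment'
    id = Column(Integer, primary_key=True)
    # Assumption: experiments carry an external identifier string.
    accession = Column(Unicode(64), unique=True)


class Measurement(Base):
    __tablename__ = 'measurement'
    id = Column(Integer, primary_key=True)
    experiment_id = Column(Integer, ForeignKey('experiment.id'))
    gene_id = Column(Integer, ForeignKey('gene.id'))
    organ_id = Column(Integer, ForeignKey('organ.id'))
    expression = Column(Enum('UP', 'DOWN'))
    experiment = relationship(Experiment, backref='measurements')
    gene = relationship(Gene, backref='measurements')
    organ = relationship(Organ, backref='measurements')


def get_or_create(session, model, **kwargs):
    """Return the row matching kwargs, inserting it first if it is missing."""
    instance = session.query(model).filter_by(**kwargs).first()
    if instance is None:
        instance = model(**kwargs)
        session.add(instance)
        session.flush()  # send the INSERT now so later lookups find the row
    return instance


def load_records(session, records):
    """Insert one Measurement per (gene, organ, experiment, expression) tuple."""
    for gene_name, organ_name, accession, expression in records:
        session.add(Measurement(
            gene=get_or_create(session, Gene, name=gene_name),
            organ=get_or_create(session, Organ, name=organ_name),
            experiment=get_or_create(session, Experiment, accession=accession),
            expression=expression))
    session.commit()


def expression_of(session, gene_name):
    """List (organ, experiment, expression) for every measurement of a gene."""
    rows = (session.query(Organ.name, Experiment.accession,
                          Measurement.expression)
            .select_from(Measurement)
            .join(Measurement.gene)
            .join(Measurement.organ)
            .join(Measurement.experiment)
            .filter(Gene.name == gene_name)
            .order_by(Experiment.accession)
            .all())
    return [tuple(row) for row in rows]


# Made-up sample data shaped like the listing in the question.
RECORDS = [
    ('Gene1', 'brain', 'Experiment1', 'UP'),
    ('Gene1', 'brain', 'Experiment2', 'UP'),
    ('Gene1', 'brain', 'Experiment4', 'DOWN'),
    ('Gene2', 'brain', 'Experiment2', 'DOWN'),
    ('Gene2', 'brain', 'Experiment8', 'UP'),
]

if __name__ == '__main__':
    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    load_records(session, RECORDS)
    for row in expression_of(session, 'Gene1'):
        print(row)
```

Note that this sketch only deduplicates genes, organs and experiments; loading the same tuple twice still adds a second Measurement row. The lookup-then-insert step is also not safe against concurrent writers, so the `unique=True` constraints remain the real safeguard.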
