Automatically Load SQL table by reading data from text file


Question

I am trying to write a Python script that uses SQL to load the tables I created in Python and populates them automatically with data from a text file. I am stuck on the basic coding. I have a general idea, but I get errors when I try to run this approach. I have created two tables and read the file, which is a comma-separated text file with no headers.

The first three lines of the file look like this:

+ ---- + ----- + -------------------- + -------- + - + --- + ----- +
| John | Smith | 111 N. Wabash Avenue | plumber  | 5 | 1.0 | 200   |
| John | Smith | 111 N. Wabash Avenue | bouncer  | 5 | 1.0 | 200   |
| Jane | Doe   | 243 S. Wabash Avenue | waitress | 1 | 5.0 | 10000 |
+ ---- + ----- + -------------------- + -------- + - + --- + ----- +

import sqlite3
conn= sqlite3.connect('csc455.db')
c = conn.cursor()

#Reading the data file
fd = open ('C:/Users/nasia/Documents/data_hw2.txt','r')
data = fd.readlines()

#Creating Tables
L = """create table L
(first text, last text, address text, job text, LNum integer,
constraint L_pk
primary key(first, last, address, job),
constraint L_fk
foreign key (LNum) references LN(LNum)
);"""
c.execute(L)

LN = """create table LN
(
LNum integer, Interest float, Amount, Integer,
constraint LN_pk
primary key (LNum)
);"""
c.execute(LN)

#Inserting into database
for elt in data:
    currentRow = elt.split(", ")[:-1]
    insert = """(insert into LN values (%s, %s, %s);, %(currentRow[4], currentRow[5], currentRow[6]))"""
    c.execute(insert)

There is a syntax error somewhere here, and the code stops working. I cannot figure out what I am doing wrong. The error is:

Traceback (most recent call last):
  File "", line 4, in
OperationalError: near "(": syntax error

Recommended answer

You haven't explained what format the data are in, or what your table structure is, or how you want to map them, which makes this difficult to answer. But I'll make up my own, and answer that, and hopefully it will help:

infile.txt:

CommonName,Species,Location,Color
Black-headed spider monkey,Ateles fusciceps,Ecuador,black
Central American squirrel monkey,Saimiri oerstedii,Costa Rica,orange
Vervet,Chlorocebus pygerythrus,South Africa,white

script.py:

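import csv
import sqlite3

db = sqlite3.connect('outfile.db')
cursor = db.cursor()
# The SQL column is deliberately named differently from the CSV column;
# a column name containing a space has to be quoted.
cursor.execute('CREATE TABLE Monkeys ("Common Name", Color, Species)')
cursor.execute('''CREATE TABLE MonkeyLocations (Species, Location,
                  FOREIGN KEY(Species) REFERENCES Monkeys(Species))''')
# newline='' is what the csv module recommends for files it reads
with open('infile.txt', newline='') as f:
    # DictReader takes its keys from the header line of the file
    for row in csv.DictReader(f):
        cursor.execute('''INSERT INTO Monkeys
                          VALUES (:CommonName, :Color, :Species)''', row)
        cursor.execute('''INSERT INTO MonkeyLocations
                          VALUES (:Species, :Location)''', row)
db.commit()
db.close()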

Of course, if your real data are in some format other than CSV, you'll use different code to parse the input file.

I've also made things slightly more complex than your real data might have to deal with: the CSV columns don't have quite the same names as the SQL columns.

In other ways, your data might be more complex. For example, if your schema has foreign keys that reference an auto-incremented row ID instead of a text field, you'll need to get the rowid after the first insert.
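As a minimal sketch of that last case: the schema below is hypothetical (an integer id on Monkeys, and a MonkeyId column in MonkeyLocations that refers to it) and is made up purely for illustration. After each insert, sqlite3's cursor.lastrowid holds the row ID that was just assigned, which can then be used as the foreign key value:

```python
import sqlite3

# Hypothetical schema for illustration: MonkeyLocations points at the
# auto-assigned integer key of Monkeys instead of at a text column.
db = sqlite3.connect(':memory:')
cursor = db.cursor()
cursor.execute('''CREATE TABLE Monkeys
                  (id INTEGER PRIMARY KEY, CommonName TEXT, Species TEXT)''')
cursor.execute('''CREATE TABLE MonkeyLocations (MonkeyId INTEGER, Location TEXT,
                  FOREIGN KEY(MonkeyId) REFERENCES Monkeys(id))''')

rows = [
    {'CommonName': 'Vervet', 'Species': 'Chlorocebus pygerythrus',
     'Location': 'South Africa'},
    {'CommonName': 'Central American squirrel monkey',
     'Species': 'Saimiri oerstedii', 'Location': 'Costa Rica'},
]
for row in rows:
    cursor.execute('''INSERT INTO Monkeys (CommonName, Species)
                      VALUES (:CommonName, :Species)''', row)
    monkey_id = cursor.lastrowid  # row ID assigned by the insert above
    cursor.execute('INSERT INTO MonkeyLocations VALUES (?, ?)',
                   (monkey_id, row['Location']))
db.commit()

# Join the two tables back together to check the link.
locations = cursor.execute('''SELECT m.CommonName, l.Location
                              FROM Monkeys m
                              JOIN MonkeyLocations l ON l.MonkeyId = m.id
                              ORDER BY m.id''').fetchall()
db.close()
```

Reading lastrowid immediately after the insert matters, because the next execute on the same cursor may change it.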

But this should be enough to give you the idea.

Now that you've shown more details… you were on the right track (although it's wasteful to call readlines instead of just iterating over fd directly, and you should close your db and file, ideally with a with statement, …), but you've got a simple mistake right near the end that prevents you from getting any further:

insert = """(insert into LN values (%s, %s, %s);, %(currentRow[4], currentRow[5], currentRow[6]))"""
c.execute(insert)

You've put the % formatting expression directly into the string, instead of applying the operator to the string. I think what you were trying to do is:

insert = """insert into LN values (%s, %s, %s);""" % (currentRow[4], currentRow[5], currentRow[6])
c.execute(insert)

However, you shouldn't do that. Instead, do this:

insert = """insert into LN values (?, ?, ?);"""
c.execute(insert, (currentRow[4], currentRow[5], currentRow[6]))

What's the difference?

Well, the first one just inserts the values into the statement as Python strings. That means you have to take care of converting to the proper format, quoting, escaping, etc. yourself, instead of letting the database engine decide how to deal with each value. Besides being a source of frustrating bugs when you try to save a boolean value or forget to quote a string, this also leaves you open to SQL injection attacks unless you're very careful.
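To make the quoting problem concrete, here is a small sketch using a made-up table and a made-up name containing an apostrophe (neither comes from the question). The string-formatted statement fails to parse, while the parameterized one stores the value unchanged:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('create table people (first text, last text)')  # hypothetical table

row = ('Miles', "O'Brien")  # made-up value containing a single quote

# String formatting: the apostrophe ends the SQL string literal early,
# so the statement no longer parses.
try:
    c.execute("insert into people values ('%s', '%s');" % row)
    formatted_ok = True
except sqlite3.OperationalError as e:
    formatted_ok = False
    print('formatted insert failed:', e)

# Placeholders: the values travel separately from the SQL text,
# so no quoting or escaping is needed.
c.execute('insert into people values (?, ?);', row)
conn.commit()
stored = c.execute('select first, last from people').fetchall()
conn.close()
```

The same mechanism that breaks on an innocent apostrophe is what an attacker exploits in an injection attack: whatever is in the value becomes part of the SQL text.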

There are other problems besides that one. For example, most databases will try to cache repeated statements, and it's trivial to tell that 3000 instances of insert into LN values (?, ?, ?) are all the same statement, but less so to tell that insert into LN values (5, 1.0, 200) and insert into LN values (1, 5.0, 5000) are the same statement.
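Putting the pieces together, one possible version of the loader hands a single parameterized statement per table to executemany, so every row reuses the same SQL text. This is only a sketch, and it rests on assumptions the question does not confirm: that the file is plain comma-separated text with seven fields per line, that "Amount, Integer" in the LN definition was meant to be "Amount integer", and that a loan number appearing on several lines (as in the first two sample lines) should be stored once. The sample rows are written to a temporary file only to keep the example self-contained.

```python
import csv
import os
import sqlite3
import tempfile

# Sample rows from the question, assumed to be stored comma-separated.
sample = (
    "John, Smith, 111 N. Wabash Avenue, plumber, 5, 1.0, 200\n"
    "John, Smith, 111 N. Wabash Avenue, bouncer, 5, 1.0, 200\n"
    "Jane, Doe, 243 S. Wabash Avenue, waitress, 1, 5.0, 10000\n"
)
path = os.path.join(tempfile.mkdtemp(), 'data_hw2.txt')
with open(path, 'w', newline='') as f:
    f.write(sample)

conn = sqlite3.connect(':memory:')  # use 'csc455.db' for a file on disk
c = conn.cursor()
c.execute("""create table LN
(LNum integer, Interest float, Amount integer,
 constraint LN_pk primary key (LNum));""")
c.execute("""create table L
(first text, last text, address text, job text, LNum integer,
 constraint L_pk primary key (first, last, address, job),
 constraint L_fk foreign key (LNum) references LN(LNum));""")

# Iterate over the file directly instead of calling readlines();
# skipinitialspace drops the blank that follows each comma.
with open(path, newline='') as fd:
    rows = [row for row in csv.reader(fd, skipinitialspace=True) if row]

# "insert or ignore" skips a loan number that is already present (assumption).
c.executemany("insert or ignore into LN values (?, ?, ?);",
              [row[4:7] for row in rows])
c.executemany("insert into L values (?, ?, ?, ?, ?);",
              [row[0:5] for row in rows])
conn.commit()

loans = c.execute("select * from LN order by LNum").fetchall()
people = c.execute("select count(*) from L").fetchone()[0]
conn.close()
```

Note that the original loop sliced each row with [:-1], which drops the last field, so currentRow[6] would not exist for a seven-field line; the sketch keeps all seven fields.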
