如何使用Python有效地将此解析的XML文档存储在MySQL数据库中? [英] How to efficiently store this parsed XML document in MySQL Database using Python?
问题描述
以下是XML文件:book.xml
Following is the XML file : book.xml
<?xml version="1.0" ?>
<!--Sample XML Document-->
<bookstore>
<book _id="E7854">
<title>
Sample XML Book
</title>
<author>
<name _id="AU363">
<first>
Benjamin
</first>
<last>
Smith
</last>
</name>
<affiliation>
A
</affiliation>
</author>
<chapter number="1">
<title>
First Chapter
</title>
<para>
B
<count>
783
</count>
.
</para>
</chapter>
<chapter number="3">
<title>
Third Chapter
</title>
<para>
B
<count>
59
</count>
.
</para>
</chapter>
</book>
<book _id="C843">
<title>
XML Master
</title>
<author>
<name _id="AU245">
<first>
John
</first>
<last>
Doe
</last>
</name>
<affiliation>
C
</affiliation>
</author>
<chapter number="2">
<title>
Second Chapter
</title>
<para>
K
<count>
54
</count>
.
</para>
</chapter>
<chapter number="3">
<title>
Third Chapter
</title>
<para>
K
<count>
328
</count>
.
</para>
</chapter>
<chapter number="7">
<title>
Seventh Chapter
</title>
<para>
K
<count>
265
</count>
.
</para>
</chapter>
<chapter number="9">
<title>
Ninth Chapter
</title>
<para>
K
<count>
356
</count>
.
</para>
</chapter>
</book>
</bookstore>
以下是Python代码:book_dom.py
Following is the Python code : book_dom.py
from xml.dom import minidom, Node
import re, textwrap
class SampleScanner:
def __init__(self, doc):
for child in doc.childNodes:
if child.nodeType == Node.ELEMENT_NODE and child.tagName == 'bookstore':
self.handleBookStore(child)
def gettext(self, nodelist):
retlist = []
for node in nodelist:
if node.nodeType == Node.TEXT_NODE:
retlist.append(node.wholeText)
elif node.hasChildNodes:
retlist.append(self.gettext(node.childNodes))
return re.sub('\s+', ' ', ''.join(retlist))
def handleBookStore(self, node):
for child in node.childNodes:
if child.nodeType != Node.ELEMENT_NODE:
continue
if child.tagName == 'book':
self.handleBook(child)
def handleBook(self, node):
for child in node.childNodes:
if child.nodeType != Node.ELEMENT_NODE:
continue
if child.tagName == 'title':
print "Book title is:", self.gettext(child.childNodes)
if child.tagName == 'author':
self.handleAuthor(child)
if child.tagName == 'chapter':
self.handleChapter(child)
def handleAuthor(self, node):
for child in node.childNodes:
if child.nodeType != Node.ELEMENT_NODE:
continue
if child.tagName == 'name':
self.handleAuthorName(child)
elif child.tagName == 'affiliation':
print "Author affiliation:", self.gettext([child])
def handleAuthorName(self, node):
surname = self.gettext(node.getElementsByTagName("last"))
givenname = self.gettext(node.getElementsByTagName("first"))
print "Author Name: %s, %s" % (surname, givenname)
def handleChapter(self, node):
print " *** Start of Chapter %s: %s" % (node.getAttribute('number'),
self.gettext(node.getElementsByTagName('title')))
for child in node.childNodes:
if child.nodeType != Node.ELEMENT_NODE:
continue
if child.tagName == 'para':
self.handlePara(child)
def handlePara(self, node):
partext = self.gettext([node])
partext = textwrap.fill(partext)
print partext
print
doc = minidom.parse('book.xml')
SampleScanner(doc)
输出:〜/ $ python book_dom .py
Output :~/$ python book_dom.py
Book ID : E7854
Book title is: Sample XML Book
Name ID : AU363
Author Name: Smith , Benjamin
Author affiliation: A
*** Start of Chapter 1: First Chapter
B 783 .
*** Start of Chapter 3: Third Chapter
B 59 .
Book ID : C843
Book title is: XML Master
Name ID : AU245
Author Name: Doe , John
Author affiliation: C
*** Start of Chapter 2: Second Chapter
K 54 .
*** Start of Chapter 3: Third Chapter
K 328 .
*** Start of Chapter 7: Seventh Chapter
K 265 .
*** Start of Chapter 9: Ninth Chapter
K 356 .
我的目的是将图书存储在图书表和作者信息作者表中保存书 - >作者关系)[MySQL DB]。
My aim is to store the Books in "Books" table and Author info in "Authors" table (preserving the book -> author relationship) [MySQL DB].
**Book table :**
id |title
E7854 Sample XML Book
....
**Chapter table :**
book_id|chapter_number|title |para
E7854 1 First Chapter B 783 .
E7854 3 Third Chapter B 59 .
....
**Author table :**
id |book_id |name |Affiliation
AU363 E7854 Smith Benjamin A
....
如果我有几千本书籍和作者,如何将数据存储在数据库中和章节)?我无法唯一识别每本书/作者的数据集。我可以使用这些ID并将它们传递给函数来保存关系,但是我不知道这是否是最好的方式。任何指针都非常感激。
How do I go about storing the data in the database if I have few thousand books and authors (and chapters)? I am having trouble with uniquely identifying the dataset for each book/author. I can use the IDs and pass them to the functions to preserve the relation but I am not sure if that is the best way to do it. Any pointers are highly appreciated.
p.s:我正在处理脚本的SQL部分,一旦我测试它将更新。随意张贴您的想法,代码示例。谢谢!
p.s : I am working on the SQL part of the script and will update once I test it. Feel free to post your thoughts, code samples. Thanks!
推荐答案
根据上面的评论,我将简单地创建一本书类,作者类,作者列表和一章课。将本书的章节分配给本书上的章节对象列表。将AuthorList保持为其ID的字母,指向实际的作者对象。使用Book对象的数据成员包含ID;您可以提供一种方法,以便方便地将作者拉出AuthorList dict。
Based on your comment above, I would simply create a book class, an author class, an author list, and a chapter class. Assign the chapters of the book to a list of Chapter objects on the Book itself. Maintain the AuthorList as a dict of their IDs, pointing to the actual Author objects. Use a data member of the Book object to contain the ID; you can provide a method to pull the author out of the AuthorList dict for convenience.
这篇关于如何使用Python有效地将此解析的XML文档存储在MySQL数据库中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!