PostgreSQL Data: Array To String Clarification

Question

I am currently working on a task that migrates data from one PostgreSQL database to another. One field's data needs to be split into three columns (e.g. father_name needs to be split into f_name, f_middle_name, f_last_name). I searched the net and I think I can use string_to_array for this task. Now my problem is how to assign the elements of the resulting array to the fields of the destination DB (the destination DB has f_name, f_middle_name, f_last_name, while the source DB has only the father_name field).

    cur_t.execute("""
    SELECT TRANSLATE(studentnumber, '- ', ''), string_to_array(father_name, ' ')
    ...
    cur_p.execute(""" INSERT INTO "a_recipient" (student_id, f_name, f_middle_name, f_last_name) VALUES (%s, %s, %s, %s) """,
                  (row[0], row[1][0], row[1][1], row[1][2]))

I just don't know how to access the elements of the array by index and assign them as values to the destination fields.

Reference: string_to_array

Any suggestions?

Recommended answer

While it is possible to turn an array into a set of columns, the number of elements is not fixed, so you won't have a fixed set of columns. For example, splitting father_name into three pieces is fine for John Wilkes Booth, but what about Yarrow Hock? Or Beyoncé? Or Bernal Diaz Del Castillo? You need something more intelligent than just splitting on whitespace.
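As a quick illustration of the problem, the following sketch splits the example names from the paragraph above on whitespace and counts the pieces. It uses only Python's built-in `str.split`; the variable and function names are made up for this example.

```python
# Example names taken from the answer text.
names = ["John Wilkes Booth", "Yarrow Hock", "Beyoncé", "Bernal Diaz Del Castillo"]


def split_on_whitespace(name):
    """Naive split: one piece per whitespace-separated token."""
    return name.split()


# Map each name to the number of pieces a whitespace split produces.
part_counts = {name: len(split_on_whitespace(name)) for name in names}

for name, count in part_counts.items():
    print(f"{name!r}: {count} part(s)")
```

Only one of the four names yields exactly three pieces, so indexing positions 0, 1 and 2 of the array would either fail or put the wrong token in the wrong column for the others.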

While you could write something in PostgreSQL, probably as a stored procedure, it's easier, though slower, to do the data transforms in Python. Since you have to run the data through Python anyway (or do something complicated to link the two databases), and since this is (hopefully) a one-time thing, performance isn't critical.

I'm not very good at Python, but it would be something like this.

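# Read the raw values from the source database ("something" is a placeholder table name).
cur_t.execute("""SELECT studentnumber, father_name FROM something""")

for row in cur_t:
    # Clean and split the values in Python before inserting.
    father = parse_name(row['father_name'])
    student_id = fix_studentnumber(row['studentnumber'])

    # Pass the values as query parameters so the driver handles quoting and escaping.
    cur_p.execute("""
        INSERT INTO "a_recipient" (student_id, f_name, f_middle_name, f_last_name)
        VALUES (%s, %s, %s, %s)
        """, (student_id, father['first'], father['middle'], father['last'])
    )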

Then you'd write parse_name, fix_studentnumber and any other functions needed to clean up the data in Python. You can also unit test them.
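A minimal sketch of what those two helpers might look like is shown below. The splitting policy in parse_name (first token is the first name, last token is the last name, everything in between is the middle name) is only an assumption for illustration, not something the answer specifies, and it is still too naive for names such as Bernal Diaz Del Castillo. fix_studentnumber mirrors the TRANSLATE(studentnumber, '- ', '') call from the question by dropping hyphens and spaces.

```python
def fix_studentnumber(studentnumber):
    """Remove hyphens and spaces, like TRANSLATE(studentnumber, '- ', '')."""
    return studentnumber.replace("-", "").replace(" ", "")


def parse_name(full_name):
    """Split a full name into first/middle/last parts.

    Assumed policy (hypothetical): first token -> first, last token -> last,
    anything in between -> middle. Missing parts are returned as None.
    """
    parts = (full_name or "").split()
    if not parts:
        return {"first": None, "middle": None, "last": None}
    if len(parts) == 1:
        return {"first": parts[0], "middle": None, "last": None}
    middle = " ".join(parts[1:-1]) or None
    return {"first": parts[0], "middle": middle, "last": parts[-1]}


# Simple checks in the spirit of the unit tests the answer suggests.
assert parse_name("John Wilkes Booth") == {
    "first": "John", "middle": "Wilkes", "last": "Booth"}
assert parse_name("Yarrow Hock") == {
    "first": "Yarrow", "middle": None, "last": "Hock"}
assert fix_studentnumber("2010-123 45") == "201012345"
```

Returning None for a missing part is a design choice here: when values are passed to psycopg2 as query parameters, None is stored as NULL. If the destination columns are declared NOT NULL, an empty string would be the safer default.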

Note: because accessing columns by number (i.e. row[5]) is difficult to read and maintain, you'll probably want to use conn_t.cursor(cursor_factory=psycopg2.extras.DictCursor) so you can access columns by name, as I have above.
