Column names with line breaks


Question

I know that for text strings in PostgreSQL, line breaks can be written uniformly as \n by prefixing the string with E or e:

SELECT E'first\nsecond'

Result:

first
second
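To make the difference between the two kinds of string literal visible, here is a minimal sketch that compares their lengths. It assumes standard_conforming_strings is on, which is the default in current PostgreSQL releases; with that setting off, a plain string would also interpret the backslash.

```sql
-- Assumes standard_conforming_strings = on.
SELECT length('first\nsecond')  AS plain_len,   -- backslash and n stay two ordinary characters
       length(E'first\nsecond') AS escape_len;  -- \n becomes one newline character
```

Under that assumption the plain literal is one character longer than the escape-string literal (13 versus 12).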

But PostgreSQL also supports line breaks within column names. I'm not sure why, or how evil this practice is, but one can do the following:

CREATE TABLE One("first\nsecond" text);
CREATE TABLE Two("first
second" text);

If you are unfortunate enough to run into one of these, you will find that these queries work:

SELECT "first\nsecond" from One;
SELECT "first
second" from Two;

while these do not:

SELECT "first
second" from One;
SELECT "first\nsecond" from Two;

My question is: is there a way in PostgreSQL to unify such differences, similar to what is possible for column values?

I have tried putting E in front of the "first\nsecond" column name, but that is not supported. Using \r\n instead (I'm on Windows) gave me a third kind of column name, one that can only be queried as:

SELECT "first\r\nsecond" FROM Third


Recommended answer

Column names are identifiers, and the gory details of the identifier syntax are described at:

http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

TL;DR: use the U&"..." syntax to inject non-printable characters into identifiers through their Unicode code points; there is no way to unify CR+LF with LF alone.

Unicode escape sequences are allowed in identifiers, so per the documentation, the following does work:

select U&"first\000asecond" from Two;

provided it's just a newline character between the two words.
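As a small sketch of the escape forms the documentation describes for quoted identifiers, the statements below create a column through its code point and then reference it in several equivalent ways. The table name two_demo is hypothetical and exists only for this illustration.

```sql
-- two_demo is a hypothetical table used only for this illustration.
CREATE TABLE two_demo (U&"first\000asecond" text);

-- The same column, referenced with a literal line break inside the quotes:
SELECT "first
second" FROM two_demo;

-- Six-digit form of the escape, introduced by a plus sign:
SELECT U&"first\+00000asecond" FROM two_demo;

-- A different escape character can be chosen with UESCAPE:
SELECT U&"first!000asecond" UESCAPE '!' FROM two_demo;
```

All three SELECT statements should resolve to the same column, since each spelling produces the identifier first, a single line feed (U+000A), then second.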

Consider how table One is created:

CREATE TABLE One("first\nsecond" text);

As the backslash character has no special meaning here, this column name does not contain any newline. It contains first, followed by \, followed by n, followed by second. So:

 SELECT "first\nsecond" from One;

works because it is identical to what is in the CREATE TABLE statement, whereas

SELECT "first
second" from One;

fails because there is a newline in that SELECT, whereas the actual column name in the table has a backslash followed by an n.
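One way to confirm this for table One is to look at the stored name's length, and to note how the same name would be spelled in U&"..." syntax. This is a sketch that assumes the table lives in the public schema; the unquoted table name One is folded to lower case, so the catalog stores it as one.

```sql
-- Assumption: the table was created in the public schema.
SELECT column_name,
       length(column_name::text) AS name_length  -- 13 if \ and n are two separate characters
  FROM information_schema.columns
 WHERE table_schema = 'public'
   AND table_name = 'one';                        -- unquoted One is stored as one

-- Inside U&"..." the backslash is the escape character,
-- so a literal backslash has to be written twice:
SELECT U&"first\\nsecond" FROM One;
```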

For table Two, it is the opposite of One:

CREATE TABLE Two("first
second" text);

The newline is taken verbatim and is part of the column name. So

SELECT "first
second" from Two;

works because the newline is there exactly as in the CREATE TABLE statement, with an embedded newline, whereas

SELECT "first\nsecond" from Two;

fails because, as before, \n in this context does not mean a newline.

As mentioned in the comments and in your edit, this could be a carriage return followed by a newline instead, in which case the following should do:

select U&"first\000d\000asecond" from Two;

although in my tests, hitting Enter in the middle of a column name in psql has the same effect on Unix and Windows: a single newline in the column's name.
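To illustrate why CR+LF and LF alone cannot be unified, the following sketch creates both variants side by side in one table; PostgreSQL accepts them as two distinct column names. The table name crlf_demo is hypothetical, and the catalog lookup through pg_attribute is just one possible way to list the names.

```sql
-- crlf_demo is a hypothetical table used only for this illustration.
CREATE TABLE crlf_demo (
    U&"first\000d\000asecond" text,  -- carriage return + line feed
    U&"first\000asecond"      text   -- line feed only
);

-- List both names with their length and their bytes in hexadecimal.
SELECT length(attname::text)                            AS name_length,
       encode(convert_to(attname::text, 'UTF8'), 'hex') AS name_hex
  FROM pg_attribute
 WHERE attrelid = 'crlf_demo'::regclass
   AND attnum > 0        -- skip system columns
 ORDER BY attnum;
```

The first name should come out one character longer, with 0d0a in the middle of its hex dump where the second has only 0a.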

To check exactly which characters ended up in a column name, we can inspect them in hexadecimal.

When applied to your CREATE TABLE example, from inside psql under Unix:

CREATE TABLE Two("first
second" text);

select convert_to(column_name::text,'UTF-8')
 from information_schema.columns 
 where table_schema='public'
   and table_name='two';

The result is:

        convert_to         
----------------------------
 \x66697273740a7365636f6e64

For more complex cases (e.g. non-ASCII characters spanning several bytes in UTF-8), a more advanced query might help by showing easy-to-read code points:

select c,lpad(to_hex(ascii(c)),4,'0') from (
  select regexp_split_to_table(column_name::text,'')  as c
    from  information_schema.columns
    where table_schema='public'
    and table_name='two'
  ) as g;

 c | lpad 
---+------
 f | 0066
 i | 0069
 r | 0072
 s | 0073
 t | 0074
  +| 000a
   | 
 s | 0073
 e | 0065
 c | 0063
 o | 006f
 n | 006e
 d | 0064
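Building on the same inspection idea, a query along these lines could locate every column whose name contains a carriage return or a line feed. It is only a sketch: it scans all schemas visible through information_schema, and the hex column is included so the exact bytes can be read off directly.

```sql
SELECT table_schema,
       table_name,
       encode(convert_to(column_name::text, 'UTF8'), 'hex') AS name_hex
  FROM information_schema.columns
 WHERE strpos(column_name::text, chr(10)) > 0   -- contains a line feed
    OR strpos(column_name::text, chr(13)) > 0;  -- contains a carriage return
```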
