PostgreSQL:LEFT JOIN创建空白行 [英] PostgreSQL: LEFT JOIN creates blank row

查看:179
本文介绍了PostgreSQL:LEFT JOIN创建空白行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

请参阅本说明末尾的重要新发现1和2.

我正在运行Postgres 9.1.3,并且遇到一个奇怪的左连接问题.

我有一个名为 consistent.master 的表,其中有超过200万行.它有一个名为 citation_id 的列,并且该列没有空值.我可以通过以下方式验证这一点:

SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL

返回 0 .

这是很奇怪的地方:如果我左移该表到临时表,则会收到一个错误,我试图在 citation_id 字段中插入空值:

错误:列"citation_id"中的空值违反了非空约束 SQL状态:23502

以下是查询:

WITH stops AS (
    SELECT citation_id,
           rank() OVER (ORDER BY offense_timestamp,
                     defendant_dl,
                     offense_street_number,
                     offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction=1
)

INSERT INTO consistent.masternew (arrest_id, citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom)

SELECT stops.stop, master.citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id

我正在为此抓挠头.如果这是 LEFT JOIN ,并且如果 consistent.master 是该联接的左表,则此查询如何在 citation_id 列中创建空值什么时候没有开始?

这是我用来创建表的SQL代码:

CREATE TABLE consistent.masternew
(
  arrest_id character varying(20),
  citation_id character varying(20) NOT NULL,
  defendant_dl character varying(20),
  defendant_dl_state character varying(2),
  defendant_zip character varying(9),
  defendant_race character varying(10),
  defendant_sex character(1),
  defendant_dob date,
  vehicle_licenseplate character varying(10),
  vehicle_licenseplate_state character(2),
  vehicle_registration_expiration_date date,
  vehicle_year integer,
  vehicle_make character varying(20),
  vehicle_model character varying(20),
  vehicle_color character varying,
  offense_timestamp timestamp without time zone,
  offense_street_number character varying(10),
  offense_street_name character varying(30),
  offense_crossstreet_number character varying(10),
  offense_crossstreet_name character varying(30),
  offense_county character varying(10),
  officer_id character varying(20),
  offense_code integer,
  speed_alleged integer,
  speed_limit integer,
  work_zone bit(1),
  school_zone bit(1),
  offense_location point,
  id serial NOT NULL,
  source character varying(20), -- Where this citation came from--court, PD, etc.
  citing_jurisdiction integer,
  the_geom geometry,
  CONSTRAINT masternew_pkey PRIMARY KEY (id ),
  CONSTRAINT citing_jurisdiction FOREIGN KEY (citing_jurisdiction)
      REFERENCES consistent.jurisdictions (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT offenses FOREIGN KEY (offense_code)
      REFERENCES consistent.offenses (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
  CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL),
  CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 3081)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE consistent.masternew
  OWNER TO postgres;
COMMENT ON COLUMN consistent.masternew.source IS 'Where this citation came from--court, PD, etc.';

CREATE INDEX masternew_citation_id_idx
  ON consistent.masternew
  USING btree
  (citation_id COLLATE pg_catalog."default" );

CREATE INDEX masternew_citing_jurisdiction_idx
  ON consistent.masternew
  USING btree
  (citing_jurisdiction );

CREATE INDEX masternew_defendant_dl_idx
  ON consistent.masternew
  USING btree
  (defendant_dl COLLATE pg_catalog."default" );

CREATE INDEX masternew_id_idx
  ON consistent.masternew
  USING btree
  (id );

CREATE INDEX masternew_offense_street_name_idx
  ON consistent.masternew
  USING btree
  (offense_street_name COLLATE pg_catalog."default" );

CREATE INDEX masternew_offense_street_number_idx
  ON consistent.masternew
  USING btree
  (offense_street_number COLLATE pg_catalog."default" );

CREATE INDEX masternew_offense_timestamp_idx
  ON consistent.masternew
  USING btree
  (offense_timestamp );

CREATE INDEX masternew_the_geom_idx
  ON consistent.masternew
  USING gist
  (the_geom );

重要发现1

我刚刚发现了一些有趣的东西.此查询:

SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NOT NULL
UNION
SELECT COUNT(*)
FROM consistent.master
UNION
SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL

结果是:

2085344
2085343
0

我怎么可能解释呢? WHERE citation_id IS NOT NULL的计数怎么可能比没有WHERE子句的同一查询高?

重要发现2 好的,根据下面的评论,我发现我的一行中都有所有空值,这是尽管事实,因为该表具有一个串行id列和一些NOT NULL约束. /p>

我删除了流浪汉行.现在我没有得到空错误.相反,我得到了这个:

ERROR:  duplicate key value violates unique constraint "masternew_pkey"
DETAIL:  Key (id)=(1583804) already exists.

********** Error **********

ERROR: duplicate key value violates unique constraint "masternew_pkey"
SQL state: 23505
Detail: Key (id)=(1583804) already exists.

因此,请确保执行以下查询:

SELECT COUNT(id)
FROM consistent.master
WHERE id=1583804;

猜猜是什么? consistent.master只有1个实例!因此,假设LEFT JOIN中的左侧表格citation_id 中只有1个 1583804 实例,并且 id 列只能来自左边的表格,怎么可能发生此错误?像这样的LEFT JOIN应该不会导致最终结果的行多于左表,对吧?

解决方案

对于INSERT(尤其是复杂的INSERT),您应该始终定义目标列.所以做到这一点:

INSERT INTO consistent.masternew (citation_id, col1, col2, ...)

如果在随附的SELECT语句中出了什么问题-像这样:

the_geom geometry

(用类型名重命名该列是没有意义的-我认为这是意料之外的)-否则,如果基础表定义发生更改,则没有定义目标列的INSERT语句可能会出错.

PostgreSQL不会在SELECT语句中强制执行与目标表相同的列数.我引用关于此的精美手册:

显式或隐式列列表中不存在的每个列将 用默认值填充,该默认值可以是其声明的默认值或 如果没有,则为null.

(加粗强调).如果列列表中不匹配,则可能使NULL值无处不在".

此外,SELECT语句中的列顺序必须与要插入的列顺序匹配.如果未详细说明目标列,则这将是表中列在创建时的顺序.
您似乎希望按名称自动匹配列,但事实并非如此. SELECT语句中的列名称与INSERT的最后一步完全无关.只有他们从左到右的顺序才有意义.

与其他人暗示的 WITH子句完全合法相反.我引用关于插入的手册:

查询(SELECT语句)也可能包含一个WITH 条款.在这种情况下,两组with_query都可以被引用 在查询中,但是第二个优先,因为它更多 紧密嵌套.

您的声明可能如下所示:

WITH stops AS (
    SELECT citation_id
          ,rank() OVER (ORDER BY
                    offense_timestamp
                   ,defendant_dl
                   ,offense_street_number
                   ,offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction = 1
    )
INSERT INTO consistent.masternew (citation_id, col1, col2, ...) -- add columns
SELECT m.citation_id -- order colums accordingly!
      ,s.stop
      ,m.defendant_dl
        -- 27 more columns
      ,m.citing_jurisdiction
      ,m.the_geom
FROM   consistent.master m
LEFT   JOIN stops s USING (citation_id);

See important new discoveries 1 and 2 at end of this explanation.

I am running Postgres 9.1.3 and am having a weird left join issue.

I have a table named consistent.master with over 2 million rows. It has a column named citation_id, and that column has no nulls. I can verify that with this:

SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL

That returns 0.

Here's where it gets weird: if I LEFT JOIN this table to a temporary table, I get an error that I am trying to insert a null into the citation_id field:

ERROR: null value in column "citation_id" violates not-null constraint SQL state: 23502

Here's the query:

WITH stops AS (
    SELECT citation_id,
           rank() OVER (ORDER BY offense_timestamp,
                     defendant_dl,
                     offense_street_number,
                     offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction=1
)

INSERT INTO consistent.masternew (arrest_id, citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom)

SELECT stops.stop, master.citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id

I'm scratching my head on this one. If this is a LEFT JOIN, and if consistent.master is the join's left table, how could this query create null values in the citation_id column when there aren't any to begin with?

Here's the SQL code I used to create the table:

CREATE TABLE consistent.masternew
(
  arrest_id character varying(20),
  citation_id character varying(20) NOT NULL,
  defendant_dl character varying(20),
  defendant_dl_state character varying(2),
  defendant_zip character varying(9),
  defendant_race character varying(10),
  defendant_sex character(1),
  defendant_dob date,
  vehicle_licenseplate character varying(10),
  vehicle_licenseplate_state character(2),
  vehicle_registration_expiration_date date,
  vehicle_year integer,
  vehicle_make character varying(20),
  vehicle_model character varying(20),
  vehicle_color character varying,
  offense_timestamp timestamp without time zone,
  offense_street_number character varying(10),
  offense_street_name character varying(30),
  offense_crossstreet_number character varying(10),
  offense_crossstreet_name character varying(30),
  offense_county character varying(10),
  officer_id character varying(20),
  offense_code integer,
  speed_alleged integer,
  speed_limit integer,
  work_zone bit(1),
  school_zone bit(1),
  offense_location point,
  id serial NOT NULL,
  source character varying(20), -- Where this citation came from--court, PD, etc.
  citing_jurisdiction integer,
  the_geom geometry,
  CONSTRAINT masternew_pkey PRIMARY KEY (id ),
  CONSTRAINT citing_jurisdiction FOREIGN KEY (citing_jurisdiction)
      REFERENCES consistent.jurisdictions (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT offenses FOREIGN KEY (offense_code)
      REFERENCES consistent.offenses (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
  CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL),
  CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 3081)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE consistent.masternew
  OWNER TO postgres;
COMMENT ON COLUMN consistent.masternew.source IS 'Where this citation came from--court, PD, etc.';

CREATE INDEX masternew_citation_id_idx
  ON consistent.masternew
  USING btree
  (citation_id COLLATE pg_catalog."default" );

CREATE INDEX masternew_citing_jurisdiction_idx
  ON consistent.masternew
  USING btree
  (citing_jurisdiction );

CREATE INDEX masternew_defendant_dl_idx
  ON consistent.masternew
  USING btree
  (defendant_dl COLLATE pg_catalog."default" );

CREATE INDEX masternew_id_idx
  ON consistent.masternew
  USING btree
  (id );

CREATE INDEX masternew_offense_street_name_idx
  ON consistent.masternew
  USING btree
  (offense_street_name COLLATE pg_catalog."default" );

CREATE INDEX masternew_offense_street_number_idx
  ON consistent.masternew
  USING btree
  (offense_street_number COLLATE pg_catalog."default" );

CREATE INDEX masternew_offense_timestamp_idx
  ON consistent.masternew
  USING btree
  (offense_timestamp );

CREATE INDEX masternew_the_geom_idx
  ON consistent.masternew
  USING gist
  (the_geom );

IMPORTANT DISCOVERY 1

I just discovered something interesting. This query:

SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NOT NULL
UNION
SELECT COUNT(*)
FROM consistent.master
UNION
SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL

The results are:

2085344
2085343
0

How can I possibly explain that? How can the count with WHERE citation_id IS NOT NULL possibly be higher than the same query with no WHERE clause?

IMPORTANT DISCOVERY 2 OK, per the comments below, I discovered that I have a row with all empty values, and this is despite the fact that the table has a serial id column and some NOT NULL constraints.

I deleted the bum row. Now I'm not getting the null error. Instead, I'm getting this:

ERROR:  duplicate key value violates unique constraint "masternew_pkey"
DETAIL:  Key (id)=(1583804) already exists.

********** Error **********

ERROR: duplicate key value violates unique constraint "masternew_pkey"
SQL state: 23505
Detail: Key (id)=(1583804) already exists.

So just to make sure, I do this query:

SELECT COUNT(id)
FROM consistent.master
WHERE id=1583804;

Guess what? consistent.master only has 1 instance of this! So given that the left table in the LEFT JOIN only 1 instance of 1583804 in citation_id and that the id column can only come from the left table, how could this error possibly happen? A LEFT JOIN like this should not cause the final result to have more rows than the left table, right?

解决方案

With an INSERT, especially with a complex one, you should always define the target columns. So make that:

INSERT INTO consistent.masternew (citation_id, col1, col2, ...)

If anything goes wrong in the accompanying SELECT statement - like this:

the_geom geometry

(makes no sense to rename the column with a type name - I assume this is unintended) - or if the underlying table definition changes, an INSERT statement without defined target columns can go terribly wrong.

PostgreSQL does not enforce the same number of columns in the SELECT statement as are in the target table. I quote the fine manual on that:

Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or null if there is none.

(Bold emphasis mine.) If you have a mismatch in the column list, this could make a NULL value appear "out of nowhere".

Also, the order of the columns in the SELECT statement has to match the order of the columns to insert into. If the target columns are not spelled out, this would be the order of the columns in your table as it was created.
You seem to expect that columns are matched by name automatically, but that is not so. Column names in the SELECT statement are completely irrelevant for the final step of the INSERT. Only their order from left to right is significant.

Contrary to what others have implied the WITH clause is perfectly legit. I quote the manual on INSERT:

It is possible for the query (SELECT statement) to also contain a WITH clause. In such a case both sets of with_query can be referenced within the query, but the second one takes precedence since it is more closely nested.

Your statement could look like this:

WITH stops AS (
    SELECT citation_id
          ,rank() OVER (ORDER BY
                    offense_timestamp
                   ,defendant_dl
                   ,offense_street_number
                   ,offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction = 1
    )
INSERT INTO consistent.masternew (citation_id, col1, col2, ...) -- add columns
SELECT m.citation_id -- order colums accordingly!
      ,s.stop
      ,m.defendant_dl
        -- 27 more columns
      ,m.citing_jurisdiction
      ,m.the_geom
FROM   consistent.master m
LEFT   JOIN stops s USING (citation_id);

这篇关于PostgreSQL:LEFT JOIN创建空白行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆