How can I do this in SQL?


Problem description

MySQL 4.0.23-standard

I run an online movie database with over 80,000 movies and over 500,000
people stored within.

We use tables like `movies`, `people`, etc. for storing the actual
movie information, a short plot summary, etc. These tables are MyISAM.

For information like who directed or produced a movie, we have tables
like `movies_directedBy` and `movies_producedBy` which have three
columns -- the movie ID in the `movies` table, the person ID in the
`people` table, and a notes column.

These "relationship" tables are mostly InnoDB, with the exception of
the table for casting information, `people_castIn`, which is MyISAM as
fulltext searching is required on this table.

In order to decide what occupation a person is predominantly (e.g.
director, actor, producer, sound tech, etc.) we currently use PHP to
fetch a count of rows from each of these `movies_*` relationship tables
which have a given person ID. The table with the most entries for that
person ID is deemed to represent his or her main occupation.

I'd like to know whether it's possible to do this entirely in SQL,
since it's really slow, especially when the query involves 10 or more
people, to fetch COUNT(*)s from multiple tables having a combined total
of over 3 million rows.

Is it possible?

Thanks in advance for any help

Jasper Bryant-Greene
Cabbage Promotions

Recommended answer

<snip>

I suggest you start with a better normalization scheme. Restructure
your data into tblMovies, tblDirectors, tblProducers and maybe
throw in tblActors for good measure. Make sure each has a primary
key field (natural keys or otherwise). Your tblMovies table should
have link fields to the primary key fields in tblDirectors and
tblProducers and maybe fields like [StarActor] & [StarActress].
However, I would suggest a junction table between tblMovies
and tblActors, as movies can have several (3 - 5 - a dozen...)
co-stars. Call it tblCoStars and furnish it with [ActorID] and
[MovieID] fields with a 1-to-many relationship between itself
and each of the other 2 tables, with tblCoStars being the many
side of both relationships. Build yourself indices in tblMovies on
each of these fields: [MovieID] (of course), then [DirectorID],
[ProducerID] and [ActorID].

Here's some sample SQL that should churn out a lot of rows
if you have the rich & full datasets you mentioned...

SELECT DISTINCTROW tblMovies.MovieName, tblMovies.ProducerID,
tblMovies.DirectorID, tblMovies.ReleaseDate, tblMovies.BoxOfficeGross,
tblMovies.MovieRating, tblMovies.MovieDescrip, [DirectorFName] & " " &
[DirectorLName] AS Director, [ProducerFName] & " " & [ProducerLName]
AS Producer
FROM (tblMovies INNER JOIN tblDirectors ON tblMovies.DirectorID =
tblDirectors.DirectorID) INNER JOIN tblProducers ON
tblMovies.ProducerID = tblProducers.ProducerID
WHERE ((tblMovies.ProducerID=1 Or tblMovies.ProducerID=19 Or
tblMovies.ProducerID=39 Or tblMovies.ProducerID=101 Or
tblMovies.ProducerID=311) AND (tblMovies.DirectorID=17 Or
tblMovies.DirectorID=77 Or tblMovies.DirectorID=211 Or
tblMovies.DirectorID=321));

(untested)


<big snip>
If you call your junction table tblCast & give it 2 fields: [MovieID]
and [ActorID], you could put it in the query to include co-stars in
the query dynaset.

The following should produce a list of movies, producers, directors,
actors for a number of films directed by any of several different
directors OR produced by any of several different producers.

SELECT DISTINCTROW tblMovies.MovieName, tblMovies.ProducerID,
tblMovies.DirectorID, tblMovies.ReleaseDate, tblMovies.BoxOfficeGross,
tblMovies.MovieRating, tblMovies.MovieDescrip, [DirectorFName] & " " &
[DirectorLName] AS Director, [ProducerFName] & " " & [ProducerLName]
AS Producer, [ActorFName] & " " & [ActorLName] AS Actor
FROM tblActors INNER JOIN (((tblMovies INNER JOIN tblDirectors ON
tblMovies.DirectorID = tblDirectors.DirectorID) INNER JOIN
tblProducers ON tblMovies.ProducerID = tblProducers.ProducerID) INNER
JOIN tblCast ON tblMovies.MovieID = tblCast.MovieID) ON
(tblMovies.ActorID = tblActors.ActorID) AND (tblActors.ActorID =
tblCast.ActorID)
WHERE ((tblMovies.ProducerID=1 Or tblMovies.ProducerID=19 Or
tblMovies.ProducerID=39 Or tblMovies.ProducerID=101 Or
tblMovies.ProducerID=311)) OR ((tblMovies.DirectorID=17 Or
tblMovies.DirectorID=77 Or tblMovies.DirectorID=211 Or
tblMovies.DirectorID=321));





Hi

Thanks for your reply, but I think you have misunderstood me. I'm not
looking to create lists of actors, producers, directors, etc. for
movies. We already do that and it works fine. What I am looking to do
is create an SQL query that will do the equivalent of what I am
currently doing in PHP, which is:

1. Given a person ID, get the number of records in about 10-15
different tables (directors, producers, actors, crewmembers, sound
techs, art department, makeup, sfx, etc.).
2. Return the name of the table which had the largest number of
records.

This tells me what the person's predominant occupation is. The problem
is, step 1 requires 10-15 different SQL queries executing COUNT(*)
operations. I'm looking for a way to do it all in a single, or at least
in fewer, SQL queries, to improve performance.
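One possible way to collapse those per-table COUNT(*) calls into a single
statement is a UNION ALL with one branch per relationship table, each
branch grouped by person ID. The sketch below is only an illustration
under stated assumptions: the table names come from this thread, but the
column names (movieID, personID, notes), the occupation labels and the
helper names build_query and main_occupations are hypothetical. It runs
against an in-memory SQLite database purely so that it is self-contained.
As far as I know MySQL 4.0 accepts UNION but not subqueries, which is why
picking the largest count is left to the calling code here.

```python
import sqlite3

# Table names are from the thread; the occupation labels are assumptions.
TABLES = {
    "movies_directedBy": "director",
    "movies_producedBy": "producer",
    "people_castIn": "actor",
}


def build_query(person_ids):
    """Build one UNION ALL statement: one COUNT(*) branch per table."""
    placeholders = ", ".join("?" for _ in person_ids)
    branches = [
        f"SELECT personID, '{occupation}' AS occupation, COUNT(*) AS n "
        f"FROM {table} WHERE personID IN ({placeholders}) GROUP BY personID"
        for table, occupation in TABLES.items()
    ]
    return " UNION ALL ".join(branches)


def main_occupations(conn, person_ids):
    """Return {personID: occupation with the most rows} from a single query."""
    sql = build_query(person_ids)
    # Each branch has its own IN (...) list, so the IDs repeat once per table.
    params = list(person_ids) * len(TABLES)
    best = {}
    for person_id, occupation, n in conn.execute(sql, params):
        # On a tie, the table listed first in TABLES wins.
        if person_id not in best or n > best[person_id][1]:
            best[person_id] = (occupation, n)
    return {pid: occ for pid, (occ, _) in best.items()}


# Tiny demo data set (hypothetical schema: movieID, personID, notes).
conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(
        f"CREATE TABLE {table} (movieID INTEGER, personID INTEGER, notes TEXT)"
    )
    # An index on the person column keeps each branch a cheap lookup.
    conn.execute(f"CREATE INDEX idx_{table}_person ON {table} (personID)")

conn.executemany("INSERT INTO movies_directedBy VALUES (?, ?, '')",
                 [(1, 7), (2, 7), (3, 7)])
conn.executemany("INSERT INTO movies_producedBy VALUES (?, ?, '')",
                 [(1, 7), (4, 9)])
conn.executemany("INSERT INTO people_castIn VALUES (?, ?, '')",
                 [(5, 9), (6, 9), (2, 7)])

result = main_occupations(conn, [7, 9])
print(result)
```

This makes one round trip for any number of people instead of 10-15
queries per person. Whether it is actually faster on the real tables
would still depend on each relationship table having an index on its
person ID column.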

Also the scheme you have suggested requires a huge amount of data
redundancy as a single person could be an actor, a producer, a
director, a crewmember and much more. In your scheme there would be
information stored separately for each occupation of each person, an
exponentially larger amount of information. We can't afford that
much data redundancy when we're dealing with millions of records.

We already have many more tables (actors, crewmembers, art dep., makeup
dep., sfx, visual fx, sound techs, motion control, etc. etc.) but I
didn't list them all as it wasn't necessary.

Jasper

