How does a site like kayak.com aggregate content?

Question

Greetings, I've been toying with an idea for a new project and was wondering if anyone has any idea on how a service like Kayak.com is able to aggregate data from so many sources so quickly and accurately. More specifically, do you think Kayak.com is interacting with APIs or are they crawling/scraping airline and hotel websites in order to fulfill user requests? I know there isn't one right answer for this sort of thing but I'm curious to know what others think would be a good way to go about this. If it helps, pretend you are going to create kayak.com tomorrow ... where is your data coming from?

Recommended answer

I'm working in the travel industry as a software architect / project lead on precisely the kind of project you describe. In our region we work with suppliers directly, but for outgoing travel we connect to several aggregators.

To answer your question... some data you have, some you get in various ways, and some you have to torture and twist until it confesses.

The questions you have to ask are... Do you want to sell advertising like Kayak, or do you take a cut like Expedia? Are you into search or into selling travel services? Do you target a niche (for example, just air travel) or everything (accommodation, airlines, rent-a-car, additional services like transport, sightseeing, conferences, etc.)? Do you target a region (the US or part of the US) or the world? How deep do you go: do you just show several sites on a single screen, or do you bundle different services together and package them dynamically?

If you're going with the Kayak business model, you technically don't need the sites' permission... but a lot of sites have affiliate programs with iframes or other simple ways to direct the customer to their site. On the plus side, you don't have to deal with payments, complaints, or the travelers themselves. As for the cons... if you want to compare prices yourself and present the cheapest option to the user, you'll have to integrate on a deeper level, and that means APIs and web scraping.
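
Deeper integration usually means putting every supplier, whether an API client or a scraper, behind one common interface and querying them all in parallel. That way one slow supplier doesn't hold up the results page. The sketch below illustrates the idea in Python under that assumption. It is not a description of how Kayak actually works, and `Offer`, `FakeSupplier`, `cheapest_offer` and the timeout values are hypothetical names chosen for this example.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    supplier: str
    price_eur: float


class FakeSupplier:
    """Hypothetical adapter; a real one would call a supplier API or a scraper
    and translate the response into Offer objects."""

    def __init__(self, name: str, prices: list[float], delay_s: float = 0.0) -> None:
        self.name = name
        self._prices = prices
        self._delay_s = delay_s

    async def search(self, origin: str, destination: str, date: str) -> list[Offer]:
        await asyncio.sleep(self._delay_s)  # simulate network latency
        return [Offer(self.name, p) for p in self._prices]


async def cheapest_offer(suppliers, origin, destination, date, timeout_s=5.0):
    """Query all suppliers concurrently and return the cheapest Offer (or None)."""

    async def guarded(supplier):
        # A supplier that times out or raises contributes no offers
        # instead of failing the whole search.
        try:
            return await asyncio.wait_for(
                supplier.search(origin, destination, date), timeout_s
            )
        except Exception:
            return []

    batches = await asyncio.gather(*(guarded(s) for s in suppliers))
    offers = [offer for batch in batches for offer in batch]
    return min(offers, key=lambda o: o.price_eur, default=None)


if __name__ == "__main__":
    suppliers = [
        FakeSupplier("airline-api", [120.0, 95.0]),
        FakeSupplier("aggregator", [99.0]),
        FakeSupplier("slow-scraper", [10.0], delay_s=1.0),
    ]
    print(asyncio.run(cheapest_offer(suppliers, "ZAG", "LHR", "2024-06-01", timeout_s=0.1)))
```

In this run the slow scraper exceeds the 0.1 s timeout, so its cheaper fare is dropped and the 95.0 offer wins. Trading completeness for response time like this is a design choice, not a rule.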

As for web scraping... avoid it. It sucks. Really. Just don't do it. Trust me on this one. The catch is that some things, low-cost carriers ("lowcosters") for example, you can't get without web scraping. Low-cost airlines live off value-added services: if users don't see the airline's website, they don't buy the extras, and the airline doesn't earn anything. Therefore low-cost carriers don't have affiliates, they don't offer APIs, and they change their site layout almost constantly. However, there are companies that earn a living by scraping low-cost carriers' sites and wrapping them in nice APIs. If you can afford them, you can give your users a cost comparison of low-cost flights, and that's huge.
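
Here is a minimal scraper sketch that shows why this is so fragile. It uses Python's standard `html.parser` and assumes a hypothetical page where each fare sits in `<span class="fare-price">`. As soon as the airline redesigns the page, the parser finds nothing. For that reason the sketch raises an error rather than silently reporting that there are no fares.

```python
from html.parser import HTMLParser


class LayoutChanged(Exception):
    """Raised when the page no longer looks like what the scraper expects."""


class FareParser(HTMLParser):
    # HYPOTHETICAL markup: each fare is rendered as <span class="fare-price">49.99</span>.
    # Any redesign of the airline's page silently invalidates this assumption.
    def __init__(self) -> None:
        super().__init__()
        self._in_price = False
        self.prices: list[float] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "fare-price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(float(data.strip()))


def scrape_fares(html: str) -> list[float]:
    parser = FareParser()
    parser.feed(html)
    parser.close()
    if not parser.prices:
        # Fail loudly: an empty result usually means the layout changed,
        # not that there are no fares.
        raise LayoutChanged("no fare-price elements found")
    return parser.prices
```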

On the other hand, there are "normal" carriers which offer APIs. Getting to the airlines isn't that big of a problem, since they're all united under IATA; basically, you buy from IATA, and IATA distributes the money to the carriers. However, you probably don't want to connect directly to the carrier network. They have web services and SOAP these days, but believe me when I say that there are SOAP protocols which are just insanely thin wrappers around a text prompt through which you interact with a mainframe using a 1980s-style protocol (think of a Unix prompt where you're billed per command, and it takes about 20 commands to do one search). That's why you probably want to connect to somebody a bit further down the food chain, with a better API.
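
The sketch below models that situation with a hypothetical `TerminalSession`, which stands in for the thin SOAP wrapper. Every text command is billed, and one logical search expands into 20 commands. The command strings are invented for illustration and do not belong to any real reservation system. A provider further down the food chain essentially sells you something like the `search_flights` facade, so you never have to script the terminal yourself.

```python
from dataclasses import dataclass, field


@dataclass
class TerminalSession:
    """Hypothetical stand-in for a SOAP call that forwards one text command
    to a mainframe-style terminal and is billed per command."""

    cost_per_command: float
    commands_sent: list[str] = field(default_factory=list)

    def send(self, command: str) -> str:
        self.commands_sent.append(command)
        return f"OK {command}"  # a real terminal would return cryptic screen text

    @property
    def bill(self) -> float:
        return len(self.commands_sent) * self.cost_per_command


def search_flights(session: TerminalSession, origin: str, destination: str, date: str) -> list[str]:
    # INVENTED command names: the point is only that one logical "search"
    # expands into many billed low-level commands (here 2 + 16 + 2 = 20).
    script = ["SIGNIN", f"AVAIL {origin}{destination}/{date}"]
    script += [f"PAGE {n}" for n in range(1, 17)]
    script += ["FARES", "SIGNOUT"]
    return [session.send(cmd) for cmd in script]
```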

Airlines are thus at both extremes of the Gaussian curve: on one side are individual suppliers, and on the other highly centralized systems where you implement one API and are able to fly anywhere in the world. Accommodation and the rest of the travel products are in between. There are several big players which aggregate hotels, and a ton of small suppliers with a lot of aggregators that each cover only part of the spectrum. For example, you can rent a lighthouse, and it's not even that expensive, but you won't be able to compare the prices of different lighthouses in one place.

If you're into the Kayak business model, you'll probably end up scraping websites. If you're into integrating different providers, you'll often work with APIs, some of which are pretty good and most of which are tolerable. I haven't worked with RSS, but there's not a lot of difference between RSS and web scraping. There is also a fourth option not mentioned in Jeff's answer... the one where you get your data nightly, for example as .CSV files over FTP or similar.
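
A nightly feed is usually the easiest of these options to consume. Below is a minimal sketch that assumes a hypothetical CSV layout (`hotel_id,date,price_eur,available`). Real suppliers each define their own columns. Parsing uses the standard `csv` module, and the optional download helper uses the standard library's `ftplib`. The host, credentials and path are whatever the supplier gives you.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    hotel_id: str
    date: str
    price_eur: float
    available: int


def parse_nightly_feed(text: str) -> list[Rate]:
    # ASSUMED header: hotel_id,date,price_eur,available
    return [
        Rate(row["hotel_id"], row["date"], float(row["price_eur"]), int(row["available"]))
        for row in csv.DictReader(io.StringIO(text))
    ]


def fetch_feed(host: str, user: str, password: str, path: str) -> str:
    # Downloads a text file with the standard library's FTP client.
    from ftplib import FTP

    lines: list[str] = []
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.retrlines(f"RETR {path}", lines.append)
    return "\n".join(lines)
```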

And then there's complexity. The more value you want to add, the more complexity you'll have to handle. Can you search for accommodation that allows pets? For a hostel located less than 5 km from the town center? If you're combining flights, can you guarantee that the traveler will have enough time to get from one airport to the other... and can you sell that transfer in advance? A famous cellist doesn't want to part with his precious 18th-century cello; can you sell him an extra seat for the cello (yep, not making this one up)?
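
Each of those questions turns into code that needs to know more about the product than its price. As a sketch, the pet-friendly-hostel search and the airport-transfer check might look like the code below. The `Property` record and the transfer duration are hypothetical inputs. Distances use the haversine great-circle formula, which ignores real road or walking distance.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Property:  # HYPOTHETICAL record assembled from supplier data
    name: str
    kind: str  # e.g. "hostel", "hotel"
    pets_allowed: bool
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance on a spherical Earth (mean radius 6371 km).
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def pet_friendly_hostels(props, center, max_km=5.0):
    lat0, lon0 = center
    return [
        p for p in props
        if p.kind == "hostel" and p.pets_allowed
        and haversine_km(lat0, lon0, p.lat, p.lon) < max_km
    ]


def connection_ok(arrival: datetime, departure: datetime, transfer: timedelta) -> bool:
    # transfer: ASSUMED time needed to get from the arrival airport to the departure airport
    return departure - arrival >= transfer
```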

Want to compare prices? Sure, the room is EUR 30 per night. But you can either get one double for 30 and one single for 20, or you can get an extra bed in the double at 70% off for the third person. But only if it's a child under 12 years of age; our extra beds are not for adults. And you don't get the price of the extra bed in the search results, only when you calculate the final price.
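
Here is the arithmetic for that example, worked through for two adults plus a third person. The answer doesn't say what the 70% discount is taken off, so this sketch ASSUMES it applies to the per-person share of the double (EUR 15). Under that assumption, the extra-bed option costs EUR 34.50 per night, compared with EUR 50 for a double plus a single. The extra-bed option disappears entirely when the third person is 12 or older. `Decimal` avoids floating-point rounding in currency amounts.

```python
from decimal import Decimal

DOUBLE_EUR = Decimal("30")
SINGLE_EUR = Decimal("20")
EXTRA_BED_DISCOUNT = Decimal("0.70")  # "70% off for the third person"
EXTRA_BED_AGE_LIMIT = 12              # "only if it's a child under 12"


def search_result_price() -> Decimal:
    # What the search results show: the room rate only, with no extra-bed price.
    return DOUBLE_EUR


def final_price_options(third_person_age: int) -> dict[str, Decimal]:
    """Per-night price options for two adults plus one more person."""
    options = {"double + single": DOUBLE_EUR + SINGLE_EUR}
    if third_person_age < EXTRA_BED_AGE_LIMIT:
        # ASSUMPTION: the discount is taken off the per-person share of the double.
        per_person_share = DOUBLE_EUR / 2
        extra_bed = per_person_share * (1 - EXTRA_BED_DISCOUNT)
        options["double + extra bed"] = DOUBLE_EUR + extra_bed
    return options
```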

And don't even get me started on dynamic packaging. Want to sell accommodation + rent-a-car? No problem; integrate with two different providers and off you go... manually updating the list of rent-a-car locations in the city (from the rent-a-car provider) to match the hotels (from the accommodation provider, who gives you only the city for each hotel). That is, of course, provided you've already matched the two providers' lists of cities, since there is no international standard for city codes.
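
The city-matching step can be partly automated by normalizing names (accents, case, punctuation). Everything else falls back to a hand-maintained alias table, which is exactly the manual work described above. Below is a minimal sketch. `normalize_city`, `match_cities` and the alias table are hypothetical, and any city that still doesn't match is returned for a human to resolve.

```python
import unicodedata


def normalize_city(name: str) -> str:
    # Strip accents, case and punctuation so "Zürich" and "ZURICH" compare equal.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.casefold())
    return " ".join(cleaned.split())


def match_cities(hotel_cities, car_cities, aliases=None):
    """Return (matches, unmatched); matches maps hotel city -> rent-a-car city."""
    aliases = aliases or {}  # HYPOTHETICAL hand-maintained table, e.g. {"Den Haag": "The Hague"}
    car_index = {normalize_city(c): c for c in car_cities}
    matches, unmatched = {}, []
    for city in hotel_cities:
        key = normalize_city(aliases.get(city, city))
        if key in car_index:
            matches[city] = car_index[key]
        else:
            unmatched.append(city)  # left for a human to resolve
    return matches, unmatched
```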

Unlike a lot of other industries which have many products, the travel industry has many very complex products. Amazon has it easy: selling books and selling potatoes is the same thing; you can even ship them in the same box. They combine easily and aren't assembled from many parts. :)