Screen-scraping, a transition technology in an industry undergoing transformation.

06 Feb. 2010 | Written by Julia

What is screen-scraping? Where does this strange name come from? Why is it so widely used? And why can we consider it just a transition technology?

We no longer need to demonstrate that the Internet has changed the travel industry: the players have grown in number, products have become more accessible, customers’ habits have changed…

From the customer’s point of view, this is great. But for Internet players, it means an industry that gets more and more complicated as the number of connections between them grows. This is how screen-scraping became so popular.

What is screen-scraping?

This technique appeared in 1999/2000, when the Internet was gaining importance. It started booming in 2003, when online content was growing and meta-search engines were becoming relevant.

In practical terms, screen-scraping consists of scanning the content of a web site in order to reuse it on another web site. To do that, screen-scrapers use web crawlers like the ones used by search engines (Googlebot, Yahoo! Slurp…). The difference is that these robots are specialized in e-commerce web sites and are able to identify specific information: prices, dates, descriptions… They read the HTML code of the web page in order to rebuild the web site’s underlying database.
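To make the idea concrete, here is a minimal sketch in Python of how a scraper might read a page's HTML and rebuild structured records from it. The sample page, its class names (offer, desc, date, price) and the function name scrape_offers are all assumptions invented for illustration. Real retailer pages are messier and change without notice, which is one reason scraping is fragile.

```python
from html.parser import HTMLParser

# Hypothetical retailer page: the markup and class names are assumptions.
SAMPLE_PAGE = """
<ul>
  <li class="offer"><span class="desc">Paris - Madrid</span>
      <span class="date">2010-03-12</span><span class="price">89.00</span></li>
  <li class="offer"><span class="desc">Paris - Rome</span>
      <span class="date">2010-03-14</span><span class="price">120.50</span></li>
</ul>
"""

# The pieces of information the scraper is trained to recognize.
FIELDS = {"desc", "date", "price"}


class OfferScraper(HTMLParser):
    """Collects one dict per <li class="offer"> found in the page."""

    def __init__(self):
        super().__init__()
        self.offers = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "offer":
            self.offers.append({})
        elif tag == "span" and css_class in FIELDS and self.offers:
            self._field = css_class

    def handle_data(self, data):
        # Accumulate text only while inside a recognized field.
        if self._field:
            offer = self.offers[-1]
            offer[self._field] = offer.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_offers(html):
    """Rebuild a list of offer records from raw HTML."""
    parser = OfferScraper()
    parser.feed(html)
    parser.close()
    for offer in parser.offers:
        for key in offer:
            offer[key] = offer[key].strip()
        if "price" in offer:
            offer["price"] = float(offer["price"])
    return parser.offers


if __name__ == "__main__":
    for offer in scrape_offers(SAMPLE_PAGE):
        print(offer)
```

Note that the scraper depends entirely on the page layout: if the retailer renames a class or moves the price into another tag, the records silently come back incomplete.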

The screen-scraping business model.

Several business models are in use. The majority of meta-search web sites earn income through cost per click (CPC) and cost per action (CPA) models: they redirect the Internet user to the retailer and are paid for the click or for the resulting action. The problem with this model is that the two parties need a prior agreement to set the payment, which compromises the meta-searcher’s neutrality. Moreover, this commercial agreement is not visible to the Internet user, who believes that the meta-searcher’s results have led him to the best deals.

In 2007, 11 of the 12 French meta-searchers were called to order by the DGCCRF.

As a result, some of them have chosen a business model patterned on Google’s. They generate “organic results” that are supposed to be as exhaustive as possible, and alongside them they offer sponsored results in the form of text ads (e.g. Wego) or banners (e.g. Sprice).

Finally, the most recent model is found on the web sites of online travel agencies. As retailers, they cannot accept redirecting their visitors to a partner web site. In some cases they have direct connections with suppliers; in other cases, when they have no partnership or when the technologies are incompatible, they develop an advanced form of screen-scraping: they build a copy of the supplier’s booking process and integrate it into their own web site.

The user therefore cannot see that he is actually booking on an external web site. In this case, the seller can add his own margin to the producer’s price. This technique is used by well-known online travel agencies such as Lastminute, Edream, Atrapalo, Govolo or Ebookers. Because it is not always used by mutual consent, this is one of the most controversial uses.

Problems caused by screen-scraping.

We can identify three kinds of problems with screen-scraping: technical (web crawlers slow down web site performance), qualitative (a wrong setup can produce many errors) and ethical (using screen-scraping without prior permission opens the door to abuse).

This last point is a recurring cause of conflict in our industry:

*2003: FareChase vs American Airlines
*2004: FareChase vs Southwest Airlines
*2008: Ryanair vs several travel agencies
*2008: EasyJet vs Expedia

And if we compare with e-commerce in general, we notice that screen-scraping is no longer used there: meta-search web sites and e-commerce web sites prefer XML catalogues.
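By way of comparison, here is a minimal sketch in Python of what reading such an XML catalogue might look like. The catalogue layout (catalogue, product, name and price elements, with a currency attribute) and the function name load_catalogue are hypothetical, as each retailer publishes its own format. The point is that the data arrives already structured, with the retailer's consent, instead of being guessed from page layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue feed: element and attribute names are assumptions.
SAMPLE_CATALOGUE = """
<catalogue>
  <product id="A1">
    <name>Weekend in Madrid</name>
    <price currency="EUR">89.00</price>
  </product>
  <product id="B2">
    <name>Weekend in Rome</name>
    <price currency="EUR">120.50</price>
  </product>
</catalogue>
"""


def load_catalogue(xml_text):
    """Turn a catalogue feed into a list of product records."""
    root = ET.fromstring(xml_text)
    products = []
    for node in root.findall("product"):
        price = node.find("price")
        products.append({
            "id": node.get("id"),
            "name": node.findtext("name"),
            "price": float(price.text),
            "currency": price.get("currency"),
        })
    return products


if __name__ == "__main__":
    for product in load_catalogue(SAMPLE_CATALOGUE):
        print(product)
```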

In the travel industry, because of the complexity of the offer (products are perishable and prices vary over time), the trend is rather toward APIs. An API allows a direct and reliable connection to inventory. It also makes it possible to go beyond the standards imposed by GDSs (cf. the Air Canada case study).
