Hotels.com has upgraded its data science operations to Apache Spark running in the cloud to help it become more “algorithmic” in the way it presents accommodation options and features to customers.
Chief data science officer and vice president Matt Fryer told the Spark Summit in San Francisco that the online booking site – which is owned by Expedia – wanted to get better at tailoring image content and hotel listings to individual users.
That required a major uplift of the underlying technology used to perform data science and big data analytics.
“About 18 months ago there was data piling up everywhere,” Fryer said.
“We had an on-premises Hadoop cluster in our data centre, and we were using SQL and SAS to do our data science, algorithms and stats.
“In 18 months we’ve completely changed that view. We’ve gone 100 percent into the cloud, particularly AWS. We’ve reinvigorated our applications, and in particular we’ve moved to using [Apache] Spark for 100 percent of our workflows. That’s been a huge boost both for our business and our customers.”
The company is now running two major data science platforms.
Fryer said Hotels.com makes “extensive use” of Databricks, a cloud-based big data platform powered by Spark. It provides a single place to manage clusters and explore data, among other functions.
“We also have a back-up [platform] called Maestro which picks up some of the elements that we can’t do in Databricks right now,” Fryer said.
The internally developed Maestro also uses Spark and sits on AWS. It harnesses a number of other tools and technologies, including TensorFlow, RStudio, Jupyter and the interactive build tool SBT.
Fryer said Hotels.com is also experimenting with a proof-of-concept platform running Apache Beam and Spark as well as TensorFlow on top of the Google Cloud Platform.
In addition, it has been experimenting with GPUs and TensorFlow to reduce the time it takes to train deep learning models for recognition tasks, particularly on image data.
However, Fryer said the GPU work “has been really hard” and enormous effort had been expended to extract speed improvements that justified the expense of using them.
Exploring use cases
One area where Hotels.com is putting machine learning to use is in how photos are captioned, presented and ordered.
“After price, location and what’s in the hotel, the photo images are the next most important thing to customers,” Fryer said.
“We have three big use cases for the image world around data science, machine learning and deep learning.”
Broadly, those use cases involve detection of “near-duplicate” photos uploaded for a particular property, the classification of photos according to what they show – for example, a lobby or restaurant – and how the images are ranked in galleries.
“We have millions of photos from hotels, we also have 100,000s of photos and growing fast by the day from our customers that really helps to grow authenticity of what the experience will be like. But how do we rank them?” Fryer said.
“If you’re on a mobile phone – and over half our customers are – then you’ve got to cycle through each photo, using bandwidth, and it’s a really crucial thing I get that order right.”
Fryer presented results showing high accuracy both in recognising duplicate photos (allowing them to be removed from the site) and in categorising images correctly.
In future, the company aims to offer highly personalised results and on-screen presentation to each customer, based on their preferences.
“We can now build feedback loops to mesh deep learning with behavioural feedback loops, click behaviours and customer behaviours in Spark,” Fryer said.
“We can stand those together to find the best hotel order for each individual customer. I can find the right hotel and right photo order for you. We think this is going to be a huge value for our customers going forward.”
However, this presents challenges because a typical customer makes “between four and nine searches” on the site before actually making a booking.
That means the company's models would need to present the right personalised options “nine times in a row”, each time correctly predicting what is likely to suit the customer best.
Fryer said Hotels.com’s work so far constituted “baby steps”, and said he was “quite proud to say I think we’re now in the toddler phase”.
“I think the crucial thing is this is the golden age of machine learning and data science,” he said.
“I’m incredibly excited to be part of it and the wider community that is working on this.”
However, he cautioned that the path ahead was not always clear, owing to the rapid evolution of the big data analytics and AI space.
“It feels like we’re on the frontier an awful lot,” he said.
“Where we go from here often doesn’t exist or the technology’s immature, and it’s usually quite hard.”