Big Data and Machine Learning: Where Are We Heading?
From Monday morning moans to Saturday night selfies, more information about our lives is being captured and analysed than ever before. In fact, over 90% of all available human data has been recorded in the last two years and is already being used to transform the world around us.
A couple of months ago, my 13-year-old cousin sat at our kitchen table and looked at me dubiously as I recalled one of the most coveted gadgets of my own youth: a simple pocket calculator. Today, he and his friends have such a dizzying array of ‘toys’ designed to amuse, educate and stimulate that the wonder of a hand-held calculating machine is completely lost on him.
What I don’t think he realises is the true potential of the extraordinary technical revolution we are lucky to be living through, of which lifestyle devices are only the first widely recognised results.
The next wave of sophisticated tools, many of which are being perfected by teams of specialists, will truly change our lives – and society as a whole.
Just imagine a society in which potential illnesses are identified before we even contract them, or are tracked before they spread, in which we can control air pollution and reduce energy consumption, or where a healthy meal, thanks to technology, is cooked to perfection before anyone gets home.
Big data makes many of these innovations possible. Where once infrastructures were moulded by tools of stone and metal, today we have begun to build an IT world in ‘the cloud’, with a collection of data so enormous and complex that we are only beginning to understand its true worth.
Just as oil was to the 20th century, so data will become the essential ingredient for the 21st. But this lucrative resource must also be refined to make it more meaningful and actionable for mankind.
Companies have been developing ways to visualise and explore massive data sets, curating them into a useful resource. Now we are at the stage where we can create devices that don’t just provide a service, like a pocket calculator, but have broader potential to improve our lives in more meaningful ways.
Today’s consumers are a tough nut to crack. They tend to look around before they buy, talk to their entire social network about their purchases, demand to be treated as unique and want to be sincerely thanked for buying products. Big Data allows us to profile these increasingly vocal and fickle little ‘tyrants’ in a far-reaching manner, so that we can engage in an almost one-on-one, real-time conversation with them. This is not a luxury: if we don’t treat them the way they want, they will leave us in the blink of an eye.
A case in point: when a customer enters a bank, Big Data tools allow the clerk to check their profile in real time and see which relevant products or services to recommend. Big Data will also have a key role to play in uniting the digital and physical shopping environments: a retailer could push an offer to a consumer’s mobile device on the basis of a need that consumer has expressed on social media.
In the world of digital media, we’ve seen an immense volume of complex, dynamic, and hugely valuable new data. Social media and geo-location datasets offer high-value audience signals as to the next big thing, overnight sensation or breakout hit. Multiple screens and mobile viewing have forever changed how consumers learn about, find and consume all kinds of content: from movies and TV shows, to music and games, to books, magazines and newspapers.
Social media is no longer merely an option for businesses but a requisite component of success. Hence any analysis of social media marketing data, to be effective, must be viewed in the larger context of a business’s market penetration, brand engagement, and other return on investment metrics.
Content is information, remember. But so are views, likes, shares, follows, retweets, comments, and downloads. Social media allows marketers to view the life of a story in a way previously unimaginable. So when we think of Big Data in relation to digital and social media, we must first realise that they are not separate from one another.
But how can we organise, analyse and make sense of all this data being generated? Machine learning can be a solution (or at least a good match!) for Big Data.
Machine Learning
Machine learning is ideal for exploiting the opportunities hidden in big data. Freed from the limitations of human scale thinking and analysis, machine learning is able to discover and display the patterns buried in the data.
It is also a growing buzzword among researchers today. In essence, a computer is taught to learn from data without being explicitly programmed for each task. Facebook uses it to provide friend recommendations, and Coursera uses it to verify users through their typing patterns. In market research, machine learning powers automated predictions and segmentations built from existing data.
It delivers on the promise of extracting value from big and disparate data sources with far less reliance on human direction. It is data driven and runs at machine scale. It is well suited to the complexity of dealing with disparate data sources and the huge variety of variables and amounts of data involved. And unlike traditional analysis, machine learning thrives on growing datasets. The more data fed into a machine learning system, the more it can learn and apply the results to higher quality insights.
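To make the segmentation idea above a little more concrete, here is a deliberately tiny k-means sketch in pure Python. Everything here is invented for illustration (the customer records, the feature names, the choice of two segments); a real pipeline would use a proper library and far more data.

```python
import random

def kmeans(points, k, iterations=20, seed=42):
    """Toy k-means: group points into k segments by proximity."""
    random.seed(seed)
    centroids = random.sample(points, k)  # pick k starting centroids
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            best = min(range(k),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(p, centroids[i])))
            clusters[best].append(p)
        # Recompute each centroid as the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(dim) / len(cluster)
                                     for dim in zip(*cluster))
    return centroids, clusters

# Hypothetical customers described by (monthly spend, store visits).
customers = [(20, 1), (25, 2), (22, 1), (200, 8), (210, 9), (190, 7)]
centroids, segments = kmeans(customers, k=2)
```

The algorithm never needed to be told what a "high-value customer" is; the two segments emerge from the data alone, which is exactly the appeal of letting such methods run at machine scale.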
The basis of machine learning is an algorithm that tells us something interesting about a set of data without our having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data.
For example, one kind of algorithm is a classification algorithm that can put data into different groups. The same classification algorithm could be used to recognise handwritten numbers and also to classify emails into spam and not-spam without changing a line of code.
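A minimal sketch of that claim, under the simplifying assumption that a nearest-centroid rule stands in for the "generic classification algorithm": the same class, with no code changes, handles a toy digit problem and a toy spam problem. All feature values and labels below are invented for illustration.

```python
class NearestCentroidClassifier:
    """A generic classifier: learn one average vector per label,
    then assign new points to the closest average."""

    def fit(self, X, y):
        groups = {}
        for features, label in zip(X, y):
            groups.setdefault(label, []).append(features)
        self.centroids = {
            label: [sum(dim) / len(rows) for dim in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: dist(self.centroids[label]))

# The same code applied to two unrelated problems:
# 1) Toy "handwritten digit" features (hypothetical stroke count, curvature).
digits = NearestCentroidClassifier().fit(
    [[1, 0], [1, 1], [4, 3], [4, 4]], ["1", "1", "8", "8"])
# 2) Toy spam features (hypothetical counts of links, ALL-CAPS words).
spam = NearestCentroidClassifier().fit(
    [[0, 0], [1, 0], [9, 7], [8, 6]], ["ham", "ham", "spam", "spam"])
```

Nothing in the classifier mentions digits or emails; only the data fed to `fit` changes, which is the point of the paragraph above.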
Of course, gathering and maintaining large collections of data may seem neat in itself, but extracting useful information from those collections is far more challenging, and far more interesting. Big Data not only changes the tools one can use for advanced analytics; it also changes our entire way of thinking about knowledge extraction and interpretation. Traditionally, data science has been dominated by trial-and-error analysis, an approach that becomes impossible when datasets are large and heterogeneous. Ironically, the availability of more data usually leads to fewer options in constructing models, because very few tools allow for processing large datasets in a reasonable timeframe. In addition, traditional methodological solutions typically focus on static analytics, limited to samples that are frozen in time and often yielding equally static conclusions.
Machine learning can offer intelligent alternatives that overcome these problems. At the cutting edge of statistics, computer science and emerging applications in industry, the machine learning community focusses on the development of fast and efficient algorithms for real-time processing of data with the ultimate goal of delivering accurate outputs. To name only a few applications, think of business cases such as product recommendation, fraud detection or churn prevention. Machine learning techniques can solve such applications using a set of generic methods that differ from more traditional techniques. The emphasis is on real-time and highly scalable predictive analytics, using fully automated and generic methods that simplify some of the typical data scientist’s tasks.
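To make "churn prevention" slightly more concrete, here is a minimal sketch assuming a tiny logistic-regression model trained by gradient descent. The customer features (support tickets, months since last purchase) and the training data are invented for illustration; a production system would use a proper library and far richer signals.

```python
import math

def train_churn_model(X, y, lr=0.1, epochs=2000):
    """Fit a tiny logistic-regression churn model with gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, features))
            p = 1 / (1 + math.exp(-z))   # predicted churn probability
            err = p - label              # gradient of the log-loss
            w = [wi - lr * err * xi for wi, xi in zip(w, features)]
            b -= lr * err
    return w, b

def churn_probability(w, b, features):
    z = b + sum(wi * xi for wi, xi in zip(w, features))
    return 1 / (1 + math.exp(-z))

# Hypothetical history: (support tickets, months since last purchase) -> churned?
history = [([0, 1], 0), ([1, 1], 0), ([5, 10], 1), ([6, 12], 1)]
X = [features for features, _ in history]
y = [label for _, label in history]
w, b = train_churn_model(X, y)
```

The method itself is generic, as the paragraph above suggests: swap the churn labels for fraud flags and the same training loop becomes a (crude) fraud detector.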
I think the long-term future of machine learning is very bright: it is already an incredibly powerful tool that does a surprisingly good job of solving complex day-to-day problems. But almost all machine learning algorithms can make errors, so I have been following work on how to keep humans in the loop in order to mitigate the consequences of those errors. It is also worth noting that sometimes we want to solve specific problems rather than invent general paradigms or clever solutions that may be tricky to conjure.
Finally, it’s important to briefly mention transparency and accountability. Although I’m a firm believer in interdisciplinary teams, it’s likely that many machine learning methods will, at some point, be used by social scientists, policy-makers, or other end-users, without the involvement of the developers. If we want people to draw responsible conclusions using our models and tools, then we need people to understand how they work, rather than treating them as infallible “black boxes.” This means not only publishing academic papers and making research code available, but also explaining our models and tools to general audiences and, when doing so, focusing on elucidating implicit assumptions, best practices for selecting and deploying them, and the types of conclusions they can and can’t be used to draw. Here we can lean on the field of statistics, whose core techniques thankfully do not change with time: a regression or a cluster analysis works the same way regardless of the volume of data we’re dealing with.
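That stability is easy to demonstrate: the ordinary least-squares formula for a simple regression is identical whether we hand it five points or a hundred thousand. A minimal pure-Python sketch on synthetic data (the line y = 2x + 1, invented for illustration):

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit y ~ a*x + b: the same closed-form
    formula regardless of how many points we feed it."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b

# The estimator is unchanged whether we pass 5 points or 100,000.
small_xs = [1, 2, 3, 4, 5]
small_ys = [2 * x + 1 for x in small_xs]
big_xs = list(range(100000))
big_ys = [2 * x + 1 for x in big_xs]
```

Both calls recover the same slope and intercept; only the engineering around the computation, not the statistics, changes with scale.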
While the potential of the marriage of big data and machine learning is enormous, we must ensure that concerns over data privacy, ownership and accountability are resolved. However, by responsibly making big data meaningful, I truly believe that it can help us to build a promising future for our wired world.