Machine Learning can be thought of as a set of techniques and tools that allow computers to "think" by making predictions or determining patterns in data. These machines use mathematical algorithms as vehicles for scalable self-learning. At its most basic, it is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world. Rather than hand-coding software routines with a specific set of instructions to accomplish a particular task, the machine is "trained" using large amounts of data and algorithms that give it the ability to learn how to perform the task.
Machine Learning is a field of computer science concerned with the design and implementation of algorithms that optimize a performance criterion. Computers are programmed to learn from some combination of example (training) data, prior domain knowledge, and trial and error reinforcement, with the goal to predict, classify, or cluster new data. Machine learning is founded in math, expressed in code, and productionized into tools.
The core task of machine learning is making inference from a sample – that is, generalizing from limited sets of data. By developing programs that automatically improve with experience, machine learning procedures adapt themselves to changing conditions and detect and extrapolate meaningful patterns from data.
Usefulness of Machine Learning
As Arthur Samuel, a pioneer of Artificial Intelligence, noted: "Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed." This is when Machine Learning is most useful; specifically, when a data problem cannot be solved directly.
Computers require sequences of instructions to produce output from a given input. Yet there are some tasks for which an algorithm is not evident or changes over time, such as speech recognition or detecting fraudulent credit card transactions. Even though the input data is available and a practitioner may know what the output should be like, it is not apparent how to transform the input to output.
Given enough example data a machine learning algorithm can learn what the output should be. This is because it is assumed that there is a process that explains the observed data – that a generative process exists that is not completely random – and that patterns in the data can be extracted. While this mechanism cannot be directly defined, a good and useful approximation can be constructed.
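As a minimal sketch of constructing such an approximation, the toy example below fits a line to noisy observations of an unknown generative process using ordinary least squares. The process (y = 2x + 1 plus noise) and all data are hypothetical, invented purely for illustration; the point is that the learner recovers the parameters from examples alone, without being told them:

```python
import random

# Hypothetical generative process: y = 2x + 1 plus Gaussian noise.
# The learner is never handed the slope or intercept; it must
# approximate them from observed examples.
random.seed(0)
data = [(x, 2 * x + 1 + random.gauss(0, 0.1)) for x in range(20)]

# Ordinary least squares in closed form: one simple way to build
# "a good and useful approximation" of the underlying process.
n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # close to 2 and 1
```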
Predictive vs. Descriptive
A Machine Learning model can be predictive, with an aim toward predictive accuracy in the future, or descriptive, with the goal being knowledge gain, or both.
This distinction can also be thought of in terms of the type of learning: Supervised or Unsupervised. Supervised Learning is a category of learning techniques where the algorithms are "trained" on an initial set of labeled data, inputs paired with known outputs, and then tested on a brand new set of data. Unsupervised Learning takes place when there is no particular thing being predicted; rather, a practitioner is simply looking for patterns that emerge naturally from the data.
An arguably more useful way to describe this distinction in Machine Learning is to divide the space into predictive learning and representation learning. In predictive learning, data drawn from some distribution is observed and the goal is to predict some aspect of the observable values by way of a well-defined predictive task (e.g. Random Forest, Neural Nets). In representation learning, the goal isn't to predict, but rather to uncover something about the underlying structure such that this representation can help answer various queries going forward (e.g. Clustering, Dimensionality Reduction).
The general approach of a predictive Machine Learning task is to start with a training phase, during which a model is initialized and "trained" on example data. Next comes a testing phase, in which new, held-out data is used to evaluate how good the model is. Cross-validation should be performed during this evaluation, with performance measured by some error rate on the optimized criterion.
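The train-then-test loop described above can be sketched in pure Python. Everything here is a hypothetical illustration: the 1-D toy data, the label rule, and the 1-nearest-neighbour model are invented for clarity, not recommended choices. The sketch runs 5-fold cross-validation, so each slice of the data serves once as the test set:

```python
import random

# Hypothetical labeled data: 1-D points, class 1 if x > 5.
random.seed(1)
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [int(x > 5) for x in xs]

def nearest_neighbour(train, x):
    # "Training" a 1-nearest-neighbour model is just memorising the
    # examples; prediction copies the label of the closest one.
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# 5-fold cross-validation: each fold is held out once for testing.
k, n = 5, len(xs)
fold_errors = []
for fold in range(k):
    test_idx = set(range(fold * n // k, (fold + 1) * n // k))
    train = [(xs[i], ys[i]) for i in range(n) if i not in test_idx]
    errors = sum(nearest_neighbour(train, xs[i]) != ys[i] for i in test_idx)
    fold_errors.append(errors / len(test_idx))

mean_error = sum(fold_errors) / k
print(mean_error)  # mean error rate across the folds; low for this easy task
```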
For representation Machine Learning, in the absence of labeled data, the task is to uncover a representation of the data by learning how the observations in the data are similar to or different from one another and grouping the data accordingly, extracting the most descriptive features, or finding common attributes or latent classes. Here there is no standard way to measure performance.
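As one illustration of uncovering structure without labels, the sketch below runs a bare-bones k-means on hypothetical 1-D data drawn from two well-separated groups. Real implementations add smarter initialisation, convergence checks, and support for more dimensions; this is only the core assign-then-update idea:

```python
import random

# Hypothetical unlabeled data: two well-separated groups of 1-D points.
random.seed(2)
data = ([random.gauss(0, 0.5) for _ in range(50)]
        + [random.gauss(10, 0.5) for _ in range(50)])

# k-means with k = 2: alternately assign each point to its nearest
# centre, then move every centre to the mean of its assigned points.
centres = [data[0], data[1]]  # naive initialisation from the data
for _ in range(10):
    clusters = [[], []]
    for x in data:
        nearest = min(range(2), key=lambda j: abs(x - centres[j]))
        clusters[nearest].append(x)
    centres = [sum(c) / len(c) if c else centres[j]
               for j, c in enumerate(clusters)]

print(sorted(round(c, 1) for c in centres))  # near the group means, 0 and 10
```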
Feature extraction and selection are perhaps the most important, and often overlooked, steps of Machine Learning. Better features often matter more than better algorithms. It is important to lean on domain expertise when selecting features, and most of the energy spent on a model should go into finding better ways to describe the domain while staying clear and realistic about which features can actually be learned.
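A toy illustration of why features matter; the circle-labelled data below is entirely hypothetical. A threshold rule on a raw coordinate performs poorly, while the same kind of rule on an engineered feature (the squared radius) solves the task outright:

```python
import random

# Hypothetical task: points labelled by whether they fall inside the
# unit circle. Labels are defined by the rule x^2 + y^2 < 1.
random.seed(5)
points = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(200)]
labels = [int(x * x + y * y < 1) for x, y in points]

# Raw feature: a simple threshold on the x coordinate alone.
raw_accuracy = sum(
    int(abs(x) < 1) == lab for (x, y), lab in zip(points, labels)) / 200

# Engineered feature: r = x^2 + y^2 makes one threshold sufficient.
engineered_accuracy = sum(
    int(x * x + y * y < 1) == lab for (x, y), lab in zip(points, labels)) / 200

print(raw_accuracy, engineered_accuracy)  # the engineered feature wins
```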
The Black Box
In a predictive model, Machine Learning algorithms are sometimes referred to as a "black box." In these cases a practitioner doesn't usually focus on interpreting the parameters or on explaining the underlying generative process, and when they do, it is with an eye toward manually tuning the features or model with the sole aim of increasing predictive power.
Machine Learning In Context
In contrast with "learning" as a human behavior, where skills and knowledge are flexibly developed and are applicable to a wide array of contexts and problems, machine learning is about figuring out patterns in incoming information that correspond to a very specific result. As such, Machine Learning algorithms almost always require a significant body of data with which to tune the model.
Machine Learning can be considered a relative of Artificial Intelligence insofar as it is interested in intelligent systems that have the ability to learn and adapt in a changing environment. A model is defined with some parameters and the job of the learning is to optimize these parameters or improve upon a performance metric.
A term sometimes conflated with Machine Learning is Data Mining. Data Mining is the process of extracting useful models of data, typically from large databases. One way to discover these models is to use algorithms from Machine Learning to mine the data. The term "mining" is used for its analogous relationship to extracting raw material from a mine and processing it into a small amount of precious metal. In Data Mining, volumes of data are processed to build simple yet useful models. Other examples of Data Mining techniques include Summarization and Feature Extraction.
Deep Learning & Neural Networks
Deep Learning is a branch of Machine Learning that learns feature hierarchies or representations. It is often based on artificial neural networks and commonly used in fields such as natural language processing, computer vision, and bioinformatics. A Neural Network is a computer system designed to classify information in a way loosely inspired by the human brain. It can be taught to recognize, for example, images, and classify them according to the elements they contain.
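The sketch below trains a single artificial neuron, the building block that such networks stack into layers, on the logical OR function. The data, learning rate, and epoch count are illustrative choices only; a real network would have many such units and a proper training framework:

```python
import math
import random

# One artificial neuron: weighted inputs pass through a squashing
# (sigmoid) function, and the weights are nudged to reduce error
# on each labeled example (stochastic gradient descent).
random.seed(3)
w = [random.uniform(-1, 1) for _ in range(3)]  # two weights plus a bias

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Training data: the logical OR function.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

for _ in range(5000):
    for (x1, x2), target in examples:
        out = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
        err = target - out
        # Gradient step for the logistic (cross-entropy) loss.
        for i, xi in enumerate((x1, x2, 1)):
            w[i] += 0.5 * err * xi

preds = [round(sigmoid(w[0] * x1 + w[1] * x2 + w[2]))
         for (x1, x2), _ in examples]
print(preds)  # the neuron has learned OR: [0, 1, 1, 1]
```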
Machine learning approaches are widely used in the domain of Text Analytics.
In Machine Learning, Ensembling is when multiple methods are used together to gain increasing predictive power over what could be learned from any one algorithm alone.
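A minimal simulation of that idea, using hypothetical weak classifiers that are each correct 70% of the time independently: majority-voting three of them yields higher accuracy than any one alone, which is the core intuition behind ensembles such as random forests:

```python
import random

random.seed(4)

def weak_classifier(truth):
    # A hypothetical weak learner: returns the true binary label 70%
    # of the time and the wrong label otherwise, independently.
    return truth if random.random() < 0.7 else 1 - truth

trials = 10_000
single_correct = ensemble_correct = 0
for _ in range(trials):
    truth = random.randint(0, 1)
    votes = [weak_classifier(truth) for _ in range(3)]
    majority = 1 if sum(votes) >= 2 else 0
    single_correct += votes[0] == truth
    ensemble_correct += majority == truth

# A lone learner scores near 0.70; the majority vote scores near 0.78.
print(single_correct / trials, ensemble_correct / trials)
```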
A Machine Learning topology
Ipsos Point Of View
Machine Learning can be defined very broadly, in the sense of any method in which a computer can improve how well it does on a task with additional experience. On the narrower extreme, or in popular literature, the term is sometimes used as an exotic buzzword and confused with Artificial Intelligence of a level that does not actually exist with current technology.
For the analysis of survey, transactional or behavioral data as typically collected by Ipsos, machine learning falls somewhere in between these two extremes and may be best thought of as a diverse collection of advanced, often computationally intensive algorithms. These programs generally run in an iterative manner, checking some measure of performance as they run and continuing as necessary to improve it. By this definition, methods that reduce to a relatively simple formulaic solution (as in much of traditional statistics) would not be considered Machine Learning.
Some methods known to be used at Ipsos include random forest and support vector machine models for making predictions; k-means, self-organizing maps, affinity propagation, and hierarchical clustering methods for segmentation; and Bayesian networks as a component of drivers analysis. Again, as defined here, more traditional methods like linear regression would not be considered Machine Learning. If a client has questions about these or other specific techniques, please consult your local marketing science team or a global team such as the Ipsos Science Centre.
It should be noted that Machine learning is not a magic hammer for every nail. If variables are weak or unrelated to an outcome, there is little that advanced algorithmic techniques such as Machine Learning can do to fix that. In these cases, major model improvements are more likely to come from thinking more carefully about which variables to include, or how to define them.
Another word of caution is that some clients initially view advanced techniques in terms of the Black Box concept detailed above. This may lead clients to think they cannot understand or explain the analysis to their stakeholders. A marketing science expert can often provide that understanding, and in doing so help strengthen Ipsos' position as a thought leader, but one should be prepared for this critique.
Lastly, some Supervised Learning methods produce, along with their predictions, various measures of predictor variable "importance" in the model. These are often complex or academic, rather than practical, measures of importance, and should not be confused with nor used as drivers analysis scores.