This is a generic, practical approach that can be applied to most machine learning problems:
1-Categorize the problem
Start by categorizing the problem along two axes: the input and the output.
Categorize by the input: If the data is labeled, it’s a supervised learning problem. If the data is unlabeled and the goal is to find structure, it’s an unsupervised learning problem. If the solution involves optimizing an objective function by interacting with an environment, it’s a reinforcement learning problem.
Categorize by the output: If the output of the model is a number, it’s a regression problem. If the output is a class, it’s a classification problem. If the output is a set of groups of the inputs, it’s a clustering problem.
2-Understand Your Data
Data itself is not the end game, but rather the raw material in the whole analysis process. Successful companies not only capture and have access to data, but they’re also able to derive insights that drive better decisions, which result in better customer service, competitive differentiation, and higher revenue growth. The process of understanding the data plays a key role in the process of choosing the right algorithm for the right problem. Some algorithms can work with smaller sample sets while others require tons and tons of samples. Certain algorithms work with categorical data while others like to work with numerical input.
Analyze the Data
In this step, there are two important tasks: understanding the data with descriptive statistics, and understanding it through visualizations and plots.
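Both tasks can be sketched with nothing but the standard library; the fruit weights below are made-up illustrative numbers:

```python
# Descriptive statistics plus a quick text histogram (a stand-in for a plot).
# The "weights" list is invented illustrative data.
import statistics

weights = [112, 98, 105, 131, 87, 120, 99, 143]

# Descriptive statistics
print("mean:", statistics.mean(weights))
print("median:", statistics.median(weights))
print("stdev:", round(statistics.stdev(weights), 2))

# A crude histogram gives a first look at the distribution
low, high, bins = min(weights), max(weights), 4
width = (high - low) / bins
for i in range(bins):
    lo = low + i * width
    count = sum(1 for w in weights if lo <= w < lo + width or (i == bins - 1 and w == high))
    print(f"{lo:6.1f}-{lo + width:6.1f} | {'#' * count}")
```

In practice you would reach for pandas and matplotlib here, but the two tasks are exactly these: summarize numerically, then look at the shape of the data.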
Process the data
Data processing includes pre-processing, profiling, and cleansing; it often also involves pulling together data from different internal systems and external sources.
Transform the data
The traditional idea of transforming data from a raw state to a state suitable for modeling is where feature engineering fits in; in fact, transforming data and feature engineering may be synonyms. Here is Jason Brownlee’s definition of the latter: “Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.”
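As a small, concrete example of such a transformation, here is standardization (rescaling a raw feature to zero mean and unit variance) in plain Python; the raw values are made up:

```python
# Standardizing a raw numeric feature: a common, simple transformation that
# makes data better suited to many predictive models.
# The raw values are invented illustrative data.
import statistics

raw = [2.0, 4.0, 6.0, 8.0]

mean = statistics.mean(raw)
stdev = statistics.pstdev(raw)   # population standard deviation

standardized = [(x - mean) / stdev for x in raw]
print(standardized)
```

The transformed feature has mean 0 and standard deviation 1, which keeps features on very different scales from dominating one another.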
3-Find the available algorithms
After categorizing the problem and understanding the data, the next milestone is identifying the algorithms that are applicable and practical to implement in a reasonable time. Some of the elements affecting the choice of a model are:
- The accuracy of the model.
- The interpretability of the model.
- The complexity of the model.
- The scalability of the model.
- How long does it take to build, train, and test the model?
- How long does it take to make predictions using the model?
- Does the model meet the business goal?
4-Implement machine learning algorithms.
Set up a machine learning pipeline that compares the performance of each algorithm on the dataset using a set of carefully selected evaluation criteria. Another approach is to use the same algorithm on different subgroups of the dataset. The best solution is to do this once, or to have a service running that repeats it at intervals as new data is added.
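A toy sketch of that comparison idea, using two stand-in “algorithms” (a majority-class baseline and a nearest-centroid rule) on made-up 1-D data; a real pipeline would plug in actual library models and metrics:

```python
# Compare several algorithms on one dataset with one shared metric (accuracy).
# Both classifiers and the data are invented toy stand-ins, not library models.
# 1-D feature (e.g. fruit size) with binary labels: 0 = apple, 1 = pear.
train = [(6, 0), (7, 0), (8, 0), (12, 1), (13, 1), (14, 1)]
test = [(6.5, 0), (7.5, 0), (12.5, 1), (13.5, 1)]

def majority_baseline(train):
    # Always predict the most common training label
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return lambda x: guess

def nearest_centroid(train):
    # Predict the label whose class mean is closest to the input
    centroids = {}
    for label in {y for _, y in train}:
        xs = [x for x, y in train if y == label]
        centroids[label] = sum(xs) / len(xs)
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

for name, builder in [("majority", majority_baseline), ("centroid", nearest_centroid)]:
    print(name, accuracy(builder(train), test))
```

The point is the shape of the loop: same data, same metric, several candidate algorithms, one comparable number per algorithm.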
5-Optimize hyperparameters. There are three main options for optimizing hyperparameters: grid search, random search, and Bayesian optimization.
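The first two options can be sketched in a few lines; the objective function below is a made-up stand-in for “train the model and return its validation score”:

```python
# Grid search versus random search over one hyperparameter (a learning rate).
# validation_score is an invented stand-in objective peaking at lr = 0.1;
# a real setup would train and evaluate an actual model instead.
import random

def validation_score(lr):
    return -(lr - 0.1) ** 2

# Grid search: try every value on a fixed grid
grid = [0.001, 0.01, 0.1, 1.0]
best_grid = max(grid, key=validation_score)

# Random search: sample candidates from a (log-uniform) range
random.seed(0)
candidates = [10 ** random.uniform(-3, 0) for _ in range(20)]
best_random = max(candidates, key=validation_score)

print("grid best:", best_grid)
print("random best:", round(best_random, 4))
```

Bayesian optimization goes one step further and uses past evaluations to decide where to sample next, which matters when each evaluation is an expensive training run.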
Types of machine learning tasks
- Supervised learning
- Unsupervised learning
- Reinforcement learning
Supervised learning is so named because the human being acts as a guide to teach the algorithm what conclusions it should come up with. Supervised learning requires that the algorithm’s possible outputs are already known and that the data used to train the algorithm is already labeled with correct answers. If the output is a real number, we call the task regression. If the output comes from a limited set of unordered values, it’s classification.
Unsupervised machine learning is more closely aligned with what some call true artificial intelligence — the idea that a computer can learn to identify complex processes and patterns without a human to provide guidance along the way. There is less information about the objects; in particular, the training set is unlabeled. It’s possible to observe similarities between groups of objects and include them in appropriate clusters. Some objects can differ hugely from all clusters; these objects are treated as anomalies.
Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective or maximize along a particular dimension over many steps — for example, maximize the points won in a game over many moves. It differs from supervised learning: in supervised learning the training data comes with an answer key, so the model is trained on the correct answers themselves, whereas in reinforcement learning there is no answer and the agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its own experience.
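A minimal sketch of that learning-from-experience loop is an epsilon-greedy agent on a two-armed bandit; the reward probabilities are invented for illustration:

```python
# Learning from experience rather than labeled answers: an epsilon-greedy
# agent on a two-armed bandit. The win probabilities are made up; a real
# problem would have a real environment.
import random

random.seed(1)
true_win_prob = [0.3, 0.7]   # arm 1 is better, but the agent doesn't know that
estimates = [0.0, 0.0]       # the agent's value estimate for each arm
pulls = [0, 0]

for step in range(2000):
    if random.random() < 0.1:                  # explore a random arm
        arm = random.randrange(2)
    else:                                      # exploit the current best estimate
        arm = estimates.index(max(estimates))
    reward = 1 if random.random() < true_win_prob[arm] else 0
    pulls[arm] += 1
    # Incremental average: nudge the estimate toward the observed reward
    estimates[arm] += (reward - estimates[arm]) / pulls[arm]

print("estimates:", [round(e, 2) for e in estimates])
print("pulls:", pulls)
```

No one ever tells the agent which arm is correct; it discovers the better arm purely from the rewards it receives.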
Commonly used machine learning algorithms
1-Linear Regression
Linear regression is a statistical method that lets us summarize and study the relationship between two continuous (quantitative) variables: one variable, denoted X, is regarded as the independent variable, and the other, denoted y, as the dependent variable. Linear regression uses one independent variable X to explain or predict the outcome of the dependent variable y, while multiple regression uses two or more independent variables, fitting the model according to a loss function such as mean squared error (MSE) or mean absolute error (MAE). So whenever you are asked to predict some future value of a running process, you can go with a regression algorithm. Despite its simplicity, this algorithm works quite well when there are thousands of features, for example a bag of words or n-grams in natural language processing. More complex algorithms tend to overfit when there are many features and the dataset is not huge, while linear regression provides decent quality. However, it is unstable when features are redundant.
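For the single-variable case, the least-squares fit has a closed form; here is a sketch on made-up (X, y) points:

```python
# Simple linear regression via the closed-form least-squares solution
# (the line minimizing MSE). The (X, y) points are invented toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(X, y) / variance(X); intercept follows from the means
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

predict = lambda x: intercept + slope * x
print(round(slope, 3), round(intercept, 3), round(predict(6.0), 3))
```

Multiple regression generalizes this to several X columns, where the closed form becomes the normal equations (or is replaced by gradient descent).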
2-Logistic Regression
Don’t confuse this classification algorithm with regression methods just because “regression” appears in its name. Logistic regression performs binary classification, so the label outputs are binary. We can also think of logistic regression as a special case of linear regression for a categorical output variable, where we use the log of odds as the dependent variable. What’s awesome about logistic regression? It takes a linear combination of features and applies a nonlinear function (the sigmoid) to it, so it’s a tiny instance of a neural network!
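That “linear combination plus sigmoid” idea in code, with made-up weights rather than a trained model:

```python
# The core of logistic regression: a weighted sum of features squashed
# through the sigmoid to give a probability. The weights, bias, and
# features are invented numbers, not the result of training.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

weights = [0.8, -0.5]
bias = 0.1
features = [2.0, 1.0]

z = sum(w * x for w, x in zip(weights, features)) + bias   # linear part
p = sigmoid(z)                                             # nonlinear part
label = 1 if p >= 0.5 else 0
print(round(p, 3), label)
```

Training consists of finding the weights and bias that make these probabilities match the labeled data, typically by maximizing the log-likelihood.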
3-K-means Clustering
Say you have a lot of data points (measurements of fruits) and you want to separate them into two groups: apples and pears. K-means clustering is a clustering algorithm used to automatically divide a large group into smaller groups.
The name comes from choosing K groups; in our example, K=2. You take the average of each group to improve its accuracy (the average is the mean, and you do this several times). A cluster is just another name for a group.
Let’s say you have 13 data points, which are actually seven apples and six pears (but you don’t know this), and you want to divide them into two groups. For this example, assume that all the pears are larger than all the apples. You select two random data points as starting positions. Then you compare every other point to these two and find which starting position it is closest to. This is your first pass at clustering, and it is the slowest part.
You have your initial groups, but because you chose randomly, they are probably inaccurate. Say you got six apples and one pear in one group, and one apple and five pears in the other. So you take the average of all the points in one group and use it as a new starting point for that group, and do the same for the other group. Then you do the clustering again to get new groups.
Success! Because each average is closer to the majority of its cluster, on the second go-around you get all apples in one group and all pears in the other. How do you know you’re done? You compute the averages and perform the grouping again, and check whether any points changed groups. None did, so you’re finished. Otherwise, you’d go again.
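The walkthrough above can be sketched as a 1-D K-means on made-up fruit sizes (seven smaller apples, six larger pears), with K=2 and deliberately poor starting points:

```python
# A minimal 1-D K-means matching the walkthrough: 13 fruit sizes, K=2.
# The sizes and starting centers are invented illustrative numbers.
sizes = [5, 6, 6, 7, 7, 8, 8, 12, 13, 13, 14, 14, 15]
centers = [6, 8]   # two starting points (deliberately poor picks)

for _ in range(10):
    # Assign each point to its nearest center
    groups = [[], []]
    for s in sizes:
        nearest = min(range(2), key=lambda i: abs(s - centers[i]))
        groups[nearest].append(s)
    # Move each center to the mean of its group
    new_centers = [sum(g) / len(g) for g in groups]
    if new_centers == centers:   # nothing moved: we're done
        break
    centers = new_centers

print("centers:", centers)
print("groups:", groups)
```

After a couple of passes the seven small sizes and six large sizes end up in separate clusters, exactly as in the apples-and-pears story.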
4-K-nearest Neighbors
Straight away, K-nearest neighbors and K-means seek to accomplish different goals. K-nearest neighbors is a classification algorithm, a form of supervised learning. K-means is a clustering algorithm, a form of unsupervised learning.
If we have a dataset of football players, their positions, and their measurements, and we want to assign positions to football players in a new dataset where we have measurements but no positions, we might use K-nearest neighbors.
On the other hand, if we have a dataset of football players who need to be grouped into K distinct groups based on similarity, we might use K-means. Correspondingly, the K in each case means something different!
In K-nearest neighbors, K represents the number of neighbors who get a vote in determining a new player’s position. Consider the example where K=5: if we have a new football player who needs a position, we take the five players in our dataset with measurements closest to the new player’s, and have them vote on the position we should assign.
In K-means, K is the number of clusters we want to end up with. If K=7, I will have seven clusters, or distinct groups, of football players after running the algorithm on my dataset. In the end: two different algorithms with two very different purposes, but the fact that both use K can be very confusing.
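The voting idea can be sketched as follows; the player measurements and positions are made-up toy data, and K=5 as in the example above:

```python
# K-nearest neighbors by distance-sorted voting, K=5.
# The (height_cm, weight_kg) -> position data is invented toy data.
from collections import Counter

players = [
    ((185, 85), "keeper"), ((188, 88), "keeper"), ((190, 90), "keeper"),
    ((170, 68), "winger"), ((172, 70), "winger"), ((168, 66), "winger"),
    ((186, 84), "keeper"), ((171, 69), "winger"),
]

def knn_predict(new_point, players, k=5):
    def dist(p):
        # Squared Euclidean distance is enough for ranking neighbors
        return sum((a - b) ** 2 for a, b in zip(p, new_point))
    neighbors = sorted(players, key=lambda item: dist(item[0]))[:k]
    votes = Counter(pos for _, pos in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((169, 67), players))
```

The five closest labeled players vote, and the majority position wins; no training step is needed beyond storing the labeled data.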
5-Support Vector Machines
SVM uses hyperplanes (straight things) to separate two differently labeled sets of points (X’s and O’s). Sometimes the points can’t be separated by straight things, so we map them to a higher-dimensional space (using kernels!) where they can be split by straight things (hyperplanes!). This looks like a curvy line in the original space, even though it is really a straight thing in a much higher-dimensional space!
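The kernel intuition can be illustrated without a real SVM: the made-up 1-D points below cannot be split by any single threshold, but after mapping x → x² one threshold separates them (a real SVM does this implicitly through a kernel function):

```python
# Kernel-trick intuition: inner X's and outer O's on a line are not
# separable by one threshold, but become separable after mapping x -> x**2.
# The points are invented; a real SVM applies such mappings via kernels.
points = [(-3, "O"), (-2, "O"), (-1, "X"), (1, "X"), (2, "O"), (3, "O")]

def separable_by_threshold(data):
    """Can any single cut on this feature split the two labels?"""
    for t in sorted({x for x, _ in data}):
        left = {lab for x, lab in data if x <= t}
        right = {lab for x, lab in data if x > t}
        if len(left) == 1 and len(right) == 1:
            return True
    return False

original = points
mapped = [(x ** 2, lab) for x, lab in points]   # lift to a "higher" space

print("separable in original space:", separable_by_threshold(original))
print("separable after mapping x -> x^2:", separable_by_threshold(mapped))
```

The straight cut in the mapped space corresponds to a curvy boundary (here, a pair of cuts) back in the original space.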
6-Decision Trees and Random Forests
Let’s say we want to know when to invest in Procter & Gamble, so we have three choices — buy, sell, and hold — based on data from the past month such as open price, close price, change in price, and volume.
Imagine you have a lot of entries, 900 points of data.
We want to build a decision tree to decide the best strategy; for example, if the price of the stock changes more than ten percent from the day before, with high volume, we buy the stock. But we don’t know which features to use; we have a lot.
So we take a random set of features and a random sample of our training set and build a decision tree. Then we do the same many times, using a different random set of features and a different random sample of data each time. At the end we have many decision trees; we use each of them to forecast the price and then decide the final prediction by a simple majority. This ensemble is known as a random forest.
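A sketch of the bagging-and-voting part of that idea, with tiny one-feature “decision stumps” instead of full trees (so there is no random feature subset here) and made-up price-change data:

```python
# Bagging: train many tiny "decision stumps" on random resamples of the
# data, then take a majority vote. The percent-price-change data and
# labels (1 = buy, 0 = hold) are invented toy numbers.
import random
from collections import Counter

random.seed(2)
data = [(-3, 0), (-2, 0), (-1, 0), (1, 0), (5, 1), (8, 1), (11, 1), (12, 1)]

def train_stump(sample):
    """Pick the threshold that best splits this random sample."""
    best_t, best_acc = None, -1.0
    for t, _ in sample:
        acc = sum((x > t) == bool(y) for x, y in sample) / len(sample)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x > best_t)

# Each stump sees its own bootstrap sample (drawn with replacement)
forest = [train_stump(random.choices(data, k=len(data))) for _ in range(25)]

def forest_predict(x):
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

print(forest_predict(10), forest_predict(-2))
```

Each individual stump is weak and noisy, but the majority vote over 25 of them is far more stable — the same reason a random forest beats a single tree.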
7-Neural Networks
A neural network is a form of artificial intelligence. The basic idea is to simulate lots of densely interconnected brain cells inside a computer so that it can learn things, recognize patterns, and make decisions in a human-like way. The amazing thing about a neural network is that you don’t have to program it explicitly: it learns all by itself, just like a brain!
On one side of the neural network are the inputs. These could be a picture, data from a drone, or the state of a Go board. On the other side are the outputs — what the network decides to do. In between are nodes and the connections between them. The strength of the connections determines what output is called for based on the inputs.
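A minimal forward pass showing inputs flowing through weighted connections to an output; the weights are arbitrary constants rather than learned values:

```python
# A tiny forward pass: inputs -> two hidden nodes -> one output node.
# All weights here are invented constants; a real network would learn
# them from data via backpropagation.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(inputs, hidden_weights, output_weights):
    # Each hidden node: weighted sum of the inputs, then a nonlinearity
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_weights]
    # Output node: weighted sum of the hidden activations
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

inputs = [0.5, -0.2]
hidden_weights = [[0.4, 0.9], [-0.7, 0.1]]   # one weight row per hidden node
output_weights = [1.2, -0.8]

print(round(forward(inputs, hidden_weights, output_weights), 3))
```

Learning is then just adjusting those connection strengths so that, over many examples, the outputs move toward the desired answers.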
Thanks for reading. If you loved this article, feel free to hit that subscribe button so we can stay in touch.