Supervised vs. Unsupervised Learning in Machine Learning
There is more than one way to build a piece of furniture. The end result should be the same each time: a completed desk or bookcase, for example. How you get there, though, may not be straightforward.
If you have the instruction manual, it is as simple as following the steps. Maybe you have built one before and decide to go from memory. Or maybe it never came with a manual and you have to figure it out through trial and error.
Machine learning works in a similar fashion. The end result you are going for is a well-trained algorithm, but there are different ways to get there. Two of the most common are supervised and unsupervised learning.
Much like it sounds, supervised learning means something is watching over the results to determine whether they are correct or not. In machine learning, this means that the algorithm needs a complete set of labeled training data to work with.
Labeled training data means that each example is tagged with the correct answer, the outcome you want the algorithm to reach. For example, if you were trying to train an algorithm to correctly recognize photos of sofas, the training data would tell it which supplied pictures were sofas rather than chairs or bookcases. The algorithm then compares each new image to what it learned from the original images and makes a prediction.
Supervised learning is most commonly used for regression and classification.
In classification, the algorithm tries to analyze the new data it is fed, then assign it to a certain group or class. Sticking with the furniture analogy, the labeled training data would tell the algorithm which pictures were of sofas, which were chairs, and which were bookcases. The algorithm then takes new images, attempts to classify them into one of those groups, and is scored based on how accurately it did so.
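As a rough illustration of that train, classify, and score loop, here is a minimal sketch. It makes several assumptions that are not in the text above: it uses width and height measurements instead of photos, the numbers are invented, and the classifier is a simple 1-nearest-neighbor rule chosen only because it is short.

```python
import math

# Hypothetical labeled training data: (width_cm, height_cm) -> furniture type.
TRAINING = [
    ((200.0, 85.0), "sofa"),
    ((220.0, 90.0), "sofa"),
    ((60.0, 95.0), "chair"),
    ((55.0, 100.0), "chair"),
    ((80.0, 190.0), "bookcase"),
    ((90.0, 200.0), "bookcase"),
]


def classify(item, training=TRAINING):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training, key=lambda pair: math.dist(pair[0], item))
    return nearest[1]


def accuracy(test_set, training=TRAINING):
    """Fraction of labeled test items that the classifier labels correctly."""
    correct = sum(
        1 for features, label in test_set
        if classify(features, training) == label
    )
    return correct / len(test_set)


# Hypothetical held-out items with known labels, used only for scoring.
TEST = [
    ((210.0, 88.0), "sofa"),
    ((58.0, 97.0), "chair"),
    ((85.0, 195.0), "bookcase"),
]

predictions = [classify(features) for features, _ in TEST]
score = accuracy(TEST)
print(predictions, score)
```

The labels in `TRAINING` are what make this supervised: without them, `classify` would have nothing to return and `accuracy` would have nothing to check against.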
Regression, on the other hand, predicts a continuous value, such as a price, rather than a category. It is useful when you have numeric measurements for each item instead of images. Maybe you don’t have photos of the different types of furniture, but information about the furniture instead. You could feed the algorithm the width, height, length, and weight of each item along with its known cost, and train it to predict the cost of a new item from those measurements.
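A minimal sketch of regression follows, assuming the goal is to predict an item's cost from its weight alone. The data values, the single-feature setup, and the names `fit_line` and `predict_cost` are all illustrative assumptions. The fitting method is ordinary least squares, one common choice among many.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: each item's weight and its known cost.
weights_kg = [10.0, 20.0, 30.0, 40.0]
costs_usd = [160.0, 240.0, 360.0, 440.0]

slope, intercept = fit_line(weights_kg, costs_usd)


def predict_cost(weight_kg):
    """Predict a continuous cost for a new item from the fitted line."""
    return intercept + slope * weight_kg


print(predict_cost(25.0))
```

Unlike a classifier, `predict_cost` can return any number along the fitted line, not just one of a fixed set of labels.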
Supervised learning is great when you have reference data to train with.
When you don’t have fully labeled training data, or are asking questions you don’t know the answer to, you’ll likely have to turn to unsupervised learning.
In unsupervised learning, the algorithm is given data with no labels and no known “right” answer. It is up to the algorithm to analyze the data for patterns, structure, commonalities, anomalies, and features, and then to indicate which data points it thinks belong together or are connected in some way, and which are not. There are quite a few ways it can do this:
Clustering: The algorithm tries to find data points that are similar and cluster them into groups. It may notice that sofas tend to be wide and short, while bookcases are tall and thin, and cluster those items together accordingly.
Anomaly detection: The algorithm looks for data points that differ drastically from the dataset as a whole. Nearly all sofas have 4 legs, so if you snuck in 1 photo of a legless sofa, the algorithm would mark it as anomalous.
Association: The algorithm uses some key data points to make assumptions about others. This is heavily used in e-commerce. If a website notices that people who have bought lots of pet food and are now buying a dark-colored sofa typically purchase lint rollers as well, it will note the association between pet ownership and cleaning up shed fur.
Autoencoders: The algorithm attempts to compress the data points, then rebuild the original input data from the compressed version. Imagine it took the original photo of a sofa and compressed it by 95%. That new, much blurrier image is treated as new input data. The algorithm would then use only that blurry image to attempt to reproduce a clean image as close to the original as possible.
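The first two approaches in this list can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the article does not name specific algorithms, so k-means stands in for clustering and a z-score test stands in for anomaly detection. The measurements, leg counts, starting centroids, and threshold are all invented.

```python
import math
import statistics


def assign(points, centroids):
    """Group each point with the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(p, centroids[i]),
        )
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points, move each centroid to the mean of
    its assigned points, and repeat a fixed number of times."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            tuple(statistics.fmean(dim) for dim in zip(*cluster))
            if cluster else old  # keep an empty cluster's centroid in place
            for cluster, old in zip(clusters, centroids)
        ]
    return assign(points, centroids), centroids


def anomalies(values, threshold=2.0):
    """Indexes of values more than `threshold` standard deviations
    away from the mean (a simple z-score test)."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / spread > threshold]


# Hypothetical unlabeled items as (width_cm, height_cm). No furniture types given.
furniture = [
    (200.0, 85.0), (220.0, 90.0), (210.0, 80.0),
    (80.0, 190.0), (90.0, 200.0), (85.0, 195.0),
]
# Assumed starting centroids: two of the data points themselves.
clusters, centroids = kmeans(furniture, [furniture[0], furniture[3]])

# Hypothetical leg counts for ten sofas; the last one is legless.
leg_counts = [4, 4, 4, 4, 4, 4, 4, 4, 4, 0]
odd_ones = anomalies(leg_counts)
print(clusters, centroids, odd_ones)
```

Note that neither function is ever told what a sofa or a bookcase is. The wide, short items and the tall, thin items end up in separate groups purely because of their measurements, and the legless sofa stands out only because it differs from the rest.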
There is no such thing as the perfect machine learning algorithm. The type of algorithm that should be used depends on a number of factors. What is the question being asked or the problem being solved? Do we have accurately labeled data to train it on? Do we even know what the right answer is, or are we just looking for patterns? Both supervised and unsupervised learning can produce amazing outcomes, but only if we know what we are using them for.
About the Company:
Peterson Technology Partners (PTP) has been Chicago's premier Information Technology (IT) staffing, consulting, and recruiting firm for more than 22 years. Named after Chicago's historic Peterson Avenue, PTP has built its reputation by developing lasting relationships, leading digital transformation, and inspiring technical innovation throughout Chicagoland.
Based in Park Ridge, IL, PTP's 250+ employees have a narrow focus on a single market (Chicago) and expertise in 4 innovative technical areas:
Cloud & DevOps
PTP exists to ensure that all of our partners (clients and candidates alike) make the best hiring and career decisions.
Peterson Technology Partners is an equal opportunity employer.