Machine Learning Setup and Overview
Introduction
This article introduces core machine learning vocabulary and paradigms, and then walks you through setting up your computer so that you can code what you learn. Before starting with the ML environment setup, read the article Machine Learning : A Gentle Introduction to get an overview of machine learning.
Machine Learning Terminology
When you start learning any new technology, the first step is to familiarize yourself with its terminology.
Dataset – Dataset is the core of any Machine Learning model. It is simply the collection of data required to build an ML system.
Instances – These are the rows of the dataset. The number of instances is the number of entries the dataset contains.
Features or Attributes – They are the individual measurable properties or characteristics of the data used to make predictions. They serve as input for models, helping them understand patterns or trends in the data to predict outcomes.
Targets or Labels – This is what our model learns to predict.
Labeled Data – Data in which every instance comes with its target label alongside its features.
Unlabeled Data – Data in which the instances have features but no target labels.
Numerical Features – These are the features that consist of numerical data, i.e. int, float, etc.
Categorical Features – These are the features that consist of categorical data. Categorical data is a type of data that can be divided into groups. For example: Type of Weather, Blood Group, etc.
Regression Problem – When the model predicts a numerical value. For example: House Price Prediction, Stock Price Prediction, etc.
Classification Problem – When the model assigns the data to a particular class. For example: Image Classification, Sentiment Analysis, etc.
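To make the vocabulary above concrete, here is a minimal sketch in plain Python. The house-price dataset, its column names, and its values are made up purely for illustration; they are not taken from any real source.

```python
# A tiny, made-up dataset: each dict is one instance (one row).
dataset = [
    {"area_sqft": 850.0, "bedrooms": 2, "location": "urban", "price": 120000.0},
    {"area_sqft": 1200.0, "bedrooms": 3, "location": "rural", "price": 150000.0},
    {"area_sqft": 1500.0, "bedrooms": 4, "location": "urban", "price": 210000.0},
]

# The target (label) is what the model should learn to predict.
target_name = "price"

# Every other column is a feature (attribute).
feature_names = [name for name in dataset[0] if name != target_name]

# Split each instance into its features (X) and its target (y).
X = [[row[name] for name in feature_names] for row in dataset]
y = [row[target_name] for row in dataset]

# Numerical features hold numbers; categorical features hold group names.
numerical = [n for n in feature_names if isinstance(dataset[0][n], (int, float))]
categorical = [n for n in feature_names if isinstance(dataset[0][n], str)]

# Removing the target column turns labeled data into unlabeled data.
unlabeled = [{k: v for k, v in row.items() if k != target_name} for row in dataset]

print("instances:", len(dataset))
print("features:", feature_names)
print("numerical features:", numerical)
print("categorical features:", categorical)
print("targets:", y)
```

Because the target here is a number (a price), predicting it would be a regression problem. If the target were a group such as "cheap" or "expensive", it would be a classification problem.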
Machine Learning Paradigms
The three basic paradigms of machine learning are:
Supervised Learning
A type of problem where the model is trained to map an input to an output based on the labeled dataset it was trained on.
Regression Problem: A type of problem in which the target variable has a continuous value.
Classification Problem: A type of problem in which the target variable represents a particular class.
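The two kinds of supervised problems can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the regression part fits a straight line by ordinary least squares, and the classification part uses a nearest-centroid rule. The function names and the toy data are hypothetical, and real projects would normally rely on a library instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_centroids(xs, labels):
    """Compute the mean feature value (centroid) of each class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(values) / len(values) for label, values in groups.items()}


def classify(x, centroids):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Regression: the target is a continuous value.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
predicted_value = slope * 5.0 + intercept

# Classification: the target is a class.
centroids = fit_centroids([1.0, 2.0, 8.0, 9.0], ["cold", "cold", "hot", "hot"])
predicted_class = classify(3.0, centroids)

print("regression prediction for x=5:", predicted_value)
print("classification prediction for x=3:", predicted_class)
```

In both cases the model learns from labeled examples (inputs paired with known outputs) and is then asked about an input it has not seen.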
Unsupervised Learning
A type of problem where the model is trained to find previously undetected patterns in an unlabeled dataset.
Clustering: Task of grouping a set of data points such that data points belonging to the same cluster are more similar to each other than to the ones in another cluster.
Dimensionality Reduction: Task of reducing the number of features in the dataset.
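As an illustration of clustering only, the sketch below runs a bare-bones k-means on one-dimensional points. It makes some simplifying assumptions: the data is made up, the starting centroids are passed in by hand rather than chosen randomly, and the function name kmeans_1d is hypothetical.

```python
def kmeans_1d(points, centroids, n_iter=10):
    """Group 1-D points around centroids; no labels are used."""
    centroids = list(centroids)
    assignments = []
    for _ in range(n_iter):
        # Step 1: assign every point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Step 2: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return centroids, assignments


points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
final_centroids, cluster_ids = kmeans_1d(points, centroids=[1.0, 10.0])

print("centroids:", final_centroids)
print("cluster of each point:", cluster_ids)
```

Note that the algorithm is never told which group a point belongs to; the two groups emerge from the distances between the points alone.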
Reinforcement Learning
A type of problem that deals with training an agent to take actions in an environment in a way that maximizes the cumulative reward.
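The agent, action, environment, and reward loop can be sketched with a toy example. The code below assumes a very simple environment: a two-armed bandit in which each action always pays a fixed, made-up reward. The agent follows an epsilon-greedy strategy, and the function name run_bandit and all parameter values are hypothetical choices for illustration.

```python
import random


def run_bandit(arm_rewards, steps=100, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit whose arms pay fixed rewards."""
    rng = random.Random(seed)
    n_arms = len(arm_rewards)
    estimates = [0.0] * n_arms  # agent's current value estimate per action
    counts = [0] * n_arms       # how often each action was taken
    total_reward = 0.0          # cumulative reward the agent tries to maximize

    for step in range(steps):
        if step < n_arms:
            action = step                    # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_arms)   # explore: pick a random action
        else:
            # exploit: pick the action with the highest estimated value
            action = max(range(n_arms), key=lambda a: estimates[a])

        reward = arm_rewards[action]         # the environment responds
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total_reward = run_bandit([0.2, 1.0])
print("estimated values:", estimates)
print("times each action was taken:", counts)
print("cumulative reward:", total_reward)
```

The agent is never told which action is better. It finds out by acting and observing rewards, and then takes the better-paying action most of the time.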
Environment Setup
Now that you're familiar with the basics of ML, let's set up your environment so that you can apply what you learn using Python.
Conda Environment and Jupyter Notebook
ML requires the use of numerous libraries, such as NumPy, Pandas, and others, and installing each one separately can be highly time-consuming. This can be resolved by installing Anaconda, which ships with practically all of the libraries and tools needed for machine learning.
Anaconda is an open-source distribution that bundles tools such as Jupyter and Spyder and is used for large-scale data processing, data analytics, and heavy scientific computing. Anaconda works with the R and Python programming languages. Packages and their versions are managed by Anaconda's package management system, conda.
You can download Anaconda from here. There is also in-depth installation documentation which you can check out to find out how to install Anaconda. If you want a lightweight alternative, you can install Miniconda instead. It includes only conda, Python, and a small set of packages, so you install the libraries and tools you need yourself.
Now that you have Anaconda all set, let's use a handy utility it provides: virtual environments, known as conda environments. You can use the default conda environment, called base, by typing the following command in your terminal:
# for Linux and macOS
conda activate base
# for Windows (Anaconda Prompt); recent conda versions also accept "conda activate base" here
activate base
Once you do that, you can access the tools. You now need a platform to code on, and if you ask almost any data scientist, they'll give you the same answer: Jupyter Notebook. To open Jupyter Notebook, type the following command after activating your environment:
jupyter notebook
Once you run that command, Jupyter automatically opens in your browser, on a page that looks like the one below. What makes Jupyter special is the cells and Markdown functionality that it offers.
To start coding, create a notebook by going to New > Python, as seen in the image below, and then write Python as you normally would.
And that's it: you are all set to get your hands dirty with the actual coding part.
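A quick way to confirm that the environment works is to run a small check in the first cell of a new notebook. The sketch below uses only the Python standard library. The helper name check_libraries is hypothetical, and the list of libraries is simply the two mentioned earlier in this article, so adjust it to whatever you plan to use.

```python
import importlib.util
import sys


def check_libraries(names):
    """Return {name: True/False} for whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# NumPy and Pandas are the libraries named in this article.
status = check_libraries(["numpy", "pandas"])

print("Python version:", sys.version.split()[0])
for name, available in status.items():
    print(f"{name}: {'available' if available else 'missing'}")
```

If a library is reported as missing, it can usually be installed into the active environment with conda before you continue.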
Google Colaboratory
In the previous section we learned about the Jupyter Notebook setup; now let's take a quick look at how to use Google Colab. To be precise, Colab is a free Jupyter notebook environment that runs entirely in the cloud. Most importantly, it does not require any setup, and the notebooks that you create can be edited simultaneously by your team members. Colab supports many popular machine learning libraries, which can be easily loaded into your notebook.
To start coding with Google Colab, go to the website and the page below will be displayed to you.
Once you are on the main page, click on File > New Notebook to create a new notebook.
With Google Colab, you are now prepared to start working on the real coding portion. Colab also allows you to create projects; however, doing so requires internet access. The primary benefit of using Google Colab is that no software needs to be installed in order to set up the ML environment. Google Colab is a great substitute for a local Jupyter Notebook installation if your computer does not have adequate disk space.
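If you move the same notebook between your local Jupyter setup and Colab, it can be handy to know where the code is currently running. The sketch below is one common way to check, assuming that the google.colab module is importable only inside Colab. The function name running_in_colab is hypothetical.

```python
def running_in_colab():
    """Return True if the google.colab module can be imported, else False."""
    try:
        import google.colab  # noqa: F401  (assumed to exist only in Colab)
    except ImportError:
        return False
    return True


if running_in_colab():
    print("Running in Google Colab")
else:
    print("Running outside Colab, for example in a local Jupyter Notebook")
```

A notebook can use this check to skip steps that only make sense in one of the two environments.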
Summary
In this article, we covered machine learning terminology, its paradigms, and how to set up your machine to get started with machine learning. Hope you found this article useful! Let's meet again with an exciting new article.