# Machine Learning: Overview


## What is Machine Learning?

Simply put, ML is the science of programming computers so that they can learn from data.

> [Machine learning is the] field of study that gives computers the ability to learn without being explicitly programmed.
>
> (Arthur Samuel, 1959)

> A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
>
> (Tom Mitchell, 1997)

## Why use machine learning?

Limitations of **rule**/**heuristic**-based systems:

- Rules are hard to list **exhaustively**.
- Many tasks are simply too complex to be generalized into rules.
- The rule-based **deductive** approach is not helpful in discovering novel things.

## Types of Machine Learning

We can categorize ML into four types according to the amount and type of **supervision** it gets during training:

- Supervised Learning
- Unsupervised Learning
- Semisupervised Learning
- Reinforcement Learning

### Supervised Learning

The data we feed to the ML algorithm includes the desired solutions, i.e., **labels**.

- **Classification** task (e.g., spam filter): the target is a categorical label.
- **Regression** task (e.g., car price prediction): the target is a numeric value.

Classification and regression are two sides of the same coin.

We can sometimes utilize regression algorithms for classification.

A classic example is Logistic Regression, which estimates the probability of a class and turns that probability into a label by applying a threshold.
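To make the link between regression and classification concrete, here is a minimal sketch of logistic regression written from scratch. It is an illustration under stated assumptions, not a reference implementation: the toy data (hours studied vs. passing an exam), the function names, the learning rate, and the number of epochs are all hypothetical choices.

```python
import math


def sigmoid(z):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the score
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Regression-style output: a continuous probability."""
    return sigmoid(w * x + b)


def predict_label(x, w, b, threshold=0.5):
    """Classification output: the probability cut at a threshold."""
    return int(predict_proba(x, w, b) >= threshold)


# Hypothetical toy data: hours studied -> exam passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(predict_proba(1.0, w, b), predict_label(1.0, w, b))
print(predict_proba(4.0, w, b), predict_label(4.0, w, b))
```

The model itself produces a number (a probability); only the final thresholding step makes it a classifier.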

Examples of Supervised Learning

- K-Nearest Neighbors
- Linear Regression
- Naive Bayes
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests

### Unsupervised Learning

The data is **unlabeled**. We are more interested in the underlying grouping patterns of the data.

- Clustering
- Anomaly/Novelty detection
- Dimensionality reduction
- Association learning

Examples of Unsupervised Learning

- Clustering
  - K-means
  - Hierarchical Clustering
- Dimensionality reduction
  - Principal Component Analysis
  - Latent Dirichlet Allocation
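As an illustration of how clustering finds groups without labels, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The data points are made up, and the initialization (taking the first `k` points as centroids) is a simplifying assumption; practical implementations usually use smarter initialization.

```python
import math


def kmeans(points, k, n_iters=10):
    """Plain Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical 2-D data with two visually obvious groups and no labels.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

Note that the algorithm only returns cluster indices; what the groups mean is left to the analyst.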

### Semisupervised Learning

It’s a combination of supervised and unsupervised learning.

For example, we start with unlabeled training data and use unsupervised learning to find groups. Users then assign meaningful labels to these groups, which turns the task into a supervised learning problem.

A classic example is a photo-hosting service (e.g., face tagging).

### Reinforcement Learning

This type of learning is often used in robotics.

The learning system, called an agent, learns from its observations of the environment. During the learning process, the agent selects and performs actions, and gets rewards or penalties in return. Through this trial-and-error process, it works out the best strategy (i.e., the policy) for the target task.

A classic example is DeepMind’s AlphaGo.
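The trial-and-error loop can be sketched with a deliberately simplified setting: a multi-armed bandit, where the agent has no state to observe and only chooses among a few actions with unknown reward probabilities. The epsilon-greedy strategy, the reward probabilities, and all names below are assumptions chosen for illustration; systems such as AlphaGo are far more elaborate.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how often each action was tried
    values = [0.0] * n_actions  # running average reward per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: random action
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        # The environment answers with a reward (1) or nothing (0).
        reward = 1 if rng.random() < reward_probs[action] else 0
        counts[action] += 1
        # Incremental update of the average reward for this action.
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, counts, total_reward


# Hypothetical environment: three actions with hidden success probabilities.
values, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action, [round(v, 2) for v in values], counts)
```

The learned policy here is simply "pick the action with the highest estimated reward"; the agent is never told which action is correct, it only sees the rewards.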

## Workflow for Building a Machine Learning Classifier

In NLP, most often we deal with **classification** problems. In particular, we deal with **supervised classification** learning problems.

Given a dataset of **texts** and their corresponding **labels**, the key questions in building a classifier are:

- How can we identify particular **features** of language data that are **salient** for texts of each **label**?
- How can we construct **models** of language that can be used to perform the classification **automatically**?
- What can we learn about language from these classifiers?

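A common workflow for classifier building is as follows (Source: NLTK Book Ch 6, Figure 6-1): during training, a feature extractor converts each input into a feature set, and the feature sets paired with their labels are fed to the learning algorithm to produce a model; during prediction, the same feature extractor is applied to unseen inputs, and the model predicts their labels.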

Most classification methods require that features be encoded using simple value types, such as booleans, numbers, and strings.

But note that just because a feature has a simple type, this does not necessarily mean that the feature’s value is simple to express or compute.

Indeed, it is even possible to use very complex and informative values, such as the output of a second supervised classifier, as features (cf. ensemble techniques such as **stacking**; **boosting** likewise combines several classifiers).
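The feature encoding described above can be sketched as a small feature extractor that maps a text to a dictionary of simple values. The particular features and the two example messages are hypothetical; the resulting list of `(feature set, label)` pairs has the shape that NLTK-style classifiers are trained on.

```python
def extract_features(text):
    """Map a raw text to a dictionary of simple-typed features."""
    tokens = text.lower().split()
    return {
        "contains_free": "free" in tokens,          # boolean
        "num_tokens": len(tokens),                  # number
        "num_exclamations": text.count("!"),        # number
        "first_word": tokens[0] if tokens else "",  # string
    }


# Hypothetical labeled data for a spam filter.
labeled_texts = [
    ("Free prize inside! Click now!", "spam"),
    ("Are we still meeting at noon?", "ham"),
]

# The same extractor must be used at training time and at prediction time.
train_set = [(extract_features(text), label) for text, label in labeled_texts]
for features, label in train_set:
    print(label, features)
```

Even though every value here has a simple type, deciding which features to compute is where most of the linguistic insight goes.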

## Feature Engineering

### What is feature engineering?

It refers to the process of turning the extracted and preprocessed texts into inputs that a machine-learning algorithm can use.

It aims at capturing the **characteristics** of the text in a **numeric vector** that can be understood by the ML algorithms. (Cf. *construct*, *operational definitions*, and *measurement* in experimental science)

In short, it concerns how to meaningfully represent texts quantitatively, i.e., the **vectorized representation** of texts.

### Feature Engineering for Classical ML

- Word-based frequency lists
- Bag-of-words representations (TF-IDF)
- Domain-specific word frequency lists (e.g., Sentiment Dictionary)
- Hand-crafted features based on domain-specific knowledge
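As a rough sketch of what a bag-of-words representation with TF-IDF weighting looks like, the code below turns three made-up sentences into numeric vectors. It uses one common textbook variant (relative term frequency times `log(N / df)`) and whitespace tokenization; both are simplifying assumptions, and library implementations typically apply additional smoothing and normalization.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the vocabulary and one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical mini-corpus.
docs = [
    "the movie was great",
    "the movie was terrible",
    "great acting great plot",
]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
for vec in vectors:
    print([round(v, 3) for v in vec])
```

Each document becomes a vector of the same length (one dimension per vocabulary word), and words that occur in fewer documents receive higher weights.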

### Feature Engineering for DL

- DL takes the (tokenized) texts more or less directly as inputs to the model.
- The DL model is capable of learning features from the texts (e.g., embeddings).
- The price is that the model is often less interpretable.

### Strengths and Weaknesses

## Challenges of ML

- Insufficient **quantity** of training data
- Non-**representative** training data
- Poor **quality** data
- Irrelevant **features**
- **Overfitting** the training data
- **Underfitting** the training data

## Testing and Validating

Hyperparameter tuning and model selection

- Many ML algorithms are available.
- Algorithms often come with many parameter settings.
- It is usually not clear which algorithms would perform better.
- **Cross-validation** comes to the rescue.

(Figure: Cross-Validation. Source: https://scikit-learn.org/stable/modules/cross_validation.html)

**Cross Validation**

Before testing our model on the testing dataset, we can utilize **\(k\)-fold cross-validation** to first evaluate our trained model within the **training dataset** and at the same time fine-tune the **hyperparameters**.

Specifically, we often split the **training set** into **\(k\)** distinct subsets called **folds**, train the model on **\({k-1}\)** folds, and test it on the remaining fold. A **\(k\)-fold** split allows us to repeat this training-testing cycle \(k\) times, each time holding out a different fold.

Based on the distribution of the evaluation scores across all \(k\) folds, we get to see the **average** performance of our model. This helps answer questions such as:

- Which ML algorithm performs the best?
- Which sets of hyperparameters yield the best performance?
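The splitting procedure can be sketched in a few lines of plain Python. This is a minimal illustration, assuming contiguous folds without shuffling; the toy "model" (a threshold halfway between the two class means, assuming class 1 has the larger values) and the data are hypothetical stand-ins for a real learning algorithm.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # The first n_samples % k folds get one extra example.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(xs, ys, fit, predict, k=5):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def fit_threshold(xs, ys):
    """Toy 'model': the midpoint between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Hypothetical training set with classes interleaved so each fold contains both.
xs = [1.0, 9.0, 2.0, 8.0, 1.5, 8.5, 2.5, 7.5, 0.5, 9.5]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = cross_validate(xs, ys, fit_threshold, predict_threshold, k=5)
print(scores, sum(scores) / len(scores))
```

Every example is used for testing exactly once, so the average of the fold scores uses all of the training data without ever scoring a model on examples it was trained on.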

Note

Hyperparameters are parameters that are set before the learning process begins and determine the behavior of a machine learning algorithm. They are not learned from the data but are specified by the user. Hyperparameters control aspects of the learning process such as the complexity of the model, the learning rate, the regularization strength, and the number of iterations.

Here are some common examples of hyperparameters in machine learning algorithms:

- **Regularization parameter (C)** in Support Vector Machines (SVM): This hyperparameter controls the trade-off between maximizing the margin and minimizing the classification error. A larger value of C allows for more complex decision boundaries, potentially leading to overfitting.
- **Learning rate (alpha)** in Gradient Descent: This hyperparameter determines the step size taken during each iteration of gradient descent. A smaller learning rate may lead to slower convergence but can help prevent overshooting the minimum, while a larger learning rate may speed up convergence but risks overshooting.
- **Number of hidden layers and units** in Neural Networks: These hyperparameters define the architecture of the neural network, including the number of hidden layers and the number of neurons in each layer. The choice of these hyperparameters can significantly impact the model’s capacity to learn complex patterns.
- **Number of trees and maximum depth** in Random Forest: The number of trees and the maximum depth of each tree are hyperparameters that control the complexity of the ensemble model. Deeper trees increase model complexity and the risk of overfitting, whereas adding more trees mainly stabilizes predictions at a higher computational cost.
- **Number of neighbors (k)** in k-Nearest Neighbors (kNN): This hyperparameter specifies the number of nearest neighbors to consider when making predictions. A larger value of k leads to a smoother decision boundary but may reduce the model’s ability to capture local patterns.

Examples of hyperparameters in common machine learning algorithms include:

- **Logistic Regression**:
  - **Regularization parameter (C)**: Controls the strength of regularization, with larger values indicating weaker regularization.
  - **Penalty**: Specifies the type of regularization (e.g., L1 or L2 regularization).
  - **Solver**: Determines the optimization algorithm used to fit the logistic regression model (e.g., ‘liblinear’, ‘lbfgs’, ‘newton-cg’, ‘sag’, or ‘saga’).
- **Naive Bayes**:
  - **Smoothing parameter (alpha)**: Controls the degree of smoothing applied to the estimated probabilities of features.
  - **Distribution assumption**: Specifies the distribution assumption for the features (e.g., Gaussian Naive Bayes assumes a Gaussian distribution).
- **Clustering (e.g., K-Means)**:
  - **Number of clusters (k)**: Specifies the number of clusters to partition the data into.
  - **Initialization method**: Determines how initial cluster centroids are selected (e.g., ‘k-means++’, ‘random’).
  - **Maximum number of iterations**: Sets the maximum number of iterations for the algorithm to converge.
- **Topic Modeling (e.g., Latent Dirichlet Allocation - LDA)**:
  - **Number of topics**: Specifies the number of topics to be identified in the corpus.
  - **Alpha parameter**: Controls the distribution of topics per document.
  - **Beta parameter**: Controls the distribution of words per topic.
  - **Number of iterations**: Sets the number of iterations for the model to converge.
- **Support Vector Machines (SVM)**:
  - **Kernel type**: Specifies the type of kernel function to be used (e.g., linear, polynomial, radial basis function).
  - **Kernel coefficient (gamma)**: Controls the influence of each training example on the decision boundary.
  - **Degree**: Specifies the degree of the polynomial kernel function (for polynomial kernels).
  - **Class weights**: Assigns weights to different classes to address class imbalance.

Tuning these hyperparameters appropriately can significantly impact the performance of the models on unseen data.
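A minimal sketch of hyperparameter tuning: the code below tries several values for the number of neighbors in kNN and scores each with leave-one-out cross-validation (the special case of \(k\)-fold cross-validation with one example per fold). The one-dimensional data, which includes one deliberately "noisy" label in each group, and the candidate values are hypothetical.

```python
from collections import Counter


def knn_predict(train_xs, train_ys, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    nearest = sorted(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    votes = Counter(train_ys[i] for i in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def loo_accuracy(xs, ys, n_neighbors):
    """Leave-one-out cross-validation: each example is held out once."""
    correct = 0
    for i in range(len(xs)):
        train_xs = xs[:i] + xs[i + 1:]
        train_ys = ys[:i] + ys[i + 1:]
        correct += knn_predict(train_xs, train_ys, xs[i], n_neighbors) == ys[i]
    return correct / len(xs)


# Hypothetical data: two groups, each containing one noisy label.
xs = [1.0, 1.3, 1.7, 2.2, 2.6, 7.0, 7.4, 7.9, 8.3, 8.8]
ys = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Grid search: evaluate every candidate value and keep the best-scoring one.
candidates = [1, 3, 5]
scores = {k: loo_accuracy(xs, ys, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])
print(scores, best_k)
```

With a single neighbor the classifier copies the noisy labels onto their neighbors, so slightly larger values score better here. The selected value should still be confirmed on a separate test set, since it was chosen to maximize the validation score.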

## Machine Learning Project Checklist

A machine learning project often involves a few important steps: [Géron, 2023]

1. Frame the problem and look at the big picture.
2. Get the data.
3. Explore the data to gain insights.
4. Prepare the data to better expose the underlying data patterns to machine learning algorithms.
5. Explore many different models and shortlist the best ones.
6. Fine-tune the models and combine them into a great solution.
7. Present your solution. (Interpret the insights from the best-performing model.)
8. Launch, monitor, and maintain your system.

## References

Géron (2023), Ch 1-2