
7 Types of Cost Functions in Machine Learning

  • Hrithik Saini
  • Aug 22, 2022

In machine learning, the goal of a regression problem is to find the function that most accurately predicts the target values from the data. Similarly, solving a classification task means finding the model that best separates the different classes of data.

The model's accuracy is judged by how well it predicts the expected output from the input features.

 

Here, we'll talk about the cost function, one such metric used to iteratively calibrate the performance of a model.

 

 

What is Cost Function in Machine Learning?

 

In machine learning, a cost function can be thought of as a measure of how effective a model's strategy is. A cost function is calculated from the discrepancy, or distance, between the predicted value and the true value.

 

It is computed by repeatedly running the model and comparing the predicted values to the actual values of y, and it provides an indication of the model's effectiveness.

 

It is often referred to as the model's error measure or loss function. Lower cost function values mean better models. The cost function thus estimates how far the model's predicted value is from the actual value, and minimizing it drives the model toward better predictions.

 

Since the cost function can only be reduced so far, the objective of a machine learning or supervised learning model is to determine the optimal set of parameters through an iterative procedure.

 

Also Read | What is Stochastic Gradient Descent?

 

 

What is Gradient Descent?

 

The gradient descent algorithm is used to reduce the model's cost function, that is, its error. It is used to find the smallest error your model can achieve.

 

You may think of gradient descent as the path you must travel to arrive at the lowest error. To avoid wasting resources, you must identify the most efficient way to decrease the error, and the best direction to step can vary from one location to another.

 

One way to picture gradient descent is as a ball rolling down an incline: the ball eventually comes to rest at the lowest point on the slope. Since the error decreases up to some point and then begins to increase again, that point can be taken as the minimum of the error.

 

With gradient descent, you evaluate your model's error for various values of its parameters. Repeated over many iterations, the error steadily decreases, and you soon reach the parameter values where the error is minimized and the objective function is optimized.
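
To make this concrete, here is a minimal sketch of gradient descent fitting a straight-line model y = wx + b by minimizing the mean squared error. The data, learning rate, and iteration count are illustrative assumptions, not values from the original article:

import numpy as np

# Illustrative data roughly following y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

w, b = 0.0, 0.0       # initial parameter guesses
learning_rate = 0.01  # step size down the error slope

for step in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    # Gradients of MSE = (1/n) * sum((y_pred - y)^2) with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Step in the direction that reduces the cost
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # converges toward roughly 2 and 1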

 

Types of Cost Functions in Machine Learning

 

Now let's take a deeper look at the typical cost functions employed in machine learning.

 

  1. Mean Squared Error (MSE)

 

One of the most basic and widely used cost functions in machine learning is mean squared error. Because it squares the individual errors and sums them, it is also known as the quadratic loss.

 

Since it squares the difference between the predicted and actual values, it eliminates all negative values, overcoming the drawback of the distance-based error (described below). The mean squared error, also called the L2 loss, is computed as follows:

 

MSE = (1/n) * Σ(actual – forecast)²

 

Where:

 

  • n = Number of items,

  • Σ = Summation notation,

  • Actual = Original or observed y-value,

  • Forecast = y-value from regression.

 

For data prone to outliers and noise, MSE further amplifies the error signal, which significantly inflates the objective function as a whole. We will therefore also discuss a cost function that helps with this issue, the mean absolute error.
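
As a quick illustration of the formula above, MSE takes only a line of NumPy to compute; the arrays below are made-up example values:

import numpy as np

actual = np.array([3.0, -0.5, 2.0, 7.0])    # observed y-values
forecast = np.array([2.5, 0.0, 2.0, 8.0])   # y-values from the regression

# MSE = (1/n) * sum((actual - forecast)^2)
mse = np.mean((actual - forecast) ** 2)
print(mse)  # 0.375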
 

 

  2. Distance-Based Error

 

The distance-based error is the basic cost function that underlies many other types of cost functions. Assume the true output for a given set of input data is y. The parameters of the equation y = wx + b are first initialized randomly, and the model's predicted output is y'. The distance-based error is then defined as:

 

Error = y – y'

 

Where:

 

  • y = The actual value

  • y' = The model's anticipated value.

 

This equation forms the foundation on which the cost functions for regression problems are built. However, the distance-based error can be negative, so errors of opposite sign cancel out when summed; squared-error functions such as MSE get around this restriction, as the sketch below illustrates.
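
The sketch below, with made-up numbers, shows the cancellation problem: one over-prediction and one equal under-prediction sum to zero error, making the model look perfect when it is not:

import numpy as np

y = np.array([5.0, 3.0, 4.0])       # actual values
y_hat = np.array([7.0, 1.0, 4.0])   # predicted values: one over, one under

errors = y - y_hat    # distance-based error per sample
print(errors)         # [-2.  2.  0.]
print(errors.sum())   # 0.0 (opposite errors cancel out)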

 

 

  3. Root Mean Squared Error (RMSE)

 

Root mean squared error (RMSE) is the standard deviation of the residuals (prediction errors). Residuals measure the distance between the data points and the regression line, and RMSE measures how spread out these residuals are.

 

In other words, it indicates how tightly the data is clustered around the line of best fit. Root mean squared error is frequently used in climatology, forecasting, and regression analysis to validate experimental results. The root mean squared error is determined as:

RMSE = √[(1/N) * Σ(y – y')²]

 

Where 

 

  • y = Output's actual value.

  • N = The total number of measurements taken

  • y' = Output's anticipated value. 

  • Σ = Summation (“add up”)

  • (y – y')² = The differences, squared


 

RMSE is considered a useful indicator of a model's effectiveness when we wish to estimate the standard deviation (sigma) of a typically observed value from the model's forecast.

 

When standardized observations and predictions are used as RMSE inputs, the RMSE is directly related to the correlation coefficient. For instance, if the correlation coefficient is 1, all of the points lie exactly on the regression line, and the RMSE will be 0.
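
Here is a minimal NumPy sketch of the RMSE formula above, again with illustrative values:

import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0])        # actual outputs
y_hat = np.array([2.5, 3.5, 6.0, 9.0])    # model predictions

# RMSE = sqrt((1/N) * sum((y - y')^2))
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
print(rmse)  # ~0.612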

 

Also Read | Introduction to Neural Networks & Deep Learning

 

 

  4. Cross-Entropy Function

 

Cross-entropy is an information theory metric that builds on entropy by computing the difference between two probability distributions. Cross-entropy can be thought of as calculating the total entropy between the distributions, whereas KL divergence instead estimates the relative entropy between two probability distributions.

 

Cross-entropy is related to logistic loss, sometimes known as log loss. Although the two measures come from different sources, when employed as loss functions in classification models they calculate the same quantity and may be used interchangeably. The cross-entropy is determined as follows:

H(p, q) = – Σ p(x) * log(q(x))

 

Where,

 

  • p(x) = The true probability distribution of the observed outcomes

  • N = The total number of observations (used when averaging the loss over a dataset)

  • q(x) = The predicted probability distribution of the model's outputs.

 

Consider a classification problem with three classes of fruit images: orange, apple, and mango. The trained classification model produces a predicted probability for each of the three possible categories.

 

Suppose the probability distribution predicted by the model is q = [0.5, 0.2, 0.3]. In this supervised learning task we know the input is an orange, so the true distribution for this example is p = [1, 0, 0].

 

The cross-entropy function calculates the difference between the two distributions: the cross-entropy increases as the gap between the true and predicted distributions grows.
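
A short sketch of the fruit example above, using the stated p and q (the small epsilon is an implementation detail added here to guard against log(0)):

import numpy as np

p = np.array([1.0, 0.0, 0.0])   # true distribution: the input is an orange
q = np.array([0.5, 0.2, 0.3])   # model's predicted distribution

# H(p, q) = -sum(p(x) * log(q(x))); eps avoids log(0)
eps = 1e-12
cross_entropy = -np.sum(p * np.log(q + eps))
print(cross_entropy)  # ~0.693, i.e. -log(0.5)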

 

 

  5. Mean Absolute Error

 

Mean absolute error computes the average discrepancy between the estimated values and the real values. Because it measures error on the same scale as the observations, it is known as a scale-dependent accuracy measure. It serves as an evaluation metric for regression models in machine learning.

 

It estimates the discrepancy between the model's predicted values and the actual observed values, and is used to gauge the accuracy of a machine learning model. The mean absolute error is calculated as follows (a short sketch in code appears after the definitions below):

MAE = (1/n) * Σ|yi – xi|

Where,

 

  • Σ: Greek symbol for summation

  • yi: Actual value for the ith observation

  • xi: Calculated value for the ith observation

  • n: Total number of observations
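
A minimal sketch of the MAE formula, with made-up values:

import numpy as np

y = np.array([3.0, -0.5, 2.0, 7.0])      # actual values (yi)
x_hat = np.array([2.5, 0.0, 2.0, 8.0])   # calculated values (xi)

# MAE = (1/n) * sum(|yi - xi|)
mae = np.mean(np.abs(y - x_hat))
print(mae)  # 0.5

Unlike MSE, each error contributes in proportion to its size, so a single outlier inflates the result far less.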

 

Also Read | Classification & Regression Tree Algorithm

 

 

  6. Kullback-Leibler (KL) Divergence

 

The KL divergence function, which measures the difference (or divergence) between two probability distributions, is very similar to cross-entropy.

 

The procedure calculates the distance between the two probability distributions p and q, where p is the actual probability distribution and q is the model's predicted probability distribution. The Kullback-Leibler divergence from q to p is computed as follows (a short sketch in code appears after the definitions below):

KL(p || q) = Σ p(x) * log(p(x) / q(x))

 

Where,

 

  • p(x) = The true probability distribution of the observed results,

  • q(x) = The predicted probability distribution of the model's outputs,

  • N = The total number of observations made (used when averaging the divergence over a dataset).
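
A minimal sketch of the KL divergence formula, using two illustrative distributions (note that, unlike the fruit example, p here avoids zero entries so the logarithm is defined):

import numpy as np

p = np.array([0.6, 0.3, 0.1])   # true distribution (illustrative)
q = np.array([0.5, 0.2, 0.3])   # predicted distribution (illustrative)

# KL(p || q) = sum(p(x) * log(p(x) / q(x)))
kl = np.sum(p * np.log(p / q))
print(kl)  # ~0.121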

 

 

  7. Hinge Loss

 

The hinge loss function is a typical cost function used with Support Vector Machines (SVMs) for classification. The target labels take the values +1 and -1, and the model's raw output is compared against them. The hinge loss is calculated as follows:

Hinge loss = max(0, 1 – y * h(y))

Where,

 

  • h(y) = The classification value obtained from the model

  • y = The actual value of the output.

 

It is clear from the expression that the cost is zero when y * h(y) ≥ 1, and that the cost rises as y * h(y) drops below 1. The hinge loss therefore penalizes predictions that are on the wrong side of the decision boundary, as well as correct predictions that fall inside the margin.
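
A short sketch of the hinge loss, with illustrative labels and raw model scores:

import numpy as np

y = np.array([1, 1, -1, -1])             # true labels in {-1, +1}
h = np.array([2.0, 0.4, -0.3, -1.5])     # raw model scores h(y)

# hinge loss per sample = max(0, 1 - y * h(y))
loss = np.maximum(0, 1 - y * h)
print(loss)         # [0.  0.6 0.7 0. ]
print(loss.mean())  # 0.325

Note that the second and third samples are classified correctly yet still incur loss, because their scores fall inside the margin.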

 

 

Conclusion

 

In machine learning, cost functions, sometimes referred to as loss functions, are crucial for building and training models. A variety of cost functions are used to train machine learning and deep learning models.

 

We hope you learned something from this article. In this post, we covered a few key cost functions that are chosen depending on the nature of the problem.
