Introduction

When I wrote this blog post (this PyTorch tutorial), I remembered the challenge I set for myself at the beginning of the year: to learn deep learning. I did not even know Python at the time. What makes things difficult is not necessarily the complexity of the concepts; it starts with questions like: Which framework should I use for deep learning? Which activation function should I choose? Which cost function is best suited to my problem?

My personal approach has been to study the PyTorch framework and, for the theory, to follow the online course made freely available by Yann LeCun, whom I thank (link here: Website fr.wikipedia.org). I still have a lot to learn and to practise in this field, but through this blog post I would like to share an overview of what I have learned about deep learning this year.

Deep Learning: Which activation and cost function to choose?

The objective of this article is to survey a central topic in deep learning: choosing the activation function and the cost (loss) function according to the problem you want to solve with a deep learning algorithm.

We are talking about the activation function of the last layer of your model, that is, the one that produces the result.

During training, the algorithm compares, at each step, the result predicted by the neural network with the real result. To do so, we apply to the prediction a cost function (loss function) that measures this difference, and the gradient descent algorithm adjusts the network to minimize it and so reduce the error. (See this article to learn more: Website machinelearnia.com)

Source: https://upload.wikimedia.org/wikipedia/commons/6/68/Gradient_descent.jpg
The gradient descent algorithm finds the minimum of any convex function step by step.

As a reminder, the machine learns by minimizing the cost function iteratively, over successive training steps. At each step, the result of the cost function is used to adjust the parameters of the neurons (for example, the weights and biases of linear layers).
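
To make the idea of iterative minimization concrete, here is a minimal sketch of gradient descent on a toy convex cost. The function name gradient_descent, the toy cost f(x) = (x - 2)^2 and the learning rate are illustrative assumptions, not something taken from a real network:

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Toy convex cost f(x) = (x - 2)^2, whose gradient is 2 * (x - 2).
x_min = gradient_descent(lambda x: 2 * (x - 2), x0=10.0)
print(x_min)  # very close to 2, the minimum of f
```

A neural network does the same thing, except that x is replaced by all the weights and biases and the gradient is computed by backpropagation.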

The choice of the activation function on the last layer and of the cost function to minimize (loss function) is therefore crucial, since these two elements together determine how your neural network solves the problem.

Source: https://upload.wikimedia.org/wikipedia/commons/6/60/ArtificialNeuronModel_english.png

This article also presents simple examples that use the PyTorch framework, which is in my opinion a very good tool for machine learning.

The questions to ask are the following:

  • Am I looking to create a model that performs binary classification? (In this case you are trying to predict the probability that the result for an input is 0 or 1.)
  • Am I looking to predict a numeric value with my neural network? (In this case you are trying to predict, for example, a decimal value at the output that corresponds to your input.)
  • Am I looking to create a model that performs single-label classification over a set of classes? (In this case you are trying to predict, for each output class, the probability that the input belongs to that class.)
  • Am I looking to create a model that searches for multiple classes within a set of possible classes? (In this case you are trying to predict, for each output class, how strongly it is present in the input.)

Binary classification problem:

You are trying to predict with your deep learning algorithm whether a result is true or false; more precisely, the output is the probability that the result is 1 (and therefore also the probability that it is 0).

The output size of your neural network is 1 (final layer), and you want a result between 0 and 1 that is interpreted as the probability that the result is 1 (for example, an output of 0.65 corresponds to a 65% chance that the result is true).
Application example: predicting the probability that the home team will win a football match. The closer the value is to 1, the more likely the home team is to win; conversely, the closer it is to 0, the more likely the home team is to lose.
This result can be turned into a decision using a threshold, for example by deciding that the result is true if the output is > 0.5 and false otherwise.
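
The thresholding step can be sketched in a few lines. The helper name to_label and the sample outputs are made up for illustration:

```python
def to_label(probability, threshold=0.5):
    """Turn a predicted probability into a True/False decision."""
    return probability > threshold


outputs = [0.65, 0.12, 0.50, 0.91]
labels = [to_label(p) for p in outputs]
print(labels)  # [True, False, False, True]
```

Note that an output of exactly 0.5 is classified as false here, because the comparison is strict.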

Final activation function (case of binary classification):

The final activation function should return a result between 0 and 1; a good choice in this case is the sigmoid function. The sigmoid maps any real input to a value between 0 and 1 and is therefore well suited to expressing the probability we are trying to predict.

If you want to plot the sigmoid function in Python, here is some code that should help you (see also Website squall0032.tumblr.com):

import math
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(x):
    a = []
    for item in x:
        a.append(1/(1+math.exp(-item)))
    return a
x = np.arange(-10., 10., 0.2)
sig = sigmoid(x)
plt.plot(x,sig)
plt.show()
Plot of the sigmoid function, used as the final activation function of a deep learning algorithm

The cost function – Loss function (case of binary classification):

During training you have to measure the difference between the probability predicted by the model (produced by the final sigmoid function) and the true, known answer (0 or 1). The function to use is Binary Cross Entropy, because it measures the difference between two probabilities.
The optimizer then uses this result to adjust the weights and biases of your model (or other parameters, depending on its architecture).
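
To see what Binary Cross Entropy measures, here is a hand-written sketch in plain Python. The function name and the sample values are illustrative, and the result is averaged over the samples on the assumption that the default mean reduction is wanted:

```python
import math


def binary_cross_entropy(predictions, targets):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for p, y in zip(predictions, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predictions)


# Confident and correct predictions give a small loss...
good = binary_cross_entropy([0.9, 0.1], [1, 0])
# ...confident and wrong predictions give a large one.
bad = binary_cross_entropy([0.1, 0.9], [1, 0])
print(good, bad)
```

The loss grows quickly as a prediction approaches the wrong extreme, which is what pushes the network toward well-calibrated probabilities.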

Example :

In this example I will create a neural network with 1 linear layer and a final sigmoid activation function.

First, I perform all the imports that we will use in this post:

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
from collections import OrderedDict
import math
from random import randrange
import torch 
from torch.autograd import Variable
import torch.nn as nn 
from torch.autograd import Function 
from torch.nn.parameter import Parameter 
from torch import optim 
import torch.nn.functional as F 
from torchvision import datasets, transforms
from echoAI.Activation.Torch.bent_id import BentID
from sklearn.model_selection import train_test_split
import sklearn.datasets
from sklearn.metrics import accuracy_score
import hiddenlayer as hl
import warnings
warnings.filterwarnings("ignore")
from torchviz import make_dot, make_dot_from_trace

The following lines suppress the display of warnings in Python:

import warnings
warnings.filterwarnings("ignore")

I generate 1000 samples with the sklearn library to obtain a data set for a binary classification problem. The data set has 2 decimal inputs and a binary output (0 or 1). I also display the result as a scatter plot:

inputData,outputData = sklearn.datasets.make_moons(1000,noise=0.3) 
plt.scatter(inputData[:,0],inputData[:,1],s=40,c=outputData,cmap="Spectral")
Creation of 1000 samples with a result of 0 or 1 via sklearn

The data set generated here has 2 classes (class 0 and class 1).

The goal of my neural network is therefore a binary classification of the input.

I take 20% of this data set as test data and the remaining 80% as training data. Later, I split the training data again to create a validation set.

X_train, X_test, y_train, y_test = train_test_split(inputData, outputData, test_size=0.20, random_state=42)
Input Data
Output data

In my example I create a network with an arbitrary architecture; what interests us is to show how the last layer works:

  • input layer: linear layer
  • hidden layer: bent identity activation function, which I import from the EchoAI package that implements additional activation functions for PyTorch (Website en.wikipedia.org and Website en.wikipedia.org)
  • output layer: linear layer followed by the sigmoid activation function chosen above; the objective is to reduce the result to a probability that the input belongs to class 0 or 1 (Website pytorch.org)

fc1 = nn.Linear(2, 2)
fc2 = nn.Linear(2, 1)

model = nn.Sequential(OrderedDict([
                      ('lin1', fc1),
                      ('BentID1', BentID()),
                      ('lin2', fc2),
                      ('sigmoid', nn.Sigmoid())
                        ]))
model = model.double()

I added a line that converts the PyTorch model to the double type. This avoids an error of the type:

Expected object of scalar type Double but got scalar type Float for argument

You also need to pass dtype=torch.double when creating each torch.tensor used in training.

As explained above, the cost function in this case is Binary Cross Entropy (BCELoss: Website pytorch.org):

loss = nn.BCELoss()

I am using the Adam optimizer with a learning rate of 0.01.
Deep learning optimizers are algorithms that modify the parameters of your neural network, such as weights and biases, so that the loss function is minimized.

The learning rate is a hyperparameter that controls how much the model changes in response to the estimated error each time the weights are updated.

Adam is a stochastic gradient descent method that computes individual adaptive learning rates for the different parameters from estimates of the first and second moments of the gradients.
More information on the Adam optimizer: Website machinelearningmastery.com
and in the PyTorch documentation: Website pytorch.org
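
As a rough illustration of how these moment estimates are used, here is a sketch of the Adam update rule for a single scalar parameter, written in plain Python. The function name adam_minimize, the toy cost (x - 3)^2 and the hyperparameter values are assumptions chosen for the example; this is not the PyTorch implementation:

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function from its gradient with the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients (first moment)
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients (second moment)
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy cost f(x) = (x - 3)^2, with gradient 2 * (x - 3).
x_best = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_best)  # close to 3
```

Dividing by the square root of the second moment is what gives each parameter its own effective step size.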

optimizer = optim.Adam(model.parameters(), lr=0.01)

I split the training data a second time to keep 20% of it as validation data. The validation data lets us check at each training step that we are not overfitting (fitting the training data too closely).

X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.20, random_state=42)
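
To check the sizes that result from these two successive 80/20 splits, here is a small stand-alone sketch. The split helper below is a hypothetical stand-in written for illustration, not the sklearn function:

```python
import random


def split(items, test_fraction, seed=42):
    """Shuffle a copy of items and hold out the last test_fraction of them."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = len(shuffled) - int(round(len(shuffled) * test_fraction))
    return shuffled[:cut], shuffled[cut:]


samples = list(range(1000))
train_val, test = split(samples, 0.20)   # 800 remaining / 200 test
train, valid = split(train_val, 0.20)    # 640 training / 160 validation
print(len(train), len(valid), len(test))
```

So of the 1000 generated samples, 640 are used for training, 160 for validation and 200 for the final test.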

In this example I implement early stopping with a patience of 15. This means that training stops once the cost function on the validation data (i.e. the distance between the prediction and the true data) has failed to improve for more than 15 consecutive training steps. At that point we reload the saved configuration of the neural network that produced the lowest validation loss (in our example, the weights and biases of the 2 linear layers). Whenever the validation loss reaches a new minimum, the configuration of the network is saved (the weights and biases of the linear layers in our example).
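
The early-stopping rule can be sketched independently of PyTorch, on a plain list of validation losses. The function name, the sample losses and the use of `>=` for the patience test are illustrative assumptions:

```python
def early_stopping(valid_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(valid_losses):
        if loss < best_loss:
            # New minimum: this is where the network parameters would be saved.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: stop and restore best_epoch.
                return epoch, best_epoch
    return len(valid_losses) - 1, best_epoch


losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.65, 0.7]
stop_epoch, best_epoch = early_stopping(losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Training stops at epoch 5, but the parameters that are kept are those of epoch 2, where the validation loss was lowest.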

Here I have chosen to display the weights and biases of the 2 linear layers every 10 epochs, to give you an idea of how the Adam optimization algorithm works.

I used HiddenLayer (Website github.com) to display graphs of the training and validation metrics. The graphs are refreshed during training.

history1 = hl.History()
canvas1 = hl.Canvas()

Here is the main code of the learning loop:

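bestvalidloss = np.inf
# Keep copies (not references) of the best parameters seen so far
saveWeight1 = fc1.weight.detach().clone()
saveWeight2 = fc2.weight.detach().clone()
saveBias1 = fc1.bias.detach().clone()
saveBias2 = fc2.bias.detach().clone()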
wait = 0
for epoch in range(1000):
    model.train()
    optimizer.zero_grad()
    result = model(torch.tensor(X_train,dtype=torch.double))
    # BCELoss expects the target to have the same shape as the output: (N, 1)
    lossoutput = loss(result, torch.tensor(y_train,dtype=torch.double).view(-1, 1))
    lossoutput.backward()
    optimizer.step()
    print("EPOCH " + str(epoch) + "- train loss = " + str(lossoutput.item()))
    if((epoch+1)%10==0):
        # PRINT THE NETWORK PARAMETERS EVERY 10 EPOCHS
        print("*************** PARAMETERS WEIGHT & BIAS *********************")
        print("weight linear1= " + str(fc1.weight))
        print('Bias linear1=' + str(fc1.bias))
        print("weight linear2= " + str(fc2.weight))
        print('bias linear2=' + str(fc2.bias))
        print("**************************************************************")
    model.eval()
    with torch.no_grad():  # no gradients are needed for validation
        validpred = model(torch.tensor(X_valid,dtype=torch.double))
        validloss = loss(validpred, torch.tensor(y_valid,dtype=torch.double).view(-1, 1))
    print("EPOCH " + str(epoch) + "- valid loss = " + str(validloss.item()))
    
    # Store and plot train and valid loss.
    history1.log(epoch, trainloss=lossoutput.item(),validloss=validloss.item())
    canvas1.draw_plot([history1["trainloss"], history1["validloss"]])
    
    if(validloss.item() < bestvalidloss):
        bestvalidloss = validloss.item()
        # New best validation loss: save copies of the current parameters
        saveWeight1 = fc1.weight.detach().clone()
        saveWeight2 = fc2.weight.detach().clone()
        saveBias1 = fc1.bias.detach().clone()
        saveBias2 = fc2.bias.detach().clone()
        wait = 0
    else:
        wait += 1
        if(wait > 15):
            # Restore the best parameters and stop early
            with torch.no_grad():
                fc1.weight.copy_(saveWeight1)
                fc2.weight.copy_(saveWeight2)
                fc1.bias.copy_(saveBias1)
                fc2.bias.copy_(saveBias2)
            print("##############################################################")
            print("stop valid because loss is increasing (EARLY STOPPING) after EPOCH=" + str(epoch))
            print("BEST VALID LOSS = " + str(bestvalidloss))
            print("BEST PARAMETERS WEIGHT AND BIAS = ")
            print("FIRST LINEAR : WEIGHT=" + str(saveWeight1) + " BIAS = " + str(saveBias1))
            print("SECOND LINEAR : WEIGHT=" + str(saveWeight2) + " BIAS = " + str(saveBias2))
            print("##############################################################")
            break

Here is an overview of the result: training stops after about 200 epochs (an "epoch" is a term used in machine learning for one pass over the complete training data set). In my example I did not load the data in batches, so the number of epochs equals the number of iterations.
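
The relation between epochs and iterations can be illustrated with a small helper. The function name is hypothetical, and 640 is the training-set size that results from the two 80/20 splits of the 1000 samples:

```python
import math


def iterations_per_epoch(num_samples, batch_size=None):
    """Number of optimizer steps needed to see every training sample once."""
    if batch_size is None:
        return 1  # the whole training set is passed in a single step
    return math.ceil(num_samples / batch_size)


print(iterations_per_epoch(640))                  # 1: one epoch is one iteration
print(iterations_per_epoch(640, batch_size=32))   # 20 iterations per epoch
```

With mini-batches, the same number of epochs would therefore correspond to many more parameter updates.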

Evolution of the cost function on the training set and the validation set

Here’s a look at the prediction results:

result = model(torch.tensor(X_test,dtype=torch.double))
plt.scatter(X_test[:,0],X_test[:,1],s=40,c=y_test,cmap="Spectral")
plt.title("True")
plt.colorbar()
plt.show()
plt.scatter(X_test[:,0],X_test[:,1],s=40,c=result.detach().numpy().ravel(),cmap="Spectral")
plt.title("predicted")
plt.colorbar()
plt.show()
Results: the first graph shows the "true" answer, the second shows the predicted answer (the color varies with the predicted probability)

To display the architecture of the neural network I used the HiddenLayer library, which requires Graphviz to be installed; otherwise you get the following error:

RuntimeError: Make sure the Graphviz executables are on your system's path


To resolve this error see Website graphviz.org

On macOS, for example, I installed it via Homebrew:

brew install graphviz

Here is an overview of the HiddenLayer graph:

# HiddenLayer graph
hl.build_graph(model, torch.zeros(2,dtype=torch.double))
HiddenLayer architecture for the binary classifier

I also used PyTorchViz (Website github.com) to display a graph of the PyTorch operations performed during the forward pass of an input. It shows clearly what will happen during the backward pass, i.e. the computation of the derivative dloss/dx for every network parameter x that has requires_grad = True.

make_dot(model(torch.ones(2,dtype=torch.double)), params=dict(model.named_parameters()))
Architecture diagram of the neural network created with PyTorch

Regression problem (numeric / decimal value calculation):

In this case, you are trying to predict a continuous numerical quantity with your deep learning algorithm.
Training then relies on measuring the difference between the numerical value predicted by the network and the true value.
The output size of the network is 1, since there is only one value to predict.

Final activation function (decimal or numeric value case):

The activation function to use in this case depends on the range in which your data is located.

For data between -infinity and +infinity, you can use a linear function at the output of your network.

Linear function: Website pytorch.org. Its particularity is that this linear function (of the form y = ax + b) contains learnable parameters: the weight and the bias, which the optimizer modifies over the training steps (i.e. y = weight * x + bias).

Source https://upload.wikimedia.org/wikipedia/commons/9/9c/Linear_function_kx.png: Linear functions.

You can also use the ReLU function if the value to predict is strictly positive.
The output of ReLU is the maximum of zero and the input value: the output is zero when the input is negative, and equal to the input when the input is positive.
Note that, unlike the linear function above, ReLU has no adjustable (learnable) parameter.
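
For a single value, the ReLU definition above can be written in one line. This scalar helper is only an illustration; in a network you would use the PyTorch layer:

```python
def relu(x):
    """Rectified linear unit: the maximum of zero and the input."""
    return max(0.0, x)


print([relu(v) for v in (-2.0, -0.5, 0.0, 1.5)])  # [0.0, 0.0, 0.0, 1.5]
```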

Website pytorch.org

ReLU activation function. Source: https://upload.wikimedia.org/wikipedia/commons/f/fe/Activation_rectified_linear.svg

Another possible function is PReLU (parametric rectified linear unit):

PReLu activation function
Source https://upload.wikimedia.org/wikipedia/commons/a/ae/Activation_prelu.svg
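
As a sketch, PReLU behaves like ReLU for positive inputs but multiplies negative inputs by a slope a, which is a learnable parameter in a real network. The scalar helper and the starting value a = 0.25 are assumptions for illustration:

```python
def prelu(x, a=0.25):
    """Parametric ReLU: identity for x >= 0, slope a for x < 0."""
    return x if x >= 0 else a * x


print(prelu(3.0))    # 3.0
print(prelu(-2.0))   # -0.5
```

With a = 0 the function reduces to ReLU.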


Another possible final activation function is bent identity.

Bent identity activation function Source https://upload.wikimedia.org/wikipedia/commons/c/c3/Activation_bent_identity.svg
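
Bent identity is defined as f(x) = (sqrt(x^2 + 1) - 1) / 2 + x. A scalar sketch, written only to show the shape of the function:

```python
import math


def bent_identity(x):
    """Bent identity: close to x, with a smooth bend around zero."""
    return (math.sqrt(x * x + 1) - 1) / 2 + x


for value in (-2.0, 0.0, 1.0, 10.0):
    print(value, bent_identity(value))
```

The function passes through 0 at x = 0 and its output can take any real value, which is why it can be used for regression.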


Finally, I suggest considering the Parametric Soft Exponential function. I use the EchoAI implementation because it is not natively available in PyTorch.
Website en.wikipedia.org

Soft Exponential
Source https://upload.wikimedia.org/wikipedia/commons/b/b5/Activation_soft_exponential.svg
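
The soft exponential function depends on a parameter alpha and has three branches. Here is a scalar sketch using the same formulas as the custom PyTorch class shown later in this article; the helper itself is illustrative and is not the EchoAI code:

```python
import math


def soft_exponential(alpha, x):
    """Soft exponential activation for a scalar x and shape parameter alpha."""
    if alpha < 0:
        return -math.log(1 - alpha * (x + alpha)) / alpha
    if alpha == 0:
        return x
    return (math.exp(alpha * x) - 1) / alpha + alpha


print(soft_exponential(0.0, 2.0))   # 2.0: identity when alpha = 0
print(soft_exponential(1.0, 0.0))   # 1.0: equals exp(x) when alpha = 1
```

Depending on alpha, the function moves continuously between a logarithm (alpha = -1), the identity (alpha = 0) and an exponential (alpha = 1).
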
The cost function (regression case, calculation of numerical value):

The cost function that measures the distance between the predicted value and the real value is the average of the squared differences between the two. The function to use is mean squared error (MSE): Website pytorch.org
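
A hand-written sketch of the mean squared error, with made-up values:

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)


print(mean_squared_error([7.0, 6.0], [7.5, 6.5]))  # 0.25
```

Because the differences are squared, large errors are penalized much more than small ones.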

Here is a simple example that illustrates the use of each of the final functions:

I started off with the example of house prices which I then adapted to test with each of the functions.

Example: Loading test data

First, I import the necessary libraries:

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
from collections import OrderedDict
import math
from random import randrange
import torch 
from torch.autograd import Variable
import torch.nn as nn 
from torch.autograd import Function 
from torch.nn.parameter import Parameter 
from torch import optim 
import torch.nn.functional as F 
from torchvision import datasets, transforms
from echoAI.Activation.Torch.bent_id import BentID
from echoAI.Activation.Torch.soft_exponential import SoftExponential
from sklearn.model_selection import train_test_split
import sklearn.datasets
from sklearn.metrics import accuracy_score
import hiddenlayer as hl
import warnings
warnings.filterwarnings("ignore")
from torchviz import make_dot, make_dot_from_trace
from sklearn.datasets import make_regression
from torch.autograd import Variable

I create the dataset which is quite simple and display it:

house_prices_array = [30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
house_price_np = np.array(house_prices_array, dtype=np.float32)
house_price_np = house_price_np.reshape(-1,1)
house_price_tensor = Variable(torch.from_numpy(house_price_np))
house_size = [ 7.5, 7, 6.5, 6.0, 5.5, 5.0, 4.5,3.5,3.2,2.8,3.0,2.5]
house_size_np = np.array(house_size, dtype=np.float32)
house_size_np = house_size_np.reshape(-1, 1)
house_size_tensor = Variable(torch.from_numpy(house_size_np))
import matplotlib.pyplot as plt
plt.scatter(house_prices_array, house_size_np)
plt.xlabel("House Price $")
plt.ylabel("House Sizes")
plt.title("House Price $ VS House Size")
plt.show()

In the example, we see that the function to find is close to
f(x) = -0.05 * x + 9
Example: -0.05 * 40 + 9 = 7 and -0.05 * 30 + 9 = 7.5
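
As a sanity check of this approximation, the best straight line through the 12 points can be computed directly with the ordinary least-squares formulas. This is an extra verification in plain Python, not part of the PyTorch training; it gives a slope of about -0.048 and an intercept of about 8.83:

```python
prices = [30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
sizes = [7.5, 7, 6.5, 6.0, 5.5, 5.0, 4.5, 3.5, 3.2, 2.8, 3.0, 2.5]

n = len(prices)
mean_x = sum(prices) / n
mean_y = sum(sizes) / n

# Closed-form least squares for a line y = slope * x + intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, sizes))
sxx = sum((x - mean_x) ** 2 for x in prices)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"f(x) = {slope:.4f} * x + {intercept:.4f}")
```

This is the line that a network with a single linear layer and an MSE loss should approach if it is trained long enough.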

I initialize the weight to -0.01 and the bias to 8 to limit the training time.

Linear activation function (Solving regression problem):
fc1 = nn.Linear(1, 1)

model = nn.Sequential(OrderedDict([
                       ('lin', fc1)
                        ]))

model = model.float()
fc1.weight.data.fill_(-0.01)
fc1.bias.data.fill_(8)

I declare MSELoss and an Adam optimizer with a learning rate of 0.01:

loss = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

Here is the training loop:

history1 = hl.History()
canvas1 = hl.Canvas()
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    result = model(house_price_tensor)
    lossoutput = loss(result, house_size_tensor)
    lossoutput.backward()
    optimizer.step()
    print("EPOCH " + str(epoch) + "- train loss = " + str(lossoutput.item()))
    history1.log(epoch, trainloss=lossoutput.item())
    canvas1.draw_plot([history1["trainloss"]])

After a hundred epochs we obtain an MSELoss of 0.13

If I display the weights and biases found by the model I get in my example:

print("weight" + str(fc1.weight))
print("bias" + str(fc1.bias))

The network therefore computes here the function f(x) = -0.0408 * x + 8.1321

I then display the predicted result to compare it with the actual result:

result = model(house_price_tensor)
plt.scatter(house_prices_array, house_size_np)
plt.title("True")
plt.show()
plt.scatter(house_prices_array,result.detach().numpy())
plt.title("predicted")
plt.show()

Here is a preview of the architecture diagram for a simple PyTorch neural network with a linear function; we see the multiplication operator and the addition operator that compute y = ax + b.

hl.build_graph(model, torch.ones(1,dtype=torch.float))
Diagram of the PyTorch neural network

ReLU activation function (regression problem solving):

ReLU has no "learnable" parameters (parameters modified by the optimizer), so I keep a linear layer upstream for the test.

The result is identical to the previous example, which is logical since in my case all the values are strictly positive and ReLU simply passes them through.

Compared with the previous example, a ReLU layer is added at the output, after the linear layer:

fc1 = nn.Linear(1, 1)

model = nn.Sequential(OrderedDict([
                       ('lin', fc1),
                       ('relu',nn.ReLU())
                        ]))

model = model.float()
fc1.weight.data.fill_(-0.01)
fc1.bias.data.fill_(8)
Model architecture with linear layer + ReLU layer

The same holds for PReLU in my case:
these two functions simply pass the output of the linear function through unchanged. ReLU remains interesting if you want to set to 0 all the outputs that correspond to a negative input.

Bent identity function (regression problem solving):

Bent identity has no learnable parameter, but the function does not just forward the input: it applies the bent identity function to it, which slightly modifies the result:

fc1 = nn.Linear(1, 1)

model = nn.Sequential(OrderedDict([
                       ('lin', fc1),
                       ('BentID',BentID())
                        ]))

model = model.float()
fc1.weight.data.fill_(-0.01)
fc1.bias.data.fill_(8)

In our case, after 100 epochs, the network found the following weights for the linear function upstream of bent identity. The final loss is 0.78, so not as good as with the linear function alone (the network converges less quickly):

In our case, the function applied by the network is therefore bent identity applied to the output of the linear layer:

z = -0.0481 * x + 7.7386
f(x) = (math.sqrt(z**2 + 1) - 1) / 2 + z

Result obtained with Bent Identity

I display the architecture of the PyTorch network, which is now a linear layer followed by a bent identity layer:

hl.build_graph(model, torch.ones(1,dtype=torch.float))

The same network as a PyTorchViz graph:

make_dot(model(torch.ones(1,dtype=torch.float)), params=dict(model.named_parameters()))
Soft Exponential (regression problem solving)


Soft Exponential has a trainable alpha parameter. We place a linear function upstream and Soft Exponential at the output, and check the approximation obtained in this case:

fc1 = nn.Linear(1, 1)
fse = SoftExponential(1,torch.tensor([-0.9],dtype=torch.float))
model = nn.Sequential(OrderedDict([
                       ('lin', fc1),
                       ('SoftExponential',fse)
                        ]))

model = model.float()
fc1.weight.data.fill_(-0.01)
fc1.bias.data.fill_(8)

After 100 training sessions the results on the linear layer parameters and the soft exponential layer parameters are as follows:

print("weight" + str(fc1.weight))
print("bias" + str(fc1.bias))
print("fse alpha" + str(fse.alpha))

The result is not really good, since the approximation does not go in the right direction.

We therefore notice that, in our case, we could invert the function to define a custom Soft Exponential activation function with a bias term, and that is what we do here.

Custom function – (Regression problem solving)

We declare a custom activation function for PyTorch. Compared with the original soft exponential, I added the beta bias term as well as the inversion (1 divided by the original expression):

class SoftExponential2(nn.Module):
    def __init__(self, in_features, alpha=None, beta=None):
        super(SoftExponential2, self).__init__()
        self.in_features = in_features
        # alpha is the shape parameter, beta the added bias term.
        # nn.Parameter tensors are trainable (requires_grad=True) by default.
        if alpha is None:
            self.alpha = Parameter(torch.tensor(0.0))
        else:
            self.alpha = Parameter(torch.as_tensor(alpha, dtype=torch.float))
        if beta is None:
            self.beta = Parameter(torch.tensor(0.0))
        else:
            self.beta = Parameter(torch.as_tensor(beta, dtype=torch.float))

    def forward(self, x):
        if self.alpha == 0.0:
            return x

        if self.alpha < 0.0:
            # inverse of the soft exponential (alpha < 0 branch), plus the beta bias
            return 1.0 / (-torch.log(1 - self.alpha * (x + self.alpha)) / self.alpha) + self.beta

        if self.alpha > 0.0:
            # inverse of the soft exponential (alpha > 0 branch), plus the beta bias
            return 1.0 / ((torch.exp(self.alpha * x) - 1) / self.alpha + self.alpha) + self.beta

We now have 2 parameters that can be trained in this custom function in PyTorch.

By also setting the learning rate to 0.01, training for 100 epochs, and initializing alpha = 0.1 and beta = 7.0 (the values used in the code below), I arrive at a loss < 5

fse = SoftExponential2(1,torch.tensor([0.1],dtype=torch.float),torch.tensor([7.0],dtype=torch.float))

model = nn.Sequential(OrderedDict([
                       ('SoftExponential',fse)
                        ]))

model = model.float()
print("fse alpha" + str(fse.alpha))
print("fse beta" + str(fse.beta))

Network architecture for my custom inverted soft exponential function with the added bias term.

Categorization problem (predict one class among several possible classes) – single-label classifier with PyTorch

You want to predict the probability that your input matches a single class among multiple possible classes.
For example, you want to predict the type of vehicle in an image among the classes: car, truck, train.

The output of your network has n neurons for n classes. For each output neuron, corresponding to one possible class, we obtain a value between 0 and 1 that represents the probability that the input belongs to this class. For each class this is similar to the binary classification problem of the first part, with the additional constraint that an input can belong to only one output class.

Activation function Categorization problem (predict a class among several possible classes):

The activation function to use on the final layer is a Softmax function with n dimensions, corresponding to the number of classes to predict.
Softmax is a mathematical function that converts a vector of numbers (tensor) into a vector of probabilities, where the probability of each value is proportional to the exponential of that value.


(Cf Website machinelearningmastery.com)

In other words, for an output tensor it returns a probability for each class, scaled so that their sum equals one.
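
Here is a minimal sketch of softmax in plain Python. The function and the sample scores are illustrative; subtracting the maximum score is a common trick for numerical stability and does not change the result:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The largest score receives the largest probability, and the order of the classes is preserved.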

Website pytorch.org

Cost function – Categorization problem (predict a class among several possible classes):

The cost function to use is close to the one used for binary classification, but with this notion of a probability vector.
Cross entropy evaluates the difference between two probability distributions. We therefore use it to compare the predicted distribution with the true one.
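
For a single-label problem the true distribution puts all its mass on one class, so the cross entropy reduces to minus the log of the probability predicted for that class. The sketch below computes it directly from raw scores (logits); the function name and the values are illustrative. Note that PyTorch's nn.CrossEntropyLoss also takes raw scores and applies the log-softmax itself:

```python
import math


def cross_entropy(logits, target_index):
    """-log(softmax(logits)[target_index]), computed in a numerically stable way."""
    highest = max(logits)
    log_sum_exp = highest + math.log(sum(math.exp(z - highest) for z in logits))
    return log_sum_exp - logits[target_index]


confident = cross_entropy([5.0, 0.0, 0.0], target_index=0)   # right class has the top score
wrong = cross_entropy([5.0, 0.0, 0.0], target_index=1)       # right class has a low score
print(confident, wrong)
```

The loss is small when the true class receives the highest score and large otherwise.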

See Website en.wikipedia.org

The example below is based on the tutorial and the data set on this page: Website pytorch.org

If you get the following error:

ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html

You need to install ipywidgets and enable its extension in Jupyter:

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

In our example, we first retrieve the input dataset.
The dataset used is CIFAR-10; more information here:

Website www.cs.toronto.edu

It is already integrated with PyTorch (through torchvision), which makes it easy to run some tests.

Example (categorization problem):

Preparation of the dataset:

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

We then define an imshow function to display images, and we print the single label associated with each image:

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
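
The line img / 2 + 0.5 in imshow undoes the Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) transform applied when loading the data. The sketch below checks this round trip on a few made-up pixel values.

```python
import torch

# Hypothetical pixel values as produced by ToTensor(), in the range [0, 1]
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Normalize with mean 0.5 and std 0.5 computes (x - mean) / std
normalized = (pixels - 0.5) / 0.5   # values now lie in [-1, 1]

# imshow reverses the transform with img / 2 + 0.5
restored = normalized / 2 + 0.5

print(normalized)  # -1.0, -0.5, 0.0, 1.0
print(restored)    # back to the original pixel values
```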

Create a convolutional neural network (more information here: Website towardsdatascience.com)

A convolutional neural network (ConvNet or CNN) is a deep learning algorithm that takes an image as input, assigns importance (through learnable weights and biases) to various aspects or objects in the image, and learns to differentiate one from another.

Source https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png

The architecture of a convolutional neural network resembles the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex.

Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. A collection of these fields overlap to cover the entire visual area.

In our case we are going to use a Conv2d layer and a pooling layer with PyTorch.

Website pytorch.org : The Conv2d layer is the 2D convolution layer.

Pytorch tutorial deeplearning with python and pytorch mnist tutorial
Source : https://cdn-media-1.freecodecamp.org/images/gb08-2i83P5wPzs3SL-vosNb6Iur5kb5ZH43

Website www.freecodecamp.org

A filter in a Conv2d layer has a height and a width, which are usually smaller than those of the input image. The filter slides across the entire image, and the region of the image it covers at a given position is called the receptive field.

The max pooling layer is a down-sampling operation. Its objective is to sub-sample an input representation (an image, for example) by reducing its size and making assumptions about the features contained in the grouped sub-regions: only the maximum value of each sub-region is kept.
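
Here is a minimal sketch of both layers on a tiny 4x4 input. The input values and the layer sizes are assumptions chosen so that the result is easy to check by hand; they are not the ones used in the network of this tutorial.

```python
import torch
import torch.nn as nn

# A 4x4 single-channel "image", shaped (batch, channels, height, width)
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 8., 1., 0.],
                  [7., 6., 3., 2.]]).reshape(1, 1, 4, 4)

# Max pooling with a 2x2 window and stride 2 keeps the largest value of each 2x2 block
pool = nn.MaxPool2d(kernel_size=2, stride=2)
pooled = pool(x)
print(pooled.reshape(2, 2))  # [[4, 8], [9, 3]]

# A 3x3 convolution without padding shrinks height and width by 2 (4 -> 2)
conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)
print(conv(x).shape)  # torch.Size([1, 2, 2, 2])
```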

In my example with PyTorch, the network is declared as follows:

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
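
The value 16 * 5 * 5 used for fc1 follows from the size of the CIFAR-10 images (32x32) and from the layers applied before it. The helper conv_out below is a hypothetical function written for this sketch; it applies the usual output-size formula for convolution and pooling layers.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                  # CIFAR-10 images are 32x32
size = conv_out(size, kernel=5)            # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)  # pool:  28 -> 14
size = conv_out(size, kernel=5)            # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)  # pool:  10 -> 5

flat_features = 16 * size * size           # 16 channels of 5x5 feature maps
print(size, flat_features)                 # 5 400
```

If you change the kernel sizes or the input resolution, the first argument of fc1 has to be recomputed the same way.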

We define the cost function criterion with CrossEntropyLoss. Here we use the SGD optimizer rather than Adam (more information on the comparison between optimizers here: Website ruder.io).

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
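
To see what momentum=0.9 does, the sketch below runs optim.SGD on a single made-up parameter with a toy loss and reproduces the same update by hand. The parameter, the loss and the names w_manual and velocity are assumptions for illustration.

```python
import torch
import torch.optim as optim

lr, momentum = 0.001, 0.9

# A single hypothetical parameter, optimized with the same settings as above
w = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([w], lr=lr, momentum=momentum)

# Manual copy of the same update, tracked in plain Python
w_manual, velocity = 1.0, 0.0

for _ in range(3):
    optimizer.zero_grad()
    loss = (w ** 2).sum()                   # toy loss whose gradient is 2 * w
    loss.backward()
    optimizer.step()

    grad = 2 * w_manual
    velocity = momentum * velocity + grad   # accumulate past gradients
    w_manual = w_manual - lr * velocity     # step along the accumulated direction

print(w.item(), w_manual)  # the two values agree up to float32 rounding
```

With momentum, each step reuses part of the previous steps, which tends to smooth the noisy gradients obtained from small mini-batches.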

The training loop:

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

The trained PyTorch network is then saved:

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

We load a batch of test images and use the network to predict their classes.

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

In this example the "true" labels are "cat, ship, ship, plane". We then run the network to make a prediction:

net = Net()
net.load_state_dict(torch.load(PATH))
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
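
The same outputs can also be turned into probabilities and compared with the labels. The sketch below uses random scores in place of the real network outputs, so the accuracy it prints is meaningless; only the mechanics matter.

```python
import torch

# Hypothetical scores for a batch of 4 images and 10 classes (random stand-in)
torch.manual_seed(0)
outputs = torch.randn(4, 10)
labels = torch.tensor([3, 8, 8, 0])   # class indices for cat, ship, ship, plane

# torch.max over dim 1 returns (max values, indices); the index is the predicted class
_, predicted = torch.max(outputs, 1)

# Optional: convert scores to probabilities to see how confident each prediction is
probs = torch.softmax(outputs, dim=1)
confidence = probs.max(dim=1).values

# Fraction of predictions that match the true labels
accuracy = (predicted == labels).float().mean().item()
print(predicted, confidence, accuracy)
```

Looping over testloader and accumulating the number of correct predictions in the same way gives the accuracy over the whole test set.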

Here is an overview of the architecture of this convolution network.

# make_dot (from the torchviz package) draws the computation graph of the forward pass
from torchviz import make_dot
make_dot(net(images), params=dict(net.named_parameters()))

Categorization problem (predict several classes among several possible classes) – multi-label classifier with PyTorch

Here the goal is to predict, for each class, the probability that it is present in the input; several classes can be present at once. One possible use is to indicate which objects are present in an image.


The problem then reduces to n independent binary classification problems, one per class.

The final activation function is sigmoid and the loss function is binary cross entropy (BCELoss in PyTorch).
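
A minimal sketch of this combination on made-up values follows. The scores, the targets and the 0.5 threshold are assumptions for illustration; nn.BCEWithLogitsLoss is shown as an alternative that takes the raw scores directly.

```python
import torch
import torch.nn as nn

# Hypothetical raw outputs for 2 images and 4 independent labels
logits = torch.tensor([[ 2.0, -1.0,  0.5, -3.0],
                       [-0.5,  1.5, -0.2,  2.0]])
# Targets are 0/1 floats; several labels can be 1 for the same image
targets = torch.tensor([[1., 0., 1., 0.],
                        [0., 1., 0., 1.]])

# Sigmoid squashes each output independently, so a row does not have to sum to 1
probs = torch.sigmoid(logits)
loss = nn.BCELoss()(probs, targets)

# BCEWithLogitsLoss combines sigmoid and BCE in one numerically more stable step
loss_fused = nn.BCEWithLogitsLoss()(logits, targets)

# Threshold at 0.5 (a common but arbitrary choice) to get the predicted label set
predicted = (probs > 0.5).int()
print(loss.item(), loss_fused.item())
print(predicted)
```

Unlike softmax, raising the probability of one label here does not lower the probability of the others.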

This online example illustrates the use of BCELoss for predicting several classes among several possible classes:
Website medium.com

In this example, the image dataset used is the Large-scale CelebFaces Attributes (CelebA) Dataset.
Website mmlab.ie.cuhk.edu.hk

This dataset contains about 200K images annotated with 40 different labels. The backgrounds differ from image to image and there are many variations, which makes it difficult for a model to classify each label effectively.

I suggest you follow the tutorial from the article: Website medium.com

  • Step 1: Download the file img_align_celeba.zip from the site Website mmlab.ie.cuhk.edu.hk (it is hosted on Google Drive)
  • Step 2: Download the file list_attr_celeba.txt, which contains the annotations for each image, from the same site Website mmlab.ie.cuhk.edu.hk
  • Step 3: Opening the annotations file, you can see the 40 labels: 5_o_Clock_Shadow Arched_Eyebrows Attractive Bags_Under_Eyes Bald Bangs Big_Lips Big_Nose Black_Hair Blond_Hair Blurry Brown_Hair Bushy_Eyebrows Chubby Double_Chin Eyeglasses Goatee Gray_Hair Heavy_Makeup High_Cheekbones Male Mouth_Slightly_Open Mustache Narrow_Eyes No_Beard Oval_Face Pale_Skin Pointy_Nose Receding_Hairline Rosy_Cheeks Sideburns Smiling Straight_Hair Wavy_Hair Wearing_Earrings Wearing_Hat Wearing_Lipstick Wearing_Necklace Wearing_Necktie Young
    For each image listed in the file, a value of 1 or -1 for each class specifies whether that class is present in the image.
    The objective here is to predict, for each class, the probability of its presence in the image.
  • Step 4: You can load the notebook available on Website github.com
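
One practical detail about the annotations: BCELoss expects targets between 0 and 1, while the file uses 1 and -1. A minimal sketch of the conversion, on made-up annotation rows, could look like this:

```python
import torch

# Hypothetical annotation rows in the 1 / -1 convention (1 = present, -1 = absent)
raw = torch.tensor([[-1,  1, 1, -1],
                    [ 1, -1, 1,  1]])

# Map -1 to 0 and 1 to 1, and cast to float as BCELoss expects
targets = (raw == 1).float()
print(targets)
```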

Here are the results:

Loading the dataset:


Use of the imshow function (see previous example for single label prediction)


Architecture of the convolutional neural network:

import torch
import torch.nn as nn


class MultiClassifier(nn.Module):
    def __init__(self):
        super(MultiClassifier, self).__init__()
        # Shape comments are (channels, height, width) for a 3 x 256 x 256 input
        self.ConvLayer1 = nn.Sequential(
            nn.Conv2d(3, 64, 3),     # in: 3, 256, 256 -> out: 64, 254, 254
            nn.MaxPool2d(2),         # out: 64, 127, 127
            nn.ReLU(),               # out: 64, 127, 127
        )
        self.ConvLayer2 = nn.Sequential(
            nn.Conv2d(64, 128, 3),   # out: 128, 125, 125
            nn.MaxPool2d(2),         # out: 128, 62, 62
            nn.ReLU(),               # out: 128, 62, 62
        )
        self.ConvLayer3 = nn.Sequential(
            nn.Conv2d(128, 256, 3),  # out: 256, 60, 60
            nn.MaxPool2d(2),         # out: 256, 30, 30
            nn.ReLU(),               # out: 256, 30, 30
        )
        self.ConvLayer4 = nn.Sequential(
            nn.Conv2d(256, 512, 3),  # out: 512, 28, 28
            nn.MaxPool2d(2),         # out: 512, 14, 14
            nn.ReLU(),               # out: 512, 14, 14
            nn.Dropout(0.2),
        )
        self.Linear1 = nn.Linear(512 * 14 * 14, 1024)
        self.Linear2 = nn.Linear(1024, 256)
        self.Linear3 = nn.Linear(256, 40)   # one output per label

    def forward(self, x):
        x = self.ConvLayer1(x)
        x = self.ConvLayer2(x)
        x = self.ConvLayer3(x)
        x = self.ConvLayer4(x)
        x = x.view(x.size(0), -1)   # flatten to (batch, 512 * 14 * 14)
        x = self.Linear1(x)
        x = self.Linear2(x)
        x = self.Linear3(x)
        # sigmoid gives an independent probability for each of the 40 labels
        return torch.sigmoid(x)

An overview of the network architecture:


SUMMARY – Pytorch tutorial :

Here is the summary sheet I created for this PyTorch tutorial, to help you choose the right activation function and the right cost function more quickly for the problem you need to solve.


Pytorch tutorial : THE 5 BEST DEEP LEARNING LINKS

Website fr.wikipedia.org : Free courses by Yann LeCun

Website pytorch.org : Official PyTorch website

Website machinelearningmastery.com : Advanced on deep learning

Website www.fun-mooc.fr : Free MOOC

Website dataanalyticspost.com
