Sentiment Analysis Part 2 (Trainer)

In the previous article, we finished the first part of our example project, now we have SentimentAnalyst class which we can use for training data and making a prediction by passing real data to it.
 
Before starting Trainer Project, I want to talk a bit about the data which we are going to use for training and for testing. This dataset downloaded from kaggle.
 
You can download to dataset from this link.

You can download the source code from here.

 
Below you can see the details of the dataset, it is a .csv format which is possible to use by the ML.Net.
 
There is one point which we need to pay attention, as you can see below, sentiment fields are "String" data, but we need numeric value instead of it, because we are going to make a prediction as "Positive" or "Negative" we need two "Class" as "0" for negative and "1" for positive, that's why I updated all "negative" word as "0", and all "positive" words as "1", the dataset with the new values is ready to use.


 
Now after we talked about the dataset which we are going to use to train our ML.Net, we can start to talk about trainer project.
 
The Trainer project is the second part of our example project, which is very critical, all the predictions which our example project is going to make depend on the quality of the training data on this stage.
 
Let's start by adding a new project our solution, we are going to add Console App to our solution. Please select "Console App" and then click "Next"


And we are naming it as "Trainer" and click "Create" to create the project.


After our project created and added to our solution, we are going to create a "Data" folder in the "Trainer" project, and we are going to copy our dataset which we are going to use for training and testing ML.Net, project should looks below.


 
Please click the IMDBDataset.csv and the on the "Properties" window, change the "Copy to Output Directory" settings to "Copy if newer", this part is important, otherwise, the trainer is not going to find dataset to training ML.Net.


 
Now, we can start coding, as we already talked, this project is going to handle the "Training" process, actually we already have the class which is called SentimentAnalyst, and it is going to do all the process about training, this trainer project is the layer which is going to pass all necessary parameters to the SentimentAnalyst and then displays training result which will be returned by the SentimentAnalyst.
 
We can start by defining a variable like below, this is going to help us to determine where to output folder should be creating for the training model
 
private static readonly bool _debugMode = true;
So firstly let's create our helper functions as below, "GetParentDirectory" function is the function which we are going to use to understand what path our trainer project is running because we want our trainer to save the model after training automatically, by that our example application "Movie Reviews" can find the trained model easily without manual copying
 
        //Gets solution path
        private static string GetParentDirectory()
        {
            var directoryInfo =  Directory.GetParent(Directory.GetCurrentDirectory()).Parent;
            if (directoryInfo?.Parent?.Parent != null)
                return directoryInfo.Parent.Parent
                    .FullName;
            return string.Empty;
        }

"Train" function is the function which we are going to call for to make ML.Net use our dataset to train the machine learning model for us, by the ML.Net finishes training the model, we will display to result as shown below to understand how our model is accurate

 
        private static void Train(SentimentAnalyst sentimentAnalyst)
        {
            //Starts trainer
            var trainingResult = sentimentAnalyst.Train();
            //Displays results of the training
            Console.WriteLine("===============================================");
            Console.WriteLine("Accuracy:{0}", trainingResult.Accuracy);
            Console.WriteLine("AreaUnderRocCurve:{0}",  trainingResult.AreaUnderRocCurve);
            Console.WriteLine("F1Score:{0}", trainingResult.F1Score);
            Console.WriteLine("===============================================");
        }

"TrainMultiple" function is an optional function which is possible to use all learning model like LbfgsLogisticRegression, SgdCalibrated, SdcaLogisticRegression, AveragedPerceptron, LinearSvm at once and displays results, by using this function we can understand that which learning model is fitting better to our dataset.

 
 private static void TrainMultiple(SentimentAnalyst sentimentAnalyst)
        {
            Console.WriteLine("Multiple Training");
            //Starts trainer
            var trainingResults = sentimentAnalyst.TrainMultiple();
            //Displays results of the training
            Console.WriteLine(
                 "*************************************************************************************************************");
            Console.WriteLine("*       Training Results      ");
            Console.WriteLine(
                 "*------------------------------------------------------------------------------------------------------------");
            foreach (var trainingResult in trainingResults.OrderBy(x =>  x.Accuracy))
            {
                Console.WriteLine($"* Trainer: {trainingResult.Trainer}");
                Console.WriteLine(
                    $"* Accuracy:    {trainingResult.Accuracy:0.###}  - Area Under  Roc Curve: ({trainingResult.AreaUnderRocCurve:#.###})  - F1                                                     Score:  ({trainingResult.F1Score:#.###})");
            }
            Console.WriteLine(
                 "*************************************************************************************************************");
        }

And the below you can find the second optional function "CrossValidation" which is actually very useful to use, this function is checking our model to understand if it is over fit or not, it can be good to use it before moving the production.


  private static void CrossValidation(SentimentAnalyst sentimentAnalyst)
        {
            Console.WriteLine("Cross Validating");
            //Starts Validating
            var validationResult = sentimentAnalyst.CrossValidate();
            Console.WriteLine(
                 "*************************************************************************************************************");
            Console.WriteLine("*       Metrics for Cross Validation     ");
            Console.WriteLine(
                 "*------------------------------------------------------------------------------------------------------------");
            Console.WriteLine($"Trainer: {validationResult.Trainer}");
            Console.WriteLine(
                $"*       Average Accuracy:     {validationResult.AccuracyAverage:0.###}  - Standard deviation:  ({validationResult.AccuraciesStdDeviation:#.###})  -                     Confidence Interval 95%:  ({validationResult.AccuraciesConfidenceInterval95:#.###})");
            Console.WriteLine(
                 "*************************************************************************************************************");
        }
Here you can find the main code, what we do is here we are setting dataset and model filenames and then defining new Sentiment Analyst class by passing our training data filename and the model filename which we want to our training model be saved.
 
After those initializations, we call "Train" function and then displays the training result to understand how the training process was
 
  private static void Main(string[] args)
        {
            var trainingDataFile =  Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Data", "IMDBDataset.csv");
            var modelDataFile = Path.Combine(GetParentDirectory(),
                _debugMode
                    ? $@"Movie Reviews\\Movie Reviews\\bin\\{"Debug"}\\Data"
                    : $@"Movie Reviews\\Movie Reviews\\bin\\{"Release"}\\Data",
                "model.zip");
            
            var sentimentAnalyst = new SentimentAnalyst(trainingDataFile,  modelDataFile);
            Console.WriteLine("Training");
            Train(sentimentAnalyst);
            
            //If you want to see how other models perform
            //TrainMultiple(sentimentAnalyst);
            
            //If you want to validation
            //CrossValidation(sentimentAnalyst);
            
            Console.WriteLine("Competed");
            Console.ReadLine();
        }


The final code of the "Trainer" project can be found below.
 internal class Program
    {
        private static readonly bool _debugMode = true;
        private static void Main(string[] args)
        {
            var trainingDataFile =  Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Data", "IMDBDataset.csv");
            var modelDataFile = Path.Combine(GetParentDirectory(),
                _debugMode
                    ? $@"Movie Reviews\\Movie Reviews\\bin\\{"Debug"}\\Data"
                    : $@"Movie Reviews\\Movie Reviews\\bin\\{"Release"}\\Data",
                "model.zip");
          
            var sentimentAnalyst = new SentimentAnalyst(trainingDataFile,  modelDataFile);
            Console.WriteLine("Training");
            
            Train(sentimentAnalyst);
            //If you want to see how other models perform
            //TrainMultiple(sentimentAnalyst);
            
            //If you want to validation
            //CrossValidation(sentimentAnalyst);
            Console.WriteLine("Competed");
            Console.ReadLine();
        }
        private static void Train(SentimentAnalyst sentimentAnalyst)
        {
            //Starts trainer
            var trainingResult = sentimentAnalyst.Train();

            //Displays results of the training
            Console.WriteLine("===============================================");
            Console.WriteLine("Accuracy:{0}", trainingResult.Accuracy);
            Console.WriteLine("AreaUnderRocCurve:{0}",  trainingResult.AreaUnderRocCurve);
            Console.WriteLine("F1Score:{0}", trainingResult.F1Score);
            Console.WriteLine("===============================================");
        }
        private static void TrainMultiple(SentimentAnalyst sentimentAnalyst)
        {
            Console.WriteLine("Multiple Training");

            //Starts trainer
            var trainingResults = sentimentAnalyst.TrainMultiple();

            //Displays results of the training
            Console.WriteLine(
                 "*************************************************************************************************************");
            Console.WriteLine("*       Training Results      ");
            Console.WriteLine(
                 "*------------------------------------------------------------------------------------------------------------");
            foreach (var trainingResult in trainingResults.OrderBy(x =>  x.Accuracy))
            {
                Console.WriteLine($"* Trainer: {trainingResult.Trainer}");
                Console.WriteLine(
                    $"* Accuracy:    {trainingResult.Accuracy:0.###}  - Area Under  Roc Curve: ({trainingResult.AreaUnderRocCurve:#.###})  - F1 Score:  ({trainingResult.F1Score:#.###})");
            }
            Console.WriteLine(
                 "*************************************************************************************************************");
        }
        private static void CrossValidation(SentimentAnalyst sentimentAnalyst)
        {
            Console.WriteLine("Cross Validating");

            //Starts Validating
            var validationResult = sentimentAnalyst.CrossValidate();
            Console.WriteLine(
                 "*************************************************************************************************************");
            Console.WriteLine("*       Metrics for Cross Validation     ");
            Console.WriteLine(
                 "*------------------------------------------------------------------------------------------------------------");
            Console.WriteLine($"Trainer: {validationResult.Trainer}");
            Console.WriteLine(
                $"*       Average Accuracy:     {validationResult.AccuracyAverage:0.###}  - Standard deviation:  ({validationResult.AccuraciesStdDeviation:#.###})  - Confidence Interval 95%:  ({validationResult.AccuraciesConfidenceInterval95:#.###})");
            Console.WriteLine(
                 "*************************************************************************************************************");
        }
        //Gets solution path
        private static string GetParentDirectory()
        {
            var directoryInfo =  Directory.GetParent(Directory.GetCurrentDirectory()).Parent;
            if (directoryInfo?.Parent?.Parent != null)
                return directoryInfo.Parent.Parent
                    .FullName;
            return string.Empty;
        }
    }


Please continue with the Sentiment Analysis Part 3 (Interface) for "Trainer" project.

Recommended Articles

Check out some recommended articles based on the article which you read

K-Nearest Neighbor
K-Nearest Neighbor

K-Nearest Neighbor

Classification

K-Nearest Neighbor is the one of the well-known and easy machine algorithm ...

K-Nearest Neighbor

K-Nearest Neighbor is the one of the well-known and easy machine algorithm which is very suitable for a lot of real world problems such product recommendation, social media friend recommendation based on interest or social network of person.

Naive Bayes
Naive Bayes

Naive Bayes

Classification

Naive Bayes Algorithm is the algorithm which makes machines to be able to m...

Naive Bayes

Naive Bayes Algorithm is the algorithm which makes machines to be able to make predictions about the events which they don't have any knowledge about only by looking priors knowledge.

Sentiment Analysis Part 1
Sentiment Analysis Part 1

Sentiment Analysis Part 1

Classification

The super strong wind of the Machine Learning is turning our heads incredib...

Sentiment Analysis Part 1

The super strong wind of the Machine Learning is turning our heads incredibly, we see an example of the machine learning usage almost every area of the technology. Face detection, voice recognition, text recognition, etc. there are plenty of them, and each of them has a different type of approach to machine learning.