Naive Bayes

Naive Bayes is one of those algorithms that has stood the test of time for good reason. Despite its simplicity, it's widely used in production systems, performs surprisingly well across a range of problems, and scales to millions of records without breaking a sweat. It belongs to the supervised learning family and is fundamentally a classifier: its job is to look at a set of input features and predict which category the input belongs to.

You may also see it called "idiot Bayes". The name isn't a slight; it refers to the algorithm's core assumption, which is simple to the point of being naive: every feature in the dataset is assumed to be conditionally independent of every other feature, given the class. In reality this is almost never true, since features in real-world data usually influence one another. And yet, despite this theoretically flawed assumption, the algorithm performs remarkably well in practice, often competing with far more complex approaches.
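Take a class c and features w_1, w_2, ..., w_n (for text, these are the words of a comment). Under this assumption, the joint likelihood can be written as a product of per-feature terms. Each term can then be estimated separately from the training data:

$$P(w_1, w_2, \ldots, w_n \mid c) = \prod_{i=1}^{n} P(w_i \mid c)$$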

Real-world applications include text classification, spam filtering, and recommendation systems, among others.

Bayes' Theorem

Bayes' theorem is named after Reverend Thomas Bayes (1701–1761), and it sits at the heart of a surprisingly large number of modern machine learning algorithms. The core idea is conditional probability: the probability that something is true given that something else has already happened. Rather than asking "what is the probability of X?", Bayes' theorem lets us ask the more useful question: "what is the probability of X, given what we already know?"

The formula is:

$$P(T \mid E) = \frac{P(E \mid T) \times P(T)}{P(E)}$$

Where:

P(T) is the probability that the thesis is true before any evidence is taken into account, also known as the prior probability
P(E) is the overall probability of observing the evidence
P(E|T) is the probability of observing the evidence given that the thesis is true, also known as the likelihood
P(T|E) is what we're actually trying to calculate: the probability that the thesis is true given that the evidence exists, also known as the posterior probability
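To make the terms concrete, here is a small worked sketch. The numbers are invented for illustration:

- 20% of messages are spam, so P(T) = 0.2.
- A particular word appears in half of spam messages, so P(E|T) = 0.5.
- The same word appears in 10% of the remaining messages.

P(E) is not given directly, so the sketch first derives it with the law of total probability and then applies the formula. The class and method names are hypothetical.

```csharp
using System;

// Illustrative sketch of Bayes' theorem; BayesTheorem and its methods are
// hypothetical names, not part of any library used in this article.
public static class BayesTheorem
{
    // Law of total probability:
    // P(E) = P(E|T) * P(T) + P(E|not T) * (1 - P(T))
    public static double Evidence(double pEGivenT, double pEGivenNotT, double pT)
    {
        return pEGivenT * pT + pEGivenNotT * (1.0 - pT);
    }

    // Posterior: P(T|E) = P(E|T) * P(T) / P(E)
    public static double Posterior(double pEGivenT, double pT, double pE)
    {
        if (pE <= 0.0)
            throw new ArgumentOutOfRangeException(nameof(pE), "P(E) must be positive.");
        return pEGivenT * pT / pE;
    }
}
```

With those inputs, P(E) = 0.5 × 0.2 + 0.1 × 0.8 = 0.18. The posterior is then P(T|E) = 0.1 / 0.18 ≈ 0.56: seeing the word raises the probability of spam from 20% to about 56%.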

Naive Bayes in Action: Sentiment Classification

The best way to understand how Naive Bayes works is to walk through a concrete example. Let's tackle a classic text classification problem: given a user comment, predict whether it's positive or negative.

To do that, we need to give the algorithm some prior knowledge to learn from: a labeled dataset of comments where we already know the correct classification. That's our training data.

The dataset we'll be working with contains two columns: the user comment itself, and a class label where 1 means positive and 0 means negative. You can download the dataset from the link above.

At a glance, the structure is simple, but that's exactly the point. Naive Bayes doesn't need complex, richly engineered features to perform well on text. It works directly with the words themselves, calculating the probability that a given word appears in positive comments versus negative ones, and using those probabilities to classify new comments it hasn't seen before.

With the dataset in hand, let's walk through how the algorithm processes it step by step.

Before we get into the code, there's an important caveat worth stating clearly: the dataset we're using here is for illustration purposes only. In a real production system, you'd need significantly more training data to get reliable predictions. The size of your training set directly determines how well the algorithm can estimate word probabilities: too little data and the probabilities become noisy and unrepresentative of the real world. What we have here is enough to understand the mechanics, not enough to ship.

With that said, let's talk about what the algorithm actually needs to compute before it can make any predictions.

The core of Naive Bayes for text classification comes down to word probabilities. For every word in our training data, we need to calculate how likely it is to appear in a positive comment versus a negative one. Put another way, we're building a probability profile for each word based on the type of comments it shows up in.

Once we have those probabilities, classifying a new comment is straightforward. For each class, we take the class's overall probability and multiply it by the probability of every word in the comment given that class. Whichever class produces the higher result wins. That's the prediction.
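Written as a formula, with w_1, ..., w_n as the words of the new comment, the rule is shown below. The evidence term P(E) from Bayes' theorem is dropped because it is the same for both classes, so it cannot change which one wins:

$$\hat{c} = \arg\max_{c \in \{\text{positive},\, \text{negative}\}} P(c) \prod_{i=1}^{n} P(w_i \mid c)$$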

The calculations we need to make upfront are:

The overall probability of a comment being positive or negative in the training set
For every unique word, the probability of that word appearing in a positive comment
For every unique word, the probability of that word appearing in a negative comment
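As a rough sketch, the snippet below computes these three quantities over an in-memory list of (text, label) pairs. It is not the article's implementation:

- Tokenization is a plain lower-case split on spaces.
- Stop words are kept.
- The probabilities are unsmoothed.
- The TrainingCounts name is hypothetical.

The prior is computed per comment, as described above; the Multinomial formulas later in the article count words instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the upfront counts; not the article's pipeline
// (no punctuation stripping, no stop-word filtering, no smoothing).
public static class TrainingCounts
{
    // P(class): fraction of training comments carrying each label.
    public static Dictionary<int, double> ClassPriors(IList<(string Text, int Label)> comments)
    {
        return comments
            .GroupBy(c => c.Label)
            .ToDictionary(g => g.Key, g => (double)g.Count() / comments.Count);
    }

    // Unsmoothed P(word | class): occurrences of the word in comments of that
    // class divided by the total number of words in comments of that class.
    public static Dictionary<int, Dictionary<string, double>> WordProbabilities(
        IList<(string Text, int Label)> comments)
    {
        var result = new Dictionary<int, Dictionary<string, double>>();
        foreach (var group in comments.GroupBy(c => c.Label))
        {
            var words = group
                .SelectMany(c => c.Text.ToLowerInvariant()
                    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                .ToList();

            result[group.Key] = words
                .GroupBy(w => w)
                .ToDictionary(g => g.Key, g => (double)g.Count() / words.Count);
        }
        return result;
    }
}
```

For example, suppose "great movie" and "great acting" are labeled 1 and "boring movie" is labeled 0. Then the positive prior is 2/3, and P(great | positive) is 2 out of 4 positive words, or 0.5.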

With those numbers in hand, the algorithm is ready to classify anything we throw at it. Let's start building it.

Let's Write the Code

As we did with KNN, we're building this from scratch. No ML libraries, no black boxes: just the algorithm itself, implemented in a way that maps directly onto the theory we've already covered.

The first thing we need is a way to represent our training data in code. Each entry in our dataset is a user comment paired with a class label. We'll define a SentenceModel class that captures exactly that with two properties: Text, the comment itself, and Category, its class label (1 for positive, 0 for negative).

This model is the basic unit our algorithm will work with throughout. The training phase will consume a list of these, and everything from word frequency counts to probability calculations will be built on top of it.

Let's define it.

public class SentenceModel
{
    public SentenceModel(string text)
    {
        Text = text;
    }

    public SentenceModel(string text, double category)
    {
        Text = text;
        Category = category;
    }

    public string Text { get; private set; }

    public double Category { get; private set; }
}

With the model defined, the next step is getting our dataset into memory. The training data lives in an Excel workbook, so we'll write a simple helper function that opens the file through OLE DB. It reads the requested sheet row by row and returns the contents as a DataSet.

In a production system this would typically pull from a database or an API, but for our purposes a file-based loader is clean and straightforward: it keeps the focus on the algorithm rather than the data access layer.

Once the loader is in place, we define our list of SentenceModel objects and populate it by calling the function. From this point on, that list is our training dataset: everything the algorithm knows about positive and negative comments lives in there, ready to be processed into probabilities.

Let's write it.

// Namespaces used below (place these at the top of the file)
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.IO;

/// <summary>
///     Loads data from an Excel workbook through the ACE OLE DB provider
/// </summary>
/// <param name="fileName">Filename</param>
/// <param name="sheet">Name of the sheet</param>
/// <param name="columns">Columns to fetch; if null, all columns are fetched</param>
public DataSet LoadData(string fileName, string sheet, string[] columns = null)
{
    if (!File.Exists(fileName))
        throw new FileNotFoundException("Dataset file not found.", fileName);

    var connStr =
        string.Format(
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1\";",
            fileName);

    //Create dataset
    var dataSet = new DataSet();
    dataSet.Tables.Add(new DataTable(sheet));
    var table = dataSet.Tables[0];

    if (columns != null)
        foreach (var column in columns)
            table.Columns.Add(column);

    var items = new List<object>();
    using (var conn = new OleDbConnection(connStr))
    using (var cmd = new OleDbCommand(string.Format("SELECT * FROM [{0}$]", sheet), conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            //No explicit columns: use the sheet's header row as column names
            if (columns == null)
                for (var i = 0; i < reader.FieldCount; i++)
                    table.Columns.Add(reader.GetName(i));

            //With explicit columns, only the first N fields are read (by position)
            var fieldCount = columns != null ? table.Columns.Count : reader.FieldCount;
            while (reader.Read())
            {
                items.Clear();
                for (var d = 0; d < fieldCount; d++)
                    items.Add(reader[d]);

                var dataRow = table.NewRow();
                dataRow.ItemArray = items.ToArray();
                table.Rows.Add(dataRow);
            }
        }
    }

    return dataSet;
}

Straightforward enough: the function connects to the workbook, reads the sheet row by row, copies each row's values into a DataTable, and returns the DataSet that contains it. Nothing fancy, just clean data loading.

With that in place, we can call the function and map each returned row onto a SentenceModel. One call loads the file and a short loop converts the rows. After that, our entire training dataset is in memory, structured, and ready to feed into the algorithm.

Next up, we get the text ready for the real work: calculating the word probabilities that Naive Bayes will use to make its predictions.

//Load the dataset from the workbook
var data = LoadData("Filename of the dataset file", "Book1", new[] { "Text", "Category" });
//List that holds every training comment
var sentences = new List<SentenceModel>();
//Insert all comments into the list
foreach (DataRow row in data.Tables[0].Rows)
{
    var sentence = new SentenceModel(row["Text"].ToString(), Convert.ToDouble(row["Category"].ToString()));
    sentences.Add(sentence);
}
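The ACE OLE DB provider used above is Windows-only and requires the Microsoft Access Database Engine to be installed. If you export the dataset to a tab-separated text file instead, a dependency-free loader is enough. The sketch below assumes a layout the article doesn't specify: a header row, then one "comment<TAB>label" entry per line. The TsvLoader name is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical alternative loader. Assumes a text file with a header row and
// one "comment<TAB>label" entry per line; blank lines are skipped.
public static class TsvLoader
{
    public static List<(string Text, double Category)> ParseLines(IEnumerable<string> lines, bool hasHeader = true)
    {
        var rows = new List<(string Text, double Category)>();
        var skip = hasHeader;
        foreach (var line in lines)
        {
            if (skip) { skip = false; continue; }
            if (string.IsNullOrWhiteSpace(line)) continue;

            // Split on the last tab so a tab inside the comment text is tolerated.
            var tab = line.LastIndexOf('\t');
            if (tab < 0)
                throw new FormatException($"Missing tab separator in line: {line}");

            var text = line.Substring(0, tab);
            var category = double.Parse(line.Substring(tab + 1), CultureInfo.InvariantCulture);
            rows.Add((text, category));
        }
        return rows;
    }

    // Reads the file lazily and delegates the parsing to ParseLines.
    public static List<(string Text, double Category)> Load(string fileName, bool hasHeader = true)
    {
        if (!File.Exists(fileName))
            throw new FileNotFoundException("Dataset file not found.", fileName);
        return ParseLines(File.ReadLines(fileName), hasHeader);
    }
}
```

Each returned tuple maps directly onto the existing model: `new SentenceModel(row.Text, row.Category)`.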

Before we can feed any data into the algorithm, two preprocessing steps need to happen first.

The first is filtering. Not every word in a comment carries meaningful signal for classification. Stop words (common words like "the", "he", "she", "to" and "and") appear constantly in both positive and negative comments and tell us nothing about sentiment. Including them would only add noise to our probability calculations, so we strip them out before doing anything else.

The second is building a Bag of Words. In most text classification contexts, a bag of words records how frequently each word appears. Here we use it slightly differently: the bag stores identity, not frequency; occurrences are counted later, when the training rows are built. Our algorithm works with numeric data rather than strings, so we need a way to assign a unique numeric ID to every distinct word in the training set. The bag of words gives us that mapping: each unique word gets an ID, and from that point on the algorithm works with numbers rather than text.

To support this, we define a simple WordBagItem model with two fields:

Word: the string value of the word
Id: its unique numeric identifier

Clean and minimal: exactly what we need. Let's write it.

public class WordBagItem
{
    public WordBagItem(string word, int id)
    {
        Word = word;
        Id = id;
    }

    public string Word { get; private set; }

    public int Id { get; private set; }
}

With the WordBagItem model defined, we can now write the two functions that handle our preprocessing pipeline.

The first is a filtering function, implemented as an extension method, that takes a list of words and strips out all the stop words. The stop word list doesn't need to be exhaustive. It should cover the most common words that carry no sentiment signal: articles, pronouns, prepositions, conjunctions and similar. Anything that would appear with roughly equal frequency in positive and negative comments is noise, and we want it gone before the algorithm sees the data.

The second is the bag of words builder. This function walks through every comment in the training set, tokenizes each one into individual words, runs them through the filter, and builds up a deduplicated list of WordBagItem objects: one entry per unique word, each assigned a numeric ID. By the end of this function, every meaningful word in our training data has a stable numeric identity that the algorithm can work with.

Together, these two functions form the complete preprocessing pipeline. Raw comment text goes in, a clean numeric word map comes out.

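Let's start with the bag of words builder, which relies on the tokenizing and filtering extensions defined right after it.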

private void PrepareWordBag()
{
    foreach (var sentence in sentences)
    {
        //Tokenize the comment and drop stop words
        var words = sentence.Text.ExtractFeatures().FilterFeatures();
        foreach (var word in words)
        {
            //Normalize before comparing so "Great" and "great" share one ID
            var normalized = word.Trim().ToLower();
            if (wordBag.Any(x => x.Word == normalized))
                continue;
            wordBag.Add(new WordBagItem(normalized, wordBag.Count + 1));
        }
    }
}
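PrepareWordBag checks for an existing entry with a linear scan of wordBag. Building the bag therefore takes time proportional to the number of words times the vocabulary size. For larger datasets, a dictionary keyed by the normalized word keeps lookups constant-time on average. It still produces the same sequential IDs, starting at 1 like `wordBag.Count + 1`. A minimal sketch, with the WordIndex class as a hypothetical replacement for the list:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: assigns sequential 1-based IDs to unique words,
// mirroring the wordBag.Count + 1 scheme but with O(1) average lookups.
public class WordIndex
{
    private readonly Dictionary<string, int> _ids =
        new Dictionary<string, int>(StringComparer.Ordinal);

    public int Count => _ids.Count;

    // Returns the existing ID for the word, or assigns the next one.
    public int GetOrAdd(string word)
    {
        var key = word.Trim().ToLowerInvariant();
        if (!_ids.TryGetValue(key, out var id))
        {
            id = _ids.Count + 1;
            _ids.Add(key, id);
        }
        return id;
    }

    // Looks a word up without adding it; returns false for words never added
    // (for example, stop words that were filtered out during training).
    public bool TryGetId(string word, out int id) =>
        _ids.TryGetValue(word.Trim().ToLowerInvariant(), out id);
}
```

The same TryGetId lookup would also replace the FirstOrDefault scan used later when building the training rows.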

The extension methods sit on top of the string and IEnumerable<string> types, keeping the preprocessing logic clean and reusable throughout the codebase.

The first extension handles tokenization: splitting a raw comment string into individual words. This means stripping punctuation, handling whitespace, and returning a clean list of word tokens that we can work with programmatically.

The second extension handles filtering: taking that list of tokens and removing anything that appears in our stop word list. Rather than hardcoding the filtering logic everywhere it's needed, wrapping it as an extension method means we can call it in a single line anywhere in the pipeline, and swap out or expand the stop word list in one place if we ever need to.

Chaining these two extensions together gives us a clean one-liner that goes from a raw comment string to a filtered list of meaningful word tokens: ready to be mapped against the bag of words and fed into the algorithm.

Let's write them.

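using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Helper
{
    //Stop words and stray symbols that carry no sentiment signal (matched case-insensitively)
    private static readonly HashSet<string> Filters = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        ".", ",", "!", "?", ":", ";", "_", "+", "/", @"\", "*", "the", "of", "on", "is", "a", "...", "1", "2", "3",
        "4", "5", "6", "7", "8", "9", "0", " ", "for", "an", "", "it", "she", "he", "they", "we", "them", "our",
        "his", "her"
    };

    //Strips punctuation and splits the text into word tokens, dropping empty entries
    public static IEnumerable<string> ExtractFeatures(this string text)
    {
        return Regex.Replace(text, "\\p{P}+", "")
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();
    }

    //Removes stop words from a list of tokens
    public static IEnumerable<string> FilterFeatures(this IEnumerable<string> list)
    {
        return list.Where(x => !Filters.Contains(x)).ToList();
    }
}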

Before we can feed the training data into the algorithm, we need one more model to act as the container that carries everything together.

This class is the format we hand to the algorithm. Each instance represents a single word occurrence: the word itself, its numeric ID from the bag of words, and the class label of the comment it came from. We pass one list of these objects rather than several separate lists and parameters, which keeps the interface of our Naive Bayes class clean and straightforward.

Let's define it.

public class NaiveBayesTextModel
{
    [AiLabel]
    public double Label { get; set; }

    [AiField]
    public double WordId { get; set; }

    public string Word { get; set; }
}

With the model defined and the attributes in place, the picture is now complete. AiLabel tells the algorithm which field carries the class label: in our case, whether a comment is positive or negative. AiField marks the fields that represent the actual features the algorithm should learn from. Everything else gets ignored.

This attribute-based approach keeps the model clean and the algorithm flexible. Rather than hardcoding which fields to use inside the algorithm itself, the model declares its own structure: making it straightforward to swap in a different dataset or add new features without touching the core algorithm logic.
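The Converter that reads AiLabel and AiField isn't shown in the article. The following is only an illustration of the general technique, not Ellipses' code: a converter can use reflection to find the properties marked with marker attributes and read their values as doubles. The attribute and class names here are hypothetical stand-ins.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attributes standing in for AiLabel / AiField.
[AttributeUsage(AttributeTargets.Property)]
public sealed class LabelFieldAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class FeatureFieldAttribute : Attribute { }

public static class AttributeConverter
{
    // Reads every [FeatureField] property as a double. Properties are ordered
    // by name because reflection does not guarantee declaration order.
    public static double[] GetFeatures<T>(T model) =>
        typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetCustomAttribute<FeatureFieldAttribute>() != null)
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .Select(p => Convert.ToDouble(p.GetValue(model)))
            .ToArray();

    // Reads the single [LabelField] property; throws if none or several are marked.
    public static double GetLabel<T>(T model) =>
        Convert.ToDouble(typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Single(p => p.GetCustomAttribute<LabelFieldAttribute>() != null)
            .GetValue(model));
}

// Example model shaped like NaiveBayesTextModel.
public class ExampleTextModel
{
    [LabelField] public double Label { get; set; }
    [FeatureField] public double WordId { get; set; }
    public string Word { get; set; }   // no attribute, so the converter ignores it
}
```

Adding a new feature to a model then only means adding a property with the marker attribute. The conversion code stays the same.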

With the dataset model defined, the preprocessing pipeline in place, and the bag of words ready to go, the last step is to turn every comment into training rows. We create one NaiveBayesTextModel for each word that appears in the bag and tag it with the comment's label.

var dataSet = new List<NaiveBayesTextModel>();
foreach (var sentence in sentences)
{
    var words = sentence.Text.ExtractFeatures();

    foreach (var word in words)
    {
        //Skip tokens that are not in the bag (e.g. stop words)
        var wordBagItem = wordBag.FirstOrDefault(x => x.Word == word.ToLower());
        if (wordBagItem == null) continue;

        //One training row per word occurrence, tagged with its comment's label
        dataSet.Add(new NaiveBayesTextModel
        {
            WordId = wordBagItem.Id,
            Label = sentence.Category,
            Word = word.ToLower()
        });
    }
}

Naive Bayes Multinomial

Naive Bayes comes in several variants (Gaussian, Bernoulli and Multinomial are the most common), and the right one depends on the nature of your data. For text and document classification, the Multinomial technique is the usual choice. It's well established, efficient, and works directly with word counts, which is exactly the kind of data we're working with here.

The Multinomial technique works by calculating a small set of key probability values from the training data upfront, then combining them at prediction time to determine which class a new comment most likely belongs to.

The two key calculations are:

Probability of each label:

$$P(\text{Positive}) = \frac{\text{Total Positive Words}}{\text{Total Words}} \qquad P(\text{Negative}) = \frac{\text{Total Negative Words}}{\text{Total Words}}$$

Probability of a word given each label:

$$P(\text{Word} \mid \text{Positive}) = \frac{\text{Number of occurrences of the word in Positive comments} + 1}{\text{Total words in Positive comments} + \text{Total unique words across all comments}}$$

The +1 in the numerator is worth calling out. This is Laplace smoothing, a technique for handling words that never appeared in the training data for a given class. Adding the number of unique words to the denominator balances the +1 added to every word's count, so the probabilities for each class still sum to 1. Without smoothing, a single unseen word would zero out the entire product, making the algorithm brittle and unreliable on real-world data.

In our implementation, we work with word IDs rather than the word strings themselves: so wherever you see "Word" in the formula, think of it as the numeric ID we generated for that word in the bag of words step.
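To check the two formulas by hand, here is a standalone sketch that computes them from word-level rows. Each row is one (wordId, label) pair per word occurrence, mirroring the NaiveBayesTextModel list built earlier. It is not the Ellipses implementation shown below. The MultinomialTable name is hypothetical, and counts are kept in dictionaries rather than scanned from a matrix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the two Multinomial formulas over word-level rows.
public class MultinomialTable
{
    private readonly Dictionary<double, int> _wordsPerLabel = new Dictionary<double, int>();
    private readonly Dictionary<(int WordId, double Label), int> _counts =
        new Dictionary<(int WordId, double Label), int>();

    public Dictionary<double, double> LabelPriors { get; } = new Dictionary<double, double>();
    public int UniqueWords { get; }

    public MultinomialTable(IReadOnlyList<(int WordId, double Label)> rows)
    {
        // Count words per label and occurrences of each (word, label) pair.
        foreach (var (wordId, label) in rows)
        {
            _wordsPerLabel[label] = _wordsPerLabel.TryGetValue(label, out var n) ? n + 1 : 1;
            _counts[(wordId, label)] = _counts.TryGetValue((wordId, label), out var c) ? c + 1 : 1;
        }

        // P(label) = words carrying that label / total words
        foreach (var pair in _wordsPerLabel)
            LabelPriors[pair.Key] = (double)pair.Value / rows.Count;

        UniqueWords = rows.Select(r => r.WordId).Distinct().Count();
    }

    // Laplace-smoothed P(word | label) = (count + 1) / (words in label + unique words).
    // A word never seen with this label has count 0, so it gets a small non-zero value.
    public double Likelihood(int wordId, double label)
    {
        _counts.TryGetValue((wordId, label), out var count);
        return (count + 1.0) / (_wordsPerLabel[label] + UniqueWords);
    }
}
```

For instance, take three positive rows (word 1 twice, word 2 once) and two negative rows (words 3 and 2). There are 3 unique words, so P(word 1 | positive) = (2 + 1) / (3 + 3) = 0.5. Word 3, never seen with the positive label, still gets (0 + 1) / (3 + 3) ≈ 0.17 instead of zero.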

With the theory clear, we're ready to build the implementation. As with KNN, the code below is taken from my open source Ellipses library. The Converter class that handles data transformation is available on the Ellipses project page: I've left it out here to keep the focus on the algorithm itself.

We start by defining the interface for our Naive Bayes class. The design keeps the technique interchangeable: Multinomial is what we're using today, but the interface is structured so that other techniques can be plugged in later without changing the core class.

Let's write it.

namespace Ellipses.Interfaces
{
    public interface INaiveBayes
    {
        /// <summary>
        ///     Load data set
        /// </summary>
        /// <param name="models">Data set</param>
        /// <param name="normalization">Normalize data</param>
        void LoadDataSet<T>(T[] models, bool normalization = false);

        /// <summary>
        ///     Trains model for prediction
        /// </summary>
        INaiveBayesPredicter Fit();
    }
}

/* ========================================================================
 * Ellipses Machine Learning Library 1.0
 * https://www.ellipsesai.com
 * ========================================================================
 * 
 * Copyright Ali Gulum
 *
 * ========================================================================
 * Licensed under the Creative Commons Attribution-NonCommercial 4.0 International License;
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://creativecommons.org/licenses/by-nc/4.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * ========================================================================
 */

using System.Collections.Generic;
using System.Linq;
using Ellipses.Helpers;
using Ellipses.Interfaces;
using Ellipses.Metrics;

namespace Ellipses.Algorithms.Nb
{
    public class NaiveBayes : INaiveBayes
    {
        //Trainer
        private readonly INaiveBayesAlgorithm _algorithm;

        //Helper class for converting models
        private readonly IConverter _converter;

        //Normalizer
        private readonly INormalizer _normalizer;

        //Data set as double array list
        private double[][] _dataSet;

        //Data set as matrix
        private Matrix _matrix;

        //Labels as matrix
        private Matrix _matrixLabel;

        /// <summary>
        ///     Naive Bayes
        /// </summary>
        /// <param name="algorithm">Technique to train with; defaults to NaiveBayesBinaryAlgorithm when null</param>
        /// <param name="normalizer">Normalizer</param>
        /// <param name="converter">Converter for the models</param>
        public NaiveBayes(INaiveBayesAlgorithm algorithm = null, INormalizer normalizer = null,
            IConverter converter = null)
        {
            _algorithm = algorithm ?? new NaiveBayesBinaryAlgorithm();
            _normalizer = normalizer ?? new Normalizer();
            _converter = converter ?? new Converter();
        }

        /// <summary>
        ///     Load data set
        /// </summary>
        /// <param name="models">Data set</param>
        /// <param name="normalization">Normalize data</param>
        public void LoadDataSet<T>(T[] models, bool normalization = false)
        {
            var isDimensional = _converter.IsDimensionalFieldExist(models);
            _dataSet = _converter.ConvertModels(models);

            if (normalization)
                _dataSet = _normalizer.Normalize(_dataSet);

            _matrix = new Matrix(_dataSet);
            _matrixLabel = _converter.ConvertLabelsToMatrix(models);

            if (isDimensional)
                PrepareDimensionalData(_converter.GetDimensionalData(models, normalization));
        }

        /// <summary>
        ///     Trains model for prediction
        /// </summary>
        public INaiveBayesPredicter Fit()
        {
            //Calculate probabilities of the labels
            _algorithm.ComputeProbabilityOfLabels(_matrixLabel);

            //Calculate conditional probabilities of the features
            _algorithm.ComputeProbabilityOfFeatures(_matrix, _matrixLabel);

            //Prepare and return trained model for prediction
            return _algorithm.GetPredictor();
        }

        #region Helpers

        /// <summary>
        ///     Prepares dimensional data by appending its columns to each row of the data matrix
        /// </summary>
        /// <param name="dimensionalMatrix">Matrix of dimensional data</param>
        private void PrepareDimensionalData(Matrix dimensionalMatrix)
        {
            var matrixValues = new List<double[]>();
            var dimensionalMatrixValues = new List<double[]>();
            var newMatrix = new List<double[]>();
            var newLabelList = new List<double[]>();

            for (var dRow = 0; dRow < dimensionalMatrix.Rows; dRow++)
                dimensionalMatrixValues.Add(dimensionalMatrix[dRow].ToArray());

            for (var mRow = 0; mRow < _matrix.Rows; mRow++)
                matrixValues.Add(_matrix[mRow].ToArray());

            for (var dVal = 0; dVal < dimensionalMatrixValues.Count; dVal++)
            {
                newMatrix.Add(matrixValues[dVal].Concat(dimensionalMatrixValues[dVal]).ToArray());
                newLabelList.Add(_matrixLabel[dVal].ToArray());
            }

            _matrix = new Matrix(newMatrix.ToArray());
            _matrixLabel = new Matrix(newLabelList.ToArray());
        }

        #endregion
    }
}

The Naive Bayes class itself stays deliberately thin. It receives the dataset models and hands them to the Converter, which turns them into numeric matrices of features and labels. It then delegates training to whichever technique it was constructed with. The constructor falls back to NaiveBayesBinaryAlgorithm when no technique is supplied, so the Multinomial technique has to be passed in explicitly. Once the technique has processed the training data, it generates a predictor object that encapsulates everything needed to classify new comments going forward.

This separation of concerns is intentional. The Naive Bayes class doesn't need to know anything about how Multinomial works internally: it just knows how to pass data in and get a predictor out. If we ever wanted to swap in a different technique, we'd simply swap the implementation without touching anything else.

Now let's define the Multinomial technique itself. This is where the actual probability calculations happen: the label probabilities, the per-word conditional probabilities, and the Laplace smoothing we covered earlier. Everything the predictor needs to make accurate classifications gets computed here during the training phase.

Let's write it.

using Ellipses.Metrics;

namespace Ellipses.Interfaces
{
    public interface INaiveBayesAlgorithm
    {
        /// <summary>
        ///     Computes probability of labels
        /// </summary>
        /// <param name="labelMatrix">Reference of Label Matrix</param>
        void ComputeProbabilityOfLabels(Matrix labelMatrix);

        /// <summary>
        ///     Computes probability of features
        /// </summary>
        /// <param name="matrix">Matrix of data</param>
        /// <param name="labelMatrix">Reference of Label Matrix</param>
        void ComputeProbabilityOfFeatures(Matrix matrix, Matrix labelMatrix);

        /// <summary>
        ///     Returns predictor for the algorithm
        /// </summary>
        INaiveBayesPredicter GetPredictor();
    }
}

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using Ellipses.Helpers;
using Ellipses.Interfaces;
using Ellipses.Metrics;
using Ellipses.Models;

namespace Ellipses.Algorithms.Nb
{
    public class NaiveBayesMultinomialAlgorithm : INaiveBayesAlgorithm
    {
        //Values closer than this are treated as equal (labels and word IDs are whole numbers)
        private const double TOLERANCE = 0.1;

        //Helper for the various operations
        private readonly IHelper _helper;

        //Probabilities of the features
        private ConcurrentBag<NaiveBayesProbability> _featureProbabilities;

        //Matrix of labels
        private Matrix _labelMatrix;

        //Probabilities of the labels
        private ConcurrentDictionary<double, double> _labelProbabilities;

        public NaiveBayesMultinomialAlgorithm()
        {
            _helper = new Helper();
        }

        /// <summary>
        ///     Computes probability of labels: rows carrying the label / total rows
        /// </summary>
        /// <param name="labelMatrix">Reference of Label Matrix</param>
        public void ComputeProbabilityOfLabels(Matrix labelMatrix)
        {
            var labelProbabilities = new ConcurrentDictionary<double, double>();
            var labelRows = labelMatrix.GetRows().ToList();
            var labelList = labelRows.Select(x => x[0]).Distinct().ToList();
            foreach (var lbl in labelList)
            {
                var tLbl = labelRows.Count(x => Math.Abs(x[0] - lbl) < TOLERANCE);
                var pLabel = (double) tLbl/labelMatrix.Rows;
                labelProbabilities.TryAdd(lbl, pLabel);
            }
            _labelProbabilities = labelProbabilities;
        }

        /// <summary>
        ///     Computes Laplace-smoothed conditional probabilities of the features
        /// </summary>
        /// <param name="matrix">Matrix of data</param>
        /// <param name="labelMatrix">Reference of Label Matrix</param>
        public void ComputeProbabilityOfFeatures(Matrix matrix, Matrix labelMatrix)
        {
            var featureProbabilities = new ConcurrentBag<NaiveBayesProbability>();

            //Append the label column to the feature matrix
            var connectedMatrix = matrix.ConnectMatrix(labelMatrix);
            var labelList = labelMatrix.GetRows().Select(x => x[0]).Distinct().ToList();
            var featureLength = connectedMatrix.Cols;
            var matrixRows = connectedMatrix.GetRows();
            var vectors = matrixRows as Vector[] ?? matrixRows.ToArray();

            //Distinct values (unique words) per feature column; identical for every label
            var uniqueTotals = Enumerable.Range(0, matrix.Cols)
                .Select(j => matrix.Select(x => x[j]).Distinct().Count())
                .ToArray();

            Parallel.For(0, labelList.Count, l =>
            {
                //Total words carrying this label (part of the smoothed denominator)
                var fTotalAccordingToLabel =
                    vectors.Count(x => Math.Abs(x[featureLength - 1] - labelList[l]) < TOLERANCE);

                Parallel.For(0, connectedMatrix.Rows, i =>
                {
                    var lUnique = connectedMatrix[i, featureLength - 1];
                    Parallel.For(0, matrix.Cols, j =>
                    {
                        var f = connectedMatrix[i, j];
                        //Occurrences of this word under the current label
                        var fTotal =
                            vectors.Count(
                                x =>
                                    Math.Abs(x[featureLength - 1] - labelList[l]) < TOLERANCE &&
                                    Math.Abs(x[j] - f) < TOLERANCE);
                        //Laplace smoothing: (count + 1) / (words in label + unique words)
                        var fProbability = ((double) fTotal + 1)/((double) fTotalAccordingToLabel + uniqueTotals[j]);

                        featureProbabilities.Add(new NaiveBayesProbability
                        {
                            FeatureIndex = j,
                            FeatureProbability = fProbability,
                            Feature = f,
                            FeatureTotal = fTotal,
                            Label = labelList[l],
                            LabelUnique = lUnique
                        });
                    });
                });
            });
            _labelMatrix = labelMatrix;
            _featureProbabilities = featureProbabilities;
        }

        /// <summary>
        ///     Returns predictor for the algorithm
        /// </summary>
        public INaiveBayesPredicter GetPredictor()
        {
            return new NaiveBayesMultinomialPredicter(_labelProbabilities, _featureProbabilities, _labelMatrix);
        }
    }
}

The predictor is the end product of the training phase: it's the object that gets handed back once all the probability calculations are done, and it's what we actually use to classify new comments.

We start with the interface, which keeps things clean and consistent. The predictor exposes two methods: Predict, which takes a comment and returns the single most likely class label, and PredictWithProbabilities, which returns the score computed for every label. The second is useful when you want to know not just what the algorithm decided, but how clearly one label won. Keep in mind that each score is the label's prior multiplied by its feature likelihoods, so the scores are not normalized and generally won't sum to 1.
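If you'd rather read those per-label scores as shares of 1, one option is to divide each by their sum. The sketch below is hypothetical (ScoreNormalizer and Normalize are not part of the Ellipses library) and assumes the scores arrive as a label-to-score dictionary, the same shape GetPrediction builds internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the Ellipses library): rescales raw per-label
// scores, such as prior × product of likelihoods, so that they sum to 1.
public static class ScoreNormalizer
{
    public static Dictionary<double, double> Normalize(IReadOnlyDictionary<double, double> scores)
    {
        var total = scores.Values.Sum();

        // If every score underflowed to 0 (or the input is empty) there is no meaningful ratio to report.
        if (total <= 0.0)
            throw new ArgumentException("At least one score must be positive.", nameof(scores));

        // Every score is divided by the same total, so the ranking (and the predicted label) is unchanged.
        return scores.ToDictionary(kv => kv.Key, kv => kv.Value / total);
    }
}
```

Because every score is divided by the same number, Predict's answer stays the same; normalizing only makes the margin easier to read. If all raw scores have underflowed to 0 there is nothing to rescale, and the sketch throws rather than inventing a result.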

The implementation receives three things in its constructor: the label probabilities, the feature probabilities, and the label matrix. These are everything computed during the training phase, passed in and stored privately so the predictor can reference them on every classification call.

At prediction time, the model is first converted into a numeric array using the Converter class. If the model contains dimensional features, those get expanded and merged with the base feature array before any probability calculations begin.

From there, the algorithm iterates over every label in parallel using Parallel.For. For each label, it walks through every feature in the converted model and looks up its pre-computed probability. If the feature was seen during training with that label, its probability is multiplied into a running product called totalDot. If it wasn't, which is common with real-world text, Laplace smoothing kicks in: the algorithm uses the fallback probability (0 + 1) / (featureCountForLabel + totalUniqueFeatures) instead, so a single unseen word can never zero out the entire prediction.
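That fallback is the α = 1 case of additive (Laplace) smoothing. Here is a minimal sketch of the estimate on its own, assuming you already have the three counts; the class name, parameter names and the optional alpha parameter are illustrative, not taken from the library.

```csharp
using System;

// Illustrative sketch of additive (Laplace) smoothing for a multinomial likelihood.
// alpha = 1 gives the add-one fallback; all names here are hypothetical.
public static class LaplaceSmoothing
{
    // P(feature | label) ≈ (featureCountInLabel + alpha) / (totalCountForLabel + alpha × vocabularySize)
    public static double Likelihood(int featureCountInLabel, int totalCountForLabel, int vocabularySize, double alpha = 1.0)
    {
        if (featureCountInLabel < 0)
            throw new ArgumentOutOfRangeException(nameof(featureCountInLabel));
        if (totalCountForLabel < 0)
            throw new ArgumentOutOfRangeException(nameof(totalCountForLabel));
        if (vocabularySize <= 0)
            throw new ArgumentOutOfRangeException(nameof(vocabularySize));
        if (alpha <= 0.0)
            throw new ArgumentOutOfRangeException(nameof(alpha));

        // Adding alpha to every count keeps the result strictly positive, even when the count is 0.
        return (featureCountInLabel + alpha) / (totalCountForLabel + alpha * vocabularySize);
    }
}
```

With alpha = 1, a word never seen with a label that has 10 counted occurrences, in a 20-word vocabulary, gets 1 / 30 instead of 0. When totalCountForLabel is the sum of all word counts for that label, the smoothed estimates across the whole vocabulary still add up to 1.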

Once all features have been processed, the final probability for that label is computed by multiplying totalDot by the label's own prior probability. Whichever label ends up with the highest value is the prediction.
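That decision rule, score(label) = P(label) × the product of P(feature | label), takes only a few lines on its own. This sketch deliberately swaps the direct product for a sum of logarithms, a common variant that is not what the predictor class does: log is monotonic, so the winning label is unchanged, but long inputs no longer underflow to 0. The input shape, a dictionary from label to (prior, likelihoods), is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch (not the Ellipses predictor): the Naive Bayes decision rule in log space.
// log(P(label) × Π P(feature | label)) = log P(label) + Σ log P(feature | label)
public static class LogSpaceScorer
{
    // Log of prior × product of likelihoods, computed as a sum so it cannot underflow to 0.
    public static double LogScore(double prior, IEnumerable<double> featureLikelihoods)
    {
        return Math.Log(prior) + featureLikelihoods.Sum(p => Math.Log(p));
    }

    // perLabel maps each label to its prior and the likelihoods of the input's features under that label.
    public static double PredictLabel(IReadOnlyDictionary<double, (double Prior, double[] Likelihoods)> perLabel)
    {
        if (perLabel.Count == 0)
            throw new ArgumentException("At least one label is required.", nameof(perLabel));

        // Highest log-score wins; whenever the raw product hasn't underflowed, it picks the same label.
        return perLabel
            .Select(kv => (Label: kv.Key, Score: LogScore(kv.Value.Prior, kv.Value.Likelihoods)))
            .Aggregate((best, next) => next.Score > best.Score ? next : best)
            .Label;
    }
}
```

For instance, 2,000 likelihoods of 0.001 multiply to 10^-6000, far below the smallest positive double (about 4.9 × 10^-324), so the direct product rounds to 0 for every label, while the log-score is simply 2,000 × ln(0.001) plus the log prior.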

Let's write the interface and the class.

using System.Collections.Generic;
using Ellipses.Models;

namespace Ellipses.Interfaces
{
    public interface INaiveBayesPredicter<T>
    {
        /// <summary>
        ///     Predicts according to model
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        double Predict(T model);

        /// <summary>
        ///     Predicts according to model and returns the probabilities for all labels
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        List<NaiveBayesResult> PredictWithProbabilities(T model);
    }
}

/* ========================================================================
 * Ellipses Machine Learning Library 1.0
 * https://www.ellipsesai.com
 * ========================================================================
 * 
 * Copyright Ali Gulum
 *
 * ========================================================================
 * Licensed under the Creative Commons Attribution-NonCommercial 4.0 International License;
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://creativecommons.org/licenses/by-nc/4.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * ========================================================================
 */

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Ellipses.Helpers;
using Ellipses.Interfaces;
using Ellipses.Metrics;
using Ellipses.Models;

namespace Ellipses.Algorithms.Nb
{
    internal class NaiveBayesMultinomialPredicter<T> : INaiveBayesPredicter<T>
    {
        private const double TOLERANCE = 0.1;

        //Helper class for converting models
        private readonly IConverter _converter;

        //Probabilities of the features
        private readonly ConcurrentBag<NaiveBayesProbability> _featureProbabilities;

        //Probabilities of the labels
        private readonly ConcurrentDictionary<double, double> _labelProbabilities;

        //Labels
        private readonly List<double> _lbls;

        /// <summary>
        ///     Naive Bayes Predicter
        /// </summary>
        /// <param name="labelProbabilities">Probability set for the labels</param>
        /// <param name="featureProbabilities">Probability set for the features</param>
        /// <param name="labelMatrix">Matrix of label</param>
        public NaiveBayesMultinomialPredicter(ConcurrentDictionary<double, double> labelProbabilities,
            ConcurrentBag<NaiveBayesProbability> featureProbabilities, Matrix labelMatrix)
        {
            _converter = new Converter();
            _labelProbabilities = labelProbabilities;
            _featureProbabilities = featureProbabilities;
            _lbls = labelMatrix.GetRows().Select(x => x[0]).Distinct().ToList();
        }

        /// <summary>
        ///     Predicts according to model
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        public double Predict(T model)
        {
            // The label with the highest score wins
            return GetPrediction(model).Aggregate((l, r) => l.Value > r.Value ? l : r).Key;
        }

        /// <summary>
        ///     Predicts according to model and returns the probabilities for all labels
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        public List<NaiveBayesResult> PredictWithProbabilities(T model)
        {
            return
                GetPrediction(model).Select(v => new NaiveBayesResult {Label = v.Key, Probability = v.Value}).ToList();
        }

        /// <summary>
        ///     Computes the (unnormalized) probability of every label for the model
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        private ConcurrentDictionary<double, double> GetPrediction(T model)
        {
            var probabilities = new ConcurrentDictionary<double, double>();
            var modelConverted = _converter.ConvertModel(model);
            var dimensionalProcess = _converter.IsDimensional(model);

            if (dimensionalProcess)
            {
                var newList = new List<double[]>();
                var dimensionalFeatures = _converter.ConvertDimensionalModel(model);

                var orjModelValues = modelConverted.Select(t => t.ToArray()).ToList();

                // Append every dimensional row to each base feature row
                for (var dimensionalRow = 0; dimensionalRow < dimensionalFeatures.Length; dimensionalRow++)
                    newList.AddRange(orjModelValues.Select(t => t.Concat(dimensionalFeatures[dimensionalRow]).ToArray()));

                modelConverted = newList.ToArray();
            }

            // These do not depend on the label, so compute them once instead of inside the loops
            var labels = _labelProbabilities.Keys.ToList();
            var featureCount = modelConverted[0].Count();
            var fUniqueTotal = _featureProbabilities.Select(x => x.Feature).Distinct().Count();

            Parallel.For(0, labels.Count, l =>
            {
                var label = labels[l];
                var fTotalAccordingToLabel =
                    _featureProbabilities.Where(x => Math.Abs(x.LabelUnique - label) < TOLERANCE).Distinct().Count()/
                    _labelProbabilities.Count;

                var totalDot = 1.0;

                // Sequential on purpose: totalDot is a shared running product, and multiplying into it
                // from several threads at once (as a nested Parallel.For would) is a data race.
                for (var i = 0; i < featureCount; i++)
                {
                    var f = modelConverted[0][i];
                    var fVal =
                        _featureProbabilities.FirstOrDefault(
                            x => Math.Abs(x.Feature - f) < TOLERANCE && Math.Abs(x.Label - label) < TOLERANCE);
                    if (fVal != null)
                    {
                        totalDot *= fVal.FeatureProbability;
                    }
                    else
                    {
                        // Laplace smoothing for a feature never seen with this label: (0 + 1) / (n + |V|)
                        const int fTotal = 0;
                        var fProbability = ((double) fTotal + 1)/((double) fTotalAccordingToLabel + fUniqueTotal);
                        totalDot *= fProbability;
                    }
                }

                // Prior of the label times the product of its feature likelihoods
                var probability = _labelProbabilities[label]*totalDot;
                probabilities.TryAdd(label, probability);
            });
            return probabilities;
        }
    }
}

With all the pieces in place, we're ready to put everything together and see Naive Bayes with the Multinomial technique running end to end. Here is the code that loads the dataset, trains the model and hands back the predictor.

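// Create the classifier with the Multinomial technique
var naiveBayes = new NaiveBayes(new NaiveBayesMultinomialAlgorithm());
// Load the prepared (cleaned and ID-encoded) dataset
naiveBayes.LoadDataSet(dataSet.ToArray());
// Train: computes the probabilities and returns the predictor
var predicter = naiveBayes.Fit();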

Putting It All Together

To recap what we've built and the order in which everything happens:

Prepare and clean the dataset
Generate unique IDs for each word
Pass the data to the algorithm
The algorithm converts the models into numerical arrays
The algorithm applies the Multinomial probability rules across the data
A predictor is returned, ready to classify new input
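The ID-generation and conversion steps in that recap are easy to picture in isolation. The sketch below is a hypothetical stand-in, not the article's wordBag and ExtractFeatures code: it splits on whitespace, strips a few punctuation characters, lower-cases each token, assigns IDs 1, 2, 3, ... in order of first appearance, and returns the IDs as a double array, the same shape the predictor's model values use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical vocabulary (not the Ellipses wordBag): maps each distinct lower-cased word to an integer ID.
public class Vocabulary
{
    private readonly Dictionary<string, int> _ids = new Dictionary<string, int>();

    public int Count => _ids.Count;

    // Returns the existing ID for a word, or assigns the next one (1, 2, 3, ...).
    public int GetOrAdd(string word)
    {
        var key = word.Trim().ToLowerInvariant();
        if (!_ids.TryGetValue(key, out var id))
        {
            id = _ids.Count + 1;
            _ids[key] = id;
        }
        return id;
    }

    // Splits on whitespace, trims basic punctuation and converts every token to its ID.
    public double[] Encode(string sentence)
    {
        return sentence
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.Trim(',', '.', '!', '?', ';', ':'))
            .Where(token => token.Length > 0)
            .Select(token => (double)GetOrAdd(token))
            .ToArray();
    }
}
```

Encoding "It is a really great product, I like it!" with a fresh Vocabulary yields nine IDs over eight distinct words, because "It" and "it" map to the same ID.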

Now let's see it in action. We'll ask the algorithm to classify the following sentence:

"It is a really great product, I like it!"

Below you'll find the code for passing a new sentence to the fitted model and getting a prediction back.

var sentences = new SentenceModel("It is a really great product, I like it!");
var words = sentences.Text.ExtractFeatures();
var values = new List<double>();
var naiveBayesTextModel = new NaiveBayesTextTextModel();
foreach (var t in words)
{
    // Normalize each token once, so the lookup and the insert agree on the same key
    var token = t.Trim().ToLower();
    var wrd = wordBag.FirstOrDefault(x => x.Word.ToLower() == token);
    if (wrd == null)
    {
        // Unseen word: give it a fresh ID; at prediction time it falls back to Laplace smoothing
        wrd = new WordBagItem(token, wordBag.Count + 1);
        wordBag.Add(wrd);
    }

    values.Add(wrd.Id);
}
naiveBayesTextModel.Values = new double[1][];
naiveBayesTextModel.Values[0] = values.ToArray();
var res = predicter.PredictWithProbabilities(naiveBayesTextModel);

When we run this, the algorithm gives the Positive label the higher probability, and since 1 represents positive in our dataset, that's exactly the right answer for "It is a really great product, I like it!"

Let's push it a bit further with a negative example: "It doesn't cost, exactly terrible, I don't recommend it."

This time the result flips: the Negative label carries the higher probability, and the algorithm classifies the comment correctly once again.

Two for two, on a model trained with a minimal dataset and built entirely from scratch.

Before wrapping up, it's worth reiterating: Naive Bayes is a genuinely useful algorithm, and the Multinomial technique in particular is a strong, widely used baseline for text and document classification. The example we've built here is intentionally simple, but the same foundations carry over to larger datasets. A natural next step would be adding a third label, "Neutral", to the training data. The algorithm handles multiple classes without any structural changes: Predict already picks whichever label has the highest probability, so it works the same way with three labels as with two.
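To see why a third label costs nothing structurally, here is a compact, self-contained sketch of multinomial Naive Bayes over word-ID arrays. It is not the Ellipses implementation: TinyMultinomialNb is a hypothetical class that counts word occurrences per label, applies add-one smoothing and scores in log space. Nothing in Fit or Predict refers to a particular number of labels.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative multinomial Naive Bayes over word-ID arrays (not the Ellipses implementation).
// Nothing below depends on how many labels there are: two, three or more work the same way.
public class TinyMultinomialNb
{
    private readonly Dictionary<int, double> _logPriors = new Dictionary<int, double>();
    private readonly Dictionary<int, Dictionary<int, int>> _wordCounts = new Dictionary<int, Dictionary<int, int>>();
    private readonly Dictionary<int, int> _totalWords = new Dictionary<int, int>();
    private readonly HashSet<int> _vocabulary = new HashSet<int>();

    // Counts how often each word ID appears under each label and records each label's prior.
    public void Fit(IReadOnlyList<(int[] WordIds, int Label)> documents)
    {
        foreach (var group in documents.GroupBy(d => d.Label))
        {
            _logPriors[group.Key] = Math.Log((double)group.Count() / documents.Count);

            var counts = new Dictionary<int, int>();
            foreach (var id in group.SelectMany(d => d.WordIds))
            {
                counts[id] = counts.TryGetValue(id, out var c) ? c + 1 : 1;
                _vocabulary.Add(id);
            }

            _wordCounts[group.Key] = counts;
            _totalWords[group.Key] = counts.Values.Sum();
        }
    }

    // Add-one smoothed log P(word | label); unseen words get count 0 instead of zeroing the score.
    private double LogLikelihood(int label, int wordId)
    {
        _wordCounts[label].TryGetValue(wordId, out var count);
        return Math.Log((count + 1.0) / (_totalWords[label] + _vocabulary.Count));
    }

    // Returns the label with the highest log P(label) + Σ log P(word | label).
    public int Predict(int[] wordIds)
    {
        if (_logPriors.Count == 0)
            throw new InvalidOperationException("Call Fit before Predict.");

        return _logPriors
            .Select(kv => (Label: kv.Key, Score: kv.Value + wordIds.Sum(id => LogLikelihood(kv.Key, id))))
            .Aggregate((best, next) => next.Score > best.Score ? next : best)
            .Label;
    }
}
```

Train it on rows labeled with, say, 0 for Negative, 1 for Positive and 2 for Neutral, and Predict simply returns whichever of the three labels scores highest; adding a fourth label would need no code changes either.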