Ali Gulum

Naive Bayes

Naive Bayes is one of those algorithms that has stood the test of time for good reason. Despite its simplicity, it's widely used in production systems, performs surprisingly well across a range of problems, and scales to millions of records without breaking a sweat. It belongs to the supervised learning family and is fundamentally a classifier: its job is to look at a set of input features and predict which category the input belongs to.

You might also see it called "idiot Bayes". The name isn't a slight; it's a nod to the algorithm's core assumption, which is deliberately simple to the point of being naive: every feature in the dataset is assumed to be conditionally independent of every other feature, given the class. In reality, this is almost never true, because features in real-world data are rarely fully independent of one another. And yet, despite this theoretically flawed assumption, the algorithm performs remarkably well in practice and is often competitive with far more complex approaches.

Real-world applications include text classification, spam filtering, and recommendation systems, among others.

Bayes' Theorem

Bayes' theorem is named after Reverend Thomas Bayes (1701–1761), and it sits at the heart of a surprisingly large number of modern machine learning algorithms. The core idea is conditional probability: the probability that something is true given that something else has already happened. Rather than asking "what is the probability of X?", Bayes' theorem lets us ask the more useful question: "what is the probability of X, given what we already know?"

The formula is:

$$P(T \mid E) = \frac{P(E \mid T) \times P(T)}{P(E)}$$

Where:

  • P(T) is the probability that the thesis is true, independent of any evidence: also known as the prior probability
  • P(E) is the overall probability of observing the evidence
  • P(E|T) is the probability of observing the evidence given that the thesis is true
  • P(T|E) is what we're actually trying to calculate: the probability that the thesis is true given that the evidence exists, also known as the posterior probability
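To make the four terms concrete, here is a worked example with made-up numbers (they are illustrative assumptions, not taken from any dataset). Suppose the thesis T is "this email is spam" and the evidence E is "the email contains the word 'offer'", with P(T) = 0.2, P(E|T) = 0.5, and P(E|¬T) = 0.05. P(E) is obtained by summing over both cases, spam and not spam:

```latex
\begin{aligned}
P(E) &= P(E \mid T)\,P(T) + P(E \mid \neg T)\,P(\neg T) = 0.5 \times 0.2 + 0.05 \times 0.8 = 0.14 \\
P(T \mid E) &= \frac{P(E \mid T)\,P(T)}{P(E)} = \frac{0.5 \times 0.2}{0.14} \approx 0.71
\end{aligned}
```

Under these assumed numbers, seeing the word raises the probability of spam from the prior of 0.2 to a posterior of roughly 0.71.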

Naive Bayes in Action: Sentiment Classification

The best way to understand how Naive Bayes works is to walk through a concrete example. Let's tackle a classic text classification problem: given a user comment, predict whether it's positive or negative.

To do that, we need to give the algorithm some prior knowledge to learn from: a labeled dataset of comments where we already know the correct classification. That's our training data.

The dataset we'll be working with contains two columns: the user comment itself, and a class label where 1 means positive and 0 means negative. You can download the dataset from the link above.

At a glance, the structure is simple, but that's exactly the point. Naive Bayes doesn't need complex, richly engineered features to perform well on text. It works directly with the words themselves, calculating the probability that a given word appears in positive comments versus negative ones, and using those probabilities to classify new comments it hasn't seen before.

With the dataset in hand, let's walk through how the algorithm processes it step by step.

[Image: the training dataset]

Before we get into the code, there's an important caveat worth stating clearly: the dataset we're using here is for illustration purposes only. In a real production system, you'd need significantly more training data to get reliable predictions. The size of your training set directly determines how well the algorithm can estimate word probabilities: too little data and the probabilities become noisy and unrepresentative of the real world. What we have here is enough to understand the mechanics, not enough to ship.

With that said, let's talk about what the algorithm actually needs to compute before it can make any predictions.

The core of Naive Bayes for text classification comes down to word probabilities. For every word in our training data, we need to calculate how likely it is to appear in a positive comment versus a negative one. Put another way, we're building a probability profile for each word based on the type of comments it shows up in.

Once we have those probabilities, classifying a new comment becomes straightforward: we look at each word in the comment, pull its probability for the positive and negative classes, multiply those word probabilities together with the class's own overall probability, and whichever class produces the higher value wins. That's the prediction.
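Written as a formula, the decision rule looks roughly like this. The symbols are notation introduced here for illustration: c is a class, w_1 … w_n are the words of the comment, and P(c) is the overall probability of the class, which is multiplied in alongside the word probabilities.

```latex
\hat{c} = \arg\max_{c \,\in\, \{\text{Positive},\ \text{Negative}\}} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The product over words is exactly where the "naive" independence assumption is used: each word contributes its own factor, regardless of which other words appear next to it.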

The calculations we need to make upfront are:

  • The overall probability of a comment being positive or negative in the training set
  • For every unique word, the probability of that word appearing in a positive comment
  • For every unique word, the probability of that word appearing in a negative comment

With those numbers in hand, the algorithm is ready to classify anything we throw at it. Let's start building it.
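As a rough sketch of the three calculations listed above, the counting step could look like the following. This is not the implementation built later in the article: the `TrainingCounts` class and the `(Text, Label)` tuples are hypothetical stand-ins for the real models, tokenization is a plain split on spaces, and no smoothing is applied yet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class TrainingCounts
{
    // Fraction of training comments that carry each label (the class priors).
    public static Dictionary<int, double> LabelPriors(IReadOnlyList<(string Text, int Label)> comments)
    {
        return comments
            .GroupBy(c => c.Label)
            .ToDictionary(g => g.Key, g => (double)g.Count() / comments.Count);
    }

    // For each label, how often each lower-cased word occurs in comments with that label.
    public static Dictionary<int, Dictionary<string, int>> WordCounts(IReadOnlyList<(string Text, int Label)> comments)
    {
        var counts = new Dictionary<int, Dictionary<string, int>>();
        foreach (var (text, label) in comments)
        {
            if (!counts.TryGetValue(label, out var perLabel))
                counts[label] = perLabel = new Dictionary<string, int>();

            var words = text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in words)
                perLabel[word] = perLabel.TryGetValue(word, out var n) ? n + 1 : 1;
        }
        return counts;
    }
}
```

Dividing a word's count by the total number of words seen for that label gives the raw per-class word probability; the smoothed version of that ratio is what the Multinomial section arrives at later.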

Let's Write the Code

As we did with KNN, we're building this from scratch. No ML libraries, no black boxes: just the algorithm itself, implemented in a way that maps directly onto the theory we've already covered.

The first thing we need is a way to represent our training data in code. Each entry in our dataset is a user comment paired with a class label, so we'll define a model class that captures exactly that: SentenceModel, with two properties, the comment text itself and its classification (positive or negative).

This model is the basic unit our algorithm will work with throughout. The training phase will consume a list of these, and everything from word frequency counts to probability calculations will be built on top of it.

Let's define it.

public class SentenceModel
{
    // Unlabeled comment (used at prediction time)
    public SentenceModel(string text)
    {
        Text = text;
    }

    // Labeled comment (used for training): 1 = positive, 0 = negative
    public SentenceModel(string text, double category)
    {
        Text = text;
        Category = category;
    }

    public string Text { get; private set; }

    public double Category { get; private set; }
}

With the model defined, the next step is getting our dataset into memory. We'll write a simple helper function that reads the training file, an Excel sheet in this case, row by row and loads everything into a DataSet. A short loop afterwards maps each row onto a SentenceModel instance and collects the results in a list.

In a production system this would typically pull from a database or an API, but for our purposes a file-based loader is clean and straightforward: it keeps the focus on the algorithm rather than the data access layer.

Once the loader is in place, we define our list of SentenceModel objects and populate it by calling the function. From this point on, that list is our training dataset: everything the algorithm knows about positive and negative comments lives in there, ready to be processed into probabilities.

Let's write it.

// Requires: System.Collections.Generic, System.Data, System.Data.OleDb, System.IO

/// <summary>
/// Loads data from the file
/// </summary>
/// <param name="fileName">Filename</param>
/// <param name="sheet">Name of the sheet</param>
/// <param name="columns">Columns to fetch; if null, all columns are fetched</param>
public DataSet LoadData(string fileName, string sheet, string[] columns = null)
{
    if (!File.Exists(fileName))
        throw new FileNotFoundException("Dataset file not found.", fileName);

    // OLE DB connection string for an Excel workbook whose first row is the header
    var connStr =
        string.Format(
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1\";",
            fileName);

    //Create dataset
    var dataSet = new DataSet();
    dataSet.Tables.Add(new DataTable(sheet));
    var table = dataSet.Tables[0];

    if (columns != null)
        foreach (var column in columns)
            table.Columns.Add(column);

    var items = new List<object>();
    using (var conn = new OleDbConnection(connStr))
    {
        conn.Open();
        var sql = string.Format(@"SELECT * FROM [{0}$]", sheet);
        using (var cmd = new OleDbCommand(sql, conn))
        using (var reader = cmd.ExecuteReader())
        {
            // No explicit column list: take the column names from the sheet
            if (columns == null)
                for (var i = 0; i < reader.FieldCount; i++)
                    table.Columns.Add(reader.GetName(i));

            // Copy each sheet row into the table, one value per table column
            while (reader.Read())
            {
                items.Clear();
                for (var d = 0; d < table.Columns.Count; d++)
                    items.Add(reader[d]);

                var dataRow = table.NewRow();
                dataRow.ItemArray = items.ToArray();
                table.Rows.Add(dataRow);
            }
        }
    }

    return dataSet;
}

Straightforward enough: the function opens the workbook, reads the requested sheet row by row, copies each row into a DataTable, and returns the resulting DataSet. Nothing fancy, just clean data loading.

With that in place, we can now call the function and map each returned row onto a SentenceModel. One call to load the file, a short loop to convert the rows, and our entire training dataset is in memory, structured, and ready to feed into the algorithm.

Next up, we start doing the actual work: calculating the word probabilities that Naive Bayes will use to make its predictions.

//Loads dataset from the dataset file
var data = LoadData("Filename of the dataset file", "Book1", new[] { "Text", "Category" });
//Define the list that holds the SentenceModel instances
var sentences = new List<SentenceModel>();
//Insert all comments to the list
foreach (DataRow row in data.Tables[0].Rows)
{
    var sentence = new SentenceModel(row["Text"].ToString(), Convert.ToDouble(row["Category"].ToString()));
    sentences.Add(sentence);
}

Before we can feed any data into the algorithm, two preprocessing steps need to happen first.

The first is filtering. Not every word in a comment carries meaningful signal for classification. Stop words (common words like "the", "he", "she", "to", "and") appear constantly across both positive and negative comments and tell us nothing about sentiment. Including them would only add noise to our probability calculations, so we strip them out before doing anything else.

The second is building a Bag of Words. In most text classification contexts, a bag of words tracks how frequently each word appears. Here we're using it slightly differently: we don't care about frequency, we care about identity. Our algorithm works with numeric data, not strings, so we need a way to assign a unique numeric ID to every distinct word in the training set. The bag of words gives us that mapping: each unique word gets an ID, and from that point on the algorithm works with numbers rather than text.

To support this, we define a simple WordBagItem model with two fields:

  • Word: the string value of the word
  • Id: its unique numeric identifier

Clean and minimal: exactly what we need. Let's write it.

public class WordBagItem
{
    public WordBagItem(string word, int id)
    {
        Word = word;
        Id = id;
    }

    public string Word { get; private set; }

    public int Id { get; private set; }
}

With the WordBagItem model defined, we can now write the two functions that handle our preprocessing pipeline.

The first is a filtering function, implemented as an extension method, that takes a list of words and strips out all the stop words. The stop word list doesn't need to be exhaustive, but it should cover the most common ones that carry no sentiment signal: articles, pronouns, prepositions, conjunctions and similar. Anything that would appear with roughly equal frequency in positive and negative comments is noise, and we want it gone before the algorithm sees the data.

The second is the bag of words builder. This function walks through every comment in the training set, tokenizes each one into individual words, runs them through the filter, and builds up a deduplicated list of WordBagItem objects: one entry per unique word, each assigned a numeric ID. By the end of this function, every meaningful word in our training data has a stable numeric identity that the algorithm can work with.

Together, these two functions form the complete preprocessing pipeline. Raw comment text goes in, a clean numeric word map comes out.

Let's write them.

private void PrepareWordBag()
{
    foreach (var sentence in sentences)
    {
        var words = sentence.Text.ExtractFeatures().FilterFeatures();
        foreach (var word in words)
        {
            // Normalize before the lookup so "Good" and "good" share one entry
            var normalized = word.Trim().ToLower();
            if (wordBag.Any(x => x.Word == normalized)) continue;

            // IDs start at 1 and grow with the bag
            wordBag.Add(new WordBagItem(normalized, wordBag.Count + 1));
        }
    }
}
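The word bag above scans the whole list on every lookup, which is fine for a small illustration set. As an alternative sketch (not the article's implementation), the same word-to-ID mapping can be kept in a dictionary so that each lookup is a hash lookup. `WordIndex` is a hypothetical name, and the input is assumed to be already tokenized and filtered.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative to the list-based word bag, for illustration only.
public static class WordIndex
{
    // Assigns consecutive IDs (starting at 1) to each distinct word, in first-seen order.
    // Keys are compared case-insensitively, so "Good" and "good" share one ID.
    public static Dictionary<string, int> Build(IEnumerable<IEnumerable<string>> tokenizedComments)
    {
        var ids = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var tokens in tokenizedComments)
        {
            foreach (var token in tokens)
            {
                var word = token.Trim();
                if (word.Length == 0) continue;   // skip empty tokens left over from splitting
                if (!ids.ContainsKey(word))
                    ids.Add(word, ids.Count + 1);
            }
        }
        return ids;
    }
}
```

The trade-off is mostly about scale: with a few dozen words the difference is negligible, but the list scan grows quadratically with vocabulary size while the dictionary stays close to linear.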

The extension methods sit on top of our string and list types, keeping the preprocessing logic clean and reusable throughout the codebase.

The first extension handles tokenization: splitting a raw comment string into individual words. This means stripping punctuation, handling whitespace, and returning a clean list of word tokens that we can work with programmatically.

The second extension handles filtering: taking that list of tokens and removing anything that appears in our stop word list. Rather than hardcoding the filtering logic everywhere it's needed, wrapping it as an extension method means we can call it in a single line anywhere in the pipeline, and swap out or expand the stop word list in one place if we ever need to.

Chaining these two extensions together gives us a clean one-liner that goes from a raw comment string to a filtered list of meaningful word tokens: ready to be mapped against the bag of words and fed into the algorithm.

Let's write them.

// Requires: System.Collections.Generic, System.Linq, System.Text.RegularExpressions
public static class Helper
{
    // Strips punctuation, then splits the text into words on spaces
    public static IEnumerable<string> ExtractFeatures(this string text)
    {
        return Regex.Replace(text, "\\p{P}+", "").Split(' ').ToList();
    }

    // Removes stop words, digits, leftover symbols and empty tokens (case-insensitive)
    public static IEnumerable<string> FilterFeatures(this IEnumerable<string> list)
    {
        var filters = new[]
        {
            ".", ",", "!", "?", ":", ";", "_", "+", "/", @"\", "*", "the", "of", "on", "is", "a", "...", "1", "2", "3",
            "4", "5", "6", "7", "8", "9", "0", " ", "for", "an", "", "it", "she", "he", "they", "we", "them", "our",
            "his", "her"
        };

        return list.Where(x => !filters.Contains(x.ToLower())).ToList();
    }
}

Before we can feed the training data into the algorithm, we need one more model to act as the container that carries everything together.

This class is the structured row we pass to the algorithm. Each instance represents a single word occurrence: the class label of the comment it came from, the word's numeric ID from the bag of words, and the word itself. Rather than passing multiple separate lists and parameters into the algorithm independently, this model keeps everything organized and makes the interface of our Naive Bayes class clean and straightforward.

Let's define it.

public class NaiveBayesTextModel
{
    [AiLabel]
    public double Label { get; set; }

    [AiField]
    public double WordId { get; set; }

    public string Word { get; set; }
}

With the model defined and the attributes in place, the picture is now complete. AiLabel tells the algorithm which field carries the class label: in our case, whether a comment is positive or negative. AiField marks the fields that represent the actual features the algorithm should learn from. Everything else gets ignored.

This attribute-based approach keeps the model clean and the algorithm flexible. Rather than hardcoding which fields to use inside the algorithm itself, the model declares its own structure: making it straightforward to swap in a different dataset or add new features without touching the core algorithm logic.
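The attribute definitions and the code that reads them live in the Ellipses library and are not shown in this article. The sketch below is only a guess at how such an attribute-driven conversion might work, using standard .NET reflection; `AiLabelAttribute`, `AiFieldAttribute`, `SampleRow`, and `AttributeReader` are all hypothetical stand-ins written for this example.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attributes; the real Ellipses definitions may differ.
[AttributeUsage(AttributeTargets.Property)]
public sealed class AiLabelAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class AiFieldAttribute : Attribute { }

public class SampleRow
{
    [AiLabel] public double Label { get; set; }
    [AiField] public double WordId { get; set; }
    public string Word { get; set; }   // unmarked, so the reader ignores it
}

public static class AttributeReader
{
    // Collects the values of all [AiField] properties as a feature vector.
    public static double[] Features<T>(T model)
    {
        return typeof(T).GetProperties()
            .Where(p => p.IsDefined(typeof(AiFieldAttribute), false))
            .Select(p => Convert.ToDouble(p.GetValue(model)))
            .ToArray();
    }

    // Reads the value of the single [AiLabel] property.
    public static double Label<T>(T model)
    {
        var property = typeof(T).GetProperties()
            .Single(p => p.IsDefined(typeof(AiLabelAttribute), false));
        return Convert.ToDouble(property.GetValue(model));
    }
}
```

Because the model declares which properties matter, a reader like this never needs to know property names in advance, which is the flexibility the attribute approach is after.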

With the dataset model defined, the preprocessing pipeline in place, and the bag of words ready to go, we have everything the algorithm needs. It's time to feed it and start making predictions.

var dataSet = new List<NaiveBayesTextModel>();
foreach (var t in sentences)
{
    var words = t.Text.ExtractFeatures();

    foreach (var t1 in words)
    {
        // Words that never made it into the word bag (stop words, for example) are skipped
        var wordBagItem = wordBag.FirstOrDefault(x => x.Word == t1.ToLower());
        if (wordBagItem == null) continue;

        // One row per word occurrence, labeled with the class of its comment
        var naiveBayesTextModel = new NaiveBayesTextModel();
        naiveBayesTextModel.WordId = wordBagItem.Id;
        naiveBayesTextModel.Label = t.Category;
        naiveBayesTextModel.Word = t1.ToLower();
        dataSet.Add(naiveBayesTextModel);
    }
}

Naive Bayes Multinomial

Naive Bayes comes in several variants, and the right one depends on the nature of your data. For text and document classification, the Multinomial technique is the go-to choice: it's well-established, efficient, and consistently performs well on exactly the kind of problem we're solving here.

The Multinomial technique works by calculating a small set of key probability values from the training data upfront, then combining them at prediction time to determine which class a new comment most likely belongs to.

The two key calculations are:

Probability of each label:

$$P(\text{Positive}) = \frac{\text{Total Positive Words}}{\text{Total Words}} \qquad P(\text{Negative}) = \frac{\text{Total Negative Words}}{\text{Total Words}}$$

Probability of a word given each label:

$$P(\text{Word} \mid \text{Positive}) = \frac{\text{Number of occurrences of the word in Positive comments} + 1}{\text{Total words in Positive comments} + \text{Total unique words across all comments}}$$

The +1 in the numerator is worth calling out: this is Laplace smoothing, a technique used to handle words that appear in the test data but never appeared in the training data for a given class. Without it, a single unseen word would zero out the entire probability calculation, which would make the algorithm brittle and unreliable on real-world data. The unique-word count added to the denominator is the other half of the same correction: it keeps the smoothed probabilities for a class summing to one.
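To see the smoothing at work, here is the formula applied to made-up counts (assumptions for illustration only): the word "great" occurs 3 times in positive comments and never in negative ones, positive comments contain 20 words in total, negative comments contain 15, and the training set has 25 unique words.

```latex
\begin{aligned}
P(\text{great} \mid \text{Positive}) &= \frac{3 + 1}{20 + 25} = \frac{4}{45} \approx 0.089 \\
P(\text{great} \mid \text{Negative}) &= \frac{0 + 1}{15 + 25} = \frac{1}{40} = 0.025
\end{aligned}
```

Without the +1, the negative-class value would be exactly zero, and any comment containing "great" could never be classified as negative no matter what else it said.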

In our implementation, we work with word IDs rather than the word strings themselves: so wherever you see "Word" in the formula, think of it as the numeric ID we generated for that word in the bag of words step.

With the theory clear, we're ready to build the implementation. As with KNN, the code below is taken from my open source Ellipses library. The Converter class that handles data transformation is available on the Ellipses project page: I've left it out here to keep the focus on the algorithm itself.

We start by defining the interface for our Naive Bayes class. The design keeps the technique interchangeable: Multinomial is what we're using today, but the interface is structured so that other techniques can be plugged in later without changing the core class.

Let's write it.

namespace Ellipses.Interfaces
{
    public interface INaiveBayes<T>
    {
        /// <summary>
        /// Load data set
        /// </summary>
        /// <param name="models">Data set</param>
        /// <param name="normalization">Normalize data</param>
        void LoadDataSet(T[] models, bool normalization = false);

        /// <summary>
        /// Trains model for prediction
        /// </summary>
        INaiveBayesPredicter<T> Fit();
    }
}


/* ========================================================================
 * Ellipses Machine Learning Library 1.0
 * https://www.ellipsesai.com
 * ========================================================================
 *
 * Copyright Ali Gulum
 *
 * ========================================================================
 * Licensed under the Creative Commons Attribution-NonCommercial 4.0 International License;
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * https://creativecommons.org/licenses/by-nc/4.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * ========================================================================
 */

using System.Collections.Generic;
using System.Linq;
using Ellipses.Helpers;
using Ellipses.Interfaces;
using Ellipses.Metrics;

namespace Ellipses.Algorithms.Nb
{
    public class NaiveBayes<T> : INaiveBayes<T>
    {
        //Trainer
        private readonly INaiveBayesAlgorithm<T> _algorithm;

        //Helper class for converting models
        private readonly IConverter _converter;

        //Normalizer
        private readonly INormalizer _normalizer;

        //Data set as double array list
        private double[][] _dataSet;

        //Data set as matrix
        private Matrix _matrix;

        //Labels as matrix
        private Matrix _matrixLabel;

        /// <summary>
        /// Naive Bayes
        /// </summary>
        /// <param name="algorithm">Algorithm for the naive bayes algorithm</param>
        /// <param name="normalizer">Normalizer</param>
        /// <param name="converter">Converter for the models</param>
        public NaiveBayes(INaiveBayesAlgorithm<T> algorithm = null, INormalizer normalizer = null,
            IConverter converter = null)
        {
            _algorithm = algorithm ?? new NaiveBayesBinaryAlgorithm<T>();
            _normalizer = normalizer ?? new Normalizer();
            _converter = converter ?? new Converter();
        }

        /// <summary>
        /// Load data set
        /// </summary>
        /// <param name="models">Data set</param>
        /// <param name="normalization">Normalize data</param>
        public void LoadDataSet(T[] models, bool normalization = false)
        {
            var isDimensional = _converter.IsDimensionalFieldExist(models);
            _dataSet = _converter.ConvertModels(models);

            if (normalization)
                _dataSet = _normalizer.Normalize(_dataSet);

            _matrix = new Matrix(_dataSet);
            _matrixLabel = _converter.ConvertLabelsToMatrix(models);

            if (isDimensional)
                PrepareDimensionalData(_converter.GetDimensionalData(models, normalization));
        }

        /// <summary>
        /// Trains model for prediction
        /// </summary>
        public INaiveBayesPredicter<T> Fit()
        {
            //Calculate probabilities of the labels
            _algorithm.ComputeProbabilityOfLabels(_matrixLabel);

            //Calculate conditional probabilities of the features
            _algorithm.ComputeProbabilityOfFeatures(_matrix, _matrixLabel);

            //Prepare and return trained model for prediction
            var naiveBayesPredicter = _algorithm.GetPredictor();
            return naiveBayesPredicter;
        }

        #region Helpers

        /// <summary>
        /// Prepares dimensional data
        /// </summary>
        /// <param name="dimensionalMatrix">Matrix of dimensional data</param>
        private void PrepareDimensionalData(Matrix dimensionalMatrix)
        {
            var matrixValues = new List<double[]>();
            var dimensionalMatrixValues = new List<double[]>();
            var newMatrix = new List<double[]>();
            var newLabelList = new List<double[]>();

            for (var dRow = 0; dRow < dimensionalMatrix.Rows; dRow++)
                dimensionalMatrixValues.Add(dimensionalMatrix[dRow].ToArray());

            for (var mRow = 0; mRow < _matrix.Rows; mRow++)
                matrixValues.Add(_matrix[mRow].ToArray());

            //Append the dimensional values to each base row and keep the labels aligned
            for (var dVal = 0; dVal < dimensionalMatrixValues.Count; dVal++)
            {
                newMatrix.Add(matrixValues[dVal].Concat(dimensionalMatrixValues[dVal]).ToArray());
                newLabelList.Add(_matrixLabel[dVal].ToArray());
            }

            _matrix = new Matrix(newMatrix.ToArray());
            _matrixLabel = new Matrix(newLabelList.ToArray());
        }

        #endregion
    }
}

The Naive Bayes class itself stays deliberately thin. It receives the dataset model, hands it off to the Converter to transform it into the numeric array the algorithm can work with, and then delegates to whichever technique is currently configured: in our case, Multinomial. Note that the constructor falls back to NaiveBayesBinaryAlgorithm when no algorithm is passed in, so the Multinomial implementation has to be supplied explicitly. Once the technique has processed the training data, it generates a predictor object that encapsulates everything needed to classify new comments going forward.

This separation of concerns is intentional. The Naive Bayes class doesn't need to know anything about how Multinomial works internally: it just knows how to pass data in and get a predictor out. If we ever wanted to swap in a different technique, we'd simply swap the implementation without touching anything else.

Now let's define the Multinomial technique itself. This is where the actual probability calculations happen: the label probabilities, the per-word conditional probabilities, and the Laplace smoothing we covered earlier. Everything the predictor needs to make accurate classifications gets computed here during the training phase.

Let's write it.

using Ellipses.Metrics;

namespace Ellipses.Interfaces
{
    public interface INaiveBayesAlgorithm<T>
    {
        /// <summary>
        /// Computes probability of labels
        /// </summary>
        /// <param name="labelMatrix">Reference of Label Matrix</param>
        void ComputeProbabilityOfLabels(Matrix labelMatrix);

        /// <summary>
        /// Computes probability of features
        /// </summary>
        /// <param name="matrix">Matrix of data</param>
        /// <param name="labelMatrix">Reference of Label Matrix</param>
        void ComputeProbabilityOfFeatures(Matrix matrix, Matrix labelMatrix);

        /// <summary>
        /// Returns predictor for the algorithm
        /// </summary>
        INaiveBayesPredicter<T> GetPredictor();
    }
}


using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using Ellipses.Helpers;
using Ellipses.Interfaces;
using Ellipses.Metrics;
using Ellipses.Models;

namespace Ellipses.Algorithms.Nb
{
    public class NaiveBayesMultinomialAlgorithm<T> : INaiveBayesAlgorithm<T>
    {
        //Labels and feature IDs are stored as doubles, so equality is checked within a tolerance
        private const double TOLERANCE = 0.1;

        //Helper for the various operations
        private readonly IHelper _helper;

        //Probabilities of the features
        private ConcurrentBag<NaiveBayesProbability> _featureProbabilities;

        //Matrix of labels
        private Matrix _labelMatrix;

        //Probabilities of the labels
        private ConcurrentDictionary<double, double> _labelProbabilities;

        public NaiveBayesMultinomialAlgorithm()
        {
            _helper = new Helper();
        }

        /// <summary>
        /// Computes probability of labels
        /// </summary>
        /// <param name="labelMatrix">Reference of Label Matrix</param>
        public void ComputeProbabilityOfLabels(Matrix labelMatrix)
        {
            var labelProbabilities = new ConcurrentDictionary<double, double>();
            var labelList = labelMatrix.GetRows().Select(x => x[0]).Distinct().ToList();
            for (var i = 0; i < labelList.Count; i++)
            {
                var lbl = labelList[i];

                //Rows carrying this label divided by all rows
                var tLbl = labelMatrix.GetRows().Count(x => Math.Abs(x[0] - lbl) < TOLERANCE);
                var pLabel = (double) tLbl/labelMatrix.Rows;
                labelProbabilities.TryAdd(lbl, pLabel);
            }
            _labelProbabilities = labelProbabilities;
        }

        /// <summary>
        /// Computes probability of features
        /// </summary>
        /// <param name="matrix">Matrix of data</param>
        /// <param name="labelMatrix">Reference of Label Matrix</param>
        public void ComputeProbabilityOfFeatures(Matrix matrix, Matrix labelMatrix)
        {
            var featureProbabilities = new ConcurrentBag<NaiveBayesProbability>();

            //Connect matrix: the label becomes the last column of every row
            var connectedMatrix = matrix.ConnectMatrix(labelMatrix);
            var labelList = labelMatrix.GetRows().Select(x => x[0]).Distinct().ToList();
            var featureLength = connectedMatrix.Cols;
            var matrixRows = connectedMatrix.GetRows();

            var vectors = matrixRows as Vector[] ?? matrixRows.ToArray();
            Parallel.For(0, labelList.Count, l =>
            {
                Parallel.For(0, connectedMatrix.Rows, i =>
                {
                    //Total number of rows (words) for the current label
                    var fTotalAccordingToLabel =
                        vectors.Count(x => Math.Abs(x[featureLength - 1] - labelList[l]) < TOLERANCE);
                    var lUnique = connectedMatrix[i, featureLength - 1];
                    Parallel.For(0, matrix.Cols, j =>
                    {
                        var f = connectedMatrix[i, j];

                        //Occurrences of this feature value under the current label
                        var fTotal =
                            vectors.Count(
                                x =>
                                    Math.Abs(x[featureLength - 1] - labelList[l]) < TOLERANCE &&
                                    Math.Abs(x[j] - f) < TOLERANCE);
                        var fUniqueTotal = matrix.Select(x => x[j]).Distinct().ToList().Count;

                        //Laplace smoothing: (count + 1) / (label total + unique features)
                        var fProbability = ((double) fTotal + 1)/((double) fTotalAccordingToLabel + fUniqueTotal);

                        var probability = new NaiveBayesProbability
                        {
                            FeatureIndex = j,
                            FeatureProbability = Math.Abs(fProbability),
                            Feature = f,
                            FeatureTotal = fTotal,
                            Label = labelList[l],
                            LabelUnique = lUnique
                        };
                        featureProbabilities.Add(probability);
                    });
                });
            });
            _labelMatrix = labelMatrix;
            _featureProbabilities = featureProbabilities;
        }

        /// <summary>
        /// Returns predictor for the algorithm
        /// </summary>
        public INaiveBayesPredicter<T> GetPredictor()
        {
            return new NaiveBayesMultinomialPredicter<T>(_labelProbabilities, _featureProbabilities, _labelMatrix);
        }
    }
}

The predictor is the end product of the training phase: it's the object that gets handed back once all the probability calculations are done, and it's what we actually use to classify new comments.

We start with the interface, which keeps things clean and consistent. The predictor exposes two methods: Predict, which takes a comment and returns the single most likely class label, and PredictWithProbabilities, which returns the full probability breakdown across all labels: useful when you want to know not just what the algorithm decided, but how confident it was.

The implementation receives three things in its constructor: the label probabilities, the feature probabilities, and the label matrix. These are everything computed during the training phase, passed in and stored privately so the predictor can reference them on every classification call.

At prediction time, the model is first converted into a numeric array using the Converter class. If the model contains dimensional features, those get expanded and merged with the base feature array before any probability calculations begin.

From there, the algorithm iterates over every label in parallel using Parallel.For. For each label, it walks through every feature in the converted model and looks up its pre-computed probability. If the feature exists in the training data for that label, its probability gets multiplied directly into a running total called totalDot. If the feature wasn't seen during training for that label, which will happen with real-world data, Laplace smoothing kicks in: the algorithm calculates a fallback probability using (0 + 1) / (featureCountForLabel + totalUniqueFeatures), ensuring that a single unseen word never zeros out the entire prediction.

Once all features have been processed, the final probability for that label is computed by multiplying totalDot by the label's own prior probability. Whichever label ends up with the highest value is the prediction.
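One practical wrinkle with the scoring described above: multiplying many small probabilities can underflow to zero on long comments. A common variation, which is not what the Ellipses predictor shown in this article does, is to add logarithms instead of multiplying. The sketch below is a self-contained illustration of that variation with Laplace smoothing; `TinyMultinomialNb` and its members are hypothetical names, and the label prior follows the word-based formula from the Multinomial section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, minimal multinomial Naive Bayes over word IDs, scored in log space.
public sealed class TinyMultinomialNb
{
    private readonly Dictionary<int, Dictionary<int, int>> _countsByLabel = new Dictionary<int, Dictionary<int, int>>();
    private readonly Dictionary<int, int> _totalByLabel = new Dictionary<int, int>();
    private readonly HashSet<int> _vocabulary = new HashSet<int>();
    private int _totalWords;

    // Records one training row: a word ID observed in a comment with the given label.
    public void Observe(int wordId, int label)
    {
        if (!_countsByLabel.TryGetValue(label, out var counts))
        {
            _countsByLabel[label] = counts = new Dictionary<int, int>();
            _totalByLabel[label] = 0;
        }
        counts[wordId] = counts.TryGetValue(wordId, out var n) ? n + 1 : 1;
        _totalByLabel[label]++;
        _vocabulary.Add(wordId);
        _totalWords++;
    }

    // log P(label) plus the sum of log P(word | label), with Laplace smoothing.
    public double LogScore(IEnumerable<int> wordIds, int label)
    {
        var counts = _countsByLabel[label];
        var total = _totalByLabel[label];
        var score = Math.Log((double)total / _totalWords);
        foreach (var id in wordIds)
        {
            counts.TryGetValue(id, out var seen);   // seen stays 0 for words unseen under this label
            score += Math.Log((seen + 1.0) / (total + _vocabulary.Count));
        }
        return score;
    }

    // Returns the label with the highest log score.
    public int Predict(IEnumerable<int> wordIds)
    {
        var ids = wordIds.ToList();
        return _countsByLabel.Keys.OrderByDescending(label => LogScore(ids, label)).First();
    }
}
```

Because the logarithm is monotonic, the label with the highest log score is the same label that would win the product-based comparison; only the numerical behaviour changes.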

Let's write the interface and the class.

using System.Collections.Generic;
using Ellipses.Models;

namespace Ellipses.Interfaces
{
    public interface INaiveBayesPredicter<T>
    {
        /// <summary>
        /// Predicts according to model
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        double Predict(T model);

        /// <summary>
        /// Predicts according to model and returns all probabilities for all labels
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        List<NaiveBayesResult> PredictWithProbabilities(T model);
    }
}


using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Ellipses.Helpers;
using Ellipses.Interfaces;
using Ellipses.Metrics;
using Ellipses.Models;

namespace Ellipses.Algorithms.Nb
{
    internal class NaiveBayesMultinomialPredicter<T> : INaiveBayesPredicter<T>
    {
        private const double TOLERANCE = 0.1;

        //Helper class for converting models
        private readonly IConverter _converter;

        //Probabilities of the features
        private readonly ConcurrentBag<NaiveBayesProbability> _featureProbabilities;

        //Probabilities of the labels
        private readonly ConcurrentDictionary<double, double> _labelProbabilities;

        //Labels
        private readonly List<double> _lbls;

        /// <summary>
        /// Naive Bayes Predicter
        /// </summary>
        /// <param name="labelProbabilities">Probability set for the labels</param>
        /// <param name="featureProbabilities">Probability set for the features</param>
        /// <param name="labelMatrix">Matrix of label</param>
        public NaiveBayesMultinomialPredicter(ConcurrentDictionary<double, double> labelProbabilities,
            ConcurrentBag<NaiveBayesProbability> featureProbabilities, Matrix labelMatrix)
        {
            _converter = new Converter();
            _labelProbabilities = labelProbabilities;
            _featureProbabilities = featureProbabilities;
            _lbls = labelMatrix.GetRows().Select(x => x[0]).Distinct().ToList();
        }

        /// <summary>
        /// Predicts according to model
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        public double Predict(T model)
        {
            //Pick the label with the highest probability
            return GetPrediction(model).Aggregate((l, r) => l.Value > r.Value ? l : r).Key;
        }

        /// <summary>
        /// Predicts according to model and returns all probabilities for all labels
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        public List<NaiveBayesResult> PredictWithProbabilities(T model)
        {
            return
                GetPrediction(model).Select(v => new NaiveBayesResult {Label = v.Key, Probability = v.Value}).ToList();
        }

        /// <summary>
        /// Predicts according to model and returns all probabilities for all labels
        /// </summary>
        /// <param name="model">Model for prediction label</param>
        private ConcurrentDictionary<double, double> GetPrediction(T model)
        {
            var probabilities = new ConcurrentDictionary<double, double>();
            var modelConverted = _converter.ConvertModel(model);
            var dimensionalProcess = _converter.IsDimensional(model);

95 if (dimensionalProcess)
96 {
97 var newList = new List();
98 var dimensionalFeatures = _converter.ConvertDimensionalModel(model);
99
100 var orjModelValues = modelConverted.Select(t => t.ToArray()).ToList();
101
102 for (var dimensionalRow = 0; dimensionalRow < dimensionalFeatures.Length; dimensionalRow++)
103 newList.AddRange(orjModelValues.Select(t => t.Concat(dimensionalFeatures[dimensionalRow]).ToArray()));
104
105 modelConverted = newList.ToArray();
106 }
107 Parallel.For(0, _labelProbabilities.Count, l =>
108 {
109 var label = _labelProbabilities.Keys.ElementAt(l);
110 var fTotalAccordingToLabel =
111 _featureProbabilities.Where(x => Math.Abs(x.LabelUnique - label) < TOLERANCE).Distinct().Count()/
112 _labelProbabilities.Count;
113
114 var totalDot = 1.0;
115
116 Parallel.For(0, modelConverted[0].Count(), i =>
117 {
118 var f = modelConverted[0][i];
119 var fVal =
120 _featureProbabilities.FirstOrDefault(
121 x => Math.Abs(x.Feature - f) < TOLERANCE && Math.Abs(x.Label - label) < TOLERANCE);
122 if (fVal != null)
123 {
124 totalDot *= fVal.FeatureProbability;
125 }
126 else
127 {
128 const int fTotal = 0;
129 var fUniqueTotal = _featureProbabilities.Select(x => x.Feature).Distinct().ToList().Count;
130 var fProbability = ((double) fTotal + 1)/((double) fTotalAccordingToLabel + fUniqueTotal);
131 totalDot *= fProbability;
132 }
133 });
134
135 var probability = _labelProbabilities[label]*totalDot;
136 probabilities.TryAdd(label, probability);
137 });
138 return probabilities;
139 }
140 }
141}

With all the pieces in place, we're ready to put everything together and see Naive Bayes with the Multinomial technique running end to end. Let's write the final implementation.

var naiveBayes = new NaiveBayes(new NaiveBayesMultinomialAlgorithm());
naiveBayes.LoadDataSet(dataSet.ToArray());
var predicter = naiveBayes.Fit();

Putting It All Together

To recap what we've built and the order in which everything happens:

  1. Prepare and clean the dataset
  2. Generate unique IDs for each word
  3. Pass the data to the algorithm
  4. The algorithm converts the models into numerical arrays
  5. The algorithm applies the Multinomial probability rules across the data
  6. A predictor is returned, ready to classify new input
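
Step 2 is the piece that turns text into numbers, so it is worth a standalone illustration. The sketch below is an assumption-laden stand-in for the article's word bag: the `WordIds` class is hypothetical and uses a plain `Dictionary` where the article uses its own `wordBag` collection, but it follows the same idea of giving each distinct word the next free ID.

```csharp
using System;
using System.Collections.Generic;

public static class WordIds
{
    // Builds a word -> ID map; IDs are 1-based and assigned in order of first appearance.
    public static Dictionary<string, int> Build(IEnumerable<string> words)
    {
        var ids = new Dictionary<string, int>();
        Encode(words, ids);
        return ids;
    }

    // Converts words to their numeric IDs. Words not yet in the map get the next
    // free ID, mirroring how unseen words are added to the word bag at prediction time.
    public static double[] Encode(IEnumerable<string> words, Dictionary<string, int> ids)
    {
        var values = new List<double>();
        foreach (var raw in words)
        {
            // Normalise so "Great" and " great " map to the same entry
            var word = raw.Trim().ToLowerInvariant();
            if (word.Length == 0) continue;

            if (!ids.TryGetValue(word, out var id))
            {
                id = ids.Count + 1;
                ids[word] = id;
            }
            values.Add(id);
        }
        return values.ToArray();
    }
}
```

Building the map from "Great", "product", " great " yields two entries (great = 1, product = 2); encoding "great", "Bad" against it then returns the IDs 1 and 3, with "bad" added as a new word.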

Now let's see it in action. We'll ask the algorithm to classify the following sentence:

"It is a really great product, I like it!"

Below you'll find the code for passing a new sentence to the fitted model and getting a prediction back.

var sentences = new SentenceModel("It is a really great product, I like it!");
var words = sentences.Text.ExtractFeatures();
var values = new List<double>();
var naiveBayesTextModel = new NaiveBayesTextTextModel();
foreach (var t in words)
{
    // Normalise once so the lookup and the insert agree on casing and whitespace
    var word = t.Trim().ToLower();

    var item = wordBag.FirstOrDefault(x => x.Word.ToLower() == word);
    if (item == null)
    {
        // Unseen word: give it the next free ID; the predicter smooths its probability
        item = new WordBagItem(word, wordBag.Count + 1);
        wordBag.Add(item);
    }

    values.Add(item.Id);
}
naiveBayesTextModel.Values = new double[1][];
naiveBayesTextModel.Values[0] = values.ToArray();
var res = predicter.PredictWithProbabilities(naiveBayesTextModel);

In the result, the Positive label carries the higher probability, and since 1 represents positive in our dataset, that's exactly the right answer for "It is a really great product, I like it!"

Let's push it a bit further with a negative example: "It doesn't cost, exactly terrible, I don't recommend it."

This time the result flips: the Negative label carries the higher probability, and the algorithm classifies the comment correctly once again.

Two for two, on a model trained with a minimal dataset and built entirely from scratch.

Before wrapping up, it's worth reiterating: Naive Bayes is a genuinely useful algorithm, and the Multinomial technique in particular is one of the most reliable approaches available for text and document classification. The example we've built here is intentionally simple, but the same foundations scale well. A natural next step would be adding a third label, "Neutral", to the training data. The algorithm handles multiple classes without any structural changes; you'd simply check for the highest probability across all three labels when making a prediction, and the rest of the logic stays exactly the same.
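
To make the three-label case concrete, here is a small sketch of choosing the winner from a result list. It is illustrative only: `LabelResult` is a hypothetical stand-in for the library's `NaiveBayesResult` (it copies just the `Label` and `Probability` properties), the `MultiClass` helper is not part of the library, and using 2 for "Neutral" is an assumed encoding.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for NaiveBayesResult, limited to the two properties used here.
public class LabelResult
{
    public double Label { get; set; }
    public double Probability { get; set; }
}

public static class MultiClass
{
    // Returns the entry with the highest score; works for any number of labels.
    public static LabelResult Top(IEnumerable<LabelResult> results)
    {
        return results.OrderByDescending(r => r.Probability).First();
    }

    // Rescales the raw scores so they sum to 1, which makes them easier to compare.
    public static List<LabelResult> Normalize(IEnumerable<LabelResult> results)
    {
        var list = results.ToList();
        var total = list.Sum(r => r.Probability);
        if (total <= 0) throw new ArgumentException("Scores must sum to a positive value.");

        return list
            .Select(r => new LabelResult { Label = r.Label, Probability = r.Probability / total })
            .ToList();
    }
}
```

Given made-up scores of 0.002, 0.001 and 0.005 for labels 0, 1 and 2, `MultiClass.Top` returns label 2, and `Normalize` rescales the scores to 0.25, 0.125 and 0.625. Nothing in either method depends on there being exactly two labels.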