What is a Feed-Forward Neural Network Layer? (With C++ & Python Examples)
Demystifying the Feed-Forward Neural Network Layer: A Hands-On Guide in C++ and Python
In the world of machine learning and artificial intelligence, neural networks are often treated as black boxes. But beneath the surface, they are built from simple, elegant mathematical blocks. The most fundamental and widely used of these blocks is the Feed-Forward Layer (also known as a Dense or Fully Connected layer).
Whether you are building a simple classifier or training complex deep learning models for financial market prediction, understanding the feed-forward layer is essential.
In this post, we will break down what a feed-forward layer is, how it works, and how to implement it. To make things practical, we will walk through a complete example (solving the classic XOR problem) using a lightweight, dependency-free C++ neural network library, and then compare it with the same model built with the library's Python bindings, PyTorch, and TensorFlow.
What is a Feed-Forward Layer?
At its core, a feed-forward layer is a collection of artificial neurons in which every input is connected to every neuron in the layer, and each neuron produces one output.
Inputs          Hidden Layer (FF)       Output Layer

(X1) ---------> (Neuron 1) ---------> (Output Y)
     \    /                  /
      \  /                  /
       \/                  /
       /\                 /
      /  \               /
(X2) ---------> (Neuron 2)
When data passes through a feed-forward layer, three main operations occur:
- Weight Multiplication: Each input signal is multiplied by a “weight” representing the strength of the connection, and the products are summed for each neuron.
- Bias Addition: A “bias” value is added to the weighted sum. The bias shifts the point at which the activation function switches on, so the neuron is not forced to pass through the origin.
- Activation Function: The combined sum is passed through a non-linear activation function (like Sigmoid, ReLU, or Tanh). This non-linearity allows the network to learn relationships that are more complex than a straight line.
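To see why the activation step matters, here is a minimal sketch in plain Python. The function names and weight values are illustrative assumptions, not taken from any library. It stacks two one-input layers: without an activation the pair collapses into a single straight line, while a sigmoid in between prevents that.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Two tiny one-input, one-output "layers" with arbitrary example weights.
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5

def stacked_linear(x):
    """Two layers with no activation in between."""
    return w2 * (w1 * x + b1) + b2

def single_linear(x):
    """One layer whose weight and bias are the algebraic product of the two."""
    return (w2 * w1) * x + (w2 * b1 + b2)

def stacked_sigmoid(x):
    """The same two layers, with a sigmoid after the first."""
    return w2 * sigmoid(w1 * x + b1) + b2

xs = [-1.0, 0.0, 1.0]

# Without an activation, both versions agree at every point.
linear_match = all(math.isclose(stacked_linear(x), single_linear(x)) for x in xs)

# A straight line takes equal steps between equally spaced points;
# the sigmoid version does not.
ys = [stacked_sigmoid(x) for x in xs]
is_straight_line = math.isclose(ys[1] - ys[0], ys[2] - ys[1])

print(linear_match, is_straight_line)
```

However many purely linear layers are stacked, the result is still one linear map, which is why each layer needs a non-linear activation.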
Mathematically, for a given input vector $\mathbf{x}$, the output $\mathbf{y}$ of a feed-forward layer is represented as:
$$\mathbf{y} = f(\mathbf{W}\mathbf{x} + \mathbf{b})$$
Where:
- $\mathbf{W}$ is the weight matrix.
- $\mathbf{b}$ is the bias vector.
- $f$ is the activation function.
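The formula can be sketched directly in plain Python. This is a minimal illustration, not the code of any particular library; the helper name `feed_forward` and the weight values are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(weights, biases, inputs, activation=sigmoid):
    """Compute y = f(Wx + b) for one layer.

    weights: one row per output neuron, one column per input.
    biases:  one value per output neuron.
    """
    outputs = []
    for row, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(row, inputs))  # one row of Wx
        outputs.append(activation(weighted_sum + bias))         # f(Wx + b)
    return outputs

# Illustrative values: 2 inputs feeding 2 neurons.
W = [[0.5, -0.5],
     [1.0, 1.0]]
b = [0.0, -1.0]
x = [1.0, 1.0]

y = feed_forward(W, b, x)
print(y)
```

For the first neuron the weighted sum plus bias is 0, so its sigmoid output is exactly 0.5. The second neuron sees 1.0 and outputs roughly 0.73.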
The XOR Problem: Our Testing Ground
To demonstrate feed-forward layers in action, we will use the XOR (Exclusive OR) gate. XOR is a classic problem in machine learning because it is not linearly separable: no single straight line can separate the inputs that produce $0$ from those that produce $1$.
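One way to see this is a short derivation. Suppose a single neuron with a threshold output predicts $1$ when $w_1 x_1 + w_2 x_2 + b > 0$ and $0$ otherwise (a common convention, assumed here for illustration). The four XOR cases $(0,0) \to 0$, $(0,1) \to 1$, $(1,0) \to 1$ and $(1,1) \to 0$ then require:

$$b \le 0, \qquad w_2 + b > 0, \qquad w_1 + b > 0, \qquad w_1 + w_2 + b \le 0$$

Adding the two strict inequalities gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$, which contradicts the last condition. No choice of $w_1$, $w_2$ and $b$ satisfies all four.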
To solve it, we need at least one hidden feed-forward layer to warp the input space so that it becomes separable.
| Input 1 | Input 2 | Expected Output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
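Before training anything, it can help to see that a 2-2-1 network is able to represent XOR at all. The sketch below uses hand-picked weights (an illustrative assumption, not the result of training and not taken from any library) so that the hidden layer behaves like OR and NAND, and the output behaves like AND.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, biases, inputs):
    """One feed-forward layer: sigmoid(Wx + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked weights, chosen large so the sigmoids saturate near 0 or 1.
HIDDEN_W = [[20.0, 20.0],     # neuron 1 behaves like OR
            [-20.0, -20.0]]   # neuron 2 behaves like NAND
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]     # output behaves like AND of the hidden neurons
OUTPUT_B = [-30.0]

def xor_net(x1, x2):
    """Return the hidden activations and the final output."""
    hidden = layer(HIDDEN_W, HIDDEN_B, [x1, x2])
    return hidden, layer(OUTPUT_W, OUTPUT_B, hidden)[0]

results = {}
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    hidden, y = xor_net(x1, x2)
    results[(x1, x2)] = round(y)
    print((x1, x2), "hidden:", [round(h) for h in hidden], "output:", round(y))
```

The hidden layer maps the four inputs to roughly (0, 1), (1, 1), (1, 1) and (1, 0). In that warped space the two classes can be split by a single line, which is what the output neuron does.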
1. The C++ Implementation (Using myoddweb::nn)
For performance-critical systems, trading bots, or resource-constrained environments, C++ is often the language of choice. Below is an example using myoddweb::nn, a lightweight, dependency-free C++ neural network library.
Here, we configure a network with:
- An Input Layer of 2 neurons.
- One Hidden Feed-Forward Layer of 2 neurons (using Sigmoid activation).
- An Output Layer of 1 neuron (using Sigmoid activation).
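As a quick sanity check on the size of this network, the sketch below counts its trainable values. It assumes the usual fully connected layout with one bias per non-input neuron; the helper name `count_parameters` is hypothetical, and the library's internal bookkeeping may differ.

```python
def count_parameters(topology):
    """Count weights and biases in a fully connected network.

    topology lists the neurons per layer, input layer first.
    Assumes one bias per non-input neuron.
    """
    total = 0
    for n_in, n_out in zip(topology, topology[1:]):
        total += n_in * n_out  # weight matrix between the two layers
        total += n_out         # bias vector of the receiving layer
    return total

print(count_parameters([2, 2, 1]))  # prints 9
```

Under that assumption the hidden layer contributes 2 x 2 weights plus 2 biases, and the output layer 2 weights plus 1 bias.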
#include <iostream>
#include <vector>
#include "neuralnetwork/neuralnetwork.h"
#include "neuralnetwork/common/logger.h"

using namespace myoddweb::nn;

int main()
{
    // 1. Define the network topology: 2 inputs, 2 hidden neurons, 1 output
    std::vector<unsigned> topology = { 2, 2, 1 };

    // 2. Define the hidden layer configurations (using Feed-Forward architecture)
    std::vector<LayerDetails> hidden_layers = {
        LayerDetails(
            Layer::Architecture::FF,
            2,
            activation(activation::method::sigmoid, 1.0),
            0.0,  // Dropout rate (0.0 = disabled)
            0.0,  // Weight decay
            OptimiserType::SGD,
            0.99  // Momentum
        )
    };

    // 3. Define the output layer configuration
    auto output_layer = OutputLayerDetails(
        topology.back(),
        activation(activation::method::sigmoid, 1.0),
        ErrorCalculation::type::mse,
        { 0.0, 0.0, 1.0, 0.0, false, 1.0 },  // Evaluation config
        0.0,  // Weight decay
        OptimiserType::SGD,
        0.99  // Momentum
    );

    // 4. Build the configuration options
    auto options = NeuralNetworkOptions::create(topology)
        .with_batch_size(1)
        .with_output_layer_details(output_layer)
        .with_hidden_layers(hidden_layers)
        .with_learning_rate(0.1)
        .with_number_of_epoch(5000)
        .with_log_level(Logger::LogLevel::Info)
        .build();

    // 5. Create the neural network instance
    NeuralNetwork nn(options);

    // 6. Define training inputs (XOR inputs) and expected outputs
    std::vector<std::vector<double>> inputs = {
        { 0.0, 0.0 },
        { 0.0, 1.0 },
        { 1.0, 0.0 },
        { 1.0, 1.0 }
    };
    std::vector<std::vector<double>> outputs = {
        { 0.0 },
        { 1.0 },
        { 1.0 },
        { 0.0 }
    };

    // 7. Train the network
    std::cout << "Training the neural network...\n";
    nn.train(inputs, outputs);

    // 8. Run inference to check predictions
    std::cout << "\nInference Results:\n";
    for (const auto& input : inputs)
    {
        auto result = nn.think(input);
        std::cout << "Input: {" << input[0] << ", " << input[1]
                  << "} -> Predicted: " << result[0] << "\n";
    }
    return 0;
}
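The listing above sets a learning rate of 0.1 and a momentum of 0.99. This post does not show how the library applies them, so the following is only a sketch of the classical textbook momentum update, written in plain Python on a one-parameter toy problem. The function name `sgd_momentum_step` and the toy objective are assumptions made for illustration.

```python
def sgd_momentum_step(weight, velocity, gradient, learning_rate, momentum):
    """One classical momentum update; returns the new (weight, velocity)."""
    # The velocity keeps a decaying memory of past gradients.
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
weight, velocity = 0.0, 0.0
for _ in range(2000):
    gradient = 2.0 * (weight - 3.0)
    weight, velocity = sgd_momentum_step(weight, velocity, gradient,
                                         learning_rate=0.1, momentum=0.99)

print(round(weight, 3))
```

With momentum this high the weight overshoots the minimum and oscillates for many steps before settling near 3, which is one reason such a setting usually needs a generous number of epochs.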
What Output to Expect
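As the network trains over 5000 epochs, the mean squared error should fall. Exact numbers vary from run to run because the starting weights are typically random, and a network this small can occasionally stall without learning XOR. A successful run ends with output along these lines: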
Training the neural network...
Inference Results:
Input: {0, 0} -> Predicted: 0.0152
Input: {0, 1} -> Predicted: 0.9814
Input: {1, 0} -> Predicted: 0.9815
Input: {1, 1} -> Predicted: 0.0189
Notice how the outputs are extremely close to the expected XOR values ($0$ and $1$).
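To put a number on "close", the sketch below computes the mean squared error of those four sample predictions and converts them to class labels. It is plain Python and independent of the library; the 0.5 decision threshold is an assumption, and the library's own error figure may be scaled differently.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Sample predictions copied from the output listing above.
predicted = [0.0152, 0.9814, 0.9815, 0.0189]
expected = [0.0, 1.0, 1.0, 0.0]

mse = mean_squared_error(predicted, expected)

# Turn the continuous outputs into 0/1 labels with a 0.5 threshold.
labels = [1 if p >= 0.5 else 0 for p in predicted]

print(f"MSE: {mse:.6f}")
print("Labels:", labels)
```

For these sample values the error is about 0.0003, and the thresholded labels match the XOR truth table.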
2. The Python Implementation (Using myoddweb::nn Bindings)
If you prefer Python but want to leverage the performance of the C++ engine, myoddweb::nn includes Python bindings. Here is the same example implemented in Python:
import neuralnetwork as nn

# 1. Define topology
topology = [2, 2, 1]

# 2. Configure hidden and output layers
hidden_activation = nn.Activation(nn.ActivationMethod.Sigmoid, 1.0)
hidden_layers = [
    nn.LayerDetails(
        nn.LayerArchitecture.FF,
        2,
        hidden_activation,
        0.0, 0.0,
        nn.OptimiserType.SGD,
        0.99
    )
]
out_activation = nn.Activation(nn.ActivationMethod.Sigmoid, 1.0)
out_layer = nn.OutputLayerDetails(
    topology[-1],
    out_activation,
    nn.ErrorCalculationType.MSE,
    nn.EvaluationConfig(),
    0.0,
    nn.OptimiserType.SGD,
    0.99
)

# 3. Build neural network options
options = nn.NeuralNetworkOptions.create(topology) \
    .with_batch_size(1) \
    .with_hidden_layers(hidden_layers) \
    .with_output_layer_details(out_layer) \
    .with_learning_rate(0.1) \
    .with_number_of_epoch(5000) \
    .with_log_level(nn.LogLevel.Info) \
    .build()

# 4. Instantiate and train the network
net = nn.NeuralNetwork(options)
training_inputs = [
    [0.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
    [1.0, 1.0]
]
training_outputs = [
    [0.0],
    [1.0],
    [1.0],
    [0.0]
]
print("Training model...")
net.train(training_inputs, training_outputs)

# 5. Evaluate predictions
print("\nInference Results:")
for inputs, expected in zip(training_inputs, training_outputs):
    outputs = net.think(inputs)
    print(f"Input: {inputs} | Expected: {expected[0]} | Predicted: {outputs[0]:.4f}")
How Does This Compare to Other Libraries?
To see how the custom library design compares to major deep learning frameworks, let’s look at the same XOR problem built in PyTorch and TensorFlow.
Option A: PyTorch (Python)
PyTorch is favoured in research for its Pythonic, dynamic style. In PyTorch, the weights-and-bias part of a feed-forward layer is the nn.Linear class; the activation function is applied as a separate step.
import torch
import torch.nn as nn
import torch.optim as optim

# 1. Define model architecture using nn.Linear (Feed-Forward)
class XORModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 2)  # 2 inputs -> 2 hidden neurons
        self.sigmoid = nn.Sigmoid()
        self.output = nn.Linear(2, 1)  # 2 hidden neurons -> 1 output

    def forward(self, x):
        x = self.sigmoid(self.hidden(x))
        x = self.sigmoid(self.output(x))
        return x

model = XORModel()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.99)

# 2. Data
inputs = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], dtype=torch.float32)
outputs = torch.tensor([[0.0], [1.0], [1.0], [0.0]], dtype=torch.float32)

# 3. Training (full batch: all four samples per step, unlike batch size 1 above)
for epoch in range(5000):
    optimizer.zero_grad()             # clear gradients from the previous step
    preds = model(inputs)             # forward pass
    loss = criterion(preds, outputs)  # mean squared error
    loss.backward()                   # backpropagate
    optimizer.step()                  # update weights

# 4. Inference
with torch.no_grad():
    predictions = model(inputs)

print("PyTorch Results:")
for inp, pred in zip(inputs, predictions):
    print(f"Input: {inp.tolist()} -> Predicted: {pred.item():.4f}")
Option B: TensorFlow / Keras (Python)
TensorFlow and Keras are widely used in enterprise production settings. Here, the feed-forward layer is represented by the Dense class.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import SGD

# 1. Define model architecture using Dense (Feed-Forward)
model = Sequential([
    Input(shape=(2,)),               # 2 input features
    Dense(2, activation='sigmoid'),  # Hidden layer
    Dense(1, activation='sigmoid')   # Output layer
])

# 2. Compile model
model.compile(
    optimizer=SGD(learning_rate=0.1, momentum=0.99),
    loss='mean_squared_error'
)

# 3. Data
inputs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], dtype=np.float32)
outputs = np.array([[0.0], [1.0], [1.0], [0.0]], dtype=np.float32)

# 4. Training
model.fit(inputs, outputs, epochs=5000, batch_size=1, verbose=0)

# 5. Inference
predictions = model.predict(inputs, verbose=0)
print("TensorFlow/Keras Results:")
for inp, pred in zip(inputs, predictions):
    print(f"Input: {inp.tolist()} -> Predicted: {pred[0]:.4f}")
Conclusion & Next Steps
Feed-forward layers are the basic building block of neural networks: each one maps inputs to outputs with a matrix multiplication, a bias, and a non-linear activation. While high-level libraries like PyTorch and TensorFlow make it easy to assemble these layers, a lightweight C++ library like myoddweb::nn gives you fine-grained control, portability, and integration without external dependencies.
If you are interested in looking under the hood of neural networks, learning how backpropagation is coded from scratch, or exploring recurrent layers like Elman RNNs and GRUs in native C++, check out the repository, star the project, and start experimenting!
Check out the project on GitHub: FFMG/neural-network













