Neural Networks using MLPack

Motivation
Artificial Neural Networks (ANNs) [1], [2] are a class of supervised learning algorithms motivated by how neurons in the human brain work. A simple ANN is shown below.

[Figure: a simple neural network]
The input data is a vector that is transformed in some way and transmitted to the hidden layer, which in turn transforms the received data and transmits it to the output layer. There can be multiple hidden layers, although a single hidden layer is good enough in many cases.
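
For a network with one hidden layer, "transformed in some way" typically means a weighted sum followed by a squashing function such as the sigmoid. Here is a minimal sketch using Armadillo, the linear-algebra library MLPack is built on; the weights are assumed to be already learned, and the names are mine rather than MLPack API:

#include <armadillo>

// Element-wise sigmoid: squashes each entry into (0, 1).
arma::vec Sigmoid(const arma::vec& z)
{
  return 1.0 / (1.0 + arma::exp(-z));
}

// Forward pass: input -> hidden -> output, one weighted sum
// plus squashing function per layer.
arma::vec Forward(const arma::vec& input,
                  const arma::mat& W1, const arma::vec& b1,
                  const arma::mat& W2, const arma::vec& b2)
{
  arma::vec hidden = Sigmoid(W1 * input + b1);
  return Sigmoid(W2 * hidden + b2);
}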

 

The Algorithm
The transformation that takes place between layers is usually a function of the weighted sum of the input. Learning consists of estimating the correct values for the weights. Prior to that, a number of design decisions have to be made:

  • The number of hidden layers
  • The number of nodes in each hidden layer
  • The function transforming the weighted input to output at each layer
  • A performance measure such as sum of squares or cross entropy (a sketch of both follows this list)
  • Whether a bias is used
  • The value of the regularisation parameter
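
To make the performance measures concrete, here is a rough sketch of sum of squares and cross entropy using Armadillo; the function names are mine and only illustrate the formulas:

#include <armadillo>

// Sum-of-squares error: penalises deviations quadratically.
double SumOfSquares(const arma::vec& pred, const arma::vec& target)
{
  return arma::accu(arma::square(pred - target));
}

// Cross-entropy error for 0/1 targets; assumes predictions
// lie strictly between 0 and 1 so the logs are defined.
double CrossEntropy(const arma::vec& pred, const arma::vec& target)
{
  return -arma::accu(target % arma::log(pred)
                     + (1.0 - target) % arma::log(1.0 - pred));
}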

MLPack-ANN, the ANN framework implemented in MLPack [6], allows all of these parameters to be set. The purpose of this blog is to describe the implementation so that a developer will find it easier to use.

Design of ANN
ANNs can be implemented using the decorator pattern [3], where each layer can be thought of as decorating its input and transmitting its output to the next layer. Another way to implement them would be to put each layer in a list and traverse the list. MLPack-ANN uses the second method.

Use of Tuples
A list of items of the same type can be stored in a vector. If items of different types need to be stored, they must all inherit from a common base class, and a pointer (or a smart pointer) to each item can be kept in the vector. A tuple, on the other hand, is just an ordered list of heterogeneous items. Notice that in an ANN the input layer has no input and the output layer has no output, so the layers do not share one interface naturally. While it is always possible to create an interface layer, MLPack-ANN instead uses static polymorphism through templates. This avoids virtual functions and class hierarchies, although it brings different issues to address.
A tuple has a fixed size: it is not possible to add or remove items. Moreover, to access the n-th item, ‘n’ must be known at compile time. Thus, if we need to apply a function to every item in the tuple, an ordinary runtime loop will not work. The reason for that, and a fix for it, follows.
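
To make the contrast concrete, here is a sketch; the layer types are stand-ins I made up, not MLPack classes:

#include <tuple>
#include <vector>
#include <memory>

// Stand-in layer types that share no base class.
struct LinearLayerStub  { /* weights would live here */ };
struct SigmoidLayerStub { /* stateless squashing layer */ };

// Dynamic polymorphism: a vector needs a common base class
// and indirection through pointers.
struct LayerBase { virtual ~LayerBase() = default; };
std::vector<std::unique_ptr<LayerBase>> dynamicNet;

// Static polymorphism: a tuple holds the heterogeneous layers
// directly, with every type fixed at compile time.
std::tuple<LinearLayerStub, SigmoidLayerStub, LinearLayerStub> staticNet;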

 

C++ Idiom using Tuples

Consider the following code, which will NOT compile:

template<typename... Args>
void for_every(std::tuple<Args...>& t)
{
  const size_t n = sizeof...(Args);
  for (size_t i = 0; i < n; ++i)
    SomeAction(std::get<i>(t));  // error: 'i' is not a compile-time constant
}

This will not compile because ‘i’ needs to be known at compile time, but ‘i’ changes at runtime. There are a few ways of addressing this issue. MLPack-ANN uses this idiom:

  template<size_t I = 0, typename... Tp>
  typename std::enable_if<I == sizeof...(Tp), void>::type
  ResetParameter(std::tuple<Tp...>& /* unused */) 
  { /* Nothing to do here */ }

  template<size_t I = 0, typename... Tp>
  typename std::enable_if<I < sizeof...(Tp), void>::type
  ResetParameter(std::tuple<Tp...>& network)
  {
    ResetDeterministic(std::get<I>(network));
    ResetParameter<I + 1, Tp...>(network);
  }

It is good to be aware of this idiom, as it is used many times in MLPack-ANN. You would then call it using:

  ResetParameter(network);

where network is a std::tuple.
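
Applying the same idiom to the earlier for_every example gives a version that does compile; SomeAction here is a made-up stand-in that just prints its argument:

#include <cstddef>
#include <iostream>
#include <tuple>
#include <type_traits>

// Stand-in action applied to each element.
template<typename T>
void SomeAction(const T& item) { std::cout << item << '\n'; }

// Base case: I has reached the tuple size, so recursion stops.
template<std::size_t I = 0, typename... Args>
typename std::enable_if<I == sizeof...(Args), void>::type
for_every(std::tuple<Args...>& /* unused */) { }

// Recursive case: act on element I, then recurse with I + 1,
// so every index is a compile-time constant.
template<std::size_t I = 0, typename... Args>
typename std::enable_if<I < sizeof...(Args), void>::type
for_every(std::tuple<Args...>& t)
{
  SomeAction(std::get<I>(t));
  for_every<I + 1, Args...>(t);
}

int main()
{
  auto t = std::make_tuple(1, 2.5, "three");
  for_every(t);  // prints 1, 2.5 and three, one per line
}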

Architecture of MLPack-ANN

Consider a simple three-stage neural network: a hidden layer in addition to an input and an output layer. It is common to add a bias node, whose value is always one, to the input and/or hidden layer. The weighted sum of the nodes is then transformed using a sigmoid or tanh function. The resulting cascading structure is shown below:

[Figure: block diagram of the three-stage network]

The code for creating the above block diagram is listed here:


// Input -> hidden: linear map, bias, then logistic (sigmoid) activation.
LinearLayer<> inputLayer(inputVectorSize, hiddenLayerSize);
BiasLayer<> inputBiasLayer(hiddenLayerSize);
BaseLayer<LogisticFunction> inputBaseLayer;

// Hidden -> output: linear map, bias, then logistic activation.
LinearLayer<> hiddenLayer1(hiddenLayerSize, outputVectorSize);
BiasLayer<> hiddenBiasLayer1(outputVectorSize);
BaseLayer<LogisticFunction> outputLayer;

// Maps the final activations to class labels.
BinaryClassificationLayer classOutputLayer;

// Collect the layers, in order, as a tuple of references.
auto modules = std::tie(inputLayer, inputBiasLayer, inputBaseLayer,
                        hiddenLayer1, hiddenBiasLayer1, outputLayer);

FFN<decltype(modules),
    decltype(classOutputLayer),
    MeanSquaredErrorFunction> net(modules, classOutputLayer);

There are a number of unit tests that illustrate the use of FFN (Feed Forward Network). I hope this blog gives the gentle reader of MLPack-ANN a head start.

Thus MLPack-ANN fulfills an important role, but there are two features I would like to add.

Added Features
While training a machine is important, it is also necessary that the machine can be saved or transmitted, for use on other devices or at other times. Hence I added an archive feature using Boost.Serialization [4]. This is available in my GitHub repository [5]. While it works, it is still a work in progress, as I have not added sufficient unit tests.
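
Usage follows the standard Boost.Serialization pattern; a minimal sketch, assuming the network type has been given a serialize method as in my fork:

#include <fstream>
#include <string>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>

// Write a trained network to disk; assumes NetType is serializable.
template<typename NetType>
void SaveNet(const NetType& net, const std::string& path)
{
  std::ofstream ofs(path);
  boost::archive::text_oarchive oa(ofs);
  oa << net;  // serialises weights and structure
}

// Restore a network previously written by SaveNet.
template<typename NetType>
void LoadNet(NetType& net, const std::string& path)
{
  std::ifstream ifs(path);
  boost::archive::text_iarchive ia(ifs);
  ia >> net;  // deserialises into the supplied network
}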
The other feature that is desirable is a copy constructor for FFN (feed forward network) and CNN (convolutional neural network). Currently, if I want two programs, one for learning and one for execution, I need to cut and paste the ten lines of code shown above. However, if I have a function that returns the model, the code is not duplicated; something along the following lines:

// C++14: the return type of CreateFFN is deduced automatically.
auto CreateFFN(size_t inputVectorSize,
               size_t hiddenLayerSize,
               size_t outputVectorSize)
{
  LinearLayer<> inputLayer(inputVectorSize, hiddenLayerSize);
  BiasLayer<> inputBiasLayer(hiddenLayerSize);
  BaseLayer<LogisticFunction> inputBaseLayer;

  LinearLayer<> hiddenLayer1(hiddenLayerSize, outputVectorSize);
  BiasLayer<> hiddenBiasLayer1(outputVectorSize);
  BaseLayer<LogisticFunction> outputLayer;

  BinaryClassificationLayer classOutputLayer;

  // make_tuple copies the layers into the tuple, so nothing dangles
  // when the local objects go out of scope.
  auto modules = std::make_tuple(inputLayer, inputBiasLayer, inputBaseLayer,
                                 hiddenLayer1, hiddenBiasLayer1, outputLayer);

  FFN<decltype(modules),
      decltype(classOutputLayer),
      MeanSquaredErrorFunction> net(modules, classOutputLayer);
  return net;
}

Notice that I replaced tie with make_tuple: std::tie would store references to the local layer objects, which would dangle once CreateFFN returns, whereas std::make_tuple stores copies. Hence I added the copy constructor; not really “added”, because I just removed the user-declared destructor and allowed the compiler to generate the copy constructor.
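
With that in place, both programs can obtain an identical, independent network from the one factory function; the variable names below are illustrative:

// Training program and execution program share CreateFFN.
auto net = CreateFFN(inputVectorSize, hiddenLayerSize, outputVectorSize);

// Copy construction, now generated by the compiler.
auto deployed = net;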

References

  1. Tom Mitchell, “Machine Learning,” McGraw Hill, 1997.
  2. Pedro Domingos, https://www.coursera.org/course/machlearning, as of Jan. 26, 2016.
  3. Erich Gamma et al., “Design Patterns,” Addison-Wesley, 1995.
  4. Robert Ramey, http://www.boost.org/doc/libs/1_60_0/libs/serialization/doc/, as of Jan. 26, 2016.
  5. Joe Mariadassou, https://github.com/theSundayProgrammer/mlpack, as of Jan. 26, 2016.
  6. MLPack, http://www.mlpack.org