Machine Learning with C++ MLPack on Windows

In this blog I cover how to install MLPack, a C++ library, on Windows. While this is not difficult, you could easily go on a wild goose chase if you do not chant the incantations for CMake in the proper fashion. I went through that drill, and hopefully after reading this you will be spared the hair-pulling experience I went through. I used Visual Studio, and in the rest of this discussion I assume that you are doing the same. Using MinGW should not be too different, except that you don’t have the luxury of an IDE.

Prerequisites
Statistical techniques invariably end up performing matrix computations. It is a tribute to Open Source that the most popular linear algebra library is BLAS/LAPACK. There are other libraries, such as MKL from Intel, ACML from AMD, or ATLAS, and all of them provide the same Application Programming Interface (API). MLPack depends on Armadillo, which in turn depends on LAPACK, which depends on BLAS.

Armadillo
So first download Armadillo from SourceForge and extract the files. Add the include folder within the Armadillo folder to your ‘include’ path. Armadillo comes with its own prebuilt copies of the LAPACK and BLAS libraries. If you don’t have the commercial alternatives, link with those ‘lib’ files and make sure that the DLLs can be found on the executable path of example1. To check that Armadillo has been set up correctly, build example1.cpp and ensure that it runs to your satisfaction. There is a ‘wrapper’ project for which I have not yet found any use. A few MLPack tests fail with the LAPACK and BLAS libraries that come with Armadillo; I list them below. Fortunately there are other versions of the libraries that do not have this problem. One of them can be found here.

Boost
MLPack depends on Boost for unit testing. Hence install Boost for Windows as described here. Note that in addition to test you need to build

  • program_options
  • serialization.

It is best to build them as static libraries. You will also want to install the Boost Unit Test Adapter for Visual Studio to get you going.

MLPack Download
MLPack can be downloaded as a zip file from GitHub. Better still, pull the project into a local folder. MLPack uses CMake to create the project files. Install CMake if you have not already, and launch the GUI. Specify the source and build folders. The source folder must contain the CMakeLists.txt file. The build folder need not exist; it is usually specified as a sub-folder ‘build’ of the source directory. Click on ‘Advanced’ to see all the gory details. You need to add the following entries

  • ARMADILLO_INCLUDE_DIR
  • BLAS_LIBRARY
  • LAPACK_LIBRARY
  • Boost_DIR
  • Boost_LIBRARYDIR

The variable names are self-descriptive. Just remember that a ‘DIR’ suffix points to a folder and a ‘LIBRARY’ suffix refers to a file. Now click on Configure. Ensure that you select the right compiler, and if it is Visual Studio take care to select 64 bit or 32 bit as the case may be.
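The folder-versus-file rule is easy to get wrong when typing paths by hand, so here is a minimal sketch of a sanity check you could run before launching CMake. The helper names (EntryKind, expectedKind, entryLooksValid) are made up for this illustration and are not part of CMake or MLPack. The sketch only understands the five entry names listed above, and it assumes a C++17 compiler for std::filesystem, which is newer than the Visual Studio 2015 toolset used elsewhere in this post.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper types for this sketch; not part of CMake or MLPack.
enum class EntryKind { Folder, File, Unknown };

// Returns true when `name` ends with `suffix`.
bool endsWith(const std::string& name, const std::string& suffix) {
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Applies the rule of thumb from the text: 'DIR' means folder, 'LIBRARY' means file.
EntryKind expectedKind(const std::string& name) {
    if (endsWith(name, "DIR")) return EntryKind::Folder;    // e.g. Boost_DIR, Boost_LIBRARYDIR
    if (endsWith(name, "LIBRARY")) return EntryKind::File;  // e.g. BLAS_LIBRARY
    return EntryKind::Unknown;
}

// Checks that `value` exists on disk and is of the kind the entry name implies.
bool entryLooksValid(const std::string& name, const fs::path& value) {
    std::error_code ec;  // non-throwing overloads: a missing path simply yields false
    switch (expectedKind(name)) {
        case EntryKind::Folder: return fs::is_directory(value, ec);
        case EntryKind::File:   return fs::is_regular_file(value, ec);
        default:                return false;
    }
}
```

Calling entryLooksValid("BLAS_LIBRARY", "C:/usr/lib/x64/Release") would return false because the value is a folder, which is exactly the mistake the rule is meant to catch.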

If you are not familiar with CMake GUI…
Click on ‘Add Entry’. Enter the name of the variable and select PATH as the type for a folder or FILEPATH for a file. Now click on Configure. That will create the cache file. Check the output window. Assuming there are no errors, click on Generate. This will create the project files.

Command Line
If you are not afraid of the command line, cd to your mlpack folder. Create a sub-folder called build (or any other name), cd into it and try the following batch file:


"C:\Program Files (x86)\CMake\bin\cmake.exe" ^
-G "Visual Studio 14 2015 Win64" ^
-DARMADILLO_INCLUDE_DIR:PATH="C:/usr/include" ^
-DBoost_PROGRAM_OPTIONS_LIBRARY_DEBUG:FILEPATH="C:/usr/lib/x64/boost/libboost_program_options-vc140-mt-1_60.lib" ^
-DBoost_PROGRAM_OPTIONS_LIBRARY_RELEASE:FILEPATH="C:/usr/lib/x64/boost/lib" ^
-DBoost_INCLUDE_DIR:PATH="c:/usr/include" ^
-DBoost_SERIALIZATION_LIBRARY_DEBUG:FILEPATH="C:/usr/lib/x64/boost/libboost_serialization-vc140-mt-1_60.lib" ^
-DACMLMP_LIBRARY:FILEPATH="ACMLMP_LIBRARY-NOTFOUND" ^
-DBoost_LIBRARY_DIR_DEBUG:PATH="C:/usr/lib/x64/boost" ^
-DBLAS_LIBRARY:FILEPATH="C:/usr/lib/x64/Release/libblas.lib" ^
-DLAPACK_LIBRARY:FILEPATH="C:/usr/lib/x64/Release/liblapack.lib" ^
-DBoost_UNIT_TEST_FRAMEWORK_LIBRARY_DEBUG:FILEPATH="C:/usr/lib/x64/boost/libboost_unit_test_framework-vc140-mt-1_60.lib" ..

You need to change the values to suit your machine.
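When adapting the Boost entries, it helps to know how the library file names are put together. The sketch below composes a name in the style seen in the batch file above. It assumes a statically built, multi-threaded library linked against the DLL runtime, and the layout Boost used around version 1.60; later releases add further tags. The function boostStaticLibName is a made-up helper for illustration, not a Boost or CMake API.

```cpp
#include <string>

// Hypothetical helper: builds a Boost static library file name of the form
//   libboost_<component>-<toolset>-mt[-gd]-<version>.lib
// "lib" prefix = static library, "mt" = multi-threaded,
// "gd" = debug build linked against the debug runtime.
std::string boostStaticLibName(const std::string& component,
                               const std::string& toolset,
                               const std::string& version,
                               bool debug) {
    std::string name = "libboost_" + component + "-" + toolset + "-mt";
    if (debug) {
        name += "-gd";
    }
    return name + "-" + version + ".lib";
}
```

For example, boostStaticLibName("program_options", "vc140", "1_60", false) gives libboost_program_options-vc140-mt-1_60.lib, the name used in the batch file, while passing true for the last argument gives the corresponding debug name with the gd tag.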

Building and Unit Testing
The ‘.sln’ file is in the build folder. Open it in Visual Studio. The build will take a long time… about half an hour on my slow machine. Assuming there are no errors, set the active project to mlpack_test. Still in Visual Studio, click on Test\Windows\Test Explorer. This will list all the tests. ‘Run All Tests’ should start the unit tests. If there is a failure, try this:

  • Copy mlpack_test.vcxproj to my_test.vcxproj.
  • Use an editor like Notepad++ to open my_test.vcxproj and
    • Remove the project GUID.
    • Locate the ClCompile elements, which list the .cpp source files.
    • Delete all of them except mlpack_test.cpp and the specific test file that is failing, e.g. load_save_test.cpp.
  • Build and run my_test from within Visual Studio.

The failing test often passes when run this way. I am not sure why; I think it has something to do with the tests being run concurrently.

Failing Tests
The following tests fail when MLPack is used with the LAPACK and BLAS libraries supplied with Armadillo:

  • FullRankTest
  • Rank10Test
  • GermanTest

Footnote
MLPack, as of June 26, 2016, generates link errors on Windows. I had to manually create an mlpack.def file and add it to the mlpack project.

Fix for Issue 524
Issue 524 deals with the declaration of a static variable, static CLIDeleter cliDeleter;, inside a header file which is indirectly included in many source files. The static object is thus created many times, once for each source file that includes the header. This is not noticeable because, apart from slowing down startup and shutdown, it has no effect. My suggestion was “to move CLI away from core and include it explicitly in all the applications.” While this may not be acceptable from the point of view of usability, for now I have no option other than to live with those multiple copies of the static object.
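For readers who have not run into this before, here is a minimal sketch of the general pattern and of one common alternative, a function-local static inside an inline function, which gives a single shared object however many source files include the header. This is neither my suggestion above nor how MLPack handles it; Deleter is a stand-in written for this illustration, not the real CLIDeleter. Note also that the behaviour differs: the object is created on first use rather than at startup, so its destructor only runs at exit if something actually called the accessor.

```cpp
// Stand-in for the class discussed above; NOT the mlpack CLIDeleter.
struct Deleter {
    // Counts how many Deleter objects have been constructed.
    static int& constructed() {
        static int count = 0;
        return count;
    }
    Deleter() { ++constructed(); }
};

// The pattern from the issue. Placed in a header, this line gives every
// source file that includes it a private copy of the object:
//
//     static Deleter deleter;

// Alternative sketch: one instance shared by all source files,
// created the first time the function is called.
inline Deleter& sharedDeleter() {
    static Deleter instance;
    return instance;
}
```

Every call to sharedDeleter() returns a reference to the same object, so the constructor runs only once for the whole program.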

About The Sunday Programmer

Joe is an experienced C++/C# developer on Windows. Currently looking out for an opening in C/C++ on Windows or Linux.