The MNIST handwritten digit database is a very popular data set for testing machine learning algorithms. It contains 60,000 labeled training examples and 10,000 labeled examples for testing. The raw data set can be downloaded from http://yann.lecun.com/exdb/mnist/. On GitHub I have published a repository which contains a file
mnist.mat created from this raw data set, so that you can easily load and use the data in Octave or MATLAB.
You can either use this file directly or create it yourself with the
mnist.py Python script contained in this repository.
Loading the data set directly in Octave/MATLAB
This step does not require cloning the repository. You just have to download one file.
Download the file:
Start Octave and type:
d = load('mnist.mat');
d is a struct which contains the training and testing examples with the corresponding labels:
- d.trainX is a (60000,784) matrix which contains the pixel data for training
- d.trainY is a (1,60000) matrix which contains the labels for the training data
- d.testX is a (10000,784) matrix which contains the pixel data for testing
- d.testY is a (1,10000) matrix which contains the labels for the test set
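If you also want to sanity-check the file from Python, the shapes listed above can be verified with a few lines. This is only a sketch: it assumes SciPy is installed and that scipy.io.loadmat exposes the four fields under the same names. The helper names shape_problems and check_mat_file are made up for this example and are not part of the repository.

```python
# Expected layout of the data stored in mnist.mat, as listed above.
EXPECTED_SHAPES = {
    "trainX": (60000, 784),
    "trainY": (1, 60000),
    "testX": (10000, 784),
    "testY": (1, 10000),
}


def shape_problems(d):
    """Return a list of human-readable mismatches; empty means all is well.

    `d` is any mapping from field name to an object with a `.shape` tuple.
    """
    problems = []
    for name, expected in EXPECTED_SHAPES.items():
        if name not in d:
            problems.append(f"{name}: missing")
        elif tuple(d[name].shape) != expected:
            problems.append(
                f"{name}: expected {expected}, got {tuple(d[name].shape)}"
            )
    return problems


def check_mat_file(path="mnist.mat"):
    """Load the file with SciPy (assumed to be installed) and check it."""
    # Imported lazily so that shape_problems() itself needs no SciPy.
    from scipy.io import loadmat
    return shape_problems(loadmat(path))
```

Calling check_mat_file() in the directory that contains mnist.mat should return an empty list if the file matches the description above.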
You can now display a single example. For instance, to display the third example of the training set, type the following:
X = d.trainX;
% reshape fills column by column, so transpose to restore the original row order
i = reshape(X(3,:), 28, 28)';
image(i);
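The transpose in the reshape call above is easy to overlook. Each row of trainX stores the 28 image rows one after another, but Octave's reshape fills a matrix column by column, so the transpose is what turns the result back into the original image. The following pure-Python sketch shows the equivalent indexing. The function names and the brightness threshold of 128 are my own choices for illustration, and the 0 to 255 pixel range is an assumption carried over from the raw data.

```python
def row_to_image(row, side=28):
    """Split a flat list of side*side pixels into `side` rows of `side` pixels.

    This yields the same layout as Octave's reshape(row, side, side)':
    reshape fills columns first, and the transpose turns them into rows.
    """
    if len(row) != side * side:
        raise ValueError(f"expected {side * side} pixels, got {len(row)}")
    return [row[r * side:(r + 1) * side] for r in range(side)]


def to_ascii(image, threshold=128):
    """Render an image as text: '#' for bright pixels, '.' for dark ones."""
    return "\n".join(
        "".join("#" if p >= threshold else "." for p in line) for line in image
    )
```

In other words, pixel (r, c) of the displayed image, counted from zero, is element r*28 + c of the row.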
Create an Octave/MATLAB file from the raw data set
For this step it is recommended to clone the repository via:
git clone https://github.com/daniel-e/mnist_octave.git
First, you need some Python packages. You can skip this step if all of them are already installed. You can either install the required packages system-wide with
pip or create a virtual environment into which the packages are installed.
Install packages directly
pip3 install scipy numpy matplotlib
Install the packages into a virtual environment
virtualenv -p python3 venv
# activate the virtual environment
source venv/bin/activate
pip3 install --upgrade pip
pip3 install scipy numpy matplotlib
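To confirm that the packages are visible to the Python interpreter you are about to use, a quick check like the following may help. It is a small sketch using only the standard library; the function name missing_packages is made up for this example.

```python
import importlib.util


def missing_packages(names=("scipy", "numpy", "matplotlib")):
    """Return the packages from `names` that cannot be imported here."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same python3 that will execute mnist.py, for example inside the activated virtual environment.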
Download the raw data set
If you cloned the GitHub repository, this step is optional because the repository already contains the files. If you haven’t cloned it, you have to download the data set from http://yann.lecun.com/exdb/mnist/.
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Execute mnist.py to create the data file for Octave and MATLAB.
A new file
mnist.mat is created which contains the data. This file can now be loaded with Octave or MATLAB as described above.
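If you are curious what such a conversion involves: the raw files use the simple IDX format, which consists of a 4-byte header (two zero bytes, a type code, and the number of dimensions), one big-endian 32-bit size per dimension, and then the data. The sketch below reads that format with the standard library only. It is not the code from mnist.py, and the function names parse_idx and read_idx_gz are my own.

```python
import gzip
import struct

# IDX type code for unsigned bytes, the type used by the MNIST files.
UBYTE = 0x08


def parse_idx(data):
    """Parse the bytes of an uncompressed IDX file.

    Returns (dims, payload): a tuple of dimension sizes and the raw
    unsigned-byte values in row-major order.
    """
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != UBYTE:
        raise ValueError("not an unsigned-byte IDX file")
    dims = struct.unpack(f">{ndim}I", data[4:4 + 4 * ndim])
    payload = data[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError(f"expected {expected} data bytes, got {len(payload)}")
    return dims, payload


def read_idx_gz(path):
    """Read a gzip-compressed IDX file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())
```

For train-images-idx3-ubyte.gz the dimensions should come out as (60000, 28, 28), which flattens to the 60000 rows of 784 pixels described above. Writing the result to a .mat file could then be done with scipy.io.savemat, although I am only assuming here that this is roughly how the script proceeds.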