The MNIST handwritten digit database is a popular data set for testing machine learning algorithms. It contains 60,000 labeled training examples and 10,000 labeled test examples. The raw data set can be downloaded from http://yann.lecun.com/exdb/mnist/. I have published a repository on GitHub which contains a file `mnist.mat`, created from this raw data set, that can be loaded directly in Octave or MATLAB.

You can either use this file directly or create it yourself with the `mnist.py` Python script contained in the repository.

## Loading the data set directly in Octave/MATLAB

This step does not require cloning the repository. You only have to download one file.

Download the file:

```
wget https://github.com/daniel-e/mnist_octave/raw/master/mnist.mat
```

Start Octave and type:

```
d = load('mnist.mat');
```

Now `d` is a struct which contains the training and test examples with the corresponding labels:

- `d.trainX` is a (60000,784) matrix which contains the pixel data for training
- `d.trainY` is a (1,60000) matrix which contains the labels for the training data
- `d.testX` is a (10000,784) matrix which contains the pixel data for testing
- `d.testY` is a (1,10000) matrix which contains the labels for the test set

You can now display a single example. To display the third example of the training set, type:

```
X = d.trainX;
% take the third row (one flattened image) and turn it into a 28x28 matrix
img = reshape(X(3,:), 28, 28)';
image(img);
```
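The transpose in the snippet above is needed because `reshape` in Octave and MATLAB fills the result column by column, while the pixels of each example appear to be stored row by row (an inference from that transpose, not something stated by the data file itself). As a minimal sketch of the same idea, the hypothetical Python helper `row_to_image` below splits a flat pixel vector into image rows directly, so no transpose is required:

```python
def row_to_image(row, width=28, height=28):
    """Split a flat, row-major pixel vector into a list of image rows.

    Assumes the pixels are stored row by row, as the transpose in the
    Octave snippet suggests.
    """
    if len(row) != width * height:
        raise ValueError("unexpected number of pixels")
    # Row r of the image occupies the slice [r * width, (r + 1) * width).
    return [list(row[r * width:(r + 1) * width]) for r in range(height)]
```

With the default arguments, a 784-element row yields 28 lists of 28 pixel values each.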

## Create an Octave/MATLAB file from the raw data set

For this step it is recommended to clone the repository via:

```
git clone https://github.com/daniel-e/mnist_octave.git
```

First, you need some Python packages. You can skip this step if all of them are already installed. You can either install the required packages directly into your system with `pip` or create a virtual environment into which the packages are installed.

**Install packages directly**

```
pip3 install scipy numpy matplotlib
```

**Install the packages into a virtual environment**

```
virtualenv -p python3 venv
# activate the virtual environment
source venv/bin/activate
pip3 install --upgrade pip
pip3 install scipy numpy matplotlib
```
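Whichever installation method you choose, you may want to confirm that the packages are importable before running the script. The following is a small sketch that uses only the standard library; the helper name `missing_packages` is made up for this example and is not part of the repository:

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # The three packages installed in the steps above.
    missing = missing_packages(["scipy", "numpy", "matplotlib"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same `python3` interpreter that you will use for `mnist.py`, for example inside the activated virtual environment.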

**Download the raw data set**

If you cloned the GitHub repository, this step is optional because the repository already contains the files. If you haven't cloned it, you have to download the data set from http://yann.lecun.com/exdb/mnist/.

```
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
```
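The downloaded files are gzip-compressed and use the IDX format: a 4-byte magic number (two zero bytes, a type code, and the number of dimensions), followed by one 32-bit big-endian integer per dimension and then the raw data. The sketch below shows how such a file could be read with the Python standard library alone. It is not the code of `mnist.py`; the function names `parse_idx` and `read_idx_gz` are assumptions made for illustration, and only the unsigned-byte type used by MNIST is handled.

```python
import gzip
import struct

# IDX type code for unsigned bytes, the only type used by the MNIST files.
IDX_UNSIGNED_BYTE = 0x08


def parse_idx(raw: bytes):
    """Parse an uncompressed IDX buffer into (dims, payload)."""
    # Magic number: two zero bytes, a type code, the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != IDX_UNSIGNED_BYTE:
        raise ValueError("not an unsigned-byte IDX buffer")
    # Each dimension size is a 32-bit big-endian unsigned integer.
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    payload = raw[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return dims, payload


def read_idx_gz(path: str):
    """Read a gzip-compressed IDX file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())
```

For example, `dims, pixels = read_idx_gz("train-images-idx3-ubyte.gz")` returns the dimensions from the file header together with the pixel bytes, and the label files can be read in the same way.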

**Execute `mnist.py` to create a file for Octave and MATLAB**

```
./mnist.py
```

A new file `mnist.mat` is created which contains the data. This file can now be loaded with Octave or MATLAB as described above.