
How to do it...

  1. Import the relevant packages and dataset, and visualize the input dataset:
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In the preceding code, we are importing the relevant Keras files and are also importing the MNIST dataset (which is provided as a built-in dataset in Keras). 

  2. The MNIST dataset contains images of digits, where each image is 28 x 28 pixels in shape. Let's plot a few images to see what they look like:
import matplotlib.pyplot as plt
%matplotlib inline
plt.subplot(221)
plt.imshow(X_train[0], cmap=plt.get_cmap('gray'))
plt.grid(False)
plt.subplot(222)
plt.imshow(X_train[1], cmap=plt.get_cmap('gray'))
plt.grid(False)
plt.subplot(223)
plt.imshow(X_train[2], cmap=plt.get_cmap('gray'))
plt.grid(False)
plt.subplot(224)
plt.imshow(X_train[3], cmap=plt.get_cmap('gray'))
plt.grid(False)
plt.show()

The following screenshot shows the output of the previous code block:

  3. Flatten the 28 x 28 images so that each input becomes a vector of 784 pixel values. Additionally, one-hot encode the outputs. This step is key in preparing the dataset:
# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')

In the preceding step, we reshape the input dataset using the reshape method, which converts an array of one shape into an array of a different shape. In this specific case, we convert an array of X_train.shape[0] data points (images), each with X_train.shape[1] rows and X_train.shape[2] columns, into an array of X_train.shape[0] data points with X_train.shape[1] * X_train.shape[2] values per image. We perform the same exercise on the test dataset. A small illustration of reshape is shown below.
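To see what reshape does on its own, here is a minimal sketch using two hypothetical 2 x 3 "images" rather than the actual MNIST data:

import numpy as np

# two hypothetical "images" of shape 2 x 3 (not the real MNIST data)
toy = np.arange(12).reshape(2, 2, 3)
print(toy.shape)      # (2, 2, 3)

# flatten each image into a vector of 2 * 3 = 6 values
flat = toy.reshape(toy.shape[0], 2 * 3)
print(flat.shape)     # (2, 6)

Next, we one-hot encode the outputs: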

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

Let's try to understand how one-hot encoding works. If the unique possible labels are {0, 1, 2, 3}, they will be one-hot encoded as follows:

Label 0: 1, 0, 0, 0
Label 1: 0, 1, 0, 0
Label 2: 0, 0, 1, 0
Label 3: 0, 0, 0, 1

Essentially, each label occupies a unique column in the dataset; if the label is present, that column's value is one and every other column's value is zero.

In Keras, one-hot encoding of labels is performed using the to_categorical method, which figures out the number of unique labels in the target data and then converts them into a one-hot encoded vector.
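As a minimal sketch of to_categorical in action (using a small hypothetical label array rather than y_train):

from keras.utils import np_utils
import numpy as np

# hypothetical labels, not the actual MNIST targets
labels = np.array([0, 1, 2, 3])
encoded = np_utils.to_categorical(labels)
print(encoded)
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]
print(encoded.shape)    # (4, 4) -- one column per unique label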

  4. Build a neural network with a hidden layer of 1,000 units:
model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))

In the preceding step, we specify that the 784 input values are connected to 1,000 units in the hidden layer. Additionally, we specify that the activation to be performed in the hidden layer, after the matrix multiplication of the input with the weights connecting the input and hidden layers, is the ReLU activation.

Finally, the hidden layer is connected to an output that has 10 values (as there are 10 columns in the array created by the to_categorical method), and we perform softmax on top of the output so that we obtain the probability of an image belonging to each class. A minimal sketch of this forward pass is shown below.
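The following sketch reproduces the computation of one forward pass in plain NumPy, using hypothetical, randomly initialized parameters (w1, b1, w2, b2) rather than the trained model weights:

import numpy as np

# hypothetical, randomly initialized parameters (not the trained weights)
w1 = np.random.randn(784, 1000) * 0.01    # input-to-hidden weights
b1 = np.zeros(1000)                       # hidden-layer biases
w2 = np.random.randn(1000, 10) * 0.01     # hidden-to-output weights
b2 = np.zeros(10)                         # output-layer biases

x = np.random.rand(784)                   # one flattened (and scaled) image

hidden = np.maximum(0, x.dot(w1) + b1)    # matrix multiplication followed by ReLU
logits = hidden.dot(w2) + b2              # hidden-to-output matrix multiplication
probs = np.exp(logits) / np.exp(logits).sum()   # softmax: probabilities summing to 1
print(probs.sum())                        # ~1.0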

  5. The preceding model architecture can be visualized as follows:
model.summary()

A summary of the model is as follows:

In the preceding architecture, the first layer has 785,000 parameters: the 784 input units are connected to 1,000 hidden units, giving 784 * 1,000 weight values, plus 1,000 bias values for the 1,000 hidden units, for a total of 785,000 parameters.

Similarly, the output layer has 10 units, each connected to the 1,000 hidden units, resulting in 1,000 * 10 weights and 10 biases, for a total of 10,010 parameters.

The output layer has 10 units as there are 10 possible labels in the output. The output layer now gives us a probability value for each class for a given input image.
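A quick sanity check of this parameter arithmetic (a sketch, assuming the two-layer architecture defined above):

# parameters in a Dense layer = inputs * units + units (one bias per unit)
hidden_params = 784 * 1000 + 1000     # 785,000
output_params = 1000 * 10 + 10        # 10,010
print(hidden_params, output_params, hidden_params + output_params)
# 785000 10010 795010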

  6. Compile the model as follows:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Note that because the target variable is a one-hot encoded vector with multiple classes in it, the loss function will be a categorical cross-entropy loss.

Additionally, we are using the Adam optimizer to minimize the cost function (more on different optimizers in the Varying the loss optimizer to improve network accuracy recipe).

We also specify that we want to track the accuracy metric while the model is being trained.
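As a minimal sketch of what the categorical cross-entropy loss computes for a single sample (hypothetical values, not produced by the model):

import numpy as np

# hypothetical one-hot target and predicted probabilities for one sample
y_true = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.array([0.05, 0.05, 0.60, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03])

# categorical cross-entropy: negative log-probability assigned to the true class
loss = -np.sum(y_true * np.log(y_pred))
print(loss)    # ~0.51, i.e. -log(0.6)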

  7. Fit the model as follows:
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=32, verbose=1)

In the preceding code, we specify the input (X_train) and the output (y_train) that the model will fit. Additionally, we specify the input and output of the test dataset, which the model does not use to train its weights; however, it will give us an idea of how the loss and accuracy values differ between the training and test datasets.
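If needed, the same test-set metrics can also be obtained after training with model.evaluate (a sketch, assuming the compiled model above):

# evaluate returns the loss followed by the metrics passed to compile()
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(test_loss, test_acc)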

  8. Extract the training and test loss and accuracy metrics over different epochs:
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
acc_values = history_dict['acc']
val_acc_values = history_dict['val_acc']
epochs = range(1, len(val_loss_values) + 1)

While fitting the model, the history variable stores the accuracy and loss values for each epoch, for both the training and the test datasets. In the preceding step, we store those values in lists so that we can plot how accuracy and loss vary in the training and test datasets as the number of epochs increases.
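Note that the exact metric keys depend on the Keras version in use: older releases log 'acc'/'val_acc' (as assumed here), while newer releases log 'accuracy'/'val_accuracy'. A small defensive sketch:

print(history.history.keys())    # inspect the metric names logged by this Keras version

# fall back gracefully between the two naming conventions
acc_key = 'acc' if 'acc' in history_dict else 'accuracy'
acc_values = history_dict[acc_key]
val_acc_values = history_dict['val_' + acc_key]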

  9. Visualize the training and test loss and accuracy over an increasing number of epochs:
import matplotlib.pyplot as plt
%matplotlib inline

plt.subplot(211)
plt.plot(epochs, loss_values, 'rx', label='Training loss')
plt.plot(epochs, val_loss_values, 'b', label='Test loss')
plt.title('Training and test loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.subplot(212)
plt.plot(epochs, acc_values, 'rx', label='Training accuracy')
plt.plot(epochs, val_acc_values, 'b', label='Test accuracy')
plt.title('Training and test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])
plt.legend()
plt.show()

The preceding code produces the following diagram, where the first plot shows the training and test loss values over increasing epochs, and the second plot shows the training and test accuracy over increasing epochs:

Note that the preceding network achieved an accuracy of about 97%. Also, note that the loss values (and, thereby, the accuracy) change in steps across epochs. We will contrast this behavior with the scenario in which the input dataset is scaled, in the next section.

  10. Let's calculate the accuracy of the model manually:
preds = model.predict(X_test)

In the preceding step, we use the predict method to compute the model's output for a given input (X_test in this case). Note that we call it as model.predict, as we initialized a Sequential model named model:

import numpy as np
correct = 0
for i in range(len(X_test)):
    pred = np.argmax(preds[i], axis=0)
    act = np.argmax(y_test[i], axis=0)
    if pred == act:
        correct += 1
    else:
        continue

correct/len(X_test)

In the preceding code, we loop over all of the test predictions one at a time. For each test prediction, we perform argmax to obtain the index with the highest probability value.

Similarly, we perform the same exercise on the actual (one-hot encoded) values of the test dataset. If the index of the highest value is the same in both the prediction and the actual value, the prediction is counted as correct.

Finally, the number of correct predictions over the total number of data points in the test dataset is the accuracy of the model on the test dataset.
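The same accuracy can be computed more compactly with vectorized NumPy operations (a sketch equivalent to the loop above):

import numpy as np

# argmax over the class axis gives the predicted and actual digit for every image
accuracy = np.mean(np.argmax(preds, axis=1) == np.argmax(y_test, axis=1))
print(accuracy)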