Neural Networks with Keras Cookbook

How to do it...

Let's go through the set up of scaling the dataset that we have used in the previous section, and compare the results with and without scaling:

  1. Import the relevant packages and datasets:
from keras.datasets import mnist
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = mnist.load_data()
  2. There are multiple ways to scale a dataset. One way is to convert all the data points to values between zero and one by dividing each data point by the maximum value in the dataset, which is what we do in the following code. Another popular method is to standardize the dataset so that the values lie roughly between -1 and +1, by subtracting the overall dataset mean from each data point and then dividing the result by the standard deviation of the original dataset (a sketch of this second approach is shown below).
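
For reference, here is a minimal sketch of the second approach. The variable names are illustrative and this code is not used in the rest of the recipe; the training set's mean and standard deviation are applied to both splits:

# Not used in this recipe: standardize the inputs using the training set's
# mean and standard deviation, so that values are centred around zero
X_train_float = X_train.astype('float32')
X_test_float = X_test.astype('float32')
train_mean = X_train_float.mean()
train_std = X_train_float.std()
X_train_standardized = (X_train_float - train_mean) / train_std
X_test_standardized = (X_test_float - train_mean) / train_std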

Now, we will flatten the input dataset and scale it, as follows:

# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')

X_train = X_train/255
X_test = X_test/255

In the preceding step, we scaled the training and test inputs to values between zero and one by dividing each value by 255, the maximum possible pixel value in the dataset. Additionally, we convert the output labels into a one-hot encoded format:

# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
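
To see what the one-hot encoding looks like, you could inspect a single label; for example, a digit label of 3 is represented as a ten-element vector with a 1 in position 3 (this snippet is purely illustrative and not part of the recipe):

# A raw label of 3 becomes [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
print(np_utils.to_categorical([3], num_classes=10))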
  3. Build the model and compile it using the following code:
model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Note that the preceding model architecture is exactly the same as the one we built in the previous section; the only difference is that this one will be trained on the scaled dataset, whereas the previous one was trained on unscaled data.

  4. Fit the model as follows:
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=500, batch_size=32, verbose=1)

You will notice that the accuracy of the preceding model is ~98.25%.
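
One way to confirm this number, as a minimal sketch assuming the history object returned by model.fit above, is to inspect the validation accuracy recorded for the final epoch, or to evaluate the trained model on the test set directly (note that the metric key may be 'val_acc' or 'val_accuracy', depending on your Keras version):

# Validation accuracy recorded for the final epoch
print(history.history['val_acc'][-1])

# Alternatively, evaluate the trained model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(test_loss, test_acc)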

  5. Plot the training and test accuracy and the loss values over different epochs (the code to generate the plots remains the same as the one we used in step 8 of the Training a vanilla neural network recipe); a minimal sketch is shown below:
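
The plotting code is not repeated in this recipe; the following is a minimal sketch, assuming matplotlib and the history object from the previous step (the metric keys may be 'acc'/'val_acc' or 'accuracy'/'val_accuracy', depending on your Keras version):

import matplotlib.pyplot as plt

epochs = range(1, len(history.history['loss']) + 1)

plt.figure(figsize=(10, 4))

# Loss values over epochs
plt.subplot(1, 2, 1)
plt.plot(epochs, history.history['loss'], label='Training loss')
plt.plot(epochs, history.history['val_loss'], label='Test loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

# Accuracy values over epochs
plt.subplot(1, 2, 2)
plt.plot(epochs, history.history['acc'], label='Training accuracy')
plt.plot(epochs, history.history['val_acc'], label='Test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()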

From the resulting plots, you should notice that the training and test losses decrease more smoothly over the epochs than they did for the non-scaled dataset in the previous section.

While the preceding network gave us good results in terms of a smoothly decreasing loss value, we noticed that there is a gap between the training and test accuracy/loss values, indicating potential overfitting to the training dataset. Overfitting is the phenomenon whereby the model specializes in the training data to such an extent that it may not work as well on the test dataset.