
How to do it...
A deep neural network architecture is built by adding multiple hidden layers between the input and output layers, as follows:
- Load the dataset and scale it:
from keras.datasets import mnist
from keras.utils import np_utils

# Load the MNIST images and labels
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Flatten each 28 x 28 image into a 784-dimensional vector
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
# Scale pixel values to the range [0, 1]
X_train = X_train / 255
X_test = X_test / 255
# One-hot encode the labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
- Build a model with multiple hidden layers connecting the input and output layers:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))  # first hidden layer
model.add(Dense(1000, activation='relu'))  # second hidden layer
model.add(Dense(1000, activation='relu'))  # third hidden layer
model.add(Dense(10, activation='softmax'))  # output layer, one node per class
The preceding model architecture results in a model summary, as follows:
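The summary itself is not reproduced here; calling model.summary() prints the output shape and parameter count of each layer, which for the preceding architecture work out as follows:

model.summary()
# Expected parameter counts for this architecture:
#   Dense(1000) from 784 inputs:  784*1000 + 1000   = 785,000
#   Dense(1000) from 1000 units:  1000*1000 + 1000  = 1,001,000
#   Dense(1000) from 1000 units:  1000*1000 + 1000  = 1,001,000
#   Dense(10) from 1000 units:    1000*10 + 10      = 10,010
# Total trainable parameters:                         2,797,010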

Note that the preceding model has a much higher number of parameters as a consequence of its deep architecture: each additional hidden layer contributes its own weight matrix and bias vector.
- Now that the model is set up, let's compile and fit the model:
# Compile with the Adam optimizer (passed by name as a string)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train for 250 epochs, validating against the test set after each epoch
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=250, batch_size=1024, verbose=1)
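To confirm the final test accuracy quoted next, you can evaluate the trained model directly; model.evaluate returns the loss followed by the compiled metrics:

# Evaluate on the held-out test set; returns [loss, accuracy]
scores = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy: %.1f%%' % (scores[1] * 100))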
The preceding code produces a model with a test accuracy of about 98.6%, which is slightly better than the accuracies we observed with the earlier model architectures. The training and test loss and accuracy are as follows (the code to generate the plots remains the same as the code we used in step 8 of the Training a vanilla neural network recipe):
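For reference, a minimal sketch of that plotting code, assuming matplotlib and the history object returned by fit, looks as follows (depending on your Keras version, the accuracy keys may be 'acc'/'val_acc' or 'accuracy'/'val_accuracy'):

import matplotlib.pyplot as plt

# history.history holds the per-epoch metrics recorded during fit
plt.subplot(211)
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='test loss')
plt.legend()
plt.subplot(212)
plt.plot(history.history['acc'], label='train accuracy')
plt.plot(history.history['val_acc'], label='test accuracy')
plt.legend()
plt.show()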

Note that, in this scenario, there is a considerable gap between the training and test loss, indicating that the deep feedforward neural network has overfit to the training data. Again, in the sections on overfitting, we will learn about ways to avoid overfitting on training data.