Getting ready
To understand how the choice of optimizer affects network accuracy, let's contrast the setup from the previous sections, which used the Adam optimizer, with a stochastic gradient descent (SGD) optimizer. We reuse the same scaled MNIST training and test datasets (the data-preprocessing steps are the same as step 1 and step 2 in the Scaling the dataset recipe):
from keras.models import Sequential
from keras.layers import Dense

# Compile with stochastic gradient descent instead of Adam
model = Sequential()
model.add(Dense(1000, input_dim=784, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, batch_size=32, verbose=1)
Note that with the stochastic gradient descent optimizer in the preceding code, the final accuracy after 100 epochs is ~98% (the code to generate the plots in the following diagram is the same as the code we used in step 8 of the Training a vanilla neural network recipe):
[Figure: training plots for the model trained with the stochastic gradient descent optimizer]
However, we should also note that the model trained with stochastic gradient descent reached high accuracy much more slowly than the model that used the Adam optimizer.
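One way to put a number on "more slowly" is to find the first epoch at which each model's validation accuracy crosses a chosen threshold. The following is a minimal sketch, not code from the recipe: the helper name first_epoch_reaching is hypothetical, and the two accuracy lists are made-up values for illustration only, not measured results.

```python
def first_epoch_reaching(accuracies, threshold):
    """Return the 1-based epoch at which accuracy first reaches
    the threshold, or None if it never does."""
    for epoch, acc in enumerate(accuracies, start=1):
        if acc >= threshold:
            return epoch
    return None

# Made-up per-epoch validation accuracies, for illustration only.
# With a real run, pass history.history['val_acc'] instead
# (the key is 'val_accuracy' in newer Keras versions).
adam_val_acc = [0.95, 0.97, 0.975, 0.98]
sgd_val_acc = [0.88, 0.91, 0.93, 0.95, 0.96, 0.97]

print(first_epoch_reaching(adam_val_acc, 0.97))
print(first_epoch_reaching(sgd_val_acc, 0.97))
```

Applying the helper to the history objects from the Adam run and the stochastic gradient descent run, with the same threshold, gives a direct comparison of how many epochs each optimizer needed.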