
How it works...
You should notice that the accuracy is much lower initially and catches up only after a considerable number of epochs have run. The accuracy is low during the initial epochs because the weights are updated far fewer times in this scenario than in the previous one (where the batch size was smaller).
In this scenario, with a batch size of 30,000 and a total dataset size of 60,000, running the model for 500 epochs results in epochs * (dataset size / batch size) = 500 * (60,000/30,000) = 1,000 weight updates.
In the previous scenario, the weights were updated 500 * (60,000/32) = 937,500 times.
Hence, the smaller the batch size, the more often the weights are updated and, generally, the better the accuracy for the same number of epochs.
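The following is a minimal sketch of the arithmetic above, so you can verify the number of weight updates for any combination of epochs, dataset size, and batch size; the helper function name is just illustrative:

```python
def weight_updates(epochs, dataset_size, batch_size):
    """Number of weight (gradient) updates performed during training."""
    steps_per_epoch = dataset_size // batch_size  # batches processed per epoch
    return epochs * steps_per_epoch

# Current scenario: large batches -> few updates
print(weight_updates(epochs=500, dataset_size=60000, batch_size=30000))  # 1,000

# Previous scenario: small batches -> many updates
print(weight_updates(epochs=500, dataset_size=60000, batch_size=32))     # 937,500
```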
At the same time, be careful not to make the batch size too small, which can result not only in a very long training time but also in potential overfitting.
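As a rough sketch of the only change between the two scenarios, assuming `model`, `x_train`, and `y_train` are defined as in the earlier steps of this recipe, switching the batch size is just a matter of changing the batch_size argument passed to fit():

```python
# Large-batch run: only 2 weight updates per epoch on 60,000 samples
history = model.fit(x_train, y_train,
                    epochs=500,
                    batch_size=30000)  # previously batch_size=32
```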