Neural Networks with Keras Cookbook

Getting ready

To understand the impact of scaling the input on the output, let's contrast the output when the input dataset is not scaled with the output when the input dataset is scaled.

Input data is not scaled:

When the input x is a large number, the output (sigmoid) does not vary much, even as the weight value varies from 0.01 to 0.9. The sigmoid output is calculated by multiplying the input by the weight, adding a bias, and applying the sigmoid function to the result:

output = 1/(1 + np.exp(-(w*x + b)))

Where w is the weight, x is the input, and b is the bias value.

The sigmoid output does not change because w*x is a large number (since x is large), so the sigmoid value always falls in the saturated portion of the sigmoid curve (the flat regions at the top-right and bottom-left of the curve).
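
As a minimal sketch of this saturation effect (the specific input value and weights here are illustrative, not the exact numbers from the recipe), consider the following NumPy snippet:

import numpy as np

def sigmoid_output(w, x, b=0.0):
    # output = 1 / (1 + exp(-(w*x + b)))
    return 1 / (1 + np.exp(-(w * x + b)))

x = 10000  # a large, unscaled input (illustrative value)
for w in [0.01, 0.1, 0.5, 0.9]:
    print(f"w = {w}: sigmoid output = {sigmoid_output(w, x):.6f}")

Every weight produces an output of approximately 1.0, because even w = 0.01 gives w*x = 100, which lies deep in the saturated region.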

Now consider the scaled scenario, where we multiply the same weight values by a small input value:

This time the sigmoid output varies: the input and the weight values are both small, so their product is small, the sigmoid operates in its responsive (non-saturated) region, and the output changes with the weight, as the sketch below demonstrates.
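
A minimal sketch of the scaled case (again with illustrative numbers; the bias is set to zero for simplicity):

import numpy as np

x = 1  # the same weights, but a small (scaled) input
for w in [0.01, 0.1, 0.5, 0.9]:
    out = 1 / (1 + np.exp(-(w * x)))  # bias b = 0 for simplicity
    print(f"w = {w}: sigmoid output = {out:.6f}")

The outputs now spread from roughly 0.502 to 0.711, so the output (and hence the gradient) responds to changes in the weight.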

From this exercise, we learned the importance of scaling the input dataset so that the product of the weights and the inputs stays small (provided the weights themselves do not have a high range). When the inputs are not scaled, the sigmoid saturates, its gradient is close to zero, and the weight values do not get updated quickly enough during training.

Thus, to achieve the optimal weight values, we should scale our input dataset and initialize the weights within a modest range (typically, weights are initialized to random values between -1 and +1).

The same issue arises when the weight values themselves are very large. Hence, we are better off initializing the weights to small values close to zero.
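
As a rough sketch of how this advice carries over into Keras (the layer sizes and the commented-out dataset line are placeholders, not part of this recipe), we scale the inputs before training and keep the initial weights in a modest range:

from keras.models import Sequential
from keras.layers import Dense
from keras.initializers import RandomUniform

# X_train is a placeholder for an unscaled input array with values in
# [0, 255]; dividing by 255 scales every input into [0, 1]:
# X_train = X_train / 255.0

model = Sequential()
model.add(Dense(units=10,
                input_dim=784,  # placeholder input dimension
                activation='sigmoid',
                # keep initial weights in a modest range; Keras defaults
                # such as glorot_uniform already draw small values near zero
                kernel_initializer=RandomUniform(minval=-1.0, maxval=1.0)))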