Discussion and further work
This model is now ready to be used for prediction. Is it the best model? No, it's not. Finding the best model is a never-ending quest, and there are countless ways of improving this one. For example, one could use LASSO regression, which shrinks the coefficients of uninformative variables to exactly zero, to gauge the importance of variables before using them.
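To make the LASSO suggestion concrete, here is a minimal sketch of LASSO fitted by cyclic coordinate descent. It is not code from this chapter: the names softThreshold and lassoCD, the row-major [][]float64 layout, and the assumption that the data has already been centred (so no intercept is fitted) are all choices made for illustration only.

```go
package main

import "math"

// softThreshold shrinks v toward zero by lambda, and returns exactly zero
// when |v| <= lambda. This is what lets LASSO drop variables entirely.
func softThreshold(v, lambda float64) float64 {
	switch {
	case v > lambda:
		return v - lambda
	case v < -lambda:
		return v + lambda
	default:
		return 0
	}
}

// lassoCD minimises (1/2n)*||ys - xs*beta||^2 + lambda*||beta||_1 using
// cyclic coordinate descent. ASSUMPTION: the columns of xs and the values of
// ys are already centred, so no intercept is fitted.
// xs is row-major: xs[i][j] is feature j of observation i.
func lassoCD(xs [][]float64, ys []float64, lambda float64, maxIter int, tol float64) []float64 {
	n := len(xs)
	if n == 0 {
		return nil
	}
	p := len(xs[0])
	beta := make([]float64, p)
	resid := make([]float64, n) // resid = ys - xs*beta; beta starts at zero
	copy(resid, ys)

	for iter := 0; iter < maxIter; iter++ {
		maxDelta := 0.0
		for j := 0; j < p; j++ {
			var rho, z float64
			for i := 0; i < n; i++ {
				xij := xs[i][j]
				// Add feature j's own contribution back to get the partial residual.
				rho += xij * (resid[i] + xij*beta[j])
				z += xij * xij
			}
			if z == 0 {
				continue // an all-zero column carries no information
			}
			updated := softThreshold(rho/float64(n), lambda) / (z / float64(n))
			delta := updated - beta[j]
			if delta != 0 {
				// Keep the residual in sync with the new coefficient.
				for i := 0; i < n; i++ {
					resid[i] -= xs[i][j] * delta
				}
				beta[j] = updated
			}
			maxDelta = math.Max(maxDelta, math.Abs(delta))
		}
		if maxDelta < tol {
			break // no coefficient moved appreciably in this sweep
		}
	}
	return beta
}
```

Calling lassoCD(xs, ys, 0.5, 100, 1e-9) on a toy data set in which ys depends only on the first of two orthogonal columns returns a shrunken coefficient for the first column and exactly zero for the second. Variables whose coefficients survive a reasonable range of lambda values are the ones worth keeping in the linear regression.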
The model is not only the linear regression, but also the data cleaning and ingestion functions that come with it. This leads to a very large number of tweakable parameters. If you don't like the way I imputed data, you can always write your own method!
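As a rough sketch of what swapping in your own imputation method could look like, the snippet below defines a small function type and two interchangeable strategies. None of these names come from the chapter's code: Imputer, MeanImpute and MedianImpute are hypothetical, and the snippet assumes that missing values are encoded as NaN.

```go
package main

import (
	"math"
	"sort"
)

// Imputer is a hypothetical function type: it returns a copy of col in which
// missing values (ASSUMED to be encoded as NaN) have been filled in.
type Imputer func(col []float64) []float64

// observed returns only the non-missing values of col.
func observed(col []float64) []float64 {
	out := make([]float64, 0, len(col))
	for _, v := range col {
		if !math.IsNaN(v) {
			out = append(out, v)
		}
	}
	return out
}

// fillNaN copies col, replacing every NaN with fill.
func fillNaN(col []float64, fill float64) []float64 {
	out := make([]float64, len(col))
	for i, v := range col {
		if math.IsNaN(v) {
			out[i] = fill
		} else {
			out[i] = v
		}
	}
	return out
}

// MeanImpute fills missing values with the mean of the observed values.
// If nothing was observed, the column is returned unchanged (still NaN).
func MeanImpute(col []float64) []float64 {
	obs := observed(col)
	if len(obs) == 0 {
		return fillNaN(col, math.NaN())
	}
	var sum float64
	for _, v := range obs {
		sum += v
	}
	return fillNaN(col, sum/float64(len(obs)))
}

// MedianImpute fills missing values with the median of the observed values,
// which is less sensitive to outliers than the mean.
func MedianImpute(col []float64) []float64 {
	obs := observed(col)
	if len(obs) == 0 {
		return fillNaN(col, math.NaN())
	}
	sort.Float64s(obs)
	mid := len(obs) / 2
	median := obs[mid]
	if len(obs)%2 == 0 {
		median = (obs[mid-1] + obs[mid]) / 2
	}
	return fillNaN(col, median)
}
```

Because both strategies satisfy the same Imputer type, a cleaning function could accept an Imputer as a parameter, which turns the choice of imputation into one more tweakable part of the model.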
The code in this chapter can also be cleaned up further. Instead of returning so many values from the clean function, a new tuple-like type can be created to hold the Xs and Ys (a data frame of sorts). In fact, that's what we're going to build in the upcoming chapters. Several functions can also be made more efficient by using a state-holder struct.
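As a rough illustration of the idea, and not the design that the later chapters arrive at, a type that bundles the Xs and Ys might look something like the following. The Dataset type, its fields, and its methods are hypothetical names chosen for this sketch.

```go
package main

import "fmt"

// Dataset is a hypothetical container that keeps the column names, the
// feature matrix and the target values together, so that a cleaning function
// can return one value instead of many.
type Dataset struct {
	Headers []string    // one name per feature column
	Xs      [][]float64 // row-major: Xs[i][j] is feature j of row i
	Ys      []float64   // one target value per row
}

// NewDataset checks that the shapes agree before building the Dataset.
func NewDataset(headers []string, xs [][]float64, ys []float64) (*Dataset, error) {
	if len(xs) != len(ys) {
		return nil, fmt.Errorf("got %d rows of Xs but %d Ys", len(xs), len(ys))
	}
	for i, row := range xs {
		if len(row) != len(headers) {
			return nil, fmt.Errorf("row %d has %d values, expected %d", i, len(row), len(headers))
		}
	}
	return &Dataset{Headers: headers, Xs: xs, Ys: ys}, nil
}

// Dims reports the number of rows and feature columns.
func (d *Dataset) Dims() (rows, cols int) {
	return len(d.Xs), len(d.Headers)
}

// Column returns a copy of the named feature column.
func (d *Dataset) Column(name string) ([]float64, error) {
	for j, h := range d.Headers {
		if h != name {
			continue
		}
		col := make([]float64, len(d.Xs))
		for i, row := range d.Xs {
			col[i] = row[j]
		}
		return col, nil
	}
	return nil, fmt.Errorf("no column named %q", name)
}
```

With a type like this, a cleaning function could have a signature along the lines of func clean(...) (*Dataset, error), and the shape checks would happen once, at construction, rather than in every function that consumes the data.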
You may have noticed that there are not many statistical packages like Pandas for Go. This is not for lack of trying. Go as a language is all about solving problems, not about building generic packages. There are dataframe-like packages in Go, but in my experience, using them tends to blind one to the most obvious and efficient solutions. Often, it's better to build your own data structures that are specific to the problem at hand.
For the most part, model building is an iterative process, while productionizing the model happens only after the model has been built. This chapter shows that in Go, with a little awkwardness, it is possible to build a model iteratively in a way that immediately translates to a production-ready system.