In an organization's AI transformation, machine learning strategy plays a significant role in the ultimate fate of any machine learning project. Suppose we have trained a classifier that reaches 90% accuracy on test examples, but that accuracy is not good enough for the application.
To improve the classifier's accuracy, we can try a range of measures, for instance (a minimal code sketch follows the list):
- collect more data
- collect a more diverse training set
- train the algorithm longer with gradient descent
- try Adam instead of plain gradient descent
- try a bigger network, or a smaller one
- try dropout regularization
- try $L_2$ regularization
- try different network architectures: activation functions, number of hidden units, etc.
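As a concrete illustration, here is a minimal PyTorch sketch showing three of these knobs: dropout, $L_2$ regularization via weight decay, and swapping plain gradient descent for Adam. The framework choice, layer sizes, and hyperparameter values are assumptions for illustration, not from the lecture.

```python
import torch
import torch.nn as nn

# A small classifier with a dropout layer between the hidden layers.
model = nn.Sequential(
    nn.Linear(784, 256),  # 784 inputs / 10 classes are assumptions (MNIST-like data)
    nn.ReLU(),
    nn.Dropout(p=0.5),    # dropout: randomly zeroes activations during training
    nn.Linear(256, 10),
)

# Adam instead of plain gradient descent; weight_decay applies an L2 penalty.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Reverting to plain (stochastic) gradient descent is a one-line change:
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
```

Each of these changes is independent of the others, which is exactly the property the next section builds on.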
There are many plausible ideas for improving a deep learning algorithm, but the problem is that we might pick the wrong one and spend months without any substantial improvement in accuracy. For instance, spending months collecting more data is one of the most commonly misjudged choices in machine learning. It is therefore worth having a machine learning strategy with which we can evaluate our options and pursue the most promising one.
Orthogonalization
Hyperparameter tuning is one of the areas where we must pick, from many candidates, the right parameter to tune first. Orthogonalization means knowing exactly what to tune to achieve one particular effect; this clarity is a trait of successful machine learning practitioners.
Orthogonalization means having a separate tuning knob for each desired effect, rather than one combined knob that influences multiple aspects at once. For instance, a machine learning algorithm needs to perform well on the following fronts, i.e., four distinct effects:
- Training set (on cost function) ~ human-level performance
- Dev set
- Test set
- Real-world
According to orthogonalization, we should achieve these effects one at a time, in order: the training set first and the real world last. When a given stage underperforms, we turn only the knobs dedicated to that stage rather than re-tuning everything; the sketch below illustrates this mapping.
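To make the knob-per-effect idea concrete, here is a hypothetical Python sketch. The mapping of remedies to stages follows the lecture's standard advice, but the helper itself is an illustration, not an algorithm from the source.

```python
# Hypothetical knob table: each stage gets its own, independent remedies.
KNOBS = {
    "training set": ["bigger network", "train longer", "Adam optimizer"],
    "dev set": ["regularization (dropout, L2)", "bigger training set"],
    "test set": ["bigger dev set"],
    "real world": ["change the dev/test set or the cost function"],
}

def knobs_for(failing_stage: str) -> list[str]:
    """Return only the knobs tied to the first stage that underperforms."""
    return KNOBS[failing_stage]

# Example: the model fits the training set but does poorly on the dev set.
print(knobs_for("dev set"))  # -> ['regularization (dropout, L2)', 'bigger training set']
```

The point of the table is that no remedy appears under two stages: turning one knob should not undo progress made on an earlier stage.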
Note: this article is inspired by Andrew Ng's lecture.