In the modern big data era, strategy plays a significant role in the ultimate fate of a machine learning project. Let's say we have trained a classifier that reaches 90% accuracy on test examples, but that accuracy is not good enough for the application. To improve the classifier's accuracy, we can try a range of measures, for instance:
- collect more data
- collect a more diverse training set
- train the algorithm for longer with gradient descent
- try Adam instead of plain gradient descent
- try a bigger network, or a smaller one
- try dropout regularization
- try $l_2$ regularization
- try different network architectures: activation functions, number of hidden units, etc.
There can be a lot of good ideas for improving a deep learning algorithm. The problem is that we might pick the wrong one and spend months without any substantial improvement in accuracy. For instance, spending months collecting more data is one of the most common poor choices in machine learning. It is therefore worth having a machine learning strategy that lets us evaluate our options and pick the most promising one.
Parameter tuning is one area where we have to pick the right parameter to tune first from many possible ones.
Orthogonalization is about knowing what to tune to achieve one particular effect; it is a trait of successful machine learning practitioners.
That means we need a separate tuning knob for each required effect, rather than a combined knob that changes several aspects at once. For instance, a machine learning algorithm needs to perform well on the following four fronts, i.e. four different effects:
- Training set (on the cost function), roughly at human-level performance
- Dev set
- Test set
- Real world
According to orthogonalization, we should achieve these effects one at a time and in order: the training set first and the real world last.
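To make the ordering concrete, here is a minimal Python sketch of that chain of checks. It is only an illustration under stated assumptions: the function name `first_stage_to_fix`, the tolerance `tol`, and the knob suggestions in `KNOBS` are hypothetical choices for this example (the knobs are loosely drawn from the list of measures at the top of the article), not a prescribed procedure.

```python
from typing import List, Optional

# Hypothetical mapping from each stage to knobs that mainly affect that stage.
# The suggestions are illustrative assumptions, not an exhaustive or fixed list.
KNOBS = {
    "training set": ["try a bigger network", "train longer", "try Adam"],
    "dev set": ["add regularization (l2, dropout)", "collect more training data"],
    "test set": ["use a bigger dev set"],
    "real world": ["revisit the dev/test data or the cost function"],
}


def first_stage_to_fix(
    human_err: float,
    train_err: float,
    dev_err: float,
    test_err: float,
    real_world_ok: bool,
    tol: float = 0.01,  # assumed threshold for a gap that is worth acting on
) -> Optional[str]:
    """Walk the chain in order and return the first stage with a notable gap."""
    if train_err - human_err > tol:   # not yet fitting the training set well
        return "training set"
    if dev_err - train_err > tol:     # fits training data but not the dev set
        return "dev set"
    if test_err - dev_err > tol:      # fits the dev set but not the test set
        return "test set"
    if not real_world_ok:             # good test numbers, poor real-world results
        return "real world"
    return None                       # nothing stands out at this tolerance


def suggested_knobs(stage: Optional[str]) -> List[str]:
    """Return the illustrative knobs for a stage, or an empty list."""
    return KNOBS.get(stage, []) if stage is not None else []


# Example: a large gap between human-level and training error comes first.
stage = first_stage_to_fix(0.01, 0.10, 0.11, 0.115, real_world_ok=True)
print(stage, suggested_knobs(stage))
```

The point of the design is that each check maps to its own small set of knobs, so fixing one stage does not require touching the knobs meant for another.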
Note: this article is inspired by Andrew Ng's lecture.