Why do we compare machine learning algorithms with human-level performance? There are two reasons for that:
1) machine learning algorithms have become much more efficient due to the availability of data and computational power,
2) machine learning workflows have matured to the point where algorithms are competitive with human-level performance.
In most machine learning problems, progress (in terms of algorithm accuracy) is fast in the beginning. However, as we approach and surpass human-level performance, progress slows down. We can keep pushing by training the model on more and more data, but accuracy never surpasses the theoretical limit of Bayes optimal error.
Bayes error is the error of the best possible mapping function from $X$ to $Y$; no machine learning algorithm can ever surpass it. For instance, speech recognition cannot be 100% accurate due to noise in audio. Similarly, in image recognition, some images cannot be recognized at all because they are too blurry.
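For classification, this irreducible error can be written formally (a standard textbook formulation, added here for concreteness): even the optimal predictor, which outputs the most probable label for each input, is wrong whenever the true label differs from it, so

```latex
\epsilon_{\text{Bayes}} \;=\; \mathbb{E}_{x}\!\left[\,1 - \max_{y} P(y \mid x)\,\right]
```

Noise in audio or blur in images makes $\max_{y} P(y \mid x)$ strictly less than 1 for some inputs, which is exactly why the bound sits below 100% accuracy.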
Often human-level performance is not far from Bayes error. That means once a machine learning algorithm has surpassed human-level performance, there is not much room left for improvement. That is why machine learning progress slows down after the algorithm surpasses human-level performance.
So long as a machine learning model is worse than human-level performance, there are a few techniques we can try to improve its performance:
- get labeled data from humans
- manual error analysis (examine examples the model gets wrong but humans get right)
- bias/variance analysis.
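The bias/variance analysis above can be sketched in code. The idea is to use human-level error as a proxy for Bayes error: the gap between training error and human error is the avoidable bias, and the gap between dev error and training error is the variance. Whichever gap is larger suggests where to focus. This is an illustrative heuristic with hypothetical function and variable names, not an implementation from the text:

```python
def diagnose(human_error: float, train_error: float, dev_error: float) -> str:
    """Suggest whether to focus on bias or variance.

    Uses human-level error as a proxy for Bayes error (an
    assumption that holds when humans are close to optimal).
    """
    # Gap we could close by fitting the training set better.
    avoidable_bias = train_error - human_error
    # Gap we could close by generalizing better to unseen data.
    variance = dev_error - train_error
    if avoidable_bias > variance:
        return "focus on bias (e.g. bigger model, train longer)"
    return "focus on variance (e.g. more data, regularization)"


# Training error is far above human error -> bias problem.
print(diagnose(human_error=0.01, train_error=0.08, dev_error=0.10))
# Training error is near human error, dev error is not -> variance problem.
print(diagnose(human_error=0.01, train_error=0.02, dev_error=0.10))
```

Note that once the model beats human-level performance, `avoidable_bias` turns negative and this heuristic stops being informative, which is one more reason progress slows past that point.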