@mistertandon Thanks a lot for reading my post, I'm glad you found it useful.
Now, coming to your question: in the vectorized implementation we are indeed taking the average of the gradients, just like in the pseudo-code in figure 39. I think you may have overlooked figure 50 while reading the post. It shows how the derivatives end up being averages because of how we define the cost function.
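To make this concrete, here is a minimal NumPy sketch. It is not the exact model from the post: I am assuming a plain linear model with a cost of the form J = (1/(2m)) * sum((w.T x_i - y_i)^2), and the names `per_example_average` and `vectorized` are made up for illustration. The point is only that the 1/m inside the cost carries through to the gradient, so the loop that averages per-example gradients and the single vectorized expression give the same result.

```python
import numpy as np

# Illustrative data: m examples with n features, one example per column.
rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.normal(size=(n, m))
y = rng.normal(size=(1, m))
w = rng.normal(size=(n, 1))


def per_example_average(X, y, w):
    """Loop version: sum each example's gradient, then divide by m."""
    m = X.shape[1]
    grad_sum = np.zeros_like(w)
    for i in range(m):
        x_i = X[:, i:i + 1]                 # shape (n, 1)
        err_i = w.T @ x_i - y[:, i:i + 1]   # shape (1, 1), prediction error
        grad_sum += x_i * err_i             # gradient for this one example
    return grad_sum / m                     # average over the m examples


def vectorized(X, y, w):
    """Vectorized version: the matrix product does the summing, 1/m averages."""
    m = X.shape[1]
    err = w.T @ X - y                       # shape (1, m), all errors at once
    return (X @ err.T) / m                  # shape (n, 1)


print(np.allclose(per_example_average(X, y, w), vectorized(X, y, w)))
```

The product `X @ err.T` adds up the per-example gradients in one step, and dividing by `m` turns that sum into the average, which is the same thing the explicit loop does.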