Hey Anirudh, thanks for checking out this short supplementary post.
Now, coming to your question: I honestly have no idea why J. Redmon and Co. used this approach for classifying objects in YOLOv1. Looking at their paper, I see they repeatedly say that they frame detection, meaning the bounding box prediction as well as the class probabilities, as a “regression problem”, and the class term in their loss is a plain sum-squared error. That could be one of the reasons, though I’m not sure. Plus, they also mention that this setup was tricky to train, since sum-squared error weights localization and classification errors equally, which they admit may not be ideal. Later versions of YOLO handle classification differently: YOLOv2 uses a softmax over the classes, as is usual in neural networks, and YOLOv3 switches to independent logistic classifiers, though you probably knew that. Maybe you can ask J. Redmon himself on Twitter; perhaps he has some free time during these corona days.
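In case it helps, here is a rough sketch of the two loss styles for a single grid cell with three classes. The numbers are made up and the function names are my own (nothing here is taken from the paper's code or from Darknet), so treat it as an illustration only, not as how YOLO actually implements it.

```python
import math

def sum_squared_error(pred, target):
    # Regression-style class loss: squared difference per class, summed.
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    # Classification-style loss: negative log-probability of the true class.
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)

# One-hot target: the object in this cell belongs to class 1 (hypothetical).
target = [0.0, 1.0, 0.0]

# Regression style: the network outputs are compared to the target directly.
raw_scores = [0.2, 0.7, 0.1]  # made-up network outputs
sse = sum_squared_error(raw_scores, target)

# Softmax style: the outputs are logits, normalised into a distribution first.
logits = [1.0, 3.0, 0.5]  # made-up network outputs
probs = softmax(logits)
ce = cross_entropy(probs, target)

print("sum-squared error:", sse)
print("softmax probabilities:", probs)
print("cross-entropy:", ce)
```

The main difference is that the softmax outputs are forced to sum to 1, while the regression-style outputs are free to be any numbers and are only pulled towards the one-hot target by the squared error.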