
[ML Question] Handling Data Skew

Assume we have a dataset with two classes, A and B, whose examples occur in the ratio 1:9 (A:B). We want to train a logistic regression model on this dataset.

What would you do: downsample B or upsample A? Are the two options equivalent? If not, what are the advantages and disadvantages of each?
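To make the two options concrete, here is a minimal sketch of what each one does to the training set. The helper names (`downsample_majority`, `upsample_minority`), the toy rows, and the choice to balance the classes to exactly 1:1 are assumptions made for illustration; the question itself does not fix a target ratio or a sampling method.

```python
import random

def downsample_majority(minority, majority, seed=0):
    """Randomly keep only as many majority rows as there are minority rows."""
    rng = random.Random(seed)
    # sample() draws without replacement, so every kept row is distinct.
    return minority, rng.sample(majority, len(minority))

def upsample_minority(minority, majority, seed=0):
    """Draw minority rows with replacement until both classes are the same size."""
    rng = random.Random(seed)
    # choices() draws with replacement, so rows are repeated.
    return rng.choices(minority, k=len(majority)), majority

# Hypothetical toy data in the 1:9 ratio from the question.
class_a = [("A", i) for i in range(100)]
class_b = [("B", i) for i in range(900)]

down_a, down_b = downsample_majority(class_a, class_b)
up_a, up_b = upsample_minority(class_a, class_b)

print(len(down_a), len(down_b))  # 100 100
print(len(up_a), len(up_b))      # 900 900
print(len(set(up_a)))            # distinct A rows after upsampling (at most 100)
```

In this sketch, downsampling throws away 800 of the 900 B rows, while upsampling keeps every row but only repeats existing A rows, so it adds no new information about class A. Whether those two training sets lead to equivalent models is what the question asks.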
