Assume that we have a data set consisting of two classes, A and B, where the ratio of class A samples to class B samples is 1:9. We want to train a logistic regression model on this data set. Would you down-sample B or up-sample A? Are the two options equivalent? If they are not, what are the advantages and disadvantages of each?
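To make the two options concrete, here is a minimal sketch of the mechanics only. It does not settle which option is better. It assumes plain Python lists and the standard `random` module, and the toy data (100 samples of A, 900 of B) and the helper names `downsample` and `upsample` are hypothetical.

```python
import random

def downsample(samples, target_size, seed=0):
    """Keep target_size randomly chosen samples, without replacement (discards data)."""
    rng = random.Random(seed)
    return rng.sample(samples, target_size)

def upsample(samples, target_size, seed=0):
    """Draw target_size samples with replacement (repeats existing points)."""
    rng = random.Random(seed)
    return rng.choices(samples, k=target_size)

# Hypothetical toy data in the 1:9 ratio from the question: 100 of A, 900 of B.
class_a = [("A", i) for i in range(100)]
class_b = [("B", i) for i in range(900)]

# Option 1: down-sample B to the size of A, giving 200 training rows in total.
balanced_down = class_a + downsample(class_b, len(class_a))

# Option 2: up-sample A to the size of B, giving 1800 training rows in total.
balanced_up = upsample(class_a, len(class_b)) + class_b

print(len(balanced_down), len(balanced_up))  # 200 1800
```

In this sketch, down-sampling throws away 800 of the B rows, while up-sampling adds no new A information: all 900 drawn A rows are copies of the original 100.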
Assume that you have built a logistic regression model for a binary classification problem: predicting network intrusion. In the training data, the ratio of positive-class samples (a network intrusion occurred) to negative-class samples (no intrusion) was 1:99. A large number of features (approximately 1000) were used to train the model, and all the conventional ML wisdom was applied, including proper training-data scaling, cost-sensitive training, and feature normalization. After deploying the system to production, you observe that a large fraction (approximately 10%) of incoming instances have missing feature values, whereas the training data had 100% feature coverage. Relabeling the new samples with missing features will take at least 15 days. What will you do in the meantime, until the relabeled data arrives?