[ML Questions] Situation of Missing Data

Assume that you have built a logistic regression model for a binary classification problem: predicting network intrusion. The ratio of positive-class samples (there is a network intrusion) to negative-class samples (there is no network intrusion) was 1:99 in the training data. A large number of features (approx. 1000) were used to train the model.

All the conventional ML wisdom was applied when training the model, including proper scaling of the training data, cost-sensitive training, feature normalization, etc.
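To make the cost-sensitive part concrete, here is a minimal sketch of one common weighting heuristic: each class is weighted inversely to its frequency. This is an assumption for illustration, since the post does not say which scheme was actually used, and the helper name `balanced_class_weights` and the toy labels are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy labels with the 1:99 ratio from the post: 1 = intrusion, 0 = no intrusion.
labels = [1] * 10 + [0] * 990
weights = balanced_class_weights(labels)

# The rare intrusion class gets a far larger weight than the common class,
# so a misclassified intrusion costs more during training.
print(weights)
```

With a 1:99 ratio, an error on an intrusion sample ends up costing roughly 99 times as much as an error on a normal sample.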

When you deployed the system to production, you observed that a large percentage (approx. 10%) of the live instances have missing feature values, whereas the training data had 100% feature coverage.
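The sketch below illustrates why this matters for a plain logistic regression scorer: a single missing feature, represented here as NaN, propagates through the weighted sum and makes the whole score undefined. The helper names (`missing_rate`, `logistic_score`), the three-feature rows, and the weights are all made up for illustration and are not taken from the actual system.

```python
import math

def missing_rate(rows):
    """Fraction of rows that contain at least one missing (None or NaN) value."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    if not rows:
        return 0.0
    return sum(any(is_missing(v) for v in row) for row in rows) / len(rows)

def logistic_score(weights, bias, row):
    """Plain logistic regression score; a NaN feature makes the result NaN."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical production batch: 9 complete rows, 1 row with a missing feature.
rows = [[0.2, 1.0, 0.5]] * 9 + [[0.2, float("nan"), 0.5]]
rate = missing_rate(rows)

# Hypothetical model parameters, only to show the NaN propagation.
model_weights, bias = [0.4, -0.3, 0.1], -1.0
score_ok = logistic_score(model_weights, bias, rows[0])
score_bad = logistic_score(model_weights, bias, rows[-1])

print(f"rows with missing values: {rate:.0%}")
print(f"complete row score: {score_ok:.3f}, incomplete row score: {score_bad}")
```

Tracking a number like `rate` per feature, rather than per row, would also show whether the gaps are concentrated in a few of the ~1000 features or spread across many of them.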

Relabeling the new samples with missing features will take at least 15 days. What will you do in the meantime, while you wait for the relabeled data to arrive?

Rahul Agrawal
