Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

Hands On: Thinking About Testing

thinking_about_testing.livemd

Hands On: Thinking About Testing

Problem & Solution

Situation:

You have a large dataset of handwritten characters with 1,000,000 samples and split them into 3 - 900,000 for training, 50,000 for validation, and 50,000 for testing.

Problem:

The data could be ordered. Consequently, there is a great chance that some letters are exclusively included only on the validation or test sets, and are missing on the training set and vice versa. Hence there are some data that were never before seen during training and would be hard for the model to determine.

Solution:

Shuffle the data before splitting them into training, validation, and test set like in MNIST where data is already pre-shuffled. This will ensure that distribution of data is the same throughout the three sets.