First things first: I started by reading the instructions for the competition (https://www.kaggle.com/c/nbme-score-clinical-patient-notes/data?select=train.csv) and taking a look at the dataset.
We are given typewritten patient notes alongside features ascribed to each patient case. Our job is to identify which features appear in each patient note.
Not every one of these patient notes is annotated.
The first step is to understand the performance metric.
Much of what I write here is this article, rewritten in my own words and for my own reference:
I also referred to Wikipedia for bits and pieces I didn’t quite get: https://en.wikipedia.org/wiki/F-score
The best way to remember things is with a concrete example, so let’s do that. Say we want to test how good our anti-virus (henceforth “AV”) is.
We’ll define a few terms and give an intuitive view of what each one measures in this context.
FP = False Positive = the model claims a virus is present, but the file is benign
TP = True Positive = the model claims a virus is present, and the file really is a virus
FN = False Negative = the model claims the file is free of viruses, but the file is a virus
TN = True Negative = the model claims the file is free of viruses, and it really is benign
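To make the four definitions concrete, here is a minimal Python sketch that tallies them from a list of AV verdicts. The function name `confusion_counts` and the six-file scan are hypothetical, made up purely for illustration.

```python
def confusion_counts(predicted, actual):
    """Tally TP, FP, FN, TN for a binary "is this file a virus?" task.

    predicted[i] is True if the AV flags file i as a virus;
    actual[i] is True if file i really is a virus.
    (Hypothetical helper written for this example.)
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = fn = tn = 0
    for pred, truth in zip(predicted, actual):
        if pred and truth:
            tp += 1  # flagged, and really a virus
        elif pred and not truth:
            fp += 1  # flagged, but the file is benign
        elif not pred and truth:
            fn += 1  # passed as clean, but it is a virus
        else:
            tn += 1  # passed as clean, and it really is benign
    return tp, fp, fn, tn


# Hypothetical scan of six files.
predicted = [True, True, False, False, True, False]
actual = [True, False, True, False, True, False]
print(confusion_counts(predicted, actual))  # (2, 1, 1, 2)
```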
Precision = TP / (TP + FP) = correct positive results found / all results identified as positive
Intuition: is your anti-virus flagging your 5-year-old nephew’s .txt file about koalas as a rootkit?
Recall = TP / (TP + FN) = correct positive results found / all results that should have been identified as positive
Intuition: How many viruses did your anti-virus miss?
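Precision and recall can be computed directly from the counts, and the F1 score is their harmonic mean. A minimal Python sketch follows. The counts are hypothetical, and returning 0.0 when a denominator is zero is my own convention for the sketch (libraries handle that edge case in different ways).

```python
def precision(tp, fp):
    # Of everything flagged as a virus, what fraction really was one?
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    # Of all real viruses, what fraction did the AV catch?
    return tp / (tp + fn) if tp + fn else 0.0


def f1(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts: 8 viruses caught, 2 benign files flagged, 4 viruses missed.
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # 0.6666666666666666
print(f1(8, 2, 4))      # approximately 0.727, i.e. 2*TP / (2*TP + FP + FN) = 16/22
```

Note that TN never appears: neither metric rewards the AV for correctly ignoring the (usually huge) pile of benign files.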
The above case is not limited to binary classification. In the multi-class case, the only difference we have is that we gauge whether the AV . E.g., suppose that the AV vendor wants to .
If the AV incorrectly identifies
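In this case, we just prepend “micro” to the names of the definitions above, summing TP, FP and FN over every class before dividing. We have:
Micro-precision = ΣTP / (ΣTP + ΣFP)
Micro-recall = ΣTP / (ΣTP + ΣFN)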
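E.g., in a single-label multi-class problem, calculating either micro-precision, micro-recall or accuracy will give us the micro-F1 value. Every misclassified file is a false positive for the class the AV predicted and a false negative for its true class, so ΣFP = ΣFN, micro-precision equals micro-recall, and both equal the fraction of files classified correctly.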
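A minimal Python sketch of micro-averaging, assuming single-label multi-class data where every file gets exactly one label. The function names and the malware class labels ("benign", "trojan", "worm") are hypothetical, chosen only to extend the AV example. It sums TP, FP and FN over every class (benign included) and shows that micro-precision, micro-recall, micro-F1 and accuracy coincide.

```python
def micro_scores(predicted, actual):
    """Micro-averaged precision, recall and F1 for single-label multi-class data."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = fn = 0
    # Sum the per-class counts over every class that appears anywhere.
    for c in set(predicted) | set(actual):
        for pred, truth in zip(predicted, actual):
            if pred == c and truth == c:
                tp += 1  # correctly labelled as c
            elif pred == c:
                fp += 1  # labelled c, but really another class
            elif truth == c:
                fn += 1  # really c, but labelled as another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(predicted, actual):
    # Fraction of files whose predicted label matches the true label.
    return sum(p == t for p, t in zip(predicted, actual)) / len(actual)


# Hypothetical labels for five scanned files.
predicted = ["benign", "trojan", "worm", "worm", "benign"]
actual = ["benign", "worm", "worm", "trojan", "benign"]

print(micro_scores(predicted, actual))
print(accuracy(predicted, actual))
```

Here 3 of the 5 files are labelled correctly, so all four values come out as 0.6 (up to floating-point rounding). The per-class loop is deliberately naive so that it mirrors the "sum over classes" definition; a single pass over the files would give the same totals.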