Looking at stuff case by case

A review of the competition data reveals that both the training and test sets contain only 10 patient cases.

So I thought it would be useful to recreate the previous exercise, this time grouping each feature by case.

So we start:

The code to generate the plot by case would be something like this:

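library(tidyverse)  # loads dplyr, purrr and ggplot2

# annon_len = number of characters in each annotation
test2 <- test2 %>% mutate(annon_len = annotation %>% map_int(nchar))

# one histogram panel per case
ggplot(test2, aes(x = annon_len)) +
  geom_histogram(color = "black", fill = "white") +
  ggtitle("Length of annotation per case number") +
  facet_wrap(~case_num)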

We can also do a boxplot to get a better idea.

(For more on faceting in ggplot2, see http://www.sthda.com/english/wiki/ggplot2-facet-split-a-plot-into-a-matrix-of-panels)

This code for example:

ggplot(test2, aes(x = case_num, y = annon_len, group = case_num)) +
  geom_boxplot(aes(fill = case_num))

will return a boxplot.

Visualization matters. If you play around with it a little, you’ll realize that having both the histogram and the boxplot tells a more complete story than either alone. The boxplots imply that the cases are much more similar than not, while the histograms make it clear that this is so.
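To put numbers next to the two plots, a per-case summary table can help. The snippet below is only a minimal sketch: `toy` is a made-up stand-in for `test2` with the same `case_num` and `annon_len` columns, and `case_summary` is a name chosen here for illustration.

```r
library(dplyr)

# Made-up stand-in for test2 (not the competition data).
toy <- data.frame(
  case_num  = c(0, 0, 0, 0, 1, 1, 1, 1),
  annon_len = c(10, 12, 14, 40, 11, 12, 13, 14)
)

# The quartiles are what the boxplot draws; n and max hint at the
# tails that are easier to see in the histogram.
case_summary <- toy %>%
  group_by(case_num) %>%
  summarise(
    n      = n(),
    q1     = quantile(annon_len, 0.25, names = FALSE),
    median = median(annon_len),
    q3     = quantile(annon_len, 0.75, names = FALSE),
    max    = max(annon_len)
  )

print(case_summary)
```

Replacing `toy` with `test2` should give one row per case, which is easy to scan alongside the plots.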

We can also cut stuff by intervals (see my upcoming post on R basics) if we really want to dig in:

for (x in 0:9) {
  print("================================================")
  print(x)
  # bin this case's annotation lengths into 5 equal-width intervals and count them
  test2[test2$case_num == x, ]$annon_len %>% cut(breaks = 5) %>% table() %>% print()
}
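The same per-case binning can also be written without an explicit loop. This is a sketch in base R, assuming the same `case_num` and `annon_len` columns; `toy` is made-up data and `bins_by_case` is a name introduced here.

```r
# Made-up stand-in for test2 (not the competition data).
toy <- data.frame(
  case_num  = rep(0:1, each = 6),
  annon_len = c(5, 8, 12, 20, 33, 50, 7, 9, 9, 15, 18, 41)
)

# split() gives one vector of lengths per case; each vector is cut into
# 5 equal-width intervals and counted, just like in the loop above.
bins_by_case <- lapply(
  split(toy$annon_len, toy$case_num),
  function(v) table(cut(v, breaks = 5))
)

print(bins_by_case)
```

The result is a named list with one table per case, so a single case can be pulled out with something like `bins_by_case[["0"]]`.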

This ad-hoc approach has one weakness: we have to check things one by one. Even without going down to the level of individual features, 10 cases * 3 or 4 features/columns gives 30 to 40 things to check, so we need a way to visualize and digest them easily.
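One possible way to get all of those combinations onto a single page is to reshape the data to long format and facet by column. The sketch below is an assumption about how that could look, not a finished solution: the data is randomly generated, and `feature_len` and `note_len` are hypothetical placeholder columns (only `annon_len` comes from this post).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up data: annon_len is from this post, the other two length
# columns are hypothetical placeholders for further features.
set.seed(1)
toy <- data.frame(
  case_num    = rep(0:9, each = 20),
  annon_len   = rpois(200, 15),
  feature_len = rpois(200, 30),
  note_len    = rpois(200, 800)
)

# One row per (case, column, value), so a single plot can cover everything.
long <- toy %>%
  pivot_longer(-case_num, names_to = "column", values_to = "len")

# One panel per column, one box per case within each panel.
p <- ggplot(long, aes(x = factor(case_num), y = len)) +
  geom_boxplot() +
  facet_wrap(~column, scales = "free_y", ncol = 1) +
  labs(x = "case_num", y = "length (characters)")

print(p)
```

With 10 cases and 3 columns this puts 30 boxes in one figure, which is much quicker to scan than 30 separate printouts.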

Keep it real
