Machine Learning in Healthcare (and otherwise)

The latest issue of JAMA arrived in my snail-mail box this week, and it contained an editorial about machine learning in healthcare. This was, no doubt, a follow-up to a recent issue of JAMA which contained 2 articles about deep learning algorithms that performed well in the detection of diabetic retinopathy and metastatic cancer.

This editorial does an excellent job of balancing the exciting potential of such algorithms with realistic expectations. Further, it succinctly encapsulates the various tiers of algorithmic complexity. Perhaps most interestingly, the editorial contains a figure which maps many familiar and not-so-familiar algorithms onto a chart of human involvement vs. data volume.

One of the cited examples is a study of coronary risk scoring algorithms developed by examining the electronic records of primary care patients in England. The authors used several techniques (random forest, logistic regression, gradient boosting, and neural networks) and compared their performance to a classic risk score from the American Heart Association. They found that all of the machine learning algorithms out-performed the risk score. The intriguing part to me was that logistic regression performed nearly as well as neural networks, and quite a bit better than random forest. Two reasons why that is interesting: 1) I am a nerd, 2) neural nets and random forest are notoriously ‘black box’, while logistic regression shows very clearly how the risk prediction is affected by each input variable (so-called ‘glass box’).
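To make the ‘glass box’ idea concrete, here is a minimal sketch in Python. The intercept, the coefficients, and the variable names are all made up for illustration; they are not taken from the study. The point is only that each coefficient of a logistic regression converts directly into an odds ratio that a reader can interpret.

```python
import math

# Hypothetical model, for illustration only (not the study's coefficients).
INTERCEPT = -8.0
COEFFICIENTS = {
    "age_per_decade": 0.60,
    "current_smoker": 0.70,
    "systolic_bp_per_10mmHg": 0.15,
}


def predicted_risk(patient):
    """Return the predicted probability of an event for one patient."""
    log_odds = INTERCEPT + sum(
        beta * patient[name] for name, beta in COEFFICIENTS.items()
    )
    # The logistic function turns log-odds into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratios():
    """exp(beta) is the multiplicative change in odds per one-unit increase."""
    return {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}


# A hypothetical 60-year-old smoker with a systolic pressure of 140 mmHg.
example_patient = {
    "age_per_decade": 6.0,
    "current_smoker": 1,
    "systolic_bp_per_10mmHg": 14.0,
}

print("predicted risk:", round(predicted_risk(example_patient), 3))
for name, ratio in odds_ratios().items():
    print(name, "odds ratio:", round(ratio, 2))
```

With a neural net or a random forest there is no comparable table of per-variable effects to read off, which is what earns them the ‘black box’ label.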

The bottom line is this: it is possible to find models which perform extremely well and that still maintain clarity.

You can find the full article here.

Book Review 2: Predictive Analytics

Another book review. Well, I had to give some talks recently, one on predictive analytics and another on genetics, and so I read The Gene by Dr. Siddhartha Mukherjee, which I review here, and Predictive Analytics by Eric Siegel.

I really admire how both of these authors can cover what are really very technical topics in plain language. Where Dr. Mukherjee’s style emphasizes scientific rigor, inspiration and wonder, Mr. Siegel tends more toward the astounding or the entertaining. He covers multiple relevant topics and uses copious examples to illustrate predictive analytics and the insights it can provide. He goes into especially great detail on IBM Watson’s performance on the game show Jeopardy!. There is quite a lot to unpack in that example, from language processing to the appropriate selection of candidate answers to the arrival at a final probability of being correct.

Additional technical topics included linear and logistic regression, decision trees and ensembles.

Overall this is an excellent book and would be very valuable for those who need to use the results of predictive analyses in their business. It will not, however, enable you to perform these analyses yourself – it is just not that kind of text.

New Publications

The latest issue of the Journal of Insurance Medicine posted today. It contains 2 articles that I authored, one as a contributor along with a great group of friends and colleagues on MIB’s Mortality Research and Analysis Committee (about breast cancer mortality), and another as the lone author about the Random Forest algorithm for survival data.

I’ll spoil the conclusion on that last one: when I used a Cox model and an RSF (random survival forest) model on colon cancer survival data from SEER, they had very similar concordance error rates. That is kind of a vote for Cox in that circumstance, since its hazard ratio output offers a readier quantification of the relative importance of the predictors.
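For readers who have not met the concordance measure, here is a minimal sketch in Python of Harrell's concordance index for right-censored data. It assumes the error rate is simply one minus that index. The follow-up times, event flags, and risk scores are invented toy values, not SEER data, and the function name is my own.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: the share of comparable pairs that the model orders correctly.

    times: observed follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where higher means worse predicted survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died strictly before
            # patient j's last observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: five patients, two of them censored.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
model_a = [0.9, 0.7, 0.5, 0.3, 0.1]  # ranks every comparable pair correctly
model_b = [0.9, 0.3, 0.5, 0.7, 0.1]  # misranks two of the eight pairs

for name, scores in [("model_a", model_a), ("model_b", model_b)]:
    c_index = concordance_index(times, events, scores)
    print(name, "C =", c_index, "error rate =", 1.0 - c_index)
```

A C of 0.5 is what random guessing would give, so two models with similar error rates are ranking patients about equally well, and the tie-breaker becomes how easy the output is to interpret.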

I got the idea to do this while taking courses in the Coursera Data Analysis Signature track. We had to do a project with our own data and create a Shiny app to go with it. (A Shiny app is an interactive web page that can be created using R and RStudio.) I chose to create a colon cancer survival calculator based on SEER data and using a Random Forest approach. You can try out my app here, but be patient; it takes a while to load the first time.