Mortality Analysis – The Next Level

 

I just returned from Seattle, where I was attending the AAIM Triennial Conference. For the 3rd time, I taught the Basic Mortality Methodology Course. It went very well and was well received by the group. Further, there were a couple of excellent sessions on analytics and big data applied to life insurance.

During this conference I was asked by several attendees and colleagues how they might further their study and understanding of mortality analysis. Some even broadened the question to ask about data analysis in general, because it is becoming a more prominent skill in the world of life insurance.

So, in this post I will cite some of my favorite resources. Some are aimed at delivering a high-level understanding of analytics, while others are more technical, teaching you how to actually perform the analyses.

A lot of the books I mention are available as a free PDF or as a ‘pay what you want’ download. This is a testament to the open nature of much of the community around R and other free software. If you can, please consider paying something so that the people behind these fantastic resources can continue their excellent work.

First, a bit of self-promotion. This is a link to three videos I made for a seminar called “Mortality Analysis with Modern Tools”. It contains a refresher on basic, ‘classical’ mortality analysis, a tutorial on the use of SEER*Stat to gather cancer survival data, and a brief introduction to R, the free statistical programming language, and RStudio, a software tool which makes using R much easier.

The password is AAIM2016.
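
To give a flavor of what the ‘classical’ portion of those videos covers, here is a minimal sketch in R of an actual-to-expected (A/E) mortality calculation. The exposure figures and expected rates below are invented for illustration, not taken from the seminar.

```r
# Hypothetical follow-up data: person-years of exposure and observed deaths by age band
exposure <- data.frame(
  age_band   = c("40-49", "50-59", "60-69"),
  person_yrs = c(1200, 950, 600),
  deaths     = c(3, 7, 12),
  q_expected = c(0.0020, 0.0060, 0.0150)   # assumed expected annual mortality rates
)

# Expected deaths in each band = exposure times the expected rate
exposure$expected_deaths <- exposure$person_yrs * exposure$q_expected

# Overall actual-to-expected mortality ratio, expressed as a percent
round(100 * sum(exposure$deaths) / sum(exposure$expected_deaths), 1)
```

RStudio makes it easy to run a snippet like this line by line and inspect the intermediate table as you go.
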
Books:
Life Expectancy in Court: This book is a clear, concise and easy-to-understand treatment of the use of actuarial methods to determine life expectancy in a legal setting. It relies only on pencil-and-paper techniques which are easily translated to spreadsheet tools (see the short life-table sketch after this list of books).
Intro to Statistical Learning (this site has the PDF as well as links to slides and videos): This is a methodological textbook, and a toned-down version of the very academic text “Elements of Statistical Learning”. It contains tons of great examples with R code and detailed vignettes. There is a set of free videos by the authors and a robust community of folks who have attempted and completed the exercises at the back of each chapter.
Reckoning with Risk. This one is more for a lay audience – great for underwriters but also very clarifying for anyone else. The author really brings home the difference between common measures of test performance, like sensitivity and specificity, and the more important measures of real-world use, like positive predictive value. Definitely a must-read if you break into hives when someone shows you a 2×2 table.
R for Data Science. A bit more technical here – this one is not about statistics, but rather the manipulation, cleaning and display of data that is so integral to any analytic endeavor. I highly recommend the ‘tidy’ approach as outlined in this text.
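
As promised above, here is a minimal R sketch of the sort of pencil-and-paper life-table calculation described in Life Expectancy in Court: a curtate life expectancy computed from a handful of annual mortality rates. The rates are invented for illustration.

```r
# Invented annual mortality rates (q_x) for five consecutive ages
qx <- c(0.020, 0.022, 0.025, 0.028, 0.031)

# Probability of surviving each single year, then of surviving k complete years from the start
px  <- 1 - qx
kpx <- cumprod(px)

# Curtate life expectancy over this (truncated) horizon:
# the expected number of complete years lived, i.e. the sum of the k-year survival probabilities
sum(kpx)
```

The same arithmetic drops straight into a spreadsheet column, which is exactly the point the book makes.
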
Courses:
Coursera Data Science: This is a series of 10 online courses which are available for a nominal fee. They provide an excellent introduction to the use of R as well as a fairly detailed look at the statistics underlying typical analytic methods like linear and logistic regression. I would say that 7 or 8 of the 10 are fairly easy – though some have time-consuming homework. The others are pretty challenging; I leave it up to you to decide which ones. The authors include several biostatisticians from the Johns Hopkins School of Public Health.
Chromebook Data Science. I have not taken these, but I plan on checking out at least a couple of them. It is given by Jeff Leek – one of the aforementioned JHU biostatisticians – and focuses on the use of a simple web-enabled laptop to perform serious data analysis by making the most of cloud-based resources like AWS.
Regression Modeling Strategies: This is a book, but also a short (one-week) course offered at Vanderbilt University by the eminent statistician Frank Harrell. This is the one for when you feel like you might actually know something and are ready to find out otherwise. I took it a few years ago – it was excellent, but also very challenging. Dr. Harrell also has a great blog, and his book is an excellent resource.
There are so many more of these it would be folly for me to attempt to review them all.
Websites:
R-bloggers: A repository of blog posts from around the internet which deal with R and various other statistical issues. Their top-ten list is very useful, but you can also discover other nuggets of gold on any given visit.
KDnuggets: Speaking of nuggets, this site offers a wide array of articles relevant to data analysis. Its scope is broader than R-bloggers, so you may need to dig a little deeper to find what you want. But FYI, they have much more comprehensive reviews of courses and books than I am offering here.
Personal Activity:
I can’t emphasize this enough. The best way to gain and retain mortality analysis skills is to actually practice those skills on real world data. So go ahead – analyze your department’s workflow, analyze the mortality risks you find in a relevant article, assign yourself the task of updating your company’s underwriting manual on a topic that interests you, or write a mortality abstract for publication. But find something that takes the task out of the theoretical realm and into the real world. You will find that your knowledge of the topic does the same thing.

Liquid Biopsies

There has been a lot of excitement in the life insurance industry over the potential applications of liquid biopsies. A liquid biopsy is a test on a blood sample which can detect circulating tumor DNA.

A pair of editorials in the journal Cancer presents recent developments in the field. Part 1 is of particular interest, as it discusses the potential use of liquid biopsy as a screening test in the general population. It was very gratifying to see that the principal researcher, Dr. Papadopoulos, was cautious in his approach despite the 99% specificity. As he pointed out, though 99% sounds good, it is not sufficiently high for general population screening, where the prevalence of cancer is low. It was also noted that the test is much better at detecting late-stage cancer than early-stage cancer. This may be a problem, since early detection is what is needed to drive down mortality rates.
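
To make that caution concrete, here is a back-of-the-envelope positive predictive value calculation in R. The 99% specificity is the figure quoted above; the sensitivity and prevalence are my own assumptions, chosen only to illustrate the low-prevalence problem.

```r
sensitivity <- 0.70   # assumed: the test catches 70% of people who truly have cancer
specificity <- 0.99   # the quoted specificity
prevalence  <- 0.01   # assumed: 1% of the screened population has cancer

# Positive predictive value by Bayes' rule
ppv <- (sensitivity * prevalence) /
       (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
round(ppv, 2)   # about 0.41, so well under half of positive results are true cancers
```

Even with an excellent-sounding specificity, most positives in a low-prevalence population are false positives – which is exactly why Dr. Papadopoulos urged caution.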

This particular weakness may actually be a strength if the purpose is life insurance testing. One of the main goals of testing in the life insurance industry is to guard against possible anti-selection. Because even late-stage malignancies may not cause aberrations in the standard laboratory panel, current testing regimens may not be able to detect malignancy in those who do not know, or do not want to admit, that they have cancer. The liquid biopsy, however, if it is sufficiently inexpensive, may fill this need. Certainly, more research is needed before any conclusions related to population or life insurance testing can be reached.

Part 2 of the series deals primarily with a possible change in the way that Pap tests are done. Some new research has shown that Pap tests done with a different collection brush, combined with liquid biopsy technology, can detect cervical, endometrial and even some ovarian cancers. This is exciting because ovarian cancer really has no good screening test and is the 4th leading cause of cancer deaths in women. Again, caution is needed, because this is not the first time that a promising screening test for ovarian cancer has come along, only to be found useless later on (CA-125).

 

Underwriting Manual Construction: Tables vs. Calculators

Yeah, I know, exciting, right?

Well, I have had a good amount of experience authoring manual sections, and it seems that no matter where you go there are a lot of opinions about the use of calculators vs. tables, especially when it comes to sections dealing with labs like liver function tests or certain cancers like breast, prostate and colon cancer.

To review the points on both sides of the debate, it helps to consider why the debate arises at all. I think it has to do with the desire to incorporate as many factors (both favorable and unfavorable) as possible in order to enable the most accurate underwriting possible. This is a fine goal, but it comes with a price: complexity. I have found that once the number of variables one needs to consider becomes greater than about 6, tables get very long and difficult to read. This is often when a calculator will become appealing to the author of the manual section.

The major strength of a calculator is consistency. Put in the correct inputs and you will get the correct output. The weakness of a calculator is related to this strength: if a mistake is made in the calculator’s construction, the resulting errors are guaranteed to occur in every case. Also, some have criticized calculators as being detrimental to the education of the underwriter using them. If all you do is ‘plug and chug’, you do not develop a sense of which factors are important to the mortality risk. I have seen this myself in the process of interviewing underwriters. I often ask an interviewee to tell me whether positive estrogen receptors are a favorable or unfavorable risk factor in breast cancer (they are favorable). I have heard the answer “I’m not sure, I just use the calculator” more than once.

This is unfortunate, and I don’t think one needs to sacrifice the promotion of important educational concepts in order to reap the benefits of a calculator. Instead, a well-designed calculator can include indicators, pop-up boxes or short descriptions which help develop the underwriter’s knowledge. For instance, if a breast cancer calculator has a section for estrogen receptors using a drop-down menu for entry, it is a simple matter to indicate, using color codes, up/down arrows or pop-up boxes, that checking “positive” is a favorable factor. Even making the choice say “positive receptors (favorable)” reinforces the concept without confusion or additional complexity.
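
As a rough sketch of that idea – with hypothetical labels and debit values, in R rather than the spreadsheet a real manual calculator would likely use – the drop-down choices themselves can carry the teaching label while a simple lookup supplies the rating effect:

```r
# Hypothetical breast cancer calculator fragment: each drop-down label doubles as a teaching aid
er_choices <- c(
  "Positive receptors (favorable)"   = -25,   # illustrative credit
  "Negative receptors (unfavorable)" =  25    # illustrative debit
)

rate_er_status <- function(choice) {
  # The underwriter picks a labeled option; the calculator returns the associated debit or credit
  unname(er_choices[choice])
}

rate_er_status("Positive receptors (favorable)")   # returns -25
```

The numbers are stand-ins; the point is that the favorable/unfavorable cue costs nothing in accuracy or complexity.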

So, in my opinion, good calculator design can improve accuracy and consistency while also promoting clarity and understanding. Of course, good calculator design is a complex task and may be a topic for another post.

Machine Learning in Healthcare (and otherwise)

The latest issue of JAMA arrived in my snail-mail box this week, and it contained an editorial about machine learning in healthcare. This was, no doubt, a follow-up to a recent issue of JAMA which contained 2 articles about deep learning algorithms that performed well in the detection of diabetic retinopathy and metastatic cancer.

This editorial does an excellent job of balancing the exciting potential of such algorithms with realistic expectations. Further, it succinctly encapsulates the various tiers of algorithmic complexity. Perhaps most interestingly, the editorial contains a figure which maps many familiar and not-so-familiar algorithms onto a chart of human involvement vs. data volume. One of the cited examples is a study of coronary risk scoring algorithms developed by examining the electronic records of primary care patients in England. The authors used several techniques (random forest, logistic regression, gradient boosting, and neural networks) and compared performance to a classic risk score from the American Heart Association. They found that all of the machine learning algorithms outperformed the risk score. The intriguing part to me was that logistic regression performed nearly as well as neural networks, and quite a bit better than random forest. Two reasons why that is interesting: 1) I am a nerd, 2) neural nets and random forests are notoriously ‘black box’, while logistic regression is very clear about how the risk prediction is affected by the input variables (so-called ‘glass box’).
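
For a small illustration of the ‘glass box’ point – using R’s built-in mtcars data as a stand-in, since I obviously don’t have the study’s records – the coefficients of a logistic regression translate directly into odds ratios for each input:

```r
# Toy logistic regression: predict a binary outcome (transmission type) from two inputs
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

summary(fit)     # coefficients, standard errors and p-values for each predictor
exp(coef(fit))   # odds ratios: the multiplicative effect of each input on the odds
```

Nothing comparable falls out of a random forest or a neural net without extra machinery, which is why the near-parity in performance is so appealing.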

The bottom line is this: it is possible to find models which perform extremely well and that still maintain clarity.

You can find the full article here.

The Analytic Process

Spotted this at Harvard Business Review. It is a short review of the analytic process in business and how it can go astray. It is definitely worth a read if you are involved with the implementation or creation of analytic models for business.

Book Review 2: Predictive Analytics

Another book review. Well, I had to give some talks recently – one on predictive analytics and another on genetics – and so I read The Gene by Dr. Siddhartha Mukherjee, which I review here, and Predictive Analytics by Eric Siegel.

I really admire how both of these authors can cover what are really very technical topics in plain language. Where Dr. Mukherjee’s style emphasizes scientific rigor, inspiration and wonder, Mr. Siegel tends more toward the astounding or the entertaining. He covers multiple relevant topics and uses copious examples to illustrate predictive analytics and the insights it can provide. He goes into especially great detail on IBM Watson’s performance on the game show Jeopardy!. There is quite a lot to unpack in that example, from language processing to appropriate selection of candidate answers to the arrival at a final probability of being correct.

Additional technical topics included linear and logistic regression, decision trees and ensembles.

Overall this is an excellent book and would be very valuable for those who need to use the results of predictive analyses in their business. It will not, however, enable you to perform these analyses yourself – it is just not that kind of text.

Book Review : The Gene

Because of some recent plane travel and a conference at a golf resort (I don’t play golf) I had some down time in which to catch up on my reading. I had been working on The Gene: An Intimate History, by Dr. Siddhartha Mukherjee. It is a 600+ page epic, which means it does not travel well in its native paper form. Actually, that is the worst thing I can say about it.

The book takes a deep, wide look at the history of genetics, from the very beginning of man’s study of heredity (yes, there are pea plants) to the damaging social effects of mid-century eugenics. What I found eye-opening here was the widespread enthusiasm in America at the time for eugenics by forced sterilization. This was illustrated by the famous Buck v. Bell case, in which none other than Justice Oliver Wendell Holmes ruled that the forced sterilization of the “weak minded” was permissible and did not violate the 14th Amendment.

This was one of a few chapters in the book that dealt with the wider social impact of heredity and genetics. Mostly, though, the book is concerned with the history of scientific discoveries. Considerable time is spent with Mendel, Darwin, Crick, Watson, Franklin, and Berg, as well as modern-day innovators like Venter, Collins and Doudna (co-developer of the CRISPR-Cas9 technology).

The overall tone of the book is one of cautious optimism, but it does not fail to point out significant concerns about the under-regulated use of CRISPR on human fetal tissue or germ cells. The author also points out that genomic science is still in its infancy, and that the complex interplay of multiple genes in disease is as yet poorly understood.

Overall, I found this to be a riveting look at nearly all aspects of genetics. It is, perhaps, a difficult read for those without a background in science or medicine – but only for a few particularly dense pages.

AAIM Day 2

Another great day here at AAIM. Harvard’s own Dr. Sanjiv Chopra kicked it off with an inspiring and, in one case, heart-rending tale of the various triumphs of medicine from the distant and recent past. He also offered some very interesting projections into the future, which ranged from the fairly non-controversial increase in the use of ‘personalized medicine’ to the somewhat more fanciful ‘exercise pill’ to the downright unlikely ‘free healthcare in America’ within the next 50 years. He was a wonderful speaker – and I’m not just saying that because of the free book.

Dr. Elyssa Del Valle, fresh from her victory over Serena Williams, brought a lot of interesting studies on assessing liver fibrosis and cirrhosis without the use of biopsy. Kevin Glasgow and Iraida Labra from MunichRe talked about interesting cases of life insurance fraud. If you had asked what I expected to see at AAIM this year, “fake Haitian funeral” would not have been among them. At lunch it was an honor to see the esteemed doctors Mackenzie, Titcomb and Clark receive the degree of Fellow.

The afternoon session brought us Dr. Faisal Merchant, who discussed many interesting aspects of syncope (hint for prospective JIM authors: look at his slides – plenty of great fodder for articles there). Speaking of new authors, my talk with Ross Mackenzie brought in a cadre of enthusiastic prospective authors whom I hope will soon flood Ross’ inbox with manuscripts. Rounding it out, I attended Cliff Titcomb’s excellent talk on practical mortality analysis. I especially appreciated his treatment of Markov models and will consider expanding our coverage of that topic in the Basic Mortality Course in the future.

AAIM Day 1

I’m attending the AAIM Annual Meeting. The first day was great, with a strong list of speakers who created a lot of buzz in the audience. Our first platform, with Dr. Elizabeth Arias from the National Center for Health Statistics, was excellent and very timely, as she addressed the spike in accidental poisoning deaths – which are mostly from opioids. Another interesting finding she highlighted was the “Hispanic paradox”, whereby Americans of Hispanic ethnicity have, on average, a survival advantage over non-Hispanics. The first concurrent session was highlighted by Dr. Anitha Rao’s excellent discussion of dementia; she is also an innovator who has developed a care planning tool for families struggling with this disease – one which could also help insurers optimize home-based care solutions. Dr. Ira Adams-Chapman presented an encyclopedic talk on the morbidity and mortality of infant prematurity. I felt like I was in medical school again, with a great professor.
In the afternoon, Dr. Jaime Vengoecchea brought some much-needed perspective to the genetic testing furor. He really sand-blasted some of the shine off of the shiny object. His talk was well received and will hopefully bubble up to the ears of industry executives. Our final platform, by Mike Fulks from CRL, was up to his usual standard of excellence and reviewed some very relevant laboratory conundrums. He also did a good job of dealing with a very persistent questioner.