Martin Cansdale, Chief Scientist at Healthily, explains the science behind the Healthily Smart Symptom Checker and why Healthily uses Bayesian inference.
The Artificial Intelligence (AI) behind the Healthily Smart Symptom Checker has been continuously updated and refined over the last 6 years. But some of the decisions made in developing the first version of the AI are as important now as they were in 2015, in particular the use of Bayesian techniques to predict the cause of a user’s symptoms.
Bayesian inference is a method of updating our beliefs in the probability of an event or hypothesis, based on new information we observe.
Imagine you’re sitting indoors with no view of a window and you want to estimate how likely it is to be raining. You might start by thinking about how often it rains where you are, perhaps taking into account the time of year, and come up with an initial probability of 10%.
Next, you think back to the weather that morning when there were dark clouds. You estimate that there are dark clouds on about 90% of days when it rains and only 40% of days when it doesn’t rain. You use this information – that the clouds you observed are much more likely to be seen on a rainy day than a dry day – to update your estimate for the chances of rain. After doing the Bayesian calculations, you find that this is now 20%.
Finally, you look at someone coming in the door and see that they’re carrying a wet umbrella. This wouldn’t surprise you if it was raining – you estimate that 80% of people would be using an umbrella when it’s raining – but it would surprise you on a dry day. You allow a generous 1% chance of someone carrying a wet umbrella on a dry day – perhaps a water main has burst outside.
You feed this new information into your Bayesian calculations. Because it would be so surprising to see someone with a wet umbrella on a dry day, this changes your estimate a lot – now you think there’s more than a 95% chance that it’s raining.
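The two updates in the rain example can be reproduced with a few lines of Python. This is a minimal sketch of Bayes' theorem using only the figures quoted above; the function name `bayes_update` is our own, and the sketch assumes that the clouds and the umbrella are independent pieces of evidence once you know whether it is raining.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' theorem."""
    # Probability of seeing the evidence and the hypothesis being true.
    numerator = prior * p_evidence_if_true
    # Total probability of seeing the evidence, whether or not it is raining.
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


# Initial belief: 10% chance of rain.
p_rain = 0.10

# Dark clouds: seen on 90% of rainy days and 40% of dry days.
p_after_clouds = bayes_update(p_rain, 0.90, 0.40)

# Wet umbrella: 80% chance if raining, 1% chance if dry.
# Assumption: given the weather, the umbrella is independent of the clouds.
p_after_umbrella = bayes_update(p_after_clouds, 0.80, 0.01)

print(f"After clouds:   {p_after_clouds:.1%}")
print(f"After umbrella: {p_after_umbrella:.1%}")
```

The first update gives 20% and the second gives a little over 95%, matching the estimates in the example.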
At Healthily, we can use the same approach to update our estimates of the probabilities of different conditions that could be causing a user’s symptoms. With each symptom a user tells us about, we can update the estimates based on how often each condition causes that symptom.
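To make the idea concrete, here is a rough sketch of how the same update could be applied across several candidate conditions at once. It is an illustration only, not Healthily's actual model: the condition names, symptom names and every probability are invented placeholders, and the sketch assumes that exactly one of the listed conditions is present and that symptoms are independent of each other given the condition.

```python
def update_conditions(beliefs, likelihoods):
    """Update the probability of each condition after one reported symptom.

    beliefs:     {condition: current probability}
    likelihoods: {condition: probability that the condition causes the symptom}
    """
    # Weight each condition by how often it causes the reported symptom.
    unnormalised = {c: beliefs[c] * likelihoods[c] for c in beliefs}
    # Rescale so the probabilities sum to 1 again.
    total = sum(unnormalised.values())
    return {c: value / total for c, value in unnormalised.items()}


# Hypothetical starting estimates (placeholder numbers, not clinical data).
priors = {"condition_a": 0.70, "condition_b": 0.25, "condition_c": 0.05}

# Hypothetical chance of each symptom under each condition.
symptom_likelihoods = {
    "symptom_x": {"condition_a": 0.10, "condition_b": 0.60, "condition_c": 0.90},
    "symptom_y": {"condition_a": 0.20, "condition_b": 0.50, "condition_c": 0.80},
}

beliefs = dict(priors)
for symptom in ["symptom_x", "symptom_y"]:
    beliefs = update_conditions(beliefs, symptom_likelihoods[symptom])

# List the conditions from most to least probable.
for condition, probability in sorted(beliefs.items(), key=lambda item: -item[1]):
    print(f"{condition}: {probability:.1%}")
```

With these placeholder numbers, the condition that started as most likely ends up least likely, because the two reported symptoms are rarely caused by it.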
The merits of using Bayesian methods can be seen by comparing them with two other approaches we could have used: Machine Learning and Decision Trees.
Machine Learning takes a set of training data and uses this to train a model to make decisions. Models range from the extremely simple to large and sophisticated neural networks. Depending on the complexity of the problem and the model being used, the amount of training data needed can vary from a few dozen examples to tens of millions.
There is no shortage of models we could have tried. The difficulty is in finding suitable training data. While Machine Learning has been applied to Electronic Health Records (EHRs), this isn’t a suitable source for us. It’s important that the data used to train a model reflects the situation in which the model will be used.
For us, this means data on the same population that will use our Smart Symptom Checker. EHRs can provide data for patients in primary care or hospital settings, but this is different to the pre-primary care situation in which our symptom checker is intended to be used. A symptom could be highly indicative of a particular condition among patients who have been to see a primary care doctor or been referred to hospital for treatment, but of much less significance among people who’ve not yet made the decision to seek medical attention.
A large proportion of people with conditions suitable for self-care will decide to stay at home or visit a pharmacy rather than a doctor, so information on their symptoms will never make it into their health records.
A second approach would be to construct a Decision Tree. This is in essence a flowchart – perhaps with multiple starting points for each symptom a user could present with – that directs the flow of questions, with different routes for each possible answer the user could give.
Decision Trees can be created automatically through Machine Learning, but this would run into the same problem of finding suitable data as other Machine Learning algorithms. They can also be curated manually, using available clinical data and the expertise of clinicians.
Manual curation of a Decision Tree, however, becomes more difficult and time-consuming as the problem becomes more complex. Our symptom checker contains hundreds of medical conditions, hundreds of symptoms each with a number of different attributes, and various factors linked to the user’s age, gender, and medical history. Mapping these out in a Decision Tree would require hundreds of starting points and an unmanageable number of possible paths to take. Adding a new symptom or condition to the system would require a time-consuming reevaluation of the whole tree.
This complexity would be required just to allow the user to start with a single symptom. Our chat interface allows users to choose any combination of symptoms, making the problem even harder to manage with a Decision Tree.
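A quick count shows how fast the number of starting points grows once users can begin with a combination of symptoms. The sketch below is illustrative only: the figure of 300 symptoms is an assumed stand-in for "hundreds", and the helper `starting_points` is our own name.

```python
from math import comb


def starting_points(n_symptoms, max_initial):
    """Count the distinct sets of 1 to max_initial symptoms a user could start with."""
    return sum(comb(n_symptoms, k) for k in range(1, max_initial + 1))


# Assumption: 300 symptoms, standing in for the "hundreds" mentioned above.
N_SYMPTOMS = 300

for max_initial in (1, 2, 3):
    count = starting_points(N_SYMPTOMS, max_initial)
    print(f"Up to {max_initial} initial symptom(s): {count:,} starting points")
```

Under that assumption, allowing up to three initial symptoms already means millions of possible starting points, before any follow-up questions are mapped out.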
And so we chose a Bayesian approach, allowing users to take an individual path through a consultation with our Smart Symptom Checker, based on what they have told us about themselves, and the symptoms they come to us with. By supplementing this Bayesian approach with rules on clinical best practice, we have developed a world-class symptom checker. The provision of safe and accurate advice on what may be causing the user’s symptoms, and what they should do next, empowers people to take control of their own health.
Important: Our website provides useful information but is not a substitute for medical advice. You should always seek the advice of your doctor when making decisions about your health.