Big Data. We’ve all heard the term, but many of us may not fully understand the concept or all of its many layers. Some still think it’s a trendy buzzword thrown around by twenty-something executives in Silicon Valley working on the latest gaming system or website. But the reality is big data is literally everywhere, and it’s not just a thing, it’s also an action.
Big data is gathering right now, all the time: as your mouse hovers over this webpage, when you receive book or movie recommendations from Amazon and Hulu, as you share files via Dropbox or WeTransfer, and even when physicians are caring for their patients – as detailed in a recent Wall Street Journal piece.
Over the next couple of weeks you’ll hear from experts across Penn Medicine who use big data in patient care and in clinical and basic science research. But before getting into the thick of how we’re connecting health information with advanced technology capabilities, I sat down with Mike Draugelis, former Lockheed Martin missile defense engineer turned Penn Medicine’s Chief Data Scientist, to learn a bit more about the topic, the use of big data in health care and beyond, and the future of the industry.
Q. What is big data and where did the concept come from?
A. Big data is a very ambiguous term, and you tend not to find a clear definition of it. The term covers both the data itself and the way it is stored, processed, and analyzed.
The advent of big data really came when the question was raised of how to avoid the bottleneck that came with disk space: storing data on servers came at a cost, and as the amount of data increased, the price of the space grew exponentially.
There is the concept of the three V’s – variety, volume and velocity – which can help distill the term.
For example, sorting one terabyte of data, just a list of words on a thumb drive, takes about 60 minutes. That time does not even account for any algorithms or advanced analytics. As the kind of compressed data changes – meaning, the variety of files including text, photos, or time series data – and volume increases (as that original one terabyte turns to five or 25 or 500), the speed, or velocity, at which it is processed, stored, and analyzed needs to increase on the same trajectory.
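The scaling problem described here is what big data frameworks address by splitting work across many machines and merging the results. As a rough illustration (a teaching sketch in plain Python, not a real distributed system), the split-process-merge idea behind the MapReduce pattern looks like this:

```python
# Toy illustration of the split-process-merge pattern that big data
# frameworks use to keep velocity up as volume grows. Each "chunk"
# stands in for a slice of data that would live on a different machine.
from collections import Counter

def map_chunk(chunk):
    """'Map' step: each worker counts words in its own slice of the data."""
    return Counter(chunk.split())

def reduce_counts(partials):
    """'Reduce' step: merge the per-worker counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Pretend each string is a chunk stored on a different machine.
chunks = ["big data big", "data velocity volume", "variety volume"]
counts = reduce_counts(map_chunk(c) for c in chunks)
print(counts["volume"])  # 2
print(counts["big"])     # 2
```

Because each map step only ever sees its own slice, adding more machines handles more volume without any one disk or processor becoming the bottleneck.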
Companies like Amazon, Yahoo and Google originally started collecting as much data as they possibly could – how a user navigated a website, ad-clicks, and page views – and then tried to figure out how they could use that information.
The point is, data is being collected all the time, from every possible place, which is essentially what created the term now known as “big data.”
Q. Being different from the IBMs or the Amazons of the world, does a health system have a lot of big data?
A. Penn Medicine has a ton of data. In terms of the electronic medical records (EMR), there is a large amount of information, even if you just count the records per patient, which fills about one terabyte of disk space. That amount of data can’t be processed on a laptop, but it’s definitely not large enough for web scale – or the cloud – processing. However, when you then start expanding the data we’re looking at, say to include how much data is in radiology at Penn – 250 terabytes – that’s really big. And if you look at the data collected by Raina Merchant, MD, an assistant professor of Emergency Medicine and director of the Penn Social Media & Health Innovation Lab, and her team in the lab, who are aggregating really exciting social media data to inform cardiovascular risk, the data are getting bigger and bigger with time – in the tens of terabytes.
All that said, if you were to take these numbers to Google or any company that traditionally uses big data, our information might be viewed as “big-ish” data.
So in our case, we apply big data technology so that we can train our algorithms to process data at a faster rate. Essentially what big data allows us to do is to look at this huge variety of information, mash it together, and see what patterns emerge.
Q. How is big data used in healthcare?
A. One example I use to explain this is Netflix. It can predict that my wife likes romantic comedies and that I just like comedies. This isn’t happening because it knows the process of my brain, but rather because it has a tremendous amount of data and has built a machine learning model – or data model – that can make accurate predictions.
In health care, that is a really great pathway to follow because the body is so complex – it’s called predictive healthcare.
Traditionally, physicians might say a patient is at risk for X if their blood pressure is above Y and heart rate is below Z – a rule based on scientific research that is then translated to the bedside.
But challenges still remain for a few reasons. First, the body doesn’t really work that way, in terms of “if X then Y” every time. It’s far more complex. Clinicians need to take into account far more information and characteristics to determine risk. And second, even if we could model patient data in some really complex way to actually mimic all the functions of the body, the human brain can only think in a simple way when applying the findings to patient care.
So where machine learning comes in is we can actually build a data model to take in all of the complexities of the patient variables over time, and come up with a really strong forecasting prediction. We can then use that to simplify and condense all the other factors a clinician would need to look at to determine if someone is at risk for a specific condition.
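The contrast between the two approaches can be sketched in a few lines. This is purely illustrative – the variable names, thresholds, and weights below are made-up assumptions for the sake of the example, not Penn Medicine’s actual model, and a real model’s weights would be learned from patient data rather than set by hand:

```python
# Hypothetical sketch: a traditional threshold rule vs. a learned risk
# score. All thresholds and weights here are illustrative assumptions.
import math

def rule_based_at_risk(systolic_bp, heart_rate):
    """Traditional bedside rule: 'at risk if BP above Y and HR below Z'."""
    return systolic_bp > 180 and heart_rate < 50

def learned_risk_score(features, weights, bias):
    """A logistic model combines many variables into a single probability,
    condensing complexity into one number a clinician can act on."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Example: vitals plus trend features over time (made-up values).
patient = [182, 48, 0.3, 1.2]       # bp, hr, lactate trend, wbc trend
weights = [0.02, -0.04, 1.5, 0.8]   # in practice, learned from data
risk = learned_risk_score(patient, weights, bias=-4.0)
print(rule_based_at_risk(182, 48))  # True
print(round(risk, 2))               # about 0.30
```

The rule answers yes or no from two numbers; the model folds in as many variables as you can feed it, over time, and returns a single risk probability – which is the simplification at the bedside the answer describes.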
Q. What is the predictive healthcare process at Penn Medicine?
A. At a high level, the opportunity to integrate predictive health care solutions comes when care teams rethink the way they provide care to our patients. It leads us to ask the question: how do we take new insights and deliver care in a more focused and valuable way for the patient?
Most of the work we’re doing in this space is through continuous collaboration between the data scientists and the clinical team – usually a 50/50 split – which typically follows a three-step workflow:
- Algorithm Proof of Concept: does it find what we are looking for?
- Pathway: is it clinically relevant, does the care team know what to do, is there already a patient pathway created, how can we actually see better outcomes?
- Implementation: how do we scale this across the whole health system?
Right now we have what’s called the Sepsis Solution in the pathway phase. The infection is really rare and hard to detect, but we have an algorithm that performs really well; it’s as good as anything published. However, the warning alert is coming about 30 hours too early. We’ve been operational with it for about six months and there hasn’t been a change in patient outcomes. We have to continue to study it to figure out what’s missing, if we are training on the wrong risk factors, if we need to change the alert system, etc.
I’d say predictive healthcare is similar to R&D in that we’re constantly learning from these models and are always making changes. From the outset, we don’t know what the outcome will be, or if it will be valuable. It’s essential to have a culture that is okay with a continuous need to change or pivot until we get it right. At Penn, in its academic culture, clinicians seem really ready to embrace that approach.
Q. What does the future look like for big data at Penn Medicine, more broadly?
A. Each year, we set out to tackle three to four large projects that follow a year-long iterative release, and two smaller projects that are on a six to twelve week timeline and may not require a predictive algorithm.
Next month we’re looking to pilot a program led by Corrina M. Oxford, MD, an assistant professor of Clinical Obstetrics and Gynecology, around putting a process in place to reduce maternal morbidity. After a woman goes into labor, she is at risk for hemorrhage, preeclampsia, infection, etc., so we are looking to identify these women early and get them the care needed to reduce risk.
Another project led by Tracey Evans, MD, an associate professor of Clinical Medicine in the division of Hematology-Oncology, Abigail T. Berman, MD, MSCE, the associate director of the Penn Center for Precision Medicine, and Jennifer Braun MHA, BSN, RN, the director of Quality Improvement and Patient Safety for the Cancer Service Line, is identifying lung cancer patients in an outpatient setting who are at risk for unscheduled readmissions. We’ve built this algorithm and are already starting to catch at-risk patients up to two weeks prior to their event. This work will continue throughout the year.
Outside of Penn Medicine, what we are going to see is a more fluid interaction with computers. It will make the physician-patient interaction more conversational and more personal.
Deep learning and voice recognition – as used in the Amazon Echo, Siri, and Google Home – are already becoming more and more prevalent. If that technology can be integrated into patient care, instead of physicians taking notes on a computer in the exam room while speaking with the patient, the computer will pick up and process the interaction in real-time.
There still needs to be a lot of advancement in computer vision, voice recognition, and natural language processing, but this is already happening outside of the healthcare space. The Amazon Echo is actually in some health systems already.
The technology is there. It needs to be integrated, but we could start seeing this integrated into patient care in the next two to three years.