A Data Science Central Community
There are an estimated 50 petabytes of data in the health care realm, predicted to grow to 25,000 petabytes by 2020, according to a new infographic from Oracle. The figures show that the health care industry is generating a huge amount of data, driven by clinical records, medical care, and compliance and regulatory requirements.
Fortunately, big data analytics is now widely used in the health care industry to extract insights from this wealth of data. By accurately identifying associations, trends, and patterns, these techniques help save lives and lower the cost of medical care. Large data sets have been applied to a wide range of health care services, including clinical decision-making, population management, disease detection, real-time statistical analysis, and pharmaceutical research. Understanding how data analytics works inside a health care program therefore supports sound, well-rounded health care development.
With the rapid expansion of public health information, data analytics can be used to crawl and filter many types of public health data. These methods let medical workers manage large amounts of unstructured data and then draw insights from it. There are multiple channels for collecting population health information. Much official medical data now comes from the hospital information system (HIS), which includes the electronic medical record system (EMRS), laboratory information system (LIS), picture archiving and communication system (PACS), radiology information system (RIS), clinical decision support system (CDSS), and others. Beyond these data sets, many medical devices record vital signs such as ECG data, blood oxygenation, blood pressure, pulse, and body temperature. Health information can even be gathered from social media platforms and search engines. All of this data can help medical workers and researchers make well-informed treatment decisions.
Famously, in 2009 Google predicted the influenza A (H1N1) outbreak almost two weeks ahead of the US Centers for Disease Control and Prevention (CDC). Google used big data techniques to crawl relevant search queries from its users and detect the outbreak. More specifically, infectious disease information drawn from user data falls into two categories: active collection and passive collection. In passive collection, Google gathers the data that users submit periodically and analyzes it for the current situation or future trends. Active collection means analyzing tweets, microblog posts, or search history records, which update much faster, for disease prediction.
Apart from Google, microblog posts on other social media platforms and search history records can also serve as an early warning sign of a disease outbreak. There are several ways to collect posts from social media: using the public APIs provided by the platforms themselves, writing your own crawler, or, if you would rather avoid coding altogether, using an automatic web crawler. By filtering keywords out of the microblog posts, data scientists can build a predictive influenza model from the keyword features using the LASSO algorithm. Moreover, because prolonged exposure to a pathogen is linked to a higher chance of infection, tracking social media activity and location data while a disease is spreading helps reveal how it is transmitted.
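To make the keyword-plus-LASSO idea concrete, here is a minimal sketch in plain Python. It is not the model any particular team used: the keyword list, the weekly counts, the case numbers, and the function names (keyword_counts, lasso_fit) are all made up for illustration. The sketch counts keyword mentions in posts, then fits an L1-penalized linear regression by coordinate descent so that keywords with little predictive value are pushed toward a weight of zero.

```python
def keyword_counts(posts, keywords):
    """Count how many posts mention each keyword (case-insensitive substring match)."""
    lowered = [post.lower() for post in posts]
    return [sum(kw in text for text in lowered) for kw in keywords]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the core step of LASSO."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, n_sweeps=200):
    """Coordinate descent for (1/2n) * sum((y - b - Xw)^2) + alpha * sum(|w|).

    Runs a fixed number of sweeps; it does not check for convergence.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the residual that ignores j
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                pred_without_j = b + sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
        # The intercept is not penalized: set it to the mean residual.
        b = sum(y[i] - sum(w[k] * X[i][k] for k in range(p)) for i in range(n)) / n
    return w, b


# Hypothetical data: weekly mention counts per keyword and reported flu cases.
KEYWORDS = ["fever", "cough", "concert"]
WEEKLY_COUNTS = [
    [12, 30, 5], [15, 35, 4], [30, 60, 6],
    [45, 80, 5], [50, 95, 7], [20, 40, 6],
]
WEEKLY_CASES = [100, 120, 260, 380, 430, 170]

if __name__ == "__main__":
    weights, intercept = lasso_fit(WEEKLY_COUNTS, WEEKLY_CASES, alpha=1.0)
    for keyword, weight in zip(KEYWORDS, weights):
        print(keyword, round(weight, 3))
    print("intercept", round(intercept, 3))
```

In practice one would more likely standardize the features and use an existing implementation such as scikit-learn's Lasso, choosing alpha by cross-validation; the hand-written loop above is only meant to show what the penalty does.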
Many public health investigations indicate that some diseases may be related to genotype, lifestyle, and physical symptoms. Promisingly, genetic information and medical history records can be explored to head off an underlying disease by designing a personalized treatment during the incubation period. For example, the Mayo System, a data analysis platform for data scientists, was developed to store and analyze patients' historical records and to customize a therapeutic plan for people in need. By analyzing symptoms and other historical records, medical workers can retrieve matching diagnosis information from the data analysis system and then propose an effective treatment plan.
At present, much of the exorbitant cost of health care comes from failed medical cases. According to the American Medical Association, 98,000 deaths that could have been avoided have occurred in America. America has spent over 1,700 billion dollars on the health care industry.
In addition, treatment cost and length of hospitalization can be used to build an expense monitoring model that detects abnormal data with data mining techniques, thus preventing the abuse of medical resources. For example, abnormal bills can be filtered out by the rules in the analytics tool's rule base, which then sends warnings to supervisors so they can head off a range of problems, including exorbitant charges, mismatched diagnoses, and misuse of medical equipment and instruments.
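The rule-based filtering described above can be sketched as follows. Everything here is an assumption made for illustration: the field names, the thresholds (three times the median charge, 30 days of stay), the diagnosis-to-procedure table, and the sample bills are invented, and a real system would tune its rules against actual billing data.

```python
from statistics import median


def flag_bills(bills, allowed_procedures, max_ratio=3.0, max_stay_days=30):
    """Return (bill id, reasons) for every bill that breaks at least one rule."""
    # Baseline: the median charge among bills with the same diagnosis.
    charges_by_dx = {}
    for bill in bills:
        charges_by_dx.setdefault(bill["diagnosis"], []).append(bill["charge"])
    baseline = {dx: median(charges) for dx, charges in charges_by_dx.items()}

    flagged = []
    for bill in bills:
        reasons = []
        if bill["charge"] > max_ratio * baseline[bill["diagnosis"]]:
            reasons.append("exorbitant charge")
        if bill["stay_days"] > max_stay_days:
            reasons.append("unusually long stay")
        if bill["procedure"] not in allowed_procedures.get(bill["diagnosis"], set()):
            reasons.append("procedure does not match diagnosis")
        if reasons:
            flagged.append((bill["id"], reasons))
    return flagged


# Hypothetical lookup table and bills, for illustration only.
ALLOWED = {"flu": {"antiviral"}, "fracture": {"cast"}}
SAMPLE_BILLS = [
    {"id": "B1", "diagnosis": "flu", "procedure": "antiviral", "charge": 200, "stay_days": 1},
    {"id": "B2", "diagnosis": "flu", "procedure": "antiviral", "charge": 220, "stay_days": 2},
    {"id": "B3", "diagnosis": "flu", "procedure": "antiviral", "charge": 2000, "stay_days": 2},
    {"id": "B4", "diagnosis": "flu", "procedure": "mri", "charge": 210, "stay_days": 1},
    {"id": "B5", "diagnosis": "fracture", "procedure": "cast", "charge": 900, "stay_days": 45},
]

if __name__ == "__main__":
    for bill_id, reasons in flag_bills(SAMPLE_BILLS, ALLOWED):
        # In a real deployment this is where a warning would go to a supervisor.
        print(bill_id, "->", ", ".join(reasons))
```

The median is used as the baseline rather than the mean so that a single inflated bill does not raise the threshold it is being compared against.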