Big Data Analytics

Transforming radiation therapy through "big data" analytics

Clinics should routinely be able to assemble datasets rapidly to address practice quality improvement (PQI), routine clinical translational research (CTR), and other questions that arise, in order to aid the patients in our clinics today. We enter a wealth of information into electronic health records (EHR) and radiation oncology information systems (ROIS) on a daily basis. Seamlessly using this information about past experience to help today’s patients should be the rule rather than the exception.

To reach that goal we have to overcome technical, cultural, and clinical process barriers. “Data Farming” is a realistic and functional conceptualization for shaping expectations of the type of work and commitment needed to construct reliable databases supporting practice quality improvement and clinical translational research. The objective is to harvest large volumes of data that we can use as raw material for analyzing health care patterns and outcomes. Like the farmer who considers the implications of every part of the sowing, growing, and harvesting process for the yield of high quality grain, we need to examine how best to use the tools available in our electronic systems to increase the volume of actionable data that is readily available. High quality data sources rarely exist independent of our efforts, just waiting to be found or mined. They result from intent and a dedication of resources to grow these data sources and to curate them by weeding out misleading information.

A “Data Farming” conceptualization also helps highlight five “Vs” we have found to be prominent in technology and process discussions of big data in radiation oncology.

  • Variability: A given data type (e.g. weight, labs, DVH curves) may need to be aggregated from multiple sources based on criteria such as time range, stakeholder group, or vendor. Differences in location, access requirements, storage technology, nomenclature, formatting, units, and data quality contribute to the complexity of extract, transform, and load (ETL) operations.
  • Veracity: Incorrect data values or missing data undermine our ability to draw accurate statistical conclusions about distributions of values and relationships between data elements. Many PQI and CTR efforts focus on data at the outer range or even in the tails of distributions where the “law of averages” cannot wash out errors.
  • Volume: Storage and processing requirements for data elements can drive technology decisions when they are very large (e.g. >1 petabyte). Thresholds for this classification evolve rapidly as technologies progress.
  • Velocity: Data input stream rates can drive technology decisions when they are very large (e.g. >0.1 terabyte/sec). Processing speeds across the analytics, interface, and aggregation tiers determine how tractable it is to incorporate analytics into clinical process flows. Thresholds for this classification evolve rapidly as technologies progress.
  • Value: Implementing big data solutions carries high costs: financial, technical, staffing resource allocation, process change, and political capital. Obtaining the needed support depends on addressing cost versus benefit for PQI and CTR efforts.
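
The Variability and Veracity items above can be made concrete with a small sketch of one ETL step. This is a minimal illustration under stated assumptions, not a description of any particular system: the field names in NAME_MAP, the record layout, the harmonize function, and the plausibility window are all hypothetical. It maps source-specific names for patient weight onto one common name, converts units to kilograms, and flags missing or implausible values instead of silently dropping them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical source-specific field names mapped to one common name.
NAME_MAP = {"wt": "weight", "Weight (lb)": "weight", "body_weight_kg": "weight"}

# Conversion factors from each source unit to kilograms.
TO_KG = {"kg": 1.0, "lb": 0.45359237, "g": 0.001}

# Assumed plausibility window in kg, for illustration only.
PLAUSIBLE_KG = (20.0, 300.0)


@dataclass
class Record:
    patient_id: str
    name: str
    value_kg: Optional[float]
    flag: str  # "ok", "missing", "unknown_unit", or "implausible"


def harmonize(raw: dict) -> Record:
    """Normalize one raw weight entry and attach a data-quality flag."""
    # Variability: reconcile nomenclature across sources.
    name = NAME_MAP.get(raw["name"], raw["name"])
    value = raw.get("value")
    if value is None:
        return Record(raw["patient_id"], name, None, "missing")
    # Variability: reconcile units across sources.
    factor = TO_KG.get(raw.get("unit"))
    if factor is None:
        return Record(raw["patient_id"], name, None, "unknown_unit")
    kg = round(float(value) * factor, 2)
    # Veracity: flag values outside the assumed plausible range.
    low, high = PLAUSIBLE_KG
    flag = "ok" if low <= kg <= high else "implausible"
    return Record(raw["patient_id"], name, kg, flag)


if __name__ == "__main__":
    rows = [
        {"patient_id": "A", "name": "wt", "value": 154, "unit": "lb"},
        {"patient_id": "B", "name": "body_weight_kg", "value": None, "unit": "kg"},
        {"patient_id": "C", "name": "Weight (lb)", "value": 7000, "unit": "kg"},
    ]
    for row in rows:
        print(harmonize(row))
```

Keeping the flag alongside the value, rather than discarding flagged rows, lets later analyses decide how to treat questionable entries, which matters when a study looks at the tails of a distribution.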

The effort required to make routine use of Big Data a part of clinical reality in Radiation Oncology is similar to the effort required to create a productive farm yielding large volumes of high quality grain. We are both the consumers and the producers of the yield that helps us improve patient care. Farming cultures evolved their processes and technologies from those sufficient for subsistence to those enabling large-scale automation. An analogous evolution in Radiation Oncology data is within reach. It requires a community effort leveraging the skills, insights, and data use needs of all clinical and IT staffing groups as well as professional societies.

A vital part of that community effort is the collaborative development and adoption of standards by multiple institutions, professional organizations, vendors, and clinics, which increases the volume and availability of data sets created as part of routine processes. Engagement by government as part of these communities is needed to overcome barriers to combining these data sets, so that the information learned through treating patients today can be used to improve treatments and health care policies for the patients of tomorrow. At UM we are engaged on all of these fronts to enable harvesting the experience we’ve gained from our past to help the patients in our future.