Research Data & Analytics


To be a trusted data and analytic resource for healthcare research and evaluation, influencing standards for excellence in quality and innovation, to improve the health of our members and communities.


Data and Analytics - the backbone of health research. The mission of IHR's biostatistics and data specialist community is to acquire ever-changing research data, develop data domain knowledge, and produce insights into important questions using innovative methods for our research, evaluation, and operational partners.

Research Analytics & Biostatistics

IHR analysts turn large volumes of data into actionable insights using a wide range of methodologies. The IHR employs a large team composed of Biostatisticians, Epidemiologists, and a Health Economist, in addition to analysts who are adept in evaluation methodologies, data visualization, natural language processing, and machine learning techniques. IHR analysts consult with both researchers and operational partners on appropriate and innovative methodologies. They apply scientific rigor both to grant-funded research, which adds to the body of evidence for improving public health, and to the support of operational implementations.

Research Data Management

IHR data specialists perform the data management that makes our research possible. Research is evidence-based, and evidence is rooted in the rich data collected in a vast array of clinical and administrative data systems underlying clinical practice and the provision of insurance. Data must be curated and transformed in a way that supports each research hypothesis.

Data specialists are experts at the Extract/Transform/Load (ETL) process and adept at locating and learning the intricacies of new data sources based on the needs of a study. They are responsible for ensuring that data pulls are complete and accurate, and they help inform research study needs by understanding the data caveats within our clinical and administrative systems that could bias an analysis.

To create efficiency across large numbers of similar research data requests, a team of research data specialists maintains the Virtual Data Warehouse (VDW) - a product created by extracting and pre-processing disparate data from multiple data sources into a research-ready format that supports research data requests.

The Virtual Data Warehouse (VDW)

The VDW has been the primary data source for hundreds of grant studies since its creation over 12 years ago. It is a rich, quality-checked clinical information data mart that combines many complex data sources into an easy-to-use format for research analysts and programmers. Source data include the electronic health record, administrative claims system, state death data, tumor registry, hospital machines, and many more.

The VDW is created by cleaning, standardizing, and combining data from these different systems into 'content areas' (e.g., Enrollment, Demographics, Utilization, Death) that may be easily linked to each other. The table below lists the broad set of data content areas harmonized within the VDW to date. New data content areas are considered each year to support new areas of research.

Data Content Areas Harmonized within the VDW

Content Area | Description
Utilization | Encounters, diagnoses, and procedures from both Kaiser and non-Kaiser providers
Demographics | Birthdate, gender, race, and ethnicity
Enrollment | Member periods of enrollment, enrollment plan types
Benefits | Dollar and percentage of copay, deductible, and coinsurance; types of benefits
Vital Signs | Height, weight, body mass index, and blood pressure
Census | Geocoded neighborhood-level information on education, income, housing, and race
Geographically Enriched Member Socio-Demographics | Race probabilities and geographic descriptors
Pharmacy | Outpatient dispensing, including those from outside claims
Ordered Meds | Outpatient prescribing and associated diagnoses
Laboratory | Completed tests and results
Social History | Tobacco, alcohol, and illegal drug use; sexual behavior; and contraceptive use
Death | Death date and state-certified cause of death
Providers | Specialty and provider type of internal and external providers
Problem List | Current status of patient's problem list
Language | Spoken or written language(s) of member
Pregnancy | Pregnancy outcome episode and mother-baby linkage
Tumors | Data documenting confirmed neoplasms: size, histology, stage, etc.
Infusion | Ordered and dispensed drugs at infusion center and treatment plan
Bone Mineral Density | Calculated BMD, t-score, scan date/time, location, and fracture risk scores
Patient Reported Outcomes | Self-administered questionnaires, including the Brief Pain Inventory (BPI) and the Patient Health Questionnaire (PHQ)
Spirometry Results | Completed spirometry tests and results
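
As a rough illustration of how content areas can be linked to each other, the sketch below joins two toy tables on a shared member identifier. The field names (member_id, birth_date, enr_start, enr_end), the helper link_on_member, and all records are hypothetical and are not taken from the VDW specifications; this is a minimal Python sketch under those assumptions, not the IHR's actual tooling.

```python
from datetime import date

# Hypothetical toy rows; real VDW tables and column names will differ.
demographics = [
    {"member_id": "A1", "birth_date": date(1980, 5, 17)},
    {"member_id": "B2", "birth_date": date(1952, 11, 3)},
]
enrollment = [
    {"member_id": "A1", "enr_start": date(2015, 1, 1), "enr_end": date(2018, 12, 31)},
    {"member_id": "A1", "enr_start": date(2020, 1, 1), "enr_end": date(2022, 6, 30)},
    {"member_id": "B2", "enr_start": date(2010, 1, 1), "enr_end": date(2023, 12, 31)},
]


def link_on_member(left, right, key="member_id"):
    """Inner-join two lists of dict rows on a shared key."""
    # Index the left table by key so each right row finds its matches quickly.
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# One output row per enrollment period, carrying the member's demographics.
linked = link_on_member(demographics, enrollment)
```

Because every content area in this sketch carries the same key, the same helper could link any pair of tables, which is the practical benefit of a harmonized model.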

What Makes the VDW Unique

What makes the VDW unique within the KPCO data landscape is that it is designed specifically with research needs in mind. The VDW can provide a full picture of any given patient's interactions with our health care system and their health status over the duration of their membership, which allows it to be used to answer a wide variety of research questions. The VDW can also provide insight into patient attributes and coverage characteristics that may influence the use of our health care system or health outcomes. Data spanning nearly 20 years of historical KPCO membership and utilization allow the IHR to conduct both point-in-time (cross-sectional) and longitudinal analyses.
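
The difference between a point-in-time and a longitudinal question can be sketched with enrollment periods. The example below is a hedged illustration only: the tuple layout, the function names enrolled_on and days_enrolled, and the dates are all assumptions made for this sketch, and it further assumes that a member's periods do not overlap.

```python
from datetime import date

# Hypothetical enrollment periods: (member_id, start, end), end date inclusive.
periods = [
    ("A1", date(2015, 1, 1), date(2018, 12, 31)),
    ("A1", date(2020, 1, 1), date(2022, 6, 30)),
    ("B2", date(2010, 1, 1), date(2023, 12, 31)),
]


def enrolled_on(periods, as_of):
    """Cross-sectional view: members with a period covering `as_of`."""
    return {member for member, start, end in periods if start <= as_of <= end}


def days_enrolled(periods):
    """Longitudinal view: total enrolled days per member across all periods.

    Assumes a member's periods do not overlap.
    """
    totals = {}
    for member, start, end in periods:
        totals[member] = totals.get(member, 0) + (end - start).days + 1
    return totals


snapshot = enrolled_on(periods, date(2019, 7, 1))  # who was a member that day
follow_up = days_enrolled(periods)                 # how long each was followed
```

In this toy data, member A1 has a gap in coverage during 2019, so A1 drops out of the snapshot but still contributes follow-up time from both periods.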

Another unique quality of the VDW, and one of its key strengths, is the ongoing collaboration for governance, development, and quality assurance efforts with the Kaiser Permanente Center for Effectiveness and Safety Research (CESR) and the Health Care Systems Research Network (HCSRN). CESR's focus is on comparing how well and how safely different preventive services and treatment approaches work within our clinical practice at Kaiser Permanente. HCSRN is a consortium of 20 research centers, including all 8 KP regions, embedded in health plans across the United States and Tel Aviv, Israel. All additions and modifications to the VDW model go through these governing bodies to ensure usability and robustness of the data across research projects and organizations.

While the VDW can be thought of as a data warehouse that combines data from all 20 organizations, there is no central store where data from all sites can be accessed in a single run. This is known as a 'federated' data model, and it is what makes the VDW 'virtual'. Each member organization creates and maintains a structurally identical data model and retains control over its local data, relying on local programmers with unique expertise in their source data systems.

Multi-site research is accomplished through 'distributed programming'. Analytic programs are written by a lead site against VDW specifications and then distributed to participating sub-sites, where they can be easily run against each site's VDW. Results are reviewed and then returned to the lead site. This process preserves privacy for all patients and ensures the quality of the data returned for analytics.
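
The distributed programming workflow described above can be sketched in a few lines of Python. Everything named here is hypothetical: the site labels, the SITE_DATA structure, the analytic_program and run_at_site functions, and the age threshold are invented for illustration. The sketch also assumes that only aggregate counts are returned to the lead site, which the text above does not state explicitly.

```python
# Each site holds its own rows locally; nothing here is real data.
SITE_DATA = {
    "site_a": [{"member_id": "A1", "age": 44}, {"member_id": "A2", "age": 71}],
    "site_b": [{"member_id": "B1", "age": 67}],
}


def analytic_program(local_rows, min_age=65):
    """Program written once by the lead site; returns summary counts only."""
    eligible = [row for row in local_rows if row["age"] >= min_age]
    return {"n_members": len(local_rows), "n_eligible": len(eligible)}


def run_at_site(site):
    """Stand-in for a site programmer running the program on the local VDW."""
    return analytic_program(SITE_DATA[site])


# The lead site combines the returned summaries; in this sketch no
# row-level data leaves a site.
site_results = {site: run_at_site(site) for site in SITE_DATA}
pooled = {
    key: sum(result[key] for result in site_results.values())
    for key in ("n_members", "n_eligible")
}
```

The same program runs unchanged at every site because each site's tables share one structure, which is what the identical data models are meant to enable.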

The standard HCSRN VDW data model is the core of our KPCO VDW. It also serves as a primary source for other common data models managed by the IHR, including: