From Raw CMS PUFs to a Public-Safe Tableau Story: A Medical Economics Portfolio Build

  • Kailey Northam
  • Jan 10
  • 4 min read

Medical economics analysis often starts with a simple question—why is spend high, and where is it changing?—and quickly turns into three harder ones:


  1. Is spending growth driven by utilization, unit cost, or both?

  2. Is spend concentrated among a small subset of providers (outlier risk)?

  3. How much variation is explained by geography, even after standardization?


The sections below explain how the data was processed, what was published publicly, and what the findings imply. Click the photo to navigate to an interactive story.


This project uses CMS Public Use Files (PUFs) to answer those questions at a macro level, then publishes a public-safe, reproducible portfolio deliverable: a Tableau Story supported by documented datasets, SQL queries, and data dictionaries.


What I analyzed


I focused on two CMS PUF families:

  • Medicare Physician & Other Practitioners PUF (2022–2023): Service utilization and spending at an aggregated level, used to measure year-over-year changes and concentration patterns. CMS organizes these datasets by provider and by service, with services identified by HCPCS codes; Level I HCPCS codes are the CPT codes maintained by the AMA.

  • Medicare Geographic Variation PUF (2014–2023): State-level standardized per-capita spending trends and a 2023 standardized snapshot. CMS standardization is intended to remove geographic payment-rate differences so resource use comparisons are more meaningful across regions.


A key constraint: although the PUFs are public, CPT/HCPCS code content is subject to licensing restrictions, and I did not want my portfolio repo to distribute restricted code content. The CMS HCPCS overview listed in the references describes the HCPCS/CPT relationship and the AMA-maintained CPT component.


Approach: building a medical economics “metric layer”


I built this project the way I would in an analytics job: load raw data locally, create a consistent metric layer, compute insights, then publish only the safe artifacts.


Workflow

  1. Local ingest: Raw CMS PUF files were downloaded and loaded into a local SQLite database.

  2. Transform & normalize: I standardized datatypes, created derived totals where needed, and built clean analysis tables.

  3. Concentration modeling: I generated a provider concentration curve to quantify how much spend is captured by the top provider percentiles.

  4. Geographic variation analysis: I used the Geographic Variation PUF standardized per-capita measures to compare states over time and in a 2023 snapshot. CMS publishes methods explaining why standardization is used for geographic comparisons.

  5. Publish public-safe outputs: I exported aggregated CSVs and public SQL scripts to GitHub and created data dictionaries so the work is reviewable and reproducible.
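Steps 1 and 2 can be sketched in a few lines of Python using the standard-library sqlite3 module. This is a minimal illustration, not the project's actual pipeline: the column names (provider_id, year, services, allowed_amt), the table names, and the sample rows are all hypothetical, and an in-memory database stands in for the local SQLite file.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a raw extract; real PUF column names differ.
RAW_CSV = """provider_id,year,services,allowed_amt
P001,2022,100,"5,000.00"
P001,2023,120,6300.00
P002,2022,40,1200.50
P002,2023,35,1100.00
"""


def to_number(text):
    """Normalize a numeric string by stripping thousands separators."""
    return float(text.replace(",", ""))


def build_metric_layer(raw_csv):
    """Load raw rows into SQLite, then build a clean yearly metrics table."""
    conn = sqlite3.connect(":memory:")  # assumption: a file path would be used locally
    conn.execute(
        "CREATE TABLE raw_services ("
        "provider_id TEXT, year INTEGER, services REAL, allowed_amt REAL)"
    )
    rows = [
        (r["provider_id"], int(r["year"]),
         to_number(r["services"]), to_number(r["allowed_amt"]))
        for r in csv.DictReader(io.StringIO(raw_csv))
    ]
    conn.executemany("INSERT INTO raw_services VALUES (?, ?, ?, ?)", rows)
    # Derived totals: one row per year with volume, spend, and unit cost.
    conn.execute(
        """
        CREATE TABLE yearly_metrics AS
        SELECT year,
               SUM(services)                    AS total_services,
               SUM(allowed_amt)                 AS total_allowed,
               SUM(allowed_amt) / SUM(services) AS allowed_per_service
        FROM raw_services
        GROUP BY year
        """
    )
    return conn


conn = build_metric_layer(RAW_CSV)
yearly = conn.execute(
    "SELECT year, total_services, total_allowed FROM yearly_metrics ORDER BY year"
).fetchall()
```

Keeping the type cleanup in one helper and the aggregation in SQL mirrors the split described above: the raw table stays local, and only aggregated tables such as yearly_metrics would be candidates for export.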

The deliverables (and what’s public)


Click the photo to open a link to my GitHub repository.

My GitHub repo is intentionally structured so a reviewer can follow the work without accessing raw files:


  • data/public/ — analysis-ready CSV exports used in Tableau

  • sql/public/ — SQL queries used to produce export tables

  • docs/Dictionaries/ — dataset dictionaries for each public export

  • docs/deliverables/ — Tableau Story PDF + downloadable workbook (TWBX) + project write-up

  • docs/executive_summary.md — one-page summary for quick review


This is a deliberate “portfolio standard”: it demonstrates end-to-end skill (data, SQL, BI, documentation) while keeping restricted content out of a public repository.


Key findings (high level)


From the published Tableau outputs:

  • Allowed spend increased from 2022 to 2023, and service volume increased as well.

  • Provider spend is highly concentrated, and that concentration increased year over year (the top provider percentiles captured a larger share of spend in 2023 than in 2022).

  • Service concentration is lower than provider concentration, suggesting “who delivers care” (provider distribution/outliers) is often a bigger spend driver than service mix alone.

  • Geographic variation remains large even after standardization, and state trend lines show persistent divergence across 2014–2023.
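The first finding says that both spend and volume rose, which leaves open how much of the increase each one explains. One conventional way to frame that question is a volume/unit-cost decomposition, sketched below. It is not part of the published outputs, the numbers are illustrative rather than taken from the PUFs, and it works at the aggregate level only, so it does not separate out a service-mix component.

```python
def decompose_spend_change(services_0, spend_0, services_1, spend_1):
    """Split a change in spend into volume, unit-cost, and interaction effects.

    Uses the base period as the reference point:
      volume effect    = change in services * base unit cost
      unit-cost effect = change in unit cost * base services
      interaction      = change in services * change in unit cost
    The three effects sum to the total change in spend.
    """
    unit_cost_0 = spend_0 / services_0
    unit_cost_1 = spend_1 / services_1
    d_services = services_1 - services_0
    d_unit_cost = unit_cost_1 - unit_cost_0
    return {
        "volume_effect": d_services * unit_cost_0,
        "unit_cost_effect": d_unit_cost * services_0,
        "interaction": d_services * d_unit_cost,
        "total_change": spend_1 - spend_0,
    }


# Illustrative numbers only, not values from the PUFs.
effects = decompose_spend_change(
    services_0=100, spend_0=1000.0, services_1=110, spend_1=1210.0
)
```

In this toy case volume and unit cost each rise 10%, so they contribute equally, with a small interaction term left over.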


Why this matters for medical economics


This type of analysis is valuable because it aligns directly with how payers and health systems think about cost:


1) Concentration tells you where risk lives

When spend is concentrated, broad, blunt interventions are inefficient. Instead, the practical question becomes: Which segment of providers accounts for the majority of spend—and how stable is that pattern year over year?


That’s why concentration curves are useful: they quantify whether you’re dealing with a diffuse problem (many small contributors) or a concentrated one (few large drivers).
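A concentration curve is simple to compute once spend is aggregated per provider. The sketch below is one minimal way to do it, assuming a plain list of per-provider spend totals; the function name, the percentile cut points, and the sample values are hypothetical and not the project's actual query.

```python
import math


def concentration_curve(spend_by_provider, percentiles=(1, 5, 10, 25, 50)):
    """Return {p: share of total spend captured by the top p% of providers}."""
    ranked = sorted(spend_by_provider, reverse=True)
    total = sum(ranked)
    curve = {}
    for p in percentiles:
        # Always include at least one provider, even for very small percentiles.
        k = max(1, math.ceil(len(ranked) * p / 100))
        curve[p] = sum(ranked[:k]) / total
    return curve


# Illustrative spend totals for ten hypothetical providers.
sample_spend = [500, 200, 100, 50, 50, 40, 30, 20, 5, 5]
curve = concentration_curve(sample_spend, percentiles=(10, 50))
```

Here the top 10% of providers (one provider) captures half of all spend and the top 50% captures 90%, which is the signature of a concentrated problem. A diffuse one would put the top-10% share close to 10%.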


2) Standardized geographic variation separates “price” from “resource use”

CMS standardization is designed to remove geographic differences in payment rates so comparisons are more meaningful across regions. If large variation persists after standardization, it points to differences that may be driven by utilization patterns, practice style, access, or population characteristics.


External research also finds that Medicare per-capita spending variation can remain wide even after accounting for price and health differences.
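To put a number on "large variation", two simple dispersion measures are the ratio of the highest to the lowest state and the coefficient of variation. The sketch below assumes a mapping of state to standardized per-capita spend for a single year; the state labels and dollar values are placeholders, not figures from the Geographic Variation PUF.

```python
import statistics


def dispersion_summary(per_capita_by_state):
    """Summarize cross-state spread in standardized per-capita spend."""
    values = list(per_capita_by_state.values())
    mean = statistics.fmean(values)
    return {
        "highest": max(per_capita_by_state, key=per_capita_by_state.get),
        "lowest": min(per_capita_by_state, key=per_capita_by_state.get),
        "max_min_ratio": max(values) / min(values),
        # Population standard deviation relative to the mean (unitless).
        "coef_variation": statistics.pstdev(values) / mean,
    }


# Placeholder values for illustration only.
snapshot = {"State A": 8000.0, "State B": 10000.0, "State C": 12000.0}
summary = dispersion_summary(snapshot)
```

Running the same summary for each year from 2014 to 2023 would show whether the spread is narrowing or persisting, which is what the state trend lines convey visually.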


3) This is the lens employers want to see

For roles in medical economics, finance, or payer analytics, this project shows:

  • Comfort working with large administrative datasets

  • The ability to build a clean metric layer

  • Skill in SQL + BI storytelling

  • Strong documentation + governance habits


Limitations (what this project does not claim)

  • PUFs are aggregated; this is not member-level claims adjudication.

  • It doesn’t include risk adjustment or patient-level clinical outcomes.

  • Findings are best interpreted as macro-level signals, useful for prioritization and hypothesis generation.


Next steps (how I’d expand this in a real role)

  • Add additional years for physician services (trend beyond 2022–2023).

  • Decompose spend growth into utilization vs unit cost vs mix components.

  • Add statistical outlier flags for provider spend using robust methods (IQR/median absolute deviation), still staying within public-safe aggregation rules.

  • Layer in policy context using sources like MedPAC’s Medicare spending data books and reports.
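As a sketch of the robust outlier flag mentioned above, the example below uses the modified z-score based on the median absolute deviation (MAD), with the commonly cited cutoff of 3.5. The function name and sample values are hypothetical, and the cutoff is an assumption that would need tuning against real provider spend distributions.

```python
import statistics


def mad_outlier_flags(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median of absolute deviations from the median.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # No spread around the median; the score is undefined, so flag nothing.
        return [False] * len(values)
    return [abs(0.6745 * (v - median) / mad) > threshold for v in values]


# Illustrative per-provider spend values, with one obvious outlier.
sample = [10, 12, 11, 13, 12, 100]
flags = mad_outlier_flags(sample)
```

Because the median and MAD are barely moved by the extreme value, the outlier cannot mask itself the way it can with mean and standard deviation based z-scores.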


References

  • CMS. Medicare Data for the Geographic Variation Public Use File (Methods Paper).

  • CMS. Geographic Variation in Standardized Medicare Spending (State tool/definition).

  • CMS. Medicare Physician & Other Practitioners PUF overview + data dictionaries (provider/service).

  • CMS. Healthcare Common Procedure Coding System (HCPCS) overview (CPT as Level I maintained by AMA).

  • MedPAC. Health Care Spending and the Medicare Program (Data Book).

  • KFF. Medicare Spending Per Beneficiary (State Health Facts).

  • Zhang Y, et al. Geographic Variation in Medicare Per Capita Spending… (peer-reviewed, PubMed Central).

  • KFF. Medicare 101 (program spending context).

 
 
 
