

Impact of Zombie Rate Filtering For TiC Data


Matthew Robben

Published

3/14/2025

We are seeing healthcare organizations, researchers, and policymakers start to rely on data from the Transparency in Coverage (TiC) price transparency dataset to drive negotiations and make decisions. A frequently quoted summary statistic is the median of the set of datapoints in a given market. However, it’s no secret that the raw, unfiltered TiC data often contains unbillable codes for the listed providers—datapoints in the set that do not reflect what is realistic or possible, AKA 'zombie' rates.

In a prior post, we covered the Serif Health approaches for zombie rate filtering, and the volume of data impacted by different filtering approaches.

Our question for this blog post is: what impact does all this data in the raw file have on the resulting distributions, and where do distributions land when the data is removed?

To illustrate the significance of filtering, let's compare unfiltered ("raw") data with filtered datasets that remove zombies based on taxonomic likelihood and actual claims history, and see where the medians end up.

Recap: Unbillable Codes in Price Transparency Data

We’ve covered this before, but a quick recap: when health plans publish their machine-readable files, they generate the files off of fee schedules, which list every code and thus include prices for both billable and unbillable services for any given provider specialty. Unbillable codes in the TiC listings will not correspond to actual claims paid by insurers or patients, as attempting to bill those codes would likely lead to a claim denial in the real world.

Serif Health has multiple methodologies to address this issue, both derived from commercial claims. We can filter the data using taxonomic likelihood, the probability that a provider with a specific taxonomy bills the code (or its computed inverse), or we can use claims to filter exactly on which providers have historically billed the code in question. This yields two computed columns in our data lake, is_billed_taxonomy and is_billed_claims, which we can use independently or in combination to gain confidence that a given row should be included.
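The two filter columns described above can be sketched in a few lines of pandas. This is a minimal illustration, not our production pipeline: the frame, the NPI values, and the rates are toy data, and only the column names is_billed_taxonomy and is_billed_claims come from the post.

```python
import pandas as pd

# Toy rate rows; in practice these come from a TiC data lake at much larger scale.
rates = pd.DataFrame({
    "npi": [1, 2, 3, 4],
    "billing_code": ["27447"] * 4,
    "negotiated_rate": [150.0, 9800.0, 12500.0, 14000.0],
    "is_billed_taxonomy": [False, True, True, True],  # provider type can bill the code
    "is_billed_claims": [False, False, True, True],   # provider has actually billed it
})

# Taxonomy filter alone: drop provider types that cannot bill the code.
taxonomy_filtered = rates[rates["is_billed_taxonomy"]]

# Combined filter: require both signals for the highest confidence.
claims_filtered = rates[rates["is_billed_taxonomy"] & rates["is_billed_claims"]]

print(rates["negotiated_rate"].median())            # raw median: 11150.0
print(claims_filtered["negotiated_rate"].median())  # filtered median: 13250.0
```

Even in this toy example, dropping the zombie row shifts the median upward, which is the effect the experiment below measures at scale.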

Experimental Methodology

Ok, so, let’s do an experiment. We’ll take raw March 2025 TiC data from the biggest PPO networks for each of the major commercial plans (BCBS Blue Card, Aetna OAMC, Cigna OAP, and United Choice Plus). We will extract all records for three very common procedure billing codes, aggregate the results into distributions (dropping any completely identical records or records with zero or invalid field values), and grab the 50th percentile value.

We’ll use the following parameters in our query:

  • Modifier list is null
  • Negotiation types of percentage and per_diem are excluded
  • Arrangement is ‘ffs’ (fee for service)
  • Group by payer, code, billing_class

We won’t try to deduplicate data further (removing same ein, same rate, different npi list or site combos) or use ein/npi counting methods - just build distributions over the records that remain. While we recognize this is an artificial method that conflates different facility types together and can be biased by the methodology used in generating the posting, we’re trying to show an example in a blog and not pretend to play actuary or produce data for use in an actual negotiation. 

Finally, we’ll repeat that query applying our taxonomic filter (removing provider types that are unable to bill the code) and our claims filter (removing any providers who have not recently billed the code). 
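For concreteness, the query parameters above can be sketched as a pandas function. This is a hedged sketch, not the actual query: the column names (modifier_list, negotiation_type, arrangement, negotiated_rate) are assumptions modeled on typical TiC machine-readable-file fields, and the real pipeline presumably runs against a data lake rather than an in-memory frame.

```python
import pandas as pd

def median_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the query parameters from the post and return grouped medians."""
    filtered = df[
        df["modifier_list"].isna()                                  # modifier list is null
        & ~df["negotiation_type"].isin(["percentage", "per_diem"])  # exclude these types
        & (df["arrangement"] == "ffs")                              # fee-for-service only
        & (df["negotiated_rate"] > 0)                               # drop zero/invalid rates
    ].drop_duplicates()                                             # drop identical records

    return (
        filtered.groupby(["payer", "billing_code", "billing_class"])["negotiated_rate"]
        .median()
        .reset_index(name="median_rate")
    )
```

Repeating the same call on a frame pre-filtered by is_billed_taxonomy or is_billed_claims then gives the filtered medians compared below.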

OK now, enough boilerplate, let’s get to some data!

Median Shift Results

[Figure: Median rates as each filter approach is applied]

This data tells an important story:

  1. The taxonomy filtering method tends to increase median price, and claims filtering tends to increase it further. In the most egregious example, BCBS’s raw median is $1,800.00, but after taxonomy filtering it jumps to $7,500.00. The claims-filtered median jumps further to $13,800.65, a 666.7% increase from the raw value. I was pleasantly surprised to find this wasn't a hard and fast rule - two of the distributions in the set had a median move to the left with filtering. Both were professional rates, but on different codes and different payers, indicating that not all zombies are necessarily bad (distribution-impacting) zombies?
  2. Institutional rates are more impacted by filtering than professional rates. Average delta from filtering for institutional rates is a whopping 163%, while for professional, it’s only 7.7%. What’s interesting about this is that we removed a lot more professional data in our filter passes - 93% of professional records removed vs. only 67% of institutional (see table below).
[Table: Count deltas as each filter is applied]
  3. Raw institutional medians are too low to be realistic, and some institutional medians appear too low even after filtering. 2025 CMS hospital (OPPS schedule) baseline prices for these codes are reproduced below. Only two payer/code combinations had their distribution median above the baseline in the raw data. After claims filtering, several payer/code combinations, like Aetna’s median rate for 27447, still don’t hit the 2025 CMS OPPS baseline payment rate. Initially I thought this might be due to mixing facility types, but even after adding a taxonomy filter to our query to restrict results to just hospitals, the median only gets to $4,409. It is highly unlikely for a hospital to accept a contract below Medicare payment amounts, let alone the majority of them. There are a few possible explanations - exotic contract structures, use of revenue or custom codes, shifting reimbursement to an implant code, etc. - but that’s fodder for a future post.
[Table: 2025 OPPS payment rates]
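The 666.7% figure quoted for the BCBS example is a plain percent-change computation over the medians from the post; a two-line sketch makes the arithmetic explicit.

```python
def pct_increase(old: float, new: float) -> float:
    """Percent increase from old to new."""
    return (new - old) / old * 100

# BCBS example from the post: raw median $1,800.00 -> claims-filtered $13,800.65
print(round(pct_increase(1800.00, 13800.65), 1))  # 666.7
```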

Conclusion: Filtering Price Transparency Data is Critical

Price transparency data is only as useful as its accuracy. If left unfiltered, unbillable data points substantially affect price distributions and misrepresent the true cost landscape. Through taxonomic-based and, more importantly, claims-based filtering, one can obtain a more accurate view of healthcare pricing and ensure financial decisions are well-informed.

In the evolving landscape of price transparency, filtering is not just a technical necessity—it’s a fundamental requirement for meaningful analysis. Get in touch with us at Serif Health if your organization needs our expertise with market analytics. We’re happy to put our methodology to work for you!