Zombie Hunting: Filtering Approaches for Price Transparency Data

For those of you who have had the opportunity to analyze payer MRF data as it appears at the source, I’m certain one word has come to mind: noise. As many of our previous blog posts have mentioned, the MRF files are full of “zombie rates”: records reporting a rate for a billing code that none of the providers on that record would bill in the real world. Think of an MRF record for a knee replacement reimbursement rate where the contracting entity is a mental health provider… there’s a 0% chance of that being useful data. In some payer MRF postings, we’ve seen zombie rates make up upwards of 80% of the data. Filtering out these low-fidelity records drastically reduces the (initially unwieldy) size of the data. At Serif Health, we’ve taken several measures to research and implement a solution that cuts through this noise and amplifies the true signal in the data. This blog post gives an overview of our approach.

Making the Case for Utilization Data

Given an MRF record with a billing code and a set of providers (identified by their NPIs), how do you discern whether or not the record is useful? The answer, of course, lies in real-world utilization data (typically, claims). The ideal solution would analyze every paid claim ever incurred in the United States to determine which billing codes each provider has ever billed. Unfortunately, obtaining every single claim record is an extremely difficult (and expensive) endeavor. Even if you somehow got your hands on an exhaustive claims dataset, unifying the data schemas from disparate claims sources for analysis is no trivial task. But what if you had a significant, though not exhaustive, volume of claims data? How could you make statistical inferences from that dataset and extrapolate those learnings to classify zombie rates?

Conservative Method: Filtering by Taxonomy

At Serif Health, we analyzed several billion de-identified claim lines and found that using a provider's NUCC taxonomy code as a proxy for whether that provider could realistically bill a code was highly effective in identifying zombie rates. We did this by mapping the organization or practitioner NPI on each claim record to the NUCC provider taxonomy codes they specialize in. Using the claims dataset, we then computed several conditional probabilities. At a high level, these probabilities sought to answer the following questions:

Given that the provider’s taxonomy is X, what is the probability that they bill code Y? That is, P(Y | X).
- Because this probability is normalized within each taxonomy, it protects against incorrectly filtering out rare taxonomies that very few providers practice.
Given that the billing code is A, what is the probability that the provider’s taxonomy is B? That is, P(B | A).
- Because this probability is normalized within each billing code, it protects against incorrectly filtering out rare billing codes that very few providers bill.
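To make the two probabilities concrete, here is a minimal Python sketch of how they could be estimated from claim lines and turned into a taxonomy-to-code mapping table. It is not Serif Health's production model: the input shapes, the placeholder taxonomy labels (standing in for real NUCC codes), the rule of keeping a pair when either probability clears its threshold, and the threshold values are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Illustrative sketch only. Assumed inputs (hypothetical shapes):
#   claim_lines:    iterable of (npi, billing_code) pairs from claims
#   npi_taxonomies: dict of NPI -> set of taxonomy labels (primary and non-primary)


def conditional_probabilities(claim_lines, npi_taxonomies):
    """Return {(taxonomy, code): (P(code | taxonomy), P(taxonomy | code))}."""
    pair_counts = Counter()
    taxonomy_counts = Counter()
    code_counts = Counter()
    for npi, code in claim_lines:
        # Assumption: a claim from an NPI with several taxonomies counts
        # once toward each of those taxonomies.
        for taxonomy in npi_taxonomies.get(npi, ()):
            pair_counts[(taxonomy, code)] += 1
            taxonomy_counts[taxonomy] += 1
            code_counts[code] += 1
    return {
        (taxonomy, code): (n / taxonomy_counts[taxonomy], n / code_counts[code])
        for (taxonomy, code), n in pair_counts.items()
    }


def build_mapping(probabilities, min_code_given_tax=0.01, min_tax_given_code=0.05):
    """Map billing code -> set of taxonomies allowed to bill it.

    Hypothetical rule and thresholds: keep a pair if EITHER probability clears
    its threshold, so each probability can rescue pairs the other would drop.
    """
    mapping = defaultdict(set)
    for (taxonomy, code), (p_code, p_tax) in probabilities.items():
        if p_code >= min_code_given_tax or p_tax >= min_tax_given_code:
            mapping[code].add(taxonomy)
    return dict(mapping)


# Toy data: an orthopedic surgeon (NPI "1") and a mental health counselor
# (NPI "2") who has one stray knee replacement claim.
claims = [("1", "27447")] * 99 + [("2", "90837")] * 199 + [("2", "27447")]
npi_tax = {"1": {"ORTHO"}, "2": {"MH_COUNSELOR"}}
mapping = build_mapping(conditional_probabilities(claims, npi_tax))
print(mapping)  # {'27447': {'ORTHO'}, '90837': {'MH_COUNSELOR'}}
```

In this toy run the counselor/knee-replacement pair scores 0.005 and 0.01, below both hypothetical thresholds, so it never enters the mapping table.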

After devising a model to filter out provider taxonomy and billing code pairings with low conditional probabilities, we obtained a mapping table of provider taxonomies and the billing codes they can realistically bill with sufficient probability. This mapping is then used to filter every record across every network for which Serif Health ingests data, by applying the following simplified rules to each record:

Take the NPIs listed on the record and look up their taxonomy details (both primary and non-primary) in our provider directory.
Search the aforementioned mapping table for the billing code on the record.
Determine whether each provider has a taxonomy that can realistically bill the code on the MRF record.
Retain only the NPIs with a taxonomy that can bill the code in the record's filtered NPI list. If none of the NPIs can bill the code, remove the record.
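The rules above can be sketched in a few lines of Python. This is a hedged illustration, not Serif Health's pipeline: the record layout, the lookup dictionaries, the placeholder taxonomy labels, and the choice to drop NPIs missing from the directory are all assumptions.

```python
# Illustrative sketch of the per-record rules, under assumed data shapes:
#   record:         dict with "billing_code" and "npis" (the record's NPI list)
#   npi_taxonomies: dict of NPI -> set of taxonomies (primary and non-primary)
#   mapping:        dict of billing code -> set of taxonomies that can bill it


def filter_record(record, npi_taxonomies, mapping):
    """Return a copy keeping only billable NPIs, or None if none remain."""
    billable_taxonomies = mapping.get(record["billing_code"], set())
    kept = [
        npi
        for npi in record["npis"]
        # Keep the NPI if any of its taxonomies may bill this code.
        # Assumption: NPIs missing from the directory are dropped.
        if npi_taxonomies.get(npi, set()) & billable_taxonomies
    ]
    if not kept:
        return None  # no NPI can bill the code: the whole record is a zombie
    return {**record, "npis": kept}


def filter_records(records, npi_taxonomies, mapping):
    """Apply filter_record to every record and drop the ones that vanish."""
    filtered = (filter_record(r, npi_taxonomies, mapping) for r in records)
    return [r for r in filtered if r is not None]


mapping = {"27447": {"ORTHO"}}
npi_tax = {"1": {"ORTHO"}, "2": {"MH_COUNSELOR"}, "3": {"MH_COUNSELOR", "ORTHO"}}
records = [
    {"billing_code": "27447", "rate": 100.0, "npis": ["1", "2", "3"]},
    {"billing_code": "27447", "rate": 90.0, "npis": ["2"]},
]
print(filter_records(records, npi_tax, mapping))
# [{'billing_code': '27447', 'rate': 100.0, 'npis': ['1', '3']}]
```

NPI "3" survives because one of its non-primary taxonomies can bill the code, which is why the lookup considers every taxonomy on file rather than only the primary one.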

Here’s an example of zombie rate tagging in action. Aetna’s national network posting from September has hundreds of rows of rates for knee replacement surgery… for Mental Health Counselors! By applying our filtering methodology, we can add an “is_billable” column to the dataset that flags zombie rates and makes removing such rows trivial.
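Tagging rather than deleting keeps the raw rows available while making removal a one-line filter. The sketch below assumes a hypothetical flattened layout with one NPI per row and reuses the same kind of taxonomy lookup and mapping table as the filtering rules; the labels and NPIs are placeholders.

```python
# Illustrative sketch: flag rows instead of deleting them. Assumes each row
# carries a single "npi" and a "billing_code" (a hypothetical flattened shape),
# plus NPI -> taxonomies and billing code -> taxonomies lookups.


def tag_is_billable(rows, npi_taxonomies, mapping):
    """Return copies of rows with an added boolean "is_billable" column."""
    tagged = []
    for row in rows:
        allowed = mapping.get(row["billing_code"], set())
        taxonomies = npi_taxonomies.get(row["npi"], set())
        tagged.append({**row, "is_billable": bool(taxonomies & allowed)})
    return tagged


mapping = {"27447": {"ORTHO"}}
npi_tax = {"1": {"ORTHO"}, "2": {"MH_COUNSELOR"}}
rows = [
    {"npi": "1", "billing_code": "27447"},
    {"npi": "2", "billing_code": "27447"},  # counselor with a knee replacement rate
]
tagged = tag_is_billable(rows, npi_tax, mapping)
zombies_removed = [r for r in tagged if r["is_billable"]]
print([r["npi"] for r in zombies_removed])  # ['1']
```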

This cleaned version of the dataset powers the search results in Serif Health’s Signal tool. Removing zombie rates reduces noise in the results, saving users from sifting through the data to find the most relevant rates. The significant reduction in data volume has also enabled Signal to deliver lightning-fast, more accurate, fine-grained searches. For example, searches that filter MRF records by the region or taxonomy of individual NPIs within a record's NPI list were previously difficult to serve at near-instant response times. Such searches often require the already large dataset to be expanded further, so that each MRF record is repeated once for every NPI on its NPI list. That expansion multiplied the data redundancy, processing time, cost, and storage footprint. The much smaller cleaned dataset has made it practical to process the data into a model better suited to powering these searches.
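A minimal Python sketch of the per-NPI expansion shows why the cleaned data matters: the exploded row count equals the sum of all NPI list lengths, so every zombie NPI pruned beforehand is one less row to store, process, and index. The record layout and the `npi_region` lookup are hypothetical, and a simple in-memory index stands in for whatever search model Signal actually uses.

```python
from collections import defaultdict

# Illustrative sketch under assumed shapes: each record has an "npis" list,
# and npi_region is a hypothetical NPI -> region lookup.


def explode_by_npi(records):
    """Yield one row per (record, NPI), i.e. the expansion described above."""
    for record in records:
        base = {k: v for k, v in record.items() if k != "npis"}
        for npi in record["npis"]:
            yield {**base, "npi": npi}


def index_by_region(rows, npi_region):
    """Group exploded rows by the region of each row's NPI."""
    index = defaultdict(list)
    for row in rows:
        index[npi_region.get(row["npi"], "UNKNOWN")].append(row)
    return index


records = [
    {"billing_code": "27447", "rate": 100.0, "npis": ["1", "2", "3"]},
    {"billing_code": "66984", "rate": 50.0, "npis": ["4"]},
]
rows = list(explode_by_npi(records))
print(len(rows))  # 4, the sum of the NPI list lengths
npi_region = {"1": "NJ", "2": "NY", "3": "NJ", "4": "NJ"}
print(len(index_by_region(rows, npi_region)["NJ"]))  # 3
```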

Aggressive Method: Filtering on Claims

Filtering by taxonomy is a solid approach, but not a perfect one. Some taxonomy codes, like “Ambulatory Surgical Center”, are much broader than what those facilities do in the real world. We know from experience that ambulatory surgical centers can specialize in cataract surgery, foot and ankle surgery, spine surgery, or other narrow specialties. Yet under the taxonomic filter criteria explained above, every specialty surgery center would show every surgical procedure code as eligible.

To address this, Serif Health generates and stores a further filtered version of the dataset we call our ‘gold tier’. For this version of the dataset, the filtering logic directly checks whether a provider has ever billed a claim for the specified billing code in our claims library. This approach is, of course, prone to over-filtering: our claims sample, though large, is not an exhaustive representation of all claims. But it produces cleaner search results.

See the example below, extracted from Aetna’s latest posting. Two different ASCs are listed with rates for CPT 27447 (knee replacement) and 66984 (cataract surgery). By our taxonomic filter, all of these rows have is_billable = TRUE. However, just from reading the ein_name field, we can tell it’s highly unlikely that both surgeries are performed at both locations. The is_billed_claims column at the far right confirms that only cataract surgery is performed at Eye Surgery Princeton, and knee replacement surgery is performed at the Center for Orthopedic Surgery, LLC. The rows highlighted in yellow, where is_billed_claims is FALSE, are the ones we can safely drop from search results, giving someone looking for either procedure a cleaner search experience.
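A claims-based check like the gold tier can be sketched as a set-membership test. The sketch assumes the claims library has been reduced to a hypothetical set of (NPI, billing code) pairs seen on at least one claim, and mirrors the two-ASC scenario with placeholder identifiers rather than the real facilities' NPIs. Any pair missing from the sample is dropped, which is exactly the over-filtering risk described above.

```python
# Illustrative sketch of a claims-based check. Assumes a hypothetical
# claims library reduced to a set of (npi, billing_code) pairs that appear
# on at least one claim; rows carry "npi", "billing_code", and "is_billable".


def tag_is_billed_claims(rows, billed_pairs):
    """Add is_billed_claims: has this NPI billed this code in the claims sample?"""
    return [
        {**row, "is_billed_claims": (row["npi"], row["billing_code"]) in billed_pairs}
        for row in rows
    ]


def gold_tier(rows, billed_pairs):
    """Keep only rows that pass both the taxonomy and the claims filters."""
    tagged = tag_is_billed_claims(rows, billed_pairs)
    return [r for r in tagged if r["is_billable"] and r["is_billed_claims"]]


# Toy version of the two-ASC scenario: both share an ASC taxonomy, so the
# taxonomy filter marks all four rows billable.
rows = [
    {"npi": "EYE_ASC", "billing_code": "27447", "is_billable": True},
    {"npi": "EYE_ASC", "billing_code": "66984", "is_billable": True},
    {"npi": "ORTHO_ASC", "billing_code": "27447", "is_billable": True},
    {"npi": "ORTHO_ASC", "billing_code": "66984", "is_billable": True},
]
billed = {("EYE_ASC", "66984"), ("ORTHO_ASC", "27447")}
print([(r["npi"], r["billing_code"]) for r in gold_tier(rows, billed)])
# [('EYE_ASC', '66984'), ('ORTHO_ASC', '27447')]
```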

Conclusion

With these techniques, payers’ price transparency postings are no longer overwhelming or noisy. You don’t need to worry about whether a rate is ‘real’, since we verify which billing patterns actually occur in the market by referencing commercial claims.

By synthesizing claims and price transparency, Serif Health offers a powerful data asset for you to understand prevailing market rates, specific providers’ contracts and utilization, and what opportunities exist for your organization.

Continue to follow our blog for insights on other ways claims can add value to price transparency data, new ways we’re leveraging artificial intelligence and machine learning to clean and standardize data fields, and other lessons we’ve learned experimenting with the ~300 billion price transparency records payers post each month.

If you’d like to explore our filtered and enriched datasets, Signal portal, or find care APIs, get in touch with the team today at hello@serifhealth.com or book time with us on Calendly.
