
Transparency Data Gets Harder to GET: MRF Processing Notes June 2024


Matthew Robben

Published

6/21/2024

In this edition of MRF Processing Notes, we’re going to focus on a new behavior we’re seeing with increasing consistency from UHC, Anthem, and various other BCBS entities: rate limiting and/or blocking access to machine-readable files.

Serif Health is delivering on a promise of ‘powering price transparency’, and that starts with data accessibility at the source - the insurance plans. When access to raw machine-readable data is impeded, it’s not just an annoyance for us - it’s harming every healthcare consumer who wants to make informed, rational choices about the cost and quality of their care.

Screenshot from BCBS of Vermont's MRF file for June (some liberties taken).

In this post, I’ll break down the kinds of blocking and rate limiting we’re seeing at Serif Health, and give you some tips on what to do about it.

High notes

  • UHC shifted posting and hosting structure in May, causing aborted file downloads
  • Anthem continues to rate limit / block downloads at the start of each month
  • BCBS VT + Change Healthcare - still down??? Really???
  • Aetna dropped most MS-DRG rates in May

UHC’s new MRF hosting approach increases fragility and reduces access

We can start by diving into some of the changes the country’s largest private health insurer made, ostensibly in the name of security. From our perspective, these changes have only resulted in reduced access to MRF data.

Head on over to their transparency in coverage directory and you’ll see the same UI as before: a search bar, a checkbox control, and a list of resulting files below.

I've considered my system's capacity...did UHC? Read on for answers.

What’s really different is the URL and hosting structure under the hood. First, UHC introduced URL signatures on their underlying storage bucket, so what used to be a simple URL:

https://mrfstorageprod.blob.core.windows.net/public-mrf/2024-06-01/2024-06-01_-A-1-PUMP-INC_index.json

Is now:

https://mrfstorageprod.blob.core.windows.net/public-mrf/2024-06-01/2024-06-01_-A-1-PUMP-INC_index.json?sp=r&st=2024-03-26T04:49:21Z&se=2025-03-25T12:49:21Z&spr=https&sv=2022-11-02&sr=c&sig=M0sm1qeV6LULQexjwJYsuupRKv1UpgsQLzpfLtZbzkk%3D

All the extra bits and bytes after the question mark are mandatory URL ‘signature’ properties required to actually download the file (the query string is a standard Azure Shared Access Signature: sp grants read permission, st and se are the start and expiry timestamps, and sig is the HMAC that validates the rest). You can try accessing the simple, first URL I posted - it now 404s with a ResourceNotFound error.

URL signatures aren’t new and they aren’t bad on their face, but they’re of questionable utility for public disclosures. They’re typically used when you want to prevent URL guessing or enumeration, which isn’t a concern here (remember, these are mandatory public disclosures!).

The challenge with signed URLs:

  1. There’s no URL twiddling or guessing allowed. If you are missing even a single character of the URL, the signature no longer matches, and access fails with an AuthenticationFailed error. Given how long these URLs are, and that there are already escape characters showing up in my blog post, this is going to introduce issues in automation systems trying to parse and process them (see the sketch after this list)…
  2. …which also means getting index updates month over month becomes a manual rather than an automated process. If you’re A1 Pump, you can’t just increment the month number and fetch your data again in July. You have to go click through United’s UI, fetch the exact signed URL with its new parameters and updated signature value, and then you can get to your data.
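
To make the escape-character hazard concrete, here’s a minimal Python sketch (the signed URL is the real one from above; the handling advice is ours, not UHC’s). An innocent normalization pass corrupts the signature, so the safest move is to treat the whole URL as an opaque token:

```python
from urllib.parse import parse_qs, unquote, urlsplit

# The signed UHC index URL from above, SAS query string and all.
signed = (
    "https://mrfstorageprod.blob.core.windows.net/public-mrf/2024-06-01/"
    "2024-06-01_-A-1-PUMP-INC_index.json"
    "?sp=r&st=2024-03-26T04:49:21Z&se=2025-03-25T12:49:21Z"
    "&spr=https&sv=2022-11-02&sr=c"
    "&sig=M0sm1qeV6LULQexjwJYsuupRKv1UpgsQLzpfLtZbzkk%3D"
)

# Pitfall: any well-meaning "normalization" corrupts the signature. Here
# unquote() turns the trailing %3D into '=', and a naive re-encode pass can
# just as easily double-escape it; either way the sig no longer matches what
# Azure computes, and you get AuthenticationFailed.
assert unquote(signed) != signed

# Safer: store and replay the URL verbatim, and only *read* the fields you
# care about, like the expiry timestamp.
params = parse_qs(urlsplit(signed).query)
print("token expires:", params["se"][0])  # 2025-03-25T12:49:21Z
```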

Ok, so, if they introduced signed URLs on their indexes, they did the same to the in-network files inside the indexes, right? In a similar vein of securing URLs and preventing guessing?

Well, not really. Instead, they link out to a new Azure web service UHC is running that seems to be fronting/proxying the file content without a signature:

https://transparency-in-coverage.uhc.com/api/v1/uhc/blobs/download?fd=2024-06-01&fn=2024-06-01_United-HealthCare-Services--Inc-_Third-Party-Administrator_OHPH-Chiro_28_in-network-rates.json.gz

Astute readers might go, aha! You just solved your increment-access problem. Just regex-replace the direct bucket URL prefix to route through the API server as shown in the index, and hit that instead.

Sure enough, it does work: https://transparency-in-coverage.uhc.com/api/v1/uhc/blobs/download?fd=2024-06-01&fn=2024-06-01_-A-1-PUMP-INC_index.json 
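
If you want to try the rewrite yourself, a sketch like the following works, assuming the fd (date folder) and fn (file name) parameter convention holds exactly as it appears in the indexes; the to_api_url helper name is mine, not UHC’s:

```python
import re

# Direct-bucket index URL (the unsigned form, which now 404s on its own).
blob_url = (
    "https://mrfstorageprod.blob.core.windows.net/public-mrf/"
    "2024-06-01/2024-06-01_-A-1-PUMP-INC_index.json"
)

def to_api_url(url: str) -> str:
    """Rewrite a public-mrf blob URL onto the uhc.com download API."""
    m = re.match(
        r"https://mrfstorageprod\.blob\.core\.windows\.net/public-mrf/"
        r"(?P<fd>\d{4}-\d{2}-\d{2})/(?P<fn>[^?]+)",
        url,
    )
    if not m:
        raise ValueError(f"not a UHC public-mrf blob URL: {url}")
    return (
        "https://transparency-in-coverage.uhc.com/api/v1/uhc/blobs/download"
        f"?fd={m['fd']}&fn={m['fn']}"
    )

print(to_api_url(blob_url))  # matches the working API URL above
```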

BUT. While direct Azure blob access via signed URLs is a PITA for the reasons I explained above, at least it’s reliable. Like, highly reliable, at the nigh-infinite scalability our modern trifecta of at-scale cloud providers has provisioned for its object storage infrastructure.

The transparency-in-coverage.uhc.com API, on the other hand, is NOT highly reliable. In fact, while testing the URL I just copied, which points to a mere 3.4KB file, the second request hung in my browser and eventually came back with an HTTP 504. Other members of the Serif Health engineering team were unable to access it either.

A 504 is a good sign your API service is under-provisioned

Now imagine you need to download hundreds of multi-gigabyte files through the same API, along with everyone else who has an interest in price transparency. Unlike the pain-in-the-butt kind of fragility I explained with the signed index URLs, shoddy-API-that-can’t-serve-text-files-reliably fragility like this is extremely problematic for all of us. All of UHC’s indexes point to this new API for every in-network file; there is no other reliable mapping, and the values can and do change monthly. Very tricky to drink from an ocean of carrier data when they’ve effectively given us a paper straw with no SLA as our access pipe.

Long-time readers will recall that we saw a similar pattern with Humana in the very early days of price transparency. List 400,000+ files in your PPO network index, and put a fragile, undersized web service in front of it for retrieval. 

Unreliable data access is exactly what we’ve been seeing for most UHC data files throughout May and June. Almost half of the files we attempted to download on 5/1 and 6/1 failed with an “ABORTED” status. It took almost a week of retrying (with exponential backoff - we’re good citizens over here!) before we had a complete data set, which had never before been an issue with UHC. Thankfully, retries do eventually work; you just have to be patient.
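
The retry loop itself is nothing exotic. A hedged sketch of what ‘good citizen’ retrying looks like, using the requests library (the download_with_backoff helper and its parameters are illustrative, not our production code):

```python
import random
import time

import requests

def download_with_backoff(url: str, dest: str, max_tries: int = 8) -> None:
    """Fetch one MRF file, retrying aborted/5xx responses with jittered
    exponential backoff (roughly 1s, 2s, 4s, ... plus noise)."""
    for attempt in range(max_tries):
        try:
            with requests.get(url, stream=True, timeout=(10, 300)) as resp:
                resp.raise_for_status()
                with open(dest, "wb") as f:
                    for chunk in resp.iter_content(chunk_size=1 << 20):
                        f.write(chunk)
            return  # full download succeeded
        except (requests.ConnectionError, requests.Timeout,
                requests.HTTPError,
                requests.exceptions.ChunkedEncodingError):
            if attempt == max_tries - 1:
                raise
            # Back off exponentially with jitter so a fleet of retries
            # doesn't re-stampede the server in lockstep.
            time.sleep(2 ** attempt + random.random())
```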

Anthem’s Rate Limiting (and other grossness) Continues

Anthem continues to make figuring out which files are in-network a nightmare, thanks to their 10.4GB compressed, single-line JSON index file, obfuscated file names, and the labyrinthine structure of the BCBS Blue Card ‘network’ as it pertains to machine readable file data. (Seriously, their index is so bad we made parsing it the first round of our engineering interview process; it’s a decent litmus test.)
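
If you’re prepping for that interview question, the trick is a streaming parser - nothing will hold a 10.4GB single line in memory. A rough sketch with the ijson library (the anthem_index.json.gz filename is a placeholder, and the field names assume Anthem follows the CMS table-of-contents schema):

```python
import gzip

import ijson  # streaming JSON parser; never loads the whole line at once

# Walk the table of contents without materializing it. "reporting_structure",
# "reporting_plans", and "in_network_files" come from the CMS TOC schema;
# adjust if Anthem's layout drifts from it.
with gzip.open("anthem_index.json.gz", "rb") as f:
    for entry in ijson.items(f, "reporting_structure.item"):
        plans = [p["plan_name"] for p in entry.get("reporting_plans", [])]
        for mrf in entry.get("in_network_files", []):
            # The obfuscated file names won't tell you the network, so you
            # have to carry the plan context alongside each URL.
            print(plans[:1], mrf["location"])
```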

But what’s worse is their rate limiting and blocking of downloads at scale. Anthem operates multiple plans in each of its 14 states; at Serif Health we tend to pull PPO + HMO, and sometimes some exchange variants, in each state. Roughly thirty networks works out to about fifty files we download each month, and these URLs are direct Amazon S3 bucket access (here’s an example URL from Nevada):

https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/NV_FEPNMED0000.json.gz

Unfortunately, more and more of those downloads are getting rejected or aborted by a rate limiter. It has to be rate limiting rather than a capacity problem because, again, we’re hitting AWS directly here, and initial URL access works just fine. Until it doesn’t.

Suggestion for Anthem: if you cleaned some of the billions of redundant, useless rows of junk out of these files, your hosting bills would go way down - and so would the rate limiting, and my complaining. Positive value trifecta!!!

BCBS VT Still Can’t ‘Change’

Sadly, while Change Healthcare and the federal government have been releasing feel-good pressers the past few days about all systems being green again on RCM and payments, someone forgot to turn on the MRF download server. 

https://mrf-download.changehealthcare.com/ 

This might be the most deeply ironic ‘ACCESS BLOCKED’ page ever created by humanity. The sentence “You are unable to access changehealthcare.com” provides a rare opportunity for schadenfreude, quickly followed by the depressing realization that anyone with traction changing healthcare will probably wind up selling to Optum for billions, at which point they degrade into part of the problem more than part of any solution. 

That aside, BCBS of Vermont, the largest payer we know of on Change's MRF hosting platform, is still getting bitten by that vendor decision as of this month. Their index TOC works just fine: https://www.bluecrossvt.org/documents/toc-json but all the files inside it fail with the same Change Healthcare error:

https://mrf-download.changehealthcare.com/bcbsvt/files/2024-06-01_BCBSVT_0002_PPO_VFP_in-network-rates.json.gz 
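
You can verify the breakage yourself with a quick smoke test: pull the TOC (which loads) and HEAD-check the Change-hosted file URLs inside it (which don’t). A hedged sketch - the field names come from the CMS table-of-contents schema, and the five-entry sample cap is arbitrary:

```python
import requests

# The working BCBS VT table of contents, straight from this post.
toc = requests.get(
    "https://www.bluecrossvt.org/documents/toc-json", timeout=60
).json()

# HEAD-check a handful of the file URLs the TOC points at. Some servers
# reject HEAD outright, so treat a 405 as "retry with GET", not "broken".
for entry in toc.get("reporting_structure", [])[:5]:
    for mrf in entry.get("in_network_files", []):
        resp = requests.head(mrf["location"], timeout=30,
                             allow_redirects=True)
        status = "OK" if resp.ok else f"BROKEN ({resp.status_code})"
        print(status, mrf["location"])
```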

Hopefully, UHC, this serves as a great example of why it’s not a good idea to split your index hosting from your MRF file hosting and create intermediate service-layer dependencies.

I tried to notify BCBS of Vermont via their media relations email address. The message got rejected. It’s almost like they don’t want to do any media relations.

Aetna Drops All But Four DRGs From Its Files in May

Ok, this one isn’t really a file access issue so much as a ‘how do you keep screwing up MRF generation at its core’ question from yours truly. 

In May, all the alarm bells went off after Aetna’s automated ingest ran; none of the files passed our QA validation gates. This is pretty common with small and mid-tier players, but not Aetna.

Looking into the issue, the quality gate that failed was ‘all common codes present’ - and the offending codes were all in the DRG ranges. Every Aetna file we tested had gone from 800+ DRGs present in the file to just four.
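
Our real QA gates are more involved, but a stripped-down sketch of the DRG coverage check looks roughly like this (the filename is a placeholder, the field names follow the CMS in-network schema, and the threshold is illustrative):

```python
import gzip

import ijson

def distinct_drg_codes(path: str) -> set:
    """Stream an in-network MRF and collect the distinct MS-DRG billing
    codes it prices, without loading the file into memory."""
    codes = set()
    with gzip.open(path, "rb") as f:
        for item in ijson.items(f, "in_network.item"):
            if item.get("billing_code_type") == "MS-DRG":
                codes.add(item["billing_code"])
    return codes

# Gate: a national carrier's file should price hundreds of MS-DRGs.
# A count of four, like Aetna's May files, is an instant failure.
codes = distinct_drg_codes("aetna_in_network.json.gz")
assert len(codes) > 100, f"suspiciously few MS-DRGs: {len(codes)}"
```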

We opted not to sound the alarm on this one, taking a wait-and-see approach, and sure enough the DRGs have all been restored for June. 

I mention it here because it calls into question how this data is getting generated at its source. How do you skip 99% of the most cost-intensive code range in a monthly compliance posting? 

Takeaways

I’ve been cautioned by many smart individuals in the industry not to assume malicious intent so much as incompetence. If the technical teams at the health plans mentioned in this post need help getting systems to scale, my DMs are open. Here's a general set of principles:

  • Remember these are public, machine-readable disclosures. Assume machines will access them: put all of the files in one cloud bucket and allow unsigned URL access, with no middlemen, proxies, or portals that require clicking.
  • If you do use signed URLs, set expiration to a year, publish the URLs in your indexes, and test all links for validity before publishing the indexes.
  • If you want to dramatically lower your hosting costs and technical capacity requirements, start by reducing the amount of spam, junk, and duplication inside the MRFs and across files. I'd wager most plans would save 60-80% of their current hosting and bandwidth costs if they just published clean MRF data in the first place. No rate limiters required.

If you're on the data consumer side, keep in mind that UHC and Anthem have no incentive to engineer MRF hosting sites that scale to infinity. I’ll also entreat you to be a good citizen: exponentially back off failed requests, don’t make 50,000 concurrent download requests at 3AM on the first of each month, and be smart about what data you actually need versus 'fetch all of it' approaches that are inherently wasteful for you and for the plan.
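
Here’s what a polite client looks like in practice: cap your in-flight requests and jitter your start times instead of stampeding the host. A rough sketch with aiohttp (the MAX_CONCURRENT value is an arbitrary illustration, not a number any plan has published):

```python
import asyncio
import random

import aiohttp

MAX_CONCURRENT = 8  # polite ceiling, tuned well below the host's limits

async def fetch(session: aiohttp.ClientSession,
                sem: asyncio.Semaphore, url: str) -> bytes:
    async with sem:  # cap in-flight requests instead of firing all at once
        await asyncio.sleep(random.uniform(0, 2))  # stagger start times
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.read()

async def fetch_all(urls: list) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, sem, u) for u in urls))

# asyncio.run(fetch_all(mrf_urls))
```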

If you’d rather skip the complicated engineering, or you’re looking for an MRF ingestion partner working through challenges like these month over month so you don’t have to, you know who to call.