v1.1.0 Release
METL 1.1 is released!
This release is an important milestone in the development of METL: it introduces metl-deploy, a set of scripts that make it super easy to deploy METL on AWS. My Price Health runs METL on Google Cloud and Health Rosetta runs METL on AWS, so metl-deploy takes METL on a multi-cloud journey. Since METL runs on Kubernetes, we are confident it can easily run on Azure and bare metal as well.
What's new?
The principal user-facing feature for METL 1.1 is a set of claims loader scripts for Assured Benefits Administrators, which gives us the ability to view this TPA's post-adjudicated claims data. The ABA loader was also our first load of post-adjudicated claims data into the claims schema, so we needed to make some modifications to the schema to support the claims load.
As with all other METL database loads, loading data to the database consists of two components:

1. `extractor` configuration. METL extracts files before loading them. In METL, extracting can mean a number of different things:
   - Unzip a file. This is a frequent use case for CMS and other standard data files found online.
   - Convert data from multiple formats. For now, `extractor` supports fixed-width files and CSV files. We anticipate adding additional file formats over time.
   - Clean invalid data from data files. It is fairly common for data files from CMS and other sources to contain spurious invalid characters. Our personal favorite is the NPI registry data files. They are around 9 GB unzipped, and it can take a long time to load that much data to the database. It's always fun to spend 50 minutes waiting for a file to extract and load, only to find that there is an error on line 1,230,874. If you manually fix that, you'll then find an error on line 1,597,145. If we have to manually clean up files like this, a 50-minute load can stretch into many hours, if not multiple days of manual cleaning.
   - Pull data from one or more files in a zip or a single file. Sometimes it's important to combine data from multiple files into a single file (e.g. NPI registry data that spreads name information across multiple files), and sometimes it's important to split a single file into multiple files (e.g. claims data that repeats the same data on multiple lines, which we don't want to load multiple times). If the data files look similar to the tables they will be loaded into, it's a LOT easier to write load scripts.
   - Write data to an easily bulk-loadable set of CSV files. By unifying the format of the files that we load to the database, `extractor` makes it tons easier to build load scripts for `csvloader`.
2. `csvloader` configuration and load scripts. METL starts with uniform, clean CSVs, loads them to staging tables of the same names, and then runs load scripts to transform them into their final shape:
   - The first step is for METL to read the load config file. It identifies the order in which files will be loaded and which scripts are run. All load scripts are just `.sql` files, and everything gets checked into source control, so it's easy to see exactly what's going on at any time.
   - In the order specified in the config file, the load steps are performed. For most load steps, a CSV is loaded into a staging table and then a load script is run to load the data into the production tables (see the sketches after this list).
   - Because every load script is just SQL, any transformation that you can do with data in SQL, you can do with `csvloader`.
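To make this concrete, here is a minimal sketch of the staging half of a load step. The table, column, and file names are hypothetical, and we're assuming a PostgreSQL-style `COPY` for the bulk load; METL's actual configuration and scripts may differ.

```sql
-- Hypothetical load step: bulk-load the uniform CSV produced by extractor
-- into a staging table of the same name. Names and columns are illustrative,
-- and the COPY syntax assumes a PostgreSQL-compatible database.
COPY staging_claims (claim_id, line_number, service_date, billed_amount)
FROM '/data/extracted/claims.csv'
WITH (FORMAT csv, HEADER true);
```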
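Once the staging table is populated, a load script reshapes the data into the production tables. The sketch below uses the same hypothetical names and shows the kind of SQL a load script might contain, including collapsing the repeated per-line claim data mentioned above with `SELECT DISTINCT`.

```sql
-- Hypothetical load script: transform staging rows into production tables.
-- Claims files often repeat header-level data on every line, so the claim
-- header is deduplicated before the claim lines are inserted.
INSERT INTO claim (claim_id, service_date)
SELECT DISTINCT claim_id, service_date
FROM staging_claims;

INSERT INTO claim_line (claim_id, line_number, billed_amount)
SELECT claim_id, line_number, billed_amount
FROM staging_claims;

-- Staging data is transient; clear it for the next load.
TRUNCATE staging_claims;
```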
Release details
Claims: Pull Request, Commit hash: 6ebcc3f
METL-deploy: Pull Request, Commit hash: 7daa830
Monorepo (not public, but added here for documentation; we expect to open source `extractor` and `csvloader` in the future): Pull Request