Skip to main content

v1.4.0 Release

ยท 8 min read
Founder and CEO

The 1.4 METL release is here!

Now that we've got a bunch of data in the database, this release focuses on performance. We made some significant improvements to some of our tables and views to improve performance. We also switched to using a new function called insert_with_id to load many tables and saw a massive productivity improvement for data load script development.

What's new?โ€‹

  1. Some significant improvements to tables and views to improve query times. All of the member* views are now faster than last release
  2. Unified all the eligibility loads. We now have one set of scripts that load all eligibility files. Barring any unforeseen new things with eligibility, we think we can model any future eligibility files without load script changes. This cuts the time to configure a new eligibility load to 2-4 days
  3. In an effort to make the member table even more unique to a single member, SSN is now the only unique key for this table (previously it also included DOB, but due to data inconsistencies, this was causing some SSN's to appear multiple times, once with each family member's SSN)
    1. This allows for the best method we know of to get non-duplicate names of members in a drop-down.
    2. The only downside to using only SSN is that this means that if a TPA is not sending SSN for all dependents that only the subscriber will show in the drop-down. The upside to that "problem" though is that it will remind us to bug the TPA to give us all the data ๐Ÿ™‚
  4. A new member_member_eligibility table. This joins our unique member to the eligibility information for that member over time. This allows us to see all the information about a member over time. Because member is by SSN, this even allows us to see changes across TPA's. A future release will allow for a UI to fix data errors that cause a member to be associated with a SSN that is not actually theirs but may've been typed incorrectly by the TPA.
  5. We now use the function insert_with_id to load data to most of our lookup data tables. It has cut the amount of code in most of our load functions by > 2/3. It also fixed a number of load bugs that had crept in from all the manually created load files before. Its job is to insert new unique data based on a unique index, get back the correct ID from either the generated snowflake (if not yet loaded) or from the existing table (if previously loaded).
  6. Speaking of load files, we made lots of improvements and fixes in general to our load files which already has made it simpler and faster to create new data loads.
  7. Not software related, but human: Cody and Alec are now fully up to speed on the database, the tables, and claims and eligibility data. They can blast through a new TPA load file without almost any outside help.
  8. We've added the ability to create a snapshot of our database after a set of data files are loaded. We've taken a snapshot of all the data we've loaded so far. This will make it comparatively trivial to validate that data continues to load as its supposed to in the future without regressions. And, if changes occur, it's pretty easy to do a quick diff in a standard diff tool to see and validate the changes
  9. New validation views. We used all of these in our own data validation:
    1. claim_coverage_view
    2. claim_diagnosis_view
    3. claim_drg_view
    4. claim_member_view
    5. claim_procedure_view
    6. claim_provider_view
    7. claim_view
  10. Added PrivateLink connectivity to allow Health Rosetta to connect to our METL database instance for your staging environment. Will work with Runako tomorrow to get setup

Change to be aware of:โ€‹

  • I mentioned this already a few times, but the subscriber linking for each member is now in member_eligibility. Last release it was in member_coverage. This shouldn't affect anything Health Rosetta is doing since you weren't using this column yet.

Data loads added/updated in this releaseโ€‹

  1. ABA claims
  2. ABA eligibility
  3. ACS claims
  4. ACS eligibility
  5. Aither Claims
  6. GPA Claims - I360 format
  7. GPA claims - GPA format
  8. GPA eligibility
  9. JPF claims
  10. JPF eligibility - fixed phone numbers
  11. Lucent CA claims
  12. Lucent CA eligibility
  13. Lucent WI claims
  14. VHBA eligibility

More about insert_with_idโ€‹

It sounds simple enough, but we have a lot of "unique" data tables in the database to track changes in data without also incurring the full cost of storing duplicate data all the time. For example, we have an address_seen table which has every unique address we've ever encountered, a provider_seen table with every unique provider we've seen, etc. In total, we have dozens of these tables. Unfortunately, they are hard to load because of the semantics of those load scripts. But, we've done it enough now that David took the template we had built and created a load_with_id function which does all the work for us. It looks up the unique constraint, dynamically builds the insert query and much more. It's quite the marvel that he created. ๐Ÿ™‚

Fortunately, now loading those few dozen tables goes from a complicated and error-prone script like this:

-- strategy modified from https://stackoverflow.com/a/42217872
WITH input AS (
SELECT
address_id::bigint,
upper(address1) AS address1,
upper(address2) AS address2,
upper(city) AS city,
upper(state_or_province) AS state_or_province,
upper(postal_code) AS postal_code,
country_code
FROM address_seen_staging
), inserted AS (
INSERT INTO claims.address_seen (
address_id,
address1,
address2,
city,
state_or_province,
postal_code,
country_code
)
SELECT * from input
ON CONFLICT DO NOTHING
RETURNING *
)
SELECT input.address_id, existing.address_id AS new_id
INTO TEMP TABLE address_seen_id
FROM input
INNER JOIN claims.address_seen existing
ON coalesce(input.address1, '') = coalesce(existing.address1, '')
AND coalesce(input.address2, '') = coalesce(existing.address2, '')
AND coalesce(input.city, '') = coalesce(existing.city, '')
AND coalesce(input.state_or_province, '') = coalesce(existing.state_or_province, '')
AND coalesce(input.postal_code, '') = coalesce(existing.postal_code, '')
AND coalesce(input.country_code, '') = coalesce(existing.country_code, '')
AND '' = coalesce(existing.country_subdivision, '')
UNION ALL
SELECT input.address_id, inserted.address_id as new_id
FROM input
INNER JOIN inserted
ON coalesce(input.address1, '') = coalesce(inserted.address1, '')
AND coalesce(input.address2, '') = coalesce(inserted.address2, '')
AND coalesce(input.city, '') = coalesce(inserted.city, '')
AND coalesce(input.state_or_province, '') = coalesce(inserted.state_or_province, '')
AND coalesce(input.postal_code, '') = coalesce(inserted.postal_code, '');

CREATE INDEX address_seen_id_idx ON address_seen_id (address_id);

To a simple one that looks like this:

CREATE TEMP VIEW insert_view AS (
SELECT
address_id::bigint,
upper(address1) AS address1,
upper(address2) AS address2,
upper(city) AS city,
upper(state_or_province) AS state_or_province,
upper(postal_code) AS postal_code,
country_code,
'' AS country_subdivision
FROM address_seen_staging
);

SELECT claims.insert_with_id('claims.address_seen', 'address_id');

The use of insert_with_id was helpful already this release and dramatically cut down debugging and fixing time because some of our load scripts were wrong. The new function fixed them immediately. Moving forward, we won't have to write another of these scripts which will speed up development time too.

The other change we made this release was to further standardize all of our loads. We've been slowly building up all of our loads so that every TPA data load includes all the data available in the database. Our loading is now just a matter of plugging in fields to each template. We have slowly gotten rid of all the custom load scripts that we've written in the past and standardized everything into one set of scripts. All of our eligibility files are loaded with one standard set of load SQL files. Our claims data is down to 3 special load files that I hope we'll be able to get rid of next release. The goal is to make it drop-dead easy to load data and I think we're really close to that. The last big improvement we have on our roadmap to improve is to build a template so that we can map fields to the database in the exact same format as the Excel spreadsheets we develop for each load. At that point, we hope the YAML file for extractor will be as easy as copy/paste from Excel.

So, even as much as we've improved our speed for loading data this release vs. last release, I hope that subsequent releases will see a similar improvement in speed

Patch releasesโ€‹

1.4.1โ€‹

Migration update

1.4.2โ€‹

Migration update

1.4.3โ€‹

Migration update

1.4.4โ€‹

Migration update

1.4.5โ€‹

Fixes load script for Aither

Release detailsโ€‹

Claims: Pull Request: N/A, Commit hash: 7fea583

METL-deploy (no change since 1.3): Pull Request, Commit hash: 8d03407

Monorepo (Not public, but added here for documentation. We expect to open source extractor and csvloader in the future): Pull Request