Healthcare Data Lake using Amazon HealthLake

The healthcare and life sciences industry has gone through a massive digital transformation over the last decade, leading to vast data collection. To find value in this data and adopt machine learning models, organizations must address challenges such as data normalization, availability, integrity, and governance. Medical information is highly distributed, contextual, and primarily unstructured: intake forms, clinical notes, X-rays, CT scans, handwritten prescriptions, insurance claims, and so on.

Amazon launched a new service at re:Invent 2020 to address the growing pain of managing health-related data in the cloud. The service, Amazon HealthLake, is a fully managed, HIPAA-eligible service that enables healthcare and life sciences companies to store, transform, query, and analyze health data on AWS at petabyte scale. Companies can aggregate their disparate health information, across various styles and formats, into a centralized data lake provisioned and managed by Amazon HealthLake. It makes it easy to import both structured and unstructured data from on-premises systems to the AWS cloud. Companies can use the pre-built machine learning models to normalize and index the information by tagging key dates, medical descriptors, and events such as medications, procedures, and diagnoses. This lets users search and analyze all of their health information quickly.

Amazon HealthLake does the heavy lifting to configure multiple data sources, ingest the data, index all the information so it can be searched later, and store it in open standard formats such as FHIR. It processes unstructured text using NLP and images using ML models such as binary classification, multiclass classification, or regression. Once the data has been converted into structured, centralized information, you get a complete view of each patient’s history at a level of granularity where you can apply advanced analytics or machine learning models for prediction.

Ingesting data using HealthLake

Amazon HealthLake makes it easier to ingest data from on-premises data sources into AWS. Organizations can move their on-premises files to an Amazon S3 bucket and use the Bulk Import feature to load them into HealthLake.
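
As a rough sketch of what starting a bulk import might look like from Python, the snippet below assembles the parameters for the HealthLake StartFHIRImportJob API and, optionally, submits them with boto3. The data store ID, bucket names, role ARN, and KMS key are placeholders, and the helper functions are our own illustration, not part of the AWS SDK.

```python
import uuid


def build_import_job_request(datastore_id, input_s3_uri, output_s3_uri,
                             role_arn, kms_key_id, job_name="bulk-import"):
    """Assemble keyword arguments for HealthLake's StartFHIRImportJob call."""
    for uri in (input_s3_uri, output_s3_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")
    return {
        "JobName": job_name,
        "DatastoreId": datastore_id,
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": {"S3Uri": input_s3_uri},
        "JobOutputDataConfig": {
            "S3Configuration": {"S3Uri": output_s3_uri, "KmsKeyId": kms_key_id}
        },
        # Idempotency token so a retried request does not start a second job.
        "ClientToken": str(uuid.uuid4()),
    }


def start_import(request, region="us-east-1"):
    """Submit the job. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the request builder works without it

    client = boto3.client("healthlake", region_name=region)
    return client.start_fhir_import_job(**request)["JobId"]


# All identifiers below are placeholders for illustration only.
request = build_import_job_request(
    datastore_id="EXAMPLE_DATASTORE_ID",
    input_s3_uri="s3://example-input-bucket/fhir/",
    output_s3_uri="s3://example-output-bucket/import-logs/",
    role_arn="arn:aws:iam::123456789012:role/ExampleHealthLakeImportRole",
    kms_key_id="arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
)
print(sorted(request))
```

Keeping the request builder separate from the network call makes it easy to validate the parameters before anything is sent to AWS.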

Storing data in the open standard format

To enable fast search queries, the Amazon HealthLake Data Store creates a complete view of each patient’s medical history in chronological order. The Data Store facilitates information exchange using the open standard FHIR R4 specification and is always running to keep the index up to date. To meet regulatory compliance requirements, it enforces rigorous security and access control.
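
To make the idea of a chronological patient view concrete, here is a minimal sketch that orders a handful of FHIR R4 resources by their clinical timestamp. It is not how HealthLake builds its index internally; the sample resources are made up, and the code assumes every timestamp is a full date-time with a UTC offset.

```python
from datetime import datetime

# FHIR R4 fields that commonly carry a resource's clinical timestamp.
DATE_FIELDS = ("effectiveDateTime", "authoredOn", "recordedDate",
               "performedDateTime", "date")


def resource_time(resource):
    """Return the first timestamp found on a resource, or None."""
    for field in DATE_FIELDS:
        value = resource.get(field)
        if value:
            # Assumes full timestamps with an offset; "Z" is normalized because
            # datetime.fromisoformat rejects it before Python 3.11.
            return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return None


def build_timeline(resources):
    """Sort dated resources oldest first; undated resources are dropped."""
    dated = [(resource_time(r), r) for r in resources]
    dated = [(t, r) for t, r in dated if t is not None]
    return [r for t, r in sorted(dated, key=lambda pair: pair[0])]


# Made-up sample data for one patient.
resources = [
    {"resourceType": "MedicationRequest", "id": "med-1",
     "authoredOn": "2020-06-02T09:30:00+00:00"},
    {"resourceType": "Observation", "id": "obs-1",
     "effectiveDateTime": "2020-01-15T08:00:00Z"},
    {"resourceType": "Condition", "id": "cond-1",
     "recordedDate": "2020-03-10T14:45:00+00:00"},
]

timeline = build_timeline(resources)
print([r["id"] for r in timeline])
```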

Transforming data using Machine Learning models

Amazon HealthLake integrates medical natural language processing (NLP) to transform raw medical data from the Data Store. It uses specialized pre-built Machine Learning models that have been trained to understand and extract meaningful information from unstructured healthcare data. The original resource stays unchanged, and the extracted medical information is automatically appended to the resource.
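
The "original stays unchanged, extracted information is appended" behavior can be pictured with a small sketch. The code below copies a FHIR DocumentReference and attaches hand-written entities as a FHIR extension, leaving the original fields untouched. The extension URL and the entity list are hypothetical placeholders; the actual extension layout HealthLake produces is not described in this post.

```python
import base64
import copy

# Placeholder URL, chosen for illustration only.
EXTENSION_URL = "https://example.com/fhir/StructureDefinition/extracted-entities"


def append_extracted_entities(resource, entities):
    """Return a copy of the resource with entities appended as an extension."""
    enriched = copy.deepcopy(resource)  # never mutate the caller's resource
    enriched.setdefault("extension", []).append({
        "url": EXTENSION_URL,
        "extension": [
            {
                "url": "entity",
                "extension": [
                    {"url": "text", "valueString": entity["text"]},
                    {"url": "category", "valueCode": entity["category"]},
                ],
            }
            for entity in entities
        ],
    })
    return enriched


note_text = "Patient started on metformin 500 mg for type 2 diabetes."
original = {
    "resourceType": "DocumentReference",
    "id": "doc-1",
    "content": [{"attachment": {
        "contentType": "text/plain",
        "data": base64.b64encode(note_text.encode()).decode(),
    }}],
}

# Hypothetical output of an NLP step, written by hand for this example.
entities = [
    {"text": "metformin", "category": "MEDICATION"},
    {"text": "type 2 diabetes", "category": "MEDICAL_CONDITION"},
]

enriched = append_extracted_entities(original, entities)
print(len(enriched["extension"][0]["extension"]))
```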

Querying and searching data

Users can search all the information on a patient using predefined filters or utilize FHIR CRUD (Create/Read/Update/Delete) and FHIR Search operations supported by Amazon HealthLake.
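
For readers new to FHIR, the sketch below shows how read and search requests are typically addressed as REST URLs. The endpoint format and data store ID are assumptions for illustration; check the endpoint reported for your own Data Store. The helper functions are ours, and a real request would also need to be signed with AWS Signature Version 4 credentials, which is omitted here.

```python
from urllib.parse import urlencode

# Assumed endpoint shape with a placeholder data store ID.
ENDPOINT = ("https://healthlake.us-east-1.amazonaws.com"
            "/datastore/EXAMPLE_DATASTORE_ID/r4/")


def fhir_read_url(endpoint, resource_type, resource_id):
    """URL for a FHIR read: GET [base]/[type]/[id]."""
    return f"{endpoint.rstrip('/')}/{resource_type}/{resource_id}"


def fhir_search_url(endpoint, resource_type, **params):
    """URL for a FHIR search: GET [base]/[type]?name=value&..."""
    base = f"{endpoint.rstrip('/')}/{resource_type}"
    # Sorting keeps the generated URL stable across runs.
    query = urlencode(sorted(params.items()))
    return f"{base}?{query}" if query else base


read_url = fhir_read_url(ENDPOINT, "Patient", "example-patient-id")
search_url = fhir_search_url(ENDPOINT, "Patient",
                             family="Smith", birthdate="ge1970-01-01")
print(read_url)
print(search_url)
```

Here `family` and `birthdate` are standard FHIR search parameters for the Patient resource, and the `ge` prefix means "on or after".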

Visualization of data and making predictions

Developers can use the integration with Amazon QuickSight to quickly create dashboards on the normalized data and explore trends and patterns among their patients. Developers can also use Amazon SageMaker to build, train, and deploy their own machine learning models on the data to make predictions.
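
Dashboards and model training usually work on tabular data, so one common preparatory step is flattening FHIR resources into rows. The sketch below does this for Observation resources using only the Python standard library. The column choice and sample values are our own assumptions, and the post does not prescribe this step.

```python
import csv
import io

FIELDS = ["patient", "code", "display", "value", "unit", "date"]


def observation_to_row(obs):
    """Pick a few commonly used fields out of a FHIR R4 Observation."""
    coding = (obs.get("code", {}).get("coding") or [{}])[0]
    quantity = obs.get("valueQuantity", {})
    return {
        "patient": obs.get("subject", {}).get("reference", ""),
        "code": coding.get("code", ""),
        "display": coding.get("display", ""),
        "value": quantity.get("value", ""),
        "unit": quantity.get("unit", ""),
        "date": obs.get("effectiveDateTime", ""),
    }


def observations_to_csv(observations):
    """Render observations as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for obs in observations:
        writer.writerow(observation_to_row(obs))
    return buffer.getvalue()


# Made-up sample observation.
observations = [{
    "resourceType": "Observation",
    "subject": {"reference": "Patient/example-1"},
    "code": {"coding": [{"system": "http://loinc.org",
                         "code": "8867-4", "display": "Heart rate"}]},
    "valueQuantity": {"value": 72, "unit": "beats/minute"},
    "effectiveDateTime": "2020-01-15T08:00:00Z",
}]

csv_text = observations_to_csv(observations)
print(csv_text)
```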

As of this writing, Amazon HealthLake is only available in the US East (N. Virginia) region, but we expect it to become available in other regions soon. Contact us if you are having trouble ingesting data in different formats or finding value in it. Our team of cloud experts can help set up your data pipeline or build a data lake using Amazon HealthLake.