Data Lake on AWS – Case Study
About the Client
Stark and Bain is a leader in staff augmentation within the manufacturing industry, supplying manpower to automotive, warehouse operations, and other manufacturing companies. Stark and Bain helps companies accelerate their time to market and focus on core functional competencies by offloading their staffing needs to experts. With its unique approach, Stark and Bain boasts 95% attendance and up to 13% cost reduction.
Avahi is supporting the Client's journey to rearchitect its existing on-premises EDW solution (Informatica/Teradata) into a single data-lake-based platform. The objective is a single source of truth for heterogeneous data that reduces cost, provides flexibility and scalability, handles big data, and enables advanced analytics.
The customer's challenges can be broadly classified into the categories below:
- The current EDW struggles to meet business demands for advanced analytics
- Provisioning on-premises resources for new business requirements is a lengthy process
- Inability to connect to and handle disparate, heterogeneous data sources
- No consolidated platform or tools to analyse data in real time
- No single repository for IT and operational technology data
- Inconsistencies caused by disparate data analytics and visualization tools across the organization
- The current infrastructure lacks self-service capabilities for the business
- Difficulty extending data models to include unstructured data such as meter readings and weather data
The Avahi solution revolves around building a data lake with Amazon S3 as storage and moving existing functionality from the on-premises EDW solution (Informatica/Teradata) to AWS services using Informatica BDM. The modern architecture provides a single platform for heterogeneous data, helps reduce cost, and delivers the flexibility, scalability, and ability to handle big data and streaming data needed to enable advanced analytics.
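To make the zone-based lake layout concrete, the sketch below builds Hive-style partitioned S3 keys for data landing in the lake. The bucket, zone, source, and table names are hypothetical assumptions for illustration; the case study does not specify the actual layout.

```python
from datetime import date

# Hypothetical zone names; the actual lake zones used by the project
# are not specified in the case study.
ZONES = ("raw", "curated", "consumption")

def lake_key(zone: str, source: str, table: str, d: date) -> str:
    """Build a partitioned S3 key prefix for a data lake zone.

    Hive-style year=/month=/day= partitions let query engines such as
    Athena prune partitions instead of scanning the whole dataset.
    """
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    # Bucket name is illustrative only.
    return (f"s3://stark-bain-datalake/{zone}/{source}/{table}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}/")

# Example: where a nightly batch load from Teradata might land.
print(lake_key("raw", "teradata", "attendance", date(2024, 1, 15)))
```

A real ingestion job (batch via Informatica BDM, or a streaming consumer) would write objects under prefixes like these, so each zone stays queryable and partition-pruned.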
- Data Lake: A holistic data lake is the modern architecture for all enterprise data needs
- Batch: Efficient batch ingestion and data flows across the different data lake zones
- Stream: Stream ingestion and data flow through the AWS service layers and the data lake
- Consumption: Data can be consumed at any layer of the data lake through Athena and QuickSight
- Advanced Analytics: Quick insights from high-velocity data and the ability to mix and mash heterogeneous data
- Storage: Low-cost data storage decoupled from compute
- On-Demand Compute: Transient in-memory compute instances provide pay-per-use cost
- High Availability: Multi-AZ VPC deployments and S3 cross-region replication provide high availability and durability
- Self Service: AWS services such as QuickSight and Athena, together with applied accelerators, enable easy self-service
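As a sketch of the self-service consumption path, the helper below builds a partition-pruned Athena SQL query. The database, table, and column names are hypothetical assumptions, not taken from the case study; in practice the query string would be submitted with boto3's Athena client and the results read from S3 or visualized in QuickSight.

```python
# Hypothetical schema: a curated-zone attendance table partitioned by
# year and month. None of these identifiers come from the case study.

def attendance_query(year: int, month: int, site: str) -> str:
    """Return SQL that Athena could run against the curated zone.

    Filtering on the year/month partition columns keeps the S3 scan,
    and therefore Athena's per-terabyte query cost, small.
    """
    return (
        "SELECT site, worker_id, shift_date, present\n"
        "FROM curated.attendance\n"
        f"WHERE year = {year} AND month = {month} AND site = '{site}'"
    )

# In a real setup this string would be passed to
# boto3.client("athena").start_query_execution(...).
print(attendance_query(2024, 1, "detroit-plant"))
```

Because Athena is serverless and QuickSight reads its results directly, business users can run queries like this without any infrastructure provisioning, which is the self-service capability the on-premises EDW lacked.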