Data Lake on AWS – Case Study
About the Client
Stark and Bain is a leader in staff augmentation within the manufacturing industry, supplying manpower to automotive, warehouse operations, and other manufacturing companies. Stark and Bain helps companies accelerate their time to market and focus on core functional competencies by offloading their staffing needs to experts. With its unique approach, Stark and Bain boasts 95% attendance and up to 13% cost reduction.
Avahi is supporting the Client's journey to rearchitect its existing on-premises EDW solution (Informatica/Teradata) into a single data-lake-based platform. The objective is a single source of truth for heterogeneous data that reduces cost, provides flexibility and scalability, handles big data, and enables advanced analytics.
The customer's challenges can be broadly classified into the categories below:
- The current EDW struggles to meet business demands for advanced analytics
- Provisioning on-premises resources for new business requirements is a lengthy process
- Inability to connect to and handle disparate, heterogeneous data sources
- No consolidated platform or tools to analyse data in real time
- No single repository for IT and operational technology data
- Inconsistencies caused by disparate data analytics and visualization tools across the organization
- The current infrastructure lacks self-service capabilities for the business
- Difficulty extending data models to include unstructured data such as meter readings and weather data
The Avahi solution revolves around building a data lake with Amazon S3 as storage and moving existing functionality from the on-premises EDW solution (Informatica/Teradata) to AWS services using Informatica BDM. The modern architecture provides a single platform for heterogeneous data, helps reduce cost, and delivers the flexibility, scalability, and ability to handle big data and streaming data needed to enable advanced analytics.
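To make the zone-based lake layout concrete, the sketch below builds Hive-style partitioned S3 keys for data landing in the lake. The bucket, zone, source, and table names are hypothetical assumptions for illustration; the case study does not specify the actual layout.

```python
from datetime import date

# Hypothetical zone names; the actual lake zones used by the project
# are not specified in the case study.
ZONES = ("raw", "curated", "consumption")

def lake_key(zone: str, source: str, table: str, d: date) -> str:
    """Build a partitioned S3 key prefix for a data lake zone.

    Hive-style year=/month=/day= partitions let query engines such as
    Athena prune partitions instead of scanning the whole dataset.
    """
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    # Bucket name is illustrative only.
    return (f"s3://stark-bain-datalake/{zone}/{source}/{table}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}/")

# Example: where a nightly batch load from Teradata might land.
print(lake_key("raw", "teradata", "attendance", date(2024, 1, 15)))
```

A real ingestion job (batch via Informatica BDM, or a streaming consumer) would write objects under prefixes like these, so each zone stays queryable and partition-pruned.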
- Data Lake: A holistic data lake is the modern architecture for all enterprise data needs
- Batch: Efficient batch ingestion and data flows across the different data lake zones
- Stream: Stream ingestion and data flow through the AWS service layers and the data lake
- Consumption: Data can be consumed at any layer of the data lake through Athena and QuickSight
- Advanced Analytics: Quick insights from high-velocity data and the ability to mix and mash heterogeneous data
- Storage: Low-cost data storage decoupled from compute
- On-Demand Compute: Transient in-memory compute instances provide pay-per-use cost
- High Availability: Multi-AZ VPC deployments and S3 cross-region replication provide high availability and durability
- Self Service: AWS services such as QuickSight and Athena, together with applied accelerators, enable easy self-service
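As a sketch of the self-service consumption path, the helper below builds a partition-pruned Athena SQL query. The database, table, and column names are hypothetical assumptions, not taken from the case study; in practice the query string would be submitted with boto3's Athena client and the results read from S3 or visualized in QuickSight.

```python
# Hypothetical schema: a curated-zone attendance table partitioned by
# year and month. None of these identifiers come from the case study.

def attendance_query(year: int, month: int, site: str) -> str:
    """Return SQL that Athena could run against the curated zone.

    Filtering on the year/month partition columns keeps the S3 scan,
    and therefore Athena's per-terabyte query cost, small.
    """
    return (
        "SELECT site, worker_id, shift_date, present\n"
        "FROM curated.attendance\n"
        f"WHERE year = {year} AND month = {month} AND site = '{site}'"
    )

# In a real setup this string would be passed to
# boto3.client("athena").start_query_execution(...).
print(attendance_query(2024, 1, "detroit-plant"))
```

Because Athena is serverless and QuickSight reads its results directly, business users can run queries like this without any infrastructure provisioning, which is the self-service capability the on-premises EDW lacked.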