Designed and developed a metadata-driven framework that can reduce data pipeline development effort by 80%

- Designed and implemented a data pipeline that processes semi-structured data using PySpark and stores the processed data in Redshift and AWS RDS
- Ingested data from disparate sources, such as the Salesforce API, SFTP, and other APIs, into data pipelines using Python
- Automated tasks using Python scripts
- Orchestrated workflows using Airflow and monitored them using AWS CloudWatch
- Implemented alerts using AWS SNS
- Designed, developed, and tested proofs of concept

06 Jan 2020 - 31 Dec 2021

