Hi there 👋 I'm
Data Engineer | AWS | Python | Databricks | Airflow
Email | LinkedIn | My Portfolio | GitHub
Results-driven Data Engineer with 3 years of experience designing, developing, and optimizing large-scale ETL/ELT pipelines, real-time streaming, and cloud-native lakehouse architectures across AWS, Azure, and Databricks. Proven success in improving performance, scalability, and data quality for enterprise-grade analytics and machine learning workloads.

At FedEx, I build and optimize PySpark-based pipelines processing 1B+ daily records, leveraging Databricks, Airflow, Delta Lake, and AWS Glue to achieve 99.9% SLA compliance. Experienced in multi-tenant lakehouse design, schema evolution, and CI/CD automation with Jenkins and Git-based workflows. Previously at Knowledge Solutions and CloudEnd Platform, I engineered robust data ecosystems on Azure Data Factory, Synapse Analytics, and Snowflake, enabling customer churn prediction, forecasting models, and governed data lakes that improved analytics efficiency by 40%+.

Skilled in Python, SQL, and PySpark, with hands-on expertise in Airflow, Databricks, Glue, ADF, Redshift, and Synapse. Adept at implementing data quality frameworks (Great Expectations, Delta constraints), managing governance (Unity Catalog, IAM), and automating CI/CD deployments for scalable, high-reliability data solutions.
Python, PySpark, SQL, Scala
Apache Spark, Databricks, Delta Lake, Apache Airflow, AWS Glue, Azure Data Factory
AWS (S3, Redshift, Glue, Step Functions, SNS), Azure (ADLS, Synapse Analytics, DevOps, Monitor)
Snowflake, Azure Synapse Analytics, Amazon Redshift, SQL Server
SSIS, Databricks Workflows, AWS Glue, Azure Data Factory
Great Expectations, Delta Lake Constraints, Unity Catalog, IAM, Hive Metastore
Apache Airflow, Jenkins, Azure DevOps, AWS Step Functions
Power BI, Tableau
Azure Machine Learning, Prophet, Scikit-learn, Feature Engineering
Git, GitHub, Databricks Repos, Jenkins, Azure DevOps Pipelines
SQL Server, PostgreSQL, MySQL, NoSQL
University at Albany, SUNY – Master of Science in Data Science
Albany, NY • Aug 2023 – May 2025
Relevant Coursework: Advanced Statistics, Machine Learning, Big Data Analytics, Data Mining, Business Intelligence, Statistical Computing
CMR Institute of Technology, Hyderabad – B.Tech in Computer Science
Hyderabad, India • Aug 2018 – May 2022
Relevant Coursework: Data Structures & Algorithms, DBMS, Software Engineering, OOP, Web Technologies
Jan 2025 – Present • USA
Jun 2022 – Jul 2023
Jun 2021 – May 2022
Spark, Snowflake, Tableau, ETL
Built end-to-end Spark and Snowflake ETL pipelines and delivered a Tableau dashboard enabling analysts to track real-time inflation and wage data across 190+ countries, reducing reporting delays by 80%.
Python, Machine Learning, Tableau, Predictive Modeling
Developed predictive churn models and Tableau dashboards to surface high-risk telecom customers and behavioral patterns, empowering business analysts to drive targeted retention strategies and policy actions.