AWS Certified Data Engineer - Associate
Designed and deployed scalable ETL pipelines processing 2M+ unstructured text records from social/news feeds, enabling downstream NLP, sentiment analysis, and actionable product-market insights
Built an EC2-hosted machine learning pipeline orchestrated with AWS Lambda to automate Latent Semantic Indexing (LSI) training, reducing manual retraining cycles by 70%
Wrote Terraform scripts to provision repeatable AWS infrastructure, ensuring high availability and reproducible deployments
Developed interactive Tableau dashboards combining NLP insights with CRM segmentation, empowering marketing teams to make data-driven decisions and increasing campaign conversions by 15%
Engineered 25+ Airflow and dbt pipelines across Azure SQL and data lakes, reducing data refresh latency by 60% and improving data accessibility for analytics teams
Standardized RF testing data into analytics-ready formats, saving 5+ engineering hours weekly and accelerating high-volume RF module validation
Containerized data services with Docker and set up Prometheus monitoring to track performance and failures, reducing downtime by 45%
Ensured data governance by integrating Great Expectations data validation and enforcing role-based access control (RBAC) and row-level security policies
Designed scalable dimensional data models and built interactive Power BI dashboards for 3 business units
Developed scalable PySpark pipelines on Hadoop to process 1.2M+ insurance records, enabling efficient feature engineering for risk modeling and improving predictive accuracy by 20%
Mentored 50+ graduate students in building applied AI/BI applications, integrating LLMs (OpenAI, Anthropic) into enterprise analytics use cases
Built SnowGPT, a LangChain-powered tool on Snowflake that translates natural language into SQL, driving a 40% increase in self-service analytics adoption
Deployed containerized ML models with Docker and GitHub Actions CI/CD, reducing manual deployment time by 60%
Engineered REST-based integrations between LLM APIs and ServiceNow to enhance conversational AI for support bots, reducing ticket escalations by 40%
Built parameterized SQL automation scripts to streamline reporting cycles, and partnered with QA and product teams to validate data accuracy against KPIs and SLA thresholds
Personalized News Platform
Engineered FastAPI-based services with optimized RAG indexing, reducing content retrieval times by 35% and driving higher user engagement.
Deployed a containerized Streamlit app with Docker and Kubernetes, delivering fast, reliable personalized news updates every 8 minutes.
AWS-Based Analytics Platform
Built a serverless ELT pipeline using AWS Lambda, Glue, and Redshift to analyze 80K+ user interactions in real time.
Automated deployment with Terraform IaC and scaled workloads via Kubernetes (EKS), improving pipeline resilience.
Large-Scale Data Processing
Architected a Spark pipeline processing 1M+ retail transactions, enabling trend analysis and accurate forecasting.
Implemented Grafana dashboards and OpenTelemetry tracing for proactive error detection and pipeline observability.
Enterprise BI Solution
Engineered ETL workflows processing 25M+ sales records using Azure Data Factory and Alteryx.
Built Kimball-style dimensional models that boosted ingestion speed by 60%, and developed Power BI dashboards to improve forecasting.
I'm always interested in discussing new opportunities in data engineering and cloud architecture.