Boston, MA  ·  Open to relocation

Software Engineer.

I build systems end-to-end: cloud infrastructure, data pipelines, AI-powered APIs, and production deployments. Shipped at Skyworks Solutions and Crewasis, processing millions of records daily.

Data Infrastructure | AI / LLM Systems | FastAPI · React | AWS · Azure · GCP | ETL Pipelines | Full-Stack Deployment
Shardul Chavan
4+ Years of experience
2M+ Records / day in prod
7 Projects shipped
// flagship projects

The work I'm most proud of

⭐ AI Automation
snowbridge

AI-powered database migration engine — T-SQL to Snowflake, automated end-to-end. GPT-4 converts views and stored procedures. Zero manual intervention.

71 Tables
760K Rows migrated
0 Manual steps
GPT-4 · Snowflake · Python · Parquet · T-SQL · Azure SQL
GPT-4 schema conversion
Views and stored procedures converted automatically — no hand-written SQL rewriting.
30+ type mappings
Edge cases for spatial, binary, and legacy SQL Server types — fully automated.
Single-command pipeline
Extract → transform → load → validate. One command. No babysitting.
Data integrity verified
760K rows, 71 tables migrated with full verification. Built for enterprise.
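A minimal sketch of what a type-mapping step like the one described above could look like. The table is an illustrative subset and the names (`TYPE_MAP`, `PARAMETERIZED`, `map_type`) are hypothetical, not snowbridge's actual code. It assumes unknown types should fail loudly rather than be guessed.

```python
import re

# Illustrative subset only; NOT the project's actual 30+ mappings.
# Fixed-target types: any length/precision argument is dropped.
TYPE_MAP = {
    "bit": "BOOLEAN",
    "int": "NUMBER(10,0)",
    "bigint": "NUMBER(19,0)",
    "money": "NUMBER(19,4)",
    "datetime": "TIMESTAMP_NTZ",
    "datetime2": "TIMESTAMP_NTZ",
    "datetimeoffset": "TIMESTAMP_TZ",
    "uniqueidentifier": "VARCHAR(36)",
    "varbinary": "BINARY",   # binary
    "image": "BINARY",       # legacy SQL Server type
    "text": "VARCHAR",       # legacy SQL Server type
    "ntext": "VARCHAR",      # legacy SQL Server type
    "geography": "GEOGRAPHY",  # spatial
}

# Types whose length/precision arguments carry over to the target.
PARAMETERIZED = {
    "varchar": "VARCHAR",
    "nvarchar": "VARCHAR",
    "char": "CHAR",
    "nchar": "CHAR",
    "decimal": "NUMBER",
    "numeric": "NUMBER",
}

_TYPE_RE = re.compile(r"^\s*(\w+)\s*(?:\(([^)]*)\))?\s*$")


def map_type(tsql_type: str) -> str:
    """Translate one T-SQL column type string to a Snowflake type string."""
    match = _TYPE_RE.match(tsql_type)
    if match is None:
        raise ValueError(f"cannot parse type {tsql_type!r}")
    base, args = match.group(1).lower(), match.group(2)
    if base in PARAMETERIZED:
        target = PARAMETERIZED[base]
        # (MAX) has no Snowflake equivalent, so fall back to the default length.
        if args is None or args.strip().lower() == "max":
            return target
        return f"{target}({args.replace(' ', '')})"
    if base in TYPE_MAP:
        return TYPE_MAP[base]
    # Fail loudly so an unmapped type never slips through silently.
    raise ValueError(f"no mapping for {tsql_type!r}")
```

For example, `map_type("decimal(18, 2)")` yields `NUMBER(18,2)`, while an unmapped type such as `sql_variant` raises `ValueError` so it can be handled explicitly.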
🚀 Full-Stack AI Product · Live
NewsSphere

Personalized AI news platform — user-interest-driven article delivery with text-to-speech, JWT auth, playlist management, and Airflow-powered real-time ingestion every 8 minutes.

8 min Refresh cycle
3 Cloud platforms
Live Deployed app
FastAPI · Streamlit · Airflow · Pinecone · OpenAI · GCP · Azure SQL · JWT · Docker
Interest-driven personalization
Users register with topic interests (Business, Environment, Tech…) — top 5 articles served per session from matching sources including BBC, NYT, Google News.
Text-to-speech & voice input
Full audio delivery — listen to articles hands-free. Speech-to-text for voice queries. I built and own this end-to-end.
Airflow every 8 minutes
Scheduled DAGs ingest only new or changed articles — smart dedup keeps the feed fresh without duplicates.
JWT auth + playlist system
Secure login, persistent playlists saved to Azure SQL, and ephemeral queues for quick listens — full product UX.
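One way the "only new or changed articles" step could work is content fingerprinting. The sketch below is an assumption about the approach, not NewsSphere's actual code. The article fields (`url`, `title`, `body`), the function names, and the in-memory `seen` dict (a stand-in for a persistent store) are all hypothetical.

```python
import hashlib


def fingerprint(article: dict) -> str:
    """Hash the fields that define an article's content."""
    payload = f"{article['title']}\n{article['body']}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def select_new_or_changed(batch: list, seen: dict) -> list:
    """Return articles whose URL is new or whose content hash changed.

    `seen` maps URL -> last ingested fingerprint and is updated in place,
    so repeats within the same batch are skipped too.
    """
    fresh = []
    for article in batch:
        digest = fingerprint(article)
        if seen.get(article["url"]) != digest:
            seen[article["url"]] = digest
            fresh.append(article)
    return fresh
```

A scheduled task would call `select_new_or_changed` on each fetched batch and pass only the returned articles downstream, so an unchanged feed produces no work.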
// all projects

Things I've built

RAGDoc
SEC Filing Q&A · Full Stack

RAG pipeline over SEC filings. Nougat OCR + OpenAI for structured Q&A. JWT-secured FastAPI backend, Airflow-orchestrated ingestion, Streamlit frontend. Analysts query documents in plain English.

FastAPI · OpenAI · Airflow · Docker · Streamlit · Azure
SnowGPT
↑ 40% self-service analytics adoption

Natural language to SQL on Snowflake. LangChain NL-to-SQL with FastAPI backend and Streamlit UI. Non-technical teams now query the data warehouse without writing a line of SQL.

LangChain · Snowflake · FastAPI · Streamlit · Docker
NewsSphere
↓ 35% retrieval latency · refreshes every 8 min

Real-time personalized news. RAG + Pinecone vector search, FastAPI microservices on GCP Cloud Run. Content refreshes every 8 minutes, personalized per user interest graph.

GCP · FastAPI · Pinecone · Docker · Azure SQL · CI/CD
AWS YouTube Analytics
80K+ interactions · serverless

Serverless ELT using AWS Lambda, Glue, and Redshift for multi-regional engagement analysis. Terraform IaC + EKS autoscaling. QuickSight dashboards for marketing insight.

Lambda · Glue · Redshift · Terraform · EKS · QuickSight
Retail Sales Pipeline
1M+ transactions · Databricks

Spark/Databricks pipeline processing 1M+ retail transactions for trend analysis and forecasting. Grafana + OpenTelemetry tracing for proactive error detection and SLA monitoring.

Spark · Databricks · Azure · Grafana · OpenTelemetry
Iowa Sales Intelligence
↓ 60% ingestion time · 25M+ records

Enterprise BI over 25M retail records. Kimball-style dimensional models, ADF + Alteryx ETL workflows, Power BI and Tableau dashboards deployed across 3 business units.

Azure Data Factory · Alteryx · Power BI · Tableau
FS & Process Monitor
C++ · macOS systems programming

Lightweight C++ daemon for macOS file system and process monitoring. IPC, concurrency primitives, real-time CLI telemetry. Built and profiled with LLDB.

C++ · macOS · LLDB · IPC · Concurrency
// capabilities

What I build

01
Data Infrastructure

Pipelines that move millions of records reliably — batch, streaming, real-time. Raw ingestion to analytics-ready models.

  • Airflow · dbt · PySpark · Spark
  • AWS Glue · Azure Data Factory
  • Snowflake · Redshift · BigQuery
  • Terraform · Docker · Kubernetes
02
AI / LLM Systems

AI features actually useful in production — RAG pipelines, NL-to-SQL, document intelligence, LLM-powered automation.

  • LangChain · OpenAI · AWS Bedrock
  • RAG · Vector search · Pinecone
  • NLP · Sentiment analysis
  • GPT-4 schema conversion
03
APIs & Deployment

Full backend-to-cloud delivery — REST APIs, auth layers, containerized services, CI/CD that actually ships to production.

  • FastAPI · React · Streamlit
  • Docker · GitHub Actions · Jenkins
  • Prometheus · Grafana · ELK
  • JWT Auth · Microservices
// experience

Where I've worked

Crewasis
AI Software Engineer Intern
Jan 2025 – Jun 2025
Dallas, TX · Remote
  • Architected ETL on AWS Lambda & EC2 processing 2M+ records/day from social/news feeds for real-time NLP and sentiment analysis.
  • Implemented LSI ML pipeline reducing manual retraining by 70%.
  • Wrote Terraform IaC (EC2, S3, Lambda) — repeatable, high-availability deployments.
  • Built Tableau dashboards combining NLP + CRM segmentation, lifting campaign conversions by 15%.
  • Introduced CI/CD with GitHub Actions & Jenkins and QA harnesses, cutting manual QA by 85%.
Python · AWS EC2/S3/Lambda · Terraform · FastAPI · Docker · React · Tableau · GitHub Actions
Skyworks Solutions Inc.
Software Engineer – Data Platforms (Co-op)
Jan 2024 – Jun 2024
Boston, MA
  • Built 25+ Airflow & dbt pipelines across Azure SQL and data lakes — reduced latency by 60%.
  • Developed APIs to auto-parse BOM data from vendor PDFs, saving 5+ engineering hours weekly.
  • Re-architected RF test ingestion — distributed system tripled throughput for NPI cost tracking.
  • Containerized services with Docker + Prometheus, cutting downtime by 45%.
  • Designed Kimball dimensional models and Power BI dashboards across 3 business units.
Airflow · dbt · Azure Synapse · Azure Data Factory · Power BI · Docker · Prometheus
Northeastern University
Graduate TA & AI Researcher
Sep 2023 – Dec 2024
Boston, MA
  • Built SnowGPT — LangChain NL-to-SQL on Snowflake, driving 40% increase in self-service analytics adoption.
  • Developed PySpark jobs on Hadoop processing 1.2M+ insurance records, improving model accuracy by 20%.
  • Mentored 50+ graduate students integrating LLMs into enterprise analytics workflows.
  • Containerized ML models with Docker + GitHub Actions CI/CD, reducing deploy time by 60%.
PySpark · Hadoop · LangChain · Snowflake · FastAPI · Docker
Accion Labs
Software Engineer – Data & API Integration
Jan 2022 – Jul 2022
Mumbai, India
  • Engineered REST APIs connecting ServiceNow with external systems, reducing ticket escalations by 40%.
  • Built parameterized SQL automation for reporting cycles and KPI/SLA validation pipelines.
ServiceNow · REST APIs · MySQL · Snowflake · Python
// stack

Technical stack

Languages
Python · SQL · Scala · JavaScript · Bash · C++
Frontend & APIs
React · FastAPI · Streamlit · REST APIs · JWT Auth
Data Engineering
Airflow · dbt · PySpark · Databricks · AWS Glue · ADF
Cloud & Infra
AWS · Azure · GCP · Docker · Kubernetes · Terraform
AI / ML
LangChain · OpenAI · Bedrock · RAG · Scikit-learn · NLP
BI & Monitoring
Tableau · Power BI · Grafana · Prometheus · QuickSight

Let's build something.

Open to full-stack, data engineering, and AI/ML roles. Comfortable at a 5-person startup and a 50,000-person enterprise. Let's talk.

Actively looking · Available now
  • Startup founders — I ship production systems solo, end-to-end. No hand-holding needed.
  • Enterprise hiring managers — Skyworks, Northeastern, Accion Labs. I know how large orgs work.
  • AI-first teams — LLM integration, RAG pipelines, ML deployment in production.
  • Data-heavy products — Pipeline to dashboard. 2M+ records/day in production.