Masood Salman Choudhury
Senior Data Engineer & AI Solutions Architect

About
Senior Data Engineer & AI Solutions Architect with 5+ years' experience delivering end-to-end data platforms and intelligent applications across fintech, SaaS, and industrial analytics. Expert in designing scalable data pipelines, building AI-powered and agentic systems with LLMs and machine learning models, and deploying robust cloud-native solutions on AWS, Azure, and GCP.
Experience
- Senior Data Engineer & AI Solutions Architect @ Helixiora (Netherlands - Remote)
Summary:
- Senior data and AI lead delivering customer-facing projects end to end: scalable Azure and AWS data platforms, agentic AI and RAG, LLM fine-tuning and serving, full-stack SaaS, and secure cloud operations with strong stakeholder partnership and mentorship.
Responsibilities:
- Led customer-facing delivery by partnering with stakeholders to design and deploy innovative end-to-end solutions across Azure and AWS, translating business problems into scalable data, AI, and platform architectures
- Designed and built large-scale Azure Databricks pipelines using PySpark and SQL to ingest, clean, and transform multi-source datasets (Blob Storage, Cosmos DB); implemented real-time streaming with Kafka; delivered curated Delta Lake layers aligned with Kimball-style dimensional modeling for analytics and ML workloads
- Led development and deployment of multiple production-grade full-stack AI SaaS platforms on Azure using FastAPI, Next.js, React Native and Expo (Android and iOS), OAuth 2.0, PostgreSQL, and Stripe
- Architected and deployed multi-stage agentic RAG, chatbot, and automation systems using the OpenAI Agents SDK, LangChain, LangGraph, CrewAI, Pinecone, and custom Python pipelines integrating Slack and Google Drive
- Built ML-ready datasets in Delta Lake, supported downstream model training, and optimized vector search and reranking pipelines for retrieval quality
- Fine-tuned domain-specific LLMs using Unsloth with LoRA and deployed them into production AI workflows to improve inference accuracy
- Deployed vLLM and llama.cpp for high-performance LLM serving optimized for low latency and high concurrency
- Designed CI/CD pipelines with GitHub Actions that deploy containerized applications to AWS ECS, using Docker and Terraform for infrastructure as code; implemented Zero-Trust security, SSL/TLS, firewall rules, and secure API gateways
- Implemented centralized logging and monitoring with Prometheus and Grafana for observability
- Mentored engineers through code reviews, architectural guidance, and engineering best practices
Technologies:
- Python
- PySpark
- SQL
- Databricks
- Delta Lake
- Kafka
- FastAPI
- Next.js
- React Native
- Expo
- LangChain
- LangGraph
- CrewAI
- Pinecone
- PostgreSQL
- MySQL
- Docker
- AWS
- Azure
- Terraform
- GitHub Actions
- Git
- Prometheus
- Grafana
- Nginx
- Senior Data Engineer @ Vaultoro (Manchester, United Kingdom - On-Site)
Summary:
- Architected Kimball-style warehousing and scalable financial ETL on GCP; co-led a FastAPI savings product and analytics supporting security and growth.
Responsibilities:
- Architected a Kimball-style star-schema data warehouse using Elasticsearch and BigQuery for real-time KPI dashboards in Kibana
- Built scalable ETL pipelines with Python, GCP Dataflow, Scrapy, and Apache Airflow processing 10M+ financial rows daily
- Co-led development of a FastAPI-based Savings Platform processing $2M+ monthly deposits
- Optimized MongoDB, PostgreSQL, Elasticsearch, and BigQuery for performance
- Performed anomaly detection and fraud analysis using Pandas
- Automated data validation workflows ensuring pipeline integrity and uptime
- Deployed applications using Docker and Kubernetes
- Built a Random Forest model to identify high-value clients
Technologies:
- Python
- Elasticsearch
- BigQuery
- Kibana
- GCP
- FastAPI
- MongoDB
- PostgreSQL
- Apache Airflow
- Pandas
- Docker
- Kubernetes
- Data Analyst @ SoftCrop IT (Guwahati, India - On-site)
Summary:
- Delivered actionable insights via Tableau dashboards and automated data collection processes.
Responsibilities:
- Delivered actionable insights via Tableau dashboards (waterfall/cohort analysis), improving stakeholder decision-making
- Automated competitor data scraping (Scrapy) and ETL into MySQL, reducing manual effort by 50%
- Analyzed sales and geographic data for fiber network expansion strategy
Technologies:
- Python
- Tableau
- MySQL
- Scrapy
- Pandas
Projects
A comprehensive collection of production-ready Docker Compose stacks for self-hosted services, including monitoring, databases, home automation, and infrastructure management tools for a personal homelab environment.
- 🐳 Designed and deployed 15+ production-ready Docker Compose stacks for self-hosted services
- 📊 Implemented comprehensive monitoring stack with Prometheus, Grafana, cAdvisor, and Node Exporter for infrastructure visibility
- 🔍 Built Elasticsearch stack for log aggregation and search capabilities with Kibana visualization
- 🏠 Integrated Home Assistant for smart home automation and IoT device management
- 💾 Configured MySQL and database services with optimized Docker configurations for data persistence
- 🚀 Deployed Portainer for container management and orchestration with streamlined deployment workflows
- ⚡ Implemented performance testing tools (LibreSpeed, OpenSpeedTest) for network and system benchmarking
A Python script that automatically tags anime episodes as filler or canon by scraping animefillerlist.com, then renames the files with that metadata for easier organization and viewing.
- 🔍 Built automated web scraper to extract episode metadata from animefillerlist.com for accurate filler/canon classification
- 📁 Implemented intelligent file renaming system that preserves quality tags while adding filler/canon metadata
- ⚙️ Developed configurable system supporting multiple anime series with customizable quality tags and file paths
- 🎯 Created smart episode detection algorithm to handle various filename formats and episode numbering schemes
- 📊 Automated metadata integration that enhances media library organization and viewing experience
- 🚀 Streamlined workflow for anime enthusiasts to efficiently manage large episode collections
A computer vision-based hand gesture recognition system using TensorFlow and OpenCV for real-time gaming control, enabling hands-free game interaction through custom-trained machine learning models.
- 🤖 Built custom hand gesture detection model using TensorFlow transfer learning with SSD MobileNet V2 FPNLite architecture
- 📸 Developed automated image collection pipeline for custom hand gesture dataset creation and labeling
- 🎯 Implemented real-time hand gesture recognition from video feed for gaming applications
- 🎮 Created practical gaming integration demonstrated with Chrome Dino game control
- 🔬 Applied transfer learning techniques to optimize model performance for specific gesture recognition tasks
- 📊 Structured project workflow with Jupyter notebooks for data collection, training, and detection phases
A Python-based automation tool that downloads and applies the latest Nvidia DLSS and DLSS Frame Generation DLLs to local games, ensuring optimal gaming performance and visual quality.
- 🔧 Built automated DLL management system for Nvidia DLSS and Frame Generation updates
- 🌐 Integrated with the TechPowerUp.com API for real-time DLL version monitoring and downloads
- 📁 Implemented intelligent file discovery to locate DLSS DLLs across multiple game directories
- 💾 Created automated backup system with timestamped naming for safe DLL rollback
- ⚙️ Developed configurable system supporting multiple game library paths and server locations
- 🚀 Packaged as executable with PyInstaller for easy distribution and deployment
- ⭐ Achieved 10+ stars on GitHub demonstrating community recognition and utility
A Python-based web scraping application that monitors Gameloot.in for PC component stock changes and sends real-time Telegram notifications when products become available or go out of stock.
- 🕷️ Built automated web scraper using BeautifulSoup and Python for real-time PC component monitoring
- 📊 Implemented MongoDB integration for persistent storage and smart deduplication of products
- 🤖 Developed Telegram bot integration for instant stock change notifications
- 🔍 Implemented change detection algorithms to identify new products, restocked items, and sold-out products
- 📝 Implemented comprehensive logging with configurable levels for monitoring and debugging
- 🚀 Automated deployment with error handling and retry mechanisms for robust operation
MSc project at the University of Liverpool predicting ship delays with 80% accuracy using machine learning techniques.
- 🎯 Predicted ship delays 10+ days in advance with 80% accuracy
- 🔧 Cleaned and engineered features using Pandas, NumPy, and Pearson correlation
- 🤖 Evaluated multiple ML models (SVM, DT, RF, NN) and selected Random Forest
- 🚀 Deployed the model with FastAPI for real-time predictions
Personal self-learning project predicting diabetes using machine learning techniques.
- 📊 Performed exploratory data analysis (EDA) with Seaborn and UMAP visualization
- ⚙️ Tuned XGBoost hyperparameters with Optuna and MLflow
- 🌐 Deployed with Streamlit
Education
University of Liverpool
Asian Institute of Management and Technology
Skills
- Python
- Databricks
- PySpark
- Delta Lake
- LangChain
- Pinecone
- React Native
- PostgreSQL
- MySQL
- Docker
- AWS
- Azure
- GCP
- Terraform
- Git
- Elasticsearch
- BigQuery
- FastAPI
- MongoDB
- Apache Airflow
- Pandas
- Kubernetes
- Tableau
- Scrapy
- Prometheus
- Grafana
- Nginx
- Kafka
- Next.js
- Expo
- LangGraph
- CrewAI
- GitHub Actions
- MLflow
- Weaviate
- Scikit-learn
- PyTorch
- Django
- Go
- C++