Nguyen Bao Duy

Ha Noi, Viet Nam

Summary

Computer science undergraduate specializing in data engineering and artificial intelligence. Developed high-throughput data processing systems in Python and Apache Spark, covering both batch and real-time streaming architectures. Built retrieval-augmented generation (RAG) pipelines, constructed knowledge graphs in Neo4j, and implemented hybrid search with Elasticsearch. Proficient in automated data acquisition, HTML parsing, and containerizing environments with Docker to deliver scalable, reproducible software.

Projects

Vietnamese Stock Data Ingestion & Analytics System

Group Project

10.2025 - 02.2026

Assisted in developing a high-throughput data ingestion pipeline that collects tick-by-tick stock data from financial APIs and WebSockets.

Supported the engineering of the speed layer, using Spark Streaming to process and aggregate raw market events with millisecond latency.

Contributed to implementing real-time windowing functions that transform raw tick data into OHLCV candlesticks across multiple timeframes (1m, 5m, 15m).

GitHub: https://github.com/baoduy2048/bigdata

Vietnamese Legal Q&A System (RAG)

Group Project

09.2025 - 01.2026

Assisted in developing a web crawling system using BeautifulSoup4 to collect legal documents from government web portals.

Supported the design of an HTML parsing pipeline that converts unstructured web pages into structured formats while preserving the documents' hierarchical relationships.

Contributed to building a knowledge graph in Neo4j that models complex legal hierarchies and cross-references between different laws.

Helped orchestrate the data environment with Docker, providing reproducible, isolated setups for the crawling and storage microservices.

GitHub: https://github.com/baoduy2048/RAG-KG

Education

Computer Science

Hanoi University of Science and Technology

Skills

Programming languages: Python, C, SQL

Data processing: Apache Spark, Pandas, NumPy

AI and RAG: LangChain, vector search, LLM integration

Web scraping and extraction: BeautifulSoup, HTML parsing

Databases and infrastructure: Neo4j, Elasticsearch, Docker