
Nguyen Bao Duy

Ha Noi, Viet Nam

Summary

Computer science undergraduate specializing in data engineering and artificial intelligence. Developed high-throughput data processing systems in Python and Apache Spark, covering both batch and real-time streaming architectures. Built retrieval-augmented generation (RAG) pipelines, constructed knowledge graphs with Neo4j, and implemented hybrid search with Elasticsearch. Proficient in automated data acquisition, HTML parsing, and containerizing environments with Docker to deliver scalable, reliable software.


Work History

Vietnamese Stock Data Ingestion & Analytics System
Group Project
10.2025 - 02.2026

Assisted in developing a high-throughput data ingestion pipeline that collects tick-by-tick stock data from financial APIs and WebSocket feeds.

Supported engineering of the speed layer, using Spark Streaming to process and aggregate raw market events with millisecond latency.

Contributed to implementing real-time windowing functions that transform raw tick data into OHLCV candlesticks across multiple timeframes (1m, 5m, 15m).

GitHub: https://github.com/baoduy2048/bigdata
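The candlestick windowing step described above can be illustrated with a small sketch. It is plain Python, not the project's Spark Streaming code, and it rests on stated assumptions: the `Tick` and `Candle` types and the `to_ohlcv` function are hypothetical names, each tick is assumed to carry an epoch-millisecond event timestamp, and candles use tumbling (non-overlapping) windows aligned to the epoch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Tick:
    ts_ms: int     # event time in epoch milliseconds (assumed field)
    price: float
    volume: int


@dataclass
class Candle:
    start_ms: int  # inclusive start of the window
    open: float
    high: float
    low: float
    close: float
    volume: int


def to_ohlcv(ticks: Iterable[Tick], window_s: int) -> List[Candle]:
    """Aggregate ticks into tumbling OHLCV candles of window_s seconds."""
    width = window_s * 1000
    candles: Dict[int, Candle] = {}
    # Sort by event time so open/close are the first/last tick of each window.
    for t in sorted(ticks, key=lambda t: t.ts_ms):
        start = t.ts_ms - t.ts_ms % width  # align the tick to its window start
        c = candles.get(start)
        if c is None:
            candles[start] = Candle(start, t.price, t.price, t.price, t.price, t.volume)
        else:
            c.high = max(c.high, t.price)
            c.low = min(c.low, t.price)
            c.close = t.price
            c.volume += t.volume
    return [candles[k] for k in sorted(candles)]


if __name__ == "__main__":
    sample = [Tick(0, 10.0, 100), Tick(30_000, 10.5, 50),
              Tick(59_999, 9.8, 20), Tick(60_000, 9.9, 10)]
    for seconds in (60, 300, 900):  # 1m, 5m, 15m timeframes
        for candle in to_ohlcv(sample, seconds):
            print(seconds, candle)
```

In Spark Structured Streaming, one common way to express the same grouping is `pyspark.sql.functions.window` over an event-time column, combined with `withWatermark` to bound late data; the plain loop here only keeps the example self-contained.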

Vietnamese Legal Q&A System (RAG)
Group Project
09.2025 - 01.2026

Assisted in developing a web crawling system using BeautifulSoup4 to collect legal documents from government web portals.

Supported the design of an HTML parsing pipeline that converts unstructured web data into structured formats while preserving hierarchical relationships.

Contributed to building a knowledge graph in Neo4j that models complex legal hierarchies and cross-references between laws.

Helped orchestrate the data environment with Docker, ensuring reproducible, isolated setups for the crawling and storage microservices.

GitHub: https://github.com/baoduy2048/RAG-KG
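The hierarchy-preserving parsing step in this project can be illustrated with a hypothetical, stdlib-only sketch; it is not the project's pipeline. It assumes each page has already been reduced to text lines (for example with BeautifulSoup's `get_text`) and assumes the common Vietnamese statute layout of Chương (chapter) > Điều (article) > numbered khoản (clause) > lettered điểm (point). `build_tree`, `edges`, `LEVELS`, and the regex heuristics are illustrative names; real documents would need more patterns (for example Phần and Mục).

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed heading patterns, ordered from outermost to innermost level.
LEVELS = [
    ("chapter", re.compile(r"^Chương\s+[IVXLC]+\b")),
    ("article", re.compile(r"^Điều\s+\d+\.")),
    ("clause", re.compile(r"^\d+\.\s")),
    ("point", re.compile(r"^[a-zđ]\)\s")),
]


@dataclass
class Node:
    level: int  # index into LEVELS; -1 for the document root
    kind: str
    text: str
    children: List["Node"] = field(default_factory=list)


def build_tree(lines: List[str]) -> Node:
    """Nest heading lines by level; other lines extend the innermost open node."""
    root = Node(-1, "document", "")
    stack = [root]
    for raw in lines:
        # NFC-normalize so composed and decomposed Vietnamese diacritics both match.
        line = unicodedata.normalize("NFC", raw).strip()
        if not line:
            continue
        level = next((i for i, (_, pat) in enumerate(LEVELS) if pat.match(line)), None)
        if level is None:
            node = stack[-1]
            node.text = f"{node.text} {line}" if node.text else line
            continue
        # Close any open nodes at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(level, LEVELS[level][0], line)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def edges(node: Node) -> List[Tuple[str, str]]:
    """Depth-first (parent, child) pairs, e.g. as input for graph relationships."""
    out: List[Tuple[str, str]] = []
    for child in node.children:
        out.append((node.text or node.kind, child.text))
        out.extend(edges(child))
    return out
```

The `(parent, child)` pairs returned by `edges` are the kind of records that could then be loaded into Neo4j as containment relationships; the exact node labels and relationship types are left open here.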

Education

Computer Science

Hanoi University of Science and Technology

Skills

Programming languages: Python, C, SQL

Data processing: Apache Spark, Pandas, NumPy

AI and RAG: LangChain, vector search, LLM integration

Web scraping and extraction: BeautifulSoup, HTML parsing

Certification

  • National Olympiad Training Team Member in Mathematics
  • First Prize in Mathematics - Provincial Academic Excellence Award

Timeline

Vietnamese Stock Data Ingestion & Analytics System
10.2025 - 02.2026

Vietnamese Legal Q&A System (RAG)
09.2025 - 01.2026

Computer Science

Hanoi University of Science and Technology