Mohamed Ouijjane
[Project screenshot]

CPU Grid - Distributed Monte Carlo Traffic Simulation

A distributed simulation platform for accelerating Monte Carlo traffic analysis through a Java RMI master-worker architecture.

Distributed Systems · Java RMI · Grid Computing · Next.js · Monte Carlo Simulation

Overview

CPU Grid is a high-performance distributed computing system designed to tackle computationally expensive Monte Carlo traffic simulations. By leveraging a master-worker architecture orchestrated via Java RMI, the system partitions complex simulation workloads into smaller, manageable chunks that are processed in parallel across multiple nodes. This approach significantly reduces execution time while maintaining simulation accuracy.

The Problem

Monte Carlo traffic simulations are notoriously expensive to compute, often requiring hours or days on a single machine. The challenge was not only to improve performance through parallelization, but also to manage distributed coordination, service discovery, and fault tolerance, and to give researchers a friendly interface to the system that does not require deep knowledge of the underlying infrastructure.
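What makes these workloads a natural fit for a grid is that each trial is statistically independent. The toy sketch below (illustrative names only, not the project's actual code) shows one "chunk" of trials with a fixed seed, so a chunk is reproducible if it has to be re-dispatched after a worker failure:

```java
import java.util.Random;

// Hypothetical sketch: one chunk of a Monte Carlo traffic run. Each trial
// samples a random arrival rate and accumulates a delay estimate; trials
// are independent, which is what makes the workload easy to split across
// worker nodes.
public class MonteCarloChunk {

    // Run `trials` independent trials. The fixed seed makes the chunk
    // deterministic, so a re-dispatched chunk yields the same result.
    public static double runChunk(long seed, int trials) {
        Random rng = new Random(seed);
        double totalDelay = 0;
        for (int i = 0; i < trials; i++) {
            // Toy queueing-style model: delay grows with the sampled rate.
            double arrivalRate = rng.nextDouble();            // in [0, 1)
            totalDelay += arrivalRate / (1.0 - 0.9 * arrivalRate);
        }
        return totalDelay / trials;                           // mean delay
    }

    public static void main(String[] args) {
        // Two chunks can run on different workers and be averaged later --
        // no coordination is needed while the trials execute.
        double a = runChunk(1L, 50_000);
        double b = runChunk(2L, 50_000);
        System.out.println("combined mean delay = " + (a + b) / 2);
    }
}
```

With millions of such trials, the total runtime on one machine is roughly linear in the trial count, which is exactly the cost the grid is built to divide.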

The Solution

The solution implements a robust grid computing platform where a central Master node acts as the orchestrator. It accepts simulation jobs from the web interface, partitions them into independent sub-tasks, and dynamically dispatches them to available Worker nodes registered in the RMI registry. The Workers execute the simulations in parallel and return partial results to the Master, which aggregates them into a final dataset. The entire process is monitored in real-time via a modern Next.js dashboard.
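A minimal sketch of the master-worker contract this describes, assuming a shape like the one above (the real project's type names may differ): tasks and results must be Serializable to cross the RMI boundary, and the worker interface must extend Remote.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

// Illustrative master-worker contract over Java RMI.
public class GridContract {

    // What the Master invokes remotely on each registered Worker.
    public interface WorkerService extends Remote {
        PartialResult execute(SimulationTask task) throws RemoteException;
    }

    // Sent Master -> Worker: which trials to run.
    public record SimulationTask(long seed, int trials) implements Serializable {}

    // Sent Worker -> Master: a partial sum the Master can merge.
    public record PartialResult(int trials, double sumDelay) implements Serializable {}

    // Master-side aggregation of partial results into one final mean.
    public static double aggregate(List<PartialResult> parts) {
        int n = parts.stream().mapToInt(PartialResult::trials).sum();
        double sum = parts.stream().mapToDouble(PartialResult::sumDelay).sum();
        return sum / n;
    }

    public static void main(String[] args) {
        double mean = aggregate(List.of(
                new PartialResult(100, 250.0),
                new PartialResult(300, 450.0)));
        System.out.println("aggregated mean delay = " + mean); // 1.75
    }
}
```

Keeping the result a sum-plus-count rather than a pre-computed mean is what lets the Master merge partials of unequal size correctly.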

Architecture

A distributed master-worker grid computing architecture using Java RMI for service discovery and task orchestration, with a Next.js frontend for simulation control, monitoring, and result visualization.

How It Works

1. Workers register themselves with the RMI Registry upon startup.
2. The user configures and launches a simulation via the Next.js dashboard.
3. The Master node validates the request and splits the workload into chunks.
4. The Master dispatches chunks to available Workers via RMI.
5. Workers execute their assigned simulation chunks in parallel.
6. Partial results are returned to the Master for aggregation.
7. The Master compiles the final result and notifies the frontend.
8. The user views the visualization and exports data from the dashboard.
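The split-dispatch-aggregate middle of this flow (steps 3 through 7) can be sketched locally, with a thread pool standing in for the remote RMI workers. All names here are illustrative, not the project's real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local sketch of the master loop: partition, dispatch, aggregate.
public class MasterLoop {

    // Step 3: split `totalTrials` into roughly equal chunks, one per worker.
    // Each entry is { chunkId, trialsInChunk }.
    public static List<int[]> partition(int totalTrials, int workers) {
        List<int[]> chunks = new ArrayList<>();
        int base = totalTrials / workers, extra = totalTrials % workers;
        for (int w = 0; w < workers; w++) {
            chunks.add(new int[] { w, base + (w < extra ? 1 : 0) });
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // stand-in workers
        List<Future<Double>> partials = new ArrayList<>();
        for (int[] chunk : partition(100_000, 4)) {
            int seed = chunk[0], trials = chunk[1];
            // Steps 4-5: dispatch; each "worker" runs its trials in parallel.
            partials.add(pool.submit(() -> {
                Random rng = new Random(seed);
                double sum = 0;
                for (int i = 0; i < trials; i++) sum += rng.nextDouble();
                return sum;
            }));
        }
        // Steps 6-7: collect partial sums and compile the final result.
        double total = 0;
        for (Future<Double> f : partials) total += f.get();
        pool.shutdown();
        System.out.println("mean = " + total / 100_000);
    }
}
```

In the real system, the `pool.submit` call is replaced by an RMI invocation on a Worker stub looked up from the registry, but the partition/aggregate bookkeeping is the same.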

Engineering Challenges

1. Maintaining data consistency and coordinating asynchronous jobs across distributed RMI-based nodes
2. Handling network faults and worker dropouts gracefully
3. Ensuring consistent service discovery across a dynamic network
4. Bridging the Java RMI backend with a REST-based Next.js frontend
5. Optimizing data serialization for large simulation result sets
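One common answer to the serialization challenge, shown here as a sketch rather than the project's actual approach: have workers ship a fixed-size statistical summary (count, sum, sum of squares) instead of every raw sample, since mean and variance are recoverable from merged summaries. The `Summary` type is an illustrative name:

```java
import java.io.Serializable;

// Compact, mergeable result summary: constant size on the wire no matter
// how many trials a worker ran.
public class ResultSummary {

    public record Summary(long n, double sum, double sumSq) implements Serializable {

        public static Summary of(double[] samples) {
            double s = 0, s2 = 0;
            for (double x : samples) { s += x; s2 += x * x; }
            return new Summary(samples.length, s, s2);
        }

        // Merging is associative, so the master can combine partial results
        // in whatever order workers happen to finish.
        public Summary merge(Summary other) {
            return new Summary(n + other.n, sum + other.sum, sumSq + other.sumSq);
        }

        public double mean() { return sum / n; }

        public double variance() { return sumSq / n - mean() * mean(); }
    }

    public static void main(String[] args) {
        Summary a = Summary.of(new double[] { 1, 2, 3 });
        Summary b = Summary.of(new double[] { 4, 5 });
        Summary all = a.merge(b);
        System.out.println("mean=" + all.mean() + " var=" + all.variance());
    }
}
```

This reduces an O(trials) payload to three doubles per chunk, at the cost of losing access to the raw sample distribution.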

Tech Stack

Java 17 · Java RMI · Maven · Next.js · TypeScript · Chart.js · Bash · Git

Key Features

  • Distributed task scheduling and load balancing
  • Asynchronous job execution with non-blocking I/O
  • Real-time result aggregation and visualization
  • Live cluster monitoring (worker status, CPU usage)
  • Simulation history and persistent configuration
  • Remote node administration and health checks
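The asynchronous-execution feature can be illustrated with `CompletableFuture`: each chunk runs as a future, and partial sums are combined without blocking on any single worker, so the aggregate is ready the moment the last one finishes. This is a sketch with illustrative names, not the project's real API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Non-blocking aggregation of in-flight partial results.
public class AsyncAggregation {

    // Fold a list of pending partial sums into one future total; no thread
    // blocks while the workers are still running.
    public static CompletableFuture<Double> sumAll(List<CompletableFuture<Double>> parts) {
        return parts.stream()
                .reduce(CompletableFuture.completedFuture(0.0),
                        (acc, f) -> acc.thenCombine(f, Double::sum));
    }

    public static void main(String[] args) {
        List<CompletableFuture<Double>> parts = List.of(
                CompletableFuture.supplyAsync(() -> 10.0),
                CompletableFuture.supplyAsync(() -> 20.0),
                CompletableFuture.supplyAsync(() -> 30.0));
        System.out.println("total = " + sumAll(parts).join()); // 60.0
    }
}
```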

Limitations

  • Performance is dependent on network latency and stability
  • Currently relies on in-memory storage for active jobs (limited persistence)
  • Basic handling for severe network partitions (split-brain scenarios)

Future Roadmap

  • Implement persistent database storage for historical job data
  • Add stronger fault tolerance with checkpointing and resume capability
  • Develop a more advanced adaptive scheduling algorithm
  • Containerize nodes for easier cloud deployment (Docker/Kubernetes)
  • Enhance observability with distributed tracing
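The checkpointing item could build on plain Java serialization, which the grid already uses for RMI. A hypothetical sketch (`Checkpoint` and the helpers are not part of the current codebase): progress is serialized to bytes that a restarted master can read back, shown here as an in-memory round trip, though the same bytes could be written to disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Serialize/restore job progress so a restarted master resumes from the
// last completed chunk instead of recomputing everything.
public class CheckpointSketch {

    public record Checkpoint(int completedChunks, double partialSum) implements Serializable {}

    public static byte[] toBytes(Checkpoint cp) {
        try (ByteArrayOutputStream buf = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(cp);
            out.flush();                       // push buffered bytes to buf
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Checkpoint fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Checkpoint) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Checkpoint before = new Checkpoint(7, 123.5);
        Checkpoint after = fromBytes(toBytes(before));
        System.out.println("resume from chunk " + after.completedChunks());
    }
}
```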
Back to Projects · View on GitHub