Simulation Model¶
ncsim uses a discrete event simulation (DES) to model task execution and data transfer across a networked computing environment. This page covers the event model, time management, compute and transfer calculations, and bandwidth sharing mechanics.
Discrete Event Simulation Basics¶
A discrete event simulation advances time by jumping from one event to the next, rather than stepping through fixed time increments. The core loop is:
- Pop the highest-priority event from the queue.
- Advance the simulation clock to that event's time.
- Execute the event handler, which may schedule new events.
- Repeat until the queue is empty.
No fixed time step
Unlike tick-based simulations, DES skips idle periods entirely. A simulation with two events at t=0.0 and t=100.0 processes only those two events, regardless of the 100-second gap between them.
This approach gives ncsim exact timing for task completions and data transfers without discretization error.
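The loop above can be sketched in a few lines of Python. This is an illustration of the general DES pattern, not ncsim's actual engine; the tuple layout and names here are assumptions.

```python
import heapq

def run(events):
    """Minimal DES loop sketch. Each event is a
    (time, priority, event_id, handler) tuple; handler() returns
    any new events to schedule. Illustrative only."""
    heap = list(events)
    heapq.heapify(heap)
    clock = 0.0
    while heap:
        time, priority, event_id, handler = heapq.heappop(heap)
        clock = time  # jump straight to the event's time; idle gaps are skipped
        for new_event in handler():
            heapq.heappush(heap, new_event)
    return clock

# Two events at t=0.0 and t=100.0: exactly two iterations,
# no ticking through the 100-second gap.
final = run([(0.0, 0, 0, lambda: []), (100.0, 1, 1, lambda: [])])
```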
Event Types¶
ncsim defines six core event types, each with a fixed priority value. Lower priority values are processed first when multiple events occur at the same simulation time.
| Event Type | Priority | Description |
|---|---|---|
| `DAG_INJECT` | 0 | A new DAG arrives and is handed to the scheduler |
| `TASK_COMPLETE` | 1 | A task finishes execution on its assigned node |
| `TRANSFER_COMPLETE` | 2 | A data transfer finishes on its link path |
| `TASK_READY` | 3 | All predecessors of a task are satisfied |
| `TASK_START` | 4 | A task begins execution on a node |
| `TRANSFER_START` | 5 | A data transfer begins on a link |
Why completions come before starts
At the same simulation time, completions must be processed before starts. When a task completes, it frees its node and triggers output transfers. Those transfers may complete instantly (same-node), making a downstream task ready. If starts were processed first, the downstream task could miss being scheduled at the correct time.
Four additional event types are reserved for future extensions:
`MOBILITY_UPDATE` (100), `LINK_STATE_CHANGE` (101),
`RESCHEDULE_TRIGGER` (102), and `TDMA_SLOT_START` (103). Their handlers
are currently no-ops.
Event Ordering¶
Events are stored in a min-heap (priority queue) and ordered by a three-element sort key:
| Component | Purpose |
|---|---|
| `round(sim_time, 6)` | Microsecond precision avoids floating-point comparison issues |
| `event_type.priority` | Ensures correct causal ordering at the same time instant |
| `event_id` | Monotonically increasing counter guarantees FIFO order for ties |
This three-level ordering guarantees determinism: given the same inputs and the same seed, the simulation produces an identical event sequence every time.
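The effect of the three-element key can be shown with a small sketch. This is an illustration of the ordering rule, not ncsim's internal code; the priority values match the event table above.

```python
import heapq

def sort_key(sim_time, priority, event_id):
    # Round to microseconds so platform-dependent float noise
    # cannot reorder events that occur at the same logical instant.
    return (round(sim_time, 6), priority, event_id)

heap = []
# A TASK_START (priority 4) and a TASK_COMPLETE (priority 1) at the
# "same" time, differing only by sub-microsecond float noise.
heapq.heappush(heap, (sort_key(1.0000000001, 4, 10), "TASK_START"))
heapq.heappush(heap, (sort_key(1.0, 1, 11), "TASK_COMPLETE"))
order = [heapq.heappop(heap)[1] for _ in range(2)]
# The completion is processed first, as required for correct causality.
```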
Event Cancellation¶
The queue supports lazy cancellation. When a transfer's completion time is recalculated (due to bandwidth sharing changes), the old completion event is added to a cancelled set. When the queue pops a cancelled event, it silently discards it and pops the next one. This avoids the cost of heap removal while maintaining correctness.
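Lazy cancellation can be sketched as a thin wrapper around a heap and a set. Class and method names here are illustrative assumptions, not ncsim's actual API.

```python
import heapq

class EventQueue:
    """Sketch of lazy cancellation: cancelled events stay in the heap
    and are silently skipped on pop. Illustrative only."""
    def __init__(self):
        self._heap = []
        self._cancelled = set()

    def push(self, key, event_id, payload):
        heapq.heappush(self._heap, (key, event_id, payload))

    def cancel(self, event_id):
        self._cancelled.add(event_id)  # O(1); no heap surgery needed

    def pop(self):
        while self._heap:
            key, event_id, payload = heapq.heappop(self._heap)
            if event_id in self._cancelled:
                self._cancelled.discard(event_id)  # drop the stale entry
                continue
            return payload
        return None

q = EventQueue()
q.push((1.0, 1), 1, "old TRANSFER_COMPLETE")
q.push((1.5, 2), 2, "new TRANSFER_COMPLETE")
q.cancel(1)  # bandwidth changed; the old completion time is stale
first = q.pop()  # the cancelled event is skipped
```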
Determinism¶
Reproducibility guarantee
Same scenario YAML + same --seed = identical event sequence,
identical makespan, identical trace output.
Determinism comes from three sources:
- Microsecond rounding -- All times are rounded to 6 decimal places via `round_time()`, eliminating platform-dependent floating-point drift.
- Priority-based ordering -- Event type priorities impose a fixed processing order at each time instant.
- Monotonic event IDs -- A global counter breaks all remaining ties in insertion order.
Compute Model¶
Each node has a `compute_capacity` measured in compute units per second.
Each task has a `compute_cost` measured in compute units. The execution
time is:

`execution_time = compute_cost / compute_capacity`
For example, a task with compute_cost: 200 on a node with
compute_capacity: 100 takes 2.0 seconds.
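As a one-line sketch of the formula above (function name is illustrative):

```python
def execution_time(compute_cost, compute_capacity):
    # compute_cost in compute units; compute_capacity in units per second.
    return compute_cost / compute_capacity

runtime = execution_time(200, 100)  # 2.0 seconds, matching the example above
```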
Node Queuing¶
Each node runs at most one task at a time (single-server model). When a task becomes ready and its assigned node is busy, the task enters a FIFO queue on that node. Tasks are dequeued and started in arrival order when the node becomes idle.
No preemption
A running task always completes before any queued task starts. The
TaskState dataclass tracks compute_remaining to support future
preemptive scheduling, but the current engine does not interrupt
running tasks.
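The single-server queuing behavior can be sketched as follows. This is an assumption-laden illustration; ncsim's internal node structures may differ.

```python
from collections import deque

class Node:
    """Single-server node sketch: at most one running task,
    FIFO queue for the rest. Illustrative only."""
    def __init__(self):
        self.running = None
        self.queue = deque()

    def try_start(self, task):
        if self.running is None:
            self.running = task   # node idle: start immediately
            return True
        self.queue.append(task)   # node busy: wait in arrival order
        return False

    def complete(self):
        finished, self.running = self.running, None
        if self.queue:            # dequeue the next task in FIFO order
            self.running = self.queue.popleft()
        return finished

n = Node()
n.try_start("T0")  # starts immediately
n.try_start("T1")  # queued
n.try_start("T2")  # queued behind T1
n.complete()       # T0 finishes; T1 starts next, T2 still waits
```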
Task Lifecycle¶
A task moves through these statuses:
| Status | Meaning |
|---|---|
| `PENDING` | Waiting for predecessor tasks to complete |
| `READY` | All predecessors complete; waiting for node availability |
| `QUEUED` | In the node's FIFO queue (node is busy) |
| `RUNNING` | Executing on the node |
| `COMPLETED` | Finished execution |
Transfer Model¶
When a task completes, the engine schedules data transfers for each outgoing edge in the DAG. Transfers move data from the source task's node to the destination task's node over a network path.
Local Transfers¶
If the source and destination tasks are assigned to the same node, no network transfer occurs. The predecessor is marked complete immediately at zero cost.
Network Transfers¶
For tasks on different nodes, the routing model determines a path (sequence of link IDs). The transfer time is:

`transfer_time = data_size / effective_bandwidth + total_latency`

Where:

- `data_size` is the edge's `data_size` in MB.
- `effective_bandwidth` is the bottleneck bandwidth across all links in the path, after accounting for per-link fair sharing and interference.
- `total_latency` is the sum of all link latencies along the path (store-and-forward model).
Multi-Hop Paths¶
For paths with more than one link:
- Bottleneck bandwidth = minimum bandwidth across all links in the path.
- Total latency = sum of latencies across all links (each hop adds its propagation delay).
The store-and-forward model means data must fully arrive at each intermediate node before being forwarded to the next hop.
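The multi-hop rules above can be sketched as a small function. The signature and names are illustrative assumptions, not ncsim's API.

```python
def transfer_time(data_size_mb, path_links):
    """Transfer time over a multi-hop path, per the rules above.

    path_links: list of (bandwidth_mb_per_s, latency_s) tuples,
    one per link. Illustrative only."""
    bottleneck = min(bw for bw, _ in path_links)       # min bandwidth on path
    total_latency = sum(lat for _, lat in path_links)  # sum of per-hop delays
    return data_size_mb / bottleneck + total_latency

# 50 MB over two hops: a 100 MB/s link and a 40 MB/s link, 1 ms latency each.
# The 40 MB/s link is the bottleneck: 50/40 + 0.002 = 1.252 s.
t = transfer_time(50, [(100, 0.001), (40, 0.001)])
```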
Bandwidth Sharing¶
When multiple transfers use the same link simultaneously, they share the
link's bandwidth equally. If N transfers share a link with base bandwidth
B (after interference), each gets B / N.
Dynamic recalculation
When a transfer starts or completes on a link, the engine recalculates the completion times of all other active transfers on that link. Old completion events are cancelled and replaced with new ones reflecting the updated bandwidth allocation.
The recalculation accounts for partial progress. When bandwidth changes mid-transfer:
- The engine computes how much data was transferred at the previous rate during the elapsed time.
- It subtracts that from `data_remaining`.
- It schedules a new completion event based on the remaining data and the new effective rate.
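The steps above can be sketched as two small helpers. Names are illustrative assumptions, not ncsim's internals.

```python
def remaining_after_rate_change(data_remaining, old_rate, elapsed):
    # Data moved at the old rate during the elapsed interval
    # is subtracted from what was left to send.
    transferred = old_rate * elapsed
    return max(0.0, data_remaining - transferred)

def new_completion_time(now, data_remaining, new_rate):
    # Remaining data at the new effective rate fixes the new completion event.
    return now + data_remaining / new_rate

# A 50 MB transfer runs alone at 100 MB/s; after 0.2 s a second transfer
# joins the link and fair sharing halves the rate to 50 MB/s.
left = remaining_after_rate_change(50.0, 100.0, 0.2)  # 50 - 20 = 30 MB left
finish = new_completion_time(0.2, left, 50.0)         # 0.2 + 30/50 = 0.8 s
```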
For multi-hop transfers, the effective bandwidth is the minimum across all links in the path, each independently sharing its bandwidth among its concurrent transfers.
Interference Effects on Bandwidth¶
When an InterferenceModel is active, bandwidth sharing becomes a
two-level calculation:
- Interference factor: The model returns a factor f in (0, 1] for each link based on other active links in the network (e.g., nearby wireless transmitters).
- Per-link fair sharing: N concurrent transfers on the link each get
(B * f) / N.
Interference factors are recomputed whenever any transfer starts or completes, and affected transfers have their completion times recalculated.
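The two-level calculation reduces to one expression. The function name is an illustrative assumption.

```python
def effective_bandwidth(base_bw, interference_factor, n_transfers):
    """Two-level allocation sketch: interference scales the link's
    bandwidth, then fair sharing splits it among concurrent transfers."""
    assert 0.0 < interference_factor <= 1.0
    return (base_bw * interference_factor) / n_transfers

# 100 MB/s link, interference factor 0.8, two concurrent transfers:
# each gets (100 * 0.8) / 2 = 40 MB/s.
bw = effective_bandwidth(100.0, 0.8, 2)
```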
Example: Two-Task Execution¶
The following diagram shows the event sequence for a simple scenario with two tasks (T0 and T1) where T1 depends on T0 and they are assigned to different nodes.
```mermaid
sequenceDiagram
    participant EQ as Event Queue
    participant Engine as ExecutionEngine
    Note over EQ,Engine: t=0.000000
    EQ->>Engine: DAG_INJECT (dag_1)
    Note right of Engine: Scheduler assigns T0->n0, T1->n1<br/>Schedules TASK_READY for root task T0
    EQ->>Engine: TASK_READY (T0)
    Note right of Engine: Node n0 is idle<br/>Schedules TASK_START for T0
    EQ->>Engine: TASK_START (T0, n0)
    Note right of Engine: runtime = 100 / 100 = 1.0s<br/>Schedules TASK_COMPLETE at t=1.0
    Note over EQ,Engine: t=1.000000
    EQ->>Engine: TASK_COMPLETE (T0, n0)
    Note right of Engine: T0 done. Edge T0->T1 has data_size=50 MB<br/>T0 on n0, T1 on n1: network transfer<br/>Schedules TRANSFER_START
    EQ->>Engine: TRANSFER_START (T0->T1, link l01)
    Note right of Engine: effective_bw = 100 MB/s, latency = 0.001s<br/>transfer_time = 50/100 + 0.001 = 0.501s<br/>Schedules TRANSFER_COMPLETE at t=1.501
    Note over EQ,Engine: t=1.501000
    EQ->>Engine: TRANSFER_COMPLETE (T0->T1)
    Note right of Engine: T1's last predecessor satisfied<br/>Schedules TASK_READY for T1
    EQ->>Engine: TASK_READY (T1)
    Note right of Engine: Node n1 is idle<br/>Schedules TASK_START for T1
    EQ->>Engine: TASK_START (T1, n1)
    Note right of Engine: runtime = 200 / 50 = 4.0s<br/>Schedules TASK_COMPLETE at t=5.501
    Note over EQ,Engine: t=5.501000
    EQ->>Engine: TASK_COMPLETE (T1, n1)
    Note right of Engine: No successors. DAG complete.<br/>Makespan = 5.501000s
```
Tracing this locally
Run this exact scenario with:
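The exact invocation depends on your install; as an assumption, it looks something like the following (the scenario file path is hypothetical, while `--seed` and `-v` are the flags described elsewhere on this page):

```shell
# Hypothetical scenario path; --seed and -v are the documented flags.
ncsim scenarios/two_task.yaml --seed 42 -v
```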
The verbose flag (`-v`) logs every event to the console. The output directory will contain `trace.jsonl` and `metrics.json`.
Output Files¶
trace.jsonl¶
One JSON object per line, in event order. Every record has:
| Field | Type | Description |
|---|---|---|
| `seq` | int | Monotonically increasing sequence number |
| `sim_time` | float | Simulation time in seconds (6 decimal places) |
| `type` | string | Event type (`dag_inject`, `task_start`, `task_complete`, `transfer_start`, `transfer_complete`, `sim_start`, `sim_end`) |
Additional fields vary by event type (e.g., dag_id, task_id,
node_id, link_id, duration, data_size).
metrics.json¶
A single JSON object with summary metrics:
| Field | Type | Description |
|---|---|---|
| `scenario` | string | Scenario file name |
| `seed` | int | Random seed used |
| `makespan` | float | Time of last task completion |
| `total_tasks` | int | Number of tasks across all DAGs |
| `total_transfers` | int | Number of data transfer edges |
| `total_events` | int | Total events processed |
| `status` | string | `"completed"` or `"error"` |
| `node_utilization` | object | Per-node `busy_time / makespan` (0.0 to 1.0) |
| `link_utilization` | object | Per-link `data_transferred / (bandwidth * makespan)` |