# Output Files

Every ncsim run produces three files in the `--output` directory. Together, these files form a self-contained record from which any simulation run can be reproduced and analyzed.

```
output/
├── scenario.yaml   # Copy of the input scenario with all defaults filled in
├── trace.jsonl     # One JSON object per line for every discrete event
└── metrics.json    # Summary statistics for the run
```
## scenario.yaml

A verbatim copy of the input scenario YAML file, placed in the output directory for convenience. This makes every output folder self-contained: you can re-run the exact same simulation from the output directory alone.

```bash
# Re-run from a previous output folder
ncsim --scenario output/my_run/scenario.yaml --output output/my_run_v2
```

**Self-contained output folders**

Copying the scenario into the output directory means you never lose track of which configuration produced a given set of results, even if you later modify the original scenario file.
## trace.jsonl

The trace file records every discrete event that occurred during the simulation, one JSON object per line (JSON Lines format). Events are written in chronological order with monotonically increasing sequence numbers.
### Example Trace

```json
{"seq":0,"sim_time":0.0,"type":"sim_start","trace_version":"1.0","seed":42,"scenario":"demo_simple.yaml"}
{"seq":1,"sim_time":0.0,"type":"dag_inject","dag_id":"dag_1","task_ids":["T0","T1"]}
{"seq":2,"sim_time":0.0,"type":"task_scheduled","dag_id":"dag_1","task_id":"T0","node_id":"n0"}
{"seq":3,"sim_time":0.0,"type":"task_start","dag_id":"dag_1","task_id":"T0","node_id":"n0"}
{"seq":4,"sim_time":0.0,"type":"task_scheduled","dag_id":"dag_1","task_id":"T1","node_id":"n0"}
{"seq":5,"sim_time":1.0,"type":"task_complete","dag_id":"dag_1","task_id":"T0","node_id":"n0","duration":1.0}
{"seq":6,"sim_time":1.0,"type":"transfer_start","dag_id":"dag_1","from_task":"T0","to_task":"T1","link_id":"l01","data_size":50}
{"seq":7,"sim_time":1.501,"type":"transfer_complete","dag_id":"dag_1","from_task":"T0","to_task":"T1","link_id":"l01","duration":0.501}
{"seq":8,"sim_time":1.501,"type":"task_start","dag_id":"dag_1","task_id":"T1","node_id":"n0"}
{"seq":9,"sim_time":3.501,"type":"task_complete","dag_id":"dag_1","task_id":"T1","node_id":"n0","duration":2.0}
{"seq":10,"sim_time":3.501,"type":"sim_end","status":"completed","makespan":3.501,"total_events":10}
```
### Event Type Reference

| Event Type | Key Fields | Description |
|---|---|---|
| `sim_start` | `trace_version`, `seed`, `scenario`, `scenario_hash` | Simulation begins. Always the first event (`seq: 0`). |
| `dag_inject` | `dag_id`, `task_ids` | A DAG is injected into the simulation at the specified `sim_time`. |
| `task_scheduled` | `dag_id`, `task_id`, `node_id` | The scheduler assigns a task to a compute node. |
| `task_start` | `dag_id`, `task_id`, `node_id` | A task begins executing on its assigned node. |
| `task_complete` | `dag_id`, `task_id`, `node_id`, `duration` | A task finishes execution. `duration` is wall-clock compute time. |
| `transfer_start` | `dag_id`, `from_task`, `to_task`, `link_id`, `data_size` | A data transfer begins between two tasks. `data_size` is in MB. May include `route` for multi-hop paths. |
| `transfer_complete` | `dag_id`, `from_task`, `to_task`, `link_id`, `duration` | A data transfer finishes. `duration` is total transfer time in seconds. May include `route` for multi-hop paths. |
| `sim_end` | `status`, `makespan`, `total_events` | Simulation complete. Always the last event. |
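When exploring an unfamiliar trace, a quick tally of events by type gives a useful overview. A minimal sketch (this helper is not part of ncsim, and the path is a placeholder for your own run):

```python
import json
from collections import Counter

def count_event_types(path):
    """Tally trace events by their 'type' field."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                counts[json.loads(line)["type"]] += 1
    return counts

# Example: count_event_types("output/my_run/trace.jsonl")
```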
### Common Fields

Every event includes these fields:

| Field | Type | Description |
|---|---|---|
| `seq` | int | Monotonically increasing sequence number, starting at 0 |
| `sim_time` | float | Simulation time in seconds when the event occurred |
| `type` | string | Event type identifier (see table above) |
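These invariants are easy to sanity-check before doing heavier analysis. A sketch (the validator below is illustrative, not an ncsim API):

```python
import json

def validate_trace(path):
    """Check that every event carries the common fields and seq increments by 1."""
    expected_seq = 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            event = json.loads(line)
            # Every event must have the three common fields
            assert {"seq", "sim_time", "type"} <= event.keys(), "missing common field"
            assert event["seq"] == expected_seq, f"sequence gap at seq {expected_seq}"
            expected_seq += 1
    return expected_seq  # total number of events seen
```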
**Time precision**

All `sim_time` and `duration` values are rounded to microsecond precision (6 decimal places) to avoid floating-point drift across long simulations.
## metrics.json

A JSON file containing summary statistics for the simulation run.

### Example

```json
{
  "scenario": "demo_simple.yaml",
  "seed": 42,
  "makespan": 3.501,
  "total_tasks": 2,
  "total_transfers": 1,
  "total_events": 10,
  "status": "completed",
  "node_utilization": {
    "n0": 0.857,
    "n1": 0.0
  },
  "link_utilization": {
    "l01": 0.143
  }
}
```
### Field Reference

| Field | Type | Description |
|---|---|---|
| `scenario` | string | Name of the input scenario YAML file |
| `seed` | int | Random seed used for this run |
| `makespan` | float | Total simulation time from first event to last task completion (seconds) |
| `total_tasks` | int | Number of tasks across all DAGs |
| `total_transfers` | int | Number of data-dependency edges across all DAGs |
| `total_events` | int | Total number of discrete events in the trace |
| `status` | string | `"completed"` on success, `"error"` on failure |
| `node_utilization` | object | Per-node utilization ratio (0.0 to 1.0), computed as total busy time divided by makespan |
| `link_utilization` | object | Per-link utilization ratio (0.0 to 1.0), computed as total transfer time divided by makespan |
| `error_message` | string | Present only when `status` is `"error"`. Describes the failure. |
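The utilization figures can be re-derived from the trace. A sketch, assuming each node executes one task at a time (so per-node busy time is the sum of its `task_complete` durations); applied to the example trace above, it reproduces `n0: 0.857`:

```python
import json
from collections import defaultdict

def node_utilization(trace_path):
    """Recompute per-node utilization as busy time / makespan."""
    busy = defaultdict(float)
    makespan = 0.0
    with open(trace_path) as f:
        for line in f:
            if not line.strip():
                continue
            e = json.loads(line)
            if e["type"] == "task_complete":
                busy[e["node_id"]] += e["duration"]
            elif e["type"] == "sim_end":
                makespan = e["makespan"]
    if not makespan:
        return {}
    return {node: round(t / makespan, 3) for node, t in busy.items()}
```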
When the WiFi interference model (`csma_clique` or `csma_bianchi`) is active, additional fields are included:

| Field | Type | Description |
|---|---|---|
| `rf_config` | object | Full RF configuration used (tx power, frequency, path loss exponent, etc.) |
| `carrier_sensing_range_m` | float | Computed carrier sensing range in meters |
| `link_phy_rates_MBps` | object | Per-link PHY data rate in MB/s before contention adjustment |
| `max_clique_sizes` | object | Per-link maximum clique size from the conflict graph |
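Because these fields only appear in WiFi-model runs, post-processing scripts should guard for their absence. A minimal sketch (the helper name and summary shape are illustrative):

```python
def summarize_rf(metrics):
    """Return a short RF summary dict, or None for non-WiFi runs."""
    if "carrier_sensing_range_m" not in metrics:
        return None  # interference model was not active
    return {
        "sensing_range_m": metrics["carrier_sensing_range_m"],
        "max_clique": max(metrics.get("max_clique_sizes", {}).values(), default=0),
    }
```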
## Working with Output Files

### Loading trace.jsonl in Python

```python
import json

def load_trace(path):
    """Load all events from a trace file."""
    events = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank or trailing lines
                events.append(json.loads(line))
    return events

events = load_trace("output/my_run/trace.jsonl")
print(f"Total events: {len(events)}")
print(f"Makespan: {events[-1]['makespan']}")
```
### Filtering events by type

```python
events = load_trace("output/my_run/trace.jsonl")

# Get all task completion events
completions = [e for e in events if e["type"] == "task_complete"]
for c in completions:
    print(f"  {c['task_id']} on {c['node_id']}: {c['duration']:.3f}s")
```
### Loading metrics.json in Python

```python
import json

with open("output/my_run/metrics.json") as f:
    metrics = json.load(f)

print(f"Makespan: {metrics['makespan']:.3f}s")
print(f"Status: {metrics['status']}")

# Print node utilization
for node, util in metrics["node_utilization"].items():
    print(f"  {node}: {util:.1%}")
```
### Comparing two runs

```python
import json

def load_metrics(path):
    with open(path) as f:
        return json.load(f)

m1 = load_metrics("output/heft_run/metrics.json")
m2 = load_metrics("output/cpop_run/metrics.json")

speedup = m1["makespan"] / m2["makespan"]
print(f"HEFT makespan: {m1['makespan']:.3f}s")
print(f"CPOP makespan: {m2['makespan']:.3f}s")
print(f"CPOP speedup: {speedup:.2f}x")
```
### Computing transfer overhead from trace

```python
events = load_trace("output/my_run/trace.jsonl")

total_compute = sum(
    e["duration"] for e in events if e["type"] == "task_complete"
)
total_transfer = sum(
    e["duration"] for e in events if e["type"] == "transfer_complete"
)
overhead = total_transfer / (total_compute + total_transfer) * 100

print(f"Compute time: {total_compute:.3f}s")
print(f"Transfer time: {total_transfer:.3f}s")
print(f"Transfer overhead: {overhead:.1f}%")
```
**Large trace files**

For scenarios with many DAGs or tasks, trace files can grow large. Consider streaming the JSONL file line by line rather than loading the entire file into memory.
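A generator keeps memory usage flat regardless of trace size. A minimal sketch (the helper name and path are placeholders):

```python
import json

def iter_trace(path):
    """Yield trace events one at a time without loading the whole file."""
    with open(path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)

# Example: count task completions without holding the trace in memory
# n_complete = sum(1 for e in iter_trace("output/my_run/trace.jsonl")
#                  if e["type"] == "task_complete")
```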