# Custom Experiments
This guide shows how to design your own parameter sweep experiments using ncsim. Whether you are comparing schedulers, varying RF parameters, or scaling network size, the pattern is the same: generate scenarios, run ncsim via subprocess (or import it as a library), and collect results.
## Experiment Design Principles
- Define your hypothesis. For example: "HEFT outperforms CPOP on heterogeneous networks with high communication-to-computation ratio."
- Identify independent variables -- what you vary (scheduler, network size, data size, RF parameters, etc.).
- Identify dependent variables -- what you measure. Usually makespan, but you can also extract per-transfer durations, link utilization, or task placement from the trace file.
- Control variables -- keep everything else fixed. Use the same random seed across runs to eliminate scheduling variability from a single trial.
- Replicate with multiple seeds to average out stochastic effects.
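As a concrete instance of the replication principle, here is a stdlib-only sketch for summarizing makespans collected across seeds (the numbers below are made up, not real measurements):

```python
import statistics

def summarize(makespans):
    """Mean and sample standard deviation of makespans from replicate runs."""
    mean = statistics.mean(makespans)
    # Sample std-dev needs at least two replicates
    std = statistics.stdev(makespans) if len(makespans) > 1 else 0.0
    return mean, std

mean, std = summarize([12.1, 11.8, 12.4, 12.0, 11.9])
print(f"makespan = {mean:.2f}s +/- {std:.2f}s")
```

Reporting the spread alongside the mean makes it obvious when two schedulers' results overlap within run-to-run noise.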
## Template: Parameter Sweep Script

The following template demonstrates the standard pattern used by all ncsim experiment scripts. It invokes ncsim as a subprocess with CLI overrides, then reads `metrics.json` to extract results.
```python
#!/usr/bin/env python3
"""Template for ncsim parameter sweep experiments."""
import json
import os
import subprocess
import sys
from pathlib import Path

OUTDIR = "/tmp/ncsim_my_experiment"


def run_scenario(yaml_path, output_dir, **overrides):
    """Run ncsim with optional CLI overrides.

    Args:
        yaml_path: Path to scenario YAML file.
        output_dir: Directory for trace and metrics output.
        **overrides: CLI flag overrides (e.g., scheduler="heft", seed=42).

    Returns:
        output_dir on success, None on failure.
    """
    cmd = [
        sys.executable, "-m", "ncsim",
        "--scenario", str(yaml_path),
        "--output", str(output_dir),
    ]
    for key, value in overrides.items():
        cmd.extend([f"--{key.replace('_', '-')}", str(value)])
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"ERROR: {result.stderr[-200:]}")
        return None
    return output_dir


def get_makespan(output_dir):
    """Extract makespan from metrics.json."""
    metrics_path = os.path.join(output_dir, "metrics.json")
    with open(metrics_path) as f:
        return json.load(f)["makespan"]


def main():
    os.makedirs(OUTDIR, exist_ok=True)
    # Example: sweep schedulers across multiple seeds
    schedulers = ["heft", "cpop", "round_robin"]
    seeds = [1, 2, 3, 4, 5]
    results = {}
    for sched in schedulers:
        makespans = []
        for seed in seeds:
            outdir = os.path.join(OUTDIR, f"{sched}_s{seed}")
            if run_scenario(
                "scenarios/parallel_spread.yaml", outdir,
                scheduler=sched, seed=seed
            ) is None:
                continue  # skip failed runs rather than read a missing metrics.json
            makespans.append(get_makespan(outdir))
        results[sched] = makespans
        if makespans:
            avg = sum(makespans) / len(makespans)
            print(f"{sched}: avg makespan = {avg:.3f}s")


if __name__ == "__main__":
    main()
```
**CLI overrides**

Any scenario YAML parameter can be overridden from the command line. Common overrides include `--scheduler`, `--routing`, `--interference`, `--seed`, `--tx-power`, `--freq`, `--path-loss-exponent`, and `--wifi-standard`. See the CLI Reference for the full list.
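The override-to-flag mapping is mechanical: underscores in keyword names become hyphens in flag names. As a standalone illustration of the convention the template's `run_scenario` uses:

```python
def overrides_to_flags(**overrides):
    """Mirror run_scenario's keyword-to-CLI-flag mapping."""
    flags = []
    for key, value in overrides.items():
        # scheduler="heft" -> ["--scheduler", "heft"]
        flags.extend([f"--{key.replace('_', '-')}", str(value)])
    return flags

print(overrides_to_flags(scheduler="heft", path_loss_exponent=3.0))
# -> ['--scheduler', 'heft', '--path-loss-exponent', '3.0']
```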
## Example Experiments

### 1. Scheduler Comparison

Vary the scheduler across multiple scenarios and seeds to find which algorithm performs best under different workload characteristics.
```python
schedulers = ["heft", "cpop", "round_robin"]
scenarios = [
    "scenarios/parallel_spread.yaml",
    "scenarios/demo_simple.yaml",
    "scenarios/bandwidth_contention.yaml",
]
seeds = range(1, 11)  # 10 seeds for statistical significance

for scenario in scenarios:
    for sched in schedulers:
        for seed in seeds:
            outdir = f"/tmp/sched_cmp/{Path(scenario).stem}/{sched}_s{seed}"
            run_scenario(scenario, outdir, scheduler=sched, seed=seed)
```
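Once such a sweep finishes, ranking schedulers per scenario by mean makespan is straightforward. A sketch with placeholder numbers (not real measurements):

```python
# Placeholder results: {scenario: {scheduler: [makespans across seeds]}}
results = {
    "parallel_spread": {
        "heft": [10.2, 10.4],
        "cpop": [11.1, 10.9],
        "round_robin": [13.0, 12.8],
    },
}

def best_scheduler(per_sched):
    """Return the scheduler with the lowest mean makespan."""
    return min(per_sched, key=lambda s: sum(per_sched[s]) / len(per_sched[s]))

for scenario, per_sched in results.items():
    print(f"{scenario}: best = {best_scheduler(per_sched)}")
```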
### 2. Interference Radius Sweep

Vary the proximity interference radius from 5m to 50m to understand how interference range affects makespan.
```python
radii = [5, 10, 15, 20, 25, 30, 40, 50]
for radius in radii:
    outdir = f"/tmp/radius_sweep/r{radius}"
    run_scenario(
        "scenarios/interference_test.yaml", outdir,
        interference="proximity",
        interference_radius=radius,
    )
```
### 3. WiFi Parameter Sensitivity

Vary TX power, frequency, or path loss exponent to study their impact on network capacity and makespan.
```python
# TX power sweep
for tx_power in [10, 15, 20, 23]:
    outdir = f"/tmp/txpower_sweep/p{tx_power}"
    run_scenario(
        "scenarios/wifi_test.yaml", outdir,
        interference="csma_bianchi",
        tx_power=tx_power,
    )

# Path loss exponent sweep
for n in [2.0, 2.5, 3.0, 3.5, 4.0]:
    outdir = f"/tmp/pathloss_sweep/n{n}"
    run_scenario(
        "scenarios/wifi_test.yaml", outdir,
        interference="csma_bianchi",
        path_loss_exponent=n,
    )
```
### 4. Network Scaling

Programmatically generate larger networks and measure how scheduler performance scales with node count. See `run_routing_comparison.py` for a complete example of generating grid topologies in code.
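As a rough illustration of programmatic topology generation (the node field names below are invented for this sketch and are not necessarily ncsim's scenario schema):

```python
def make_grid_nodes(rows, cols, spacing_m=20.0):
    """Generate node dicts laid out on a regular grid.

    The "id"/"x"/"y" keys are illustrative; adapt them to the
    actual scenario schema before feeding this into ncsim.
    """
    nodes = []
    for r in range(rows):
        for c in range(cols):
            nodes.append({
                "id": f"n{r}_{c}",
                "x": c * spacing_m,
                "y": r * spacing_m,
            })
    return nodes

nodes = make_grid_nodes(4, 4)
print(f"{len(nodes)} nodes, far corner at ({nodes[-1]['x']}, {nodes[-1]['y']})")
```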
### 5. Data Size Sensitivity

Vary `data_size` on DAG edges to find the crossover point where `widest_path` routing begins to outperform `shortest_path` (bandwidth-dominated vs latency-dominated workloads).
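One way to drive such a sweep is to rewrite the scenario dict before each run and dump it to a temporary YAML file. The `dag`/`edges`/`data_size` keys below are assumed for illustration; check your scenario YAML for the actual layout:

```python
import copy

def scale_data_sizes(scenario, factor):
    """Return a deep copy of a scenario dict with every DAG edge's
    data_size multiplied by factor (edge schema is illustrative)."""
    out = copy.deepcopy(scenario)
    for edge in out["dag"]["edges"]:
        edge["data_size"] *= factor
    return out

base = {"dag": {"edges": [{"from": "a", "to": "b", "data_size": 10.0}]}}
scaled = scale_data_sizes(base, 4)
print(scaled["dag"]["edges"][0]["data_size"])  # the original dict is untouched
```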
## Using ncsim as a Library
For tighter integration or custom analysis, you can import ncsim modules directly instead of invoking the CLI:
```python
from ncsim.models.wifi import (
    RFConfig,
    snr_to_rate_mbps,
    rate_mbps_to_MBps,
    bianchi_efficiency,
    carrier_sensing_range,
    received_power_dBm,
    snr_dB,
    sinr_dB,
    path_loss_dB,
)

# Compute PHY rate at a given distance
rf = RFConfig(tx_power_dBm=20, freq_ghz=5.0, path_loss_exponent=3.0)
distance = 40.0  # meters
rx_power = received_power_dBm(rf.tx_power_dBm, distance, rf)
snr = snr_dB(rx_power, rf.noise_floor_dBm)
rate = snr_to_rate_mbps(snr, rf.wifi_standard, rf.channel_width_mhz)
print(f"PHY rate at {distance}m: {rate:.1f} Mbps ({rate_mbps_to_MBps(rate):.2f} MB/s)")

# Carrier sensing range
cs_range = carrier_sensing_range(rf)
print(f"Carrier sensing range: {cs_range:.1f}m")

# Bianchi MAC efficiency for N contending stations
for n in range(1, 6):
    eta = bianchi_efficiency(n)
    print(f"  eta({n}) = {eta:.4f}, per-station share = {eta/n:.4f}")
```
This is the approach used by `run_interference_verification.py` to compute analytical predictions that are compared against simulation output.
**Library vs CLI**

Using ncsim as a library gives you direct access to the WiFi model functions for analytical calculations, but the full simulation pipeline (scenario loading, scheduling, trace writing) is easiest to drive through the CLI.
## Analyzing Results
Collect results from multiple runs into a structured format for analysis. Here is a pattern using pandas DataFrames:
```python
import json
import os

import pandas as pd


def collect_results(base_dir):
    """Scan output directories and build a DataFrame of results."""
    rows = []
    for run_name in sorted(os.listdir(base_dir)):
        metrics_path = os.path.join(base_dir, run_name, "metrics.json")
        if not os.path.exists(metrics_path):
            continue
        with open(metrics_path) as f:
            metrics = json.load(f)
        rows.append({
            "run": run_name,
            "makespan": metrics["makespan"],
            "status": metrics.get("status", "unknown"),
            "total_events": metrics.get("total_events", 0),
        })
    return pd.DataFrame(rows)


df = collect_results("/tmp/ncsim_my_experiment")
print(df["makespan"].describe())  # each run name is unique, so summarize the column
```
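To aggregate across seeds rather than per individual run, split the encoded run names back into their parts (this assumes the `{sched}_s{seed}` naming convention used by the template):

```python
def parse_run_name(run_name):
    """Split names like 'round_robin_s7' into (scheduler, seed)."""
    sched, seed = run_name.rsplit("_s", 1)
    return sched, int(seed)

print(parse_run_name("round_robin_s7"))  # -> ('round_robin', 7)
```

Assign the parsed pieces back as `scheduler` and `seed` columns on the DataFrame; `df.groupby("scheduler")["makespan"].mean()` then gives per-scheduler averages across seeds.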
## Generating Comparison Plots
```python
import matplotlib.pyplot as plt

# Example: bar chart comparing schedulers
fig, ax = plt.subplots(figsize=(8, 5))
for sched in ["heft", "cpop", "round_robin"]:
    subset = df[df["run"].str.startswith(sched)]
    ax.bar(sched, subset["makespan"].mean(),
           yerr=subset["makespan"].std(), capsize=5)
ax.set_ylabel("Makespan (s)")
ax.set_title("Scheduler Comparison")
fig.savefig("/tmp/ncsim_my_experiment/scheduler_comparison.png", dpi=150)
```
## Extracting Trace Data

For more detailed analysis, parse the `trace.jsonl` file to extract per-task and per-transfer timing:
```python
import json


def parse_trace(trace_path):
    """Parse trace.jsonl into task and transfer records."""
    tasks = {}
    transfers = {}
    with open(trace_path) as f:
        for line in f:
            event = json.loads(line)
            etype = event["type"]
            if etype == "task_start":
                tasks[event["task_id"]] = {
                    "node": event["node_id"],
                    "start": event["sim_time"],
                }
            elif etype == "task_complete":
                tasks[event["task_id"]]["end"] = event["sim_time"]
                tasks[event["task_id"]]["duration"] = event["duration"]
            elif etype == "transfer_start":
                key = (event["from_task"], event["to_task"])
                transfers[key] = {
                    "link": event["link_id"],
                    "data_size": event["data_size"],
                    "start": event["sim_time"],
                }
            elif etype == "transfer_complete":
                key = (event["from_task"], event["to_task"])
                transfers[key]["end"] = event["sim_time"]
                transfers[key]["duration"] = event["duration"]
    return tasks, transfers
```
This gives you access to individual task execution times, transfer durations, link assignments, and scheduling decisions for any analysis you need.
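Building on the records `parse_trace` returns, one simple derived metric is per-node busy time. The sample records below are illustrative, shaped like the `tasks` dict above:

```python
from collections import defaultdict

def node_busy_time(tasks):
    """Sum task durations per node from parse_trace()-style records."""
    busy = defaultdict(float)
    for rec in tasks.values():
        busy[rec["node"]] += rec.get("duration", 0.0)
    return dict(busy)

# Illustrative records in the shape parse_trace() produces
tasks = {
    "t1": {"node": "n0", "start": 0.0, "end": 2.0, "duration": 2.0},
    "t2": {"node": "n0", "start": 2.0, "end": 5.0, "duration": 3.0},
    "t3": {"node": "n1", "start": 0.0, "end": 4.0, "duration": 4.0},
}
print(node_busy_time(tasks))  # -> {'n0': 5.0, 'n1': 4.0}
```

Comparing busy time to makespan gives a quick view of load balance across nodes.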
## Best Practices

**Checklist for reliable experiments**

- **Use fixed seeds for reproducibility.** Every run should specify `--seed N` so results can be exactly reproduced.
- **Run multiple seeds** (at least 5-10) to average out scheduling variability and get statistically meaningful results.
- **Save all output.** Each output directory contains a copy of `scenario.yaml`, enabling exact reproduction of any run.
- **Use meaningful output directory names.** Encode the variable values in the directory path (e.g., `heft_s42`, `txpower_20_freq_5`).
- **Print progress indicators** for long sweeps so you know which run is currently executing.
- **Check for failures.** Always verify that `metrics.json` exists and that `status` is not `"error"` before using the makespan value.
- **Control one variable at a time.** When comparing schedulers, keep routing, interference, and all RF parameters constant (and vice versa).
- **Use the same scenario YAML across compared runs**, varying only the CLI override for the parameter under study.
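The failure-check item can be wrapped into a small helper so analysis scripts never crash on a missing or failed run. A sketch following the `metrics.json`/`status` convention described above:

```python
import json
import os

def load_metrics_safe(output_dir):
    """Return the metrics dict, or None if the run is missing or failed."""
    path = os.path.join(output_dir, "metrics.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        metrics = json.load(f)
    if metrics.get("status") == "error":
        return None
    return metrics
```

Use it in place of a bare `get_makespan` call: `m = load_metrics_safe(outdir)` followed by `if m is not None: makespans.append(m["makespan"])`.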
## Directory Structure Convention

A well-organized experiment output looks like this:

```text
/tmp/ncsim_my_experiment/
  heft_s1/
    scenario.yaml
    trace.jsonl
    metrics.json
  heft_s2/
    ...
  cpop_s1/
    ...
  figures/
    scheduler_comparison.png
```
## Scaling to Large Sweeps

For sweeps with hundreds of runs, consider:

- **Parallelizing** with Python's `concurrent.futures.ProcessPoolExecutor` (each ncsim invocation is independent).
- **Skipping completed runs** by checking whether `metrics.json` already exists before launching a subprocess.
- **Writing results incrementally** to a CSV or JSON file after each run completes, so partial results survive interruptions.
```python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed


def run_one(args):
    yaml_path, outdir, overrides = args
    if os.path.exists(os.path.join(outdir, "metrics.json")):
        return outdir  # skip completed
    return run_scenario(yaml_path, outdir, **overrides)


jobs = []
for sched in schedulers:
    for seed in range(1, 101):
        outdir = f"/tmp/large_sweep/{sched}_s{seed}"
        jobs.append(("scenarios/parallel_spread.yaml", outdir,
                     {"scheduler": sched, "seed": seed}))

with ProcessPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_one, job) for job in jobs]
    for f in as_completed(futures):
        result = f.result()
        if result:
            print(f"Completed: {result}")
```
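Incremental result writing, mentioned above, can be sketched with the stdlib `csv` module; call the helper right after each successful run so partial results survive interruptions:

```python
import csv
import os

def append_result(csv_path, row, fieldnames):
    """Append one result row, writing a header on first use."""
    new_file = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
```

For example, `append_result("/tmp/large_sweep/results.csv", {"run": outdir, "makespan": makespan}, ["run", "makespan"])` after each completed run leaves a usable CSV even if the sweep is interrupted halfway.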