Results
This section provides detailed information on how to view, analyze, and interpret the results generated by the project. The results include training metrics, evaluation metrics, visualizations, and safety analysis.
1. Viewing Results
1.1. Weights & Biases Integration
The project uses Weights & Biases (W&B) to log and visualize training and evaluation metrics. You can view the results by logging into your W&B account and navigating to the project dashboard.
1.1.1. Metrics Logged
- Cumulative Reward: The total reward accumulated over the course of an episode.
- Average Reward: The mean reward per episode.
- Discounted Reward: The reward calculated with a discount factor applied to future rewards.
- Sample Efficiency: The number of unique states visited during an episode.
- Policy Entropy: The entropy of the policy, representing the uncertainty in action selection.
- Space Complexity: The total number of parameters in the model.
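As a rough illustration of how several of these metrics could be computed from one episode's data, here is a minimal sketch. The function names, the default discount factor of 0.99, and the choice of natural-log entropy are assumptions for illustration and are not taken from the project's code.

```python
import math
from typing import Hashable, Sequence


def cumulative_reward(rewards: Sequence[float]) -> float:
    """Total reward accumulated over one episode."""
    return sum(rewards)


def discounted_reward(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Sum of rewards, with the reward at step t weighted by gamma**t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def policy_entropy(action_probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single action distribution."""
    # Zero-probability actions contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in action_probs if p > 0)


def unique_states_visited(states: Sequence[Hashable]) -> int:
    """Number of distinct states seen during one episode."""
    return len(set(states))
```

A uniform policy over two actions gives an entropy of ln 2, while a deterministic policy gives zero, which is why a falling entropy curve usually indicates that the policy is becoming more decisive.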
1.2. Local CSV Files
For each training and evaluation run, metrics are also saved locally as CSV files in the results directory. The following CSV files are generated:
- training_metrics_<run_name>.csv: Contains training metrics for each episode.
- evaluation_metrics_<run_name>.csv: Contains evaluation metrics for each step within an episode.
- mean_allowed_infected.csv: Summarizes the mean allowed and infected values across all episodes.
These files can be found in the results subdirectory specific to each run, typically located at <results_directory>/<agent_type>/<run_name>/<timestamp>/.
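To collect these files programmatically, something like the following sketch may help. It uses only the standard library, searches the results directory recursively, and makes no assumption about column names; the helper names `find_metric_files` and `read_rows` are hypothetical.

```python
import csv
from pathlib import Path


def find_metric_files(results_dir, pattern="training_metrics_*.csv"):
    """Return all CSV files under results_dir matching pattern, sorted by path."""
    return sorted(Path(results_dir).rglob(pattern))


def read_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Values come back as strings, so convert them (for example with `float`) before doing arithmetic on them.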
2. Visualizations
All visualizations are saved in the results subdirectory for each run. The specific path is <results_directory>/<agent_type>/<run_name>/<timestamp>/.
2.1. Tolerance Interval Curve
The Tolerance Interval Curve shows the range expected to contain a specified proportion (beta) of returns at a specified confidence level (alpha). This curve helps visualize how consistent the model's performance is across different runs.
- tolerance_interval_mean.png: The mean performance with the tolerance interval.
- tolerance_interval_median.png: The median performance with the tolerance interval.
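For intuition about what a tolerance interval is, the sketch below computes one common normal-theory version: Howe's approximate two-sided tolerance factor, with the chi-square quantile replaced by the Wilson-Hilferty approximation so that only the standard library is needed. This is an assumption for illustration; the project may compute its intervals differently, and mapping `confidence` and `proportion` onto the alpha and beta named above is likewise assumed.

```python
import math
from statistics import NormalDist, mean, stdev


def chi2_quantile_wh(q, dof):
    """Wilson-Hilferty approximation to the chi-square quantile at level q."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def tolerance_factor(n, confidence=0.95, proportion=0.90):
    """Howe's approximate two-sided tolerance factor for normal data."""
    dof = n - 1
    z = NormalDist().inv_cdf((1.0 + proportion) / 2.0)
    # Lower-tail chi-square quantile at level (1 - confidence).
    chi2 = chi2_quantile_wh(1.0 - confidence, dof)
    if chi2 <= 0:
        raise ValueError("approximation breaks down for very small samples")
    return math.sqrt(dof * (1.0 + 1.0 / n) * z * z / chi2)


def tolerance_interval(returns, confidence=0.95, proportion=0.90):
    """Return (lower, upper) bounds centered on the sample mean."""
    k = tolerance_factor(len(returns), confidence, proportion)
    center, spread = mean(returns), stdev(returns)
    return center - k * spread, center + k * spread
```

Unlike a confidence interval for the mean, this interval does not shrink toward zero width as more runs are added; it converges to the band that contains the requested proportion of returns.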
2.2. Confidence Interval Curve
The Confidence Interval Curve shows the mean performance of the model with a confidence interval, typically at 95%. This visualization helps assess the reliability of the model's performance.
- confidence_interval.png: The confidence interval for mean performance.
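A confidence interval for mean performance can be sketched as follows, assuming a normal approximation with the sample standard deviation. The project may instead use a t-distribution or bootstrapping; the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of values."""
    n = len(values)
    center = mean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return center - z * sem, center + z * sem
```

With only a handful of runs, the normal approximation tends to give intervals that are too narrow, so a t-based interval is usually the safer choice for small samples.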
2.3. Safety Set Identification
The Safety Set Identification plot shows the states where the model's policy maintains safety constraints, such as keeping the number of infections below a threshold.
- safety_set_plot_episode_<run_name>.png: A plot showing the safety set for a specific episode.
2.4. Lyapunov Function Behavior
The Lyapunov Function Behavior plots demonstrate the stability of the system, particularly in the Disease-Free Equilibrium (DFE) and Endemic Equilibrium (EE) regions.
- lyapunov_loss_plot.png: The loss function during the training of the Lyapunov function.
- lyapunov_function_with_context_<method_label>.png: The Lyapunov function behavior with context for DFE and EE regions.
2.5. Transition Matrix
The Transition Matrix visualizes the probability of transitioning from one state to another based on the community risk values. This matrix is useful for understanding the model's behavior under varying conditions.
- transition_matrix_<run_name>.png: The transition probability matrix based on community risk.
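One simple way to estimate such a matrix is to count observed transitions and normalize each row. The sketch below assumes states are already encoded as integers from 0 to n_states - 1 (for example, discretized community risk levels); that encoding and the function name are assumptions, not details taken from the project.

```python
from collections import Counter


def transition_matrix(states, n_states):
    """Row-normalized counts of observed state-to-state transitions."""
    # Pair each state with its successor and count each (from, to) pair.
    counts = Counter(zip(states, states[1:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        # Rows for states that were never left stay all zeros.
        matrix.append([
            counts[(i, j)] / row_total if row_total else 0.0
            for j in range(n_states)
        ])
    return matrix
```

Each non-empty row sums to 1, so entry (i, j) can be read as the estimated probability of moving from state i to state j.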
2.6. Q-Table Visualization (Q-learning only)
For Q-learning, a heatmap of the Q-table is generated to visualize the learned state-action values.
- q_table_heatmap.png: Heatmap visualization of the Q-table.
2.7. States Visited Visualization
Heatmaps showing the frequency of visited states during training are generated for both DQN and Q-learning.
- Multiple PNG files with names containing states_visited are generated in the results subdirectory.
2.8. Evaluation Results Plot
For Q-learning, an evaluation plot is generated showing the allowed students, infected individuals, and community risk over time.
- evaluation_plot_<run_name>.png: Plot of evaluation results.
3. Safety Analysis
3.1. Safety Set Conditions
The safety set conditions are logged in the safety_conditions_<run_name>.csv file. This file contains the following columns:
- Episode: The episode number.
- Infections > Threshold (%): The percentage of time the number of infections exceeded the threshold.
- Safety Condition Met (Infection): Indicates whether the infection safety condition was met.
- Allowed Students ≥ Threshold (%): The percentage of time the allowed students met or exceeded the threshold.
- Safety Condition Met (Attendance): Indicates whether the attendance safety condition was met.
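The columns above could be derived from per-step episode data roughly as follows. The criteria used here for the two "Safety Condition Met" columns (infections never above the threshold, attendance at or above the threshold at every step) are assumptions; the project may allow some tolerance. The function name and argument names are hypothetical.

```python
def safety_summary(infected, allowed, infection_threshold, attendance_threshold):
    """Summarize one episode using the column names listed above."""
    steps = len(infected)
    # Share of steps where infections exceeded the threshold.
    pct_over = 100.0 * sum(x > infection_threshold for x in infected) / steps
    # Share of steps where allowed students met or exceeded the threshold.
    pct_attend = 100.0 * sum(a >= attendance_threshold for a in allowed) / steps
    return {
        "Infections > Threshold (%)": pct_over,
        # Assumed criterion: infections never exceed the threshold.
        "Safety Condition Met (Infection)": pct_over == 0.0,
        "Allowed Students ≥ Threshold (%)": pct_attend,
        # Assumed criterion: attendance meets the threshold at every step.
        "Safety Condition Met (Attendance)": pct_attend == 100.0,
    }
```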
3.2. Control Barrier Functions (CBFs)
The CBFs are used to ensure that the system maintains safety constraints over time. The verification of forward invariance is logged in the cbf_verification.txt file.
- CBF for Infections: Ensures that the number of infected individuals does not exceed the threshold.
- CBF for Attendance: Ensures that the number of allowed students meets the required threshold.
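To make the idea of forward invariance concrete, the sketch below checks a standard discrete-time CBF condition, h(x[k+1]) >= (1 - gamma) * h(x[k]), along a recorded trajectory. The barrier definitions, the value of gamma, and all function names are assumptions for illustration; the project's own verification may differ, and checking one trajectory is evidence rather than a proof.

```python
def barrier_infections(infected, threshold):
    """h(x) >= 0 exactly when infections are at or below the threshold."""
    return threshold - infected


def barrier_attendance(allowed, threshold):
    """h(x) >= 0 exactly when allowed students meet the threshold."""
    return allowed - threshold


def cbf_condition_holds(h_values, gamma=0.5):
    """Check h[k+1] >= (1 - gamma) * h[k] for every consecutive pair."""
    return all(
        nxt >= (1.0 - gamma) * cur
        for cur, nxt in zip(h_values, h_values[1:])
    )


def forward_invariant(h_values, gamma=0.5):
    """True if the trajectory starts safe and satisfies the CBF condition."""
    return h_values[0] >= 0 and cbf_condition_holds(h_values, gamma)
```

With 0 < gamma <= 1, a trajectory that starts with h >= 0 and satisfies the condition at every step can never reach h < 0, which is what keeps the safe set forward invariant.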
4. Final Results Summary
At the end of the evaluation process, a summary of the final results is saved in the final_results.txt file. This summary includes:
- Cumulative Reward across all episodes
- Average Reward per episode
- Safety condition percentages for infections and attendance
- Forward Invariance verification
- Stability assessment in the DFE and EE regions
This file provides a quick overview of the model's performance and safety compliance across the entire evaluation period.
Use these results and visualizations to assess and refine your model. Together, the metrics, plots, and safety analysis show where the model is strong or weak and can guide further development. All files generated for a run are in that run's results subdirectory.