# Reliable Benchmarking Benchmark results can vary significantly due to system-level factors. This guide covers best practices for obtaining reproducible and reliable measurements. ## What ZeroPyBench Does Automatically zeropybench already implements several best practices: - **Multiple repetitions with median**: Reduces the impact of outliers - **Auto-scaling**: Automatically determines the number of iterations for reliable measurements - **JAX compilation separation**: Reports compilation time separately from execution time - **Proper synchronization**: Uses `block_until_ready()` for accurate JAX timing ## CPU Benchmarking ### Disable Frequency Scaling Modern CPUs dynamically adjust their frequency based on load and temperature. This can cause significant variance in benchmark results. ```bash # Set the CPU governor to performance mode (requires root) sudo cpupower frequency-set -g performance # Verify the setting cpupower frequency-info ``` To revert to the default: ```bash sudo cpupower frequency-set -g powersave # or ondemand ``` ### Disable Turbo Boost Turbo boost can cause inconsistent results as the CPU may throttle under sustained load. **Intel CPUs:** ```bash # Disable turbo boost echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo # Re-enable turbo boost echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo ``` **AMD CPUs:** ```bash # Disable turbo boost echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost # Re-enable turbo boost echo 1 | sudo tee /sys/devices/system/cpu/cpufreq/boost ``` :::{warning} These settings require root privileges and will be reset after reboot unless made persistent. ::: ### CPU Isolation Isolate CPU cores to prevent the OS scheduler from interrupting your benchmark. **Runtime isolation with taskset:** ```bash # Run on cores 0-3 only taskset -c 0-3 python benchmark.py ``` **Boot-time isolation (more effective):** :::{warning} Modifying GRUB parameters incorrectly can prevent your system from booting. Always keep a backup boot option. ::: Add to your kernel boot parameters in `/etc/default/grub`: ``` GRUB_CMDLINE_LINUX="isolcpus=0-3 nohz_full=0-3" ``` Then update GRUB and reboot: ```bash sudo update-grub sudo reboot ``` ### Process Priority Increase the priority of your benchmark process: ```bash # Run with highest priority (requires root) sudo nice -n -20 python benchmark.py # Or with real-time scheduling sudo chrt -f 99 python benchmark.py ``` ### Disable Hyperthreading Hyperthreading can introduce variability. Disable it in BIOS or at runtime: ```bash # Disable hyperthreading (example for 8 physical cores with HT) echo 0 | sudo tee /sys/devices/system/cpu/cpu{8..15}/online ``` ## GPU Benchmarking (NVIDIA) ### Enable Persistence Mode Keeps the GPU initialized between runs, reducing startup overhead: ```bash sudo nvidia-smi -pm 1 ``` ### Lock GPU Clocks Prevent dynamic frequency scaling on the GPU: ```bash # Query supported clocks nvidia-smi -q -d SUPPORTED_CLOCKS # Lock graphics clocks (example: 1500 MHz) sudo nvidia-smi -lgc 1500,1500 # Lock memory clocks (example: 5001 MHz) sudo nvidia-smi -lmc 5001 # Reset to default sudo nvidia-smi -rgc sudo nvidia-smi -rmc ``` ### Exclusive Process Mode Ensure only one process can use the GPU: ```bash # Set exclusive process mode sudo nvidia-smi -c EXCLUSIVE_PROCESS # Reset to default (shared mode) sudo nvidia-smi -c DEFAULT ``` ### Disable ECC Memory (Optional) On GPUs with ECC memory, disabling it can provide ~10% more memory bandwidth. This is a persistent setting that requires a reboot: ```bash # Check current ECC status nvidia-smi -q | grep -i ecc # Disable ECC (requires reboot) sudo nvidia-smi -e 0 ``` ### Monitor GPU State Before running benchmarks, verify GPU state: ```bash # Check temperatures, clocks, and utilization nvidia-smi -q -d PERFORMANCE # Monitor in real-time watch -n 1 nvidia-smi ``` ## XLA GPU Autotuning When benchmarking JAX code on GPU, XLA may spend a prohibitive time autotuning kernels (matrix multiplications, convolutions, etc.) during compilation. This autotuning overhead happens for every compilation, so when comparing methods where only some benefit from autotuning, the compilation times may not be comparable. The autotune level maps to the `--xla_gpu_autotune_level` XLA flag: - **0**: No autotuning. Fastest compilation, may use suboptimal kernels. - **1–4**: Increasing autotuning effort. Higher levels try more algorithm variants, increasing compilation time but potentially finding faster kernels. :::{important} XLA reads the `XLA_FLAGS` environment variable **once**, when the JAX backend is initialized (typically at the first JAX operation). Changing it at runtime has no effect. It must be set **before importing JAX**: ::: ```python import os os.environ['XLA_FLAGS'] = '--xla_gpu_autotune_level=0' import jax # backend initialization reads XLA_FLAGS here ``` Or from the shell: ```bash XLA_FLAGS=--xla_gpu_autotune_level=0 python benchmark.py ``` ## Environment Variables ### JAX-specific ```bash # Disable JAX memory preallocation (useful for memory profiling) export XLA_PYTHON_CLIENT_PREALLOCATE=false # Set specific GPU export CUDA_VISIBLE_DEVICES=0 # Disable JAX compilation cache (for cold-start benchmarks) export JAX_ENABLE_COMPILATION_CACHE=false ``` ### General Python ```bash # Disable Python's hash randomization for reproducibility export PYTHONHASHSEED=0 ``` ## Quick Setup Script Here's a script that applies common optimizations: ```bash #!/bin/bash # setup_benchmark_env.sh - Run as root set -e echo "Setting up benchmark environment..." # CPU optimizations cpupower frequency-set -g performance echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo 2>/dev/null || \ echo 0 > /sys/devices/system/cpu/cpufreq/boost 2>/dev/null || \ echo "Could not disable turbo boost" # GPU optimizations (if NVIDIA GPU present) if command -v nvidia-smi &> /dev/null; then nvidia-smi -pm 1 # Optionally lock clocks here fi echo "Benchmark environment ready." echo "Run your benchmark with: taskset -c 0-3 nice -n -20 python benchmark.py" ``` ## Verification Checklist Before running benchmarks, verify: - [ ] CPU governor is set to `performance` - [ ] Turbo boost is disabled - [ ] GPU clocks are locked (for GPU benchmarks) - [ ] No other intensive processes are running - [ ] System temperature is stable - [ ] Sufficient warm-up iterations have been run ## Interpreting Results :::{tip} Use `verbose=True` to inspect the actual code being benchmarked and verify it matches your expectations. ::: Even with all optimizations, some variance is expected: - **< 1% variance**: Excellent, highly reproducible - **1-5% variance**: Good, typical for well-controlled environments - **5-10% variance**: Acceptable, may indicate some system noise - **> 10% variance**: Investigate system configuration zeropybench reports the interquartile range (IQR) as a percentage, which helps identify unstable measurements.