============================================================================== HPC Challenge Benchmark ============================================================================== 1 - Linpack ============================================================================== The Linpack benchmark is a measure of computer floating point execution efficiency and is the basis for the Top 500 supercomputer rankings. 1.1 - Usage ----------------------- Example Repo : Linpack_ .. _Linpack: https://netlib.org/benchmark/hpl/index.html Example Script:: cat < batchScript.sh #!/bin/bash #SBATCH --partition=arm-kunpeng920 #SBATCH --time=00:25:00 #SBATCH --ntasks=128 #SBATCH --nodes=1 module load GCC/12.1.0 OpenBLAS/0.3.21 OpenMPI/4.1.3 mpirun --allow-run-as-root -npernode 8 -x OMP_NUM_THREADS=16 ./xhpl EOF sbatch batchScript.sh 1.2 - Results ----------------------- .. list-table:: LINPACK-Test :widths: 25 25 25 55 25 :header-rows: 1 * - CPU - Compiler Combination - Number of Nodes - Number of Cores - Test Result * - arm-kunpeng920 - GCC/12.1.0 - 1 - 128 (16 processes, 8 threads per process) - 3.0346e+02Gflops * - arm-kunpeng920 - GCC/12.1.0 - 1 - 128 (128 processes, 1 threads per process) - 3.9471e+01Gflops * - arm-kunpeng920 - GCC/12.1.0 - 1 - 128 (8 processes, 16 threads per process) - 7.2177e+02Gflops * - arm-kunpeng920 - GCC/12.1.0 - 1 - 128 (4 processes, 32 threads per process) - 6.0959e+02Gflops * - arm-kunpeng920 - GCC/12.1.0 - 1 - 128 (64 processes, 2 threads per process) - 7.8394e+01Gflops 2 - STREAM: Sustainable Memory Bandwidth in High Performance Computers ============================================================================== 2.1 - Usage ----------------------------------- Example Repo : STREAM_ .. _STREAM: https://github.com/jeffhammond/STREAM Test Script:: cat < batchscript.sh #!/bin/bash #SBATCH --partition=arm-kunpeng920 #SBATCH --time=00:10:00 #SBATCH --ntasks=128 #SBATCH --nodes=1 module load GCC/12.1.0 OpenMPI gcc -fopenmp -O3 -DSTREAM_ARRAY_SIZE=80000000 -DNTIMES=20 -mcmodel=large stream.c -o stream_c ./stream_c EOF sbatch batchscript.sh 2.2 - Results ----------------------------------- Output:: *Not optimised for performance* ------------------------------------------------------------- STREAM version $Revision: 5.10 $ ------------------------------------------------------------- This system uses 8 bytes per array element. ------------------------------------------------------------- Array size = 80000000 (elements), Offset = 0 (elements) Memory per array = 610.4 MiB (= 0.6 GiB). Total memory required = 1831.1 MiB (= 1.8 GiB). Each kernel will be executed 20 times. The *best* time for each kernel (excluding the first iteration) will be used to compute the reported bandwidth. ------------------------------------------------------------- Number of Threads requested = 128 Number of Threads counted = 128 ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 10418 microseconds. (= 10418 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Best Rate MB/s Avg time Min time Max time Copy: 141140.7 0.011449 0.009069 0.013668 Scale: 150675.2 0.010752 0.008495 0.013169 Add: 128894.4 0.016744 0.014896 0.020257 Triad: 143679.0 0.016380 0.013363 0.023785 ------------------------------------------------------------- Solution Validates: avg error less than 1.000000e-13 on all three arrays -------------------------------------------------------------