Introduction


Resource Requirements


Figure 1


Figure 2


Scheduler Tools


Scaling Study


Figure 1

Speedup and efficiency of strong scaling example

Figure 2

Three snowmen in 800x800 with 128 samples per pixel

Figure 3

Three snowmen in 800x800 with 8192 samples per pixel

Figure 4

Speedup and efficiency of weak scaling example
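For reference when reading the scaling figures, speedup and parallel efficiency can be computed from run times as sketched below. The timings in the example are hypothetical placeholders, not measurements from the figures:

```python
def speedup(t1, tp):
    """Strong-scaling speedup S(p) = T(1) / T(p): fixed problem size, p cores."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Parallel efficiency E(p) = S(p) / p; 1.0 means ideal scaling."""
    return speedup(t1, tp) / p

def weak_efficiency(t1, tp):
    """Weak-scaling efficiency: the problem size grows with p,
    so the ideal run time stays constant, T(p) = T(1)."""
    return t1 / tp

# Hypothetical example: a job that takes 100 s serially and 30 s on 4 cores.
t1, tp, p = 100.0, 30.0, 4
s = speedup(t1, tp)        # about 3.33
e = efficiency(t1, tp, p)  # about 0.83
```

In a strong-scaling study, efficiency typically drops as the core count grows; in a weak-scaling study, a flat `weak_efficiency` close to 1.0 indicates good scaling.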

Performance Overview


Figure 1

Diagram visualizing the data hierarchy of CPU architectures. Network, local disks, memory, and CPU caches have decreasing storage capacities, but increasing bandwidths and shorter latencies. Calculations occur in CPUs, possibly in multiple CPU cores, which may have multiple threads each and may even apply vectorized instructions.
The underlying hardware frames any performance analysis. Calculations are performed in multiple cores, potentially in multiple threads per core, and even in vectorized instructions, where a single instruction applies the same operation to multiple data elements. Data moves through the data hierarchy to the CPU cores; each level “closer” to the CPU has a smaller storage capacity, but higher bandwidth and lower latency, improving access performance.
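The effect of data layout and vectorization described above can be illustrated with NumPy (an illustrative sketch; the array shape and values are arbitrary, and NumPy is assumed to be available):

```python
import numpy as np

# A row-major (C-ordered) 2D array: elements of a row are contiguous
# in memory, elements of a column are strided.
a = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)

# The same reduction, once traversing contiguous rows and once
# traversing strided columns. Both are mathematically identical, but
# the contiguous traversal makes better use of caches.
row_sum = a.sum(axis=1).sum()   # contiguous memory access
col_sum = a.sum(axis=0).sum()   # strided memory access
assert row_sum == col_sum

# A scalar Python loop performs the same operation one element at a
# time; NumPy's vectorized sum processes whole blocks of data at once,
# analogous to vectorized CPU instructions.
scalar_sum = 0.0
for x in a.flat:
    scalar_sum += x
assert scalar_sum == row_sum
```

The results are identical, but the access pattern and the use of vectorized kernels determine how efficiently the data hierarchy is exploited.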

Figure 2

"My Jobs" tab in the ClusterCockpit web UI
ClusterCockpit's main menu. Select “My Jobs” to see a list of the jobs associated with your account.

Figure 3

ClusterCockpit Job Info panel

Figure 4

ClusterCockpit Footprint panel summarizing central job characteristics

Figure 5

ClusterCockpit Roofline plot of a job

Figure 6

Select Metrics button in the ClusterCockpit job view

Figure 7

Linaro perf-report overview 1

Figure 8

ClusterCockpit cpu load per core

Figure 9

ClusterCockpit flops_any metric

Figure 10


Figure 11


Figure 12


Figure 13


Figure 14


Figure 15


Figure 16


Figure 17


Figure 18


Figure 19


Figure 20

Performance Reports also summarizes the application's behavior in terms of MPI calls, e.g. time spent in collective calls involving all processes, or in point-to-point communication.


Figure 21

The I/O block summarizes measurements of interactions with the local file systems. Here, I/O operations do not affect the application's performance at all.


Pinning


How to identify a bottleneck?


Performance of Accelerators


Next Steps