HowTo: Monitor your resources - GPU Burn Testing

This tutorial demonstrates how to use a GPU stress-test tool to check whether a GPU works properly under full load.

If the final result is OK, the GPU is working normally; a result of FAULTY indicates a problem with the GPU.


Step 1. Sign in to TWSC


Step 2. Create an Interactive Container

  • Please refer to Interactive Container to create an Interactive Container.
  • Please select TensorFlow as the image type, select the image version tensorflow-21.11-tf2-py3:latest, and select 1 GPU for the hardware.

Step 3. Connect to the container and download the training program

  • Use Jupyter Notebook to connect to the container and open Terminal.
  • Enter the following command to download the training program from the NCHC GitHub repository into the container.
git clone https://github.com/TW-NCHC/AI-Services.git

Step 4. Perform GPU Burn Testing

  • Enter the following command to change to the Tutorial_Two directory.
cd AI-Services/Tutorial_Two
  • Enter the following command to download the GPU_Burn program and start the test.
bash gpu_testing.sh
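When the test finishes, gpu-burn reports a final OK or FAULTY result for each GPU. If you want to script around that result, the sketch below shows one way to turn it into an exit code; the "GPU 0: OK" line format is an assumed example, not taken from gpu_testing.sh, so inspect the actual output on your system first.

```shell
# Minimal sketch: turn a gpu-burn style result line into a pass/fail
# exit code. The "GPU 0: OK" format is an assumed example; check what
# gpu_testing.sh actually prints before relying on this pattern.
result_line="GPU 0: OK"

case "$result_line" in
  *FAULTY*) echo "GPU test failed";      exit_code=1 ;;
  *OK*)     echo "GPU test passed";      exit_code=0 ;;
  *)        echo "Unrecognized result";  exit_code=2 ;;
esac
echo "exit code: $exit_code"
```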

Step 5. View computing capability and monitor the container

  • View computing capability
    The GPU used by the GPU container service is an NVIDIA V100 32GB, which has powerful computing capability. Running GPU-burn showed that the container achieved a throughput of 13198 Gflop/s.
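A quick way to judge that figure is to compare it with the GPU's theoretical peak. The sketch below assumes the V100's published single-precision peak of roughly 15,700 Gflop/s, which is an approximation from NVIDIA's public specifications, not a value from this tutorial.

```shell
# Rough sanity check: compare the gpu-burn throughput reported above
# with the V100's approximate FP32 peak (~15,700 Gflop/s; this peak
# value is an assumption taken from NVIDIA's public specs).
measured=13198   # Gflop/s reported by gpu-burn in this tutorial
peak=15700       # approximate V100 single-precision peak, Gflop/s
awk -v m="$measured" -v p="$peak" \
    'BEGIN { printf "%.0f%% of theoretical peak\n", 100 * m / p }'
```

A fully loaded, healthy GPU typically lands in this range; a much lower percentage can be a sign of thermal throttling or a misconfigured container.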

  • Monitor Interactive Container

a. Use the Interactive Container monitoring page to view CPU, GPU, and memory utilization.

b. In the container's Jupyter Notebook Terminal, run the following command to monitor the GPU temperature and power.

nvidia-smi

  • GPU quantity: GPUs are numbered starting from index 0; the example in the figure below shows 1 GPU.
  • GPU temperature: displayed in degrees Celsius; the example in the figure below shows 31 degrees C.
  • GPU power usage: displayed in watts; the example in the figure below shows 43 W.
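For continuous monitoring, nvidia-smi can emit just the fields of interest as CSV using its standard query options. The commented command below uses real nvidia-smi flags; the sample CSV line and the 85 degree alert threshold in the post-processing example are illustrative assumptions, not TWSC recommendations.

```shell
# Repeatedly query GPU index, temperature, and power draw as CSV every
# 5 seconds (standard nvidia-smi options; requires an NVIDIA driver):
#   nvidia-smi --query-gpu=index,temperature.gpu,power.draw --format=csv,noheader -l 5

# Illustrative post-processing of one such CSV line; the sample values
# and the 85 C alert threshold are assumptions for demonstration only.
sample="0, 31, 43.00 W"
echo "$sample" | awk -F', ' '{
  if ($2 > 85) print "GPU " $1 ": overheating at " $2 " C"
  else         print "GPU " $1 ": OK at " $2 " C, drawing " $3
}'
```

Redirecting the CSV stream to a file gives a simple temperature and power log for the duration of the burn test.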