FAQs | Taiwania2 (HPC CLI)
Login
Q1. How to login to Taiwania2 (HPC CLI)?
Log in via SSH to the login node ln01.twcc.ai, then enter your system account and system password to complete the login. For detailed steps, refer to this document.
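Assuming a standard OpenSSH client, the login step looks like this sketch (replace `your_account` with your own system account):

```shell
# Connect to the Taiwania2 login node; you will be prompted
# for your system password after the connection is established
ssh your_account@ln01.twcc.ai
```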
Q2. What is the available open-source software for the SSH connection to TWSC's CCS, VCS and HPC?
Available open-source software includes MobaXterm, PuTTY, VSCode, etc.
Q3. Why does my login fail now, even though I could log in to Taiwania2 (HPC CLI) half a year ago?
The system password is valid for 180 days. Please reset the expired password in the Member Center, after which you can log in again.
Resources allocation and monitoring
Q1. Can I run multi-node parallel jobs on Taiwania2 (HPC CLI)?
You can request resources through Slurm and run multi-node parallel jobs, distributing heavy workloads evenly across nodes to improve processing efficiency.
Q2. Are nodes assigned automatically by the system, or do I need to request them manually for multi-node computing?
You can request nodes with Slurm commands; refer to this document for more information.
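As a sketch, a multi-node request can be written as a Slurm batch script; the job name and resource counts below are illustrative, so check your project's queues and quotas before use:

```shell
#!/bin/bash
#SBATCH --job-name=multinode-test   # illustrative job name
#SBATCH --nodes=2                   # request two compute nodes
#SBATCH --ntasks-per-node=8         # tasks per node (illustrative)
#SBATCH --gres=gpu:8                # GPUs per node (illustrative)

# srun launches one copy of the command per task, across all allocated nodes
srun hostname
```

Submit the script with `sbatch job.sh`; Slurm then allocates the nodes automatically.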
Q3. Why does an error occur when I request multiple CPUs?
Please make sure your request follows the resource ratio of 1 GPU : 4 CPUs : 90 GB memory. For example, if you need 32 CPUs, you should request 8 GPUs.
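For example, a request that respects the 1 GPU : 4 CPUs : 90 GB ratio for 32 CPUs might look like this sketch (the command to run is illustrative):

```shell
#!/bin/bash
#SBATCH --gres=gpu:8        # 8 GPUs
#SBATCH --ntasks=8          # one task per GPU
#SBATCH --cpus-per-task=4   # 4 CPUs per task -> 32 CPUs total

srun python train.py        # illustrative command
```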
Packages
Q1. How to deploy environment and run my program on Taiwania2 (HPC CLI)?
- Conda: Use simple Conda commands to install packages and switch to your specific virtual environment. Also, with different versions of Python, Conda can reduce the compatibility problems of multiple package versions. For more information, please refer to this document.
- Singularity: By using Singularity to pack the packages and programs you need, you can create a container environment on Taiwania2 (HPC CLI) and deploy, move and share your packages rapidly. For more information, please refer to this document.
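As a minimal sketch of both approaches (the environment, package, and image names are illustrative):

```shell
# Conda: create an isolated environment with a specific Python version,
# activate it, and install a package into it
conda create -n myenv python=3.9
conda activate myenv
conda install numpy

# Singularity: pull a container image and run a program inside it
singularity pull docker://python:3.9
singularity exec python_3.9.sif python3 --version
```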
Q2. Can you help me install the packages?
You have permission to install packages yourself, so please install them according to your needs. We also recommend managing your packages with Conda or a Singularity container.
Q3. What is the Slurm scheduling system?
Please refer to this document for the detailed introduction to Slurm system architecture.
Q4. Is it possible to install Rclone, the synchronization tool, on Taiwania2 (HPC CLI)?
Taiwania2 (HPC CLI) has the latest version of Rclone installed. You can use the `module load rclone` command to set up the Rclone environment. Rclone is written in Go and can also be used directly after unzipping it in your home directory.
Q5. Does Taiwania2 (HPC CLI) support Nvidia's CUDA computing architecture?
Yes. You can use the `module avail` command on Taiwania2 (HPC CLI) to list the available modules, and the `module load` command to select the CUDA version you need.
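For example (the CUDA version shown is illustrative; `module avail` lists the versions actually installed):

```shell
# List all modules installed on the system
module avail

# Load a specific CUDA version and confirm the compiler is available
module load cuda/10.0
nvcc --version
```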
Q6. Why can some packages be used in an Interactive container but not in Taiwania2 (HPC CLI)? Don't the two share Hyper File System (HFS)?
The storage environment of the two is the same, but the computing environment is different:
- The computing environment of the Interactive container is built using the TWSC container image file.
- Taiwania2 (HPC CLI) requires users to deploy their own computing environment.
Tip: On Taiwania2 (HPC CLI), you can use the `module` command to load the required packages. Refer to this document for usage.
Data storage and transfer
Q1. Why is the /home/$USER directory empty after login?
The storage space of Taiwania2 (HPC CLI) is the Hyper File System (HFS). Only you have full permission to it, so the space will be empty if you have never uploaded any data.
Q2. Will the files in Taiwania2 (HPC CLI) be deleted once the project expires?
The storage space is bound to your personal account, so the files will not be deleted when the project expires.
Important: The system will regularly clean up resources under TWSC accounts that have not been used for a long time. Please be sure to back up your data regularly.
Networking and security
Q1. What is the IP address of Taiwania2 (HPC CLI)?
203.145.219.98
Execution errors
Q1. Why does the error `QOSMaxSubmitJobPerUserLimit` occur when using Taiwania2 (HPC CLI)?
This error indicates that you have submitted more than 20 computing jobs (the gtest queue is for experimental use and allows only 5 jobs).
When the error occurs, you can use the `squeue` command to check job states and the `scancel` command to cancel pending or running jobs, reducing the number of submitted jobs.
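The check-and-cancel steps can be sketched as follows (the job ID is an example value taken from the `squeue` output):

```shell
# List your own jobs and their states (R = running, PD = pending)
squeue -u $USER

# Cancel a single job by its job ID
scancel 123456

# Or cancel all of your pending jobs at once
scancel -u $USER -t PENDING
```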
For instructions on using Queues and computing resources, please refer to this document.
Q2. Why does the error `QOSMaxGRESPerUser` appear in `NODELIST(REASON)` after job submission?
This error indicates that you have reached the maximum number of GPUs allowed on Taiwania2 (HPC CLI) (the base limit is 40 GPUs).
For instructions on using queues and computing resources, please refer to this document. Once you reach the limit you will not be able to submit jobs, so please request fewer GPU resources.
Q3. Why does the error message `Socket timed out on send/recv operation` appear when running a Slurm command?
This error occurs because the login node is busy. Please wait a while and run the command again.