
FFM Specifications

under construction

The Chinese version of this document is in preparation; please stay tuned.

2. Specifications

Hardware specifications

FFM (Formosa Foundation Model) has been optimized for Taiwania 2, the top-ranked supercomputer in Taiwan, leveraging its hardware configuration for rapid, low-carbon-emission ("green") training jobs.

Taiwania 2 is a high-performance computing platform designed to deliver exceptional computing power. With a shared infrastructure of 252 compute nodes and 2,016 NVIDIA® Tesla® V100 GPUs, provided in collaboration with TWSC computing services, Taiwania 2 offers a performance of 9 petaflops (9 × 10^15 floating-point operations per second). The system utilizes NVLink technology, enabling GPUs within a node to communicate at up to 300 GB/s, complemented by high-bandwidth, low-latency node-to-node interconnects.

Each individual node of Taiwania 2 boasts the following specifications:

  • CPU: Dual Intel® Xeon® Gold 6154 processors, each with 18 cores running at 3.0 GHz (36 cores per node).
  • Memory: 24 RDIMMs of 32 GB DDR4-2666, providing 768 GB of memory per node.
  • System Drives: Dual 2.5" 240 GB SATA III SSDs configured in RAID 1 for enhanced data redundancy.
  • Data Storage Drive: A fast 4 TB NVMe SSD to handle large-scale data storage requirements.
  • GPU: 8 NVIDIA® Tesla® V100 SXM2 GPUs, delivering exceptional computational performance.
  • Network Interface: 4 Mellanox InfiniBand EDR 100 Gb network cards for high-speed interconnectivity.
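A quick back-of-the-envelope check (a minimal Python sketch; all figures are quoted from the text above) confirms that the per-node and cluster-wide numbers are consistent:

```python
# Consistency check of the Taiwania 2 figures quoted above.
nodes = 252
gpus_per_node = 8
total_gpus = nodes * gpus_per_node
print(total_gpus)  # 2016 GPUs, matching the cluster-wide figure

# 24 RDIMMs of 32 GB each:
memory_gb_per_node = 24 * 32
print(memory_gb_per_node)  # 768 GB of DDR4 memory per node
```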

Taiwania 2 is a cutting-edge computing platform that empowers researchers, scientists and engineers with its tremendous computational capabilities. Whether tackling complex simulations, data-intensive tasks, or deep learning applications, Taiwania 2 provides the necessary performance, memory, and storage resources to accelerate your projects and drive innovation.

Computing Efficiency

Petaflop/s-day

This unit of computing power was introduced by OpenAI in 2018: "A petaflop/s-day (pfs-day) consists of performing 10^15 neural net operations per second for one day, or a total of about 10^20 operations." This unit makes it easy to compare the computing power required to train deep learning models, for example:

  • GPT-3, Language Models are Few-Shot Learners: 3640 pfs-days (7/2020)
  • Transformer Model, Attention is all you need: 0.089 pfs-days (6/2017)
  • Stochastic Optimization, Adam Optimizer: less than 0.0007 pfs-days (12/2014)
  • Neural Machine Translation, Learning to Align and Translate: 0.018 pfs-days (9/2014)
  • Continuous Skip-gram model for text, Word2Vec: less than 0.00045 pfs-days (10/2013)
  • Auto-Encoder Task, Variational Auto Encoders: less than 0.0000055 pfs-days (12/2013)
  • As an illustrative estimate: a hypothetical system sustaining 100 billion (10^11) operations per second would accumulate about 8.64 × 10^15 operations per day, i.e. roughly 0.0000864 pfs-days.

Based on the aforementioned definition of petaflop/s-day, Taiwania 2 delivers 9 petaflops of computing power for AI learning applications. Since one pfs-day equals 10^15 operations per second sustained for one day (about 8.64 × 10^19 operations), Taiwania 2 delivers per day:

Taiwania 2 petaflop/s-days  = (9 × 10^15 operations/second * 86,400 seconds/day) / (8.64 × 10^19 operations per pfs-day)
= 9 petaflop/s-days per day

Based on this calculation, training GPT-3 (3,640 pfs-days) on Taiwania 2 at perfect utilization would require roughly 3,640 / 9 ≈ 404 days of computation time. Taiwania 2 is currently a government-owned public cloud service in Taiwan, capable of training large language models and meeting the demands of the nation's various tasks and assignments in the field of language model learning.
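The petaflop/s-day arithmetic above can be sketched in a few lines of Python (a minimal sketch; figures are taken from the text and assume perfect, continuous utilization):

```python
# OpenAI's definition: 1 pfs-day = 1e15 neural-net operations per second
# sustained for one day, i.e. 8.64e19 operations in total.
OPS_PER_PFS_DAY = 1e15 * 86_400  # 8.64e19 operations

def pfs_days_per_day(peak_petaflops: float) -> float:
    """pfs-days of compute delivered per wall-clock day at a given peak rate."""
    ops_per_day = peak_petaflops * 1e15 * 86_400
    return ops_per_day / OPS_PER_PFS_DAY

def days_to_train(model_pfs_days: float, peak_petaflops: float) -> float:
    """Wall-clock days needed to accumulate model_pfs_days of compute."""
    return model_pfs_days / pfs_days_per_day(peak_petaflops)

print(pfs_days_per_day(9))     # Taiwania 2: 9.0 pfs-days per day
print(days_to_train(3640, 9))  # GPT-3's 3640 pfs-days: ~404.4 days
```

Note that a machine's pfs-days per day equals its peak petaflop/s rate by construction; real training runs reach only a fraction of peak throughput, so such figures are lower bounds on wall-clock time.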

HFS (Hyper File System)

Unique to TWSC, HFS provides a fully managed file system designed specifically for AI and HPC (high-performance computing) workloads. HFS can also be used as general-purpose, independent data storage, applicable to all types of TWSC Container Compute Services. Its high-performance parallel file system provides fast access for every data operation and supports online, real-time expansion of HFS storage space, providing a smooth user experience.

It supports TWSC High Performance Computing (HPC) and Container Compute Service (CCS) simultaneously to help you:

  • Eliminate the need to manage your own file system, duplicate files or move data around.
  • Have high IO performance for both large and small files, with high bandwidth data throughput to meet various AI and HPC computing needs.
  • Freely expand the capacity of the /home and /work file storage directories online according to actual usage, with immediate availability.
  • Benefit from pay-as-you-go billing: you pay only for the resources you actually use, with no upfront payments or unnecessary financial commitments. This keeps billing transparent and accurate, avoids waste, and gives businesses the flexibility to adapt to changing requirements.

COS (Cloud Object Storage)

TWSC Cloud Object Storage (COS) service provides an object storage system that is compatible with Amazon S3. Users may upload files from local computers to the TWSC Cloud Object Storage system and connect it to the HFS service for AI/HPC (high-performance computing), with the following features:

  • COS is an efficient way to store large amounts of data. It can be used to store any type of file, including images, audio, video and documents.
  • COS service offers high availability and reliability, keeping your data accessible even in the event of hardware failures.
  • You can use COS service to store confidential data for training AI models. You can upload the models to the cloud, and use the container service to run them.

To use COS service, please read COS Service Overview for more details.
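Because COS speaks the S3 API, any S3-compatible client can upload files to it. The sketch below uses boto3; the endpoint URL, credentials, and bucket name are placeholder assumptions for illustration only, and the real values come from the COS Service Overview.

```python
# Sketch: uploading a local file to an S3-compatible object store such as
# TWSC COS. The endpoint, credentials, and bucket below are placeholders,
# not real TWSC values.
COS_ENDPOINT = "https://cos.example.twsc.ai"  # hypothetical endpoint URL

def object_url(bucket: str, key: str) -> str:
    """Path-style URL of an object under the configured endpoint."""
    return f"{COS_ENDPOINT}/{bucket}/{key}"

def upload_file(local_path: str, bucket: str, key: str) -> str:
    """Upload local_path to bucket/key over the S3 API; returns the object URL."""
    import boto3  # imported lazily so the URL helper works without boto3 installed
    s3 = boto3.client(
        "s3",
        endpoint_url=COS_ENDPOINT,
        aws_access_key_id="YOUR_ACCESS_KEY",      # placeholder credential
        aws_secret_access_key="YOUR_SECRET_KEY",  # placeholder credential
    )
    s3.upload_file(local_path, bucket, key)
    return object_url(bucket, key)

print(object_url("training-data", "corpus/part-0001.jsonl"))
```

Uploaded objects can then be attached to an HFS-backed container job for training, as described above.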