Optimizing Compute Performance: A Case Study on Nanyang Technological University
Technology Category
- Infrastructure as a Service (IaaS) - Hybrid Cloud
- Networks & Connectivity - Ethernet
Applicable Industries
- Education
- Semiconductors
Applicable Functions
- Maintenance
- Product Research & Development
Use Cases
- Smart Campus
- Time Sensitive Networking
Services
- Cloud Planning, Design & Implementation Services
The Challenge
Nanyang Technological University's High Performance Computing Centre (HPCC) faced a significant challenge. With more than 4,500 CPU cores, 40 NVIDIA Tesla GPGPU cards, 2,700 TB of storage, a 100 Gb/s InfiniBand interconnect, and a 40G/100G Ethernet backbone, all backed by technical support, HPCC delivered nearly 19 million CPU core-hours and nearly 300,000 GPU-hours in 2021 to support more than 160 NTU researchers. The HPCC digital community had grown to nearly 800 NTU members, and as it kept growing, the number of HPC and AI applications was rising rapidly. The four-engineer team at HPCC needed better tools to support this growing user community and to evaluate scaling up to a hybrid cloud environment. They required job-level insights to understand runtime issues, metrics on I/O, CPU, and memory to identify bottlenecks, and the ability to detect problematic applications and rogue jobs with bad I/O patterns that could overload shared storage.
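To illustrate what detecting a "rogue job with bad I/O patterns" might look like in practice, here is a minimal Python sketch. It is not Altair Mistral's API or HPCC's actual rule: the record fields (`job_id`, `read_count`, `write_count`, `read_bytes`, `write_bytes`), the thresholds, and the sample values are all assumptions made for illustration. The idea is that a job issuing a very large number of very small I/O operations is a plausible candidate for overloading shared storage.

```python
def flag_rogue_jobs(jobs, min_ops=1_000_000, min_mean_io_bytes=4096):
    """Return IDs of jobs that issue many I/O operations of small average size.

    Both thresholds are hypothetical; a real site would tune them.
    """
    flagged = []
    for job in jobs:
        ops = job["read_count"] + job["write_count"]
        if ops < min_ops:
            continue  # too few operations to stress shared storage
        mean_io_bytes = (job["read_bytes"] + job["write_bytes"]) / ops
        if mean_io_bytes < min_mean_io_bytes:
            flagged.append(job["job_id"])
    return flagged


# Placeholder records, not data from the case study.
jobs = [
    {"job_id": "job-a", "read_count": 5_000_000, "write_count": 0,
     "read_bytes": 5_000_000 * 512, "write_bytes": 0},
    {"job_id": "job-b", "read_count": 1_000_000, "write_count": 1_000_000,
     "read_bytes": 1_000_000 * 1_048_576, "write_bytes": 1_000_000 * 1_048_576},
    {"job_id": "job-c", "read_count": 100, "write_count": 0,
     "read_bytes": 100 * 100, "write_bytes": 0},
]

rogue = flag_rogue_jobs(jobs)
print(rogue)
```

With these placeholder records only `job-a` is flagged: it performs millions of 512-byte reads, while `job-b` uses large transfers and `job-c` issues too few operations to matter.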
About The Customer
Nanyang Technological University (NTU) Singapore is a research-intensive public institution that supports around 33,000 students and 10,000 staff across engineering, science, business, humanities, arts, social sciences, and medicine. NTU is one of the world’s most prestigious universities and is among the oldest in Singapore, with the nation’s largest campus at nearly 500 acres. NTU’s High Performance Computing Centre (HPCC) was established in 2010 to support the university’s large-scale and data-intensive computing needs, and demand for its resources continues to grow.
The Solution
To address these challenges, the HPCC team deployed Altair Mistral to profile application I/O and determine the most efficient options for optimizing HPC at NTU. They measured the performance of the popular Gaussian chemistry application on three types of storage: local NVMe, tier 1 scale-out all-flash NAS, and tier 2 scale-out NAS with SSD/HDD. Mistral measured the application’s job-run characteristics across several parameters, including read and write counts, read and write bytes, memory usage, processing time, and I/O latency. The metrics revealed the strengths and weaknesses of each type of storage. With I/O profiling using Mistral, NTU’s HPCC team can now find the best-fitting nodes for each application's requirements, determine the most affordable, best-performing storage for different application types, and know which applications are best suited to cloud versus on-premises infrastructure.
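One way to turn such profiles into a storage decision is sketched below in Python. This is an assumed selection rule, not the method HPCC or Mistral uses: the `StorageProfile` type, the `pick_storage` function, the `max_slowdown` tolerance, and every number are hypothetical placeholders, since the case study does not publish its measurements. The sketch picks the cheapest storage tier whose job runtime stays within a tolerated slowdown of the fastest tier.

```python
from dataclasses import dataclass


@dataclass
class StorageProfile:
    name: str
    runtime_s: float           # processing time of the profiled job
    mean_io_latency_ms: float  # recorded for reference, not used in the rule
    relative_cost: float       # hypothetical cost index, lower is cheaper


def pick_storage(profiles, max_slowdown=1.10):
    """Cheapest tier whose runtime is within max_slowdown of the fastest tier."""
    fastest = min(p.runtime_s for p in profiles)
    eligible = [p for p in profiles if p.runtime_s <= fastest * max_slowdown]
    return min(eligible, key=lambda p: p.relative_cost)


# Placeholder values only; these are not results from the case study.
profiles = [
    StorageProfile("local NVMe", 1000.0, 0.1, 3.0),
    StorageProfile("tier 1 scale-out all-flash NAS", 1050.0, 0.5, 2.0),
    StorageProfile("tier 2 scale-out NAS with SSD/HDD", 1400.0, 4.0, 1.0),
]

best = pick_storage(profiles)
print(best.name)
```

Loosening `max_slowdown` trades runtime for cost: with the placeholder values above, a tolerance of 1.5 would select the tier 2 storage instead of tier 1.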