| Exam Name: | NVIDIA AI Infrastructure | | |
| Exam Code: | NCP-AII | | |
| Vendor: | NVIDIA | Certification: | NVIDIA-Certified Professional |
| Questions: | 71 Q&As | Shared By: | miguel |
During multi-node HPL burn-in, GPUs show uneven utilization. Which configuration ensures balanced workload distribution?
As the infrastructure lead for an NVIDIA AI Factory deployment, you have just uploaded the latest supported firmware packages to your DGX system. It is now critical to ensure all hardware components run the new firmware and the DGX returns to full operational capability. Which sequence best guarantees that all relevant components are correctly running updated firmware?
Refer to the output:
~ $ sudo nvsm show healthinfo
Timestamp: Sat Dec 16 16:26:32 2017 -0800
Version: 17.12-5
Checks
BIOS Revision [5.11].........................
DGX Serial Number [YSY72800016]..................
Verify installed DIMM memory sticks........................Healthy
...[output truncated]
Verify Ethernet controllers...........................Healthy
Verify installed GPU's..............................Unhealthy
Checking output of 'lspci' for expected GPU's
Missing GPU at PCI address '07:00.0'
Verify installed InfiniBand controllers....................Healthy
Verify PCIe switches..................................Healthy
...[output truncated]
What insights can a system administrator gain regarding the DGX system's health?
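One way to read a report like this is to scan it for checks marked Unhealthy and for the detail lines that follow them. The sketch below is a minimal illustration that works only on the text shown above, assuming each check line has the form "name, run of dots, status". The names `SAMPLE`, `summarize_health`, `CHECK_RE`, and `MISSING_RE` are hypothetical helpers introduced here and are not part of NVSM.

```python
import re

# Text copied from the output above (truncated and header lines omitted).
SAMPLE = """\
Verify installed DIMM memory sticks........................Healthy
Verify Ethernet controllers...........................Healthy
Verify installed GPU's..............................Unhealthy
Checking output of 'lspci' for expected GPU's
Missing GPU at PCI address '07:00.0'
Verify installed InfiniBand controllers....................Healthy
Verify PCIe switches..................................Healthy
"""

# Assumed line shape: "<check name><three or more dots><Healthy|Unhealthy>".
CHECK_RE = re.compile(r"^(?P<name>.+?)\.{3,}(?P<status>Healthy|Unhealthy)\s*$")
# Detail line reported under the failing GPU check in the sample.
MISSING_RE = re.compile(r"Missing GPU at PCI address '(?P<addr>[0-9a-fA-F:.]+)'")


def summarize_health(text):
    """Return (names of unhealthy checks, PCI addresses reported missing)."""
    unhealthy = []
    missing = []
    for raw in text.splitlines():
        line = raw.strip()
        check = CHECK_RE.match(line)
        if check and check.group("status") == "Unhealthy":
            unhealthy.append(check.group("name"))
        gap = MISSING_RE.search(line)
        if gap:
            missing.append(gap.group("addr"))
    return unhealthy, missing


if __name__ == "__main__":
    bad_checks, missing_gpus = summarize_health(SAMPLE)
    print("Unhealthy checks:", bad_checks)
    print("Missing GPU PCI addresses:", missing_gpus)
```

Run against the sample, this flags only the GPU check and the PCI address 07:00.0, while the DIMM, Ethernet, InfiniBand, and PCIe switch checks are reported as Healthy.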
An engineer needs to validate 400G DAC cable signal integrity in a DGX cluster. Which CVT metric best identifies marginal cables needing replacement?