NVIDIA Unified Fabric Manager (UFM)

Explore the network management platforms for cyber intelligence and analytics.

The NVIDIA UFM® platform revolutionizes data center networking management by combining enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics to support scale-out, InfiniBand-connected data centers.

Data Center Management Made Easy

The UFM platform empowers research and industrial data center operators to efficiently provision, monitor, manage, and preventively troubleshoot and maintain their high-performance InfiniBand networking fabric. The UFM platform is made up of multiple solution levels and a comprehensive feature set to meet the broadest range of modern, scale-out data center requirements. Using UFM, you can realize higher utilization of fabric resources and gain a competitive advantage, while reducing opex.

The UFM platform features robust graphical user interfaces (GUIs).

Find out how easy it is to manage, monitor, and maintain your InfiniBand-connected data center with a free 60-day trial of UFM Enterprise software.

UFM Platforms Product Suite

UFM Telemetry
Real-Time Monitoring

UFM Telemetry provides network validation tools to monitor network performance and conditions. It also captures and streams rich, real-time network telemetry information, application workload usage, and system configuration to an on-premises or cloud-based database for further analysis.

It’s available via software containers or dedicated appliances.

Key features:

  • Switches, adapters, and cables telemetry
  • System validation
  • Network performance tests
  • Streaming of telemetry information to on-premises or cloud-based database

UFM Enterprise
Fabric Visibility and Control

UFM Enterprise combines the benefits of UFM Telemetry with enhanced network monitoring and management. It performs automated network discovery and provisioning, traffic monitoring, and congestion discovery.

It’s available via software containers or dedicated appliances.

Key features:

  • Includes UFM Telemetry features
  • Automated network discovery and validation
  • Secure cable management
  • Congestion tracking to identify traffic bottlenecks
  • Problem identification and resolution
  • Global software updates
  • Job scheduler provisioning, integrated with Slurm and IBM Spectrum LSF
  • Advanced reporting and comprehensive representational state transfer (REST) APIs
  • Rich web-based GUI

UFM Cyber-AI
Cyber Intelligence and Analytics

UFM Cyber-AI enhances the benefits of UFM Telemetry and UFM Enterprise, providing preventive maintenance and cybersecurity for lowering supercomputing opex.

It’s available via a dedicated UFM Cyber-AI appliance on premises.

Key features:

  • Includes UFM Telemetry and UFM Enterprise features
  • Detects performance degradations or usage profile changes over time
  • Detects abnormal cluster behavior
  • Uses AI to correlate phenomena that may seem unrelated
  • Alerts when preventive maintenance is required
  • Optimizes predictability with continuous system data collection

NVIDIA UFM SDK

NVIDIA Networking Care: Monitoring and Network Operations Center (NOC) Services

A Comprehensive Suite of Tools and Plug-Ins for NVIDIA InfiniBand-Connected Clusters

The NVIDIA UFM SDK offers an extensive range of third-party plug-ins designed for open-source platforms, such as Grafana, FluentD, Zabbix, and Slurm. These tools and plug-ins enhance developer productivity and offer an efficient, user-friendly integration with the UFM REST API. Check out our Autonomous Link Maintenance (ALM) and Packet Drop Rate (PDR) predictive maintenance plug-ins. Anticipate issues before they arise and maintain peak network performance.
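
As a rough illustration of what integrating with the UFM REST API can look like, the Python sketch below prepares an authenticated request without sending it. The host name, credentials, `/ufmRest/resources/<resource>` URL layout, and basic authentication are all assumptions made for this example, and `build_ufm_request` is a hypothetical helper, not part of the UFM SDK. Check the UFM REST API documentation for your release before relying on any of them.

```python
import base64
import urllib.request


def build_ufm_request(host: str, resource: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request to a UFM server."""
    # Assumed URL layout: https://<host>/ufmRest/resources/<resource>
    url = f"https://{host}/ufmRest/resources/{resource}"
    # Assumed authentication scheme: HTTP basic auth.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical host and credentials, for illustration only.
req = build_ufm_request("ufm.example.com", "systems", "admin", "secret")
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. That step is left out so the sketch runs without a UFM server.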
