NetworkX is a package for the Python programming language that’s used to create, manipulate, and study the structure, dynamics, and functions of complex graph networks.
To understand what NetworkX does, you first need to understand graphs. Graphs are mathematical structures used to model many types of relationships and processes in physical, biological, social, and information systems. A graph consists of nodes, also called vertices (representing the entities in the system), that are connected by edges (representing relationships between those entities). Working with graphs means traversing nodes and edges to discover and understand complex relationships or to optimize paths between linked data in a network.
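As a minimal sketch of these ideas, the snippet below builds a small undirected graph with NetworkX and asks for the shortest path between two nodes. The node names and edges are made up for illustration.

```python
import networkx as nx

# Build a small undirected graph; the names are illustrative only.
G = nx.Graph()
G.add_edge("Alice", "Bob")
G.add_edge("Bob", "Carol")
G.add_edge("Carol", "Dave")
G.add_edge("Alice", "Dave")
G.add_edge("Dave", "Erin")

num_nodes = G.number_of_nodes()  # entities in the system
num_edges = G.number_of_edges()  # relationships between them

# Traverse edges to find the path with the fewest hops from Alice to Erin.
path = nx.shortest_path(G, source="Alice", target="Erin")
print(num_nodes, num_edges, path)
```

Nodes are created implicitly when an edge that mentions them is added, which keeps small examples like this one short.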
There are many uses of graph network analysis, such as analyzing relationships in social networks, cyber threat detection, and identifying the people most likely to buy a product based on shared preferences.
In the real world, nodes can be people, groups, places, or things such as customers, products, members, cities, stores, airports, ports, bank accounts, devices, mobile phones, molecules, or web pages.
Examples of edges, or relationships between nodes, include friendships, network connections, hyperlinks, roads, routes, wires, phone calls, emails, “likes,” payments, transactions, and social networking messages. Edges can be directed, with a one-way arrow representing a relationship from one node to another, as when Janet “likes” a social media post of Jeanette’s. They can also be undirected: if Bob is a Facebook friend of Alice, then Alice is also a friend of Bob.
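The difference between directed and undirected edges can be sketched with NetworkX's DiGraph and Graph classes. The two tiny graphs below mirror the examples in the text and are illustrative only.

```python
import networkx as nx

# Directed: Janet liked Jeanette's post, but not the other way around.
likes = nx.DiGraph()
likes.add_edge("Janet", "Jeanette")

# Undirected: friendship holds in both directions.
friends = nx.Graph()
friends.add_edge("Bob", "Alice")

janet_likes_jeanette = likes.has_edge("Janet", "Jeanette")
jeanette_likes_janet = likes.has_edge("Jeanette", "Janet")
alice_friend_of_bob = friends.has_edge("Alice", "Bob")
print(janet_likes_jeanette, jeanette_likes_janet, alice_friend_of_bob)
```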
NetworkX nodes can be any hashable object, meaning an object whose hash value never changes during its lifetime. Nodes can be text strings, images, XML objects, entire graphs, or custom node objects. The base package includes many functions to generate, read, and write graphs in multiple formats.
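The sketch below illustrates both points under simple assumptions: several hashable Python objects are used as nodes, an unhashable list is rejected, and a small graph is written to and read back from the plain edge-list text format. The node values are hypothetical.

```python
import networkx as nx

G = nx.Graph()
G.add_node("server-1")              # string
G.add_node(42)                      # integer
G.add_node((40.7, -74.0))           # tuple, e.g. a coordinate pair
G.add_node(frozenset({"a", "b"}))   # immutable set

# A list is mutable and therefore unhashable, so it cannot be a node.
try:
    G.add_node(["not", "hashable"])
    list_accepted = True
except TypeError:
    list_accepted = False

# Round-trip a small graph through the edge-list text format.
H = nx.Graph([("a", "b"), ("b", "c")])
lines = list(nx.generate_edgelist(H, data=False))
H2 = nx.parse_edgelist(lines, nodetype=str)
print(G.number_of_nodes(), list_accepted, lines)
```

The edge-list format is only one of the formats NetworkX supports; it is used here because it needs no extra dependencies.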
NetworkX has the capacity to operate on very large graphs with more than 10 million nodes and 100 million edges. The core package, which is free software under the BSD license, includes data structures for representing such things as simple graphs, directed graphs, and graphs with parallel edges and self-loops. NetworkX also has a large community of developers who maintain the core package and contribute to a third-party ecosystem.
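As a brief illustration of those data structures, the following sketch contrasts a MultiGraph, which keeps parallel edges, with a simple Graph, which collapses them. The airport codes and the carrier attribute are made-up sample data.

```python
import networkx as nx

# MultiGraph keeps every edge, even between the same pair of nodes.
M = nx.MultiGraph()
M.add_edge("JFK", "LAX", carrier="A")
M.add_edge("JFK", "LAX", carrier="B")  # parallel edge
M.add_edge("JFK", "JFK")               # self-loop

parallel_edges = M.number_of_edges("JFK", "LAX")
self_loops = nx.number_of_selfloops(M)

# A simple Graph stores at most one edge per node pair.
S = nx.Graph()
S.add_edge("JFK", "LAX")
S.add_edge("JFK", "LAX")
simple_edges = S.number_of_edges()
print(parallel_edges, self_loops, simple_edges)
```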
Among the principal uses of NetworkX are:
NetworkX is considered relatively easy to install and use, particularly for Python developers.
Graph analytics can be used to determine the strength and direction of relationships between objects in a graph. The potential demand for tools that analyze relationships is nearly limitless given the growing role of networks in our information ecosystem. The influence of social networks on everything from buying decisions to national elections has catalyzed interest in graph analysis. Graph analysis is particularly useful for discovering relationships that aren’t obvious because of the complexity of the network or the number of paths between nodes.
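One way to picture strength and direction is a weighted directed graph, where the edge weight stands for how strong a relationship is. The sketch below uses hypothetical names and message counts as weights; it is one possible reading of the idea, not a prescribed method.

```python
import networkx as nx

# Edge (u, v, w): u sent w messages to v. All values are made up.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("ann", "bob", 5),
    ("ann", "cat", 1),
    ("bob", "cat", 2),
    ("dan", "cat", 4),
])

# Strength: total incoming weight per node.
incoming_strength = dict(G.in_degree(weight="weight"))
most_contacted = max(incoming_strength, key=incoming_strength.get)

# Direction: a path may exist one way but not the other.
ann_reaches_cat = nx.has_path(G, "ann", "cat")
cat_reaches_ann = nx.has_path(G, "cat", "ann")
print(incoming_strength, most_contacted, ann_reaches_cat, cat_reaches_ann)
```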
Graph analytics has been useful to achieve the following:
NetworkX provides a standardized way for data scientists and other users of graph mathematics to collaborate, build, design, analyze, and share graph network models. As free software that’s notable for its scalability and portability, NetworkX has been widely adopted by Python enthusiasts. It’s also the most popular graph framework used by data scientists, who contribute to a vibrant ecosystem of Python packages that extend NetworkX with features such as numerical linear algebra and drawing.
Data Science Teams
Big data science projects like machine learning and deep learning often require collaboration between many team members. The availability of standardized tools and formats greatly simplifies information sharing. With its roots in Python, one of the most popular data science languages, NetworkX provides a graph analysis extension to Python libraries that requires minimal training for Python users and can be deployed across teams in different companies and continents.
GPUs provide a great way to accelerate data-intensive analytics—and graph analytics in particular—because of the massive degree of parallelism and the memory access bandwidth advantages. A GPU’s massively parallel architecture, consisting of thousands of small cores designed for handling multiple tasks simultaneously, makes it well suited for the computational task of “for every X do Y”, which can apply to sets of vertices or edges within a large graph.
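The “for every X do Y” pattern can be sketched in plain Python with a single PageRank-style update step. This is a CPU illustration only, not GPU code and not how any particular library implements it; the graph, the function name pagerank_step, and the damping value are assumptions chosen for the example. The point is that each vertex's new value depends only on the previous values, so every vertex could be updated independently and in parallel.

```python
# A tiny directed graph as an adjacency list (made-up data).
out_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def pagerank_step(rank, out_links, damping=0.85):
    """Run one update step; assumes every vertex has at least one out-link."""
    n = len(rank)
    # Collect, for each vertex, the vertices that link to it.
    incoming = {v: [] for v in rank}
    for u, targets in out_links.items():
        for v in targets:
            incoming[v].append(u)
    # "For every vertex, do Y": each entry reads only the old ranks,
    # so the per-vertex computations do not depend on one another.
    return {
        v: (1 - damping) / n
        + damping * sum(rank[u] / len(out_links[u]) for u in incoming[v])
        for v in rank
    }

rank = {v: 1 / len(out_links) for v in out_links}
new_rank = pagerank_step(rank, out_links)
print(new_rank)
```

On a GPU, the dictionary comprehension would correspond to many threads each handling one vertex (or one edge) at the same time.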
NVIDIA RAPIDS™ cuGraph delivers an accelerated graph analytics library that integrates the RAPIDS ecosystem with NetworkX. The vision of RAPIDS cuGraph is to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks.
The compute power of the latest NVIDIA GPUs makes graph analytics faster. Moreover, the internal memory speed within a GPU allows cuGraph to rapidly switch the data structure to best suit the needs of the analytic rather than being restricted to a single data structure.
RAPIDS cuGraph’s graph algorithms, such as PageRank, and its NetworkX-like functions make efficient use of the massive parallelism of GPUs to accelerate analysis of large graphs by over 1000X. Users can explore up to 200 million edges on a single NVIDIA A100 Tensor Core GPU and scale to billions of edges on NVIDIA DGX™ A100 clusters.
RAPIDS combines the ability to perform high-speed ETL, graph analytics, machine learning, and deep learning. It’s a suite of open-source software libraries and APIs for executing data science pipelines entirely on GPUs—and can reduce training times from days to minutes. RAPIDS relies on NVIDIA CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high memory bandwidth through user-friendly Python interfaces.
RAPIDS cuGraph integrates into the RAPIDS data science ecosystem so that data scientists can call graph algorithms on data stored in a GPU DataFrame. With the RAPIDS GPU DataFrame, data can be loaded onto GPUs using a pandas-like interface and then used for various connected machine learning and graph analytics algorithms without ever leaving the GPU. This level of interoperability is made possible through libraries like Apache Arrow, and it allows acceleration of end-to-end pipelines, from data prep to machine learning to deep learning. RAPIDS and Dask allow cuGraph to scale to multiple GPUs to support multi-billion-edge graphs.