Data scientists
One of the challenges of data science is to create repeatable experiments in reproducible environments with the ability to track and monitor metrics in production. Containers offer the ability to create repeatable pipelines with multiple coordinated stages that work together in a reproducible way for processing, feature extraction, and testing.
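As a rough illustration of what "multiple coordinated stages" means, the sketch below models a three-stage pipeline (processing, feature extraction, testing) as plain Python functions. In a real deployment each stage would typically run in its own container. The `Stage` type, the stage functions, and the sample data are all hypothetical names chosen for this example, not part of Kubernetes or any particular library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Stage:
    """Hypothetical stand-in for one containerized pipeline step."""
    name: str
    run: Callable[[list], list]


def process(rows: list) -> list:
    """Processing: drop missing values."""
    return [r for r in rows if r is not None]


def extract_features(rows: list) -> list:
    """Feature extraction: pair each value with its square."""
    return [(r, r * r) for r in rows]


def check_features(features: list) -> list:
    """Testing: fail fast if a feature breaks an expected invariant."""
    for value, squared in features:
        if squared != value * value:
            raise ValueError(f"bad feature for input {value!r}")
    return features


def run_pipeline(stages: List[Stage], data: list) -> Tuple[list, list]:
    """Run the stages in order and record how many items each one emitted."""
    log = []
    for stage in stages:
        data = stage.run(data)
        log.append((stage.name, len(data)))
    return data, log


# The same stage list and the same input always yield the same output,
# which is the property that makes an experiment repeatable.
PIPELINE = [
    Stage("process", process),
    Stage("extract_features", extract_features),
    Stage("test", check_features),
]

if __name__ == "__main__":
    result, log = run_pipeline(PIPELINE, [3, None, 4])
    print(result)
    print(log)
```

Because every stage is a pure function of its input, rerunning the pipeline on the same data reproduces the same result and the same per-stage log, which is the behavior that pinned container images aim to provide at the environment level.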
Declarative configurations in Kubernetes describe connections between services. Microservice architectures enable easier debugging and improve collaboration between members of a data science team. Data scientists can also take advantage of extensions like BinderHub, which lets them build and register container images from a repository and publish them as a shared notebook that other users can interact with.
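To make the idea of declarative configuration concrete, here is a minimal sketch that builds a Kubernetes Deployment and a Service as Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The connection between the two objects is expressed purely as data: the Service's label selector matches the labels on the Deployment's pods. The application name, image reference, and port numbers are assumptions made up for this example.

```python
import json

# Hypothetical names and values used only for illustration.
APP_LABELS = {"app": "feature-extractor"}
IMAGE = "example.com/feature-extractor:1.0"


def deployment(name: str, labels: dict, image: str, replicas: int = 2) -> dict:
    """Describe the desired state: N replicas of one container image."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 8080}],
                        }
                    ]
                },
            },
        },
    }


def service(name: str, labels: dict) -> dict:
    """Describe a stable network endpoint for pods carrying `labels`."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": dict(labels),
            "ports": [{"port": 80, "targetPort": 8080}],
        },
    }


def service_targets(svc: dict, dep: dict) -> bool:
    """True if every selector label on the Service is present on the pods."""
    pod_labels = dep["spec"]["template"]["metadata"]["labels"]
    return all(pod_labels.get(k) == v for k, v in svc["spec"]["selector"].items())


if __name__ == "__main__":
    dep = deployment("feature-extractor", APP_LABELS, IMAGE)
    svc = service("feature-extractor", APP_LABELS)
    assert service_targets(svc, dep)
    print(json.dumps([dep, svc], indent=2))
```

Nothing in the sketch says how to start containers or wire up networking. It only states the desired end state, and the orchestrator is responsible for reaching and maintaining it.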
Other extensions like Kubeflow streamline the process of setting up and maintaining machine learning workflows and pipelines in Kubernetes. Because the orchestrator is portable across environments, a data scientist can develop on a laptop and deploy the same containers anywhere Kubernetes runs.
DevOps
Putting machine learning models into production can be a struggle for data engineers. They spend time editing configuration files, allocating server resources, and worrying about how to scale models and incorporate GPUs without causing the project to crash. The container ecosystem has introduced many tools that are intended to make the data engineer’s life easier.
For example, Istio is a configurable, open-source service-mesh layer that makes it easy to create a network of deployed services with automated load balancing, service-to-service authentication, and monitoring with little or no change to service code. It provides fine-grained control of traffic behavior, rich routing rules, retries, failovers, and fault injection, along with a pluggable policy layer and configuration API for access controls, rate limits, and quotas.
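The routing rules, retries, and fault injection described above are themselves configured declaratively. The sketch below builds an Istio `VirtualService` resource as a Python dictionary that splits traffic between two versions of a model service, retries failed requests, and injects a delay into a small fraction of requests for testing. The service names (`model`, `model-v1`, `model-v2`), the percentages, and the helper function are assumptions for illustration only, and the exact `apiVersion` to use depends on the Istio release installed in the cluster.

```python
import json


def virtual_service(host: str, stable: str, canary: str, canary_percent: int) -> dict:
    """Build a VirtualService that sends `canary_percent` of traffic to `canary`."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": host},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    # Weighted routing: the weights across destinations sum to 100.
                    "route": [
                        {"destination": {"host": stable}, "weight": 100 - canary_percent},
                        {"destination": {"host": canary}, "weight": canary_percent},
                    ],
                    # Retry failed requests up to three times, two seconds per try.
                    "retries": {
                        "attempts": 3,
                        "perTryTimeout": "2s",
                        "retryOn": "5xx",
                    },
                    # Fault injection: delay 1% of requests by five seconds.
                    "fault": {
                        "delay": {
                            "percentage": {"value": 1.0},
                            "fixedDelay": "5s",
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    vs = virtual_service("model", "model-v1", "model-v2", canary_percent=10)
    print(json.dumps(vs, indent=2))
```

Shifting more traffic to the new model version is then a one-number change to the configuration rather than a change to service code, which is the sense in which the mesh works "with little or no change to service code."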
The Kubernetes ecosystem continues to evolve with specialized tools of this kind. They aim to make server configuration invisible and to let data engineers visualize dependencies between services, which makes both configuration and troubleshooting easier.