Kubeflow and Kubernetes – Create a Standardized Machine Learning Environment
Machine learning is a field with enormous potential. That potential is hard to realize because of complex pipelines and the challenges of scaling. Tools such as TensorFlow and Jupyter do make life easier for developers, but machine learning and AI remain areas of computing where the ideas of Continuous Integration and Continuous Deployment have not yet been widely embraced.
Kubeflow aims to change that and streamline machine learning: creating a cohesive pipeline, providing scalable environments, automating more of the application lifecycle, and making development, deployment, and modification that little bit simpler. Kubeflow runs on Kubernetes, in containers, simplifying the management of large and complex machine learning jobs.
What is Kubeflow?
Kubeflow is an open-source project that aims to make deploying machine learning workflows on Kubernetes simpler, more portable, and more scalable. It includes support for TensorFlow Serving containers for serving trained TensorFlow models on Kubernetes, TensorFlow model training, and services for creating Jupyter notebooks. Kubeflow Pipelines allow for efficient deployment and management of end-to-end machine learning workflows, and end-to-end streaming using Kubeflow, Kafka, and Redis can help create complex models at scale, quickly.
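As a sketch of what running training on Kubernetes looks like, Kubeflow installs custom resources such as TFJob that are declared in ordinary manifests. The job name, image, and replica count below are placeholders, and exact fields depend on the Kubeflow version deployed:

```yaml
# Illustrative TFJob manifest (names and image are placeholders).
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-training            # hypothetical job name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2                 # scale out by raising the replica count
      template:
        spec:
          containers:
            - name: tensorflow
              image: example.com/mnist-train:latest  # placeholder image
```

Once applied with `kubectl apply -f`, the training workload is scheduled and supervised like any other Kubernetes resource.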
The project was first released in 2017 and is surrounded by a large, active community; the developers are open and welcoming to newcomers looking to contribute. The team is working to extend support beyond TensorFlow to include PyTorch, Chainer, Apache MXNet, and other frameworks.
Common Issues in Corporate Environments
It’s natural for development teams in corporate environments to be risk-averse when it comes to trying new tools. However, around 50% of Kubeflow’s users work in enterprise environments, and many of them run on-premise Kubernetes clusters. The development team has responded to the wishes of those users and added support for enterprise-grade authentication, making Kubeflow a tool that enterprise teams can deploy with confidence.
Getting enterprise users to move over to Kubeflow for their AI and machine learning is an easy sale, too. Consider the challenges that are often faced in an enterprise environment:
- Managing library versions across multiple, large teams
- Keeping everyone on the same stack as the project evolves
- Keeping track of versioning data and model iterations
- Safely managing datasets that may be several terabytes (or bigger) in size
- Managing the lifecycle of the application
Those issues are challenging enough in an academic setting with a small team of developers and researchers, but as the team grows there are more points of failure and more opportunities for error.
Kubeflow’s composable format makes it easy to define a stack and ensure that everyone across the whole company is using it. Should the stack need to change, Kubeflow can update it for everyone using the model, avoiding ‘broken models’ or unusual results because someone on the team forgot to update their stack.
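The idea of pinning a single stack can be sketched in plain Python: a shared manifest of required versions is checked against what each developer actually has installed. The manifest contents here are hypothetical, and Kubeflow itself enforces consistency through shared container images rather than a script like this; the sketch only shows the problem being solved:

```python
# Illustrative sketch: verify an environment against a shared stack manifest.
# Kubeflow addresses this with shared container images; this shows the idea.

REQUIRED_STACK = {            # hypothetical team-wide pinned versions
    "tensorflow": "2.4.1",
    "numpy": "1.19.5",
}

def find_mismatches(installed: dict) -> dict:
    """Return packages whose installed version differs from the pinned one."""
    return {
        pkg: (want, installed.get(pkg))
        for pkg, want in REQUIRED_STACK.items()
        if installed.get(pkg) != want
    }

# A developer who forgot to update numpy would be flagged:
drift = find_mismatches({"tensorflow": "2.4.1", "numpy": "1.18.0"})
```

With containers, the check disappears entirely: everyone runs the same image, so version drift cannot happen in the first place.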
The hyperparameter feature allows data scientists to keep track of their model’s iterations so that they can compare models and variables, and see at-a-glance which settings gave the best results. Automation tools reduce the risk of errors and issues as an application moves through its lifecycle, speeding up testing, training, and development, and freeing up researcher time.
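At-a-glance comparison of iterations amounts to recording each run's settings and score, then sorting by the metric. A minimal sketch of that idea, in plain Python rather than Kubeflow's own API (the parameter names and scores are invented):

```python
# Sketch of tracking hyperparameter runs and picking the best one.
runs = []  # each entry: (hyperparameters, validation accuracy)

def record_run(params: dict, accuracy: float) -> None:
    runs.append((params, accuracy))

def best_run():
    """Return the (params, accuracy) pair with the highest accuracy."""
    return max(runs, key=lambda r: r[1])

record_run({"lr": 0.1,  "batch_size": 64}, 0.891)
record_run({"lr": 0.01, "batch_size": 64}, 0.934)
record_run({"lr": 0.01, "batch_size": 32}, 0.920)

params, acc = best_run()   # the lr=0.01, batch_size=64 run wins
```

In Kubeflow the bookkeeping is done for the team automatically, so every iteration is comparable across the whole group rather than living in one researcher's notebook.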
Why Choose Kubernetes and Kubeflow?
Kubernetes is a powerful tool that can automate container provisioning, networking, scalability, security, and load balancing for machine learning and indeed for many other applications. It makes managing clusters of machines much easier.
Easier, however, does not necessarily mean “simple”. The principles of Kubernetes are easy enough to understand and to get a feel for by running a small test setup in the lab. However, once you start to scale up and run Kubernetes setups in a production environment, requiring significant scalability, elasticity, and security, it becomes more challenging to manage. Data scientists don’t want to spend their time thinking about security, resource management, config files and logging. They want to focus on training data, test data and models.
Using Kubeflow helps to avoid common “gotchas” that occur when a relative novice is in charge of scalable architecture, such as:
- Issues with day-two operations
- Failure to account for the complexities of running Kubernetes at scale
- Attempting to turn a monolithic application into something that will run on K8s
- Failure to account for update lifecycles
- Incorrect networking setups causing communication issues between master and worker nodes
- Poorly deployed updates/configs preventing admins from managing containers
- Underestimating the learning curve of security and networking
- A lack of policies about external load balancers potentially exposing sensitive data to the Internet
The above are just some of the issues that can occur with poorly managed ‘raw’ Kubernetes. Kubeflow reduces the administration overhead and allows data scientists to get straight to making Jupyter notebooks in the cloud. It does not promise to fix every potential issue with deploying new or untested technologies in an enterprise, but it can reduce the risks significantly.
Tests can be run on small datasets, and the model can be scaled up to use more machines should the need arise. Kubeflow takes the model and optimizes it, breaking it down into smaller tasks so that each task can be processed in parallel. The person managing the data doesn’t need to care whether the model is running on a couple of servers or thousands. The API abstracts that away, making the decisions about opening new instances when more jobs are added, and providing a simple interface to query results when jobs are completed.
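The decomposition just described — break the job into independent tasks, run them in parallel, and collect the results behind one interface — can be sketched with Python's standard library. The worker function and data are placeholders for real training or scoring work, and a thread pool stands in for the cluster that Kubeflow would actually schedule containers onto:

```python
# Illustrative sketch of splitting one job into parallel tasks.
# In a real Kubeflow deployment the scheduler spreads containers across
# nodes; here a thread pool stands in for the cluster.
from concurrent.futures import ThreadPoolExecutor

def score_shard(shard):
    """Placeholder task: 'score' one shard of the dataset."""
    return sum(x * x for x in shard)

def run_job(data, n_tasks=4):
    # Break the dataset into roughly equal shards, one per task.
    shards = [data[i::n_tasks] for i in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        partials = list(pool.map(score_shard, shards))
    # The caller sees a single combined result, not the individual workers.
    return sum(partials)

result = run_job(list(range(100)))
```

Whether `n_tasks` is four threads or four thousand containers, the caller's view is the same: submit the job, query the combined result.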
All a team needs to get started with Kubeflow on Kubernetes is reliable hosting and high-speed data storage to allow containers to run smoothly. With reliable hosting and sound deployment of Kubeflow, administrators and machine learning/artificial intelligence data scientists can feel more confident about the infrastructure that they are modeling on, and the safety and security of the data that they are dealing with.
If you would like help getting started with Kubernetes or Kubeflow, contact us to learn how to get started!