Getting Started

Welcome to NVIDIA Brev

NVIDIA Brev is an AI/ML development platform that allows you to run, build, train, and deploy ML models on the cloud. Brev allows you to start small on a CPU instance and effortlessly scale to larger GPU clusters for any workload.

Why should I use NVIDIA Brev?

Built for devs: Brev is built for developers by developers. We understand the challenges of building and scaling ML projects, and built Brev to be a lightweight but powerful platform for any type of workload.

Cost-effective: Brev aggregates multiple cloud providers to find the right type of GPU for the best price possible.

Powerful: Brev allows you to run any code on your GPU instance. You can use Brev to train and deploy ML models on the cloud.

Introduction

Quick Start

Let's create our first Brev instance

  1. Create an account

Make an account in the Brev console.

  2. Create your first instance

You can create your own instance from scratch or use a Launchable. The Discover page on the console provides a couple of starting Launchables if you're not sure where to get started or need inspiration!

  3. Connect to your instance and start coding

Brev wraps SSH to make it easy to hop into your instance. With the CLI installed, run one of the following commands in your terminal.

To SSH directly into your dev instance:

brev shell stable-diffusion

or use VS Code to access your instance with:

brev open stable-diffusion

That's it!

Try creating your own instance here and reach out to us in the Discord for help! We're here for anything you need. Build something great.

Running instance might still be installing your dependencies

If the instance is running, that means the machine itself has been started. However, the instance may still be installing your dependencies and it might take a few minutes for the Jupyter Notebook button to become available after the instance has started.

Reference

Brev CLI Reference

The Brev CLI wraps SSH to quickly get you in your instance, while also letting you do almost everything you can do from the console from the command line.

Every command has a

--help

flag if you need to see options.

Installation Instructions

Install the Brev CLI to easily jump in and out of your instances, scale hardware, and manage your org. With Homebrew:

brew install brevdev/homebrew-brev/brev && brev login

Or, on Linux, with the install script:

sudo bash -c "$(curl -fsSL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-latest.sh)" && brev login

If you had trouble installing, let us know in our Discord!

Instance Commands

refresh

Syntax

brev refresh

Description

This command syncs instances that you've created through the UI, allowing you to access them through the CLI.

Example

$ brev refresh
refreshing brev...
brev has been refreshed

list

Syntax

# print orgs
brev list orgs

# print workspaces within active organization
brev list

# print workspaces within any organization
brev list --org org_name

Description

The list command lets you see all the workspaces you have created, or all workspaces within your organization. You can also use brev ls as an alias.

Example

$ brev list

NAME                        STATUS   ID         MACHINE
more-gpus                   STOPPED  qn643e4lo  a2-highgpu-1g:nvidia-tesla-a100:1 (gpu)
brev-deployment-box         STOPPED  3xq8vmxn6  n1-standard-16 (gpu)
pytorch-container-finetune  RUNNING  m4tvipac7  n1-highmem-2:nvidia-tesla-t4:1 (gpu)

start

Syntax

brev start {-n | --name}

Description

The start command lets you start a stopped instance.

Example

$ brev start stable-diffusion-ui

Workspace stable-diffusion-ui is starting.

Note: this can take about a minute. Run 'brev ls' to check status

stop

Syntax

brev stop workspace_name

Description

If you don't plan on using your Brev workspace, you can temporarily pause it by running

brev stop workspace_name

Everything in /home/verb-workspace will be saved when it boots up again.

You can stop multiple workspaces by listing each workspace name:

$ brev stop brev-deploy naive-pubsub bar euler54 merge-json

Workspace brev-deploy is stopping.

Note: this can take a few seconds. Run 'brev ls' to check status

Workspace naive-pubsub is stopping.

Note: this can take a few seconds. Run 'brev ls' to check status

Workspace bar is stopping.

Note: this can take a few seconds. Run 'brev ls' to check status

Workspace euler54 is stopping.

Note: this can take a few seconds. Run 'brev ls' to check status

Workspace merge-json is stopping.

Note: this can take a few seconds. Run 'brev ls' to check status

port-forward

Syntax

brev port-forward WS_NAME [--port LOCAL_PORT:REMOTE_PORT]

Description

The port-forward command forwards a port from a Brev workspace to a port on your local machine. For example, if you're running a Jupyter notebook on port 8888 on your VM, you could run brev port-forward WS_NAME --port 8888:8888 to access it at localhost:8888.

Example

$ brev port-forward brev-docs --port 3000:3000

portforwarding...

localhost:3000 -> brev-docs-xp43:3000

Interactively port forward a workspace:

To interactively select which port to forward from a Brev workspace to your localhost, run brev port-forward with no --port flag:

$ brev port-forward brev-docs

Ports flag was omitted, running interactive mode!

What port on your Brev machine would you like to forward? 3333
What port should it be on your local machine? 3000

-p 3000:3333
2022/07/14 11:31:30 creating new ssh config
portforwarding...
localhost:3000 -> brev-docs-xp43:3333
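If the local port you want is already taken, one option is to pick a free ephemeral port first and forward to that. A minimal sketch, assuming python3 is installed on your local machine (the instance name in the comment is hypothetical):

```shell
# Ask the OS for a free local port by binding to port 0, then release it.
LOCAL_PORT=$(python3 -c 'import socket; s = socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo "will forward localhost:${LOCAL_PORT}"
# brev port-forward my-instance --port ${LOCAL_PORT}:8888   # e.g. for a Jupyter server on 8888
```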

delete

Syntax

brev delete <workspace name or ID> ...

Description

The delete command lets you delete a workspace from your account.

Example

$ brev delete payments-frontend

Deleting workspace payments-frontend.

This can take a few minutes. Run 'brev ls' to check status

Organization Commands

set

Syntax

brev set <org name> [--token] [--skip-browser]

Description

The set command lets you set the organization context for your commands.

Example

$ brev set <org name>

login

Syntax

brev login [--token] [--skip-browser]

Description

This command logs you in to your Brev account and performs the first-time setup Brev needs for your user account, such as setting up config files. Specifically, it:

  • creates ~/.brev/ directory if it does not exist
  • if you don't have an account on brev, the browser step will create one for you
  • on first run asks you onboarding questions
  • on first run asks you to configure ssh keys
  • creates your first org if one does not exist

Example

$ brev login

logout

Syntax

brev logout

Description

Removes your keys and logs you out.

Example

$ brev logout

ssh-key

Prints your SSH public key so you can add it to your Git provider.

brev ssh-key

Quick links to add it to GitHub or GitLab.


Using Brev With Windows Subsystem for Linux (WSL)

Brev is supported on Windows through the Windows Subsystem for Linux (WSL). This guide walks you through the steps to get Brev up and running on your Windows machine.

Prerequisites

  • WSL installed and configured
  • Virtualization enabled in your BIOS
  • Ubuntu 20.04 installed from the Microsoft Store

Once you have WSL installed and configured, you can install Brev by running the following command in your terminal:

sudo bash -c "$(curl -fsSL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-latest.sh)"

Next Steps

Log in to your Brev account:

brev login

Once you're logged in, you can start creating and managing your instances. Check out our Getting Started guide for next steps.

If you had trouble installing let us know in our Discord!


Reference

Brev Console Reference

The Brev Console is a web-based interface for creating, managing, and accessing your instances. It's a great way to get started with Brev and launch your first AI project.

 

Quick Start

To get started, you'll need to create an account on Brev. You can do this by clicking the "Create an account" button in the top right corner of the console.

Once you've created an account, you can start creating your first instance by clicking the "Create a new instance" button in the top right corner of the console.

 

Navigation

The Brev Console is divided into several sections, including:

  • Instances: This section allows you to create and manage your instances.
  • Discover: This section allows you to discover and deploy our premade Launchables for various AI use cases including Image Generation, Text Generation, and more.
  • People: This section allows you to manage team members in your organization and invite new members.
  • Billing: This section allows you to view and manage your billing information.
  • Docs: This section brings you right back here :)

 

Instances

The Instances section allows you to create, start, stop, and delete your instances. You can also view your instance's status and logs.

 

Create a new instance

To create a new instance, click the "+ New" button in the top right corner of the console. This will bring up our Instance Creation page where you can select your hardware (GPU or CPU), select a base image (optional), configure storage, and deploy the instance.

 

Instance Deployment

Once you've deployed your instance, you'll be taken to the Instance Details page. Here you can view your instance's status and logs and access any Jupyter Lab connected to your machine. The Access section shows you how to access your instance from your local machine.

 

Exposing ports

In the Access section of the Instance Details page, you can expose ports on your instance. This is helpful if you want to expose an API to your model for inference or simply want to access your instance from a different machine. We use Cloudflare tunnels to expose ports. If you expose a port but cannot access it, reach out to us!

 

Notebooks

The Notebooks section allows you to deploy our premade notebooks for various AI use cases. We have notebooks for most open-source models including Llama3, Mixtral 8x7b, Stable Diffusion, and more.

 

Deploy a Notebook

To deploy a notebook, click the "Deploy Now 🚀" button in the top right corner of the console. This will bring up a stepper where you can interactively view the hardware, deployment logs, and eventually access your notebook. This notebook can also be viewed through the Instance Details page.

 

People

The People section allows you to manage team members in your organization and invite new members.

 

Invite a new member

There are two ways to invite a new member to your organization:

  1. Generate an invite link
  2. Invite by username. Here you can choose read only or read and write access.

 

Billing

The Billing section allows you to view and manage your billing information. Billing is organized by organization. Here you can add a credit card to your account and view your billing history. We charge using credits and you can refill your account balance at any time.

Reference

Brev Debugging

This page has common hurdles or known issues that we're addressing.

Reset Brev

If some of your workspaces appear to be missing, use the brev refresh command to force a refresh of the SSH config and ensure the daemon is started:

brev refresh

500 error when running brev start

If you run brev start and see the following 500 error:

➜ ~ brev start https://github.com/brevdev/hello-react

Name flag omitted, using auto generated name: brevdev/hello-react

Workspace is starting. This can take up to 2 minutes the first time.

[error] /home/runner/work/brev-cli/brev-cli/pkg/cmd/start/start.go:260

: https://ade5dtvtaa.execute-api.us-east-1.amazonaws.com/api/organizations/ejmrvoj8m/workspaces?utm_source=cli 500 Internal Server Error

It is likely that you just deleted the workspace and it is still deleting. Please wait 5 seconds and try again.

Workspace version issue

If you've used Brev with an older version of the CLI, it's likely that your workspaces need to be upgraded. You might see an error like this:

workspace of version v1.6.8 is not supported with this cli version
upgrade your workspace or downgrade your cli

There isn't a command to upgrade the workspaces yet, so in the interim, please delete the workspace and create it again.

brev delete brev-cli

# recreate the workspace from its repo
brev start https://github.com/brevdev/brev-cli

Or, if someone else in your org has the same workspace, you can brev start it by name to create your own copy. Run brev ls --all to see workspaces already in your org.

# if someone else in your org has the workspace

brev start brev-cli

Global npm install issues

If you try npm installing something globally, it might not work without sudo.

Rerun the command with sudo, e.g. sudo npm install http-server -g. Reach out to support if you're still having issues: https://discord.gg/NVDyv7TUgJ
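If you'd rather avoid sudo entirely, a common alternative is to give npm a per-user install prefix. A sketch, assuming a standard npm setup (~/.npm-global is an arbitrary directory choice):

```shell
# Give npm a writable per-user prefix so global installs don't need root.
mkdir -p "$HOME/.npm-global"
export PATH="$HOME/.npm-global/bin:$PATH"
npm config set prefix "$HOME/.npm-global" 2>/dev/null || true   # no-op if npm is absent
# npm install -g http-server   # now installs under ~/.npm-global, no sudo needed
```

Add the export line to your ~/.bashrc to make the PATH change permanent.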

See all the startup logs

Sometimes weird issues happen when configuring the machine, for example, the project folder is empty because ssh keys weren't configured and the repo couldn't be cloned.

You can view the full startup logs by running sudo cat /var/log/brev-workspace.log inside your workspace.
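When the log is long, filtering for error lines is a quick way to surface what went wrong. Demonstrated here on a sample file; inside your workspace you would run the commented command against /var/log/brev-workspace.log:

```shell
# Create a sample log to demonstrate the filter.
printf 'cloning repo...\nERROR: ssh key not configured\nsetup done\n' > sample.log
grep -i 'error' sample.log
# sudo grep -i error /var/log/brev-workspace.log   # the real command inside a workspace
```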

Run Brev without internet

Do you want to run Brev.dev locally without needing internet? We're launching the V2 in late June and would love for you to try it! Hop in the discord and let us know you're waiting for it! https://discord.gg/NVDyv7TUgJ

Reference

Using SCP to load data on your Instance

What is SCP?

SCP, or Secure Copy Protocol, is a network protocol that enables secure file transfers between a local host and a remote host. Whether you want to copy a script or datasets, SCP can be used to transfer these files to your remote development machine.

 

Using SCP With Brev

Prerequisites

In order to use scp, you'll need a running instance. For this tutorial, I'll use two running instances named hf-lora-diffusion-be619e and awesome-gpu-name-testing.

Install the Brev CLI and log in. This process will automatically set up your SSH configuration for Brev, which is necessary for SCP to function properly with your Brev account.

brew install brevdev/homebrew-brev/brev

brev login

You can verify that it’s set up correctly by running:

brev ls

You should see your running instances like so:

Once this is done, you're ready to use SCP!

Running the command

The general format for the scp command on your local machine is as follows:

scp <local-file-path> <brev-instance-name>:<remote-file-path>

Example:

scp ./test.json awesome-gpu-name:/home/ubuntu

In this example, we copied a file called test.json from our current local directory to the home directory of the host machine.

You can verify that your file exists on the remote machine by SSHing into the host machine with one of the following:

If your container is not built

brev shell <brev-instance-name>

If your container is built

brev shell <brev-instance-name> --host

If you want to transfer files into the Container/Jupyter Notebook on your machine, change <remote-file-path> from /home/ubuntu to /home/ubuntu/verb-workspace.
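For larger transfers, it can help to record a checksum before uploading so you can verify the copy on the instance afterwards. A sketch using the example instance name from this tutorial (the scp lines are commented because they require a running Brev instance):

```shell
# Create a sample file and note its checksum before uploading.
echo '{"split": "train"}' > test.json
sha256sum test.json          # compare this hash on the instance after copying
# scp test.json awesome-gpu-name:/home/ubuntu/
# scp -r ./datasets awesome-gpu-name:/home/ubuntu/datasets   # -r copies a whole directory
# then, inside `brev shell awesome-gpu-name`, run: sha256sum /home/ubuntu/test.json
```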

Container SCP Caveats

For almost all cases, /home/ubuntu/verb-workspace is the remote file path where the container mounts. However, this is not the case if the VM uses a custom container, which some of our featured notebooks do.

You’ll be able to check if your instance is using a custom container from your instances list:

Instead of denoting the Python and CUDA versions, it shows the name of an image from a Docker registry.

If you want to copy files to the custom container, change <remote-file-path> from /home/ubuntu/verb-workspace to /root/verb-workspace like so:

scp ./test.json hf-lora-diffusion-be619e:/root/verb-workspace

That’s it! You’re now all set to use SCP with Brev! Happy Building!

Using Brev for AI & ML

How to pull your own container?

Let's add a custom container when spinning up a new Brev instance!

 

  1. Create a new instance with + New button on the console
    This kickstarts the GPU provision.
  2. Click on Container Mode
    This mode allows you to specify a container to be pulled onto the instance.

    If you don't care about setting up a container on your GPU, use VM mode instead!
  3. Choose between Recommended & Custom Containers
    Recommended: These are curated containers that developers commonly use on Brev! Use our default container Verb to auto-setup Python and CUDA drivers for your AI/ML workload.

    Custom Containers: Here you can specify a container to be pulled into your instance. If you are pulling from a private registry, specify an entrypoint command + credentials. Some custom containers may produce unexpected results if they interfere with the host system. Use this feature with caution!
  4. Choose your GPU type now!
    Now that you've specified what container will get pulled into the instance, select the GPU type you want to use. The container will build on that GPU after you click Deploy.

Creating Launchables on Brev

Quick Start

Let's create our first Brev Launchable!

Launchables are a way to easily share software/code with others. By pre-configuring an environment, you can create a reproducible template that anyone can use to launch your work. If you have an existing Github repo or Colab Notebook, you can easily add it to your Launchable and share it with others.

Brev provides metrics around your Launchable's usage, so you can see how many people are viewing and using it!

  1. Create an account
    Make an account in the Brev console.
  2. Create your first Launchable
    Head over to the Launchables tab on the Brev console. Click Create Launchable and begin configuring your first Launchable!

    You can customize five different aspects of your Launchable:
    1. Compute: specify the GPU(s) needed to power the Launchable task
    2. Container: specify a container to get pulled onto the GPU (or default to Verb to auto-setup Python & CUDA).
      1. You can configure your own custom container, but it must be public!
    3. Files: add any public Colab Notebook or GitHub Repo/Notebook to be pre-loaded
    4. Expose Ports: specify a port for a service like ComfyUI, vLLM, etc.
    5. Launchable Name: add a fun name to your Launchable!

    Once you're done, click Generate Launchable to finish creating.

    For inspiration from other Launchable examples

    Check out the Discover page on the console!

  3. Share your Launchable

    You can copy the link to share with others (including in X/Twitter posts, blogs, Discord, etc.) and anyone can use it to launch your Launchable.

    By clicking View Launchable, you can see what your Launchable looks like to others.

    If you want to see metrics of how your Launchables are being used, head back to the Launchables tab and click View All Metrics.

     

    That's it!

    Try creating your own Launchable and reach out to us in the Brev Discord if you need help! We're here for anything you need. Share something amazing with the world. 

     

    Coming Soon!

    • Copy markdown code to add a badge to your README or blog post
    • Ability to edit a Launchable after creation

Creating Launchables on Brev

Launchable Metrics

Leveraging Metrics from Launchables

 

  1. Create a Launchable and share it with others
    In order to get/use metrics from a Launchable, you need to share it with others and get activity on it. Text, post, and share your Launchable link!
  2. View your Launchables and metrics
    Once people view, deploy, and interact with your Launchable, you can see metrics of how it's being used! If you want to see more metrics around usage, reach out to us at partners@brev.dev and we'll be happy to help make sure you can get them.

NVIDIA on Brev

Deploying a NVIDIA NIM Inference Microservice on Brev

Launch a NIM on Brev!

 

First off, a short background on NVIDIA NIMs

At their core, NIMs are an easy way to deploy AI on GPUs. Built on the NVIDIA software platform, they are containers that incorporate CUDA, TensorRT, TensorRT-LLM, and Triton Inference Server. NIMs are designed to be highly performant and can be used to accelerate a wide range of applications.

A NIM is a container that provides an interactive API for running blazing fast inference. Deploying a large language model NIM requires two key things: the NIM container (which holds the API, server, and runtime layers) and the model engine.

Let's get started with deploying it on Brev!

  1. Create an account
    Make an account on the Brev console.
  2. Launch an instance
    There are two ways to deploy a NIM: via a 1-click Launchable, or directly on a VM.

    1-click this Launchable and run through the notebook that gets launched. The notebook will walk you through creating a LoRA adapter with NVIDIA's NeMo framework and deploying it via a NIM!

    You can also set up a NIM yourself on a VM. To begin, head over to the Instances tab in the Brev console and click on the blue New + button.

    When creating your instance, select None (VM Mode) in the Select your Container section.

    To deploy a NIM, we recommend using an A100 80GB GPU! A NIM has significant VRAM requirements. You can see the different GPUs compatible with running model NIMs here.

    Select a GPU type from the Sandbox tab, or feel free to head over to Advanced to see all of the instance types available on Brev.dev.

    Now, enter a name for your instance and click on the Deploy button. It'll take a few minutes for the instance to deploy - once it says Running, you can access your instance with the Brev CLI.
  3. Connect to and setup your instance
    Brev wraps SSH to make it easy to hop into your instance, so after installing the Brev CLI, run the following command in your terminal.

    SSH into your VM and use default Docker:

 

brev shell <instance-name>

Verify that the VM setup is correct:

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

You'll need to get an NGC API Key to use NIMs.

Let's create an environment variable for it:

export NGC_CLI_API_KEY=<value>

Run one of the following commands to make the key available at startup:

# If using bash
echo "export NGC_CLI_API_KEY=<value>" >> ~/.bashrc

# If using zsh
echo "export NGC_CLI_API_KEY=<value>" >> ~/.zshrc

Docker Login to NGC (to pull the NIM)

echo "$NGC_CLI_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

Set up the NGC CLI. This documentation uses the ngc CLI tool in a number of examples; see the NGC CLI documentation and follow the AMD64 instructions for downloading and configuring the tool.

  1. Time to deploy your first NIM!
    List available NIMs

 

ngc registry image list --format_type csv nvcr.io/nim/meta/*

The following command launches a Docker container for the llama3-8b-instruct model.

# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama3-8B-Instruct

# The repository and tag from the previous ngc registry image list command
Repository=nim/meta/llama3-8b-instruct
Latest_Tag=1.0

# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:${Latest_Tag}"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# Start the LLM NIM
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY=$NGC_CLI_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME

Note: if you face permission issues, re-try using sudo.
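A NIM can take a while to download and load the model on first start. One way to wait for it is to poll the health endpoint from a second terminal; a sketch, assuming port 8000 as in the command above:

```shell
# Poll the NIM health endpoint until the server reports ready.
wait_for_nim() {
  local url="${1:-http://localhost:8000/v1/health/ready}"
  until curl -sf "$url" > /dev/null; do
    echo "waiting for NIM at $url ..."
    sleep 5
  done
  echo "NIM is ready"
}
# wait_for_nim   # run in a second terminal while the container starts
```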

Let's run the NIM!

NIMs are set to run on port 8000 by default (as specified in the above Docker command). In order to expose this port and provide public access, go to your Brev.dev console. In the Access tab in your instance details page, scroll down to Using Tunnels to expose Port 8000 in Deployments.

Click on the URL to copy the link to your clipboard - this URL is your <brev-tunnel-link>.

Run the following command to prompt Llama3-8b-instruct to generate a response to "Once upon a time":

curl -X 'POST' \
    '<brev-tunnel-link>/v1/completions' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta-llama3-8b-instruct",
        "prompt": "Once upon a time",
        "max_tokens": 225
    }'

You can replace /v1/completions with /v1/chat/completions, /v1/models, /v1/health/ready, or /v1/metrics!
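Since the NIM serves an OpenAI-compatible API, you can also call it from code. Here's a minimal sketch that just assembles the request (the base URL below is a placeholder; in practice you'd use your tunnel link, and the model name and fields mirror the curl example above):

```python
import json

def completion_request(base_url, prompt, model="meta-llama3-8b-instruct", max_tokens=225):
    """Assemble the URL, headers, and JSON body for a NIM /v1/completions call."""
    url = f"{base_url}/v1/completions"
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})
    return url, headers, body

url, headers, body = completion_request("http://localhost:8000", "Once upon a time")
print(url)  # http://localhost:8000/v1/completions
```

You can then hand url, headers, and body to any HTTP client (for example requests.post) to send the same request the curl command does.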

 

You just deployed your first NIM!

Working with NIMs gives you a quick way to get a production-grade, OpenAI-compatible API for your GenAI/LLM apps.

NVIDIA on Brev

NVIDIA x Brev NIMs Hackathon

Let's launch NVIDIA's Llama3 NIM on Brev. This is still in early access!

 

This documentation was written for the Llama3 NIM Hackameetup hosted by Brev.dev & NVIDIA in May 2024. For more up-to-date info, please check out this link.

  1. Create an account
    Make an account on the Brev console.

  2. Redeem your credits
    We'll have the redemption code to get compute credits in your account at the hackathon! Reach out to a Brev team member if you need help finding this. You'll need to redeem your credits here before you can progress.

  3. Launch an instance
    We've set up two ways to deploy a NIM instance: an easy way and a more advanced way.

    The easy way is to click this Launchable and run through the notebook that gets launched. The notebook will walk you through the process of fine-tuning Llama3 with DPO and demonstrate how the NIM deploys it for you!

    The advanced way takes a few more steps, but gives you more clarity on how to set up your own NIMs instance. To begin, head over to the Instances tab in the Brev console and click on the green Create Instance for NIMS Hackathon button.

 

To deploy the Llama3 NIM, we recommend using either an A100 or L40S GPU during the hackathon!

Once you've selected your GPU, you'll need to configure the instance container settings. Click on Advanced Container Settings and then click the slider to enable VM-only mode.

Now, enter a name for your instance and click on the Deploy button. It'll take a few minutes for the instance to deploy - once it says Running, you can access your instance with the Brev CLI.

  4. Connect to your instance
    Brev wraps SSH to make it easy to hop into your instance, so after installing the Brev CLI, run the following command in your terminal.

    To SSH into your VM and use default Docker:

brev shell <instance-name> --host

  5. Time to deploy your first NIM!
    We've already authenticated your instance with NVIDIA's Container Registry.

    First, let's choose a container name for bookkeeping

export CONTAINER_NAME=meta-llama3-8b-instruct

Grab the Llama3-8b-instruct NIM Image from NGC

export IMG_NAME="nvcr.io/mphexwv2ysej/${CONTAINER_NAME}:24.05.rc7"

Choose a system path to cache downloaded models

export NGC_HOME=${NGC_HOME:-~/nim-cache}

mkdir -p $NGC_HOME && chmod 777 $NGC_HOME

Run our tunnel setup script

sh ~/.tunnel-setup.sh

Start the LLM NIM

docker run -ti --rm --name=meta-llama3-8b-instruct \
    --gpus all \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e NIM_MODEL_NAME=nvcr.io/mphexwv2ysej/meta-llama3-8b-instruct \
    -e NIM_MODEL_PROFILE=15fc17f889b55aedaccb1869bfeadd6cb540ab26a36c4a7056d0f7a983bb114f \
    -v $NGC_HOME:/home/nvs/.cache \
    -p 8000:8000 \
    nvcr.io/mphexwv2ysej/meta-llama3-8b-instruct:24.05.rc7

Note: if you face permission issues, re-try using sudo.

Let's run the NIM!

The NIM is set to run on port 8000 by default (as specified in the above Docker command). In order to expose this port and provide public access, go to your Brev.dev console. In the Access tab in your instance details page, scroll down to Using Tunnels to expose Port 8000 in Deployments.

Click on the URL to copy the link to your clipboard - this URL is your <brev-tunnel-link>.

Run the following command to prompt Llama3-8b-instruct to generate a response to "Once upon a time":

curl -X 'POST' \
    '<brev-tunnel-link>/v1/completions' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta-llama3-8b-instruct",
        "prompt": "Once upon a time",
        "max_tokens": 225
    }'

You can replace /v1/completions with /v1/chat/completions, /v1/models, /v1/health/ready, or /v1/metrics!

 

You just deployed your first NIM!

Working with NIMs gives you a quick way to get a production-level, OpenAI-compatible API during your testing/iteration process. Even with this early-access Llama3 NIM, it's easy to see how powerful and fast this model is! Stay tuned for even more guides using NVIDIA NIMs 

Ollama on Brev

Get Started with Ollama!

Let's launch Ollama on Brev with just one command

 

First off, what is Ollama?

Ollama is an open-source tool that democratizes LLMs by enabling anyone to run them locally on their own machines. Ollama simplifies the complex process of setting up LLMs by bundling model weights, configurations, and datasets into a unified "Modelfile", which you can download and run on your own computer.
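For a flavor of what a Modelfile looks like, here's a tiny illustrative sketch assembled in Python. The weights path is hypothetical; FROM and SYSTEM are standard Modelfile directives:

```python
# Assemble a minimal, hypothetical Ollama Modelfile as a string.
modelfile = "\n".join([
    "FROM ./my-model.gguf",              # hypothetical path to local GGUF weights
    'SYSTEM """Talk like a pirate."""',  # optional system prompt baked into the model
])
print(modelfile)
```

Real Modelfiles often also set a TEMPLATE and PARAMETER directives, as we'll see later in this guide.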

 

Why run Ollama on Brev.dev?

Brev allows users to easily provision a GPU and set up a Linux VM. This setup is ideal for running multiple, sophisticated models via Ollama, providing a seamless experience from model selection to execution.

Together, Ollama and Brev.dev offer a powerful combination for anyone looking to use LLMs without the traditional complexities of setup and optimization. Let's dive into how to get started with Ollama on Brev!

  1. Create an account
    Make an account on the Brev console.
  2. Launch an instance
    Go to your terminal and download the Brev CLI

brew install brevdev/homebrew-brev/brev && brev login

Check out the installation instructions if you need help.

Now run the following command to launch Ollama with a specific model

brev ollama -m <model name>

You can see the full list of available models here.

Hang tight for a couple of minutes while we provision an instance and load Ollama into it!

  3. Use your Ollama endpoint!
    If you want to use your Ollama endpoint, we'll give you the curl command in your terminal after the instance is ready.

    You just deployed Ollama with one command!

    Working with Ollama gives you a quick way to get a model running. We'll be adding a lot more support for Ollama in the coming months - if you have any special requests, feel free to email us eng@brev.dev and we'll be sure to add it as a feature!

Ollama on Brev

Convert a model to GGUF and deploy on Ollama!

Convert a model to GGUF format!

You can take the code below and run it in a Jupyter notebook.

This guide assumes you already have a model you want to convert to GGUF format and have it on your Brev GPU instance.

Make sure to fine-tune a model on Brev (or have a model handy that you want to convert to GGUF format) before you start!

We need to pull the llama.cpp repo from GitHub. This step might take a while, so be patient!

!git clone https://github.com/ggerganov/llama.cpp

!cd llama.cpp && git pull && make clean && LLAMA_CUDA=1 make

!pip install -r llama.cpp/requirements.txt

In the following code-block, llama-brev is an example Llama3 LLM that I fine-tuned on Brev. You can replace it with your own model.

!python llama.cpp/convert-hf-to-gguf.py llama-brev

The following command quantizes your model to 4-bit (Q4_K_M):

!cd llama.cpp && ./quantize ../llama-brev/ggml-model-f16.gguf ../llama-brev/ggml-model-Q4_K_M.gguf Q4_K_M
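As a rough back-of-the-envelope (ignoring quantization block overhead and the KV cache), 4-bit quantization cuts an 8B-parameter model's weights from about 16 GB at FP16 to about 4 GB:

```python
params = 8e9                 # 8B parameters
fp16_gb = params * 2 / 1e9   # 2 bytes per weight at FP16
q4_gb = params * 0.5 / 1e9   # 0.5 bytes per weight at 4-bit
print(fp16_gb, q4_gb)  # 16.0 4.0
```

That size reduction is what makes it practical to run the model locally via Ollama.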

If you want, you can test this model by running llama.cpp's built-in server and sending it a request. Run the cell below to start the server:

!cd llama.cpp && ./server -m ../llama-brev/ggml-model-Q4_K_M.gguf -c 2048

Then open a new terminal tab using the blue plus button and run:

curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'

Let's create the Ollama modelfile!

Here, we're going to start by pointing the modelfile to where our quantized model is located. We also add a fun system message to make the model talk like a pirate when you prompt it!

tuned_model_path = "/home/ubuntu/verb-workspace/llama-brev/ggml-model-Q4_K_M.gguf"

cmds = []

base_model = f"FROM {tuned_model_path}"

template = '''TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""'''

params = '''PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"'''

sys_message = "Talk like a pirate."

system = f'''SYSTEM """{sys_message}"""'''

cmds.append(base_model)

cmds.append(template)

cmds.append(params)

cmds.append(system)

def generate_modelfile(cmds):

    content = ""

    for command in cmds:

        content += command + "\n"

    print(content)

    with open("Modelfile", "w") as file:

        file.write(content)

generate_modelfile(cmds)

!curl -fsSL https://ollama.com/install.sh | sh

Let's create the model from our Modelfile and push it to the Ollama registry so you can then run it locally! Note that pushing to the registry requires an Ollama account and a namespaced model name (e.g. <username>/llama-brev).

!ollama create llama-brev -f Modelfile

Let's run the model on Ollama!

Now that we have our modelfile and the Ollama server running, let's run our fine-tuned model on Ollama! This guide assumes you have Ollama already installed and running on your laptop. If you don't, you can follow the instructions here.

To run our fine-tuned model on Ollama, open up your terminal and run:

ollama pull llama-brev

Remember, llama-brev is the name of my fine-tuned model and what I named my modelfile when I pushed it to the Ollama registry. You can replace it with your own model name and modelfile name.

To query it, run:

ollama run llama-brev

Since my system message is a pirate, when I said Hi!, my model responded with: "Ahoy, matey! Ye be lookin' mighty fine today. Hoist the colors and let's set sail on a grand adventure! Arrr!"
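Ollama also exposes a local REST API (on port 11434 by default), so you can query the same model from code. Here's a minimal sketch that just builds the request body for the /api/generate endpoint; the model name follows the llama-brev example above:

```python
import json

def ollama_generate_payload(model, prompt, stream=False):
    """Build the JSON body for Ollama's POST /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

body = ollama_generate_payload("llama-brev", "Hi!")
print(body)
```

You'd POST this body to http://localhost:11434/api/generate with any HTTP client; with stream=False, Ollama returns the full response in a single JSON object.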

You've now taken your fine-tuned model from Brev, converted it to GGUF format, and ran it locally on Ollama!