LM Studio with Gemma 4

Posted Apr 5, 2026

By Kadek Sena Perdiana 3 min read

LM Studio enables local execution of large language models with a simple and efficient runtime environment. In this lab, we deploy Gemma 4 on a headless Linux server to provide high-performance, server-side inference. Using LM Link, the model can be securely accessed from a MacBook, allowing seamless remote interaction through a lightweight interface without requiring a full local setup.

Installing LM Studio

Here’s the hardware that we will use to run LMS, it has 20 CPU Cores and 100 GB of RAM, but with no dedicated GPU

Next run this command to install LM Studio

curl -fsSL https://lmstudio.ai/install.sh | bash

Then update our .bashrc configuration and refresh the LM Studio Bootstrap

source ~/.bashrc
lms bootstrap

Next to ensure our LM Studio will keep running, we will use tmux

Then inside the tmux, bring the LMS daemon service up

lms daemon up

the server status will stay off because we’re not using the API service

Loading LLM

Now that we have our LM Studio up and running, we can proceed to load the models. Here we use “lms get” to see the available models

Unfortunately because the Gemma models are quite new, the one that we’re looking for (the 2B model) is not yet available via command line, so we go to the lmstudio website to get the huggingface link for a direct download

Here we’re downloading the model with 2B Parameter and 8-bit quantization, meaning the model uses about 2 billion weights for language processing while compressing them into 8-bit precision to reduce memory usage and improve performance with minimal loss in accuracy.

8-bit retains higher accuracy but requires more memory, while 6-bit and especially 4-bit offer greater efficiency at the cost of more noticeable quality degradation.

Next we use “lms import” to import the downloaded model

Then use “lms ls” to see the imported models

Next we need to load the model into memory using “lms load”, this initializes it for inference, making it ready to accept prompts and generate responses

Use “lms ps” to see the loaded models

Now that the model is loaded, we can initiate chat using “lms chat”

Here we try to give it some coding tasks to see how much it’s using the host’s resources, this 2B model only uses around 7 cpu cores and 1 gb of memory

While generating responses at 15 tokens per second, which is not bad

Next, we will run a larger Gemma 4 variant—the 26B model. This model is interesting because, although it contains 26 billion parameters in total, it activates only about 4 billion parameters per token (hence the name 26B 4AB), allowing it to deliver performance comparable to larger models while maintaining inference speeds closer to smaller ones.

After finish downloading, we load the model into memory

And here we start stress testing the model, when given a similar task, it uses similar cpu usage but almost 10 times the memory compared to the 2B model

Its delivering a respectable 12 tokens per second, considering the size of the model, it’s quite speedy compared to the 2B

LM Link

Next we want to play around with this models using our LM Studio App from a remote Laptop, while using the headless linux as the server running the LLM. To do that first we need to login to LM Studio with “lms login”

Copy the generated URL and open on the web browser

After loggin in to the LMS account, go to profile and ensure the LM Link is enabled

Next on the MacBook side, open the LM Studio App and select LM Link then Add Device

Choose the Headless Machine

Back to our linux server, run “lms link enable” to finish the LM Link configuration

Then the linux server should show up on the LM Studio App

Now we can run the models remotely from our laptop, here we try giving it coding task to the 2B model

Because the larger model natively supports reasoning and vision, it can accept images as part of the prompt and use its multimodal understanding to analyze them and generate more informed responses.

Other, Artificial Intelligence

This post is licensed under CC BY 4.0 by the author.

Installing LM Studio

Loading LLM

LM Link

Trending Tags