Putting AI in your AI - Integrating Spring with Ollama AI Models

Even though many people call it a bubble about to burst, it's increasingly obvious that LLMs are here to stay in some form or another for the long term. Working with generative AI applications used to be a domain mostly reserved for Python developers, but it is becoming ubiquitous across many programming languages.


In this series, I'll explore how to integrate AI with Spring using the new Spring AI project, which is about to hit its 1.0.0 release. For this we will use Ollama, a popular framework for running pre-trained models locally and integrating them into applications. Spring AI lets you choose from a lot of different models, though, so pick your poison.

Conceptual View

Before we begin, let's have a look at some of the concepts we will need to understand to work effectively with Spring AI.

Models

Models are algorithms designed to process and generate information, often mimicking human cognitive functions. They can make predictions and generate text, images, or other forms of output.

Prompts

If you have used ChatGPT, you might think of a prompt as a simple text input. However, it's much more than that. It sets the context for the conversation to follow and can consist of multiple text inputs with different roles within a prompt. Prompt engineering exists for a reason: how to work with prompts most effectively is still an open question. You can read more about prompt engineering in this paper on arXiv.
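
To make this concrete, here is a minimal sketch of what a multi-part prompt looks like with Spring AI's message types (Prompt, SystemMessage and UserMessage from the org.springframework.ai.chat packages; List is java.util.List). Treat it as an illustration of the milestone API rather than gospel:

Prompt prompt = new Prompt(List.of(
		// The system message frames the conversation before the user speaks.
		new SystemMessage("You are a concise assistant for Spring developers."),
		// The user message carries the actual question.
		new UserMessage("Explain what an embedding is in one sentence.")));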

Embeddings

Embeddings are numerical representations of text, images, or videos that capture relationships between inputs. They convert the input into vectors designed to capture its meaning. By doing so, an application can determine the similarity between two objects. You can imagine embeddings as a kind of numerical fingerprint for data: the vectors for the words "cat" and "dog", for example, would lie close to each other.
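
To make the "fingerprint" idea concrete, here is a minimal plain-Java sketch of how an application compares two such vectors. The vectors themselves would come from an embedding model; the method below is just the standard cosine similarity, not a Spring AI API:

// Cosine similarity between two embedding vectors: values close to 1.0
// mean the underlying inputs are semantically similar.
static double cosineSimilarity(float[] a, float[] b) {
	double dot = 0, normA = 0, normB = 0;
	for (int i = 0; i < a.length; i++) {
		dot += a[i] * b[i];
		normA += a[i] * a[i];
		normB += b[i] * b[i];
	}
	return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}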

The Spring docs provide a nice illustration to grasp the idea a little better.

Tokens

Tokens are the building blocks of how an AI model works. Sentences are converted to tokens and back again. As a rule of thumb, a token corresponds to roughly three-quarters of a word (so 1,000 tokens come out to about 750 words), and you are usually charged by the token. Models are also subject to token limits, which restrict the amount of text processed in a single API call. This threshold is called the "context window"; the model does not process any text that exceeds it.
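
Real tokenizers are model-specific, but the three-quarters rule gives a handy back-of-the-envelope estimate. The helper below is purely illustrative (a hypothetical estimator, not an actual tokenizer):

// Rough rule of thumb: tokens ≈ words / 0.75, i.e. about 4 tokens per 3 words.
static long estimateTokens(String text) {
	long words = text.trim().split("\\s+").length;
	return Math.round(words / 0.75);
}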

Function Calling

LLMs are frozen after training, which leads to stale knowledge. The function calling mechanism addresses this shortcoming: it allows you to register your own functions that connect the LLM to the APIs of external systems. These systems can then provide the LLM with real-time data and perform data-processing actions on its behalf.
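
As a sketch of what registering such a function can look like with Spring AI (the function-calling API is still evolving across the milestones, and the weather types here are hypothetical), you expose a plain java.util.function.Function as a bean and describe it so the model knows when to call it:

import java.util.function.Function;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Description;

@Configuration
class WeatherFunctionConfig {

	// Hypothetical request/response types for an external weather API.
	record WeatherRequest(String city) {}
	record WeatherResponse(double temperatureCelsius) {}

	// The description tells the model what the function does, so it can
	// decide to invoke it when a prompt asks about current weather.
	@Bean
	@Description("Get the current temperature in Celsius for a given city")
	Function<WeatherRequest, WeatherResponse> currentWeather() {
		return request -> new WeatherResponse(21.0); // call the real API here
	}
}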

So, now that we have covered some of the fundamentals, let's walk step by step through integrating a Spring application with an Ollama model.

Setting up your environment

We need the following dependencies for our project. Since Spring AI isn't officially released yet, we also need the Spring milestone and snapshot repositories, and we import the spring-ai-bom in our dependencyManagement section.


	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-actuator</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.ai</groupId>
			<artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
		</dependency>
		<dependency>
			<groupId>org.apache.httpcomponents.client5</groupId>
			<artifactId>httpclient5</artifactId>
			<version>5.2.1</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<repositories>
		<repository>
			<id>spring-milestones</id>
			<name>Spring Milestones</name>
			<url>https://repo.spring.io/milestone</url>
			<snapshots>
				<enabled>false</enabled>
			</snapshots>
		</repository>
		<repository>
			<id>spring-snapshots</id>
			<name>Spring Snapshots</name>
			<url>https://repo.spring.io/snapshot</url>
			<releases>
				<enabled>false</enabled>
			</releases>
		</repository>
	</repositories>

	<dependencyManagement>
		<dependencies>
			<dependency>
				<groupId>org.springframework.ai</groupId>
				<artifactId>spring-ai-bom</artifactId>
				<version>1.0.0-SNAPSHOT</version>
				<type>pom</type>
				<scope>import</scope>
			</dependency>
		</dependencies>
	</dependencyManagement>

We also need the following docker-compose file to start the Ollama server (it maps the container's port 11434 to port 7869 on the host):

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - 7869:11434
    volumes:
      - .:/code
      - ./ollama/ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: always
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_HOST=0.0.0.0

Since we use the 'mistral' model, you also need to pull it into your Ollama container once before the first request.
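
From the host, that looks like this (using the container name from the compose file above):

docker exec -it ollama ollama pull mistral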

The application.properties looks like this:

spring.ai.ollama.base-url=http://localhost:7869
spring.ai.ollama.chat.options.model=mistral
spring.ai.ollama.chat.options.temperature=0.7

Now that everything is in place, let's look at how we can interact with the model.

Build a REST controller to interact with the model

We will create a simple AIController class that defines the following endpoints:

  1. /ai/generate: This endpoint will accept a custom message (or use a default) and return the model's response.
  2. /ai/generateStream: This endpoint will stream results from the model as they are generated, useful for tasks like real-time chat or longer responses.

Let's dive into the code:

import java.util.Map;

import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import reactor.core.publisher.Flux;

@RestController
class AIController {

	private final OllamaChatModel chatModel;

	public AIController(OllamaChatModel chatModel) {
		this.chatModel = chatModel;
	}

	// Returns the model's complete answer in a single response.
	@GetMapping("/ai/generate")
	public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "What is the answer to the ultimate question of life, the universe, and everything") String message) {
		return Map.of("generation", chatModel.call(message));
	}

	// Streams partial responses as the model generates them, useful for
	// real-time chat or longer answers.
	@GetMapping("/ai/generateStream")
	public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "What is the answer to the ultimate question of life, the universe, and everything") String message) {
		var prompt = new Prompt(new UserMessage(message));
		return chatModel.stream(prompt);
	}

}

We inject an instance of OllamaChatModel, which handles all interaction with the AI model. The Ollama starter auto-configures it using the settings from our application.properties.

And that is basically it! Now we can start the application and run some sample requests against the controller. If we don't provide a message, the default prompt is used:

~  curl localhost:8080/ai/generate
{"generation":" According to Douglas Adams' science fiction novel \"The Hitchhiker's Guide to the Galaxy\", the answer to the Ultimate Question of Life, the Universe, and Everything is 42. However, it's important to note
that this number isn't intended as a serious answer but rather a humorous commentary on humanity's search for meaning in a seemingly meaningless universe. The actual question itself was never revealed in the book."}

Or we can prompt with a different message that will return an incredibly hilarious joke.

~  curl -X GET "http://localhost:8080/ai/generate?message=Tell%20me%20a%20joke"
{"generation":" Why don't scientists trust atoms? Because they make up everything! (Except jokes, those are al  l protons and neutrons)"}%

Conclusion

Now we have our first little Spring application integrated with an AI model! From here we can build on this foundation; in the next post we will have a look at text-to-image generation. Stay tuned!

As always, if you want to play around with the project, you can find it on my GitHub. I'll continue this series with some more examples in the future. Hope you enjoyed it!

Sources:

Spring Docs.