Chat

This page explains how to use our Chat endpoint.

Overview

The Chat endpoint provides conversational AI responses. It supports both streaming and non-streaming responses and works with multiple AI models.


Making a request

Non-Streaming

import requests
import json

# Define the API endpoint and headers
api_url = "http://ai.is-a.dev/v1/chat/completions"
headers = {
    "Content-Type": "application/json"
}

payload = {
    "model": "llama-3.1-70b-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about recursion in programming."}
    ],
    "tools": False    # enables or disables tool use
}

response = requests.post(api_url, headers=headers, data=json.dumps(payload))
completion = response.json()
print(completion['choices'][0]['message']['content'])

Streaming

We also provide a way to get real-time output while the LLM is still generating your response:
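The streaming snippet is not included above, so here is a minimal sketch. It assumes the endpoint accepts an OpenAI-style `"stream": True` flag and replies with server-sent events (`data: {...}` lines carrying `choices[0].delta.content` fragments); check the actual API behavior before relying on this shape.

```python
import json
import requests

# Hypothetical sketch -- the streaming contract is assumed, not documented here.
api_url = "http://ai.is-a.dev/v1/chat/completions"
headers = {"Content-Type": "application/json"}

payload = {
    "model": "llama-3.1-70b-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about recursion in programming."}
    ],
    "stream": True  # assumed flag: ask the server for incremental chunks
}

def extract_delta(line: bytes) -> str:
    """Pull the text fragment out of one 'data: {...}' SSE line, if any."""
    text = line.decode("utf-8").strip()
    if not text.startswith("data: ") or text == "data: [DONE]":
        return ""
    chunk = json.loads(text[len("data: "):])
    return chunk["choices"][0].get("delta", {}).get("content") or ""

def stream_chat():
    """Print the reply token by token as the server sends it."""
    with requests.post(api_url, headers=headers, json=payload, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:  # skip SSE keep-alive blank lines
                print(extract_delta(line), end="", flush=True)
```

Calling `stream_chat()` prints the tokens as they arrive instead of waiting for the full completion.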

You may change the model by replacing the model parameter with any model from the Model list.

Response Examples

If you disabled streaming, the response is returned as a single JSON payload containing the LLM’s full reply.
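The exact response schema is not shown in this page; a typical OpenAI-compatible completion payload (field names assumed) looks roughly like this:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "llama-3.1-70b-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Function calls itself..."
      },
      "finish_reason": "stop"
    }
  ]
}
```

The reply text printed in the example above lives at `choices[0].message.content`.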

Error Handling

In the event of an error, the server responds with a 500 Internal Server Error. The response body includes a JSON object detailing the error message and status code.

Error Example
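The error body is not reproduced here; assuming a JSON object with the message and status code as described above, it might look like:

```json
{
  "error": {
    "message": "Invalid request: 'messages' field is required.",
    "status": 500
  }
}
```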

This response may indicate various issues, such as missing or malformed input data or server-side processing errors.
