# Chat

### Overview

The Chat endpoint provides conversational AI responses. It supports both streaming and non-streaming responses across multiple AI models.

***

### Making a request

#### Non-Streaming

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

# Define the API endpoint and headers
api_url = "https://katz.is-a.dev/v1/chat/completions"
headers = {
    "Content-Type": "application/json"
}

payload = {
    "model": "llama-3.1-70b-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about recursion in programming."}
    ],
    "tools": False    # Enables or disables Tools
}

response = requests.post(api_url, headers=headers, data=json.dumps(payload))
completion = response.json()
print(completion['choices'][0]['message']['content'])
```

{% endtab %}

{% tab title="Javascript" %}

```javascript
const api_url = "https://katz.is-a.dev/v1/chat/completions";

const headers = {
  "Content-Type": "application/json"
};

const payload = {
  "model": "llama-3.1-70b-turbo",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about recursion in programming."}
  ],
  "tools": false // Enables or disables Tools
};

async function getCompletion() {
  try {
    const response = await fetch(api_url, {
      method: "POST",
      headers: headers,
      body: JSON.stringify(payload)
    });
    const completion = await response.json();
    console.log(completion.choices[0].message.content);
  } catch (error) {
    console.error(error);
  }
}

getCompletion();
```

{% endtab %}

{% tab title="Curl" %}

<pre class="language-bash"><code class="lang-bash"><strong>curl "https://katz.is-a.dev/v1/chat/completions" \
</strong>    -H "Content-Type: application/json" \
    -d '{
        "model": "llama-3.1-70b-turbo",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Write a haiku about recursion in programming."
            }
        ]
    }'
</code></pre>

{% endtab %}
{% endtabs %}

#### Streaming

We also provide real-time streaming, so you can read the response while the LLM is still generating it:

{% tabs %}
{% tab title="Python" %}

```py
# pip install openai
from openai import OpenAI

client = OpenAI(api_key="", base_url="https://katz.is-a.dev/v1")

stream = client.chat.completions.create(
    model="llama-3.1-70b-turbo",
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```

{% endtab %}

{% tab title="JavaScript" %}

```javascript
// npm install openai
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: 'https://katz.is-a.dev/v1'
});

async function main() {
    const stream = await openai.chat.completions.create({
        model: "llama-3.1-70b-turbo",
        messages: [{ role: "user", content: "Say this is a test" }],
        stream: true,
    });
    for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content || "");
    }
}

main();
```

{% endtab %}
{% endtabs %}

You can change the model by setting the `model` parameter to any model from the [Model list](https://katz.is-a.dev/v1/models).
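To see which models are available programmatically, you can query the model list endpoint. The sketch below assumes the endpoint returns an OpenAI-style response shape (a top-level `data` array of objects with an `id` field); verify this against your deployment.

```python
import requests

API_BASE = "https://katz.is-a.dev/v1"

def model_ids(payload):
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def fetch_model_ids(base_url=API_BASE):
    """GET /v1/models and return the list of model IDs."""
    response = requests.get(f"{base_url}/models", timeout=10)
    response.raise_for_status()
    return model_ids(response.json())

# Example usage: print(fetch_model_ids())
```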

#### Response Examples

{% tabs %}
{% tab title="Non-Streaming" %}
If you disabled streaming, the response is returned as a single JSON payload containing the LLM’s full reply.

```json
{
  "id": "chatcmpl-ab12cd34ef",
  "object": "chat.completion",
  "created": 69420,
  "model": "llama-3.1-70b-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 69,
    "completion_tokens": 420,
    "total_tokens": 489
  },
  "system_fingerprint": null
}
```

{% endtab %}

{% tab title="Streaming" %}
If streaming is enabled, the response is delivered as a sequence of JSON objects, each prefixed with `data:` in Server-Sent Events (SSE) format. Each chunk contains an incremental part of the assistant’s reply, allowing the client to render text as it is generated.

```json
data: {
  "id": "chatcmpl-ab12cd34ef",
  "object": "chat.completion.chunk",
  "created": 69420,
  "model": "llama-3.1-70b-turbo",
  "choices": [
    {
      "delta": {
        "content": "Hello! "
      },
      "index": 0,
      "finish_reason": null
    }
  ]
}

data: {
  "id": "chatcmpl-ab12cd34ef",
  "object": "chat.completion.chunk",
  "created": 69420,
  "model": "llama-3.1-70b-turbo",
  "choices": [
    {
      "delta": {
        "content": "How can I assist you today?"
      },
      "index": 0,
      "finish_reason": "stop"
    }
  ]
}

data: [DONE]
```

Each chunk represents a part of the message content, and the response concludes with the line `data: [DONE]`.
{% endtab %}
{% endtabs %}
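If you would rather not use the OpenAI SDK, the SSE stream can be consumed with plain `requests`. This is a sketch that assumes each chunk arrives as a single compact-JSON `data:` line (the standard SSE wire format; the example above is pretty-printed for readability):

```python
import json
import requests

def parse_sse_line(line):
    """Extract the content delta from one SSE 'data:' line, or None."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data == "[DONE]":
        return None
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

def stream_completion(payload, url="https://katz.is-a.dev/v1/chat/completions"):
    """POST with stream=True and yield content deltas as they arrive."""
    with requests.post(url, json={**payload, "stream": True}, stream=True) as r:
        for line in r.iter_lines(decode_unicode=True):
            content = parse_sse_line(line) if line else None
            if content:
                yield content

# Example usage:
# for text in stream_completion({"model": "llama-3.1-70b-turbo",
#                                "messages": [{"role": "user", "content": "Hi"}]}):
#     print(text, end="", flush=True)
```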

### Error Handling

In the event of an error, the server responds with a `500 Internal Server Error`. The response body includes a JSON object detailing the error message and status code.

#### Error Example

```json
{
  "error": "Internal Server Error",
  "status_code": 500
}
```

This response may indicate various issues, such as missing or malformed input data or server-side processing errors.
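A defensive client can check the HTTP status and surface the error body shown above. This is a sketch: it assumes the error body matches the `{"error": ..., "status_code": ...}` shape documented here, with a fallback for non-JSON responses.

```python
import json
import requests

def error_detail(status_code, body_text):
    """Build a readable message from an error body like
    {"error": "...", "status_code": 500}; fall back to raw text."""
    try:
        detail = json.loads(body_text)
    except ValueError:
        detail = {"error": body_text, "status_code": status_code}
    return f"API error {detail.get('status_code', status_code)}: {detail.get('error')}"

def safe_completion(payload, url="https://katz.is-a.dev/v1/chat/completions"):
    """POST the payload and raise with a readable message on failure."""
    response = requests.post(url, json=payload, timeout=30)
    if not response.ok:
        raise RuntimeError(error_detail(response.status_code, response.text))
    return response.json()
```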
