Client API

For the complete API reference, see: API Reference

To use the Client API, first start the Xinference service with the following command:

>>> xinference
2023-10-17 16:32:21,700 xinference   24584 INFO     Xinference successfully started. Endpoint: http://127.0.0.1:9997
2023-10-17 16:32:21,700 xinference.core.supervisor 24584 INFO     Worker 127.0.0.1:62590 has been added successfully
2023-10-17 16:32:21,701 xinference.deploy.worker 24584 INFO     Xinference worker successfully started.

The service address is printed in the command log; in the log above it is http://127.0.0.1:9997. Clients connect to the Xinference service at this address.
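
When Xinference is started from a script, the endpoint can also be read from the log instead of copied by hand. This is a minimal standard-library sketch that assumes the startup line keeps the `Endpoint: <url>` format shown above. `find_endpoint` and `openai_base_url` are hypothetical helpers, not part of Xinference. The `/v1` suffix matches the `base_url` used by the OpenAI client examples in this guide.

```python
import re
from typing import Optional

# Matches the address announced in the startup log, e.g.
# "Xinference successfully started. Endpoint: http://127.0.0.1:9997"
ENDPOINT_RE = re.compile(r"Endpoint:\s*(https?://\S+)")


def find_endpoint(log_text: str) -> Optional[str]:
    """Return the first endpoint URL found in the log text, or None."""
    match = ENDPOINT_RE.search(log_text)
    return match.group(1).rstrip("/") if match else None


def openai_base_url(endpoint: str) -> str:
    """OpenAI-compatible routes in this guide are served under /v1."""
    return endpoint.rstrip("/") + "/v1"


log = (
    "2023-10-17 16:32:21,700 xinference   24584 INFO     "
    "Xinference successfully started. Endpoint: http://127.0.0.1:9997"
)
endpoint = find_endpoint(log)
print(endpoint)                   # http://127.0.0.1:9997
print(openai_base_url(endpoint))  # http://127.0.0.1:9997/v1
```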

Models are grouped into types such as LLM, embedding, rerank, and others. More model types may be supported in the future.
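
In the examples below, the LLM launch call omits `model_type`, while every other type passes it explicitly. The helper below encodes that rule. It is a sketch that assumes `"LLM"` is the default type for `launch_model`, as the LLM example implies. `launch_kwargs` is a hypothetical helper, not part of the Xinference client.

```python
from typing import Any, Dict


def launch_kwargs(model_name: str, model_type: str = "LLM", **extra: Any) -> Dict[str, Any]:
    """Build keyword arguments for client.launch_model (hypothetical helper).

    Assumes LLM is launch_model's default type, so model_type is only added
    for non-LLM models such as "embedding", "rerank", "image" or "audio".
    """
    kwargs: Dict[str, Any] = {"model_name": model_name, **extra}
    if model_type != "LLM":
        kwargs["model_type"] = model_type
    return kwargs


# Usage with a running server:
# model_uid = client.launch_model(**launch_kwargs("bge-small-en-v1.5", "embedding"))
print(launch_kwargs("bge-small-en-v1.5", "embedding"))
```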

LLM

List all built-in LLM models:

>>> xinference registrations -t LLM

Type    Name                     Language      Ability                        Is-built-in
------  -----------------------  ------------  -----------------------------  -------------
LLM     baichuan                 ['en', 'zh']  ['embed', 'generate']          True
LLM     baichuan-2               ['en', 'zh']  ['embed', 'generate']          True
LLM     baichuan-2-chat          ['en', 'zh']  ['embed', 'generate', 'chat']  True
...
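
The `...` marks where the listing above was truncated. To filter the listing in a script, for example to keep only models with the `chat` ability, you can parse the printed table. This sketch assumes columns are separated by at least two spaces and that the list-valued cells are Python literals, as shown. `parse_registrations` is a hypothetical helper. If your Xinference version exposes registrations through the Python client, querying them there avoids parsing text.

```python
import ast
import re
from typing import Dict, List

TABLE = """\
Type    Name                     Language      Ability                        Is-built-in
------  -----------------------  ------------  -----------------------------  -------------
LLM     baichuan                 ['en', 'zh']  ['embed', 'generate']          True
LLM     baichuan-2               ['en', 'zh']  ['embed', 'generate']          True
LLM     baichuan-2-chat          ['en', 'zh']  ['embed', 'generate', 'chat']  True
"""


def parse_registrations(text: str) -> List[Dict[str, object]]:
    """Turn the printed registrations table into one dict per model."""
    lines = [ln for ln in text.splitlines() if ln.strip() and ln.strip() != "..."]
    header = re.split(r"\s{2,}", lines[0].strip())
    rows: List[Dict[str, object]] = []
    for line in lines[2:]:  # skip the header and the dashed separator
        row: Dict[str, object] = dict(zip(header, re.split(r"\s{2,}", line.strip())))
        for key in ("Language", "Ability"):
            if key in row:
                row[key] = ast.literal_eval(str(row[key]))  # "['en', 'zh']" -> list
        if "Is-built-in" in row:
            row["Is-built-in"] = row["Is-built-in"] == "True"
        rows.append(row)
    return rows


rows = parse_registrations(TABLE)
chat_models = [r["Name"] for r in rows if "chat" in r["Ability"]]
print(chat_models)  # ['baichuan-2-chat']
```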

Launch a large language model and chat with it:

Xinference Client

from xinference.client import Client

client = Client("http://localhost:9997")
# glm4-chat is a chat model, i.e. it has the "chat" ability.
model_uid = client.launch_model(model_name="glm4-chat",
                                model_engine="llama.cpp",
                                model_format="ggufv2",
                                model_size_in_billions=9,
                                quantization="Q4_K")
model = client.get_model(model_uid)

messages = [{"role": "user", "content": "What is the largest animal?"}]
# Call model.chat because the model has the "chat" ability; models with the
# "generate" ability can also be called through model.generate.
model.chat(
    messages,
    generate_config={"max_tokens": 1024}
)
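
Each call receives the full message list, so a follow-up question means re-sending the history with the assistant's previous reply appended. This sketch assumes `model.chat` returns a dict in the OpenAI chat-completion layout (`choices[0]["message"]`), matching the OpenAI-compatible interface described below. The response used here is a hypothetical stand-in, not real model output.

```python
from typing import Any, Dict, List

Message = Dict[str, str]


def append_reply(history: List[Message], completion: Dict[str, Any]) -> str:
    """Append the assistant message from a chat completion to the history."""
    message = completion["choices"][0]["message"]
    content = message.get("content") or ""  # content may be None for tool calls
    history.append({"role": message["role"], "content": content})
    return content


history: List[Message] = [{"role": "user", "content": "What is the largest animal?"}]
# In real use: completion = model.chat(history, generate_config={"max_tokens": 1024})
completion = {  # hypothetical response shaped like an OpenAI chat completion
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "<model reply>"}}]
}
reply = append_reply(history, completion)
history.append({"role": "user", "content": "And the largest land animal?"})
# The next call would be: model.chat(history, generate_config={"max_tokens": 1024})
```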

OpenAI Client

Apart from launching a model, all requests sent through the OpenAI client are compatible with the OpenAI API. For usage details, see https://platform.openai.com/docs/api-reference/chat?lang=python

import openai

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.chat.completions.create(
    model=model_uid,
    messages=[
        {
            "content": "What is the largest animal?",
            "role": "user",
        }
    ],
    max_tokens=1024
)
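
Because the endpoint follows the OpenAI HTTP interface, the same request can also be sent without an SDK. This standard-library sketch assumes the default endpoint above and an already launched model. `my-model-uid` is a placeholder for the UID returned by `launch_model`, and `build_chat_request` is a hypothetical helper. The `Authorization` header mirrors what `openai.Client` sends with the dummy key.

```python
import json
import urllib.request
from typing import Dict, List

BASE_URL = "http://localhost:9997/v1"  # assumed default endpoint


def build_chat_request(
    model_uid: str, messages: List[Dict[str, str]], max_tokens: int = 1024
) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible chat completions route."""
    body = {"model": model_uid, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer not empty",  # same dummy key as above
        },
        method="POST",
    )


req = build_chat_request("my-model-uid", [{"role": "user", "content": "What is the largest animal?"}])
# With the server running and the model launched:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```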

OpenAI Tool Calling

import openai

tools = [
    {
        "type": "function",
        "function": {
            "name": "uber_ride",
            "description": "Find suitable ride for customers given the location, "
            "type of ride, and the amount of time the customer is "
            "willing to wait as parameters",
            "parameters": {
                "type": "object",
                # JSON Schema names whole numbers "integer" (there is no "int" type).
                "properties": {
                    "loc": {
                        "type": "integer",
                        "description": "Location of the starting place of the Uber ride",
                    },
                    "type": {
                        "type": "string",
                        "enum": ["plus", "comfort", "black"],
                        "description": "Type of Uber ride the user is ordering",
                    },
                    "time": {
                        "type": "integer",
                        "description": "The amount of time in minutes the customer is willing to wait",
                    },
                },
            },
        },
    }
]

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.chat.completions.create(
    model="chatglm3",
    messages=[{"role": "user", "content": "Call me an Uber ride type 'Plus' in Berkeley at zipcode 94704 in 10 minutes"}],
    tools=tools,
)

Output:

ChatCompletion(id='chatcmpl-ad2f383f-31c7-47d9-87b7-3abe928e629c', choices=[Choice(finish_reason='tool_calls', index=0, message=ChatCompletionMessage(content="```python\ntool_call(loc=94704, type='plus', time=10)\n```", role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_ad2f383f-31c7-47d9-87b7-3abe928e629c', function=Function(arguments='{"loc": 94704, "type": "plus", "time": 10}', name='uber_ride'), type='function')]))], created=1704687803, model='chatglm3', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=-1, prompt_tokens=-1, total_tokens=-1))
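
The model only *requests* a tool call. Your code still has to run the function and decide what to do with the result. The sketch below dispatches the `name` and JSON `arguments` from a `tool_calls` entry to a local function. `uber_ride` here is a hypothetical stand-in that only formats a string, and the parameter names match the schema declared above (hence `type` as an argument name).

```python
import json
from typing import Any, Callable, Dict


def uber_ride(loc: int, type: str, time: int) -> str:
    """Hypothetical local implementation of the uber_ride tool declared above."""
    return f"Booked a {type} ride at {loc}, pickup within {time} minutes"


# Registry mapping tool names (as declared in `tools`) to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"uber_ride": uber_ride}


def run_tool_call(name: str, arguments: str) -> Any:
    """Look up the requested function and call it with the decoded JSON arguments."""
    return TOOLS[name](**json.loads(arguments))


# Values copied from the tool_calls entry in the output above. With the SDK:
# for call in response.choices[0].message.tool_calls:
#     run_tool_call(call.function.name, call.function.arguments)
result = run_tool_call("uber_ride", '{"loc": 94704, "type": "plus", "time": 10}')
print(result)  # Booked a plus ride at 94704, pickup within 10 minutes
```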

Anthropic Client

The Anthropic-compatible API is served at /anthropic/v1/messages, so the Anthropic SDK's base_url points at the /anthropic prefix of the Xinference endpoint.

import anthropic

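client = anthropic.Anthropic(
    # The SDK otherwise reads ANTHROPIC_API_KEY from the environment; as with the
    # OpenAI client, any non-empty string should do when the server has no auth.
    api_key="not empty",
    base_url="http://localhost:9997/anthropic",
)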
message = client.messages.create(
    model="qwen3",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)
print(message.content)
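
`message.content` is a list of content blocks rather than a plain string, so the `print` above shows block objects. The sketch below extracts only the text. It assumes the response follows the Anthropic Messages format, where each block has a `type` and text blocks carry a `text` field. It accepts both raw JSON dicts and SDK-style objects. `response_text` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def _block_field(block: Any, field: str) -> Any:
    # Raw HTTP JSON yields dicts; the anthropic SDK yields objects with attributes.
    if isinstance(block, dict):
        return block.get(field)
    return getattr(block, field, None)


def response_text(content: Iterable[Any]) -> str:
    """Concatenate the text of all "text" blocks, skipping other block types."""
    return "".join(
        _block_field(b, "text") or "" for b in content if _block_field(b, "type") == "text"
    )


# In real use: print(response_text(message.content))
raw_blocks = [{"type": "text", "text": "Hello! "}, {"type": "text", "text": "How can I help?"}]
sdk_like_blocks = [SimpleNamespace(type="text", text="Hi there.")]
print(response_text(raw_blocks))  # Hello! How can I help?
```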

Embedding

List all built-in embedding models:

>>> xinference registrations -t embedding

Type       Name                     Language      Dimensions  Is-built-in
---------  -----------------------  ----------  ------------  -------------
embedding  bge-base-en              ['en']               768  True
embedding  bge-base-en-v1.5         ['en']               768  True
embedding  bge-base-zh              ['zh']               768  True
...

Launch an embedding model and turn text into a vector:

Xinference Client

from xinference.client import Client

client = Client("http://localhost:9997")
# The bge-small-en-v1.5 is an embedding model, so the `model_type` needs to be specified.
model_uid = client.launch_model(model_name="bge-small-en-v1.5", model_type="embedding")
model = client.get_model(model_uid)

input_text = "What is the capital of China?"
model.create_embedding(input_text)

Output:

{'object': 'list',
 'model': 'da2a511c-6ccc-11ee-ad07-22c9969c1611-1-0',
 'data': [{'index': 0,
 'object': 'embedding',
 'embedding': [-0.014207549393177032,
    -0.01832585781812668,
    0.010556723922491074,
    ...
    -0.021243810653686523,
    -0.03009396605193615,
    0.05420297756791115]}],
 'usage': {'prompt_tokens': 37, 'total_tokens': 37}}
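
Embedding vectors are usually compared with cosine similarity: the dot product divided by the product of the vector norms. Values near 1 indicate similar directions. The pure-Python sketch below computes it. The commented usage assumes the `["data"][0]["embedding"]` layout shown in the output above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# In real use (assuming the response layout shown above):
# a = model.create_embedding("What is the capital of China?")["data"][0]["embedding"]
# b = model.create_embedding("Which city is the capital of China?")["data"][0]["embedding"]
# print(cosine_similarity(a, b))
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors: ~1.0
```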

OpenAI Client

Apart from launching a model, all requests sent through the OpenAI client are compatible with the OpenAI API. For usage details, see https://platform.openai.com/docs/api-reference/embeddings?lang=python

import openai

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.embeddings.create(model=model_uid, input=["What is the capital of China?"])

Output:

CreateEmbeddingResponse(data=[Embedding(embedding=[-0.014207549393177032, -0.01832585781812668, 0.010556723922491074, ..., -0.021243810653686523, -0.03009396605193615, 0.05420297756791115], index=0, object='embedding')], model='bge-small-en-v1.5-1-0', object='list', usage=Usage(prompt_tokens=37, total_tokens=37))

Image

List all built-in text-to-image models:

>>> xinference registrations -t image

Type    Name                          Family            Is-built-in
------  ----------------------------  ----------------  -------------
image   sd-turbo                      stable_diffusion  True
image   sdxl-turbo                    stable_diffusion  True
image   stable-diffusion-v1.5         stable_diffusion  True
image   stable-diffusion-xl-base-1.0  stable_diffusion  True

Launch a text-to-image model and generate an image from a prompt:

Xinference Client

from xinference.client import Client

client = Client("http://localhost:9997")
# The stable-diffusion-v1.5 is an image model, so the `model_type` needs to be specified.
# Additional kwargs can be passed to AutoPipelineForText2Image.from_pretrained here.
model_uid = client.launch_model(model_name="stable-diffusion-v1.5", model_type="image")
model = client.get_model(model_uid)

input_text = "an apple"
model.text_to_image(input_text)

Output:

{'created': 1697536913,
 'data': [{'url': '/home/admin/.xinference/image/605d2f545ac74142b8031455af31ee33.jpg',
 'b64_json': None}]}
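
Each entry in `data` carries either a `url` or a `b64_json` payload. In the output above, `url` is a path on the server's filesystem, so copying it only works on the same machine. A base64 payload can be decoded anywhere. The sketch below handles both cases. How to request base64 output (for example a `response_format` argument) is an assumption to check against your Xinference version. `save_image` is a hypothetical helper, and the sample bytes are not a real image.

```python
import base64
import shutil
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_image(item: Dict[str, Any], dest: str) -> Path:
    """Write one entry of the response's `data` list to `dest`."""
    out = Path(dest)
    if item.get("b64_json"):
        out.write_bytes(base64.b64decode(item["b64_json"]))
    elif item.get("url"):
        # A server-side file path, as in the output above: same machine only.
        shutil.copyfile(item["url"], out)
    else:
        raise ValueError("entry has neither b64_json nor url")
    return out


# In real use: save_image(model.text_to_image("an apple")["data"][0], "apple.jpg")
with tempfile.TemporaryDirectory() as tmp:
    sample = {"url": None, "b64_json": base64.b64encode(b"not really a jpeg").decode("ascii")}
    saved = save_image(sample, str(Path(tmp) / "apple.jpg"))
    saved_bytes = saved.read_bytes()
```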

OpenAI Client

Apart from launching a model, all requests sent through the OpenAI client are compatible with the OpenAI API. For usage details, see https://platform.openai.com/docs/api-reference/images/create?lang=python

import openai

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.images.generate(model=model_uid, prompt="an apple")

Output:

ImagesResponse(created=1704445354, data=[Image(b64_json=None, revised_prompt=None, url='/home/admin/.xinference/image/605d2f545ac74142b8031455af31ee33.jpg')])

Audio

List all built-in audio models:

>>> xinference registrations -t audio

Type    Name               Family    Multilingual    Is-built-in
------  -----------------  --------  --------------  -------------
audio   whisper-base       whisper   True            True
audio   whisper-base.en    whisper   False           True
audio   whisper-large-v3   whisper   True            True
audio   whisper-medium     whisper   True            True
audio   whisper-medium.en  whisper   False           True
audio   whisper-tiny       whisper   True            True
audio   whisper-tiny.en    whisper   False           True

Launch a speech recognition model and transcribe audio into text:

Xinference Client

from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="whisper-large-v3", model_type="audio")
model = client.get_model(model_uid)

with open("audio.mp3", "rb") as audio_file:
    print(model.transcriptions(audio_file.read()))

Output:

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}

OpenAI Client

Apart from launching a model, all requests sent through the OpenAI client are compatible with the OpenAI API. For usage details, see https://platform.openai.com/docs/api-reference/audio/createTranscription?lang=python

import openai

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
with open("audio.mp3", "rb") as audio_file:
    completion = client.audio.transcriptions.create(model=model_uid, file=audio_file)

Output:

Translation(text=' This list lists the airlines in Hong Kong.')
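
Unlike the JSON routes, OpenAI's transcription API takes a `multipart/form-data` upload, which is what `client.audio.transcriptions.create` sends. The standard-library sketch below builds that request without the SDK. It assumes the route is `/v1/audio/transcriptions` (the OpenAI client appends `/audio/transcriptions` to the `base_url` above) and that the `model` and `file` form fields are sufficient. `build_transcription_request` and `my-model-uid` are hypothetical.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:9997/v1"  # assumed default endpoint


def build_transcription_request(model_uid: str, filename: str, audio: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST with a `model` field and a `file` upload."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model_uid}\r\n").encode("utf-8"),
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8"),
        audio,
        f"\r\n--{boundary}--\r\n".encode("utf-8"),  # closing boundary
    ])
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# In real use:
# with open("audio.mp3", "rb") as f:
#     req = build_transcription_request(model_uid, "audio.mp3", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
req = build_transcription_request("my-model-uid", "audio.mp3", b"fake-audio-bytes")
```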

Rerank

Launch a rerank model and score how relevant each document is to a query:

from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="bge-reranker-base", model_type="rerank")
model = client.get_model(model_uid)

query = "A man is eating pasta."
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin."
]
print(model.rerank(corpus, query))

Output:

{'id': '480dca92-8910-11ee-b76a-c2c8e4cad3f5', 'results': [{'index': 0, 'relevance_score': 0.9999247789382935,
 'document': 'A man is eating food.'}, {'index': 1, 'relevance_score': 0.2564932405948639,
 'document': 'A man is eating a piece of bread.'}, {'index': 3, 'relevance_score': 3.955026841140352e-05,
 'document': 'A man is riding a horse.'}, {'index': 2, 'relevance_score': 3.742107219295576e-05,
 'document': 'The girl is carrying a baby.'}, {'index': 4, 'relevance_score': 3.739788007806055e-05,
 'document': 'A woman is playing violin.'}]}
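
A common next step is to keep only the best few documents, for example as context for an LLM. The sketch below assumes each result carries `index` (a position in the original corpus) and `relevance_score`, as in the output above. A sensible `min_score` depends on the rerank model, so the 0.1 used here is only illustrative. `top_documents` is a hypothetical helper.

```python
from typing import Any, Dict, List, Sequence, Tuple


def top_documents(
    response: Dict[str, Any], corpus: Sequence[str], k: int = 2, min_score: float = 0.0
) -> List[Tuple[str, float]]:
    """Return up to k (document, score) pairs scoring at least min_score, best first."""
    hits = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    kept = [(corpus[r["index"]], r["relevance_score"]) for r in hits if r["relevance_score"] >= min_score]
    return kept[:k]


corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
]
# Scores copied from the output above; in real use: response = model.rerank(corpus, query)
response = {"results": [
    {"index": 0, "relevance_score": 0.9999247789382935},
    {"index": 1, "relevance_score": 0.2564932405948639},
    {"index": 3, "relevance_score": 3.955026841140352e-05},
    {"index": 2, "relevance_score": 3.742107219295576e-05},
    {"index": 4, "relevance_score": 3.739788007806055e-05},
]}
best = top_documents(response, corpus, k=3, min_score=0.1)
print([doc for doc, _ in best])  # ['A man is eating food.', 'A man is eating a piece of bread.']
```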