
GGUF Interface

NexaTextInference

A class used for loading text models and running text generation.

Methods

  • run(): Run the text generation loop.

  • run_streamlit(): Run the Streamlit UI.

  • create_embedding(input): Embed a string.

  • create_chat_completion(messages): Generate completion for a chat conversation.

  • create_completion(prompt): Generate completion for a given prompt.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str, optional): Local path of the model.

  • embedding (bool): Enable embedding generation.

  • stop_words (list): List of stop words for early stopping.

  • temperature (float): Temperature for sampling.

  • max_new_tokens (int): Maximum number of new tokens to generate.

  • top_k (int): Top-k sampling parameter.

  • top_p (float): Top-p sampling parameter.

  • profiling (bool): Enable timing measurements for the generation process.

  • streamlit (bool): Run the inference in Streamlit UI.

Example Code

from nexa.gguf import NexaTextInference

model_path = "llama2"
inference = NexaTextInference(
    model_path=model_path,
    local_path=None,
    stop_words=[],
    temperature=0.7,
    max_new_tokens=512,
    top_k=50,
    top_p=0.9,
    profiling=True
)

# run() method
inference.run()

# run_streamlit() method
inference.run_streamlit(model_path)

# create_embedding(input) method
inference.create_embedding("Hello, world!")

# create_chat_completion(messages)
inference.create_chat_completion(
    messages=[{"role": "user", "content": "write a long 1000 word story about a detective"}]
)

# create_completion(prompt)
inference.create_completion("Q: Name the planets in the solar system? A:")
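
The create_embedding example above returns a vector but does not show how to use it. Below is a minimal, hypothetical sketch of comparing two embeddings with cosine similarity. It assumes that embedding=True must be passed at construction (suggested by the embedding argument above) and that create_embedding returns a flat list of floats; verify both against your SDK version.

from nexa.gguf import NexaTextInference

# Assumption: embedding=True at construction enables create_embedding.
embedder = NexaTextInference(model_path="llama2", embedding=True)

# Assumption: each call returns a flat list of floats.
vec_a = embedder.create_embedding("The cat sat on the mat.")
vec_b = embedder.create_embedding("A feline rested on the rug.")

# Cosine similarity, computed without external dependencies.
dot = sum(a * b for a, b in zip(vec_a, vec_b))
norm_a = sum(a * a for a in vec_a) ** 0.5
norm_b = sum(b * b for b in vec_b) ** 0.5
print("cosine similarity:", dot / (norm_a * norm_b))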

NexaImageInference

A class used for loading image models and running image generation.

Methods

  • txt2img(prompt): Generate images from text.

  • img2img(image_path, prompt): Generate new images from an input image, guided by a text prompt.

  • run_txt2img(): Run the text-to-image generation loop.

  • run_img2img(): Run the image-to-image generation loop.

  • run_streamlit(): Run the Streamlit UI.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str, optional): Local path of the model.

  • output_path (str): Output path for the generated image.

  • num_inference_steps (int): Number of inference steps.

  • width (int): Width of the output image.

  • height (int): Height of the output image.

  • guidance_scale (float): Guidance scale for diffusion.

  • random_seed (int): Random seed for image generation.

  • streamlit (bool): Run the inference in Streamlit UI.

Example Code

from nexa.gguf import NexaImageInference

model_path = "lcm-dreamshaper"
inference = NexaImageInference(
    model_path=model_path,
    local_path=None,
    num_inference_steps=4,
    width=512,
    height=512,
    guidance_scale=1.0,
    random_seed=0,
)

# txt2img(prompt) method
inference.txt2img("a lovely cat")

# img2img(image_path, prompt) method
inference.img2img(image_path="path/to/local/image", prompt="blue sky")

# run_txt2img() method
inference.run_txt2img()

# run_img2img() method
inference.run_img2img()

# run_streamlit() method
inference.run_streamlit(model_path)
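
Because random_seed is a constructor argument in the list above, one way to produce several variants of the same prompt is to re-instantiate with different seeds. This is a hypothetical sketch based only on the documented arguments; if your SDK version exposes a per-call seed, prefer that, since rebuilding the pipeline per image is wasteful.

from nexa.gguf import NexaImageInference

# Assumption: varying random_seed at construction yields different variants,
# and each generated image is written to the configured output path.
for seed in (0, 1, 2):
    variant = NexaImageInference(
        model_path="lcm-dreamshaper",
        num_inference_steps=4,
        width=512,
        height=512,
        guidance_scale=1.0,
        random_seed=seed,
    )
    variant.txt2img("a lovely cat")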

NexaVLMInference

A class used for loading vision-language models (VLMs) and running text generation.

Methods

  • run(): Run the text generation loop.

  • run_streamlit(): Run the Streamlit UI.

  • create_chat_completion(messages): Generate text completion for a given chat prompt.

  • _chat(user_input, image_path): Generate text about the given image.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str, optional): Local path of the model.

  • stop_words (list): List of stop words for early stopping.

  • temperature (float): Temperature for sampling.

  • max_new_tokens (int): Maximum number of new tokens to generate.

  • top_k (int): Top-k sampling parameter.

  • top_p (float): Top-p sampling parameter.

  • profiling (bool): Enable timing measurements for the generation process.

  • streamlit (bool): Run the inference in Streamlit UI.

Example Code

from nexa.gguf import NexaVLMInference

model_path = "nanollava"
inference = NexaVLMInference(
    model_path=model_path,
    local_path=None,
    stop_words=[],
    temperature=0.7,
    max_new_tokens=2048,
    top_k=50,
    top_p=1.0,
    profiling=True
)

# run() method
inference.run()

# run_streamlit() method
inference.run_streamlit(model_path)

# create_chat_completion(messages) method
inference.create_chat_completion(
    messages=[{"role": "user", "content": "write a long 1000 word story about a detective"}]
)

# _chat(user_input, image_path) method
inference._chat(user_input="Describe this image in detail.", image_path="path/to/local/image")
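
The create_chat_completion example above does not show how to read the result. The sketch below assumes an OpenAI-style response dict (choices, then message, then content), which is common for llama.cpp-based bindings but is an assumption here; check the actual return type in your SDK version.

from nexa.gguf import NexaVLMInference

inference = NexaVLMInference(model_path="nanollava")

# Assumption: the response follows the OpenAI chat-completion schema.
response = inference.create_chat_completion(
    messages=[{"role": "user", "content": "Describe a typical street scene."}]
)
print(response["choices"][0]["message"]["content"])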

NexaVoiceInference

A class used for loading voice models and running voice transcription.

Methods

  • run(): Run the voice transcription loop.

  • run_streamlit(): Run the Streamlit UI.

  • transcribe(audio_path): Transcribe the audio file into text.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str, optional): Local path of the model.

  • output_dir (str): Output directory for transcriptions.

  • compute_type (str): Type to use for computation (e.g., float16, int8, int8_float16).

  • beam_size (int): Beam size to use for transcription.

  • language (str): The language spoken in the audio (e.g., "en"); if None, the language is detected automatically.

  • task (str): Task to execute (transcribe or translate).

  • temperature (float): Temperature for sampling.

Example Code

from nexa.gguf import NexaVoiceInference

model_path = "faster-whisper-large"
inference = NexaVoiceInference(
    model_path=model_path,
    local_path=None,
    beam_size=5,
    language=None,
    task="transcribe",
    temperature=0.0,
    compute_type="default"
)

# run() method
inference.run()

# run_streamlit() method
inference.run_streamlit(model_path)

# transcribe(audio_path) method
inference.transcribe("path/to/your/audio.wav")
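
To transcribe a folder of recordings, the documented transcribe(audio_path) method can be called in a loop. A minimal sketch, assuming transcriptions are written under output_dir (per the argument list above) rather than returned; the directory path is a placeholder.

from pathlib import Path

from nexa.gguf import NexaVoiceInference

inference = NexaVoiceInference(
    model_path="faster-whisper-large",
    beam_size=5,
    task="transcribe",
)

# Assumption: transcribe() writes its output under output_dir.
for audio_file in sorted(Path("path/to/audio/dir").glob("*.wav")):
    inference.transcribe(str(audio_file))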
