
GGUF Interface

NexaTextInference

A class used for loading text models and running text generation.

Methods

  • run(): Run the text generation loop.

  • run_streamlit(): Run the Streamlit UI.

  • create_embedding(input): Embed a string.

  • create_chat_completion(messages): Generate completion for a chat conversation.

  • create_completion(prompt): Generate completion for a given prompt.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str, optional): Local path of the model.

  • embedding (bool): Enable embedding generation.

  • stop_words (list): List of stop words for early stopping.

  • temperature (float): Temperature for sampling.

  • max_new_tokens (int): Maximum number of new tokens to generate.

  • top_k (int): Top-k sampling parameter.

  • top_p (float): Top-p sampling parameter.

  • profiling (bool): Enable timing measurements for the generation process.

  • streamlit (bool): Run the inference in Streamlit UI.

Example Code

from nexa.gguf import NexaTextInference

model_path = "llama2"
inference = NexaTextInference(
    model_path=model_path,
    local_path=None,
    stop_words=[],
    temperature=0.7,
    max_new_tokens=512,
    top_k=50,
    top_p=0.9,
    profiling=True
)

# run() method
inference.run()

# run_streamlit() method
inference.run_streamlit(model_path)

# create_embedding(input) method
inference.create_embedding("Hello, world!")

# create_chat_completion(messages)
inference.create_chat_completion(
    messages=[{"role": "user", "content": "write a long 1000 word story about a detective"}]
)

# create_completion(prompt)
inference.create_completion("Q: Name the planets in the solar system? A:")
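
The create_embedding example above returns a vector but does not show how to use it. Below is a minimal, hypothetical sketch of comparing two embeddings with cosine similarity. It assumes that embedding=True must be passed at construction (suggested by the embedding argument above) and that create_embedding returns a flat list of floats; verify both against your SDK version.

from nexa.gguf import NexaTextInference

# Assumption: embedding=True at construction enables create_embedding.
embedder = NexaTextInference(model_path="llama2", embedding=True)

# Assumption: each call returns a flat list of floats.
vec_a = embedder.create_embedding("The cat sat on the mat.")
vec_b = embedder.create_embedding("A feline rested on the rug.")

# Cosine similarity, computed without external dependencies.
dot = sum(a * b for a, b in zip(vec_a, vec_b))
norm_a = sum(a * a for a in vec_a) ** 0.5
norm_b = sum(b * b for b in vec_b) ** 0.5
print("cosine similarity:", dot / (norm_a * norm_b))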

NexaImageInference

A class used for loading image models and running image generation.

Methods

  • txt2img(prompt): Generate images from text.

  • img2img(image_path, prompt): Generate new images from an input image, guided by a text prompt.

  • run_txt2img(): Run the text-to-image generation loop.

  • run_img2img(): Run the image-to-image generation loop.

  • run_streamlit(): Run the Streamlit UI.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str, optional): Local path of the model.

  • output_path (str): Output path for the generated image.

  • num_inference_steps (int): Number of inference steps.

  • width (int): Width of the output image.

  • height (int): Height of the output image.

  • guidance_scale (float): Guidance scale for diffusion.

  • random_seed (int): Random seed for image generation.

  • streamlit (bool): Run the inference in Streamlit UI.

Example Code

from nexa.gguf import NexaImageInference

model_path = "lcm-dreamshaper"
inference = NexaImageInference(
    model_path=model_path,
    local_path=None,
    num_inference_steps=4,
    width=512,
    height=512,
    guidance_scale=1.0,
    random_seed=0,
)

# txt2img(prompt) method
inference.txt2img("a lovely cat")

# img2img(image_path, prompt) method
inference.img2img(image_path="path/to/local/image", prompt="blue sky")

# run_txt2img() method
inference.run_txt2img()

# run_img2img() method
inference.run_img2img()

# run_streamlit() method
inference.run_streamlit(model_path)
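
Because random_seed is a constructor argument in the list above, one way to produce several variants of the same prompt is to re-instantiate with different seeds. This is a hypothetical sketch based only on the documented arguments; if your SDK version exposes a per-call seed, prefer that, since rebuilding the pipeline per image is wasteful.

from nexa.gguf import NexaImageInference

# Assumption: varying random_seed at construction yields different variants,
# and each generated image is written to the configured output path.
for seed in (0, 1, 2):
    variant = NexaImageInference(
        model_path="lcm-dreamshaper",
        num_inference_steps=4,
        width=512,
        height=512,
        guidance_scale=1.0,
        random_seed=seed,
    )
    variant.txt2img("a lovely cat")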

NexaVLMInference

A class used for loading vision-language models (VLMs) and running text generation.

Methods

  • run(): Run the text generation loop.

  • run_streamlit(): Run the Streamlit UI.

  • create_chat_completion(messages): Generate text completion for a given chat prompt.

  • _chat(user_input, image_path): Generate text about the given image.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str, optional): Local path of the model.

  • stop_words (list): List of stop words for early stopping.

  • temperature (float): Temperature for sampling.

  • max_new_tokens (int): Maximum number of new tokens to generate.

  • top_k (int): Top-k sampling parameter.

  • top_p (float): Top-p sampling parameter.

  • profiling (bool): Enable timing measurements for the generation process.

  • streamlit (bool): Run the inference in Streamlit UI.

Example Code

from nexa.gguf import NexaVLMInference

model_path = "nanollava"
inference = NexaVLMInference(
    model_path=model_path,
    local_path=None,
    stop_words=[],
    temperature=0.7,
    max_new_tokens=2048,
    top_k=50,
    top_p=1.0,
    profiling=True
)

# run() method
inference.run()

# run_streamlit() method
inference.run_streamlit(model_path)

# create_chat_completion(messages) method
inference.create_chat_completion(
    messages=[{"role": "user", "content": "write a long 1000 word story about a detective"}]
)

# _chat(user_input, image_path) method
inference._chat(user_input="Describe this image in detail.", image_path="path/to/local/image")
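
The create_chat_completion example above does not show how to read the result. The sketch below assumes an OpenAI-style response dict (choices, then message, then content), which is common for llama.cpp-based bindings but is an assumption here; check the actual return type in your SDK version.

from nexa.gguf import NexaVLMInference

inference = NexaVLMInference(model_path="nanollava")

# Assumption: the response follows the OpenAI chat-completion schema.
response = inference.create_chat_completion(
    messages=[{"role": "user", "content": "Describe a typical street scene."}]
)
print(response["choices"][0]["message"]["content"])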

NexaVoiceInference

A class used for loading voice models and running voice transcription.

Methods

  • run(): Run the voice transcription loop.

  • run_streamlit(): Run the Streamlit UI.

  • transcribe(audio_path): Transcribe the audio file into text.

Arguments

  • model_path (str): Path or identifier for the model in Nexa Model Hub.

  • local_path (str, optional): Local path of the model.

  • output_dir (str): Output directory for transcriptions.

  • compute_type (str): Type to use for computation (e.g., float16, int8, int8_float16).

  • beam_size (int): Beam size to use for transcription.

  • language (str): The language spoken in the audio (e.g., "en"); if None, the language is detected automatically.

  • task (str): Task to execute (transcribe or translate).

  • temperature (float): Temperature for sampling.

Example Code

from nexa.gguf import NexaVoiceInference

model_path = "faster-whisper-large"
inference = NexaVoiceInference(
    model_path=model_path,
    local_path=None,
    beam_size=5,
    language=None,
    task="transcribe",
    temperature=0.0,
    compute_type="default"
)

# run() method
inference.run()

# run_streamlit() method
inference.run_streamlit(model_path)

# transcribe(audio_path) method
inference.transcribe("path/to/your/audio.wav")
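
To transcribe a folder of recordings, the documented transcribe(audio_path) method can be called in a loop. A minimal sketch, assuming transcriptions are written under output_dir (per the argument list above) rather than returned; the directory path is a placeholder.

from pathlib import Path

from nexa.gguf import NexaVoiceInference

inference = NexaVoiceInference(
    model_path="faster-whisper-large",
    beam_size=5,
    task="transcribe",
)

# Assumption: transcribe() writes its output under output_dir.
for audio_file in sorted(Path("path/to/audio/dir").glob("*.wav")):
    inference.transcribe(str(audio_file))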
