Apple MLX

MLX

Use MLX for

  • Running large language models locally on Apple Silicon hardware (M1, M2, M3 ARM chips with unified CPU/GPU memory)

1. Install MLX on macOS

Mac M series (Apple Silicon) only

MLX supports GPU acceleration through the Apple Metal backend via the mlx-lm Python package. Follow the instructions at Install mlx-lm package.
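
For reference, installing the package is typically a single pip command (a sketch, assuming a working Python environment on Apple Silicon):

# Installs the MLX language-model tooling, including the mlx_lm.server command used below
pip install mlx-lm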

2. Load Models with MLX

MLX supports common HuggingFace models directly, but it is recommended to use the converted and tested quantized models provided by the mlx-community (choose a quantization that fits your hardware).

  1. Browse the available models on HuggingFace
  2. Copy the model name from the model page in the form <author>/<model_id> (e.g., mlx-community/Meta-Llama-3-8B-Instruct-4bit)
  3. Check the model size. Models that fit entirely in the unified CPU/GPU memory perform best.
  4. Follow the instructions to launch the model server: Run OpenAI Compatible Server Locally

mlx_lm.server --model <author>/<model_id>
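
For example, to serve the 4-bit Llama-3 model mentioned above (a sketch; mlx_lm.server listens on localhost port 8080 by default, so the --port flag shown here is optional):

# Downloads the model on first use and serves it over an OpenAI-compatible API
mlx_lm.server --model mlx-community/Meta-Llama-3-8B-Instruct-4bit --port 8080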

3. Configure LibreChat

Use the librechat.yaml configuration file (guide here) to add MLX as a separate endpoint; an example with Llama-3 is provided below.
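
Below is a minimal sketch of the custom endpoint entry, assuming the mlx_lm.server instance from step 2 is listening at http://localhost:8080/v1/ and serving the Llama-3 model; adjust the baseURL, model name, and stop token to match your setup:

# librechat.yaml (excerpt) -- register the local MLX server as a custom endpoint
endpoints:
  custom:
    - name: "MLX"
      apiKey: "mlx"                        # placeholder; the local server does not check it
      baseURL: "http://localhost:8080/v1/"
      models:
        default:
          - "mlx-community/Meta-Llama-3-8B-Instruct-4bit"
        fetch: false                       # use the hard-coded list above instead of querying the server
      titleConvo: true
      titleModel: "current_model"
      modelDisplayLabel: "Apple MLX"
      addParams:
        max_tokens: 2000
        stop:
          - "<|eot_id|>"                   # Llama-3 end-of-turn token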