# LlamaEdge
LlamaEdge is the easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge.
- Lightweight inference apps. LlamaEdge is in MBs instead of GBs
- Native and GPU accelerated performance
- Supports many GPU and hardware accelerators
- Supports many optimized inference libraries
- Wide selection of AI / LLM models
 
## Installation and Setup
See the installation instructions.
## Chat models
See a usage example.
```python
from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
```
API Reference: LlamaEdgeChatService