Overview
This solution architecture shows how to build an LLM app powered by Snowpark Container Services and NVIDIA NeMo Inference Microservice (NIM):
- Download an open-source foundation model such as Mistral-7B-Instruct from Hugging Face (see the download sketch after this list)
- Shrink the model so it fits on a smaller GPU (A10G, Snowflake compute pool family GPU_NV_M) for inference; a quantization sketch follows the list
- Generate a new, optimized model using the model generator in the NIM container on Snowpark Container Services
- Publish the Mistral inference app as an internal Snowflake Native Application that uses Streamlit for the app UI (a minimal UI sketch follows this list)
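
A minimal sketch of the download step, assuming the `huggingface_hub` client and the `mistralai/Mistral-7B-Instruct-v0.1` repo id (the exact model revision used here may differ):

```python
# Sketch: download Mistral-7B-Instruct weights from Hugging Face.
# Assumes the `huggingface_hub` package and the mistralai/Mistral-7B-Instruct-v0.1
# repo id; gated repos may require `huggingface-cli login` first.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.1",  # assumed model id
    local_dir="models/mistral-7b-instruct",        # local staging directory
)
print(f"Model downloaded to {local_path}")
```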
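The actual shrinking in this architecture is done by the NIM model generator, but the idea can be illustrated with 4-bit loading via `transformers` and `bitsandbytes`. This is a sketch under the assumption that the weights sit in `models/mistral-7b-instruct`, not the generator's own conversion:

```python
# Sketch: load the model with 4-bit weights so it fits in A10G-class memory.
# Illustrates the shrinking idea only; the NIM model generator performs its
# own optimized conversion inside the container.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits
    bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
)
tokenizer = AutoTokenizer.from_pretrained("models/mistral-7b-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "models/mistral-7b-instruct",
    quantization_config=quant_config,
    device_map="auto",                     # place layers on the available GPU
)
```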
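And a hedged sketch of the Streamlit UI calling the NIM inference service. The endpoint URL, model name, and request schema below are assumptions (NIM typically exposes an OpenAI-compatible completions route); the published app would call the service endpoint defined in its Snowpark Container Services spec:

```python
# Sketch: minimal Streamlit UI that sends a prompt to the NIM container.
# The endpoint URL and JSON fields are assumptions; adjust them to match
# the service endpoint exposed by your container spec.
import requests
import streamlit as st

NIM_ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical endpoint

st.title("Mistral Inference App")
prompt = st.text_area("Prompt", "Explain Snowpark Container Services in one paragraph.")

if st.button("Generate"):
    resp = requests.post(
        NIM_ENDPOINT,
        json={
            "model": "mistral-7b-instruct",  # assumed served-model name
            "prompt": prompt,
            "max_tokens": 256,
        },
        timeout=120,
    )
    resp.raise_for_status()
    st.write(resp.json()["choices"][0]["text"])
```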