This demo runs a local RAG stack: Qwen3 4B Instruct quantized to Q4_K_M via llama.cpp, with MiniLM embeddings and a FAISS index, entirely on CPU with no external inference calls. Responses may take a few seconds since this runs on a free CPU Space.
The assistant uses a lightweight quantized 4B model for cost-effective self-hosting, so responses may occasionally be imprecise or hallucinate details not present in Bi's actual CV.
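The retrieval side of a stack like this is simple: CV chunks are embedded once, and each query is embedded and matched against them by cosine similarity before the top chunks are handed to the model as context. The sketch below illustrates that step with plain NumPy standing in for FAISS's `IndexFlatIP`; the toy 4-dimensional vectors, chunk texts, and the `top_k_chunks` helper are illustrative assumptions, not the Space's actual code (real MiniLM embeddings are 384-dimensional).

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=3):
    # Equivalent of a FAISS IndexFlatIP over normalized vectors:
    # cosine similarity via inner product, then take the k best chunks.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(scores)[::-1][:k]

# Toy stand-ins for embedded CV chunks (real MiniLM vectors are 384-dim).
chunks = ["skills section", "work experience", "recent projects", "education"]
vecs = np.eye(4)
query = np.array([0.1, 0.9, 0.0, 0.0])  # pretend embedding of "work experience?"

idx = top_k_chunks(query, vecs, k=2)
context = "\n".join(chunks[i] for i in idx)
```

The retrieved `context` is then prepended to the user's question in the prompt sent to the quantized model, which is what keeps answers grounded in the CV rather than the model's parametric memory.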
Start with a suggested prompt or type your own question to explore the portfolio.
What are his technical skills and areas of expertise?
Tell me about his work experience.
What projects has he worked on recently?
What are his strengths?