This demo runs a local RAG stack: Qwen3 4B Instruct quantized to Q4_K_M via llama.cpp, with MiniLM embeddings and a FAISS index, entirely on CPU with no external inference calls. Responses may take a few seconds since this runs on a free CPU Space.
The assistant uses a lightweight quantized 4B model for cost-effective self-hosting, so responses may occasionally be imprecise or hallucinate details not present in Bi's actual CV.
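The retrieval side of a stack like this is simple: CV chunks are embedded once, and each query is embedded and matched against them by cosine similarity before the top chunks are handed to the model as context. The sketch below illustrates that step with plain NumPy standing in for FAISS's `IndexFlatIP`; the toy 4-dimensional vectors, chunk texts, and the `top_k_chunks` helper are illustrative assumptions, not the Space's actual code (real MiniLM embeddings are 384-dimensional).

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=3):
    # Equivalent of a FAISS IndexFlatIP over normalized vectors:
    # cosine similarity via inner product, then take the k best chunks.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(scores)[::-1][:k]

# Toy stand-ins for embedded CV chunks (real MiniLM vectors are 384-dim).
chunks = ["skills section", "work experience", "recent projects", "education"]
vecs = np.eye(4)
query = np.array([0.1, 0.9, 0.0, 0.0])  # pretend embedding of "work experience?"

idx = top_k_chunks(query, vecs, k=2)
context = "\n".join(chunks[i] for i in idx)
```

The retrieved `context` is then prepended to the user's question in the prompt sent to the quantized model, which is what keeps answers grounded in the CV rather than the model's parametric memory.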
Start with a suggested prompt or type your own question to explore the portfolio.
What are his technical skills and areas of expertise?
Tell me about his work experience.
What projects has he worked on recently?
What are his strengths?