I am currently working on the interpretation of LLMs: how do we construct the structure of an LLM? I am also interested in the local deployment of LLMs, that is, an offline version of ChatGPT. Finally, how do we use RAG (retrieval-augmented generation) to improve not only the accuracy but also the richness of expression of an LLM?