Replies: 2 comments
-
Hi there, before any fine-tuning I would first consider RAG (without and then with quantization), because the answers are then strictly based on the documents you provide. After RAG, I'd continue pretraining and/or fine-tune and see whether it makes things better or worse.
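Just to illustrate the RAG-first idea, here's a minimal sketch of a fully local pipeline. This is one possible stack (llama-cpp-python plus sentence-transformers), and the model filename, chunks, and paths are placeholders, not anything from the original post:

```python
# Minimal local RAG sketch: retrieve relevant course chunks, then answer with a small local model.
# Assumes llama-cpp-python and sentence-transformers are installed; file paths are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer
from llama_cpp import Llama

# Pre-chunked course text (in practice, split the textbook into ~200-500 token passages).
course_chunks = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "Mitochondria are the site of cellular respiration.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model, runs on CPU
chunk_vecs = embedder.encode(course_chunks, normalize_embeddings=True)

# 4-bit quantized GGUF model (placeholder filename).
llm = Llama(model_path="llama-3-3b-q4_k_m.gguf", n_ctx=2048)

def answer(question: str, top_k: int = 2) -> str:
    # Cosine similarity reduces to a dot product on normalized vectors.
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec
    context = "\n".join(course_chunks[i] for i in np.argsort(scores)[::-1][:top_k])
    prompt = (
        "Answer using only the context below. If the answer is not in the context, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt, max_tokens=256)["choices"][0]["text"].strip()

print(answer("Where does cellular respiration happen?"))
```

The same retrieval step works whether or not you later fine-tune: you can compare answer quality with and without the retrieved context before investing in training.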
-
Yes, it's definitely possible to run an offline learning app on mid-range Android phones (3-8 GB RAM). The trick is to use a small, efficient model: models in the 1-3B range, quantized to 4-bit (like Llama-3-3B), fit comfortably in that memory budget. The smart setup is: ship one base model once, then add tiny LoRA adapters (course packs, just a few MB each). Use a local retrieval system (RAG) so the model always refers to the actual textbook instead of guessing. Run it all with llama.cpp.
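As a rough sketch of what "one base model + per-course LoRA adapters" could look like, using llama-cpp-python as a stand-in for a native llama.cpp integration (on Android you'd more likely call llama.cpp through JNI/NDK). The model and adapter filenames below are placeholders:

```python
# Sketch: one shared 4-bit base model plus a per-course LoRA adapter, loaded fully offline.
# Paths are placeholders; a real Android app would bundle these files in its local storage.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-3b-q4_k_m.gguf",  # ~2 GB base model, shipped once
    lora_path="packs/grade9_physics_lora.gguf",  # few-MB adapter downloaded per course pack
    n_ctx=2048,    # modest context window to keep RAM usage low on 3-8 GB phones
    n_threads=4,   # match the phone's performance cores
)

out = llm(
    "Explain Newton's second law with an example from the Grade 9 course.",
    max_tokens=200,
)
print(out["choices"][0]["text"])
```

Swapping course packs then just means pointing `lora_path` at a different adapter file, while the large base model stays on disk unchanged.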
-
Hello,
I am exploring the development of an offline educational mobile app for students in areas where internet access is unreliable or unavailable.
The app would allow students (Grade 6 to university) to download the course material for a single school year.
Each pack would include a small LLM model (or adapter) that runs fully offline on mid-range Android smartphones.
Once downloaded, the app should work 100% offline (no cloud access required), with good performance and minimal latency.
I want the LLM to be able to answer questions based on the course material and help students solve exercises, with minimal hallucination.
My questions:
Is this technically feasible on the mid-range smartphones typical in these countries (3-8 GB RAM, ~128-256 GB storage)?
Which model architecture strategy (quantization, LoRA adapters, small fine-tuned model, etc.) would you recommend for this use case?
Thanks.