Khoj AI: Self host your personal AI second brain
PLUS: How bad is the model vs memory gap?
Howdy fellas!
Spark and Trouble are back after their khoj for more interesting AI scoops! Let's get right to it then, buckle up.
Here's a sneak peek into today's edition 👀
🔬 Product Lab: Decoding Khoj AI
🪄 Magic and Wonder in the Age of AI
🤏 Interesting techniques to compress LLMs for efficient use
Time to jump in! 🚀
PS: Got thoughts on our content? Share 'em through the quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.
Product Labs 🔬: Decoding Khoj AI
Spark and Trouble are both Marvel fans, and ever since the launch of ChatGPT they've been dreaming of a Jarvis of their own. With each new demo and product, we inch one step closer. For the non-Marvel folks, check out Jarvis in action here!
Now meet Khoj - the open-source AI assistant for your digital brain, yet another leap towards personalized AI assistants.
Product Labs: Decoding the AI Matrix - Khoj AI (source: Created by authors)
What's in it for you?
Spark is a bit offended by the URL of Khoj (https://khoj.dev/).
If you get it, you get it 😵‍💫
Khoj is an AI co-pilot designed to assist you in finding answers to your questions. Whether you need information from your own notes or online sources, Khoj can help. It merges your personal data with fresh online information, making it a valuable tool for knowledge management. You can use Khoj on the cloud, via WhatsApp, or even self-host it in your own environment. It's like having a second brain that understands your context and provides relevant insights!
The biggest highlight of Khoj is that it's Open Source! Yes, you read that right! This means you can always choose to self-host Khoj on your own machine for more privacy.
Khoj has 3 major features:
Chat: This is the OG feature of every AI chatbot. You can ask questions, discuss ideas, and seek information. It's like having a knowledgeable teammate available 24/7. Whether you're brainstorming, need clarification, or want to explore a topic, Khoj is there to assist you. You can also upload your files to Khoj and have its answers grounded in any document from your personal knowledge base. This differentiates it from Copilot, Gemini, and ChatGPT, as none of these lets you maintain a persistent knowledge base of all your files (yet); you can only upload files that are retained for that single conversation.
Following the standard paradigm for accessing commands & options, typing "/" brings up the list of available commands.
The OG chat on Khoj; notice the files uploaded to the left
Agents: Very similar to Microsoft Copilot's GPTs, which we covered in the very first edition of The Vision Debugged (check it out if you missed it). Khoj ships with a few interesting personalities for its Agents: some regular ones like Health (a medical practitioner), Professor (a tenured professor), and Simplify (explains things in simple terms), but also some quirkier ones like Marvin (a depressed robot), Sage (an ancient wise sage), and even a Therapist (a mental health professional).
The versatility of agents is just amazing! 🤯
Automation: This is really cool! You can have Khoj schedule updates straight to your inbox at any frequency: daily, weekly, or even 3 times a week if you please. Khoj offers a few defaults here, such as a Daily Weather Update, Market News, the Front Page of Hacker News, and a Weekly Newsletter. You can edit the default templates to tailor them to your needs, and it's just as simple to create your own.
A genuinely novel feature: setting up automations from any source across the web
The Automation feature is an excellent application of the Hooked framework.
The Hooked Framework is a four-step recipe for building engaging products. It starts with a trigger that prompts users to take action, like needing information or wanting clarity. This action should be effortless and lead to a variable reward, such as a surprising answer or a new perspective. Finally, to solidify the hook, users invest in the product by adding data or completing tasks, making them more likely to return for the cycle to repeat.
The Hooked Framework can be applied to Khoj's Automation feature as follows:
Trigger: Triggers are pre-set times or events configured by the user. Examples include a specific time of day (e.g., 7 AM for a daily weather update), a day of the week (e.g., Sunday for a weekly newsletter), or an event (e.g., market news when the stock market opens).
Action: Users set up the automation by selecting the type of update (e.g., weather, market news), frequency (e.g., daily, weekly), and delivery method (e.g., email, WhatsApp). They can also edit templates or create custom automation.
Variable Reward: Users receive relevant and timely information without manual searching. Updates are personalized based on user settings, offering a unique and valuable reward.
Investment: Users invest time in setting up and customizing automation, increasing the feature's value. This setup effort makes Khoj's updates more relevant, encouraging ongoing engagement and making the feature habit-forming.
Hooked Model (source: Hooked: How to Build Habit-Forming Products)
What's the intrigue?
Let's delve into the intriguing aspects of Khoj compared with other AI assistants like Gemini and Copilot.
Personalization:
Khoj stands out by integrating your personal data, such as notes and context, with online information. This personalized approach allows it to tailor responses based on your unique needs.
In comparison, Gemini and Copilot also aim for personalization, but their focus is more on generating code or text based on context rather than directly merging personal data.
Open Source:
Khoj takes a bold step as an open-source assistant. You can self-host it or use it on their cloud, providing you with more control and flexibility.
Gemini and Copilot, while powerful, remain proprietary. Their usage is tied to specific platforms, limiting customization options.
Knowledge Management:
Khoj serves as a knowledge management tool, helping you organize and retrieve information effectively. It acts as a second brain, understanding your context and providing relevant insights.
Gemini and Copilot excel in generating code and text, but they don't actively manage your personal knowledge base or integrate it with external sources.
Contextual Understanding:
Khoj truly shines in understanding context. Its agents, like Marvin (the depressed robot) or Sage (the ancient wise sage), demonstrate this capability.
Gemini and Copilot focus more on the immediate context within a conversation, providing relevant responses based on the current input.
While Gemini and Copilot are remarkable in their own right, Khoj offers a refreshing blend of personalization, openness, knowledge management, and contextual understanding. Spark and Trouble are eagerly waiting for a Jarvis copilot next 🚀
Whatcha Got There?! 🫣
Buckle up, tech fam! Every week, our dynamic duo "Spark" ✨ & "Trouble" share some seriously cool learning resources we stumbled upon.
✨ Spark's Selections
Trouble's Tidbits
Your Wish, Our Command 🧞
You Asked 🙋‍♀️, We Answered ✍️
Question: LLMs are rapidly outgrowing available GPU memory. What are some interesting techniques that can help us make LLMs more memory efficient?
Answer: Running large language models (LLMs) on edge devices, such as laptops or even microcontrollers, is a highly sought-after capability, especially from a privacy standpoint. However, with LLMs ballooning in size with each new leap, GPU memory isn't keeping pace. This creates a bottleneck for researchers and developers.
Model size vs. accelerator memory, up to 2022
To address this, folks have come up with several clever approaches. Below, we give you an intuition for some of these techniques, with minimal code sketches right after the list:
Model Pruning: Pruning removes unnecessary parameters from a pre-trained model. By identifying and trimming the least important connections, we can significantly shrink the model while largely maintaining performance (see the pruning sketch below).
Quantization: Quantization reduces the precision of model weights and activations. Instead of storing 32-bit floating-point numbers, we can use 8-bit integers, cutting memory by 4x and speeding up inference (see the quantization sketch below).
Knowledge Distillation: Transfer knowledge from a large pre-trained model (the "teacher") to a smaller model (the "student"). The student learns to match the teacher's predictions, resulting in a more compact yet effective model (see the distillation sketch below).
Mixture of Experts (MoE): MoE uses multiple specialized small sub-networks (experts) that collaborate to solve a problem, with each expert handling a specific subset of inputs. By dynamically routing each input to the most relevant expert, MoE reduces redundancy and keeps per-token compute low (see the MoE sketch below).
Small Language Models: While large models grab the headlines, smaller language models like LLaMA, Alpaca, Vicuna, Mistral, and Phi are essential for practical deployment. They can fit within GPU memory constraints, making them accessible for a wide range of applications.
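To build intuition for pruning, here's a minimal PyTorch sketch. The tiny two-layer network and the 30% sparsity level are our own illustrative choices, not from any specific system; real LLM pruning operates over transformer layers with far more care.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A tiny stand-in network; a real LLM would have many transformer layers
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Zero out the 30% of weights with the smallest magnitude in each Linear layer
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Fraction of parameters that are now exactly zero
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"Sparsity: {zeros / total:.1%}")
```

Note that the zeroed weights only save memory once they're stored in a sparse format or skipped by sparsity-aware hardware; the pruning step just identifies what's safe to drop.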
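For quantization, here's a self-contained sketch of symmetric int8 quantization of a weight matrix. The single per-tensor scale is the simplest possible scheme; production systems typically quantize per-channel or per-group for better accuracy.

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Map float32 values onto signed int8 levels using one scale per tensor
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)      # float32 weight matrix: 64 MiB
q, scale = quantize_int8(w)      # int8 version: 16 MiB, a 4x saving
error = (w - dequantize(q, scale)).abs().mean()
print(f"Mean absolute rounding error: {error:.6f}")
```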
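Knowledge distillation usually boils down to one extra loss term: the student is trained to match the teacher's softened output distribution alongside the usual label loss. A minimal sketch, where the temperature T and mixing weight alpha are tunable hyperparameters we picked purely for illustration:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student outputs
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # Hard targets: ordinary cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy batch: 8 examples over a 100-token vocabulary
teacher_logits = torch.randn(8, 100)                      # frozen teacher outputs
student_logits = torch.randn(8, 100, requires_grad=True)  # trainable student
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()  # gradients flow only into the student
print(f"Distillation loss: {loss.item():.3f}")
```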
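And here's a bare-bones top-1 routed MoE layer to show the core idea: a router scores the experts, and each token runs through only the single expert it's routed to. The dimensions and expert count are arbitrary toy values.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal top-1 routed mixture-of-experts layer (a sketch, not production code)."""
    def __init__(self, dim: int = 64, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)   # routing probabilities
        top_gate, top_idx = gates.max(dim=-1)    # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                  # tokens routed to expert i
            if mask.any():
                out[mask] = top_gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64]); only 1 of 4 experts ran per token
```

Only the router and one expert's weights are exercised per token, which is how MoE models grow total capacity without growing per-token compute.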
If you're really curious & wish to tinker around, check out TinyChatEngine - an on-device LLM inference library that implements several intricate LLM compression techniques, letting you run models like LLaMA efficiently even on your laptop! Crazy, isn't it? 🚀
Well, that's a wrap! Until then,