Khoj AI: Self host your personal AI second brain

PLUS: How bad is the model vs memory gap?

Howdy fellas!

Spark and Trouble are back after their khoj for more interesting AI scoops! Let’s get right to it then, buckle up.

Here’s a sneak peek into today’s edition 👀

  • 🔎 Product Lab: Decoding Khoj AI

  • 🪄 Magic and Wonder in the Age of AI

  • 🤏 Interesting techniques to compress LLMs for efficient use

Time to jump in!😄

PS: Got thoughts on our content? Share 'em through a quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.

Product Labs🔬: Decoding Khoj AI

Spark and Trouble are both Marvel fans and, ever since the launch of ChatGPT, have been dreaming of a Jarvis. With each new demo and product, we inch one step closer to having our own Jarvis. For the non-Marvel folks, check out Jarvis in action here!

Now meet Khoj - the open-source AI assistant for your digital brain, yet another leap towards personalized AI assistants.

Product Labs: Decoding the AI Matrix - Khoj AI (source: Created by authors)

What’s in it for you?

Spark is a bit offended by the URL of Khoj (
If you get it you get it 😵‍💫

Khoj is an AI co-pilot designed to assist you in finding answers to your questions. Whether you need information from your own notes or online sources, Khoj can help. It merges your personal data with fresh online information, making it a valuable tool for knowledge management. You can use Khoj on the cloud, via WhatsApp, or even self-host it in your own environment. It’s like having a second brain that understands your context and provides relevant insights!

The biggest highlight of Khoj is that it’s Open Source! Yes, you read that right! This means you can always choose to self-host Khoj on your own machine for more privacy.

Khoj has 3 major features:

  • Chat: This is the OG feature of every AI chatbot. You can ask questions, discuss ideas, and seek information. It’s like having a knowledgeable teammate available 24/7. Whether you’re brainstorming, need clarification, or want to explore a topic, Khoj is there to assist you. You can also upload your files to Khoj and have answers grounded in any document in your personal knowledge base. This sets it apart from Copilot, Gemini, and ChatGPT, none of which let you maintain a knowledge base of all your files (yet). With those assistants, an uploaded file is retained only for that one conversation.

    Following the standard convention for commands & options, typing a simple “/” shows you a list of available commands.

    The OG chat on Khoj; notice the files uploaded to the left

  • Agents: Very similar to Microsoft Copilot’s GPTs, which we covered in the very first edition of The Vision Debugged (check it out if you missed it). Khoj actually has a few interesting personalities for its Agents. There are regular ones like Health (a medical practitioner), Professor (a tenured professor) and Simplify (explains things in simple terms), but also some real upbeat ones like Marvin (a depressed robot), Sage (an ancient wise sage) and even a Therapist (a mental health professional).

The versatility of agents is just amazing! 🤯

  • Automation: This is really cool! You can set up Khoj to send scheduled updates right to your inbox at any frequency: daily, weekly, or even 3 times a week if you please. Here too, Khoj offers a few defaults, such as Daily Weather Update, Market News, the Front Page of Hacker News, and a Weekly Newsletter. You can also edit the default templates to tailor them to your needs, and it’s equally simple to create your own.

Truly a genius, novel feature: set up automations that pull from any source across the web
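The Chat feature can ground answers in your own documents. To build some intuition for how that kind of grounding can work, here is a toy retrieval sketch in Python. It is not Khoj's implementation. We assume a tiny in-memory dict of notes (`NOTES`, hypothetical data) and use plain bag-of-words cosine similarity, whereas real assistants typically use learned embeddings. The helper names `tokenize`, `cosine`, `retrieve` and `build_prompt` are our own.

```python
import math
import re
from collections import Counter

# Hypothetical toy "personal knowledge base": file name -> note text.
NOTES = {
    "groceries.md": "Buy oat milk, eggs and coffee beans on Saturday.",
    "trip.md": "Flight to Lisbon departs Friday at 9 AM from terminal 2.",
    "ideas.md": "Newsletter idea: compare quantization and pruning for LLMs.",
}


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts (assumption: stands in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, notes: dict[str, str], k: int = 1) -> list[str]:
    """Return the names of the k notes most similar to the query."""
    q = tokenize(query)
    ranked = sorted(notes, key=lambda name: cosine(q, tokenize(notes[name])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, notes: dict[str, str]) -> str:
    """Prepend the best-matching note(s) so the LLM answers from them."""
    context = "\n".join(f"[{name}] {notes[name]}" for name in retrieve(query, notes))
    return f"Answer using only these notes:\n{context}\n\nQuestion: {query}"


print(retrieve("When does my flight to Lisbon leave?", NOTES))  # ['trip.md']
```

The prompt from `build_prompt` would then go to whichever LLM you use. Swapping `tokenize` for an embedding model and the dict for a vector store is the usual next step toward a real "second brain".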

The Automation feature is an excellent application of the Hooked framework.

The Hooked Framework is a four-step recipe for building engaging products. It starts with a trigger that prompts users to take action, like needing information or wanting clarity. This action should be effortless and lead to a variable reward, such as a surprising answer or a new perspective. Finally, to solidify the hook, users invest in the product by adding data or completing tasks, making them more likely to return for the cycle to repeat.

The Hooked Framework can be applied to Khoj’s Automation feature as follows:

  1. Trigger: Triggers are pre-set times or events configured by the user. Examples include a specific time of day (e.g., 7 AM for a daily weather update), a day of the week (e.g., Sunday for a weekly newsletter), or an event (e.g., market news when the stock market opens).

  2. Action: Users set up the automation by selecting the type of update (e.g., weather, market news), frequency (e.g., daily, weekly), and delivery method (e.g., email, WhatsApp). They can also edit templates or create custom automation.

  3. Variable Reward: Users receive relevant and timely information without manual searching. Updates are personalized based on user settings, offering a unique and valuable reward.

  4. Investment: Users invest time in setting up and customizing automation, increasing the feature’s value. This setup effort makes Khoj’s updates more relevant, encouraging ongoing engagement and making the feature habit-forming.

Hooked Model (source: Hooked: How to Build Habit-Forming Products)

What’s the intrigue?

Let’s delve into the intriguing aspects of Khoj compared with other AI assistants like Gemini and Copilot.

  1. Personalization:

    • Khoj stands out by integrating your personal data, such as notes and context, with online information. This personalized approach allows it to tailor responses based on your unique needs.

    • In comparison, Gemini and Copilot also aim for personalization, but their focus is more on generating code or text based on context rather than directly merging personal data.

  2. Open Source:

    • Khoj takes a bold step as an open-source assistant. You can self-host it or use it on their cloud, providing you with more control and flexibility.

    • Gemini and Copilot, while powerful, remain proprietary. Their usage is tied to specific platforms, limiting customization options.

  3. Knowledge Management:

    • Khoj serves as a knowledge management tool, helping you organize and retrieve information effectively. It acts as a second brain, understanding your context and providing relevant insights.

    • Gemini and Copilot excel in generating code and text, but they don’t actively manage your personal knowledge base or integrate it with external sources.

  4. Contextual Understanding:

    • Khoj truly shines in understanding context. Its agents, like Marvin (the depressed robot) or Sage (the ancient wise sage), demonstrate this capability.

    • Gemini and Copilot focus more on the immediate context within a conversation, providing relevant responses based on the current input.

While Gemini and Copilot are remarkable in their own right, Khoj offers a refreshing blend of personalization, openness, knowledge management, and contextual understanding. Spark and Trouble are eagerly waiting for a Jarvis copilot next 😋

Whatcha Got There?!🫣

Buckle up, tech fam! Every week, our dynamic duo “Spark” & “Trouble”😉 share some seriously cool learning resources we stumbled upon.

Spark’s Selections

😉 Trouble’s Tidbits

Your Wish, Our Command 🙌

You Asked 🙋‍♀️, We Answered ✔️

Question: LLMs are rapidly outgrowing available GPU memory. What are some interesting techniques that can help us make LLMs more memory efficient?

Answer: Running large language models (LLMs) on edge devices, such as laptops or even microcontrollers, is a highly sought-after application, especially from a privacy angle. However, LLMs balloon in size with each new leap, and GPU memory capacity isn't keeping pace. This creates a bottleneck for researchers and developers.

Model size vs. accelerator memory, till 2022
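To make the gap concrete, here is a back-of-the-envelope Python sketch. It assumes a hypothetical 7-billion-parameter model and counts only the weights (parameters × bytes per parameter). It ignores activations, the KV cache and runtime overhead, so real memory use is higher.

```python
# Bytes used to store a single parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Activations, the KV cache and framework overhead are ignored,
    so this is a lower bound on what inference actually needs.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# A hypothetical 7B-parameter model at each precision.
for precision in BYTES_PER_PARAM:
    print(f"7B model @ {precision}: {weight_memory_gb(7e9, precision):.1f} GB")
```

At fp32, the weights alone take 28 GB, which is more memory than many consumer GPUs have. At 4 bits, they need about 3.5 GB. That difference is why the techniques below matter.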

To address this, folks have come up with several clever approaches. Below, we give you an intuition about some of these techniques:

  1. Model Pruning: Pruning involves removing unnecessary parameters from a pre-trained model. By identifying and trimming less important connections, we can significantly reduce the model size while largely preserving performance.

  2. Quantization: Quantization reduces the precision of model weights and activations. Instead of 32-bit floating-point numbers, we can use 8-bit integers, which cuts the memory needed for the weights by 4x and can also speed up inference.

  3. Knowledge Distillation: This technique transfers knowledge from a large pre-trained model (the “teacher”) to a smaller model (the “student”). The student model learns to mimic the teacher’s predictions, resulting in a more compact yet effective model.

  4. Mixture of Experts (MoE): MoE is a technique where multiple specialized sub-networks (experts) collaborate to solve a problem, each handling a specific subset of inputs. A router dynamically selects only the most relevant expert(s) for each input. As a result, the model does much less computation per token than a dense model with the same total number of parameters.

  5. Small Language Models: While large models grab headlines, smaller language models like LLaMA, Alpaca, Vicuna, Mistral, Phi, etc. are essential for practical deployment. Smaller models can fit within GPU memory constraints, making them accessible for various applications.
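For item 1 (model pruning), here is a minimal Python sketch of one common criterion, unstructured magnitude pruning. The weights with the smallest absolute values are treated as least important and set to zero. The helper `magnitude_prune` and the toy `layer` values are our own illustration. Real pruning operates on whole weight tensors, and the zeros only save memory when they are stored in a sparse format or when entire structures (rows, attention heads) are removed.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune weights with the smallest absolute value.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(by_magnitude[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]  # hypothetical toy weights
print(magnitude_prune(layer, 0.5))  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

In practice, pruned models are usually fine-tuned for a while afterwards to recover some of the accuracy lost by removing weights.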
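For item 2 (quantization), here is a minimal Python sketch of one simple scheme, symmetric per-tensor int8 quantization. Every weight shares a single scale, `scale = max|w| / 127`, and is rounded to an integer in [-127, 127]. This is an illustrative assumption, not any particular library's method. Production LLM quantizers typically use per-channel or per-group scales and often go down to 4 bits.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the int8 range.
    # (A real kernel would store these as 1-byte ints instead of 4-byte floats.)
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integers and the shared scale."""
    return [v * scale for v in q]


weights = [0.52, -1.27, 0.003, 0.9]  # hypothetical toy weights
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q)  # [52, -127, 0, 90]
```

Each weight now needs 1 byte instead of 4, plus a single shared scale. The price is a rounding error of at most half a scale step per weight, which is why the tiny value 0.003 collapses to 0.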
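For item 3 (knowledge distillation), here is a minimal Python sketch of the classic soft-target loss. Both models' logits are softened with a temperature T, and the student is penalised by the KL divergence from the teacher's distribution, scaled by T² as in Hinton et al.'s formulation. The logit values and T = 2.0 are hypothetical. In practice, this term is usually combined with the ordinary cross-entropy loss on the true labels.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]  # hypothetical teacher logits for 3 classes
print(distillation_loss([3.5, 1.2, 0.4], teacher))  # small: student mimics teacher
print(distillation_loss([0.1, 3.0, 2.0], teacher))  # larger: student disagrees
```

The softened teacher distribution also carries information about which wrong answers are "almost right", and that information is part of what helps a smaller student learn efficiently.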
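For item 4 (Mixture of Experts), here is a toy top-k routing sketch in Python. Everything in it is a hypothetical stand-in for illustration: the four scalar "experts", the per-expert `GATE_WEIGHTS` router, and the choice of k = 2. In real MoE LLMs, the experts are feed-forward sub-networks and the router is a learned layer applied to each token.

```python
import math

# Hypothetical toy setup: each "expert" is a scalar function standing in for
# a feed-forward sub-network, and the router is one weight per expert.
EXPERTS = [
    lambda x: 2.0 * x,   # expert 0
    lambda x: x + 10.0,  # expert 1
    lambda x: -x,        # expert 2
    lambda x: x * x,     # expert 3
]
GATE_WEIGHTS = [0.5, -0.2, 0.1, 0.3]  # hypothetical "learned" router weights


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, k: int = 2) -> tuple[float, list[int]]:
    """Route x to its top-k experts; return the mixed output and the experts used."""
    # 1. The router scores every expert for this particular input.
    router_logits = [g * x for g in GATE_WEIGHTS]
    # 2. Keep only the k highest-scoring experts; the others are never executed.
    top_k = sorted(range(len(EXPERTS)), key=lambda i: router_logits[i], reverse=True)[:k]
    # 3. Renormalise the selected scores into mixing weights and combine outputs.
    weights = softmax([router_logits[i] for i in top_k])
    output = sum(w * EXPERTS[i](x) for w, i in zip(weights, top_k))
    return output, top_k


for x in (3.0, -3.0):
    y, used = moe_forward(x)
    print(f"x={x}: experts {used} -> {y:.3f}")
```

Different inputs activate different experts (experts 0 and 3 for x = 3.0, experts 1 and 2 for x = -3.0), so only 2 of the 4 experts do any work per input. All four experts still have to be stored, so the main saving is in compute per token.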

If you’re really curious & wish to tinker around, check out TinyChatEngine, an on-device LLM inference library. It implements several intricate LLM compression techniques that let you run models like LLaMA efficiently even on your laptop! Crazy, isn’t it? 😍

Well, that’s a wrap!
Thanks for reading 😊

See you next week with more mind-blowing tech insights 💻

Until then,
Stay Curious🧠 Stay Awesome🤩

PS: Do catch us on LinkedIn - Sandra & Tezan

