What’s ahead for generative AI in 2024? While we don’t foresee a true breakthrough on the scale of GPT-4 this year, we do expect steady improvements, and it will be fun to see how innovation starts to harmonize and scale. So, what should you expect? Let’s dive in.
Retrieval-augmented generation (RAG) will mature to become mainstream and power most enterprise use cases. Large language models (LLMs) are approximate databases that mimic patterns in text rather than knowing anything. Developers work around this problem by using retrieval-augmented generation to combine search with LLMs to generate answers. This technique reduces hallucinations and enables users to verify answers through citations embedded in the model’s answers.
Most current proofs of concept use simple retrievers based on cosine similarity, which often fall short for several use cases. Advanced retrievers exist, but right now there are too many of them, each with a narrow focus. We expect a consolidation across these retrievers and then the development of orchestrators that automatically pick the right combination of retrievers based on the use case. This step will eventually drive companies to replace enterprise search apps with RAG-based enterprise answers apps.
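To make the "simple retriever" idea concrete, here is a minimal sketch of a cosine-similarity retriever that feeds numbered sources into a prompt. It is an illustration under stated assumptions, not how any particular product works: real systems typically compare learned embeddings rather than the word counts used here, and the function names (`tokenize`, `retrieve`, `build_prompt`) and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return up to top_k (index, text) pairs ranked by similarity to the query.

    Assumption: word counts stand in for the embeddings a real retriever
    would use. Documents with zero similarity are dropped.
    """
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), i)
        for i, doc in enumerate(documents)
    ]
    # Highest score first; ties are broken by original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, documents[i]) for score, i in scored[:top_k] if score > 0]


def build_prompt(query, passages):
    """Build a prompt that asks the model to cite the numbered sources."""
    sources = "\n".join(f"[{i + 1}] {text}" for i, text in passages)
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{sources}\n"
        f"Question: {query}\n"
        "Answer:"
    )
```

Calling `retrieve` on a question and a handful of documents, then passing the result to `build_prompt`, produces the text you would send to an LLM. Because the ranking rewards nothing but word overlap, a document that shares only filler words with the question can outrank a relevant one phrased differently, which is one way simple retrievers fall short.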
Multimodality will be a new frontier, evolving chatbots into assistants that can see, listen and talk back. Multimodal learning has become necessary to improve the effectiveness of AI, and we expect a significant focus on it from all generative AI model developers in 2024. With steady improvements, performance on multimodal benchmarks like MMMU, a test designed to assess college-level understanding across a range of tasks, will likely come closer to 80% (the current best score is 59.6%). Remember that “fake” Gemini Ultra demo? Those capabilities will become real. We’ll start to see more versatile and engaging interactions and a wider range of tasks that chatbots can perform effectively, including how they understand and respond to users.
10B-parameter open-source models will perform on par with GPT-4, enabling ubiquitous local deployment. Smallish models like Mixtral, Solar and Phi-2 have been punching way above their weight. So far, reinforcement learning from human feedback has been the primary limitation for the open-source software (OSS) community because collecting data for fine-tuning is expensive. However, self-play and sample-efficient fine-tuning techniques are helping smaller OSS models close the gap with proprietary models. Once reached, this milestone will make generative AI models ubiquitous because developers can deploy them on any local device, owing to their small size.
We’ll see a wave of wearables powered by LLMs. Most of them will fail. While the software space is crowded, the hardware space is still predominantly mobile phones, so we expect significant interest in imagining and developing new generative AI-based devices ahead. However, these will face severe headwinds on privacy, security, safety and subpar user experiences (UX). UX designers will require several iterations to strike gold—if they have any to find.
Enterprises won’t use OpenAI’s app store for GPTs. We anticipate that the GPT marketplace will be hosted in OpenAI’s cloud in SaaS mode, just like the GPT models. The marketplace is a spin-off of the plugins that OpenAI shut down recently. However, we don’t think enterprises will widely use these GPTs, primarily due to data security concerns (see the prediction on OSS models and custom LLMs below for what they will do instead). Consumers will need help navigating the marketplace to buy GPTs (much like their experience with Alexa Skills).
Domain-specific reasoning and planning AI will trigger yet another wave of enterprise adoption. Generative AI models are still far from achieving reasoning and planning capabilities, and we don’t see a path that will lead them to that anytime soon. However, companies will be interested in how generative AI models could blend with traditional planning and simulation software and learn domain-specific reasoning and planning capabilities with self-play.
Enterprises will start leveraging OSS models and develop custom LLMs with their data. So far, most enterprise adoption has been of proprietary models from providers like OpenAI and AWS Bedrock, but that adoption has been stunted primarily because of data security and privacy concerns. Once the OSS models cross the GPT-4 threshold, we expect a significant shift toward OSS models. This shift will drive fine-tuning with in-house data to develop custom models.
Enterprises will start realizing productivity gains from generative AI. Insights teams will transform data analytics with generative AI-based copilots. We expect turnaround times for analytics questions to be five times faster than they are today. Content generation will heavily leverage LLMs, with 75% of new content based on AI-generated drafts. In customer service, AI agents will become the first “person” a customer talks to, routing only complex queries to human agents.
Existential risk voices will quiet, and the focus will shift to regulation. The EU’s AI Act will pass, and institutions like NIST and OECD will develop standards around risks involved in generative AI models and standards for data used for training these models. Overall, policymakers and regulators will favor content creators more than model developers.
Alignment will continue to be a tough nut to crack, as the work to align models to human expectations needs breakthroughs. New companies and jobs will emerge around teaching LLMs for alignment, including how to curate and create data for alignment. Until there are better answers, we expect companies to live with the costs of ensuring people who use models in the workplace recognize how they may be biased, untruthful or potentially harmful.