VentureBeat

Napkin AI’s ‘design agency’ of AI agents is changing how professionals create graphics

Graphic design company Napkin AI is carving out a unique path in an exciting frontier area of vertical AI agent applications. A user can type text into Napkin AI's website, and its model generates a graphic that represents the text within five seconds. What's fascinating is that under the hood, Napkin is doing this by taking the traditional jobs of a design agency — copywriter, designer, illustrator, brand stylist — and replicating those discrete functions with individual AI agents, instead of with humans.

The product has gotten impressive traction since launching in August. It has 2 million beta users, double the number of users just six weeks ago, according to Pramod Sharma, Napkin co-founder and CEO. "We've taken a slightly different angle," he said in an interview with VentureBeat. "We didn't start with: 'Let's look at an image model and see what it can do.' In fact, for us that was an afterthought. It's really about what it takes to create a graphic, and how it's done today, and work backwards."

Napkin AI is part of a trend toward vertical AI agents

Napkin is part of a growing number of startups that are popping up to serve vertical areas with products driven not by the incumbent SaaS model, but by vertical AI agents under the hood. Napkin shows how productive these agentic companies can be: The company has a team of 12 working remotely, with Sharma the only one living in the SF Bay Area. These companies also promise to be highly disruptive, because they are so much more customizable and powerful for their specific use cases.

For a deeper dive into Napkin AI's approach, including insights from its co-founders on how their agentic system works, check out my conversation with Sam Witteveen, an AI agent developer, and the Napkin team.

What seems to set Napkin apart from the competition is its focus on serving a specific need: helping professionals who aren't graphic design experts create pretty designs, mainly for PowerPoint presentations. These users want diagrams and other illustrations, and not just the slick images produced by a lot of generative AI providers — and they want to be able to edit these images easily and simply. And that's what Napkin does: After providing its best shot back to the user within five seconds, it lets the user edit it for things like style, color and design type.

(Example of an image generated by Napkin AI)

Napkin AI represents a third way

Napkin doesn't use diffusion AI models like most other image providers, Sharma said, because those models don't allow users to easily edit individual elements of illustrations, for example the slices of a pie chart or the surrounding text. By undergirding the Napkin product with agents that serve specific, useful functions, Napkin's approach represents a "third way."

The "first way," taken by incumbent graphic-design contemporaries like Adobe or Canva, is to bolt AI tools onto traditional design workflows. Napkin doesn't do this. It is gen AI-first, in that it uses the technology to create the best visual first draft that it can, based on a user's prompt. It then simplifies the remaining editing process, keeping in mind that most users don't have advanced design skills — the kind you need, for example, to figure out Adobe Creative Cloud.
Neither is Napkin following the "second way," that of the new breed of AI image and video companies — like MidJourney, Stable Diffusion, Runway, Ideogram and others — that pride themselves on being AI-first and use massive diffusion models to dazzle users with high-quality images or videos. It's often not clear how they differentiate from each other. Napkin, however, is determined not to fall under the sway of marvelous technology for its own sake, because that doesn't put users first, Sharma noted.

Here's how Napkin AI works: It allows users to paste a text description — whether it's a presentation prompt, a blog excerpt or brainstorming notes — and receive multiple high-quality graphic options in seconds. These graphics are not mere templates but customizable designs, with editable fonts, colors and layouts — and they are easy to use, with sliding tools. The product eschews the huge menu bar with the hundreds of options provided by more complex tools like Figma or Canva. After creating an image, Napkin lets you export it in PNG, PDF or SVG format.

Napkin AI has four sub-agents under the hood

More interesting, though, is how the agents work under the hood: Napkin uses an orchestrator large language model (LLM), driven mainly by OpenAI's GPT-4o mini, to respond to a user's prompt. This LLM acts as an agent, delegating jobs to a series of sub-agents with specific responsibilities. The first "text" agent suggests text that can be used in the design. The second "layout" agent looks at the text and decides on the design layout best suited to it. A third "icon and illustration" agent checks a database to see if there's an icon that matches the text request, and if there isn't, it might generate an icon on the fly. Finally, there's a fourth "style" agent, which lets users customize the design with their own corporate colors and style.

As Sharma explains it, Napkin doesn't put too many constraints on these four agents, other than to maximize for quality and speed. Responding within five seconds is key to meeting customer need, he said. Each agent contributes to the overall composition, ensuring the generated graphic is not only aesthetically pleasing but tailored to the user's intent. The fourth styling agent will be introduced into the product next week, and there will be improvements over time, Sharma explained. Soon, users will be able to upload a screenshot or other documents showing their corporate styling, so that an image model can automatically generate images in that style.
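Napkin hasn't published its internal implementation, but the division of labor described above maps naturally onto a simple orchestration loop. Below is a purely illustrative Python sketch of that structure; every function and field name is a hypothetical stand-in, not Napkin's code.

```python
# Illustrative sketch only: Napkin has not released its implementation, so all
# names here are hypothetical. Each stub stands in for an LLM or database call.
from dataclasses import dataclass, field

@dataclass
class Graphic:
    text: str = ""
    layout: str = ""
    icons: list = field(default_factory=list)
    style: dict = field(default_factory=dict)

def text_agent(prompt: str) -> str:
    # Would call an LLM (e.g. GPT-4o mini) to draft concise copy for the visual.
    return f"Key points distilled from: {prompt}"

def layout_agent(text: str) -> str:
    # Would ask an LLM to pick a layout (timeline, pie chart, flowchart...) for the text.
    return "flowchart" if "process" in text.lower() else "bullet-diagram"

def icon_agent(text: str) -> list:
    # Would first search an icon database, falling back to on-the-fly generation.
    return ["icon:lightbulb"]

def style_agent(graphic: Graphic, brand: dict) -> Graphic:
    # Would apply corporate colors and fonts supplied by the user.
    graphic.style = brand
    return graphic

def orchestrate(prompt: str, brand: dict) -> Graphic:
    """Orchestrator role: delegate to sub-agents, then assemble the result."""
    g = Graphic(text=text_agent(prompt))
    g.layout = layout_agent(g.text)
    g.icons = icon_agent(g.text)
    return style_agent(g, brand)

print(orchestrate("Our onboarding process has five steps", {"primary": "#0B5FFF"}))
```

In the actual product, each stub would be an LLM call or an icon-database lookup, and the orchestrator model (GPT-4o mini, per Sharma) would decide how to delegate while keeping the whole round trip under five seconds.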


Replit and Anthropic’s AI just helped Zillow build production software—without a single engineer

Replit has transformed non-technical employees at Zillow into software developers. The real estate giant now routes over 100,000 home shoppers to agents using applications built by team members who had never written code before. This breakthrough stems from Replit's new partnership with Anthropic and Google Cloud, which has enabled over 100,000 applications on Google Cloud Run. The collaboration integrates Anthropic's Claude AI model with Google Cloud's Vertex AI platform, allowing anyone with an idea to create custom software.

Wow — Replit powers production routes on Zillow, which was built by a non-coder!! https://t.co/mgtDYLfbg6 — Amjad Masad (@amasad) November 22, 2024

How Zillow's marketing team became software developers overnight

"We're witnessing a transformation in how businesses create software solutions," said Michele Catasta, Replit's president, in an exclusive interview with VentureBeat. "Our platform is increasingly being adopted by teams across marketing, sales and operations who need custom solutions that pre-built software can't provide." The initiative addresses the growing global developer shortage, expected to reach 4 million by 2025. Companies can now empower non-technical teams to build their own solutions rather than waiting for scarce developer resources.

Claude's sophisticated approach to code generation sets this partnership apart. "Claude excels at producing clean, maintainable code while understanding complex systems across multiple languages and frameworks," Michael Gerstenhaber, Anthropic's product VP, told VentureBeat. "It approaches problems strategically, often stepping back to analyze the bigger picture rather than rushing to add code."

Built 2 new internal systems for my team this week (leave requests/customer support) using code generated by Claude. Took me 1 day in total & saved us $5-10K in consultant costs. If an english/psychology grad like me can use code to build stuff, any wordcel can. — Claire Lehmann (@clairlemon) February 7, 2025

How Replit, Anthropic and Google Cloud are making AI coding secure and scalable

Replit handles security and reliability concerns through Google Cloud's enterprise infrastructure. "We've built our security framework on a foundation of enterprise-grade infrastructure through Google Cloud's Vertex AI platform," Catasta said. "This allows us to offer accessible AI development tools while maintaining stringent security standards."

The partnership demonstrates significant advances in AI capabilities. Claude 3.5 Sonnet improved performance on SWE-bench Verified from 33% to 49%, surpassing many publicly available models. These technical improvements enable users to create everything from personal productivity tools to enterprise applications. Both companies emphasize AI augmentation over automation. "AI's biggest potential is to augment and enhance human capabilities, rather than simply replacing them," Gerstenhaber said. "For developer teams, Claude acts as an expert virtual assistant that can dramatically accelerate project timelines — reducing weeks-long projects to days."

Almost paid $100/year for an app I needed (export 1000s of saved posts/bookmarks to a spreadsheet), then thought "hmm I wonder if Claude could make this for me." 10 minutes later, the app works and I have a CSV of everything I've ever saved. Wild!
— Kevin Roose (@kevinroose) February 7, 2025

The future of software development: Replit's AI puts coding in everyone's hands

Replit's tools could transform who gets to build and sell software. A teenager in rural India recently created an app using just their smartphone, earned enough to buy their first laptop, and now builds software for companies worldwide. Stories like this suggest a future where anyone with an internet connection can turn their ideas into working software — regardless of their technical background or location.

Challenges persist. The platform must balance accessibility with code quality and security while ensuring AI-generated solutions remain maintainable and scalable. Success could establish new standards for custom software development in the AI era. The global custom software development market will reach more than $700 billion by 2028, according to industry analysts. Replit's AI-powered approach could determine who participates in this expanding market.

Early results show promise. Companies have built their own employee time-off trackers and help-desk systems within days, tasks that previously took months of development. Some independent developers have created and launched new applications using just their phones, showing how the platform makes software development accessible to more people.

In an industry known for high barriers to entry, this partnership between Replit, Anthropic and Google Cloud opens software development to anyone with an idea. The implications extend beyond traditional technology companies to reshape how businesses across industries build and deploy custom solutions. The next billion software creators may not know how to code — and that might be exactly the point.
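Neither Replit nor Zillow has shared its code, but the Claude-on-Vertex-AI integration the partnership rests on follows a documented pattern. Here is a hedged sketch using Anthropic's Python SDK; the project ID, region, model ID and prompt are placeholders, not details from the article.

```python
# Hedged sketch: not Replit's internal integration, just what calling Claude
# through Google Cloud's Vertex AI looks like with Anthropic's Python SDK.
from anthropic import AnthropicVertex  # pip install "anthropic[vertex]"

client = AnthropicVertex(project_id="my-gcp-project", region="us-east5")

message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",  # example Vertex model ID
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a small Flask endpoint that routes a home shopper "
                   "to the best-matched agent based on zip code.",
    }],
)
print(message.content[0].text)  # generated code, ready for review before deployment
```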


Researchers find you don’t need a ton of data to train LLMs for reasoning tasks

Large language models (LLMs) can learn complex reasoning tasks without relying on large datasets, according to a new study by researchers at Shanghai Jiao Tong University. Their findings show that with just a small batch of well-curated examples, you can train an LLM for tasks that were thought to require tens of thousands of training instances.

This efficiency is due to the inherent knowledge that modern LLMs obtain during the pre-training phase. With new training methods becoming more data- and compute-efficient, enterprises might be able to create customized models without requiring access to the resources of large AI labs.

Less is more (LIMO)

In their study, the researchers challenge the assumption that you need large amounts of data to train LLMs for reasoning tasks. They introduce the concept of "less is more" (LIMO). Their work builds on top of previous research that showed LLMs could be aligned with human preferences with a few examples.

(Figure: Less is More (LIMO) for reasoning. Source: arXiv)

In their experiments, they demonstrated that they could create a LIMO dataset for complex mathematical reasoning tasks with a few hundred training examples. An LLM fine-tuned on the dataset was able to create complex chain-of-thought (CoT) reasoning chains that enabled it to accomplish the tasks at a very high success rate.

For example, a Qwen2.5-32B-Instruct model fine-tuned on 817 training examples chosen based on LIMO reached 57.1% accuracy on the highly challenging AIME benchmark and 94.8% on MATH, outperforming models that were trained on a hundred times more examples. It also scored higher on the benchmarks than reasoning models such as QwQ-32B-Preview (a version of the Qwen model that has been trained for reasoning) and OpenAI o1-preview, both of which have been trained with larger data and compute resources. Moreover, LIMO-trained models generalize to examples drastically different from their training data. For example, on the OlympiadBench scientific benchmark, the LIMO model outperformed QwQ-32B-Preview, and on the challenging GPQA benchmark, it achieved 66.7% accuracy, close to OpenAI-o1-preview's leading score of 73.3%.

What does it mean for enterprise AI?

Customizing LLMs is an attractive use case for enterprise applications. Thanks to techniques such as retrieval-augmented generation (RAG) and in-context learning, LLMs can be customized to use bespoke data or perform new tasks without the need for expensive fine-tuning. However, reasoning tasks often require training and fine-tuning LLMs. The widely held belief has been that such tasks require large volumes of training examples with highly detailed reasoning chains and solutions. Creating such datasets is slow and impractical for many applications and companies.

More recently, researchers have shown that pure reinforcement learning approaches can enable models to train themselves for reasoning tasks by generating many solutions and choosing the ones that work best. While this approach requires less manual effort, it still demands expensive compute resources that are beyond the reach of many enterprises. On the other hand, crafting a few hundred examples is an endeavor that many companies can tackle, bringing specialized reasoning models within the reach of a wider range of organizations.
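The authors' released code contains the exact recipe. Purely as an illustration of the workflow described above (supervised fine-tuning on a few hundred curated chain-of-thought examples), here is a minimal sketch using Hugging Face's TRL library; the dataset file name, base model and hyperparameters are stand-ins, not the paper's settings.

```python
# Minimal sketch of the idea, not the authors' released code: supervised
# fine-tuning on a few hundred curated chain-of-thought examples with TRL.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# ~800 curated problems, each stored as chat "messages" with a detailed
# reasoning chain in the assistant turn, e.g.:
# {"messages": [{"role": "user", "content": "<hard competition problem>"},
#               {"role": "assistant", "content": "<step-by-step solution>"}]}
dataset = load_dataset("json", data_files="limo_style_examples.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small stand-in; the paper used Qwen2.5-32B-Instruct
    train_dataset=dataset,
    args=SFTConfig(output_dir="limo-sft", num_train_epochs=3, per_device_train_batch_size=1),
)
trainer.train()
```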
"This discovery has profound implications for artificial intelligence research: It suggests that even competition-level complex reasoning abilities can be effectively elicited through minimal but curated training samples," the researchers write.

Why LIMO works

In their experiments, the researchers identify two key reasons why LLMs can learn complex reasoning tasks with fewer examples. First, state-of-the-art foundation models have been trained on a very large amount of mathematical content and code during pre-training. This means that these LLMs already possess rich reasoning knowledge in their parameters that can be activated through carefully crafted examples. Second, new post-training techniques have shown that allowing models to generate extended reasoning chains significantly improves their reasoning ability. In essence, giving the models more time to "think" allows them to unpack and apply their pre-trained knowledge more effectively.

"We hypothesize that successful reasoning emerges from the synergy of these two factors: rich pre-trained knowledge and sufficient computational resources at inference time," the researchers write. "These developments collectively suggest a striking possibility: If models possess rich reasoning knowledge and are given adequate computational space, then activating their reasoning capabilities may require only a small number of high-quality training samples that encourage extended deliberation, rather than massive fine-tuning datasets."

(Figure: Choosing more complex problems to include in the training dataset can have a significant effect on the trained model's accuracy in reasoning tasks. Source: arXiv)

According to the researchers' findings, creating useful LIMO datasets hinges on choosing the right problems and solutions. Data curators should prioritize challenging problems that require complex reasoning chains, diverse thought processes and knowledge integration. The problems should also deviate from the model's training distribution to encourage new reasoning approaches and force it toward generalization. Accordingly, solutions should be clear and well organized, with the reasoning steps adapted to the complexity of the problem. High-quality solutions should also provide strategic educational support by gradually building understanding through carefully structured explanations.

"By focusing on a minimal yet meticulously curated set of reasoning chains, we embody the core principle of LIMO: High-quality demonstrations, rather than sheer data volume, are key to unlocking complex reasoning capabilities," the researchers write.

The researchers have released the code and data used to train the LIMO models in their experiments. In the future, they plan to expand the concept to other domains and applications.
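The curation criteria above are described qualitatively in the paper. To make the shape of such a selection pass concrete, here is an illustrative filter; the thresholds, field names and scoring heuristics are invented for this sketch, not taken from the authors' pipeline.

```python
# Illustrative only: the thresholds and fields below are invented to show the
# shape of a LIMO-style selection pass, not the authors' actual curation code.
def select_limo_candidates(problems, max_examples=800):
    """Keep hard, well-explained problems; drop ones the base model finds easy."""
    kept = []
    for p in problems:
        # p is a dict with "question", "solution", "base_model_pass_rate"
        hard_enough = p["base_model_pass_rate"] < 0.2     # challenging problems only
        detailed = len(p["solution"].split("\n")) >= 10   # multi-step reasoning chain
        if hard_enough and detailed:
            kept.append((p["base_model_pass_rate"], p))
    # Prefer the hardest problems first, cap at a few hundred examples.
    kept.sort(key=lambda item: item[0])
    return [p for _, p in kept[:max_examples]]
```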


Create the future with AI: Join Microsoft at NVIDIA GTC

Presented by Microsoft and NVIDIA

AI is producing tangible results for business at an astonishing rate and scale, which means the new question becomes: How do we tap into that potential? But building a robust AI strategy is more than just adopting new technology. It's about fostering a culture that prioritizes innovation, ensuring security at scale, equipping developers with the tools to succeed, and balancing cutting-edge innovation, secure deployment and developer empowerment. By leveraging a broad selection of models, ensuring high-quality deployment, and harnessing the power of strategic partnerships, organizations can create AI solutions that drive real business value.

Microsoft is an elite sponsor at this year's NVIDIA GTC AI Conference, March 17-21, where company leaders will showcase the power of Microsoft Azure AI, an end-to-end AI platform that lets businesses of every size innovate quickly, securely and responsibly.

The NBA chose Azure OpenAI Service accelerated by NVIDIA to easily incorporate OpenAI models into its applications, speeding up the time to market for new, innovative features. Helping fans connect with the league in the way they want, with personalized, localized insights, the NBA is staying at the forefront of a great fan experience.

BMW created a mobile data recorder (MDR) solution, placing an IoT device in each development car to transmit data over a cellular connection to an Azure cloud platform, where Azure AI solutions facilitate efficient data analysis. The vehicle data covered by the system has doubled, and data delivery and analysis happen 10 times faster.

New York City–based software developer OriGen is revolutionizing the energy industry with proprietary AI models supported by Microsoft Azure AI infrastructure. Using Azure AI infrastructure, OriGen has fast, easy access to the compute resources required to drive its NVIDIA GPU-based solutions and the means to deploy its powerful offering as a software-as-a-service platform.

Microsoft and NVIDIA's powerful technology partnership elevates the performance and scale of Azure AI services in a way other cloud providers can't match — and it's available to every Azure customer. Developers can leverage the latest AI models from Azure OpenAI Service, NVIDIA NIM and NVIDIA Foundation Models, all accessible via simple, up-to-the-minute APIs. Come see Azure AI and NVIDIA AI in action and get hands-on with the latest technologies. Microsoft is a proud elite sponsor of NVIDIA GTC 2025, the premier developer conference at the heart of AI.

What's happening at NVIDIA GTC, March 17-21, San Jose

Whether you're a developer, researcher or business leader, the NVIDIA GTC AI Conference is the best opportunity to explore the future of AI, experience Azure and NVIDIA AI solutions and interact with 25,000 of the brightest minds in AI and accelerated computing. Taking place March 17-21 at the San Jose Convention Center, the conference offers over 900 sessions, 300+ exhibits, unique networking events and free two-hour workshops and training labs, sponsored by Microsoft Azure.

Visit Microsoft at booth #514 and experience the latest in AI services and infrastructure. Join live discussion sessions, connect with Microsoft and partner AI experts and try out the latest AI technology and hardware. Plus, attend Microsoft talks and panels to learn about Azure's end-to-end AI platform and how to accelerate the development and delivery of your AI innovations.
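As one example of the "simple APIs" mentioned above, here is a hedged sketch of calling a model deployed on Azure OpenAI Service from Python. The endpoint, key, API version and deployment name are placeholders for values in your own Azure subscription, not details from this post.

```python
# Hedged sketch of a basic Azure OpenAI Service call; all configuration values
# below are placeholders you would replace with your own Azure deployment.
from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="<AZURE_OPENAI_API_KEY>",
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4o",  # the name of your Azure deployment, not the raw model ID
    messages=[{"role": "user", "content": "Summarize last night's game for a fan in Madrid."}],
)
print(response.choices[0].message.content)
```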
To see a full list of conference sessions and add them to your calendar, visit the Microsoft Azure blog.

Talks and panels

S71145: Wired for AI: Lessons from Networking 100K+ GPU AI Data Centers and Clouds
S73232: Physical AI for the Next Frontier of Industrial Digitalization
S72355: Harnessing AI Agents for Enterprise Success: insights from AI Experts
S72436: Building a 3D image-based search system for medical images: how foundational models can help?
S71521: Build Secure and Scalable GenAI Applications with Databases and NVIDIA AI
S71676: Accelerating AI Pipelines: How NVIDIA Tools Boost Bing Visual Search Efficiency
S72905: Accelerating DiskANN Vector Search on GPUs
S72435: Explore AI-Assisted Developer Tools for Accelerated Computing Application Development

Registration is open now! Join Microsoft at GTC to discover what's next in AI and accelerated computing. Visit the Microsoft blog for more details and register today.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact [email protected].


A look under the hood of transformers, the engine driving AI model evolution

Today, virtually every cutting-edge AI product and model uses a transformer architecture. Large language models (LLMs) such as GPT-4o, LLaMA, Gemini and Claude are all transformer-based, and other AI applications such as text-to-speech, automatic speech recognition, image generation and text-to-video models have transformers as their underlying technology.

With the hype around AI not likely to slow down anytime soon, it's time to give transformers their due, which is why I'd like to explain a little about how they work, why they are so important for the growth of scalable solutions and why they are the backbone of LLMs.

Transformers are more than meets the eye

In brief, a transformer is a neural network architecture designed to model sequences of data, making it ideal for tasks such as language translation, sentence completion, automatic speech recognition and more. Transformers have become the dominant architecture for many of these sequence modeling tasks because the underlying attention mechanism can be easily parallelized, allowing for massive scale when training and performing inference.

Originally introduced in a 2017 paper, "Attention Is All You Need," from researchers at Google, the transformer was introduced as an encoder-decoder architecture specifically designed for language translation. The following year, Google released bidirectional encoder representations from transformers (BERT), which could be considered one of the first LLMs — although it's now considered small by today's standards. Since then — and especially accelerated with the advent of GPT models from OpenAI — the trend has been to train bigger and bigger models with more data, more parameters and longer context windows.

To facilitate this evolution, there have been many innovations, such as: more advanced GPU hardware and better software for multi-GPU training; techniques like quantization and mixture of experts (MoE) for reducing memory consumption; new optimizers for training, like Shampoo and AdamW; and techniques for efficiently computing attention, like FlashAttention and KV caching. The trend will likely continue for the foreseeable future.

The importance of self-attention in transformers

Depending on the application, a transformer model follows an encoder-decoder architecture. The encoder component learns a vector representation of data that can then be used for downstream tasks like classification and sentiment analysis. The decoder component takes a vector or latent representation of the text or image and uses it to generate new text, making it useful for tasks like sentence completion and summarization. For this reason, many familiar state-of-the-art models, such as the GPT family, are decoder-only.

Encoder-decoder models combine both components, making them useful for translation and other sequence-to-sequence tasks. For both encoder and decoder architectures, the core component is the attention layer, as this is what allows a model to retain context from words that appear much earlier in the text.

Attention comes in two flavors: self-attention and cross-attention. Self-attention is used for capturing relationships between words within the same sequence, whereas cross-attention is used for capturing relationships between words across two different sequences. Cross-attention connects the encoder and decoder components in a model during translation.
For example, it allows the English word "strawberry" to relate to the French word "fraise." Mathematically, both self-attention and cross-attention are different forms of matrix multiplication, which can be done extremely efficiently using a GPU. Because of the attention layer, transformers can better capture relationships between words separated by long amounts of text, whereas previous models such as recurrent neural networks (RNN) and long short-term memory (LSTM) models lose track of the context of words from earlier in the text.

The future of models

Currently, transformers are the dominant architecture for many use cases that require LLMs, and they benefit from the most research and development. Although this does not seem likely to change anytime soon, one different class of model that has gained interest recently is state-space models (SSMs) such as Mamba. This highly efficient algorithm can handle very long sequences of data, whereas transformers are limited by a context window.

For me, the most exciting applications of transformer models are multimodal models. OpenAI's GPT-4o, for instance, is capable of handling text, audio and images — and other providers are starting to follow. Multimodal applications are very diverse, ranging from video captioning to voice cloning to image segmentation (and more). They also present an opportunity to make AI more accessible to those with disabilities. For example, a blind person could be greatly served by the ability to interact through the voice and audio components of a multimodal application.

It's an exciting space with plenty of potential to uncover new use cases. But do remember that these applications, at least for the foreseeable future, are largely underpinned by the transformer architecture.

Terrence Alsup is a senior data scientist at Finastra.
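To make the matrix-multiplication point above concrete, here is a minimal single-head sketch of scaled dot-product self-attention in NumPy. Real transformers add learned query/key/value projections, multiple heads and masking; this is only an illustration of the core operation.

```python
# A minimal NumPy sketch of scaled dot-product attention: a few matrix
# multiplications plus a softmax, which is why it parallelizes well on GPUs.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # similarity between every pair of tokens
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # context-mixed representations

seq_len, d_model = 5, 8                  # 5 tokens, 8-dimensional embeddings
X = np.random.randn(seq_len, d_model)
out = attention(X, X, X)                 # self-attention: Q, K, V all come from X
print(out.shape)                         # (5, 8)
```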


Taking AI to the playground: LinkedIn combines LLMs, LangChain and Jupyter Notebooks to improve prompt engineering

For enterprises, figuring out the right prompt to get the best result from a generative AI model is not always an easy task. In some organizations, that job has fallen to the newly created position of prompt engineer, but that's not quite what has happened at LinkedIn.

The professional networking platform is owned by Microsoft and currently has more than 1 billion user accounts. Although LinkedIn is a large organization, it faced the same basic challenge that organizations of nearly any size face with gen AI: bridging the gap between technical and non-technical business users. For LinkedIn, the gen AI use case is both end-user and internal-user facing.

While some organizations might choose to just share prompts with spreadsheets, or even in Slack and messaging channels, LinkedIn took a somewhat novel approach. The company built what it calls a "collaborative prompt engineering playground" that enables technical and non-technical users to work together. The system uses an interesting combination of technologies, including large language models (LLMs), LangChain and Jupyter Notebooks. LinkedIn has already used the approach to help improve its Sales Navigator product with AI features, specifically focusing on AccountIQ — a tool that reduces company research time from two hours to five minutes.

Much like every other organization on the planet, LinkedIn's initial gen AI journey started out by just trying to figure out what works. "When we started working on projects using gen AI, product managers always had too many ideas, like 'Hey, why can't we try this? Why can't we try that?'" Ajay Prakash, LinkedIn staff software engineer, told VentureBeat. "The whole idea was to make it possible for them to do the prompt engineering and try out different things, and not have the engineers be the bottleneck for everything."

The organizational challenge of deploying gen AI in a technical enterprise

To be sure, LinkedIn is no stranger to the world of machine learning (ML) and AI. Before ChatGPT ever came onto the scene, LinkedIn had already built a toolkit to measure AI model fairness. At VB Transform in 2022, the company outlined its AI strategy at that time.

Gen AI, however, is a bit different. It doesn't require engineers to use it and is more broadly accessible. That's the revolution that ChatGPT sparked. Building gen AI-powered applications is not entirely the same as building a traditional application. Prakash explained that before gen AI, engineers would typically get a set of product requirements from product management staff. They would then go out and build the product. With gen AI, by contrast, product managers are trying out different things to see what's possible and what works. As opposed to traditional ML, which wasn't accessible to non-technical staff, gen AI is easier for all types of users to work with.

Traditional prompt engineering often creates bottlenecks, with engineers serving as gatekeepers for any changes or experiments. LinkedIn's approach transforms this dynamic by providing a user-friendly interface through customized Jupyter Notebooks, which have traditionally been used for data science and ML tasks.

What's inside the LinkedIn prompt engineering playground

It should come as no surprise that the default LLM vendor used by LinkedIn is OpenAI. After all, LinkedIn is part of Microsoft, which hosts the Azure OpenAI platform.
Lukasz Karolewski, LinkedIn's senior engineering manager, explained that it was simply more convenient to use OpenAI, as his team had easier access within the LinkedIn/Microsoft environment. He noted that using other models would require additional security and legal review processes, which would take longer to make them available. The team initially prioritized getting the product and idea validated rather than optimizing for the best model.

The LLM is only one part of the system, which also includes: Jupyter Notebooks for the interface layer; LangChain for prompt orchestration; Trino for data lake queries during testing; container-based deployment for easy access; and custom UI elements for non-technical users.

How LinkedIn's collaborative prompt engineering playground works

Jupyter Notebooks have been widely used in the ML community for nearly a decade as a way to help define models and data using an interactive Python language interface. Karolewski explained that LinkedIn pre-programmed Jupyter Notebooks to make them more accessible for non-technical users. The notebooks include UI elements like text boxes and buttons that make it easier for any type of user to get started. The notebooks are packaged in a way that allows users to easily launch the environment with minimal instructions, and without having to set up a complex development environment.

The main purpose is to let both technical and non-technical users experiment with different prompts and ideas for using gen AI. To make this work, the team also integrated access to data from LinkedIn's internal data lake. This allows users to pull in data in a secure way to use in prompts and experiments.

LangChain serves as the library for orchestrating gen AI applications. The framework helps the team easily chain together different prompts and steps, such as fetching data from external sources, filtering and synthesizing the final output. While LinkedIn is not currently focused on building fully autonomous, agent-based applications, Karolewski said he sees LangChain as a foundation for potentially moving in that direction in the future.

LinkedIn's approach also includes multi-layered evaluation mechanisms: embedding-based relevance checking for output validation; automated harm detection through pre-built evaluators; LLM-based evaluation using larger models to assess smaller ones; and integrated human expert review processes.

From hours to minutes: Real-world impact of the prompt engineering playground

The effectiveness of this approach is demonstrated through LinkedIn's AccountIQ feature, which reduced company research time from two hours to five minutes. This improvement wasn't just about faster processing — it represented a fundamental shift in how AI features could be developed and refined with direct input from domain experts. "We're not domain experts in sales," said Karolewski. "This platform allows sales experts to directly validate and refine AI features, creating a tight feedback loop that wasn't possible before."
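LinkedIn hasn't open-sourced the playground, but the ingredients described above (pre-programmed notebooks, widget-style UI elements and a LangChain chain) can be sketched in a few lines of a Jupyter cell. Everything below, including the model choice, prompt text and field names, is illustrative rather than LinkedIn's actual code; internally the team uses Azure-hosted OpenAI models.

```python
# Hedged sketch of a notebook cell that hides a LangChain chain behind simple
# widgets so a non-engineer can tweak the prompt and rerun it.
import ipywidgets as widgets
from IPython.display import display
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # placeholder model choice

prompt_box = widgets.Textarea(
    value="Summarize what {company} does in three bullet points for a seller.",
    description="Prompt:", layout=widgets.Layout(width="100%", height="80px"),
)
company_box = widgets.Text(value="Acme Corp", description="Company:")
run_button = widgets.Button(description="Run prompt")
output = widgets.Output()

def run(_):
    # Rebuild the chain from the current prompt text on every click.
    chain = ChatPromptTemplate.from_template(prompt_box.value) | llm | StrOutputParser()
    with output:
        output.clear_output()
        print(chain.invoke({"company": company_box.value}))

run_button.on_click(run)
display(prompt_box, company_box, run_button, output)
```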


PIN AI launches mobile app letting you make your own personalized, private DeepSeek or Llama-powered AI model on your phone

Thanks to Her and numerous other works of science fiction, it's pretty easy to imagine a world in which everyone has their own personalized AI assistant — a helper who knows who we are, our occupations, our hobbies, our goals and passions, our likes and dislikes… what makes us "tick," essentially.

Some AI tools today offer a fairly bare-bones, limited version of this functionality, such as CharacterAI and ChatGPT's memory feature. But these still rely on your information flowing up to corporate servers outside of your control for analysis and processing. They also don't allow for many third-party transactions, meaning your AI assistant can't make purchases on your behalf. For those especially concerned about privacy, or who want an AI model that actually retrains itself to adapt to individual preferences — making a unique AI assistant unlike any in the entire world — you're basically on your own.

Until now: A new startup, PIN AI (not to be confused with the poorly reviewed hardware device the AI Pin by Humane), has emerged from stealth to launch its first mobile app, which lets a user select an underlying open-source AI model that runs directly on their smartphone (iOS/Apple iPhone and Google Android supported) and remains private and totally customized to their preferences.

(Video of PIN AI mobile app in action. Credit: PIN AI)

Built with a decentralized infrastructure that prioritizes privacy, PIN AI aims to challenge big tech's dominance over user data by ensuring that personal AI serves individuals — not corporate interests. Founded by AI and blockchain experts from Columbia, MIT and Stanford, PIN AI is led by Davide Crapis, Ben Wu and Bill Sun, who bring deep experience in AI research, large-scale data infrastructure and blockchain security. The company is backed by major investors, including a16z Crypto (CSX), Hack VC, Sequoia Capital U.S. Scout and prominent blockchain pioneers like Near founder Illia Polosukhin, SOL Foundation president Lily Liu, SUI founder Evan Cheng and Polygon co-founder Sandeep Nailwal.

Personal AI realized

PIN AI introduces an alternative to centralized AI models that collect and monetize user data. Unlike cloud-based AI controlled by large tech firms, PIN AI's personal AI runs locally on user devices, allowing for secure, customized AI experiences without third-party surveillance. At the heart of PIN AI is a user-controlled data bank, which enables individuals to store and manage their personal information while allowing developers access to anonymized, multi-category insights — ranging from shopping habits to investment strategies. This approach ensures that AI-powered services can benefit from high-quality contextual data without compromising user privacy.

"The problem today is that all the big players claim they do personal AI — Apple, Google, Meta — but what are they really doing?" Davide Crapis, co-founder of PIN AI, said in an in-person interview with VentureBeat earlier this month. "They're taking the gold mine in your phone and exploiting all that information to figure out what to push to you."

(Desktop view of PIN AI user dashboard.)

PIN AI launched a web-only version late last year that has already gained tremendous traction, with more than 2 million alpha users via Telegram and a Discord community of 220,000 members. The new mobile app, launched in the U.S. and multiple regions, also includes key features such as:

The "God model" (guardian of data): Helps users track how well their AI understands them, ensuring it aligns with their preferences.
Ask PIN AI: A personalized AI assistant capable of handling tasks like financial planning, travel coordination and product recommendations.
Open-source integrations: Users can connect apps like Gmail, social media platforms and financial services to their personal AI, training it to better serve them without exposing data to third parties.

"With our app, you have a personal AI that is your model," Crapis added. "You own the weights, and it's completely private, with privacy-preserving fine-tuning." He told VentureBeat that the app currently supports several open-source AI models as the base model from which users can begin personalizing their assistant, including small versions of DeepSeek and Meta's Llama.

(Promotional screenshot of PIN AI mobile app. Credit: PIN AI)

Blockchain-based ledger for credentials and data access

PIN AI's infrastructure is built on blockchain protocols, ensuring security, transparency and user control.

Data is stored locally: Unlike cloud-based AI systems, PIN AI keeps all user data on personal devices rather than centralized servers.
Trusted execution environment (TEE) for authentication: Credentials and sensitive computations occur within a secure enclave, preventing external access — even from PIN AI itself.
Blockchain registry for financial transparency: Key actions are authenticated on-chain while user data remains private and locally stored.
Interoperability with emerging AI protocols: PIN AI is designed to integrate with future decentralized AI and blockchain projects, ensuring long-term adaptability.

By decentralizing AI infrastructure, PIN AI aims to balance privacy, security and efficiency, allowing users to retain ownership of their digital footprint while still benefiting from AI-driven insights and automation. "We designed our protocol around privacy using modern cryptographic methods like TEE," said Crapis. "No one — not even us — can see your authentication keys."

User-based AI focus

The launch of PIN AI comes at a time when concerns over data privacy and AI monopolization are at an all-time high. Co-founder Wu emphasized the importance of data sovereignty, stating, "We're uniting open-source AI builders and developers to build a foundation for open personal AI, where the user owns the AI 100%." Sun explained the broader vision: "Think of it like J.A.R.V.I.S. from Iron Man — the most loyal executive system that evolves into your personal AI assistant." Crapis further elaborated on PIN AI's approach, stating, "We're creating a data bank that lets you reclaim your personal data from big tech — your Google data, Facebook data, even Robinhood and financial data — so your personal AI can run on it."

Beyond personal use, PIN AI envisions a network of personal AI agents that can interact
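PIN AI's app itself isn't open source, but the underlying idea the article describes (running a small, quantized open-source model entirely on the user's device) can be sketched with llama-cpp-python. The model file name and prompts below are placeholders, not anything from PIN AI.

```python
# Hedged sketch of fully local inference with a small quantized open-source
# model; this is not PIN AI's app code, and the GGUF path is a placeholder.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="llama-3.2-1b-instruct-q4_k_m.gguf",  # any small quantized model file
    n_ctx=4096,
)

# All inference happens on the device; nothing is sent to a remote server.
reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a private assistant personalized to this user."},
        {"role": "user", "content": "Plan a cheap weekend trip based on my saved preferences."},
    ],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])
```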


Less supervision, better results: Study shows AI models generalize more effectively on their own

Language models can generalize better when left to create their own solutions, a new study by Hong Kong University and the University of California, Berkeley, shows. The findings, which apply to both large language models (LLMs) and vision language models (VLMs), challenge one of the main beliefs of the LLM community — that models require hand-labeled training examples. In fact, the researchers show that training models on too many hand-crafted examples can have adverse effects on the model's ability to generalize to unseen data.

SFT vs RL in model training

For a long time, supervised fine-tuning (SFT) has been the gold standard for training LLMs and VLMs. Once a model is pre-trained on raw text and image data, companies and AI labs usually post-train it on a large dataset of hand-crafted examples in question/answer or request/response format. After SFT, the model can undergo additional training stages, such as reinforcement learning from human feedback (RLHF), where the model tries to learn implicit human preferences based on signals such as answer rankings or liking/disliking the model's responses.

SFT is useful for steering a model's behavior toward the kind of tasks the model creators have designed it for. However, gathering the data is a slow and costly process, which is a bottleneck for many companies and labs. Recent developments in LLMs have created interest in pure reinforcement learning (RL) approaches, where the model is given a task and left to learn it on its own without hand-crafted examples. The most important instance is DeepSeek-R1, the OpenAI o1 competitor that mostly used reinforcement learning to learn complex reasoning tasks.

Generalization vs memorization

One of the key problems of machine learning (ML) systems is overfitting, where the model performs well on its training data but fails to generalize to unseen examples. During training, the model gives the false impression of having learned the task, while in practice it has just memorized its training examples. In large and complex AI models, separating generalization from memorization can be difficult.

The new study focuses on the generalization abilities of RL and SFT training in textual and visual reasoning tasks. For textual reasoning, an LLM trained on a set of rules should be able to generalize to variants of those rules. In visual reasoning, a VLM should remain consistent in task performance against changes to different aspects of visual input, such as color and spatial layout.

In their experiments, the researchers used two representative tasks. First was GeneralPoints, a benchmark that evaluates a model's arithmetic reasoning capabilities. The model is given four cards, as textual descriptions or images, and is asked to combine them to reach a target number. For studying rule-based generalization, the researchers trained the model using one set of rules, then evaluated it using a different rule. For visual generalization, they trained the model using cards of one color and tested its performance on cards of other colors and numbering schemes.

The second task is V-IRL, which tests the model's spatial reasoning capabilities in an open-world navigation domain that uses realistic visual input. This task also comes in pure-language and vision-language versions.
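To get a concrete feel for the GeneralPoints setup, here is a toy brute-force checker for whether four card values can be combined with basic arithmetic to reach a target number. It is only an illustration of the task, not the benchmark's code, and it tries just one left-to-right grouping; a full solver would also explore other parenthesizations.

```python
# Toy checker for a GeneralPoints-style problem: can four card values be
# combined with +, -, *, / to reach a target number (e.g. 24)?
from itertools import permutations, product

def reachable(cards, target=24, eps=1e-6):
    ops = [lambda a, b: a + b,
           lambda a, b: a - b,
           lambda a, b: a * b,
           lambda a, b: a / b if abs(b) > eps else None]
    for a, b, c, d in permutations(cards):
        for f, g, h in product(ops, repeat=3):
            # Left-to-right evaluation only: ((a op b) op c) op d.
            x = f(a, b)
            if x is None:
                continue
            y = g(x, c)
            if y is None:
                continue
            z = h(y, d)
            if z is not None and abs(z - target) < eps:
                return True
    return False

print(reachable([1, 2, 3, 4]))  # True: ((1 + 2) + 3) * 4 = 24
print(reachable([1, 1, 1, 1]))  # False: no combination reaches 24
```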
The researchers evaluated generalization by changing the kind of instructions and visual representations the model was trained and tested on. They ran their tests on Llama-3.2-Vision-11B, warming the model up by training it on a small SFT dataset, then creating separate versions for each task and training paradigm. For each task, they separately scaled the training on RL and SFT. The SFT process trains the model on additional hand-crafted solutions, while RL lets the model generate many solutions for each problem, evaluate the results and train itself on the correct answers.

The findings show that reinforcement learning consistently improves performance on examples that are drastically different from the training data. On the other hand, SFT seems to memorize the training rules and doesn't generalize to out-of-distribution (OOD) examples. These observations apply to both text-only and multimodal settings.

(Figure: SFT-trained models perform well on training examples (in-distribution) while showing poor performance on unseen examples (out-of-distribution). Source: arXiv)

Implications for real-world applications

While their experiments show that RL is better at generalizing than SFT, the researchers also found that SFT is helpful for stabilizing the model's output format, and is crucial to enabling RL to achieve its performance gains. The researchers found that, without the initial SFT stage, RL training did not achieve desirable results. This is a bit different from the results obtained by DeepSeek-R1-Zero, which was post-trained on pure RL. The researchers suggest that this can be due to the different backbone model they used in their experiments.

It is clear that there is a lot of untapped potential in RL-heavy approaches. For use cases that have verifiable results, letting the models learn on their own can often lead to unanticipated results that humans could not have crafted themselves. This could come in very handy in settings where creating hand-crafted examples can be tedious and expensive.
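The difference between the two training signals can be sketched schematically. The snippet below uses stand-in functions in place of a real model, verifier and RL algorithm (the study's actual training uses far more machinery, such as policy-gradient updates); it only shows where the supervision comes from in each regime.

```python
# Schematic illustration of the two training signals; every function is a
# stand-in, not the paper's implementation.
import random

def generate_candidates(model, problem, n=8):
    # Stand-in for sampling n candidate solutions from the model.
    return [f"{problem}-attempt-{random.randint(0, 99)}" for _ in range(n)]

def verify(problem, candidate):
    # Stand-in for an automatic checker (e.g. "does this expression equal 24?").
    return candidate.endswith("7")

def rl_style_step(model, problems):
    """RL-style: the model makes its own attempts; only verified ones become signal."""
    training_signal = []
    for problem in problems:
        for cand in generate_candidates(model, problem):
            if verify(problem, cand):
                training_signal.append((problem, cand, 1.0))  # reward correct attempts
    return training_signal

def sft_style_step(model, labeled_examples):
    """SFT-style: imitate fixed hand-crafted solutions, which risks memorization."""
    return [(problem, solution, 1.0) for problem, solution in labeled_examples]

print(len(rl_style_step("toy-model", ["p1", "p2", "p3"])))
```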


Adobe Firefly AI video generator debuts—the most ‘IP-safe’ AI tool yet?

Adobe is expanding its generative AI capabilities with the release of a new video generation model, marking a significant step in the company's push to provide professional creators with AI tools they can safely use in commercial projects. The company announced today that its Firefly Video Model is entering public beta, offering AI-powered video generation tools that Adobe claims are trained only on licensed content — a key differentiator in the increasingly crowded AI video generation market.

"We're the most useful solution because we're IP friendly, commercially safe model," Alexandru Costin, who leads Adobe's AI initiatives, said in an interview with VentureBeat. "You can use our model. There is no risk of IP infringement. More than anybody else, we're passionate about solving professional videographer needs."

(Credit: Adobe)

How Adobe's new pricing strategy makes AI video generation more accessible

The launch comes as Adobe reports that its Firefly family of AI models has generated more than 18 billion assets globally since its initial release in March 2023. This rapid adoption suggests strong demand for AI tools that creative professionals can confidently use in commercial work. The new video capabilities will be available through Adobe's redesigned Firefly web application and integrated into Premiere Pro, Adobe's professional video editing software. The system can generate 1080p video clips from text prompts or images, with features like camera angle control and atmospheric effects generation.

"Just coming from the research lab, they were demoing to me this morning some of the amazing generation capabilities that are coming, increasing the resolution, doing transparent video overlays… doing real-time video," Costin revealed, indicating Adobe's roadmap for the technology.

Adobe is introducing tiered pricing plans starting at $9.99 monthly for the Standard plan, which includes 2,000 video/audio credits — enough for approximately 20 five-second 1080p video generations. A Pro plan at $29.99 offers 7,000 credits.

(Credit: Adobe)

Inside Adobe's strategy to dominate professional AI video creation

The integration with Adobe's existing creative tools appears to be a key strategic advantage. Kylee Pena, senior product marketing manager at Adobe, demonstrated how editors can use the technology to fill gaps in video timelines or generate atmospheric effects like snow, then seamlessly adjust the results using Premiere Pro's professional tools. "Because I'm in Premiere Pro, I also have a lot of additional pro-level tools, including AI tools we've had for a while, like color match," Pena explained during a demonstration.

The launch comes as competition intensifies in the AI video generation space, with recent entries like OpenAI's Sora generating significant attention. Adobe is betting that its focus on commercial safety and professional workflow integration will help it stand out in an increasingly crowded market. To ensure transparency, Adobe will include Content Credentials, a type of digital certification, with all AI-generated video content. This aligns with the company's leadership in the Content Authenticity Initiative, which aims to provide verification tools for digital content. Global brands including Dentsu, Gatorade and Stagwell are already testing the technology in beta, suggesting potential enterprise adoption.
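The Standard-plan numbers above imply rough per-clip economics, assuming credit consumption is spread evenly across generations; actual rates may vary by feature and resolution, and Adobe's published credit tables govern.

```python
# Back-of-the-envelope math from the pricing figures quoted above; an
# assumption-laden sketch, not an official rate card.
standard_price, standard_credits, clips_per_plan = 9.99, 2_000, 20

credits_per_clip = standard_credits / clips_per_plan   # ~100 credits per 5-second clip
cost_per_clip = standard_price / clips_per_plan        # ~$0.50 per generation

pro_price, pro_credits = 29.99, 7_000
pro_clips = pro_credits / credits_per_clip             # ~70 clips on the Pro plan

print(f"{credits_per_clip:.0f} credits/clip, ${cost_per_clip:.2f}/clip, ~{pro_clips:.0f} Pro clips")
```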
Adobe plans to introduce a Premium plan designed for high-volume professional users in the near future.

The development signals Adobe's strategic focus on maintaining its position as the go-to provider of creative tools for professionals while adapting to the AI revolution reshaping the creative industry. With 85% of projects at the recent Sundance Film Festival using Adobe Creative Cloud, the company appears well-positioned to bridge the gap between traditional creative workflows and emerging AI capabilities.


‘Personalized, unrestricted’ AI lab Nous Research launches first toggle-on reasoning model: DeepHermes-3

AI reasoning models — those that produce "chains-of-thought" (CoT) in text and reflect on their own analysis to try and catch errors midstream before outputting a response — are all the rage now thanks to the likes of DeepSeek and OpenAI's "o" series.

Still, it's pretty incredible to me the speed at which the reasoning model approach has spread across the AI industry, with this week's announcement that there's yet another new model to try, this one from the mysterious yet laudably principled Nous Research collective of engineers, whose entire mission since launching in New York City in 2023 has been to make "personalized, unrestricted" AI models — often by taking and fine-tuning or retraining open-source models such as Meta's Llama series and those from French startup Mistral.

As posted on the Nous Research account on X and in the firm's Discord channel, this new open reasoning model is called "DeepHermes-3 Preview," and is described as an "LLM [large language model] that unifies reasoning and intuitive language model capabilities," allowing the user to switch at will between longer reasoning processes and shorter, faster, less computationally demanding responses. It's an 8-billion-parameter (settings count) variant of Hermes 3, itself a variant of Meta's Llama released by Nous back in August 2024. Sample exchanges have shown that it could enter into metacognition-like displays of thinking about itself and the role of AI compared to human consciousness, triggering something approaching an existential crisis in the model's outputs.

Users can download the full model code on Hugging Face, as well as a version that's been quantized (reduced bit count) and saved in the GPT-generated unified format (GGUF), which is designed to run model inferences (the actual production build, as opposed to training) on consumer-grade PCs and servers. Nous today wrote that its researchers "hope our unique approach to user controlled, toggleable reasoning mode furthers our mission of giving those who use DeepHermes more steerability for whatever need they have."

Building on Hermes 3: The data and training approach

DeepHermes-3 builds on the Hermes 3 dataset, a meticulously curated multi-domain dataset that Nous Research developed for the broader Hermes 3 series. According to the Hermes 3 Technical Report released in August, this dataset is composed of approximately 390 million tokens spanning diverse instructional and reasoning-based domains. The dataset is broken down into the following key categories:

General instructions (60.6%): Broad, open-ended prompts similar to those found in general-purpose AI chat models.
Domain expert data (12.8%): Specialized knowledge in fields like science, law and engineering.
Mathematics (6.7%): Advanced problem-solving datasets aimed at improving numerical and logical reasoning.
Roleplaying and creative writing (6.1%): Data designed to enhance storytelling and simulated dialogue.
Coding and software development (4.5%): Code generation and debugging tasks.
Tool use, agentic reasoning and retrieval-augmented generation (RAG) (4.3%): Training on function calling, planning and knowledge retrieval.
Content generation (3.0%): Writing, summarization and structured output tasks.
Steering and alignment (2.5%): Data focused on making the model highly steerable and responsive to user prompts.
In addition, the pseudonymous Nous Research team member @Teknium (@Teknium1 on X) wrote in response to a user of the company's Discord server that the model was trained on "1M non cots and 150K cots," or 1 million non-CoT outputs and 150,000 CoT outputs. This data mixture supports DeepHermes-3's unique ability to toggle between intuitive responses and deep, structured reasoning, a key feature that distinguishes it from other LLMs.

How toggleable reasoning mode works

DeepHermes-3 allows users to control its reasoning depth using a system prompt. The user must enter the following text before a prompt to "toggle on" the model's reasoning mode:

"You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think></think> tags, and then provide your solution or response to the problem."

When reasoning mode is enabled, the model processes information in long CoTs, allowing it to deliberate systematically before generating an answer. This is achieved using the <think></think> tags, where the model's internal monologue is structured before presenting a final solution. In standard response mode, the model operates more like a traditional AI chatbot, providing quicker, intuition-based responses without deep logical processing.

Performance insights and community feedback

Early benchmarking and community testing have provided key insights into DeepHermes-3's capabilities:

Mathematical reasoning: DeepHermes-3 scores 67% on MATH benchmarks, compared to 89.1% for DeepSeek's R1-distilled model. While DeepSeek outperforms it in pure math tasks, Nous Research positions DeepHermes-3 as a more generalist model with broader conversational and reasoning skills.
Multi-turn conversations: Some testers report that reasoning mode activates correctly on the first response, but may fail to persist in extended conversations. Community members suggest enforcing <think>\n at the start of each response, a method also used in DeepSeek-R1.
Function calling: DeepHermes-3 supports tool use, although it was not explicitly trained to integrate reasoning mode and function calling simultaneously. Some users report that while combining both features improves accuracy in executing tools, results remain inconsistent.

Nous Research is actively gathering user feedback to refine reasoning persistence and improve multi-turn interactions.

Deployment and hardware performance

DeepHermes-3 is available for testing on Hugging Face, with GGUF quantized versions optimized for low-power hardware. The model is compatible with vLLM for inference and uses the Llama-Chat format for multi-turn dialogue. One user reported a processing speed of 28.98 tokens per second on a MacBook Pro M4 Max, demonstrating that the model can run efficiently on consumer hardware.

DeepHermes-3 is based on Meta's Llama 3 model and is governed by the Meta Llama 3 Community License. While the model is freely available for use, modification and redistribution, certain conditions apply:

Redistribution: Any derivative models or deployments must include the original license and prominently display "Built with Meta Llama 3."
Restrictions on model training: Users cannot use
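Putting the pieces above together, here is a hedged sketch of toggling reasoning mode from Python with the transformers library. The repository ID reflects Nous Research's published preview and should be verified on Hugging Face before use, and running an 8B model locally requires suitable hardware (or a quantized GGUF build via a llama.cpp-based runtime instead).

```python
# Hedged sketch: verify the repo ID on Hugging Face; hardware requirements apply.
from transformers import pipeline

REASONING_SYSTEM_PROMPT = (
    "You are a deep thinking AI, you may use extremely long chains of thought to "
    "deeply consider the problem and deliberate with yourself via systematic "
    "reasoning processes to help come to a correct solution prior to answering. "
    "You should enclose your thoughts and internal monologue inside <think></think> "
    "tags, and then provide your solution or response to the problem."
)

pipe = pipeline("text-generation", model="NousResearch/DeepHermes-3-Llama-3-8B-Preview")

def ask(question: str, reasoning: bool = False) -> str:
    # Prepend the system prompt only when deep reasoning is wanted.
    messages = [{"role": "system", "content": REASONING_SYSTEM_PROMPT}] if reasoning else []
    messages.append({"role": "user", "content": question})
    out = pipe(messages, max_new_tokens=2048 if reasoning else 256)
    return out[0]["generated_text"][-1]["content"]

print(ask("Is 9.11 larger than 9.9?", reasoning=True))   # long <think>...</think> then answer
print(ask("Is 9.11 larger than 9.9?", reasoning=False))  # shorter, intuition-style reply
```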
