Midjourney v7 launches with voice prompting and faster draft mode — why is it getting mixed reviews?
Midjourney, the bootstrapped startup viewed by many AI power users as the "gold standard" of AI image generation since its launch in 2022, has introduced the much-anticipated, most advanced version of its generator model, Midjourney v7.

The headline feature is a new way to prompt the model to create images. Previously, users were limited to typing text prompts and attaching other images to guide generations (the model could incorporate a variety of user-uploaded and attached images, including other Midjourney generations, to influence the style and subjects of new generations). Now, the user can simply speak aloud to Midjourney's alpha website (alpha.midjourney.com), provided their computer or a networked device with audio input (such as headphones or a smartphone) has a microphone, and the model will listen, compose its own text prompts from the spoken description, and generate images from them.

It's unclear whether Midjourney created a new speech-to-text model from scratch or is using a fine-tuned or off-the-shelf version of one from another provider such as ElevenLabs or OpenAI. I asked Midjourney founder David Holz on X, but he has yet to answer.

Using Draft Mode and conversational Voice Input to prompt in a flow state

Going hand-in-hand with this input method is a new "Draft Mode" that generates images more rapidly than Midjourney v6.1, the immediately preceding version, often in less than a minute and in some cases under 30 seconds. While the initial images are of lower quality than v6.1's, the user can click the "enhance" or "vary" buttons located to the right of each generation to re-render the draft at full quality.
The idea is that the two features work best together (in fact, "Draft Mode" must be turned on to activate audio input), letting the user enter a more seamless flow state of creative drafting with the model: spending less time refining the specific language of prompts and more time seeing new generations, reacting to them in real time, and adjusting or tweaking them more naturally and rapidly by simply speaking their thoughts to the model. "Make this look more detailed, darker, lighter, more realistic, more kinetic, more vibrant" are some of the instructions the user could give through the new audio interface in response to generations, producing new, adjusted images that better match their creative vision.

Getting started with Midjourney v7

To enter these modes, starting with the new "Draft" feature, the user must first clear one new hurdle: Midjourney's personalization feature. This feature was introduced on Midjourney v6 back in June 2024, but it was optional: by rating 200 pairs of images on the Midjourney website (selecting which one they liked best in each pair), the user could create a personal "style" and then toggle it on to apply to all generations going forward. Midjourney v7, by contrast, requires users to generate a new v7-specific personalized style before using the model at all.

Once the user does that, they'll land on the familiar Midjourney Alpha website dashboard, where they can click "Create" in the left side rail to open the creation tab. Then, in the prompt entry bar at the top, the user can click the new "P" button to the right of the bar to turn on their personalization mode.
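Purely as an illustration of the conversational drafting loop described above — each spoken adjustment folding into the running prompt before the next draft render — here is a minimal Python sketch. The class and method names are hypothetical, not Midjourney's actual API, and real speech-to-text is replaced by passing in a transcript directly:

```python
# Hypothetical sketch of a conversational prompt-refinement loop.
# All names here are illustrative; Midjourney's real internals are not public.
from dataclasses import dataclass, field


@dataclass
class DraftSession:
    """Accumulates spoken adjustments into a single evolving prompt."""
    base_prompt: str
    adjustments: list[str] = field(default_factory=list)

    def hear(self, utterance: str) -> str:
        # In the real product this text would come from speech-to-text;
        # here the transcript is supplied directly for simplicity.
        self.adjustments.append(utterance.strip().rstrip("."))
        return self.current_prompt()

    def current_prompt(self) -> str:
        # Fold all adjustments into one prompt for the next draft render.
        return ", ".join([self.base_prompt, *self.adjustments])


session = DraftSession("a lighthouse at dusk")
session.hear("make it more vibrant")
prompt = session.hear("add a stormy sky")
print(prompt)  # a lighthouse at dusk, make it more vibrant, add a stormy sky
```

The point of the sketch is the interaction shape, not the implementation: the user never retypes the full prompt, only speaks deltas, and each delta triggers a fresh, fast draft render.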
Midjourney founder and leader David Holz confirmed to VentureBeat on X that older personalization styles from v6 can also be selected, though not the separate "moodboards" (styles built from user-uploaded image collections); Midjourney's X account separately stated that feature will be returning soon as well. However, I didn't see any option to select my older v6 style.

Nonetheless, the user can then click the new "Draft Mode" button to the right of the Personalization button (both sit to the right of the text prompt entry box) to activate the faster image generation mode. Once clicked, the button turns orange to indicate it is on, and a new button with a microphone icon should appear to its right. This is the voice prompting mode, which the user can click to activate. Once pressed, the microphone icon changes from white to orange to indicate it is engaged, and a waveform line appears to its right, undulating in time with the user's speech. The model can then hear you, and should also detect when you finish speaking. In practice, I sometimes got an error message saying "Realtime API disconnected," but stopping and restarting the voice entry mode and refreshing the webpage usually cleared it quickly.

After a few seconds of speaking, Midjourney will begin flashing keyword windows below the prompt entry textbox at the top and will also generate a full text prompt to the right as it produces a new set of four images based on what the user said. The user can then further modify these generations by speaking to the model again, toggling voice mode on and off as needed. Here's a quick demo video of me using it today to generate some sample imagery.
You'll see the process is far from perfect, but it is really fast and does allow for a more uninterrupted state of prompting, refining, and receiving images from the model.

More new features…but also many missing features and limitations from v6/6.1

Midjourney v7 is launching with two operational modes: Turbo and Relax. Turbo Mode provides high performance at twice the cost