As discussed in the earlier article, bringing LLMs to production in 2024 is hard.
But let’s forget that for a minute and discuss one of the most fundamental improvements AI will bring to your products in the next eighteen months: entirely new ways of building user interfaces. It is a topic I feel is not explored enough, maybe because LLMs are still seen as complex, low-level tech and so are not yet on the radar of product designers and UX experts. Yet GenAI really does have the ability to completely revolutionize our user interfaces and the way our users interact with our platforms and data.
This is especially true for structured data tasks like filling out a CV, posting a house for sale or completing a complicated medical history questionnaire. In a property marketplace today, we manage the listing process by asking the user to fill out a long form with a lot of required fields… Maybe they select the location and category first, so we can customize the inputs in the next step. Or maybe the platform prefers to start with the images, thinking that making the user do this task commits them to the process, so they are more likely to finish.
There is a lot of theory about how to improve data capture, but ultimately you are balancing two competing objectives:
- You want as much structured data as possible
- Your user begrudges typing behind a screen instead of being out there selling
However, you both have one thing in common: you both want to stand out from your respective competition. The agent wants the most attractive house on your platform, and you want the most attractive version of their house across all property platforms.
Multimodal generative AI models let us get the best of both worlds. We can have a unique, detail-rich listing which is easy and quick for the user to create! But how?
It starts with the multimodal part
Users already have long-form descriptions, images and even videos. But rarely do we use them as data; they are assets we bolt onto the side of the main object. Yet LLMs are exceptionally good at extracting structured data from free text, images and audio. The future of UIs is to start with the assets and then propose the data. Flip the script, quite literally.
Your users don’t want to be sitting behind a desk typing, they want to be doing their real job. Be that an HR manager, estate agent, doctor or marketer, all of them would much rather be out there doing interviews, showing houses, reviewing results or selling, yet we ask them to… type. Instead, let them show and tell in a natural, multi-format way. Let that estate agent film a viewing and upload it, let the hiring manager describe the sort of person they are looking for.
From these varied and free-form inputs, you can build a rich and detailed set of data, one much deeper and more nuanced than if you had asked them to tick boxes and select from drop-downs.
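To make this concrete, here is a rough Python sketch of that extraction step for a free-text description. The `call_llm` helper and the listing schema are stand-ins invented for the example; swap in whatever model provider and fields your platform actually uses, and the same pattern applies to images and audio via a multimodal call.

```python
import json

# Hypothetical helper: wraps whichever LLM provider you use and returns the
# raw text of the model's reply. Invented for this example.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

# Invented schema for the example; a real platform would use its own fields.
LISTING_SCHEMA = {
    "property_type": "string, e.g. apartment or house",
    "bedrooms": "integer",
    "bathrooms": "integer",
    "furnished": "boolean",
    "highlights": "list of short strings",
}

def extract_listing(description: str) -> dict:
    """Turn a free-form listing description into structured fields."""
    prompt = (
        "Extract the following fields from the property description below. "
        "Reply with JSON only, using null for anything not mentioned.\n"
        f"Fields: {json.dumps(LISTING_SCHEMA)}\n\n"
        f"Description: {description}"
    )
    return json.loads(call_llm(prompt))
```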
But wait there’s more…
So if we can upend the initial user experience to center it around what the user actually wants to do in their job, I think we can get a much higher-quality object out of them. But it won’t be perfect, at least not initially.
We can now give the user a mostly auto-completed form, but where things become magical is when you then have the AI interact with them, ask follow-up questions and start a conversation!
Think of it as data validation on steroids! Instead of a mandatory field popping up a red error, the LLM can converse with the user, diving deeper into points of interest, following up on missing information and correcting inconsistencies. “Why did you say it is unfurnished when the pictures show furniture? Would the seller consider a furnished option if the buyer asks?”
In a conversation you have the ability to learn so much more. Traditionally we had an integer input box for the number of bathrooms… Now the user might say “the apartment has two en-suite bedrooms, with a guest toilet”. In both cases the answer is 3, but now we have so much more context to use.
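The follow-up step could be sketched in the same spirit. Again, `call_llm` is a hypothetical wrapper around your model of choice and the prompt is only illustrative: hand the model the current form, what is missing, and what the assets show, and let it phrase the next question.

```python
import json

def call_llm(prompt: str) -> str:  # same hypothetical wrapper as in the sketch above
    raise NotImplementedError("plug in your model provider here")

def next_question(form_data: dict, required_fields: list[str],
                  asset_summary: str) -> str | None:
    """Return one follow-up question for the user, or None if the form looks
    complete and consistent. `asset_summary` is whatever was extracted from
    the uploaded photos or video."""
    missing = [f for f in required_fields if form_data.get(f) in (None, "")]
    prompt = (
        "You are helping a user complete a property listing.\n"
        f"Current form data: {json.dumps(form_data)}\n"
        f"Missing fields: {missing}\n"
        f"What the uploaded photos and video show: {asset_summary}\n"
        "If anything is missing or contradicts the photos, reply with ONE short, "
        "friendly follow-up question. Otherwise reply with exactly DONE."
    )
    answer = call_llm(prompt).strip()
    return None if answer == "DONE" else answer
```

The point is that the “validation rule” becomes a prompt over the whole object and its assets, rather than a per-field check.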
Consider other use cases, like applying for a job. Upload your resume, but then have a conversation with a hiring AI, a sort of mini interview. Not with the intention of excluding or recommending you for the job immediately, but rather diving into the nuances lost in a PDF. By having a constructive back and forth, the LLM can ensure you are putting forward the information most relevant to this application, rather than a generic fact sheet. Increasing your chances but also making the hiring manager’s job much easier.
People hate chatbots
Especially the NLP-powered platform assistants of the last five years. Thankfully, with LLMs we can move beyond a chatbot and much closer to a fully interactive assistant. But this MUST be combined with a traditional data-entry form in the UI.
And why is that? Inputs like photos and voice will be great for the initial data capture, and a conversation will let you go deeper and capture as much detail as possible. But when it is time to review the information and make edits, nothing beats a good old form. It removes ambiguity and lets small edits be done quickly and accurately.
There is also something magical about seeing the form completed in front of your very eyes as you talk to your AI assistant. It builds a relationship with, and trust in, the platform like we have never achieved before… presuming you build it well.
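One way to picture that “form filling itself in” effect: every conversational turn goes through the same kind of extraction as the initial assets, and the results are merged into the form state the UI renders from. A minimal sketch, with invented field names:

```python
def merge_turn(form_data: dict, turn_updates: dict) -> dict:
    """Merge whatever the extraction step pulled out of the latest chat message
    into the form state, so the on-screen form updates as the conversation goes."""
    merged = dict(form_data)
    for field, value in turn_updates.items():
        if value is not None:  # only overwrite with real information
            merged[field] = value
    return merged

# Example: the user just said the flat has three bathrooms and comes unfurnished.
form = {"bedrooms": 2, "bathrooms": None, "furnished": None}
form = merge_turn(form, {"bathrooms": 3, "furnished": False})
# -> {"bedrooms": 2, "bathrooms": 3, "furnished": False}; re-render the form from this
```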
Now, make it dynamic
Today we can make these three AI-enabled improvements:
- Allow more advanced initial input formats
- Leverage a conversation to refine and improve the initial data
- Keep the traditional form for making edits
So what is the AI cherry on this LLM cake? Looking a little further ahead (the frameworks for this do not exist yet), I think what makes things truly next-generation is using GenAI to dynamically generate the UI for the user in real time. Imagine a UI that customizes itself to:
- The task you are trying to perform
- The data already known and needed
- Your personal preferences
- How you have been interacting up until now
We have had widgets for a while; customisable blocks are common in many dashboards. On Google or Bing, stock tickers, calculators or weather graphs already show up depending on the search results. But ultimately these are just premade components being turned on or off. What if instead the LLM could actually write the components dynamically?
Widgets could be loaded in or out, but new widgets could also be created on the fly. This would allow UIs to be much cleaner and more focused, but also more versatile. That is super important as we move more and more to mobile-first (or mobile-only) experiences and want to leverage voice and vision.
Complicated data entry is very difficult on a phone, but now you can talk about and show what you want the system to understand, then dynamically see back only the important information you need. Ever changing, in near real time, as you interact with the platform.
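Since the frameworks for this do not exist yet, treat the following as a thought experiment rather than a recipe: ask the model for a declarative widget spec, constrained to component types your front end already knows how to render, and draw the screen from that spec. The widget vocabulary and the `call_llm` helper are made up for the example.

```python
import json

def call_llm(prompt: str) -> str:  # same hypothetical wrapper as before
    raise NotImplementedError("plug in your model provider here")

# Made-up widget vocabulary; in practice you would only render component types
# your front end already knows how to draw safely.
ALLOWED_WIDGETS = {"text_input", "number_stepper", "photo_grid", "map_pin", "chat_prompt"}

def propose_layout(task: str, known_data: dict, preferences: dict) -> list[dict]:
    """Ask the model to propose a screen as a JSON list of widget specs."""
    prompt = (
        f"Task: {task}\n"
        f"Known data: {json.dumps(known_data)}\n"
        f"User preferences: {json.dumps(preferences)}\n"
        "Propose a mobile screen as a JSON list of widgets. Allowed types: "
        f"{sorted(ALLOWED_WIDGETS)}. Each widget needs 'type', 'field' and 'label'."
    )
    layout = json.loads(call_llm(prompt))
    # Drop anything the model invents that the front end cannot render.
    return [w for w in layout if isinstance(w, dict) and w.get("type") in ALLOWED_WIDGETS]
```

Constraining the model to a whitelist of components is the important design choice: you get a layout that adapts to the task and the user without letting generated code run unreviewed in your UI.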
This is truly a revolution in user interfaces, one where you do not design a wireframe but rather a user interaction ruleset and a brand kit. Personalized, dynamic and easy to use, but still able to extract deep, accurate structured data. Broken free from the need for a keyboard and mouse, it is truly a system without compromise.
Are you and your design teams thinking about AI as the future of interaction?
