This website was built using a large language model (mostly)
GPT4 turned us all on our heads two weeks ago, and we wanted to see how far we could push it. Building a website for our AI + innovation work had been on our list even before the release, so taking GPT4 for a test drive whilst building out the new website felt like it’d be an interesting experiment.
And, ok, sure, it’s a cliché - a pretty boring cliché at this point: look at this cool thing we built with an LLM. But bear with us, we’ll try and make it interesting.
An unknown framework
Going into the experiment we had a few conditions to make life more difficult for ourselves - and reduce the ability to cheat with existing knowledge. We decided that whatever framework we used should be:
- Something we didn’t know
- Something that was fast in the front-end
- Something old enough - or based on technology that was old enough - for GPT4 to have lots of data on it
Astro.js fit the bill. It’s a solid top-10 static site generator, but you’ve probably never heard of it. We hadn’t either. But we saw a recent piece on how fast it was compared to Next.js and Gatsby. Astro.js wasn’t a great fit for the third condition since it’s only been around since 2021 (when GPT4’s training data ends), but it’s built on top of TypeScript so we thought it’d be fine - partly because the framework is built explicitly around delivering content, which means the underlying architecture is simpler than some other front-end frameworks’.
You’ll be glad to know we’re not going to take you through our prompt history. Not least because, thanks to the bust-up between Italy and OpenAI, a lot of the conversation is no longer accessible - Edd is based in northern Italy.
We found it difficult to begin with, partly because we were trying to build the site as quickly as humanly possible whilst knowing nothing about how Astro was put together.
It also became clear that GPT4 is frustratingly slow. Responses took minutes rather than seconds to arrive and were often incomplete. That made trying to find any sort of ‘flow’ completely impossible.
The last lesson was just how much domain knowledge was required. Asking something general often created really strange results. For example, a prompt like this will return a useless suggestion:
I want a component that can display a title, date and description from an article. Output the necessary html and js code.
Instead, it would need to be something very specific, like the following, to return anything useful:
I am building an astro.js app. I am using Tailwind CSS. I have a component called ThinkingRow at components/ThinkingRow.tsx. The component is included on a parent page, pages/thinking.tsx. The component has properties named ‘title’, ‘date’ and ‘description’. All are strings and all can be accessed via the frontmatter of the markdown file. Output html and js elements using tailwind css classes to style them. Remember astro uses ‘class’ rather than ‘className’
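For a sense of what a prompt like that gets you back, here’s a rough sketch of the kind of component it produces. To be clear, this is our own illustration rather than GPT4’s actual output, and the markup and Tailwind classes are illustrative:

```astro
---
// Illustrative sketch only - not GPT4's actual output.
// Props mirror the prompt: 'title', 'date' and 'description', all strings,
// passed in from the markdown frontmatter by the parent page.
const { title, date, description } = Astro.props;
---
<article class="py-6 border-b border-gray-200">
  <h2 class="text-xl font-semibold">{title}</h2>
  <time class="text-sm text-gray-500">{date}</time>
  <p class="mt-2 text-gray-700">{description}</p>
</article>
```

Note the `class` attribute rather than React’s `className` - exactly the kind of framework quirk the prompt had to spell out, or GPT4 would happily hand back JSX.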
The need for high levels of specificity, and domain knowledge, meant that after an hour or two of dead ends we started taking a more iterative approach. This was fairly emergent, shaped by the back and forth of human-machine conversation - and the very real human frustration when the machine didn’t do what was expected!
Between learning how GPT4 wanted prompts to be structured, understanding how the pieces fit together in Astro, and figuring out what we actually wanted from the site, we looped through the following steps:
- Scaffold Astro project with tailwind css
- Content model
- Detail page
- List page
- Final fix
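To make the content model step concrete: each markdown article boiled down to three string fields in its frontmatter. A minimal sketch in TypeScript - the field names mirror the prompt earlier in the post, and the guard function is our own illustration, not code GPT4 produced:

```typescript
// Sketch of the content model: the frontmatter fields each markdown
// article exposes. Field names mirror the prompt earlier in the post.
type ArticleFrontmatter = {
  title: string;
  date: string; // kept as a plain string, as in the prompt
  description: string;
};

// A small runtime guard so the list and detail pages can trust
// whatever comes out of a markdown file's frontmatter.
function isArticleFrontmatter(fm: unknown): fm is ArticleFrontmatter {
  if (typeof fm !== "object" || fm === null) return false;
  const f = fm as Record<string, unknown>;
  return (
    typeof f.title === "string" &&
    typeof f.date === "string" &&
    typeof f.description === "string"
  );
}
```

The list page then just maps validated frontmatter into rows, and the detail page renders one article’s fields in full.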
We skipped the final fix bit because we were more interested in sharing where we’d gotten to. The only way to learn is by testing and getting our ideas out in their raw form was the quickest way we could do that.
We’ve used a lot of words to describe the frustrations and the process but - taking this iterative approach - it was remarkable just how fast the project moved forwards. It was also remarkable that we only needed to look at the Astro documentation once.
A lever not a magic wand
Humans can’t lift heavy things. But we figured out a long time ago that with a lever that’s long enough almost anything is movable (including our planet, if you listen to Archimedes*). Levers are an incredibly useful tool. But that’s all they are: part of a wider process and system.
Humans need to decide what heavy thing needs to be moved and what they’ll do once it’s moved. Which is a clumsy analogy for what building this website with GPT4 felt like. It acted as an incredibly useful lever that allowed us to create something that would otherwise have been impossible. But it was still down to us to figure out where to put all the pieces and how the project needed to be pushed along.
For the moment there’s no magic trick happening with large language models. There are lots of demos where a one-shot prompt creates a video game or a perfect website. The outputs are always very familiar - things like Pong, Tetris or Snake - which makes sense given how LLMs are built. Large language models are very good at quickly writing known-known patterns. They’re so good at it that they make it look like magic.
But our website - despite being simple - could never have been created in a one-shot, or short, conversation. GPT4 needed to be used like a lever with supervision and guidance from humans every step of the way.
Still, levers shouldn’t be underestimated: they’re what gave us the pyramids, Rome and the Pantheon. We’re fascinated to see what gets created with this new type of lever that LLMs have given us.
* The line "Give me a lever long enough and a fulcrum on which to place it, and I shall move the world" is credited to the ancient Greek mathematician Archimedes.