Drafting a Legal Notice: Comparing OpenAI's GPT 3.5 Turbo and GPT 4
Why are we still preparing first drafts of legal notices?
OpenAI’s Da Vinci model is a powerful language learning model that powers GPT 3.5 Turbo. In its chat completion avatar, it took the world by storm when ChatGPT was first released. All those impressive looking streaming chat completions we saw in our first experiences of ChatGPT? That was all GPT 3.5 Turbo.
Going from chat completions to legal drafts.
The obvious use case for a language model that shines in offering text completions is preparing a first draft of all sorts of things, including say, legal documents.
Legal documents are built on precedents, existing knowledge and data. Something that translates that into a first cut automatically.
We lawyers love that. The tireless paralegal of our dreams!
How does GPT 3.5 Turbo perform?
Here’s where we were excited and disappointed at the same time. Before sharing what we found, lets go over a sample of the prompt we used. Keeping it very simple, limiting the detail to a brief statement of facts, the actions/remedies and jurisdiction. Standard temp and no limits on tokens. (Ignore the casual defamation of Google India)
Well, there were good and bad drafts. Overall, with GPT 3.5 Turbo, there are some problems common to other use cases and which are particularly striking in legal drafts.
Ofcourse it is not to say that it is NOT super cool, and could even be very useful!
Lets look at a sample of the output that we got:
You will notice a few things just by looking at the draft, and these are issues common to GPT 3.5 chat completions across several drafts that we produced:
Inconsistent or lackadaisical structure - 3.5 struggles with logically applying your prompt to a problem before generating text. In that way, it acts more like a language model should, generating text completions irrespective of the logic. It also lacks coherency in how it structures and formats a document.
GPT 3.5 generates less tokens, i.e., the length of the notice in more complex fact situations can be underwhelming and not sufficiently detailed.
For those of you who have experimented with GPT prompts, 3.5 doesn’t really understand the system prompt in a logical way. It uses the system prompt as an aid to generating a user side completion. There isn’t that much change in the output if say, you were to include some parts of the user prompt in the system prompt and vice-versa. This means less control over how the draft is prepared, and also over the final structure of the draft.
How does GPT-4 perform?
While there is a good understanding of the “role” in both GPT-3.5 Turbo and GPT-4 (here, the role of a lawyer) since the legal use case is common and OpenAI has trained its models on truckloads of legalese, things dramatically improve in GPT-4.
For fairness, we used the same prompt with only the change in model. All other parameters (temp/no limit on tokens), standard. Here is the first page of the draft that GPT-4 produces:
After running through several legal notice drafts and combination of prompts, what did we discover about GPT-4?
Better structure and logical arrangement of sections.
Lengthier text.
GPT-4 is more sensitive to prompts, even more complex ones than the one above. It applies logic to the system prompt, and does not disobey user prompts, i.e., you have more control over both, how it produces your draft and the structure of the draft itself.
You might also notice that GPT-4 drafts just read better and use more context appropriate language.
Some questions to think about.
If you have been reading this far, the questions that remain for us all to find answers:
GPT-4 costs 10 times as much as GPT-3.5 Turbo. Do we see a difference in quality that matters?
What is the best prompt structure for legal document generation?
Are you relying on GPT or a wrapper like TipsyTom to generate legal drafts? If you don’t think its ready, why do you think so?
If you enjoyed reading, this Substack is free to subscribe, and also free to recommend to your friends who like law, technology, artificial intelligence and innovation. Always love to hear your thoughts, feedback and ideas on this other and posts that interest you.
Till next time!