How to Make AI a Reliable Ally

Image generated with ChatGPT

Would you let an AI agent manage your email and respond on your behalf? Or classify your potential clients automatically and send them commercial messages?

If you answered yes to at least one of these questions, keep reading.

🕒 Summary for busy people

Estimated reading time for the full article: 10 minutes.

AI is not magic. It is a tool that, when used well, can multiply productivity, and when used poorly, can become a factory of convincing errors. Language models are impressive, yes, but they lack judgment. And that’s why the key is not to give them control, but to design a clear framework: when to use them, how to limit their scope, and what to never delegate.

Here I propose a practical guide to taming LLMs: use classical programming when the task is deterministic, define good prompts with context and objective, choose the right model according to the task, and always, always, validate the result with a human touch.

I also discuss the importance of having backup plans, personalizing models with your own data, and not getting blinded by the trend of total automation.

AI can be a reliable ally if we treat it with the same discipline as any other tool. But if we expect miracles, we will end up with answers that are poetic… and incorrect.

The talent and trap of LLMs

AI, and specifically LLMs, can perform surprising tasks: writing articles, summarizing books, generating realistic images, or even composing music. And often, they do so with astonishing quality.

The problem is that they lack judgment. And along with their well-known hallucinations (fabricated information presented as true), this becomes a risk. For example, an LLM could write about the discovery of America and confidently state that it occurred in 1378 instead of 1492.

In my professional experience, where I use LLMs for marketing, business management, and software development, reliability is always a challenge. The key is not to give up on these tools, but to learn how to set limits and design mechanisms that make them more useful and safe.

In this article, I want to share a series of techniques for you to leverage them in your daily life.

If it can be programmed, program it

Any task that can be solved with a deterministic algorithm should not be delegated to AI. For that, we have classical programming. AI is good for the ambiguous: interpreting, detecting patterns, summarizing, generating ideas.

AI can do many things, but asking it to calculate a sum is like hiring a poet to handle accounting.
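As a small illustration of the point above, here is a minimal sketch of a deterministic task solved with plain code. The function name `invoice_total` and the sample amounts are made up for the example; the idea is only that the same input always produces the same, exact output, with no model involved.

```python
from decimal import Decimal


def invoice_total(amounts: list[str]) -> Decimal:
    """Add monetary amounts exactly (hypothetical helper for illustration)."""
    # Decimal avoids binary floating-point rounding, so the result is exact.
    return sum((Decimal(a) for a in amounts), Decimal("0"))


total = invoice_total(["19.99", "5.01", "100.00"])
print(total)  # 125.00
```

No prompt, no retries, no validation step: for arithmetic like this, classical programming is both cheaper and more reliable.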

The importance of a good prompt

Just like people, LLMs need context. A good prompt is the difference between a mediocre result and a brilliant one. There are various techniques that are more or less proven. RCO (Role-Context-Objective) works very well for me, but I suppose it varies depending on the task or area of expertise.

Defining a starting role works very well: it steers the model toward a specific domain, which tends to improve its results. The context gives the model enough information to narrow down its response. Finally, the objective defines what we want it to provide.

Role: Act as an expert coach in productivity and personal organization, specialized in helping professionals with excessive workloads improve their time management.

Context: I am working on several projects at once and feel like I am not making progress. Every day, urgent tasks arise that force me to postpone the important ones. Additionally, I spend a lot of time in unproductive meetings and find it hard to concentrate for long periods.

Objective: I need you to propose a weekly work plan that helps me:

Reduce time wasted in meetings.
Improve my concentration ability.
Organize tasks by priority.
Include concrete examples of daily routines, time management techniques (like Pomodoro or GTD), and a table with an adaptable weekly schedule model.
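If you send prompts from code, the Role-Context-Objective structure can be kept as a small template. This is only a sketch: the class name `RCOPrompt` and its `render` method are my own naming for the example, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class RCOPrompt:
    """Hypothetical container for a Role-Context-Objective prompt."""

    role: str
    context: str
    objective: str

    def render(self) -> str:
        # Each part becomes its own labeled paragraph, as in the example above.
        return (
            f"Role: {self.role}\n\n"
            f"Context: {self.context}\n\n"
            f"Objective: {self.objective}"
        )


prompt = RCOPrompt(
    role="Act as an expert coach in productivity and personal organization.",
    context="I am working on several projects at once and feel stuck.",
    objective="Propose a weekly work plan that organizes tasks by priority.",
)
print(prompt.render())
```

Keeping the three parts as separate fields makes it harder to forget one of them and easier to reuse the same role across different requests.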

There are other useful options, such as specifying the output format. I use this a lot when calling a model through an API: I ask it to return the response in JSON, CSV, or XML, which I can then validate to detect possible errors.
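A minimal sketch of that validation step, assuming the model was asked to reply with a JSON object. The field names `urgency` and `summary` are a hypothetical schema chosen for the example; only the standard `json` module is used.

```python
import json

# Hypothetical schema for this example: field name -> expected Python type.
REQUIRED_FIELDS = {"urgency": str, "summary": str}


def parse_model_reply(raw: str) -> dict:
    """Parse a model reply and reject anything that does not fit the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    return data


reply = parse_model_reply('{"urgency": "high", "summary": "Client asks for a quote"}')
print(reply["urgency"])  # high
```

When the check fails, you can retry the request, fall back to another model, or route the item to a person, instead of letting a malformed answer flow into the rest of the system.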

You can also set limits, or if you want to sound knowledgeable, constraints or guardrails:

Do not exceed 400 words and limit each recommendation to a maximum of 3 sentences.
Use a professional and empathetic tone, without empty motivational phrases or self-help quotes.
Do not include references to pseudoscientific methods or mentions of public figures.
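Limits like the first one can also be checked in code after the model answers, since models do not always respect them. The sketch below assumes the response has already been split into a list of recommendations; the function name `violations` is my own, and the sentence count is a rough heuristic based on punctuation, not a real sentence parser.

```python
import re

MAX_WORDS = 400      # total word limit from the prompt
MAX_SENTENCES = 3    # per-recommendation sentence limit from the prompt


def violations(recommendations: list[str]) -> list[str]:
    """Return a list of broken limits; an empty list means the answer passes."""
    problems = []
    total_words = sum(len(rec.split()) for rec in recommendations)
    if total_words > MAX_WORDS:
        problems.append(f"{total_words} words, limit is {MAX_WORDS}")
    for i, rec in enumerate(recommendations, start=1):
        # Heuristic: a sentence is any non-empty chunk between ., ! or ?
        sentences = [s for s in re.split(r"[.!?]+", rec) if s.strip()]
        if len(sentences) > MAX_SENTENCES:
            problems.append(f"recommendation {i} has {len(sentences)} sentences")
    return problems


print(violations(["Block focus time. Decline optional meetings."]))  # []
```

Tone and content limits, like the second and third ones, are much harder to check mechanically and are better left to a human reviewer.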

Finally, you can use a model to ask it to generate a structured and complete prompt, which you can manually adjust before sending it.

Don’t use a cannon to kill flies

It is not always necessary to use the largest or most sophisticated model. If you need to classify emails by urgency, a small and fast model will be more efficient than a cutting-edge one. GPT-4 can be useful for creating a draft of a business plan. Mistral or Gemini Flash are more than sufficient for classifying emails.
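In code, this choice can be as simple as a lookup table from task to model. The sketch below uses placeholder identifiers (`small-fast-model`, `large-model`) and invented task names; replace them with whatever models and tasks you actually work with.

```python
# Placeholder model identifiers and task names, assumed for this example.
MODEL_BY_TASK = {
    "classify_email": "small-fast-model",
    "draft_business_plan": "large-model",
}
DEFAULT_MODEL = "small-fast-model"


def pick_model(task: str) -> str:
    """Choose the cheapest model known to be good enough for the task."""
    # Unknown tasks start on the small model; escalate only if quality is poor.
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)


print(pick_model("classify_email"))       # small-fast-model
print(pick_model("draft_business_plan"))  # large-model
```

Defaulting to the small model is a design choice: it keeps cost and latency low and forces you to justify each use of the larger one.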

That said, do not rely on a model to recommend the best model for each task: models often lack reliable or up-to-date information, and their recommendations may be biased.

Have a plan B (and C)

Sometimes one model gives a more satisfactory result than another, and the reason is not always how well suited it is to the use case, which is what we looked at in the previous section.

In this scenario, it is advisable for one model to act as a backup for the primary one. For example, in Vinicio, our virtual sommelier, if ChatGPT says a wine does not exist, we launch the same query to Grok. The first one may fail, but the second one may succeed. You can even chain three or more models if your budget allows.
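The backup idea can be sketched as a chain of callables tried in order. This is not Vinicio's actual code: `ask_with_fallback` and the two stub functions are invented for the example, and each stub stands in for a real API call that returns `None` when the model has no answer.

```python
from typing import Callable, Optional, Sequence

# A "model" here is any function that takes a query and returns an answer,
# or None when it cannot find one (an assumption made for this sketch).
Model = Callable[[str], Optional[str]]


def ask_with_fallback(query: str, models: Sequence[Model]) -> Optional[str]:
    """Try each model in order and return the first usable answer."""
    for model in models:
        try:
            answer = model(query)
        except Exception:
            continue  # a failing provider should not stop the chain
        if answer is not None:
            return answer
    return None  # every model failed: time for a human to step in


def primary_stub(query: str) -> Optional[str]:
    return None  # stands in for a model saying the wine does not exist


def backup_stub(query: str) -> Optional[str]:
    return f"Found a match for {query}"


print(ask_with_fallback("Rioja Reserva", [primary_stub, backup_stub]))
# Found a match for Rioja Reserva
```

Adding a third or fourth model is just one more entry in the list, which is also where the budget question becomes visible.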

Personalize the experience

More and more models allow the use of private data to enrich their responses, such as Google’s NotebookLM, or more recently, ChatGPT with the concept of projects.

You can upload documents in a wide variety of formats, including multimedia and source code, which the model analyzes and uses as standing context for its responses. Many code assistants already do this, which makes it much easier to work with large codebases.

Always verify the essentials

As model managers, we should never delegate quality verification. Blindly trusting a model is a guarantee of disaster sooner or later. Only a person has the discernment to know if something is correct or incorrect.
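One way to make that rule hard to skip is to build the human check into the code path itself. The sketch below is an assumption about how such a gate could look, with invented names (`Draft`, `send_if_approved`); the `deliver` argument stands in for whatever actually sends the message.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    """A model-generated message waiting for human review (hypothetical)."""

    text: str
    approved: bool = False  # only a person should ever set this to True


def send_if_approved(draft: Draft, deliver: Callable[[str], None]) -> bool:
    """Deliver the draft only after a human has approved it."""
    if not draft.approved:
        return False  # nothing leaves the system without review
    deliver(draft.text)
    return True


outbox: list[str] = []
draft = Draft(text="Thanks for your message, here is our proposal.")
print(send_if_approved(draft, outbox.append))  # False

draft.approved = True  # a person has read and accepted the draft
print(send_if_approved(draft, outbox.append))  # True
```

The model can write as many drafts as it likes; what it cannot do is press "send".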

I am totally against delegating critical tasks to models or unsupervised agents. How are you going to let them attend a meeting for you? If we reach that point, we are acknowledging that we are no longer needed, and then, as the Spanish comedian José Mota says, going for nothing is foolish.

Don’t remove yourself from the equation

AI is powerful but not infallible, and as my good friend Hipólito, one of the smartest people I know, says, these failures “are not a bug that can be resolved with little effort but something emergent from the technology.”

Our job is therefore not to step aside and let AI take over all our competencies, but to learn to integrate it with discernment. No matter how many “AI experts” proliferate now, we are still in a very early phase where we need to make mistakes and learn a lot from our errors.
