Properties of tasks that are good for Agents

Published: Mar 09, 2026

It is easy to verify whether a change is good or bad: there is some quantity that summarises the goodness or badness of the system.
There are no major real-world consequences to your changes. If you make a change and it leads to worse performance, you discard it and move on.
Changes stack additively: if change X makes the system better and change Y makes the system better, then using X and Y together also makes the system better.
You have enough datapoints to be confident that a measured improvement is real and not just cheating. Otherwise you are just overfitting to the test set.
You can make a change and measure its effects quickly and relatively cheaply.
What else? adrien@silmo.ai