The Shortcomings and Opportunities of Large Language Models in Agronomy
This week, I read the following research paper on using LLMs to pass agriculture-based exams and thought it was well done:
To demonstrate the capabilities of LLMs, we selected agriculture exams and benchmark datasets from three of the largest agriculture producer countries: Brazil, India, and the USA. Our analysis highlights GPT-4’s ability to achieve a passing score on exams to earn credits for renewing agronomist certifications, answering 93% of the questions correctly and outperforming earlier general-purpose models (GPT-3.5), which achieved 88% accuracy. On one of our evaluation datasets that had published student scores, GPT-4 obtained the highest performance when compared to human subjects.
Given the increasing capability, I think it is worth exploring the implications for agronomists and farmers.
In May, my friend Rhishi Pethe shared the researchers' curiosity about whether a Large Language Model could pass the certified crop advisor (CCA) exam.
That piqued my own interest in whether LLMs could pass the CCA exam. I passed the exams in 2015, so I dug into some old practice tests to see how ChatGPT would do.
In a much less scientific manner, and with a far smaller data set than the above study, I asked ChatGPT (running GPT-3.5) ten questions. It got seven right, or 70%, which would typically be enough to obtain CCA accreditation in a given year: similar to, though slightly lower than, most of the results in the paper.
The paper states that the “study offers a unique perspective and contributes to the understanding of AI’s potential impact on agriculture by providing a baseline for future benchmarks about the use of large language models to solve agricultural problems.” Notably, the researchers do not claim LLMs will replace agronomists, and I think it’s worth highlighting why that distinction matters.
Any CCA will tell you the exam establishes that you have a baseline understanding of crop production. There is, however, enormous nuance involved in making sound recommendations to farmers.
So I asked ChatGPT follow-up questions to dig deeper into some of the questions it answered correctly.
In this example, the number it suggests is low relative to generally accepted figures, and I found the same pattern with other follow-up questions. The answer fell short of what a farmer would need (though it does surface some additional interesting insight). This illustrates that when it comes to making specific recommendations, LLMs still have room for improvement.
Knowledge is helpful, but being able to apply knowledge in a practical way is where agronomists create value. I find it interesting to think through why it would be challenging for an LLM to move beyond an assistant or “co-pilot” for the agronomist.
There are many reasons an LLM will struggle to move beyond the assistant role for an agronomist, but three stand out:
On average, human beings do not trust machines with decisions and diagnoses that directly affect them. We can extrapolate this from medicine and healthcare.