Image by Hotpot.ai
Jill Maschio, PhD
Large language models (LLMs) are, at best, about as smart as a recent high school graduate or someone in their early to mid-20s with little work experience and no managerial education or training.
In an article recently published in Scientific Reports (Nature), researchers Mittelstädt et al. (2024) suggested that three large language models outperformed human participants on a test of social situational judgment. While LLMs may produce answers similar to those of humans, we must consider a few things about this study. Let's review.
The Study in Question
The study involved 276 pilot applicants (239 male, 37 female) between the ages of 18 and 29, all of whom had graduated high school. The researchers had both the participants and the chatbots answer a teamwork situational judgment test (SJT-TW) developed by Gatzka and Volmer (2017). The SJT-TW is a 12-item questionnaire with scores ranging from -24 to +24. The researchers input its social situations into five LLMs: Claude 3.5, Copilot, ChatGPT 4.0, Gemini, and You.com. A sample question is below.
You have a disagreement with a team member about the way in which a task from a mutual area of work should be dealt with.
In a hot but factual debate, each of you argues that your own solution is best.
What should you do and not do in such a situation? Please select the best and worst option for each item after the description of the situation.
- a) You suggest consulting an uninvolved team member as mediator.
- b) You ask your counterpart to postpone the discussion to a later date.
- c) You motivate your counterpart to give in by confuting his arguments.
- d) You insist on your position to defend the best solution appropriately.
Results
The results showed that three of the five chatbots outperformed the humans (p < 0.001). Claude had the highest rating, followed by Copilot and You.com. The researchers noted that the chatbots sometimes gave opposing answers, failed to provide clear responses, and at times even selected two response options.
Critical Analysis
First, the study participants were ages 18 to 29. Research shows that the brain is still developing into the early and mid-20s and that this development is linked to behavior (Johnson et al., 2009), particularly in the prefrontal cortex, which supports higher-order thinking such as decision-making. Because of this developmental period, it is common for people in their early and even mid-twenties to make emotional decisions.
Second, the participants were asked to answer questions about teamwork social situations in the workplace, yet the study did not require them to have any education or training in teamwork social skills. A stronger design would have included a group of human participants with higher education in business and administration, who would have more knowledge of workplace teamwork issues. As it stands, the comparison is apples to oranges.
Third, the original assessment developed by Gatzka and Volmer (2017) was validated on a sample of individuals undergoing ab initio pilot training, so the results may not generalize to the broader population.
Fourth, the chatbots did not have access to the answer key, but the researchers ran ten randomized iterations of the test with each of the five chatbots. Because the groups were treated unequally, the administration was not comparable between the chatbots and the humans, which complicates interpretation of the results. In a true experiment, groups receive the same treatment. Why did the researchers deviate from standard research procedures?
Fifth, this is a single study, and in research, findings should be replicated to confirm results.
In conclusion, it's important to question studies that suggest AI is smarter or more intelligent than humans.
References
Johnson, S. B., Blum, R. W., & Giedd, J. N. (2009). Adolescent maturity and the brain: The promise and pitfalls of neuroscience research in adolescent health policy. Journal of Adolescent Health, 45(3), 216-221. https://doi.org/10.1016/j.jadohealth.2009.05.016
Mittelstädt, J. M., Maier, J., Goerke, P., et al. (2024). Large language models can outperform humans in social situational judgments. Scientific Reports, 14, 27449. https://doi.org/10.1038/s41598-024-79048-0