2024-10-13
I applaud the authors for proposing a new benchmark that goes beyond GSM8K. But after reading this work, it seems that most of the identified issues are due to poor associative-recall performance and not reasoning per se. For example, this work introduces GSM-NoOp in which a
Marcus on AI
Apple AI researchers say they found no evidence of formal reasoning in language models and their behavior is better explained by sophisticated pattern matching
RE: https://www.threads.net/... Brian Penny / @thebrianpenny: Great quick read from Gary Marcus pointing to several studies from ML researchers at Apple and Stanford pointing out ...
2024-10-12
Marcus on AI
Apple AI researchers say they found no evidence of formal reasoning in language models and their behavior is better explained by sophisticated pattern matching
Important new study from Apple — A superb new article on LLMs from six AI researchers at Apple who were brave enough …