Insurmountable Hans
The ARC benchmark was designed by François Chollet to serve one goal: to be difficult and demanding enough that it cannot be “hacked” by AI techniques that “cheat”, whether LLMs or anything else. But it must achieve this in a rigorous, systematic and simply defined way; it cannot be vague or ambiguous. It must be (relatively) easy for a human, but hard for a program. When I first looked at it, I admired its simplicity and purity. The problems are simple but deep, and many of them clearly require grasping something beyond mere pattern recognition or superficial pattern matching. They seem to require some seriously deeper thinking. And if, like me, you spent a minute thinking about how you’d tackle them programmatically (with ML or otherwise), it was easy to become convinced that this was a very good benchmark. At first, that is exactly what happened, on Kaggle for instance: nobody could get even remotely decent results, and the problem set really felt like a tough nut to crack. From there, it was very tempting to suggest that whenever ARC was cracked, AGI would have arrived!