H-ARC: A Comprehensive Behavioral Dataset for the Abstraction and Reasoning Corpus
Solim LeGris, Wai Keen Vong, Brenden M. Lake and Todd M. Gureckis
New York University
How well do people reason about abstract visual program synthesis problems?
Training examples
People were asked to infer an underlying rule or program using training examples from 5 randomly selected problems.






Test Example
Using their inferred rule or program, people were asked to generate an output for a given test input in three attempts or fewer.


State Space Graph
We collected action-by-action data from 1729 participants. The graph below shows the states visited by all participants who attempted the task shown here.

Explore H-ARC
Explore participant responses, natural-language descriptions, errors, and state space graphs for tasks from the training and evaluation sets.
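The state space graphs shown above are aggregated from the step-by-step action traces. As a rough illustration only, the sketch below shows one way such traces could be collapsed into a state-transition graph; the field names (participant_id, attempt, action_index, grid_state) are hypothetical placeholders, not the released schema.

```python
import json
from collections import Counter, defaultdict

def build_state_graph(trace_rows):
    """Count transitions between successive grid states within each attempt."""
    by_attempt = defaultdict(list)
    for row in trace_rows:
        by_attempt[(row["participant_id"], row["attempt"])].append(row)
    edges = Counter()
    for actions in by_attempt.values():
        actions.sort(key=lambda r: r["action_index"])
        for prev, nxt in zip(actions, actions[1:]):
            # Serialize grids so they can serve as hashable graph nodes.
            edges[(json.dumps(prev["grid_state"]), json.dumps(nxt["grid_state"]))] += 1
    return edges

# Toy usage: two successive actions from one participant on one attempt.
rows = [
    {"participant_id": 1, "attempt": 1, "action_index": 0, "grid_state": [[0, 0]]},
    {"participant_id": 1, "attempt": 1, "action_index": 1, "grid_state": [[0, 1]]},
]
print(build_state_graph(rows))  # one edge: initial grid -> edited grid
```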
Abstract
The Abstraction and Reasoning Corpus (ARC) is a visual program synthesis benchmark designed to test out-of-distribution generalization in machines. Comparing AI algorithms to human performance is essential for measuring progress on these problems. In this paper, we present H-ARC (Human-ARC): a novel large-scale dataset containing solution attempts from over 1700 humans on ARC problems. The dataset spans the full set of 400 training and 400 evaluation tasks from the original ARC benchmark, making it the largest human evaluation of ARC to date. By publishing the dataset, we contribute human responses to each problem, step-by-step behavioral action traces from the ARC user interface, and natural-language solution descriptions of the inferred program or rule. We believe this dataset will be of value to researchers in both cognitive science and AI, as it can help uncover the mechanisms that support abstraction and reasoning in people. The insights gained from these data are not only valuable for cognitive science but could in turn inform the design of more efficient, human-like AI algorithms.