🪔 Annals of the New York Academy of Sciences 2021 | Abstraction and analogy-making in artificial intelligence
Some Terminology
Raven’s Progressive Matrices (RPM)
Basic structure:
- typically a 3×3 or 2×2 matrix of figures
- the last cell of the matrix is left blank
- the solver must choose, from several given candidate figures, the one that best completes the matrix
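As a purely illustrative picture of the task format (the field names below are my own, not from any dataset’s actual schema), an RPM problem boils down to a matrix context plus candidate answers:

```python
from dataclasses import dataclass

# Illustrative sketch of an RPM problem as data; field names are assumptions.
@dataclass
class RPMProblem:
    context: list       # the 8 visible panels of a 3x3 matrix (row-major)
    candidates: list    # candidate panels for the blank ninth cell
    answer_index: int   # index of the correct candidate, used for scoring

def is_correct(problem: RPMProblem, chosen: int) -> bool:
    """Return True if the chosen candidate completes the matrix correctly."""
    return chosen == problem.answer_index
```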
Deep learning approaches
Procedurally Generated Matrices (PGM)
A large dataset of RPM-like problems.
The abstract structure of each PGM problem is a set of triples, where each triple consists of
- a specific relation
- an object type
- an attribute
1.2M training problems, 20K validation problems, 200K test problems.
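A hypothetical sketch of this triple-based structure (the vocabularies below are illustrative examples in the spirit of the dataset, not its exact definition):

```python
from itertools import product

# Illustrative vocabularies for (relation, object type, attribute) triples.
RELATIONS = ["progression", "XOR", "OR", "AND", "consistent_union"]
OBJECT_TYPES = ["shape", "line"]
ATTRIBUTES = ["size", "type", "color", "position", "number"]

# One PGM problem instantiates a small set of such triples, e.g.:
pgm_structure = {("progression", "shape", "size"),
                 ("XOR", "line", "color")}

# Problems are generated procedurally by sampling from this space:
all_triples = list(product(RELATIONS, OBJECT_TYPES, ATTRIBUTES))
print(len(all_triples))  # 5 relations x 2 object types x 5 attributes = 50
```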
RAVEN
Its authors questioned whether the PGM dataset was diverse enough.
RAVEN dataset: 70,000 problems.
42,000 training examples, 14,000 validation examples, and 14,000 test examples.
Impartial RAVEN
A major flaw of RAVEN: each of the seven incorrect answer choices was generated by randomly modifying a single attribute of the correct answer.
This allows a shortcut to the correct answer that ignores the matrix entirely: just take a majority vote over the candidate answers’ attribute values (see the sketch below).
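A minimal sketch of that shortcut, assuming candidates are given as attribute-value dicts (an illustration, not the RAVEN format):

```python
from collections import Counter

# Since every wrong answer differs from the correct one in exactly one
# attribute, the correct answer's attribute values are in the majority
# among the candidates; the matrix context is never consulted.
def majority_vote_answer(candidates: list) -> int:
    votes = [0] * len(candidates)
    for attr in candidates[0]:
        counts = Counter(c[attr] for c in candidates)
        for i, c in enumerate(candidates):
            votes[i] += counts[c[attr]]
    return max(range(len(candidates)), key=votes.__getitem__)

choices = [{"shape": "circle", "size": 2},   # correct answer
           {"shape": "square", "size": 2},   # shape perturbed
           {"shape": "circle", "size": 3}]   # size perturbed
print(majority_vote_answer(choices))  # 0
```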
Advantages
- learn end-to-end from pixels, avoiding reliance on built-in knowledge
Disadvantages
- the need for a huge training set
- the lack of transparency
- open possibility of biases that allow shortcuts.
- Humans do not need to train extensively in advance to perform well on RPMs.
The approach of training such a system with a large set of examples is questionable.
The neural networks that do well on Raven’s test after extensive training are unable to transfer any of these skills to any other task.
Other methods
- Metalearning
- Metamapping
Symbolic methods
Advantages
- Can offer unambiguous variables and variable-binding, clear type-token distinctions, and explicit measures of conceptual similarity and conceptual abstraction.
- interpretability
Disadvantages
- brittleness
- Require humans to create and structure substantial prior knowledge
- Often rely on semi-exhaustive search
Probabilistic program induction
Recognizing and generating handwritten characters
Task
one-shot learning—learning a concept from a single example.
- input
Seeing one example of a handwritten character
- output
- not only recognize new examples of this character
- but also generate new examples of it
Concept & program
A program representing a concept consists of a hierarchical character-generation procedure.
The learning system collected a library of primitive pen strokes and learned probability distributions over features of these pen strokes.
It also learned transition probabilities between strokes and probabilities over types of relations among the strokes.
This collection of probabilities is the prior knowledge of the system.
One-shot classification
- input
- a single test image $I_T$ of a new character class
- along with 20 distinct characters from the same alphabet, each drawn by a human
- Only one of the 20 belongs to the same class as $I_T$.
- output
- choose that same-class character from the 20 choices
Bayesian model: for each candidate image $I_c$, approximate the posterior $P(I_c \mid I_T) \propto P(I_T \mid I_c)\,P(I_c)$ and choose the candidate that maximizes it.
- The term $P(I_c)$ is approximated by a probabilistic search method that generates a program representing $I_c$ of the form shown in the figure above; the prior probabilities learned from the original training set are used in this approximation.
- The term $P(I_T \mid I_c)$ is approximated by attempts to “refit” the program representing $I_c$ to $I_T$ (a sketch of this scoring scheme follows below).
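A minimal sketch of how the two approximated terms combine into a classification score; `induce_program` and `refit_log_likelihood` are hypothetical helpers standing in for the search and refitting procedures, not names from the paper:

```python
# Hypothetical interfaces (assumptions, not the paper's API):
#   induce_program(image) -> (program, log_prior)   ~ log P(I_c)
#   refit_log_likelihood(program, image) -> float   ~ log P(I_T | I_c)
def classify_one_shot(test_image, candidate_images,
                      induce_program, refit_log_likelihood):
    """Pick the candidate maximizing log P(I_T | I_c) + log P(I_c)."""
    def score(image):
        program, log_prior = induce_program(image)
        return refit_log_likelihood(program, test_image) + log_prior
    scores = [score(img) for img in candidate_images]
    return max(range(len(scores)), key=scores.__getitem__)
```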
One-shot generation
- input
- an example image $I_c$ of a hand-drawn character of some class $c$.
- output
- generate a new image of a character of the same class $c$.
method:
- first search to find a program representing $I_c$
- then run this program to generate a new example (see the sketch below).
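Under the same assumptions, generation is program induction followed by a stochastic re-run; `induce_program` and `program.run()` are hypothetical interfaces:

```python
def generate_new_example(example_image, induce_program):
    """Induce a program explaining the example, then rerun it."""
    program, _ = induce_program(example_image)
    # Each run resamples stroke placement and motor noise (assumed
    # interface), so the output is a new exemplar, not a pixel copy.
    return program.run()
```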
Solving Bongard problems
Task
- input
- the raw pixels of the 12 frames
- output
- a rule, expressed in a logic-like language, that is true of all frames on one side and of none of the frames on the other side.
Method
Define a probability distribution over possible rules, where
- $R$ is a rule
- $F$ is the set of 12 input frames
- $G$ is the rule grammar
Bayesian model: $P(R \mid F, G) \propto P(F \mid R)\, P(R \mid G)$
The likelihood $P(F \mid R)$ equals 1 if
- $R$ is true of all the frames on one side and of none of the frames on the other side
- the frames in $F$ are “informative” about $R$, meaning that the frames must contain the objects or relationships mentioned in the rule
and equals 0 otherwise.
The prior probability term $P(R \mid G)$ is defined by the authors in a way that favors shorter rules (a toy sketch of this model follows below).
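A toy sketch of the 0/1 likelihood and a length-favoring prior; `rule_holds` and `is_informative` are hypothetical helpers, and the prior’s exact form here is an assumption for illustration:

```python
import math

def likelihood(rule, left_frames, right_frames, rule_holds, is_informative):
    """P(F | R): 1 if the rule separates the two sides and every frame
    is informative about it, else 0."""
    separates = (all(rule_holds(rule, f) for f in left_frames)
                 and not any(rule_holds(rule, f) for f in right_frames))
    informative = all(is_informative(f, rule)
                      for f in left_frames + right_frames)
    return 1.0 if separates and informative else 0.0

def log_prior(rule_length: int) -> float:
    """P(R | G): one illustrative way to make shorter rules more probable."""
    return -rule_length * math.log(2.0)
```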
Bayesian probabilistic model:
Combines prior knowledge and preferences with likelihoods.
Advantages
- flexible abstraction
- reusability
- modularity
- interpretability
- Enable the combination of prior knowledge and preferences with likelihoods
- Enables powerful sampling methods
Disadvantages
- Require significant human-engineered prior knowledge in the form of a domain-specific language
- The more expressive the language, the more daunting the combinatorial search problem these methods face
Discussion: how to make progress in AI on abstraction and analogy (key points)
Focus on idealized domains
Avoid real-world image- or language-based tasks, because these tasks carry much richer meanings for humans than for the machines processing the data.
Example: RPM.
Focus on core knowledge
Four core knowledge systems that human cognition is founded on:
- objects and intuitive physics
- agents and goal-directedness
- numbers and elementary arithmetic
- spatial geometry of the environment
Evaluate systems across multiple domains
Adopt a diverse suite of challenge domains.
Focus on tasks that require little or no training
AI systems that can solve problems in idealized domains should require training only on the core concepts required in each domain.
Include generative tasks
Generative tasks are likely more resistant to shortcut learning than discriminative tasks, and systems that generate answers are in many cases more interpretable.
Evaluate systems on “hidden”, human-curated, changing sets of problems
Procedurally generated problems can unintentionally allow shortcut solutions.
They can also allow a system to reverse-engineer the generating algorithm instead of being able to solve the problems in a general way.
Evaluate systems on their robustness, not simply their accuracy
The evaluation benchmarks should feature various kinds of challenges in order to measure a system’s robustness to “noise” and other irrelevant distractions, and to variations in a given concept.