🪔 Annals of the New York Academy of Sciences 2021 | Abstraction and analogy-making in artificial intelligence
Some Terminology
Raven’s Progressive Matrices (RPM)
Basic structure:
- typically a 3×3 or 2×2 matrix of figures
- the last cell of the matrix is left blank
- the solver must choose, from several given candidate figures, the one that best completes the matrix
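As a purely illustrative picture of the task format (the field names below are my own, not from any dataset’s actual schema), an RPM problem boils down to a matrix context plus candidate answers:

```python
from dataclasses import dataclass

# Illustrative sketch of an RPM problem as data; field names are assumptions.
@dataclass
class RPMProblem:
    context: list       # the 8 visible panels of a 3x3 matrix (row-major)
    candidates: list    # candidate panels for the blank ninth cell
    answer_index: int   # index of the correct candidate, used for scoring

def is_correct(problem: RPMProblem, chosen: int) -> bool:
    """Return True if the chosen candidate completes the matrix correctly."""
    return chosen == problem.answer_index
```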
Deep learning approaches
Procedurally Generated Matrices (PGM)
A large dataset of RPM-like problems.
The abstract structure of each PGM problem is a set of triples, where each triple consists of
- a specific relation
- an object type
- an attribute
1.2M training problems, 20K validation problems, 200K test problems.
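A hypothetical sketch of this triple-based structure (the vocabularies below are illustrative examples in the spirit of the dataset, not its exact definition):

```python
from itertools import product

# Illustrative vocabularies for (relation, object type, attribute) triples.
RELATIONS = ["progression", "XOR", "OR", "AND", "consistent_union"]
OBJECT_TYPES = ["shape", "line"]
ATTRIBUTES = ["size", "type", "color", "position", "number"]

# One PGM problem instantiates a small set of such triples, e.g.:
pgm_structure = {("progression", "shape", "size"),
                 ("XOR", "line", "color")}

# Problems are generated procedurally by sampling from this space:
all_triples = list(product(RELATIONS, OBJECT_TYPES, ATTRIBUTES))
print(len(all_triples))  # 5 relations x 2 object types x 5 attributes = 50
```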
RAVEN
Its authors questioned whether the PGM dataset was diverse enough.
RAVEN dataset: 70,000 problems.
42,000 training examples, 14,000 validation examples, and 14,000 test examples.
Impartial RAVEN
A major flaw of RAVEN: each of the seven incorrect answer choices was generated by randomly modifying a single attribute of the correct answer.
This allows a shortcut to the correct answer that ignores the matrix entirely: just take a majority vote over the candidate answers’ attribute values (see the sketch below).
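A minimal sketch of that shortcut, assuming candidates are given as attribute-value dicts (an illustration, not the RAVEN format):

```python
from collections import Counter

# Since every wrong answer differs from the correct one in exactly one
# attribute, the correct answer's attribute values are in the majority
# among the candidates; the matrix context is never consulted.
def majority_vote_answer(candidates: list) -> int:
    votes = [0] * len(candidates)
    for attr in candidates[0]:
        counts = Counter(c[attr] for c in candidates)
        for i, c in enumerate(candidates):
            votes[i] += counts[c[attr]]
    return max(range(len(candidates)), key=votes.__getitem__)

choices = [{"shape": "circle", "size": 2},   # correct answer
           {"shape": "square", "size": 2},   # shape perturbed
           {"shape": "circle", "size": 3}]   # size perturbed
print(majority_vote_answer(choices))  # 0
```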
Advantages
- learn end-to-end from pixels, avoiding reliance on built-in knowledge
Disadvantages
- the need for a huge training set
- the lack of transparency
- open possibility of biases that allow shortcuts.
- Humans do not need to train extensively in advance to perform well on RPMs.
The approach of training such a system with a large set of examples is questionable.
The neural networks that do well on Raven’s test after extensive training are unable to transfer any of these skills to any other task.
Other methods
- Metalearning
- Metamapping
Symbolic methods
Advantages
- Can offer unambiguous variables and variable-binding, clear type-token distinctions, and explicit measures of conceptual similarity and conceptual abstraction.
- interpretability
Disadvantages
- brittleness
- Require humans to create and structure substantial prior knowledge
- Often rely on semi-exhaustive search
Probabilistic program induction
Recognizing and generating handwritten characters
Task
one-shot learning—learning a concept from a single example.
- input
Seeing one example of a handwritten character
- output
- not only recognize new examples of this character
- but also generate new examples of it
Concept & program
A program representing a concept consists of a hierarchical character-generation procedure.
The learning system collected a library of primitive pen strokes and learned probability distributions over features of these pen strokes.
It also learned transition probabilities between strokes and probabilities over types of relations among the strokes.
This collection of probabilities is the prior knowledge of the system.
One-shot classification
- input
- a single test image $I_T$ of a new character class
- along with 20 distinct characters from the same alphabet, each drawn by a human
- Only one of the 20 belongs to the same class as $I_T$.
- output
- choose that same-class character from the 20 choices
Bayesian model: for each candidate image $I_c$, approximate the posterior $P(I_c \mid I_T) \propto P(I_T \mid I_c)\,P(I_c)$ and choose the candidate that maximizes it.
- The term $P(I_c)$ is approximated by a probabilistic search method that generates a program representing $I_c$ of the form shown in the figure above; the prior probabilities learned from the original training set are used in this approximation.
- The term $P(I_T \mid I_c)$ is approximated by attempts to “refit” the program representing $I_c$ to $I_T$ (a sketch of this scoring scheme follows below).
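A minimal sketch of how the two approximated terms combine into a classification score; `induce_program` and `refit_log_likelihood` are hypothetical helpers standing in for the search and refitting procedures, not names from the paper:

```python
# Hypothetical interfaces (assumptions, not the paper's API):
#   induce_program(image) -> (program, log_prior)   ~ log P(I_c)
#   refit_log_likelihood(program, image) -> float   ~ log P(I_T | I_c)
def classify_one_shot(test_image, candidate_images,
                      induce_program, refit_log_likelihood):
    """Pick the candidate maximizing log P(I_T | I_c) + log P(I_c)."""
    def score(image):
        program, log_prior = induce_program(image)
        return refit_log_likelihood(program, test_image) + log_prior
    scores = [score(img) for img in candidate_images]
    return max(range(len(scores)), key=scores.__getitem__)
```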
One-shot generation
- input
- an example image $I_c$ of a hand-drawn character of some class $c$.
- output
- generate a new image of a character of the same class $c$.
method:
- first search to find a program representing $I_c$
- then run this program to generate a new example (see the sketch below).
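Under the same assumptions, generation is program induction followed by a stochastic re-run; `induce_program` and `program.run()` are hypothetical interfaces:

```python
def generate_new_example(example_image, induce_program):
    """Induce a program explaining the example, then rerun it."""
    program, _ = induce_program(example_image)
    # Each run resamples stroke placement and motor noise (assumed
    # interface), so the output is a new exemplar, not a pixel copy.
    return program.run()
```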
Solving Bongard problems
Task
- input
- the raw pixels of the 12 frames
- output
- a rule, expressed in a logic-like language, that is true of all frames on one side and of none of the frames on the other side.
Method
Define a probability distribution over possible rules, where
- $R$ is a rule
- $F$ is the set of 12 input frames
- $G$ is the rule grammar
Bayesian model: $P(R \mid F, G) \propto P(F \mid R)\, P(R \mid G)$
The likelihood $P(F \mid R)$ equals 1 if
- $R$ is true of all the frames on one side and of none of the frames on the other side
- the frames in $F$ are “informative” about $R$, meaning that the frames must contain the objects or relationships mentioned in the rule
and equals 0 otherwise.
The prior probability term $P(R \mid G)$ is defined by the authors in a way that favors shorter rules (a toy sketch of this model follows below).
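A toy sketch of the 0/1 likelihood and a length-favoring prior; `rule_holds` and `is_informative` are hypothetical helpers, and the prior’s exact form here is an assumption for illustration:

```python
import math

def likelihood(rule, left_frames, right_frames, rule_holds, is_informative):
    """P(F | R): 1 if the rule separates the two sides and every frame
    is informative about it, else 0."""
    separates = (all(rule_holds(rule, f) for f in left_frames)
                 and not any(rule_holds(rule, f) for f in right_frames))
    informative = all(is_informative(f, rule)
                      for f in left_frames + right_frames)
    return 1.0 if separates and informative else 0.0

def log_prior(rule_length: int) -> float:
    """P(R | G): one illustrative way to make shorter rules more probable."""
    return -rule_length * math.log(2.0)
```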
Bayesian probabilistic model:
Combines prior knowledge and preferences with likelihoods.
Advantages
- flexible abstraction
- reusability
- modularity
- interpretability
- Enable the combination of prior knowledge and preferences with likelihoods
- Enables powerful sampling methods
Disadvantages
- Require significant human-engineered prior knowledge in the form of a domain-specific language
- The more expressive the language, the more daunting the combinatorial search problem these methods face
Discussion: how to make progress in AI on abstraction and analogy (key points)
Focus on idealized domains
Avoid real-world image- or language-based tasks, because these tasks carry much richer meanings for humans than for the machines processing the data.
Example: RPM.
Focus on core knowledge
Four core knowledge systems that human cognition is founded on:
- objects and intuitive physics
- agents and goal-directedness
- numbers and elementary arithmetic
- spatial geometry of the environment
Evaluate systems across multiple domains
Adopt a diverse suite of challenge domains.
Focus on tasks that require little or no training
AI systems that can solve problems in idealized domains should require training only on the core concepts required in each domain.
Include generative tasks
Generative tasks are likely more resistant to shortcut learning than discriminative tasks, and systems that generate answers are in many cases more interpretable.
Evaluate systems on “hidden”, human-curated, changing sets of problems
Procedurally generated problems can unintentionally allow shortcut solutions.
They can also allow a system to reverse-engineer the generating algorithm instead of being able to solve the problems in a general way.
Evaluate systems on their robustness, not simply their accuracy
The evaluation benchmarks should feature various kinds of challenges in order to measure a system’s robustness to “noise” and other irrelevant distractions, and to variations in a given concept.