A Newborn Embodied Turing Test for Visual Parsing

Manju Garimella, Denizhan Pak, Lalit Pandey, Justin N. Wood, & Samantha M. W. Wood

Abstract

Newborn brains exhibit remarkable abilities in rapid and generative learning, including the ability to parse objects from backgrounds and recognize those objects across substantial changes to their appearance (i.e., novel backgrounds and novel viewing angles). How can we build machines that can learn as efficiently as newborns? To accurately compare biological and artificial intelligence, researchers need to provide machines with the same training data that an organism has experienced since birth. Here, we present an experimental benchmark that enables researchers to raise artificial agents in the same controlled-rearing environments as newborn chicks. First, we raised newborn chicks in controlled environments with visual access to only a single object on a single background and tested their ability to recognize their object across novel viewing conditions. Then, we performed “digital twin” experiments in which we reared a variety of artificial neural networks in virtual environments that mimicked the rearing conditions of the chicks and measured whether they exhibited the same object recognition behavior as the newborn chicks. We found that biological chicks developed background-invariant object recognition, while the artificial chicks developed background-dependent recognition. Our benchmark exposes the limitations of current unsupervised and supervised algorithms in achieving the learning abilities of newborn animals. Ultimately, we anticipate that this approach will contribute to the development of AI systems that can learn with the same efficiency as newborn animals.

Experiment Design

  • VR chambers were equipped with two display walls (LCD monitors) for displaying object stimuli.

  • During the Training Phase, artificial chicks were reared in an environment containing a single 3D object rotating continuously about a horizontal axis in front of a naturalistic background scene, completing a full 360° rotation every 15 s.

  • During the Test Phase, the VR chambers measured the artificial chicks’ imprinting response and object recognition performance. The “imprinting trials” measured whether the chicks developed an imprinting response. The “test trials” measured the artificial chicks’ ability to visually parse and recognize their imprinted object. During these trials, the imprinted object was presented on one display wall and an unfamiliar object was presented on the other display wall. Across the test trials, the objects were presented on all possible combinations of the three background scenes (Background 1 vs. Background 1, Background 1 vs. Background 2, Background 1 vs. Background 3, etc.), as enumerated in the sketch below.
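
For concreteness, the background pairings used across test trials can be enumerated as in the following sketch. This is illustrative only, not code from the benchmark: it assumes that the background behind the imprinted object and the background behind the unfamiliar object vary independently, giving nine ordered pairings; whether the benchmark treats mirror-image pairings as distinct trials is an assumption.

from itertools import product

# Illustrative only: enumerate the background pairings used across test trials.
# Assumes the background behind the imprinted object and the background behind
# the unfamiliar object vary independently (9 ordered pairings).
BACKGROUNDS = ["Background 1", "Background 2", "Background 3"]

test_trial_pairings = [
    {"imprinted_object_wall": bg_imprint, "unfamiliar_object_wall": bg_novel}
    for bg_imprint, bg_novel in product(BACKGROUNDS, repeat=2)
]

for pairing in test_trial_pairings:
    print(f'{pairing["imprinted_object_wall"]} vs. {pairing["unfamiliar_object_wall"]}')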

Arguments

Train configuration

agent_count: 1
run_id: exp1
log_path: data/exp1
mode: full
train_eps: 1000
test_eps: 40
cuda: 0
Agent:
  reward: supervised
  encoder: small
Environment:
  use_ship: true
  side_view: false
  background: A
  base_port: 5100
  env_path: data/executables/parsing_benchmark/parsing.x86_64
  log_path: data/ship_backgroundA_exp/Env_Logs
  rec_path: data/ship_backgroundA_exp/Recordings/
  record_chamber: false
  record_agent: false
  recording_frames: 0
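
For reference, the listing above can also be reproduced programmatically. The sketch below (assuming PyYAML is installed) simply builds the same configuration as a nested dictionary and writes it to a YAML file; the file name exp1.yaml and the expectation that the benchmark reads such a file are assumptions made for illustration.

import yaml  # assumes PyYAML is installed

# Mirror of the train configuration listed above; the output file name is an
# assumption, not a path required by the benchmark.
config = {
    "agent_count": 1,
    "run_id": "exp1",
    "log_path": "data/exp1",
    "mode": "full",
    "train_eps": 1000,
    "test_eps": 40,
    "cuda": 0,
    "Agent": {"reward": "supervised", "encoder": "small"},
    "Environment": {
        "use_ship": True,
        "side_view": False,
        "background": "A",
        "base_port": 5100,
        "env_path": "data/executables/parsing_benchmark/parsing.x86_64",
        "log_path": "data/ship_backgroundA_exp/Env_Logs",
        "rec_path": "data/ship_backgroundA_exp/Recordings/",
        "record_chamber": False,
        "record_agent": False,
        "recording_frames": 0,
    },
}

with open("exp1.yaml", "w") as f:
    yaml.safe_dump(config, f, default_flow_style=False, sort_keys=False)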

Run script

python src/simulation/run_parsing_exp.py ++run_id=exp1 ++Environment.env_path=data/executables/parsing_benchmark/parsing_app.x86_64 ++mode=full ++train_eps=1000 ++test_eps=40 ++Agent.encoder="small" ++Environment.use_ship="true" ++Environment.background="A"

where

  • Environment.use_ship = true or false (to choose between the Ship and the Fork object)

  • Environment.background = A, B, or C (to choose among the three background scenes)

  • mode = full, train, or test (to choose among the three run modes)

  • Agent.encoder = "small", "medium", or "large" (to choose among the three encoder types: NatureCNN, resnet10, and resnet18, respectively)

  • Agent.reward = "supervised" (default)
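
To cover the full set of rearing conditions (two imprinting objects × three backgrounds), the run script can be launched once per condition. The sketch below shows one way to do this with subprocess; the override flags are copied from the run command above, while the per-condition run_id naming scheme is an assumption made for this example.

import subprocess
from itertools import product

# Illustrative sweep over the object (use_ship) and background options described
# above. The overrides mirror the run command shown earlier; the run_id naming
# is invented for illustration.
for use_ship, background in product(["true", "false"], ["A", "B", "C"]):
    object_name = "ship" if use_ship == "true" else "fork"
    run_id = f"{object_name}_background{background}"
    subprocess.run(
        [
            "python", "src/simulation/run_parsing_exp.py",
            f"++run_id={run_id}",
            "++Environment.env_path=data/executables/parsing_benchmark/parsing_app.x86_64",
            "++mode=full",
            "++train_eps=1000",
            "++test_eps=40",
            "++Agent.encoder=small",
            f"++Environment.use_ship={use_ship}",
            f"++Environment.background={background}",
        ],
        check=True,
    )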

Custom Configuration:

  • Episode counts: update train_eps and test_eps to change the number of training and test episodes.

  • Encoder types: small, medium, or large.

  • Reward types: supervised (default) or unsupervised.
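
The custom options above can be collected and validated before launching a run. The helper below is only a sketch: the allowed values are taken from this document, and build_overrides is not part of the benchmark itself.

# Illustrative helper for assembling custom ++ overrides from the options listed
# above. Allowed values are taken from this document; the benchmark may validate
# its arguments differently.
ALLOWED = {
    "Agent.encoder": {"small", "medium", "large"},
    "Agent.reward": {"supervised", "unsupervised"},
    "mode": {"full", "train", "test"},
}

def build_overrides(train_eps=1000, test_eps=40, encoder="small",
                    reward="supervised", mode="full"):
    """Return a list of ++key=value overrides for run_parsing_exp.py."""
    settings = {
        "train_eps": train_eps,
        "test_eps": test_eps,
        "Agent.encoder": encoder,
        "Agent.reward": reward,
        "mode": mode,
    }
    for key, allowed in ALLOWED.items():
        if settings[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    return [f"++{key}={value}" for key, value in settings.items()]

# Example: a shorter training run with the medium encoder and unsupervised reward.
print(build_overrides(train_eps=500, encoder="medium", reward="unsupervised"))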