Learning Social Behavior with Reinforcement

This project was independently developed as part of the Seminar in Cognitive Science (COG403) at the University of Toronto.

Role

Cognitive Modeler, Python Developer, Researcher

Tools

pyClarion (Cognitive Architecture), Python, NumPy, Matplotlib

Focus Areas

Human-Like AI, Reinforcement Learning, Behavioral Modeling, Social Cognition

Timeline

January-April 2025

Contributions


  • Developed an end-to-end simulation pipeline in Python using pyClarion, integrating custom learning modules to evaluate multiple social learning scenarios in a dynamic environment.
  • Designed and implemented a dual-process learning agent using Rescorla-Wagner-based mechanisms for both stimulus-response and stimulus-value learning.
  • Created four targeted training scenarios to explore how social behaviors like imitation, threat avoidance, and social approach emerge through reinforcement learning.
  • Generated custom data visualizations to track key metrics such as average reward, Q-value progression, and associative strength across 900+ training trials.
  • Conducted in-depth analysis of learning patterns to assess the model’s adaptability and the effectiveness of associative mechanisms in producing socially relevant behavior.

Project Overview


Can machines learn to imitate, avoid danger, or respond socially without explicit instructions?

In this simulation-based project, I modeled human social learning through purely associative mechanisms, using the CLARION cognitive architecture and a custom implementation of the Rescorla-Wagner algorithm. By eliminating symbolic reasoning and relying solely on reinforcement-based learning, I explored how adaptive behaviors, such as imitation, social approach, and avoidance, can emerge from domain-general learning processes.
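At the core of the model is the Rescorla-Wagner update, which adjusts an association's strength in proportion to the prediction error on each trial. The sketch below is illustrative only; the parameter names and values are assumptions, not the project's actual code.

```python
# Minimal sketch of a Rescorla-Wagner update (illustrative; alpha and beta
# values here are assumptions, not the project's actual parameters).
def rescorla_wagner_update(V, reward, alpha=0.1, beta=1.0):
    """Return updated associative strength V after one trial.

    V      -- current associative strength for the cue
    reward -- lambda, the maximum association the reward supports
    alpha  -- cue salience (stimulus learning rate)
    beta   -- learning rate tied to the reward
    """
    return V + alpha * beta * (reward - V)

V = 0.0
for _ in range(50):  # repeated pairings of cue and reward
    V = rescorla_wagner_update(V, reward=1.0)
# V asymptotically approaches the reward maximum of 1.0
```

Because the update is driven by the error term (reward - V), learning is fast when the outcome is surprising and slows as the association saturates, which is what produces the characteristic negatively accelerated learning curve.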


The results demonstrate that associative mechanisms alone can produce complex, socially relevant behaviors typically attributed to higher-level cognition. This offers meaningful insights into the foundational processes of social behavior in both humans and artificial agents.

Architecture


The agent architecture consists of:

  • Input Process: encodes environmental stimuli into social or non-social categories.
  • Rescorla-Wagner Learners:
    • Stimulus-response learning
    • Stimulus-value learning
  • Choice Process: selects actions based on learned associations and value estimates.

Implemented in pyClarion, the system excludes symbolic reasoning and instead relies solely on reinforcement-driven association updates.
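The three-stage pipeline above can be sketched as a single agent class. This is a hypothetical reconstruction, not the pyClarion implementation: the class and attribute names, the softmax choice rule, and all parameter values are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sketch of the input -> learners -> choice pipeline.
# Names, the softmax rule, and parameters are illustrative assumptions.
class AssociativeAgent:
    def __init__(self, n_stimuli, n_actions, alpha=0.1, temperature=0.5):
        self.sr = np.zeros((n_stimuli, n_actions))  # stimulus-response strengths
        self.sv = np.zeros(n_stimuli)               # stimulus-value estimates
        self.alpha = alpha
        self.temperature = temperature

    def choose(self, stimulus):
        # Choice process: softmax over learned stimulus-response strengths
        prefs = self.sr[stimulus] / self.temperature
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        return np.random.choice(len(probs), p=probs)

    def update(self, stimulus, action, reward):
        # Rescorla-Wagner error-driven updates for both learners
        self.sr[stimulus, action] += self.alpha * (reward - self.sr[stimulus, action])
        self.sv[stimulus] += self.alpha * (reward - self.sv[stimulus])
```

Keeping the stimulus-response and stimulus-value learners as separate tables mirrors the dual-process design: one tracks which action a cue should trigger, the other tracks how valuable the cue itself is.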

Scenarios Simulated


  1. Social Presence -> Approach
  2. Behavioral Imitation
  3. Stimulus X -> Approach
  4. Predator Warning -> Escape

Each trial randomly sampled one of these contexts over 900 iterations, with periodic resets to simulate changing environmental demands.
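A self-contained version of this training loop might look like the following. The scenario-to-action mapping, epsilon-greedy choice rule, and reset interval of 300 trials are assumptions for illustration; only the four contexts, the 900-trial budget, and the periodic resets come from the project description.

```python
import random

# Illustrative training loop. Action names, the reward mapping, the
# epsilon-greedy rule, and reset_every=300 are assumptions.
SCENARIOS = ["social_presence", "imitation", "stimulus_x", "predator_warning"]
ACTIONS = ["approach", "imitate", "escape", "ignore"]
CORRECT = {"social_presence": "approach", "imitation": "imitate",
           "stimulus_x": "approach", "predator_warning": "escape"}

def run_training(n_trials=900, reset_every=300, alpha=0.1, epsilon=0.1):
    q = {(s, a): 0.0 for s in SCENARIOS for a in ACTIONS}
    rewards = []
    for t in range(n_trials):
        if t and t % reset_every == 0:
            q = {k: 0.0 for k in q}  # reset simulates changed environmental demands
        s = random.choice(SCENARIOS)  # randomly sample one context per trial
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        r = 1.0 if a == CORRECT[s] else 0.0
        q[(s, a)] += alpha * (r - q[(s, a)])  # Rescorla-Wagner-style update
        rewards.append(r)
    return q, rewards

q, rewards = run_training()
```

Each reset wipes the learned associations, so the reward trace shows a dip followed by relearning, which is one way the simulation probes the agent's adaptability.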

Key Findings & Visualizations


Agent Confidence Over Time


The agent’s internal confidence (Q-values) steadily increased over time, indicating effective learning and convergence toward stable action policies.
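The kind of Q-value trace summarized above can be reproduced with a short Matplotlib sketch. The trajectory here is simulated with a Rescorla-Wagner update (alpha = 0.05), not taken from the project's actual data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for saving figures
import matplotlib.pyplot as plt
import numpy as np

# Simulated Q-value trajectory (illustrative; alpha and the reward level
# are assumptions, not the project's recorded data).
alpha, reward = 0.05, 1.0
q = np.zeros(900)
for t in range(1, 900):
    q[t] = q[t - 1] + alpha * (reward - q[t - 1])

fig, ax = plt.subplots()
ax.plot(q, label="Q-value")
ax.set_xlabel("Trial")
ax.set_ylabel("Q-value")
ax.set_title("Agent Confidence Over Time")
ax.legend()
fig.savefig("q_values.png")
```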

Emergence of Social Behaviors


Distinct stimulus-response associations emerged across scenarios, showing how the agent learned behaviors like imitation, escape, and approach through reinforcement.

Social Reward Trends


Average reward per trial fluctuated across conditions, reflecting the complexity of learning in mixed social and non-social environments.

Impact & Insights


  • Demonstrated that complex behaviors like imitation can emerge from low-level reinforcement learning, without relying on explicit symbolic reasoning.
  • Provided a proof-of-concept for how AI agents could acquire social intelligence through trial-and-error learning alone.
  • Generated insights into cognitive flexibility, cue salience, and adaptive behavior modeling across both artificial and biological systems.

Significance


This model lays the groundwork for developing more human-like artificial agents that can learn socially through observation and interaction, which is key for applications in human-robot interaction, AI safety, and adaptive user modeling.

Future Work


To better emulate human-level social cognition, future iterations could incorporate:

  • Episodic memory structures
  • Hierarchical or symbolic reasoning modules
  • Multi-agent social environments with variable reward conditions