NUS, Facebook AI And Other World-Class Universities Collaborate To Teach AI To Understand The World Through Our Eyes

  • October 16, 2021
By collecting first-person video data in everyday scenarios, such as having a meal in a food court, the NUS team and collaborators have forged an entirely new path for building smarter video AI models to power functions like memory augmentation: AR glasses can remember where users have left their wallets and even remind them not to leave them behind.

There is a marked difference between viewing and interacting with the world as a third-party spectator, and experiencing the action intimately from a first-person point of view.

This difference is similar to watching others ride a roller coaster from the ground, as opposed to riding the roller coaster yourself: the two vantage points yield entirely disparate experiences and understandings of the ride.

This is the obstacle currently facing Artificial Intelligence (AI) and its applications.

To unlock the next wave of AI technology that will power future assistants and innovations for Augmented Reality (AR) and robotics, AI needs to evolve to an entirely new paradigm of egocentric (i.e. first-person) perception. This means teaching AI to understand the world through human eyes in the context of real-time motion, interaction, and multi-sensory observations.

To do so, a consortium has been formed by the National University of Singapore (NUS) and 12 other universities around the world to undertake an ambitious, long-term project called Egocentric 4D Live Perception (Ego4D).

A team from the NUS Department of Electrical and Computer Engineering has been actively working to collect first-person data, specifically in Singapore. This data was collected over the course of 2021, via head-mounted cameras and AR glasses distributed to a total of 40 participants locally. This allowed the NUS team to capture their eye-level, first-person, unscripted experiences of day-to-day scenarios: routine activities such as getting a haircut, dining at a hawker centre, or going to the gym. Based on the collected video data, the NUS researchers were then able to train AI models to understand people and their interactions by leveraging both audio and visual cues.

Assistant Professor Mike Shou, who leads the NUS research team, said, “Over the past 10 years, we have witnessed the revolution of AI for image understanding, which is built on the foundations laid down by datasets like the ImageNet. Similarly, I believe our Ego4D dataset will lay down the necessary groundwork for egocentric video research, and spur remarkable progress in building AI models for AR and robot applications.”

Improving AI technology for smart and useful assistants

The current paradigm of computer vision (CV) has, so far, excelled at understanding what is in an image or video by learning from vast amounts of online photos and videos from a third-person view, where the camera is a spectator from afar.

Advancements in first-person, or egocentric perception, will provide the fundamental building blocks necessary to develop smarter cognitive capabilities for AI assistants in the context of the person interacting with it. Such AI assistants will prove more useful in our day-to-day lives and work. Imagine: when trying to cook a new recipe, instead of referring repeatedly to a cookbook and attempting to multi-task, simply wearing a pair of AR smart glasses can direct you to perform each specific step, as you are doing it.

Examples of first-person and third-person video data collected by the NUS team as a user prepares a meal.

Asst Prof Shou explained, “In particular for Singapore and her aging population, such AI assistants can be an exceptional aid for the elderly, especially those with health conditions like dementia or Alzheimer’s. A pair of AR glasses could help elderly patients remember ‘what happened and when’, answering questions like where they left their keys, or whether they remembered to lock the door. AI assistants applied to healthcare robotics can also understand whether a person is speaking to, or looking at, the robot itself, and thus take care of multiple patients in a single ward concurrently.”

This technology can be applied across a spectrum of devices, and fuel a world where physical, augmented, and virtual reality can co-exist together in a single space.

World’s largest first-person video data set

Today, the consortium is announcing the world’s largest first-person video dataset captured “in the wild”, featuring people going about their normal daily lives. The NUS team, together with its partners, has collectively gathered more than 3,000 hours of first-person video data (to be publicly available in November 2021) from more than 700 research participants across nine countries.

Ego4D is a massive-scale egocentric video dataset of daily life activity spanning 73 locations worldwide. It offers greater global representation, and provides more diversity in scenes, people, and activities, which increases the applicability of AI models trained for people, across backgrounds, ethnicities, occupations, and ages. Photo credit: Ego4D Academic Consortium

Progress in the nascent field of egocentric perception depends on large volumes of egocentric data of daily life activities, considering most AI systems learn from thousands, if not millions, of examples. Existing datasets do not yet have the scale, diversity, and complexity necessary to be useful in the real world.

This first-of-its-kind video dataset captures what the camera wearer chooses to gaze at in a specific environment; what the camera wearer is doing with their hands and the objects in front of them; and how the camera wearer interacts with other people from the egocentric perspective. So far, the collection features camera wearers performing hundreds of activities and interacting with hundreds of different objects. Participants have consented to the public release of all visible faces and audible speech in the Ego4D dataset’s video footage.

Global representation is crucial for egocentric research, since the egocentric visual experience will differ significantly across cultural and geographic contexts. In particular, the NUS team’s collected data can be extrapolated to represent the Southeast Asian demographic, so that the AI systems developed on it can ideally recognise nuances and needs that may look different from region to region.

Unpacking the real-world data set

Equally important as data collection is defining the right research benchmarks, or tasks, that can be used for testing and measurement.

To give all the researchers involved a common objective for building fundamental research on real-world perception of visual and social contexts, the consortium has collectively developed five new, challenging benchmarks, which required rigorous annotation of the collective egocentric dataset.

“Our NUS research team has been focusing on two of these key benchmarks: audio-visual diarization, that is, leveraging cues of sound and sight to help AI machines identify ‘who said what when’; and the second is to train AI models to better understand the nature of social interactions. A socially intelligent AI should understand who is speaking to whom, and who is paying attention to whom, at any given point of time,” explained Professor Li Haizhou, a co-Principal Investigator, also from the NUS Department of Electrical and Computer Engineering.
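As a toy illustration of what ‘who said what when’ means, audio-visual diarization can be sketched as matching voiced audio segments to the face tracks visible at the same time. The function and data shapes below are hypothetical, a heavily simplified stand-in for the learned audio-visual models the team actually trains:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float    # seconds

def diarize(voice_segments, face_tracks):
    """Assign each voiced segment to the visible person whose face
    track overlaps it the most in time (a crude stand-in for learned
    audio-visual speaker matching)."""
    labelled = []
    for seg_start, seg_end in voice_segments:
        best, best_overlap = "unknown", 0.0
        for person, (t0, t1) in face_tracks.items():
            overlap = min(seg_end, t1) - max(seg_start, t0)
            if overlap > best_overlap:
                best, best_overlap = person, overlap
        labelled.append(Segment(best, seg_start, seg_end))
    return labelled
```

For example, a voiced segment from 0.0–2.0 s overlapping person A’s face track would be labelled `Segment("A", 0.0, 2.0)`; a real system would fuse voice embeddings and lip motion rather than raw time overlap.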

To date, the NUS team has created multi-modal deep learning models for:

1) Active speaker detection (detecting who is speaking, using both audio and visual signals)

2) Detecting who is speaking to whom

3) Detecting who is paying attention to whom in a given social interaction
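Active speaker detection of this kind typically fuses audio and visual evidence per candidate person. The sketch below shows one common pattern, late fusion of per-frame confidence scores; the scores, weight, and threshold are illustrative assumptions, not the team’s actual models:

```python
def fused_score(audio_score, visual_score, w_audio=0.5):
    """Late fusion: weighted average of an audio confidence
    (e.g. voice activity) and a visual confidence (e.g. lip motion)."""
    return w_audio * audio_score + (1 - w_audio) * visual_score

def detect_active_speaker(frame_scores, threshold=0.5):
    """frame_scores maps person -> (audio_score, visual_score) for one
    video frame. Returns the person with the highest fused score above
    the threshold, or None if nobody is confidently speaking."""
    best, best_score = None, threshold
    for person, (a, v) in frame_scores.items():
        s = fused_score(a, v)
        if s > best_score:
            best, best_score = person, s
    return best
```

Late fusion is only one design choice; the multi-modal deep learning models described above would more likely fuse audio and visual features earlier, inside the network, rather than combining final scores.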

Currently, these models are trained on only up to 100 hours of video data, a small part of the entire Ego4D dataset. Next steps for the NUS team include large-scale pre-training to create a single strong, generic model that is trained on the whole dataset and learns to perform multiple tasks jointly.

In addition to Asst Prof Shou, other Principal Investigators that comprise the academic consortium include: CV Jawahar (International Institute of Information Technology, Hyderabad); David Crandall (Indiana University); Dima Damen (University of Bristol); Giovanni Maria Farinella (University of Catania); Bernard Ghanem (King Abdullah University of Science and Technology); Kris Kitani (Carnegie Mellon University, Pittsburgh and Africa); Aude Oliva and Antonio Torralba (Massachusetts Institute of Technology); Hyun Soo Park (University of Minnesota); Jim Rehg (Georgia Institute of Technology); Yoichi Sato (University of Tokyo); Jianbo Shi (University of Pennsylvania); and Pablo Arbelaez (Universidad de los Andes). For the complete list of benchmarks and partners involved, read more here. 
