NUS, Facebook AI And Other World-Class Universities Collaborate To Teach AI To Understand The World Through Our Eyes

October 16, 2021
By collecting first-person video data in everyday scenarios, such as having a meal in a food court, the NUS team and collaborators have forged an entirely new path for building smarter video AI models to power functions like memory augmentation: AR glasses can remember where users have left their wallets and even remind them not to leave them behind.

There is a marked difference between viewing and interacting with the world as a third-party spectator, and experiencing the action intimately from a first-person point of view.

This difference is similar to watching others ride a roller coaster from the ground, as opposed to riding the roller coaster yourself: what ultimately informs your experience and understanding of the ride is entirely different.

This is the obstacle currently facing Artificial Intelligence (AI) and its applications.

To unlock the next wave of AI technology that will power future assistants and innovations for Augmented Reality (AR) and robotics, AI needs to evolve to an entirely new paradigm of egocentric (i.e. first-person) perception. This means teaching AI to understand the world through human eyes in the context of real-time motion, interaction, and multi-sensory observations.

To do so, a consortium has been formed by the National University of Singapore (NUS) and 12 other universities around the world to undertake an ambitious, long-term project called Egocentric 4D Live Perception (Ego4D).

A team from the NUS Department of Electrical and Computer Engineering has been actively working to collect first-person data, specifically in Singapore. This data was collected over the course of 2021, via head-mounted cameras and AR glasses distributed to a total of 40 participants locally. This allowed the NUS team to capture their eye-level, first-person, unscripted experiences of day-to-day scenarios: routine activities such as getting a haircut, dining at a hawker centre, or going to the gym. Based on the collected video data, the NUS researchers were then able to train AI models to understand people and their interactions by leveraging both audio and visual cues.

Assistant Professor Mike Shou, who leads the NUS research team, said, “Over the past 10 years, we have witnessed the revolution of AI for image understanding, which is built on the foundations laid down by datasets like ImageNet. Similarly, I believe our Ego4D dataset will lay down the necessary groundwork for egocentric video research, and spur remarkable progress in building AI models for AR and robot applications.”

Improving AI technology for smart and useful assistants

The current paradigm of computer vision (CV) has, so far, excelled at understanding what is in an image or video by learning from vast amounts of online photos and videos from a third-person view, where the camera is a spectator from afar.

Advancements in first-person, or egocentric, perception will provide the fundamental building blocks needed to develop smarter cognitive capabilities for AI assistants that understand the context of the person interacting with them. Such AI assistants will prove more useful in our day-to-day lives and work. Imagine trying to cook a new recipe: instead of repeatedly referring to a cookbook while attempting to multi-task, a pair of AR smart glasses could direct you through each specific step as you are doing it.

Examples of first-person and third-person video data collected by the NUS team as a user prepares a meal.

Asst Prof Shou explained, “In particular for Singapore and her aging population, such AI assistants can be an exceptional aid for the elderly, especially those with health conditions like dementia or Alzheimer’s. A pair of AR glasses could help elderly patients remember, and memorise “what happened and when”, to answer questions like where they left their keys, or if they remembered to lock the door. AI assistants applied to healthcare robotics can also understand if a person is speaking to, or looking at the robot itself, and thus take care of multiple patients in a single ward concurrently.”

This technology can be applied across a spectrum of devices, and fuel a world where physical, augmented, and virtual reality co-exist in a single space.

World’s largest first-person video data set

Today, the consortium is announcing the world’s largest first-person video data set that’s captured “in the wild”, featuring people going about their normal daily life. The NUS team, together with its partners, have collectively gathered more than 3,000 hours of first-person video data — which will be publicly available in November 2021 — from more than 700 research participants across nine countries.

Ego4D is a massive-scale egocentric video dataset of daily life activity spanning 73 locations worldwide. It offers greater global representation and provides more diversity in scenes, people, and activities, which increases the applicability of AI models trained for people across backgrounds, ethnicities, occupations, and ages. Photo credit: Ego4D Academic Consortium

Progress in the nascent field of egocentric perception depends on large volumes of egocentric data of daily life activities, considering most AI systems learn from thousands, if not millions, of examples. Existing datasets do not yet have the scale, diversity, and complexity necessary to be useful in the real world.

This first-of-its-kind video dataset captures what the camera wearer chooses to gaze at in a specific environment; what the camera wearer is doing with their hands and the objects in front of them; and how the camera wearer interacts with other people from the egocentric perspective. So far, the collection features camera wearers performing hundreds of activities and interacting with hundreds of different objects. The participants have consented to the public release of all visible faces and audible speech in the Ego4D dataset’s video footage.
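
For illustration only, here is a minimal sketch of how the annotations for a single egocentric clip could be organised in code. The field names and structure are hypothetical and do not reflect the actual Ego4D annotation schema.

```python
# Hypothetical annotation record for one egocentric clip; for illustration only,
# not the actual Ego4D annotation schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class HandObjectInteraction:
    start: float        # seconds into the clip
    end: float
    object_label: str   # e.g. "chopsticks"

@dataclass
class SocialInteraction:
    start: float
    end: float
    partner_id: str     # anonymised ID of the person being addressed

@dataclass
class EgoClipAnnotation:
    clip_id: str
    location: str                                     # e.g. "hawker centre"
    activity: str                                     # e.g. "dining"
    gaze_targets: List[str] = field(default_factory=list)
    hand_object: List[HandObjectInteraction] = field(default_factory=list)
    social: List[SocialInteraction] = field(default_factory=list)

# Toy example of one annotated clip.
clip = EgoClipAnnotation(
    clip_id="sg_0042",
    location="hawker centre",
    activity="dining",
    gaze_targets=["menu", "tray"],
    hand_object=[HandObjectInteraction(12.0, 18.5, "chopsticks")],
    social=[SocialInteraction(30.0, 45.0, partner_id="p_07")],
)
print(clip.activity, len(clip.hand_object))
```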

Global representation is crucial for egocentric research, since egocentric visual experience differs significantly across cultural and geographic contexts. In particular, the data collected by the NUS team can be extrapolated to represent the Southeast Asian demographic, so that the AI systems built on it can, ideally, recognise the nuances and needs that may look different from region to region.

Unpacking the real-world data set

Just as important as data collection is defining the right research benchmarks, or tasks, for testing and measurement.

To give all the researchers involved a common objective for building fundamental research into real-world perception of visual and social contexts, the consortium has collectively developed five new, challenging benchmarks, which required rigorous annotation of the collective egocentric dataset.

“Our NUS research team has been focusing on two of these key benchmarks: audio-visual diarization, that is, to leverage cues of sound and sight to help AI machines identify ‘who said what when’; and the second is to train AI models to better understand the nature of social interactions. A socially intelligent AI should understand who is speaking to whom, and who is paying attention to whom at any given point of time,” said Professor Li Haizhou, a co-Principal Investigator also from the NUS Department of Electrical and Computer Engineering.
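
As a rough illustration of the audio-visual diarization idea (associating detected speech with the visible person most likely to be speaking), the sketch below assumes two hypothetical upstream outputs: speech segments detected from the audio track, and per-frame active-speaker scores for each visible person. It is a simplification for illustration, not the NUS team’s actual pipeline.

```python
# Minimal "who said what when" sketch: assign each speech segment to the visible
# person whose active-speaker scores are highest over the frames the segment spans.
# The upstream detectors producing these inputs are assumed, not shown.
from dataclasses import dataclass

@dataclass
class SpeechSegment:
    start: float  # seconds
    end: float    # seconds

def assign_speakers(segments, frame_scores, fps=30.0):
    """frame_scores maps person_id -> list of per-frame speaking scores in [0, 1]."""
    diarization = []
    for seg in segments:
        first, last = int(seg.start * fps), int(seg.end * fps)
        best_person, best_score = None, float("-inf")
        for person_id, scores in frame_scores.items():
            window = scores[first:last + 1]
            if not window:
                continue
            mean_score = sum(window) / len(window)
            if mean_score > best_score:
                best_person, best_score = person_id, mean_score
        diarization.append((seg, best_person))
    return diarization

# Toy usage with fabricated scores: person "B" speaks during 0.5s to 1.0s.
segments = [SpeechSegment(0.5, 1.0)]
scores = {"A": [0.1] * 60, "B": [0.9] * 60}
print(assign_speakers(segments, scores))  # -> [(SpeechSegment(...), 'B')]
```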

To date, the NUS team has created multi-modal deep learning models for:

1) Active speaker detection (detecting who is speaking, using both audio and visual signals)

2) Detecting who is speaking to whom

3) Detecting who is paying attention to whom in a given social interaction
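
The first of these tasks, active speaker detection, lends itself to a compact illustration. Below is a minimal PyTorch-style sketch that fuses a stream of face crops with a stream of log-mel audio features to predict, per frame, whether a person is speaking. The architecture, layer sizes, and tensor shapes are assumptions for illustration, not the NUS team’s actual model.

```python
# Illustrative audio-visual active speaker detector (not the NUS team's model).
import torch
import torch.nn as nn

class ActiveSpeakerDetector(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        # Visual branch: encodes a sequence of face crops (B, T, 3, 96, 96).
        self.visual_cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.visual_proj = nn.Linear(64, embed_dim)
        # Audio branch: encodes per-frame log-mel features (B, T, 80).
        self.audio_proj = nn.Linear(80, embed_dim)
        # Temporal fusion over the concatenated audio-visual features.
        self.fusion = nn.GRU(2 * embed_dim, embed_dim, batch_first=True)
        self.classifier = nn.Linear(embed_dim, 1)  # speaking vs. not speaking

    def forward(self, faces, mels):
        b, t = faces.shape[:2]
        v = self.visual_cnn(faces.flatten(0, 1)).flatten(1)  # (B*T, 64)
        v = self.visual_proj(v).view(b, t, -1)               # (B, T, D)
        a = self.audio_proj(mels)                             # (B, T, D)
        fused, _ = self.fusion(torch.cat([v, a], dim=-1))     # (B, T, D)
        return self.classifier(fused).squeeze(-1)             # per-frame logits

# Toy forward pass on random data: 2 clips of 16 frames each.
model = ActiveSpeakerDetector()
logits = model(torch.randn(2, 16, 3, 96, 96), torch.randn(2, 16, 80))
print(logits.shape)  # torch.Size([2, 16])
```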

Currently, these models are trained on only up to 100 hours of video data, a small part of the entire Ego4D dataset. Next steps for the NUS team include conducting large-scale pre-training to create one generic, strong model that is trained on the whole dataset and can learn to perform multiple tasks jointly.
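
The sketch below illustrates that multi-task idea in its simplest form, assuming a shared backbone with one classification head per task and a summed loss. The task names, feature dimensions, and optimiser settings are assumptions for illustration, not the team’s actual training setup.

```python
# Illustrative joint multi-task training step with a shared backbone (assumed setup).
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # stand-in for a video encoder
heads = nn.ModuleDict({
    "active_speaker": nn.Linear(256, 2),     # speaking vs. not speaking
    "speaks_to_whom": nn.Linear(256, 10),    # hypothetical 10-way target
    "attends_to_whom": nn.Linear(256, 10),   # hypothetical 10-way target
})
optimizer = torch.optim.AdamW(list(backbone.parameters()) + list(heads.parameters()), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def training_step(features, labels):
    """features: (B, 512) pooled clip features; labels: dict of task -> (B,) targets."""
    shared = backbone(features)
    # Optimise all tasks jointly by summing per-task losses on the shared features.
    loss = sum(loss_fn(heads[task](shared), labels[task]) for task in heads)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy step on random data.
features = torch.randn(4, 512)
labels = {"active_speaker": torch.randint(2, (4,)),
          "speaks_to_whom": torch.randint(10, (4,)),
          "attends_to_whom": torch.randint(10, (4,))}
print(training_step(features, labels))
```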

In addition to Asst Prof Shou, other Principal Investigators that comprise the academic consortium include: CV Jawahar (International Institute of Information Technology, Hyderabad); David Crandall (Indiana University); Dima Damen (University of Bristol); Giovanni Maria Farinella (University of Catania); Bernard Ghanem (King Abdullah University of Science and Technology); Kris Kitani (Carnegie Mellon University, Pittsburgh and Africa); Aude Oliva and Antonio Torralba (Massachusetts Institute of Technology); Hyun Soo Park (University of Minnesota); Jim Rehg (Georgia Institute of Technology); Yoichi Sato (University of Tokyo); Jianbo Shi (University of Pennsylvania); and Pablo Arbelaez (Universidad de los Andes). For the complete list of benchmarks and partners involved, read more here. 
