NUS, Facebook AI And Other World-Class Universities Collaborate To Teach AI To Understand The World Through Our Eyes

October 16, 2021
By collecting first-person video data in everyday scenarios, such as having a meal in a food court, the NUS team and collaborators have forged an entirely new path for building smarter video AI models to power functions like memory augmentation: AR glasses can remember where users have left their wallets and even remind them not to leave it behind.

There is a marked difference between viewing and interacting with the world as a third-party spectator, and experiencing the action intimately from a first-person point of view.

This difference is similar to watching others ride a roller coaster from the ground, as opposed to riding the roller coaster yourself: the experience and understanding you ultimately take away from the ride are entirely different.

This is the obstacle currently facing Artificial Intelligence (AI) and its applications.

To unlock the next wave of AI technology that will power future assistants and innovations for Augmented Reality (AR) and robotics, AI needs to evolve to an entirely new paradigm of egocentric (i.e. first-person) perception. This means teaching AI to understand the world through human eyes in the context of real-time motion, interaction, and multi-sensory observations.

To do so, the National University of Singapore (NUS) and 12 other universities around the world have formed a consortium to undertake an ambitious, long-term project called Egocentric 4D Live Perception (Ego4D).

A team from the NUS Department of Electrical and Computer Engineering has been actively working to collect first-person data, specifically in Singapore. This data was collected over the course of 2021, via head-mounted cameras and AR glasses distributed to a total of 40 participants locally. This allowed the NUS team to capture their eye-level, first-person, unscripted experiences of day-to-day scenarios: routine activities such as getting a haircut, dining at a hawker centre, or going to the gym. Based on the collected video data, the NUS researchers were then able to train AI models to understand people and their interactions by leveraging both audio and visual cues.

Assistant Professor Mike Shou, who leads the NUS research team, said, “Over the past 10 years, we have witnessed the revolution of AI for image understanding, which is built on the foundations laid down by datasets like the ImageNet. Similarly, I believe our Ego4D dataset will lay down the necessary groundwork for egocentric video research, and spur remarkable progress in building AI models for AR and robot applications.”

Improving AI technology for smart and useful assistants

The current paradigm of computer vision (CV) has, so far, excelled at understanding what is in an image or video by learning from vast amounts of online photos and videos from a third-person view, where the camera is a spectator from afar.

Advancements in first-person, or egocentric, perception will provide the fundamental building blocks necessary to develop smarter cognitive capabilities for AI assistants in the context of the person interacting with them. Such AI assistants will prove more useful in our day-to-day lives and work. Imagine: when trying to cook a new recipe, instead of referring repeatedly to a cookbook while attempting to multi-task, a pair of AR smart glasses could guide you through each specific step as you perform it.

Examples of first-person and third-person video data collected by the NUS team as a user prepares a meal.

Asst Prof Shou explained, “In particular for Singapore and her aging population, such AI assistants can be an exceptional aid for the elderly, especially those with health conditions like dementia or Alzheimer’s. A pair of AR glasses could help elderly patients remember and memorise ‘what happened and when’, to answer questions like where they left their keys, or if they remembered to lock the door. AI assistants applied to healthcare robotics can also understand if a person is speaking to, or looking at, the robot itself, and thus take care of multiple patients in a single ward concurrently.”

This technology can be applied across a spectrum of devices and can fuel a world where physical, augmented, and virtual reality co-exist in a single space.

World’s largest first-person video data set

Today, the consortium is announcing the world’s largest first-person video data set that’s captured “in the wild”, featuring people going about their normal daily life. The NUS team, together with its partners, have collectively gathered more than 3,000 hours of first-person video data — which will be publicly available in November 2021 — from more than 700 research participants across nine countries.

Ego4D is a massive-scale egocentric video dataset of daily life activity spanning 73 locations worldwide. It offers greater global representation and provides more diversity in scenes, people, and activities, which increases the applicability of AI models trained on it for people across backgrounds, ethnicities, occupations, and ages. Photo credit: Ego4D Academic Consortium

Progress in the nascent field of egocentric perception depends on large volumes of egocentric data of daily life activities, considering that most AI systems learn from thousands, if not millions, of examples. Existing datasets do not yet have the scale, diversity, and complexity necessary to be useful in the real world.

This first-of-its-kind video dataset captures what the camera wearer chooses to gaze at in a specific environment, what the camera wearer is doing with their hands and the objects in front of them, and how the camera wearer interacts with other people from the egocentric perspective. So far, the collection features camera wearers performing hundreds of activities and interacting with hundreds of different objects. Participants have consented to the public release of all visible faces and audible speech in the Ego4D dataset’s video footage.
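
To make these capture dimensions more concrete, the sketch below shows one purely hypothetical way a single annotated clip could be represented in code. The field names and structure are illustrative assumptions for this article, not the actual Ego4D annotation schema.

# Hypothetical record for one annotated egocentric clip, mirroring the capture
# dimensions described above (location, activity, hand-object use, social
# interaction). Not the real Ego4D schema; field names are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EgoClipRecord:
    clip_id: str
    location: str                      # e.g. "hawker centre", "gym"
    activity: str                      # what the camera wearer is doing
    objects_in_hand: List[str] = field(default_factory=list)
    people_interacted_with: int = 0    # consented, visible participants in frame
    has_audio: bool = True

clip = EgoClipRecord(
    clip_id="sg_0001",
    location="hawker centre",
    activity="eating lunch",
    objects_in_hand=["chopsticks", "bowl"],
    people_interacted_with=2,
)
print(clip)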

Global representation is crucial for egocentric research, since the egocentric visual experience will differ significantly across cultural and geographic contexts. In particular, the NUS team’s collected data can be extrapolated to represent the Southeast Asian demographic, so that the AI systems developed on top of it can ideally recognise nuances and needs that may look different from region to region.

Unpacking the real-world data set

Equally important as data collection is defining the right research benchmarks, or tasks, that can be used for testing and measurement.

To give all the researchers involved a common objective for building fundamental research into real-world perception of visual and social contexts, the consortium has collectively developed five new, challenging benchmarks, which required rigorous annotation of the collective egocentric dataset.

“Our NUS research team has been focusing on two of these key benchmarks: audio-visual diarization, that is, to leverage cues of sound and sight to help AI machines identify ‘who said what when’; and the second is to train AI models to better understand the nature of social interactions. A socially intelligent AI should understand who is speaking to whom, and who is paying attention to whom at any given point of time,” illustrated Professor Li Haizhou, a co-Principal Investigator also from the NUS Department of Electrical and Computer Engineering.

To date, the NUS team has created multi-modal deep learning models for the following tasks (a simplified sketch of this kind of model follows the list):

1) Active speaker detection (detecting who is speaking, using both audio and visual signals)

2) Detecting who is speaking to whom

3) Detecting who is paying attention to whom in a given social interaction
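
As an illustration of what such a multi-modal model can look like, the sketch below fuses per-frame audio features with face-crop visual features to score whether a candidate face is speaking (task 1 above). It is a minimal sketch assuming PyTorch and pre-extracted features; the layer sizes and names are illustrative assumptions, not the NUS team’s actual architecture.

# Minimal active speaker detection sketch: fuse an audio embedding with a
# face-crop embedding and score whether that face is speaking at each frame.
import torch
import torch.nn as nn

class ActiveSpeakerDetector(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=256, hidden_dim=256):
        super().__init__()
        # Audio branch: e.g. features from a log-mel spectrogram per frame.
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        # Visual branch: e.g. features from a face crop of one candidate speaker.
        self.visual_encoder = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        # Fusion head: one speaking / not-speaking logit per frame.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))

    def forward(self, audio_feats, face_feats):
        # audio_feats: (batch, T, audio_dim); face_feats: (batch, T, visual_dim)
        a = self.audio_encoder(audio_feats)
        v = self.visual_encoder(face_feats)
        fused = torch.cat([a, v], dim=-1)
        return self.classifier(fused).squeeze(-1)  # (batch, T) speaking logits

# Example: score a clip represented by 30 per-frame feature vectors.
model = ActiveSpeakerDetector()
speaking_logits = model(torch.randn(1, 30, 128), torch.randn(1, 30, 256))
print(speaking_logits.shape)  # torch.Size([1, 30])

Per-frame speaking scores of this kind can then be grouped into per-person time segments, which is how an audio-visual diarization system arrives at the “who said what when” output described above.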

Currently, these models are trained on only up to 100 hours of video data, a small part of the entire Ego4D dataset. Next steps for the NUS team include conducting large-scale pre-training to create one generic, strong model that is trained on the whole dataset and learns to perform multiple tasks jointly.
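
A minimal sketch of that “one generic model, many tasks” idea, again assuming PyTorch: a single shared encoder over pre-extracted clip features feeds one lightweight head per task. The names, dimensions, and heads are illustrative assumptions rather than the team’s actual pre-training setup.

# One shared encoder, several task-specific heads trained jointly.
import torch
import torch.nn as nn

class MultiTaskEgoModel(nn.Module):
    def __init__(self, in_dim=512, hidden_dim=256):
        super().__init__()
        # Shared backbone over pre-extracted per-frame clip features.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        # One small head per task, all sharing the encoder's representation.
        self.heads = nn.ModuleDict({
            "active_speaker": nn.Linear(hidden_dim, 1),
            "speaking_to_whom": nn.Linear(hidden_dim, 1),
            "attending_to_whom": nn.Linear(hidden_dim, 1),
        })

    def forward(self, clip_feats):                 # (batch, T, in_dim)
        shared = self.encoder(clip_feats)          # (batch, T, hidden_dim)
        return {name: head(shared).squeeze(-1) for name, head in self.heads.items()}

model = MultiTaskEgoModel()
outputs = model(torch.randn(2, 30, 512))
print({name: logits.shape for name, logits in outputs.items()})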

In addition to Asst Prof Shou, other Principal Investigators in the academic consortium include: CV Jawahar (International Institute of Information Technology, Hyderabad); David Crandall (Indiana University); Dima Damen (University of Bristol); Giovanni Maria Farinella (University of Catania); Bernard Ghanem (King Abdullah University of Science and Technology); Kris Kitani (Carnegie Mellon University and Carnegie Mellon University Africa); Aude Oliva and Antonio Torralba (Massachusetts Institute of Technology); Hyun Soo Park (University of Minnesota); Jim Rehg (Georgia Institute of Technology); Yoichi Sato (University of Tokyo); Jianbo Shi (University of Pennsylvania); and Pablo Arbelaez (Universidad de los Andes). For the complete list of benchmarks and partners involved, read more here.
