Computer Vision System Marries Image Recognition And Generation

  • July 3, 2023
MIT MAGE
A unified vision system known as MAsked Generative Encoder (MAGE), developed by researchers at MIT and Google, could be useful for many things, like finding and classifying objects in an image, learning from just a few examples, generating images with specific conditions such as text or class, editing existing images, and more. Image: Alex Shipps/MIT CSAIL via Midjourney

MAGE merges the two key tasks of image generation and recognition, typically trained separately, into a single system.

Rachel Gordon | MIT CSAIL

Computers possess two remarkable capabilities with respect to images: They can both identify them and generate them anew. Historically, these functions have stood separate, akin to the disparate acts of a chef who is good at creating dishes (generation), and a connoisseur who is good at tasting dishes (recognition).

Yet, one can’t help but wonder: What would it take to orchestrate a harmonious union between these two distinctive capacities? Both chef and connoisseur share a common understanding in the taste of the food. Similarly, a unified vision system requires a deep understanding of the visual world.

Now, researchers in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have trained a system to infer the missing parts of an image, a task that requires deep comprehension of the image’s content. In successfully filling in the blanks, the system, known as the Masked Generative Encoder (MAGE), achieves two goals at the same time: accurately identifying images and creating new ones with striking resemblance to reality. 

This dual-purpose system enables myriad potential applications, like object identification and classification within images, swift learning from minimal examples, the creation of images under specific conditions like text or class, and enhancing existing images.

Unlike other techniques, MAGE doesn’t work with raw pixels. Instead, it converts images into what’s called “semantic tokens,” which are compact, yet abstracted, versions of an image section. Think of these tokens as mini jigsaw puzzle pieces, each representing a 16×16 patch of the original image. Just as words form sentences, these tokens create an abstracted version of an image that can be used for complex processing tasks, while preserving the information in the original image. Such a tokenization step can be trained within a self-supervised framework, allowing it to pre-train on large image datasets without labels. 
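The patch-to-token idea can be illustrated with a toy example. This is a minimal sketch, not MAGE's actual tokenizer: it assumes a grayscale image stored as nested lists and swaps the learned encoder for a plain nearest-neighbour lookup on raw pixels. The two-entry `codebook` and all function names are hypothetical.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel values


def split_into_patches(image: Image, patch: int = 16) -> List[List[float]]:
    """Cut an image into non-overlapping patch x patch squares (row-major)
    and flatten each square into a single vector."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[y][x]
                            for y in range(top, top + patch)
                            for x in range(left, left + patch)])
    return patches


def nearest_code(vector: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to the vector
    (squared Euclidean distance)."""
    def dist(code: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def tokenize(image: Image, codebook: Sequence[Sequence[float]],
             patch: int = 16) -> List[int]:
    """Map every patch to the id of its nearest codebook entry."""
    return [nearest_code(p, codebook) for p in split_into_patches(image, patch)]


# Toy usage: a 32x32 image whose left half is dark and right half is bright.
image = [[0.0] * 16 + [1.0] * 16 for _ in range(32)]
codebook = [[0.0] * 256, [1.0] * 256]  # hypothetical: "dark" and "bright" codes
tokens = tokenize(image, codebook)
print(tokens)  # one token id per 16x16 patch
```

A 32×32 image becomes just four integers here, which is the sense in which tokens are compact. A real tokenizer would use a much larger codebook learned from data, so that each id carries semantic content and not just average brightness.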

Now, the magic begins when MAGE uses “masked token modeling.” It randomly hides some of these tokens, creating an incomplete puzzle, and then trains a neural network to fill in the gaps. This way, it learns to both understand the patterns in an image (image recognition) and generate new ones (image generation).
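The masking step itself is simple to sketch. The example below is an illustration under stated assumptions and not the authors' code: the `MASK` id, the function names, and the uniform 50 to 100 percent masking range are hypothetical choices made here to show a masking ratio that varies from one training example to the next.

```python
import random
from typing import Dict, List, Tuple

MASK = -1  # hypothetical id reserved for a hidden token


def mask_tokens(tokens: List[int], rng: random.Random,
                min_ratio: float = 0.5,
                max_ratio: float = 1.0) -> Tuple[List[int], List[int]]:
    """Hide a random subset of tokens.

    The masking ratio is drawn afresh on every call (assumed uniform here),
    so the network sees both lightly and heavily masked inputs.
    Returns the masked sequence and the sorted positions that were hidden.
    """
    ratio = rng.uniform(min_ratio, max_ratio)
    n_hidden = min(len(tokens), max(1, round(ratio * len(tokens))))
    hidden = sorted(rng.sample(range(len(tokens)), n_hidden))
    hidden_set = set(hidden)
    masked = [MASK if i in hidden_set else t for i, t in enumerate(tokens)]
    return masked, hidden


def reconstruction_targets(tokens: List[int],
                           hidden: List[int]) -> Dict[int, int]:
    """The training targets: the original token id at each hidden position."""
    return {i: tokens[i] for i in hidden}


# Toy usage: 16 token ids standing in for a 4x4 grid of image patches.
tokens = list(range(100, 116))
rng = random.Random(0)
masked, hidden = mask_tokens(tokens, rng)
targets = reconstruction_targets(tokens, hidden)
```

A network trained on pairs like `(masked, targets)` is scored only on the hidden positions. Intuitively, high masking ratios push it toward generating content almost from scratch, while lower ratios reward understanding of the visible context.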

“One remarkable part of MAGE is its variable masking strategy during pre-training, allowing it to train for either task, image generation or recognition, within the same system,” says Tianhong Li, a PhD student in electrical engineering and computer science at MIT, a CSAIL affiliate, and the lead author on a paper about the research. “MAGE’s ability to work in the ‘token space’ rather than ‘pixel space’ results in clear, detailed, and high-quality image generation, as well as semantically rich image representations. This could hopefully pave the way for advanced and integrated computer vision models.” 

Apart from its ability to generate realistic images from scratch, MAGE also allows for conditional image generation. Users can specify certain criteria for the images they want MAGE to generate, and the tool will cook up the appropriate image. It’s also capable of image editing tasks, such as removing elements from an image while maintaining a realistic appearance.

Recognition tasks are another strong suit for MAGE. With its ability to pre-train on large unlabeled datasets, it can classify images using only the learned representations. Moreover, it excels at few-shot learning, achieving impressive results on large image datasets like ImageNet with only a handful of labeled examples.

MAGE has performed well in validation. On the generation side, it set new records, significantly outperforming previous models. On the recognition side, it also led, achieving 80.9 percent accuracy in linear probing and 71.9 percent 10-shot accuracy on ImageNet (that is, it correctly identified images in 71.9 percent of cases when it had only 10 labeled examples from each class).
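To make the few-shot setting concrete: assuming the standard 1,000-class ImageNet benchmark, 10 examples per class amounts to 10,000 labeled images in total. The sketch below shows the general shape of such an evaluation on frozen features. It is a simplification and not the paper's protocol: linear probing trains a linear classifier on top of the frozen representation, whereas this example substitutes a simpler nearest-centroid rule, and the two-dimensional features and labels are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Feature = Sequence[float]
Example = Tuple[Feature, str]  # (frozen feature vector, class label)


def class_centroids(support: List[Example]) -> Dict[str, List[float]]:
    """Average the feature vectors of each class's few labeled examples."""
    grouped: Dict[str, List[Feature]] = defaultdict(list)
    for feature, label in support:
        grouped[label].append(feature)
    return {label: [sum(column) / len(features) for column in zip(*features)]
            for label, features in grouped.items()}


def predict(feature: Feature, centroids: Dict[str, List[float]]) -> str:
    """Assign the label whose centroid is closest (squared Euclidean)."""
    def dist(center: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(feature, center))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(queries: List[Example], centroids: Dict[str, List[float]]) -> float:
    """Fraction of query examples whose predicted label is correct."""
    correct = sum(predict(feature, centroids) == label
                  for feature, label in queries)
    return correct / len(queries)


# Toy usage: a 2-shot, 2-class problem with made-up 2-D features.
support = [([0.9, 0.1], "cat"), ([1.1, -0.1], "cat"),
           ([0.1, 0.9], "dog"), ([-0.1, 1.1], "dog")]
queries = [([0.8, 0.2], "cat"), ([0.2, 0.7], "dog"), ([0.6, 0.5], "dog")]
centroids = class_centroids(support)
score = accuracy(queries, centroids)  # two of the three queries are correct
```

The feature extractor is never updated in this setup, so the score measures how well the pre-trained representation separates classes with very little labeled data.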

Despite its strengths, the research team acknowledges that MAGE is a work in progress. Converting images into tokens inevitably loses some information, and in future work the team is keen to explore ways to compress images without losing important details. They also intend to train and test MAGE on larger unlabeled datasets, which could lead to even better performance.

“It has been a long dream to achieve image generation and image recognition in one single system. MAGE is a groundbreaking research which successfully harnesses the synergy of these two tasks and achieves the state-of-the-art of them in one single system,” says Huisheng Wang, senior staff software engineer of humans and interactions in the Research and Machine Intelligence division at Google, who was not involved in the work. “This innovative system has wide-ranging applications, and has the potential to inspire many future works in the field of computer vision.” 

Li wrote the paper along with Dina Katabi, the Thuan and Nicole Pham Professor in the MIT Department of Electrical Engineering and Computer Science and a CSAIL principal investigator; Huiwen Chang, a senior research scientist at Google; Shlok Kumar Mishra, a University of Maryland PhD student and Google Research intern; Han Zhang, a senior research scientist at Google; and Dilip Krishnan, a staff research scientist at Google. Computational resources were provided by Google Cloud Platform and the MIT-IBM Watson AI Lab. The team’s research was presented at the 2023 Conference on Computer Vision and Pattern Recognition.

Reprinted with permission of MIT News (http://news.mit.edu/)



