MaGGIe is an efficient framework for multi-instance human matting using sparse convolution and transformer attention to ensure temporal consistency in videos.

MaGGIe: Achieving Temporal Consistency in Video Instance Matting

Abstract and 1. Introduction

  2. Related Works

  3. MaGGIe

    3.1. Efficient Masked Guided Instance Matting

    3.2. Feature-Matte Temporal Consistency

  4. Instance Matting Datasets

    4.1. Image Instance Matting and 4.2. Video Instance Matting

  5. Experiments

    5.1. Pre-training on image data

    5.2. Training on video data

  6. Discussion and References

Supplementary Material

  7. Architecture details

  8. Image matting

    8.1. Dataset generation and preparation

    8.2. Training details

    8.3. Quantitative details

    8.4. More qualitative results on natural images

  9. Video matting

    9.1. Dataset generation

    9.2. Training details

    9.3. Quantitative details

    9.4. More qualitative results

Abstract

Human matting is a foundational task in image and video processing in which human foreground pixels are extracted from the input. Prior works either improve accuracy through additional guidance or improve the temporal consistency of a single instance across frames. We propose a new framework, MaGGIe (Masked Guided Gradual Human Instance Matting), which predicts alpha mattes progressively for each human instance while maintaining computational cost, precision, and consistency. Our method leverages modern architectures, including transformer attention and sparse convolution, to output all instance mattes simultaneously without exploding memory and latency. While keeping inference costs constant in the multiple-instance scenario, our framework achieves robust and versatile performance on our proposed synthesized benchmarks. Along with higher-quality image and video matting benchmarks, we introduce a novel multi-instance synthesis approach built from publicly available sources to improve the generalization of models in real-world scenarios. Our code and datasets are available at https://maggie-matt.github.io.

1. Introduction

In image matting, a trivial solution is to predict the pixel transparency, the alpha matte α ∈ [0, 1], for precise background removal. Considering a saliency image I with two main components, foreground F and background B, the image I is expressed as I = αF + (1 − α)B. Because of the ambiguity in detecting the foreground region, for example, whether a person’s belongings are part of the human foreground or not, many methods [11, 16, 31, 37] leverage additional guidance, typically trimaps, which define foreground, background, and unknown or transition regions. However, creating trimaps, especially for videos, is resource-intensive. Alternatively, binary masks [39, 56] are simpler to obtain from human drawings or off-the-shelf segmentation models, and they offer greater flexibility because they do not impose hard constraints on the output values of regions as trimaps do. Our work focuses on, but is not limited to, human matting because of the larger number of available academic datasets and the higher user demand in many applications [1, 2, 12, 15, 44] compared to other objects.

Figure 1. Our MaGGIe delivers precise and temporally consistent alpha mattes. It adeptly preserves intricate details and demonstrates robustness against noise in instance guidance masks by effectively utilizing information from adjacent frames. Red arrows highlight the areas of detailed zoom-in. (Best viewed in color with digital zoom.)
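The compositing equation I = αF + (1 − α)B from the introduction can be illustrated with a minimal pure-Python sketch. The function names `composite_pixel` and `composite_image`, the nested-list image layout, and the toy colors are assumptions made for this illustration and do not come from the paper or its code release.

```python
def composite_pixel(alpha, fg, bg):
    """Blend one RGB pixel: I = alpha * F + (1 - alpha) * B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def composite_image(alpha_matte, foreground, background):
    """Apply the compositing equation to every pixel of row-major nested lists."""
    return [
        [composite_pixel(a, f, b) for a, f, b in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha_matte, foreground, background)
    ]


# Toy 1x3 image: a red foreground over a blue background (hypothetical data).
alpha = [[1.0, 0.5, 0.0]]   # fully foreground, half transparent, fully background
fg = [[(255, 0, 0)] * 3]
bg = [[(0, 0, 255)] * 3]
image = composite_image(alpha, fg, bg)
print(image)
```

Matting is the inverse problem: given only the composite I, estimate α (and possibly F), which is why extra guidance such as trimaps or binary masks is commonly used.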

When working with video input, the problem of creating trimap guidance is often resolved by guidance propagation [17, 45], whose main idea comes from video object segmentation [8, 38]. However, the performance of trimap propagation degrades as video length grows. Failed trimap predictions, which violate properties such as the alignment between foreground, unknown, and background regions, lead to incorrect alpha mattes. We observe that using binary masks for each frame gives more robust results. However, consistency between the outputs of different frames is still important for any video matting approach. For example, holes appearing in a random frame because of wrong guidance should be corrected by consecutive frames. Many works [17, 32, 34, 45, 53] constrain the temporal consistency at the level of feature maps between frames. Since alpha matte values are very sensitive, feature-level aggregation alone does not guarantee consistent outputs. Some methods [21, 50] in video segmentation and matting compute the incoherent regions to update values across frames. We propose a temporal consistency module that works in both feature and output spaces to produce consistent alpha mattes.
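To make the idea of correcting a hole in one frame from its neighbors concrete, here is a toy output-space sketch. It is not the temporal consistency module proposed in the paper: the function `repair_incoherent`, the thresholds `agree_tol` and `jump_tol`, and the neighbor-averaging rule are all assumptions chosen only to illustrate how incoherent pixels might be detected and updated across frames.

```python
def repair_incoherent(prev, cur, nxt, agree_tol=0.1, jump_tol=0.5):
    """Return a copy of `cur` with isolated single-frame outliers replaced.

    A pixel is treated as incoherent when the previous and next frames agree
    (their alphas differ by at most `agree_tol`) but the current alpha is more
    than `jump_tol` away from their mean. Such pixels take the neighbor mean.
    Both thresholds are illustrative assumptions.
    """
    fixed = []
    for p_row, c_row, n_row in zip(prev, cur, nxt):
        row = []
        for p, c, n in zip(p_row, c_row, n_row):
            mean = (p + n) / 2.0
            if abs(p - n) <= agree_tol and abs(c - mean) > jump_tol:
                row.append(mean)   # hole or spike: trust the adjacent frames
            else:
                row.append(c)      # coherent, or genuine motion: keep as is
        fixed.append(row)
    return fixed


# Three consecutive 1x3 alpha frames; the middle frame has a hole at pixel 1.
prev_alpha = [[1.0, 1.0, 0.0]]
cur_alpha = [[1.0, 0.0, 0.0]]
next_alpha = [[1.0, 1.0, 0.0]]
repaired = repair_incoherent(prev_alpha, cur_alpha, next_alpha)
print(repaired)
```

A rule like this only acts on the final mattes; the passage above argues that consistency should be enforced in feature space as well, which this sketch does not attempt.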

Besides temporal consistency, when extending instance matting to videos containing a large number of frames and instances, careful network design to prevent an explosion in computational cost is also a key challenge. In this work, we propose several adjustments to the popular mask-guided progressive refinement architecture [56]. Firstly, by using a mask guidance embedding inspired by AOT [55], the input size reduces to a constant number of channels. Secondly, with the advancement of transformer attention in various vision tasks [40–42], we adopt query-based instance segmentation [7, 19, 23] to predict all instance mattes in one forward pass instead of estimating each separately. This also replaces the complex refinement of previous work with interaction between instances through the attention mechanism. To avoid the high cost of transformer attention, we only perform multi-instance prediction at the coarse level and adapt the progressive refinement at multiple scales [18, 56]. However, using full convolution for the refinement, as previous works do, is inefficient because less than 10% of values are updated at each scale, which is also mentioned in [50]. Replacing it with sparse convolution [36] significantly reduces the inference cost and keeps the complexity of the algorithm constant, since only locations of interest are refined. Nevertheless, the lack of information at a larger scale when using sparse convolution can cause a dominance problem, which leads to the higher-scale prediction copying the lower-scale outputs without adding fine-grained details. We propose an instance guidance method to help the coarser prediction guide but not contribute to the finer alpha matte.
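The motivation for sparse refinement, that only a small fraction of values changes at each scale, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the paper's architecture: a Python loop stands in for a sparse convolution layer, and the helper names, the `low`/`high` uncertainty thresholds, and the toy refiner are all hypothetical.

```python
def upsample_nearest(matte, factor=2):
    """Nearest-neighbor upsampling of a row-major nested-list matte."""
    out = []
    for row in matte:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def uncertain_locations(matte, low=0.05, high=0.95):
    """Coordinates whose alpha is neither confidently background nor foreground."""
    return [(y, x) for y, row in enumerate(matte)
            for x, v in enumerate(row) if low < v < high]


def refine_sparse(coarse, refine_fn, factor=2):
    """Upsample a coarse matte and refine only its uncertain locations.

    `refine_fn(y, x, value)` is a stand-in for a learned refinement layer.
    Returns the refined matte and the fraction of pixels that were visited.
    """
    fine = upsample_nearest(coarse, factor)
    locations = uncertain_locations(fine)
    for y, x in locations:
        fine[y][x] = refine_fn(y, x, fine[y][x])
    total = len(fine) * len(fine[0])
    return fine, len(locations) / total


# Toy 4x4 coarse matte with a single uncertain boundary pixel (hypothetical data).
coarse = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
]
# Hypothetical refiner that simply sharpens uncertain values.
fine, updated_fraction = refine_sparse(coarse, lambda y, x, v: 1.0 if v >= 0.5 else 0.0)
print(len(fine), updated_fraction)
```

In this toy case only 4 of 64 fine-scale pixels are visited, which mirrors the observation that a dense pass would spend most of its computation on values that never change.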

In addition to the framework design, we propose a new training video dataset and benchmarks for instance-aware matting. Besides a new large-scale, high-quality synthesized image instance matting dataset, we extend the current image instance matting benchmark with guidance of varying quality to evaluate robustness. For video input, our synthesized training set and benchmark are constructed from various public instance-agnostic datasets with three levels of difficulty.

In summary, our contributions include:

• A highly efficient instance matting framework with mask guidance that has all instances interacting and processed in a single forward pass.

• A novel approach that considers feature-matte levels to maintain matte temporal consistency in videos.

• Diverse training datasets and robust benchmarks for image and video instance matting that bridge the gap between synthesized and natural cases.

:::info This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
