This paper introduces a flexible Transformer-based model for detecting anomalies in system logs. By embedding log templates with a pre-trained BERT model and incorporating positional and temporal encoding, it captures both semantic and sequential context within log sequences. The approach supports variable sequence lengths and configurable input features, enabling extensive experimentation across datasets. The model performs supervised binary classification to distinguish normal from anomalous patterns, using a [CLS]-like token for sequence-level representation. Overall, it pushes the boundaries of log-based anomaly detection by integrating modern NLP and deep learning techniques into system monitoring.

Transformer-Based Anomaly Detection Using Log Sequence Embeddings


Abstract

1 Introduction

2 Background and Related Work

2.1 Different Formulations of the Log-based Anomaly Detection Task

2.2 Supervised vs. Unsupervised

2.3 Information within Log Data

2.4 Fixed-Window Grouping

2.5 Related Works

3 A Configurable Transformer-based Anomaly Detection Approach

3.1 Problem Formulation

3.2 Log Parsing and Log Embedding

3.3 Positional & Temporal Encoding

3.4 Model Structure

3.5 Supervised Binary Classification

4 Experimental Setup

4.1 Datasets

4.2 Evaluation Metrics

4.3 Generating Log Sequences of Varying Lengths

4.4 Implementation Details and Experimental Environment

5 Experimental Results

5.1 RQ1: How does our proposed anomaly detection model perform compared to the baselines?

5.2 RQ2: How much does the sequential and temporal information within log sequences affect anomaly detection?

5.3 RQ3: How much do the different types of information individually contribute to anomaly detection?

6 Discussion

7 Threats to validity

8 Conclusions and References


3 A Configurable Transformer-based Anomaly Detection Approach

In this study, we introduce a novel transformer-based method for anomaly detection that takes log sequences as input. The model employs a pre-trained BERT model to embed log templates, enabling the representation of the semantic information within log messages. These embeddings, combined with positional or temporal encoding, are then fed into the transformer model, which uses the combined information to generate log sequence-level representations for anomaly detection. We design our model to be flexible: the input features are configurable, so we can conduct experiments with different feature combinations of the log data, and the model is designed and trained to handle input log sequences of varying lengths. In this section, we introduce our problem formulation and the detailed design of our method.

3.1 Problem Formulation

We follow previous work [1] in formulating the task as binary classification: we train our proposed model in a supervised way to classify log sequences as anomalous or normal. For the samples used in training and evaluating the model, we utilize a flexible grouping approach to generate log sequences of varying lengths; the details are introduced in Section 4.
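
For clarity, this formulation can be written compactly as follows; this is a minimal formalization of the description above, and the notation is ours rather than the paper's:

```latex
% Binary classification over log sequences: given a sequence of n log
% events S = (e_1, \dots, e_n) with label y \in \{0, 1\}, the model
% f with parameters \theta predicts
\[
  \hat{y} = f_{\theta}(e_1, e_2, \dots, e_n) \in \{0, 1\},
\]
% where \hat{y} = 1 denotes an anomalous sequence and \hat{y} = 0 a
% normal one; \theta is learned from labeled pairs (S, y).
```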

3.2 Log Parsing and Log Embedding

In our work, we transform log events into numerical vectors by encoding log templates with a pre-trained language model. To obtain the log templates, we adopt the Drain parser [24], which is widely used and has good parsing performance on most public datasets [4]. We use a pre-trained Sentence-BERT model [25] (i.e., all-MiniLM-L6-v2 [26]) to embed the log templates generated by the parsing process. This pre-trained model is trained with a contrastive learning objective and achieves state-of-the-art performance on various NLP tasks. We utilize it to create a representation that captures the semantic information of log messages and reflects the similarity between log templates for the downstream anomaly detection model. The output dimension of the model is 384.
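
The parsing and embedding pipeline can be sketched as follows. This is a minimal illustration assuming the drain3 and sentence-transformers Python packages with default settings; the paper's exact parser configuration is not specified here, and the sample log lines are hypothetical.

```python
# Minimal sketch: parse raw logs into templates with Drain, then embed the
# templates with the pre-trained all-MiniLM-L6-v2 Sentence-BERT model.
from drain3 import TemplateMiner
from sentence_transformers import SentenceTransformer

miner = TemplateMiner()  # Drain parser with default configuration

raw_logs = [
    "Connection from 10.0.0.1 closed",   # hypothetical log messages
    "Connection from 10.0.0.2 closed",
    "Failed to allocate block blk_4821",
]

# Each message is matched to a cluster and reduced to a log template.
templates = [miner.add_log_message(line)["template_mined"] for line in raw_logs]

# Embed the templates; all-MiniLM-L6-v2 outputs 384-dimensional vectors,
# matching the representation dimension stated above.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(templates)
print(embeddings.shape)  # (3, 384)
```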

3.3 Positional & Temporal Encoding

The original transformer model [27] adopts a positional encoding to enable the model to make use of the order of the input sequence. As the model contains no recurrence and no convolution, it would be agnostic to the order of the log sequence without positional encoding. While some studies suggest that transformer models without explicit positional encoding remain competitive with standard models on sequential data [28, 29], it is important to note that any permutation of the input sequence will then produce the same internal state of the model. As sequential or temporal information may be an important indicator of anomalies within log sequences, previous transformer-based works utilize the standard positional encoding to inject the order of log events or templates in the sequence [11, 12, 21], aiming to detect anomalies associated with an incorrect execution order. However, we noticed that a commonly used replication implementation of a transformer-based method [5] in fact omits the positional encoding. To the best of our knowledge, no existing work has encoded temporal information based on the timestamps of logs for anomaly detection, and the effectiveness of utilizing sequential or temporal information in this task remains unclear.

In our proposed method, we incorporate sequential and temporal encoding into the transformer model and explore the importance of sequential and temporal information for anomaly detection. Specifically, our method has different variants utilizing the following sequential or temporal encoding techniques. The encoding is added to the log representation, which serves as the input to the transformer structure.


3.3.1 Relative Time Elapse Encoding (RTEE)

We propose RTEE, a temporal encoding method that simply substitutes the position index in positional encoding with the timing of each log event. We first calculate the elapsed time according to the timestamps of the log events in the log sequence. Instead of using a log event's sequence index as the position in the sinusoidal and cosinusoidal equations, we use its elapsed time relative to the first log event in the sequence. Table 1 shows an example of the time intervals in a log sequence: the sequence contains 7 events with a time span of 7 seconds, and the elapsed time from the first event to each event is used to calculate the temporal encoding for the corresponding event. Similar to positional encoding, the encoding is calculated with the above-mentioned sinusoidal equations (Equation 1) and is not updated during the training process.
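
A small sketch of RTEE, assuming the standard sinusoidal formulation from the original transformer with the position index replaced by elapsed time; the timestamps below are hypothetical, chosen to mirror the 7-event, 7-second example:

```python
import numpy as np

def rtee(timestamps: np.ndarray, d_model: int = 384) -> np.ndarray:
    """Sinusoidal encoding with elapsed time in place of the position index."""
    elapsed = timestamps - timestamps[0]        # time elapsed since the first event
    i = np.arange(d_model // 2)
    freq = 10000.0 ** (2 * i / d_model)         # same frequencies as positional encoding
    angles = elapsed[:, None] / freq[None, :]   # (seq_len, d_model / 2)
    enc = np.empty((len(timestamps), d_model))
    enc[:, 0::2] = np.sin(angles)               # even dimensions: sine
    enc[:, 1::2] = np.cos(angles)               # odd dimensions: cosine
    return enc                                  # fixed; not updated during training

ts = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 7.0])  # hypothetical timestamps
print(rtee(ts).shape)  # (7, 384)
```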


3.4 Model Structure

The transformer is a neural network architecture that relies on the self-attention mechanism to capture the relationships between elements in an input sequence. Transformer-based models and frameworks have been used for anomaly detection in many previous works [6, 11, 12, 21]. Inspired by these works, we use a transformer encoder-based model for anomaly detection. We design our approach to accept log sequences of varying lengths and to generate sequence-level representations. To achieve this, we employ special tokens in the input log sequence that allow the model to generate a sequence representation and to identify the padded tokens and the end of the log sequence, drawing inspiration from the design of the BERT model [31]. In the input log sequence, we use the following tokens: a [CLS]-like token is placed at the start of each sequence to allow the model to generate aggregated information for the entire sequence, an end-of-sequence token is added at the end of the sequence to signify its completion, a mask token is used to mark the masked tokens under the self-supervised training paradigm, and a padding token is used for padded positions. The embeddings for these special tokens are generated randomly based on the dimension of the log representation used. An example is shown in Figure 1; the elapsed time for the special tokens is set to -1. The log event-level representation and the positional or temporal embedding are summed to form the input feature of the transformer structure.
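
The structure described above can be approximated with the following PyTorch sketch. This is our reconstruction of the description rather than the authors' released code; the layer count, head count, and special-token handling are assumptions, and only the sequence-start token is shown for brevity:

```python
import torch
import torch.nn as nn

D_MODEL = 384  # matches the all-MiniLM-L6-v2 embedding dimension

class LogTransformer(nn.Module):
    def __init__(self, d_model=D_MODEL, nhead=8, num_layers=2):
        super().__init__()
        # Randomly generated embedding for the sequence-start ([CLS]-like)
        # token; the end, mask, and padding tokens would be handled likewise.
        self.cls_token = nn.Parameter(torch.randn(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, 1)  # sequence-level anomaly logit

    def forward(self, x, enc, pad_mask):
        # x:        (batch, seq_len, d_model) log template embeddings
        # enc:      (batch, seq_len + 1, d_model) positional or RTEE encoding
        # pad_mask: (batch, seq_len + 1) bool, True marks padded positions
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)      # prepend the sequence-start token
        h = self.encoder(x + enc, src_key_padding_mask=pad_mask)  # summed inputs
        return self.classifier(h[:, 0]).squeeze(-1)  # first token's output
```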

3.5 Supervised Binary Classification

Under this training objective, we utilize the output of the first token of the transformer model while ignoring the outputs of the other tokens. The output of the first token is designed to aggregate the information of the whole input log sequence, similar to the [CLS] token of the BERT model, which provides an aggregated representation of the token sequence. We therefore consider the output of this token as a sequence-level representation and train the model with a binary classification objective (i.e., Binary Cross-Entropy Loss) on this representation.
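
A sketch of one training step under this objective, reusing the LogTransformer sketch above with dummy tensors standing in for real batches:

```python
import torch
import torch.nn as nn

model = LogTransformer()                           # sketch from Section 3.4
criterion = nn.BCEWithLogitsLoss()                 # binary cross-entropy on logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy batch: 16 sequences of 32 events, 384-dimensional features.
x = torch.randn(16, 32, D_MODEL)                   # template embeddings
enc = torch.randn(16, 33, D_MODEL)                 # includes the prepended token
pad_mask = torch.zeros(16, 33, dtype=torch.bool)   # True would mark padding
labels = torch.randint(0, 2, (16,)).float()        # 1 = anomalous, 0 = normal

logits = model(x, enc, pad_mask)                   # only the first token's output is used
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```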


:::info Authors:

  1. Xingfang Wu
  2. Heng Li
  3. Foutse Khomh

:::

:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::
