The study evaluates a machine-learning framework for predicting vulnerable code changes, showing that Random Forest delivers the highest accuracy, robust performance across reduced feature sets, and significantly stronger precision and recall during real-world online deployment using six years of AOSP data.

New Study Shows Random Forest Models Can Spot 80% of Vulnerabilities Before Code Merge

2025/11/19 17:00

ABSTRACT

I. INTRODUCTION

II. BACKGROUND

III. DESIGN

  • DEFINITIONS
  • DESIGN GOALS
  • FRAMEWORK
  • EXTENSIONS

IV. MODELING

  • CLASSIFIERS
  • FEATURES

V. DATA COLLECTION

VI. CHARACTERIZATION

  • VULNERABILITY FIXING LATENCY
  • ANALYSIS OF VULNERABILITY FIXING CHANGES
  • ANALYSIS OF VULNERABILITY-INDUCING CHANGES

VII. RESULT

  • N-FOLD VALIDATION
  • EVALUATION USING ONLINE DEPLOYMENT MODE

VIII. DISCUSSION

  • IMPLICATIONS ON MULTI-PROJECTS
  • IMPLICATIONS ON ANDROID SECURITY WORKS
  • THREATS TO VALIDITY
  • ALTERNATIVE APPROACHES

IX. RELATED WORK

CONCLUSION AND REFERENCES


VII. RESULT

This section presents a comprehensive evaluation of the accuracy of our framework across both the training and inference phases, reflecting real-world performance.

A. N-FOLD VALIDATION

We first identify the optimal classifier type, followed by the feature dataset reduction.

==Classifier Selection.== To select the most accurate classifier type, all six types of classifiers are evaluated using the complete set of devised feature data. The training dataset incorporates information about all known ViCs. The evaluation employs the Weka v1.8 [38] toolkit with the default parameter configurations for each classifier, ensuring a fair comparison of their inherent performance.

Table IV shows the 12-fold validation result. The Random Forest classifier demonstrates the highest classification accuracy among the six types tested. It achieves ~60% recall for ViCs with 85% precision, while misclassifying only 3.9% of LNCs (calculated as 1–0.992×0.969). Based on this evaluation result, the rest of this study uses Random Forest.
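
As a hedged illustration of this kind of classifier comparison, the sketch below mirrors the shape of the experiment in scikit-learn rather than the Weka toolkit the paper actually uses; the synthetic data, the ~7% positive-class ratio, and the default hyperparameters are assumptions (and C4.5 has no direct scikit-learn counterpart), so the printed numbers will not match Table IV.

```python
# Illustrative sketch only: the paper uses Weka with default settings; this
# mirrors the comparison in scikit-learn on synthetic, imbalanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the ViC/LNC feature matrix (~7% positives),
# roughly mirroring the class imbalance described in the paper.
X, y = make_classification(n_samples=8000, n_features=30, weights=[0.93, 0.07],
                           random_state=0)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(probability=True, random_state=0),
}

cv = StratifiedKFold(n_splits=12, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    # Out-of-fold probability estimates for the positive (ViC) class.
    proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
    pred = (proba >= 0.5).astype(int)
    print(f"{name:>20}: ViC recall={recall_score(y, pred):.3f}, "
          f"ViC precision={precision_score(y, pred, zero_division=0):.3f}, "
          f"ROC area={roc_auc_score(y, proba):.3f}")
```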

The superior performance of Random Forest over the Decision Tree classifier is expected, as shown by the relative operating characteristic (ROC) curve area of 0.955 vs. 0.786. The Quinlan C4.5 classifier likewise shows notably lower precision and recall than Random Forest when classifying ViCs.

The logistic regression classifier exhibits the second-best performance in terms of ROC area, mainly thanks to its relatively high precision (0.768) for ViCs. However, its recall for ViCs (0.414) is significantly lower than that of the Decision Tree and Quinlan C4.5 classifiers. Similarly, the naïve Bayes classifier underperforms the logistic regression classifier across all three metrics (recall, precision, and ROC area).

Finally, the SVM classifier demonstrates the highest recall and good precision for LNCs, indicating that the model is over-fitted to the LNC samples. This is confirmed by its poor recall for ViCs. The over-fitting is likely caused by the imbalanced training dataset, where the LNC samples significantly outnumber the ViC samples. SVM performance generally benefits from a balanced ratio of positive and negative examples (e.g., 1:1), which is particularly difficult to achieve in vulnerability classification tasks.
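
One common mitigation for this kind of imbalance, not evaluated in the paper, is to re-weight the classes so that the rare ViC samples contribute proportionally more to the SVM loss; a minimal sketch, assuming the same synthetic setup as above:

```python
# Sketch of class re-weighting for an imbalanced SVM (not from the paper):
# class_weight="balanced" scales each class weight by n_samples / (2 * n_class).
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=4000, n_features=30, weights=[0.93, 0.07],
                           random_state=0)

svm = SVC(class_weight="balanced", random_state=0)
pred = cross_val_predict(
    svm, X, y, cv=StratifiedKFold(n_splits=12, shuffle=True, random_state=0))

# Per-class precision/recall; ViC recall typically improves at some cost
# to ViC precision compared to an unweighted SVM.
print(classification_report(y, pred, target_names=["LNC", "ViC"]))
```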

==Feature Reduction.== Let us evaluate the performance of the Random Forest classifier using various subsets of the devised feature data types. The goal is to identify a highly effective feature subset that maintains high accuracy while requiring less data collection during inference than the full feature datasets.

Table V presents the evaluation results. As expected, the first row, using all six feature sets (VH, CC, RP, TM, HH, and PT), represents the best case. Removing the HH (Human History), PT (Process Tracking), or TM (Text Mining) feature set individually leads to minor reductions in recall (0.4–0.9%) and precision (0.6–2.8%) for classifying ViCs. Practically, this translates to ~5 misclassified ViCs out of the 585 ViCs and ~14 misclassified LNCs out of the 7,453 LNCs. The ROC area remains largely consistent across those three variations (0.954–0.957 for the 2nd, 3rd, and 4th rows), compared to the baseline of 0.955 (the 1st row in Table V).
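
The ablation itself is mechanically simple: retrain and re-validate the same classifier with one feature group dropped at a time. The sketch below illustrates this with hypothetical column names standing in for the paper's VH/CC/RP/TM/HH/PT feature data types and a synthetic feature table; it is not the actual feature extraction.

```python
# Hedged sketch of the feature-group ablation; column names are hypothetical
# placeholders for the paper's VH/CC/RP/TM/HH/PT feature data types.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

FEATURE_GROUPS = {
    "VH": ["vh_temporal_avg", "vh_spatial", "vh_churn"],
    "CC": ["cc_add", "cc_delete", "cc_revision"],
    "RP": ["rp_time", "rp_weekday", "rp_reviewers"],
    "TM": ["tm_token_risk"],
    "HH": ["hh_author_vic_count"],
    "PT": ["pt_bug_linked"],
}

# Synthetic feature table with a weak learnable signal, purely for illustration.
rng = np.random.default_rng(0)
n = 8000
df = pd.DataFrame({c: rng.normal(size=n)
                   for cols in FEATURE_GROUPS.values() for c in cols})
y = ((df["vh_temporal_avg"] + df["cc_add"] + rng.normal(size=n)) > 2.5).astype(int)

def evaluate(groups):
    """12-fold validation of a Random Forest restricted to the given groups."""
    cols = [c for g in groups for c in FEATURE_GROUPS[g]]
    cv = StratifiedKFold(n_splits=12, shuffle=True, random_state=0)
    pred = cross_val_predict(RandomForestClassifier(random_state=0),
                             df[cols], y, cv=cv)
    return recall_score(y, pred), precision_score(y, pred, zero_division=0)

for dropped in ["HH", "PT", "TM"]:
    kept = [g for g in FEATURE_GROUPS if g != dropped]
    r, p = evaluate(kept)
    print(f"without {dropped}: ViC recall={r:.3f}, ViC precision={p:.3f}")
```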

Let us further investigate the accuracy achieved after removing both the HH (Human History) and PT (Process Tracking) feature sets, followed by the removal of all three (HH, PT, and TM). The results show that the VH (Vulnerability History), CC, and RP (Review Pattern) feature sets still provide high accuracy, exhibiting only a 0.3% reduction in LNC recall and a 4.5% reduction in ViC precision compared to when all features are used. The following discusses each of the three remaining feature sets in more detail:

The VH (Vulnerability History) feature set aligns with the known factors used in buggy component prediction (e.g., temporal, spatial, and churn localities). The results in this study demonstrate that those three types of localities remain relevant and effective for predicting vulnerabilities at the code change level. Among the six VH feature data types, VHtemporal_avg is the most impactful. This is confirmed by the fact that none of the other five VH feature data types alone could correctly classify a single ViC in isolation during the 12-fold validation experiment.

The CC (Change Complexity) feature set aligns with the established principle that complexity often leads to software defects, a relationship repeatedly observed when analyzing defect ratios of software components (e.g., files or modules). The data in this study further confirms that more complex code changes are indeed more likely to introduce vulnerabilities. Our VP framework thus signals software engineers to pay extra attention by selectively flagging a subset of code changes as higher risk (e.g., using predicted chances of vulnerabilities). This helps identify and fix potential coding errors before those code changes are merged into a source code repository.
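
In code, that flagging step can be as simple as thresholding the model's predicted vulnerability probability for each pending change. The helper below is hypothetical, and the 0.5 cut-off is an assumption rather than a value prescribed by the paper.

```python
# Hypothetical helper: flag pending code changes whose predicted ViC
# probability meets a threshold, so reviewers can prioritize them pre-merge.
from typing import Iterable, List, Tuple

from sklearn.ensemble import RandomForestClassifier


def flag_risky_changes(model: RandomForestClassifier,
                       feature_rows,                 # 2-D array of feature values
                       change_ids: Iterable[str],
                       threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (change_id, probability) pairs at or above the risk threshold."""
    probs = model.predict_proba(feature_rows)[:, 1]   # P(change is a ViC)
    return [(cid, float(p))
            for cid, p in zip(change_ids, probs)
            if p >= threshold]
```

A caller would first fit the model on historical ViC/LNC data, then pass the feature rows of not-yet-merged changes; the flagged change IDs can feed the extra-attention review workflow described above.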

The data confirms the importance of the novel RP (Review Pattern) feature set in the VP framework. Complex code changes are likely to contain software faults, placing a burden on code reviewers to detect coding errors and guide authors toward fixes. While the RP feature set alone does not provide the highest accuracy for ViCs (e.g., 59.5% precision), combining it with the CC (Change Complexity) feature set significantly boosts the precision for ViCs (e.g., 88.6%).

The pairing helps identify situations such as complex code changes lacking rigorous review before submission, or authors self-approving complex changes without any explicit peer code review recorded. However, the RP and CC feature sets do not offer high ViC recall (e.g., 31.8%) as the pair targets specific code change characteristics. Many other factors contribute to ViCs slipping through code reviews and other pre-submit testing.

Further removing RP (Review Pattern) from the VH, CC, and RP sets significantly reduces accuracy (i.e., 80.5% precision for ViCs drops to 59%). Interestingly, even when both RP and CC (Change Complexity) are removed, the VH (Vulnerability History) feature set alone still provides higher accuracy than the VH and CC sets combined (e.g., a ROC area of 91.8% vs. 80.2%). This is partly because VH leverages the N-fold validation setting (i.e., learning from future ViCs to predict past ViCs). The next subsection (VII.B) addresses this using online inference and demonstrates a general counterexample applicable to all feature data types.

==Potential as a Global Model.== To enable immediate deployment of a VP model across multiple projects, this study also investigates which feature data types are likely to be target-project agnostic. Among the six feature sets, four (CC, RP, TM, and PT) are potentially not project-specific. In contrast, the HH (Human History) and VH (Vulnerability History) feature sets focus on vulnerability statistics tied to specific engineers and software modules, respectively. This suggests that models trained using those two feature sets would not be directly transferable to other projects with different engineers and software modules.

We explore the possibility of a global VP model through another 12-fold validation study. Because one may argue that TM (Text Mining) could be programming language-specific, the accuracy of a global model is evaluated both without and with TM to assess its impact. Table VI shows that using only the CC, RP (Review Pattern), and PT (Process Tracking) feature sets yields relatively low ViC recall (32%) but notably high ViC precision (~90%). The precision increases further when TM is used together with CC, RP, and PT (~95%). The result is promising, as those feature sets could potentially be used across multiple projects due to their ability to minimize false positives.

Let us further reduce the feature sets by considering individual features. Using only the five features listed in the last row of Table VI (CCadd, CCrevision, CCrelative_revision, RPtime, and RPweekday), the VP model achieves a ROC area of 0.786 while maintaining a high ViC precision of 73.4%. This comes at the cost of a notable reduction in recall (i.e., to ~32% from ~60% when all feature sets are used). However, we argue that the penalty is minimal, as evidenced by the still-high LNC precision of 94.9%. Importantly, our approach remains significantly better than not using the VP framework at all, since it retains a ViC recall of 32.1%.

In our N-fold cross-validation, the recall for ViCs is not notably high. Yet it confirms the extra security coverage that can be provided by the VP framework without having to conduct extensive security testing. The relatively low recall is likely due to the validation process not fully capturing the inherent temporal relationships, dependencies, and patterns within the feature data. For instance, N-fold validation can reorder a ViC and its corresponding VfC such that the VfC precedes the ViC, violating the natural order. Consequently, we evaluate the VP framework using its online deployment mode to better reflect real-world scenarios.

B. EVALUATION USING ONLINE DEPLOYMENT MODE

This subsection evaluates the VP framework under its production deployment settings (namely, online deployment mode) using about six years of AOSP vulnerability data.

To achieve maximum accuracy, this experiment employs the Random Forest classifier and leverages all devised feature data types. The evaluation data originates from the AOSP frameworks/av project. Each month, the VP framework assesses all code changes submitted in that month using the latest model, trained on data available before that month begins. For this evaluation, it is assumed that a ViC is known if and only if it is merged. However, a more realistic scenario considers a ViC known if its corresponding VfC is merged. The assumption highlights the need for thorough security testing (e.g., fuzzing) to identify ViCs within an average of half a month after they are merged. Thus, existing security testing techniques are crucial to fully realize the potential of the VP framework.
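
A minimal sketch of that monthly retrain-and-score loop, assuming a pandas DataFrame of merged changes with hypothetical `merge_time` and `is_vic` columns (the real pipeline, features, and the ViC-known-when-merged assumption are as described above):

```python
# Hedged sketch of the online deployment loop: each month, train on all
# changes merged before that month and score the changes merged within it.
# Column names (merge_time, is_vic) and the DataFrame layout are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score


def monthly_online_eval(changes: pd.DataFrame, feature_cols,
                        label_col="is_vic", time_col="merge_time"):
    months = changes[time_col].dt.to_period("M")
    results = []
    for month in sorted(months.unique())[1:]:      # need some history to train on
        train = changes[months < month]
        test = changes[months == month]
        if train[label_col].nunique() < 2 or test.empty:
            continue                               # skip months lacking both classes
        model = RandomForestClassifier(random_state=0)
        model.fit(train[feature_cols], train[label_col])
        pred = model.predict(test[feature_cols])
        results.append({
            "month": str(month),
            "vic_recall": recall_score(test[label_col], pred, zero_division=0),
            "vic_precision": precision_score(test[label_col], pred, zero_division=0),
        })
    return pd.DataFrame(results)
```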

Figure 7 presents the evaluation results of the online deployment mode. For ViCs, the framework demonstrates an average recall of 79.7% and an average precision of 98.2%. For LNCs, it achieves an average recall of 99.8% and an average precision of 98.5%. These results indicate that the online VP framework can identify ~80% of ViCs with ~98% accuracy, while only misdiagnosing ~1.7% of LNCs (assuming no hidden vulnerabilities within LNCs) at presubmit time before code changes are merged. The actual misdiagnosis rate is likely lower than 1.7% due to the potential future discovery of vulnerabilities within LNCs. Similarly, the exact ViC accuracy metric values can change depending on the classification of newly discovered ViCs in the future. The direction and magnitude of such metric value changes depend on how those new ViCs were previously classified. Overall, these promising results warrant further investigation for industry and open source community deployments.

Significant variations exist in the ViC recall values (i.e., a standard deviation of 0.249). While one might assume low recall in months with few ViCs (e.g., <5), the sample correlation coefficient analysis shows no significant link (−0.068) between the ViC recall and count (captured in Figure 8). In contrast, the LNC recall and precision values show less variation. Of those two, the precision exhibits slightly wider variation than the recall (i.e., a standard deviation of 0.017). This likely stems from the abundance of LNCs each month and the high LNC precision of the VP framework (as shown in Subsection VII.A).
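
For reference, the reported coefficient is an ordinary Pearson sample correlation between the monthly ViC count and the monthly ViC recall; a tiny sketch with placeholder numbers (not the data behind Figure 8):

```python
# Pearson sample correlation between monthly ViC count and ViC recall.
# The values below are placeholders, not the paper's measurements.
import numpy as np

monthly_vic_count = np.array([2, 5, 1, 8, 3, 12, 4, 6])
monthly_vic_recall = np.array([0.50, 1.00, 0.00, 0.90, 0.66, 0.80, 1.00, 0.70])

r = np.corrcoef(monthly_vic_count, monthly_vic_recall)[0, 1]
print(f"sample correlation coefficient: {r:+.3f}")
```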

The online mode demonstrates notably higher accuracy than the 12-fold validation using the same feature data and classifier. This is likely because the online mode gives greater weight to recent history within its learning model, effectively leveraging the strong temporal correlations found in certain feature values. For example, a file in a ViC is likely to contain another ViC in the near future if the same software engineers continue working on the file (e.g., as author and reviewers) and are performing similar tasks (e.g., as part of a workstream to develop a new feature). The 12-fold validation, with its shuffled training and test data, does not fully capture such temporal causality. Consequently, the online mode results provide a more realistic assessment of the VP framework accuracy than the 12-fold validation ones.

Figure 8 reveals that an average of 7.4% of reviewed and merged code changes are classified as ViCs. The framework flags an average of 6.875 LNCs per month for additional security review. This manageable volume (<2 code changes per week) represents an acceptable review cost, especially considering the large number of full-time software engineers working on the target project.

:::info Author:

  1. Keun Soo Yim

:::

:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::

