OWASP Global AppSec EU 2026 Vienna: Full Schedule

10:30am CEST

Scanning Agentic AI Systems: Beyond Traditional LLM Red Teaming

Thursday June 25, 2026 10:30am - 11:15am CEST

As agentic AI systems evolve from simple LLM interfaces into autonomous and multi-agent workflows, traditional LLM-focused red teaming is no longer enough: the high autonomy of these systems calls for a detailed risk assessment. Unlike standalone LLMs with text input and output, agentic systems interact with tools, memory, external data, and other agents, creating many new attack surfaces. Attacks may be introduced through emails, tool descriptions, or environmental content, and their impact can go beyond model responses to affect system behavior and planning and to trigger harmful real-world actions.

In this talk, we share our hands-on journey building a comprehensive red teaming scanning solution tailored for agentic AI systems. We begin by analyzing why current scanning tools fall short, specifically their emphasis on structured components (e.g., protocols like MCP, A2A, and Skills) while overlooking the unstructured and highly dynamic attack vectors where most real-world risks emerge. We then walk through the technical challenges of simulating realistic attacks without harming production environments, handling the diversity of agent architectures, frameworks, and levels of agency, and designing scanners that generalize across heterogeneous systems.

We present a practical, holistic scanning pipeline that includes sandboxing and emulation strategies, automated system discovery pipelines, abstraction-based scanning mechanisms, and a risk-aware robustness scoring framework that goes beyond binary attack success. Throughout the talk, we highlight concrete lessons learned, trade-offs between cost and reliability, and real examples of agent-specific vulnerabilities.

We conclude with a concrete end-to-end scanning workflow and discuss open challenges such as adaptive scanner generation and black-box agent discovery. Attendees will leave with a deep understanding of why agentic AI requires fundamentally new red teaming methodologies and with actionable techniques for securing real-world autonomous AI systems.

Speakers

Roman Vainshtein

Research Director, GenAI Trust, Fujitsu Research of Europe

I am Research Director of the Generative AI Trust and Security Research team at Fujitsu Research of Europe, where I lead efforts to enhance the security, trustworthiness, and resilience of Generative AI systems. My work focuses on bridging the gap between AI security, red-teaming...

Amit Giloni

Principal Researcher, GenAI Trust team, Fujitsu Research

Dr. Amit Giloni is a Principal Researcher at Fujitsu Research of Europe, where she is part of the GenAI Trust team. Her research spans multiple areas of machine learning, including classical ML, deep learning, generative AI, and agentic AI. She focuses on key challenges in trustworthy...

Roy Betser

Senior Researcher, GenAI Trust team, Fujitsu Research

Roy Betser is a PhD candidate at the Technion and a senior AI security researcher at Fujitsu Research of Europe, where he is part of the GenAI Trust team. His research focuses on analyzing representation and embedding spaces in foundation models and on developing practical trust and...

Hall G2 (Level -2)

Testing

Audience Advanced

11:30am CEST

Developing Effective Security Testing Skills with Objective Structured Assessments

Thursday June 25, 2026 11:30am - 12:15pm CEST

Hall G2 (Level -2)

Technical skill development and evaluation for application (software) security testers remains underdeveloped. There is no widely adopted framework defining core competencies, proficiency levels, or objective assessment criteria. In the absence of such standards, the industry has defaulted to a fragmented ecosystem of private organizations offering training and certifications that insufficiently prepare the next generation of security testers for real-world testing.

This environment disproportionately rewards those who benefit from exceptional mentorship or possess the time, resources, and aptitude for intensive self-directed learning. The popular mantra “Try Harder” reflects this culture of self-made expertise, but it also serves as a substitute for formalized training models. Further, aspiring security professionals are left to

In contrast, more mature, life-critical disciplines that demand high levels of technical skill (such as aviation and surgery) are built upon standardized curricula, clearly defined skill progressions, and objective methods for evaluating competence. This is not by chance; over many decades, these (and related) fields have refined how they achieve optimal outcomes through evidence-based training programs and practices.

In this talk, we will examine the past, present, and prospective future of application security tester training in comparison to more mature professions that demand a high level of technical skill. We will introduce a novel framework for evaluating technical skills and demonstrate its application in combination with a comprehensive AppSec curriculum. Both the assessment framework and the curriculum will be released to the open-source community at the time of presentation.

Speakers

Ryan Armstrong

AppSec Manager, Tester, and Teacher, Digital Boundary Group (DBG)

Ryan Armstrong is the Manager of Application Security Services at Digital Boundary Group (DBG). Ryan began with DBG as an application penetration tester and security consultant following completion of his PhD in Biomedical Engineering at Western University in 2016. With a passion...

Testing

Audience Intermediate

2:15pm CEST

This Build can Break You - Evil Runners and eBPF for Detection

Thursday June 25, 2026 2:15pm - 3:00pm CEST

Hall G2 (Level -2)

CI/CD pipelines play an important role in modern software development. From a security perspective, this methodology contributes to more secure products, as automated checks can be applied on every run. Developers define tasks in a metadata file, and the system executes the defined jobs automatically. But what if the build chain itself becomes the security problem, allowing attackers to manipulate artifacts or take control of backend infrastructure? Let’s take a deep dive into “Poisoned Pipeline Execution” (OWASP CICD-SEC-4).

Builds are typically carried out in multiple steps using Runners—agents that pick up jobs and execute build instructions. These instructions, such as compiling a program or building a container image, are usually performed inside containers. Containers may provide isolation, but the effectiveness in terms of security strongly depends on the Runner’s configuration. Attackers can abuse Runners to execute arbitrary commands, leading to information disclosure or privilege escalation. While such attacks are well documented, effective detection mechanisms are often lacking.

Any viable detection method must be independent of the source code, language-agnostic, and container-friendly. The eBPF technology, which enables tracing of kernel-level activity, is well suited for this purpose. In this talk, we explore security vulnerabilities in CI Runners, how they become targets for attackers, and how malicious activities can be detected using eBPF.

Speakers

Reinhard Kugler

Principal Security Consultant, SBA Research

Reinhard focuses on security testing of IT and industrial cyber-physical systems. Based on his prior experience in cyber defense, he works with companies to develop security capabilities and secure products. Reinhard is an experienced instructor and develops tailored security...

Testing

Audience Intermediate

3:30pm CEST

Boiling the Ocean for Signal: Lessons from High-Volume OSS Malware Detection

Thursday June 25, 2026 3:30pm - 4:15pm CEST

Hall G2 (Level -2)

Malicious open source packages are on the rise, targeting more and more ecosystems. And while open source maintainers and users struggle to secure the immense attack surface of today’s software development practice, attackers continue to evolve their techniques.

This talk presents lessons learned from developing and operating an end-to-end malware detection pipeline in an enterprise setup that automatically scans tens of thousands of packages a day, followed by human review of reported malware. It provides an overview of the fundamental design decisions, from a suitable classification scheme and the selection of meaningful signals with a high signal-to-noise ratio, to the compilation of Indicators of Compromise and the final reporting of confirmed malicious packages to the respective registries and third-party databases like OSV. The individual sections and learnings will be motivated and illustrated through real-world samples as well as descriptive statistics obtained from our system.

Session attendees will learn about:
- latest open source malware trends,
- common evasion techniques used by attackers, from encoding techniques, code transformations and payload splitting to prompt instructions aiming to sabotage LLM-based detectors,
- the shortcomings of current malware datasets in regard to supporting developers in the evaluation of malware scanners, e.g., the lack of accompanying metadata and qualitative descriptions,
- the importance and complementarity of code and metadata-based detection signals,
- requirements and design decisions for an end-to-end OSS malware scanner, e.g., the realization that a binary benign/malicious classification is not nuanced enough for the breadth of software distributed through OSS registries like npm or PyPI, and
- descriptive statistics obtained from our system, showing how common different techniques are in the wild, e.g., the prevalence of different malware triggers and targeted platforms.

As such, the presentation targets both open source users interested in the latest malware trends and safeguards and builders wanting to create an end-to-end OSS scan pipeline, e.g., because their ecosystem is already targeted by attackers but not yet, or not sufficiently, covered by state-of-the-art scanners.

Speakers

Henrik Plate

Security Researcher, Endor Labs

In his current position, Henrik aims at improving the security of today’s software supply chains, and in particular the secure consumption of open source. He formerly worked for SAP Security Research, where he led the focus topic "open source security" starting in 2014. He co-authored...

Testing

Audience Intermediate
