Loading…
Type: Testing clear filter
Thursday, June 25
 

10:30am CEST

Scanning Agentic AI Systems: Beyond Traditional LLM Red Teaming
Thursday June 25, 2026 10:30am - 11:15am CEST
As agentic AI systems evolve from simple LLM interfaces into autonomous and multi-agent workflows. Given the high autonomy of agentic AI systems, there is a growing need to perform a detailed risk assessment, which means traditional LLM-focused red teaming is no longer enough. Unlike standalone LLMs with text input and output, agentic systems interact with tools, memory, external data, and other agents, creating many new attack surfaces. Attacks may be introduced through emails, tool descriptions, or environmental content, and their impact can go beyond model responses to affect system behavior, planning, and perform harmful real-world actions.

In this talk, we share our hands-on journey building a comprehensive red teaming scanning solution tailored for agentic AI systems. We begin by analyzing why current scanning tools fall short, specifically their emphasis on structured components (e.g., protocols like MCP, A2A, and Skills) while overlooking unstructured and highly dynamic attack vectors where most real-world risks emerge. We then walk through the technical challenges of simulating realistic attacks without harming production environments, handling the diversity of agent architectures, frameworks, and agency-levels, and designing scanners that generalize across heterogeneous systems.

We present a practical full scanning pipeline that creates a novel holistic solution, including sandboxing and emulation strategies, automated system discovery pipelines, abstraction-based scanning mechanisms, and a risk-aware robustness scoring framework that goes beyond binary attack success. Throughout the talk, we highlight concrete lessons learned, trade-offs between cost and reliability, and real examples of agent-specific vulnerabilities.
We conclude with a concrete end-to-end scanning workflow and discuss open challenges such as adaptive scanner generation and black-box agent discovery. Attendees will leave with a deep understanding of why agentic AI requires fundamentally new red teaming methodologies and with actionable techniques for securing real-world autonomous AI systems.
Speakers
avatar for Roman Vainshtein

Roman Vainshtein

Research Director, GenAI Trust, Fujitsu Research of Europe

I am Research Director of the Generative AI Trust and Security Research team at Fujitsu Research of Europe, where I lead efforts to enhance the security, trustworthiness, and resilience of Generative AI systems. My work focuses on bridging the gap between AI security, red-teaming... Read More →
avatar for Amit Giloni

Amit Giloni

Principal Researcher, GenAI Trust team, Fujitsu Research
Dr. Amit Giloni is a Principal Researcher at Fujitsu Research of Europe, where she is part of the GenAI Trust team.Her research spans multiple areas of machine learning, including classical ML, deep learning, generative AI, and agentic AI. She focuses on key challenges in trustworthy... Read More →
avatar for Roy Betser

Roy Betser

Senior Researcher, GenAI Trust team, Fujitsu Research

Roy Betser is a PhD candidate int he Technion and an AI security senior researcher in Fujitsu Research of Europe, where heis part of the GenAI Trust team. His research focuses on analyzing representation and embedding spaces in foundation models and on developing practical trust and... Read More →
Thursday June 25, 2026 10:30am - 11:15am CEST
Hall G2 (Level -2)
  Testing

11:30am CEST

Developing Effective Security Testing Skills with Objective Structured Assessments
Thursday June 25, 2026 11:30am - 12:15pm CEST
Technical skill development and evaluation for application (software) security testers remains underdeveloped. There is no widely adopted framework defining core competencies, proficiency levels, or objective assessment criteria. In the absence of such standards, the industry has defaulted to a fragmented ecosystem of private organizations offering training and certifications that insufficiently prepare the next generation of security testers for real-world testing.

This environment disproportionately rewards those who benefit from exceptional mentorship or possess the time, resources, and aptitude for intensive self-directed learning. The popular mantra “Try Harder” reflects this culture of self-made expertise, but it also serves as a substitute for formalized training models. Further, aspiring security professionals are left to

In contrast, more mature, life-critical disciplines that demand high levels of technical skill (such as aviation and surgery) are built upon standardized curricula, clearly defined skill progressions, and objective methods for evaluating competence. This is not by chance; over many decades, these (and related) fields have honed in how to achieve optimal outcomes through evidence-based training programs and practices.

In this talk, we will examine the past, present, and prospective future of application security tester training in comparison to more mature professions that demand a high level of technical skill. We will introduce a novel framework for evaluating technical skills and demonstrate its application in combination with a comprehensive AppSec curriculum. Both the assessment framework and the curriculum will be released to the open-source community at the time of presentation.
Speakers
avatar for Ryan Armstrong

Ryan Armstrong

AppSec Manager, Tester, and Teacher, Digital Boundary Group (DBG)
Ryan Armstrong is the Manager of Application Security Services at Digital Boundary Group (DBG). Ryan began with DBG as an application penetration tester and security consultant following completion of his PhD in Biomedical Engineering at Western University in 2016. With a passion... Read More →
Thursday June 25, 2026 11:30am - 12:15pm CEST
Hall G2 (Level -2)

2:15pm CEST

This Build can Break You - Evil Runners and eBPF for Detection
Thursday June 25, 2026 2:15pm - 3:00pm CEST
CI/CD pipelines play an important role in modern software development. From a security perspective, this methodology contributes to more secure products, as automated checks can be applied on every run. Developers define tasks in a metadata file, and the system executes the defined jobs automatically. But what if the build chain itself becomes the security problem, allowing attackers to manipulate artifacts or take control of backend infrastructure? Let’s take a deep dive into “Poisoned Pipeline Execution” (OWASP CICD-SEC-4).

Builds are typically carried out in multiple steps using Runners—agents that pick up jobs and execute build instructions. These instructions, such as compiling a program or building a container image, are usually performed inside containers. Containers may provide isolation, but the effectiveness in terms of security strongly depends on the Runner’s configuration. Attackers can abuse Runners to execute arbitrary commands, leading to information disclosure or privilege escalation. While such attacks are well documented, effective detection mechanisms are often lacking.

Any viable detection method must be independent of the source code, language-agnostic, and container-friendly. The eBPF technology, which enables tracing of kernel-level activity, is well suited for this purpose. In this talk, we explore security vulnerabilities in CI Runners, how they become targets for attackers, and how malicious activities can be detected using eBPF.
Speakers
avatar for Reinhard Kugler

Reinhard Kugler

Principal Security Consultant, SBA Research

Reinhard’s focus relies on security testing of IT and industrial cyber-physical systems. Based on his prior experience in cyber defense, he works with companies to develop security capabilities and secure products. Reinhard is an experienced instructor and develops tailored security... Read More →
Thursday June 25, 2026 2:15pm - 3:00pm CEST
Hall G2 (Level -2)

3:30pm CEST

Boiling the Ocean for Signal: Lessons from High-Volume OSS Malware Detection
Thursday June 25, 2026 3:30pm - 4:15pm CEST
Malicious open source packages are on the rise, targeting more and more ecosystems. And while open source maintainers and users struggle to secure the immense attack surface of today’s software development practice, attackers continue to evolve their techniques.

This talk presents lessons learned from developing and operating an end-to-end malware detection pipeline in an enterprise setup that automatically scans tens of thousands packages a day, and is followed by human review of reported malware. It provides an overview about and fundamental design decisions, starting from a suitable classification scheme and the selection of meaningful signals with a low signal-to-noise ratio, to the compilation of Indicators of Compromise and the final reporting of confirmed malicious packages to the respective registries and third-party databases like OSV. The individual sections and learnings will be motivated and illustrated through real-world samples as well as descriptive statistics obtained from our system.

Session attendees will learn about:
- Latest open source malware trends,
- common evasion techniques used by attackers, from encoding techniques, code transformations and payload splitting to prompt instructions aiming to sabotage LLM-based detectors,
- the shortcomings of current malware datasets in regard to supporting developers in the evaluation of malware scanners, e.g., the lack of accompanying metadata and qualitative descriptions,
- the importance and complementarity of code and metadata-based detection signals,
- requirements and design decisions for an end-to-end OSS malware scanner, e.g., the realization that a binary classification benign/malicious is not colorful enough for the breadth of software distributed through OSS registries like npm or PyPI, and
- descriptive statistics obtained from our system, showing the prevalence of techniques used in the wild, e.g., the prevalence of different malware triggers and targeted platforms.

As such, the presentation targets both open source users interested in the latest malware trends and safeguards, as well as builders wanting to create an end-to-end OSS scan pipeline, e.g., because their ecosystem is already targeted by attackers but not yet or not sufficiently covered by state-of-the-art scanners.
Speakers
avatar for Henrik Plate

Henrik Plate

Security Researcher, Endor Labs

In his current position, Henrik aims at improving the security of today’s software supply chains, and in particular the secure consumption of open source. He formerly worked for SAP Security Research, where he led the focus topic "open source security" starting in 2014. He co-authored... Read More →
Thursday June 25, 2026 3:30pm - 4:15pm CEST
Hall G2 (Level -2)
 
Friday, June 26
 

10:30am CEST

OWASP AI Security Verification Standard (AISVS)
Friday June 26, 2026 10:30am - 11:15am CEST
AI systems face threats that traditional application security standards weren't built to address. This includes prompt injection, training data poisoning, model extraction, agentic autonomy risks, and more. The OWASP AI Security Verification Standard (AISVS) provides 400+ testable requirements across 14 chapters, covering everything from input validation and model lifecycle management to MCP protocol security and autonomous agent controls. This lightning talk introduces the standard's structure, its three verification levels, and how security teams can use it today to assess and harden AI-powered applications. We'll show where AISVS fits alongside existing frameworks like ASVS, NIST AI RMF, and ISO 42001 and where it deliberately doesn't overlap.
Speakers
avatar for Jim Manico

Jim Manico

Founder and CEO, Manicode Security
Jim Manico is the founder of Manicode Security, where he specializes in training software developers on secure coding and security engineering. He is actively involved in multiple ventures, serving as an investor/advisor for companies like 10Security, MergeBase, Nucleus Security... Read More →
avatar for Rico Komenda

Rico Komenda

Senior Product Security Engineer
Rico is a senior product security engineer. His main security areas are in application security, cloud security, offensive security and AI security.For him, general security intelligence in various aspects is a top priority. Today’s security world is constantly changing and you... Read More →
avatar for Otto Sulin

Otto Sulin

Head of Security, Supermetrics


avatar for Russ Memisyazici

Russ Memisyazici

Aras “Russ” Memişyazıcı, M.Sc. is a senior technology and architecture leader specializing in AI security, cloud transformation, application security, and enterprise modernization. He currently serves as a Global Head of Reference Architecture at Aon, where his work focuses... Read More →
avatar for Raza Sharif

Raza Sharif

Raza is the founder of CyberSecAI, a UK-based cyber security consultancy, and co-lead of the OWASP AISVS project. With 20+ years securing critical infrastructure for the private sector, governments and financial institutions, he focuses on AI security architecture and securing the... Read More →
Friday June 26, 2026 10:30am - 11:15am CEST
Hall G2 (Level -2)

11:30am CEST

Effort is All You Need: Testing LLM Applications in the Real World
Friday June 26, 2026 11:30am - 12:15pm CEST
Security testing of GenAI systems is often reduced to "LLM red teaming": probing a model in isolation to see what unsafe/offensive content it will generate. In practice, this approach falls short. As security practitioners, we need to assess complete LLM application use cases, focusing on how inputs and outputs propagate through application logic and enable concrete security risks such as data exfiltration, cross-site scripting, and authorization bypass.

In this talk, we share practical experience and supporting open-source tooling we developed for assessing LLM applications. These focus on testing systems where the LLM is embedded in application logic rather than exposed as a simple inference endpoint.

It covers approaches for testing non-conversational GenAI workflows, WebSockets, and custom APIs; building scoped prompt injection datasets aligned with application logic and engagement constraints; applying effort-based jailbreak techniques (e.g. anti-spotlighting, best-of-n, crescendo, ...) to evaluate guardrail robustness and demonstrate practical bypasses; and conducting meaningful testing in isolated or air-gapped environments.

Speakers
avatar for Donato Capitella

Donato Capitella

Principal Security Consultant, Reversec

Donato Capitella is a Software Engineer and Principal Security Consultant at Reversec, with over 15 years of experience in offensive security and software engineering. Donato spent the past 3 years conducting research and assessments on Generative AI applications, covering topics... Read More →
avatar for Thomas Cross

Thomas Cross

Security Consultant, Reversec

Friday June 26, 2026 11:30am - 12:15pm CEST
Hall G2 (Level -2)

1:15pm CEST

What Our Pen Tests Never Found — And How Attackers Did
Friday June 26, 2026 1:15pm - 2:00pm CEST
Penetration testing is a crucial part of application security practices, yet attackers often succeed in ways no test ever reported. No injection, no memory corruption, no failed authentication. The applications behaved exactly as designed — and that was enough.

In this talk, we will explore what penetrating testing is intended to detect and how attackers actually compromise the systems. This talk will address why well-scoped penetration testing frequently revealed "no critical findings" while attackers later leveraged legitimate workflows, permission assumptions, and trust boundaries to cause serious harm.
Based on real world examples and post incident analysis, this talk will walk through security issues that were frequently overlooked during testing, not because testers lacked skills, but because the testing process made assumptions that attackers did not follow. We will focus on examining the blind spots in the penetration testing process, which include behaviors that only appear in production, cross-feature chaining, abuse of business logic, and trust assumptions built into system architecture.

The objective of this talk will be to comprehend where pen testing ends and how defenders might modify their testing tactics accordingly, rather than to replace it. This talk will break down the classes of issues pen tests routinely miss, how attackers discover them post-deployment, and what changed when testing strategies shifted from endpoint coverage to adversary-aware validation.

Attendees will leave with practical techniques to evolve their AppSec testing without increasing cost or abandoning penetration testing.
Speakers
avatar for Ramya M

Ramya M

Application Analyst, Okta, Inc,

Ramya M is a cybersecurity professional, currently working at Okta, Inc., specializing in application security, product security, identity security, and secure SDLC automation. She has led enterprise-scale initiatives across secure coding, DevSecOps hardening, vulnerability triage... Read More →
Friday June 26, 2026 1:15pm - 2:00pm CEST
Hall G2 (Level -2)

2:15pm CEST

Trust No History: Why Every "Remembered" Interaction is a Potential Backdoor
Friday June 26, 2026 2:15pm - 3:00pm CEST
As AI transitions from stateless tools to autonomous agents, the context window has become the primary attack surface. By giving agents the ability to remember, summarize, and collaborate, we have created a machine that can be gaslit. This session moves beyond transient prompt injections into the realm of persistent memory corruption. We explore how an adversary can rewrite an agent’s history, bias its knowledge base, and plant sleeper instructions that trigger long after the initial interaction. We will dissect the systematic subversion of the agentic memory stack and demonstrate why developers must stop treating agent memory as a passive data store and start defending it as the engine of the agent’s survival
Speakers
avatar for Rico Komenda

Rico Komenda

Senior Product Security Engineer
Rico is a senior product security engineer. His main security areas are in application security, cloud security, offensive security and AI security.For him, general security intelligence in various aspects is a top priority. Today’s security world is constantly changing and you... Read More →
avatar for Barno Kaharova

Barno Kaharova

Senior Consultant, AI Security Expert, adesso SE

Barno is a expert specializing in data engineering, data modeling, and machine learning security. Driven by a passion for innovation, she develops cutting-edge methodologies to protect AI systems from adversarial threats, pushing the boundaries of what’s possible in AI security... Read More →
Friday June 26, 2026 2:15pm - 3:00pm CEST
Hall G2 (Level -2)
 
Share Modal

Share this link via

Or copy link

Filter sessions
Apply filters to sessions.