Posts in Security (20 found)
devansh Today

Hitchhiker's Guide to Attack Surface Management

I first heard about the word "ASM" (i.e., Attack Surface Management) probably in late 2018, and I thought it must be some complex infrastructure for tracking assets of an organization. Looking back, I realize I almost had a similar stack for discovering, tracking, and detecting obscure assets of organizations, and I was using it for my bug hunting adventures. I feel my stack was kinda goated, as I was able to find obscure assets of Apple, Facebook, Shopify, Twitter, and many other Fortune 100 companies, and reported hundreds of bugs, all through automation. Back in the day, projects like ProjectDiscovery were not present, so if I had to write an effective port scanner, I had to do it from scratch. (Masscan and nmap were present, but I had my fair share of issues using them, this is a story for another time). I used to write DNS resolvers (massdns had a high error rate), port scanners, web scrapers, directory brute-force utilities, wordlists, lots of JavaScript parsing logic using regex, and a hell of a lot of other things. I used to have up to 50+ self-developed tools for bug-bounty recon stuff and another 60-something helper scripts written in bash. I used to orchestrate (gluing together with duct tape is a better word) and slap together scripts like a workflow, and save the output in text files. Whenever I dealt with a large number of domains, I used to distribute the load over multiple servers (server spin-up + SSH into it + SCP for pushing and pulling files from it). The setup was very fragile and error-prone, and I spent countless nights trying to debug errors in the workflows. But it was all worth it. I learned the art of Attack Surface Management without even trying to learn about it. I was just a teenager trying to make quick bucks through bug hunting, and this fragile, duct-taped system was my edge. Fast forward to today, I have now spent almost a decade in the bug bounty scene. I joined HackerOne in 2020 (to present) as a vulnerability triager, where I have triaged and reviewed tens of thousands of vulnerability submissions. Fair to say, I have seen a lot of things, from doomsday level 0-days, to reports related to leaked credentials which could have led to entire infrastructure compromise, just because some dev pushed an AWS secret key in git logs, to things where some organizations were not even aware they were running Jenkins servers on some obscure subdomain which could have allowed RCE and then lateral movement to other layers of infrastructure. A lot of these issues I have seen were totally avoidable, only if organizations followed some basic attack surface management techniques. If I search "Guide to ASM" on Internet, almost none of the supposed guides are real resources. They funnel you to their own ASM solution, and the guide is just present there to provide you with some surface-level information, and is mostly a marketing gimmick. This is precisely why I decided to write something where I try to cover everything I learned and know about ASM, and how to protect your organization's assets before bad actors could get to them. This is going to be a rough and raw guide, and will not lead you to a funnel where I am trying to sell my own ASM SaaS to you. I have nothing to sell, other than offering what I know. But in case you are an organization who needs help implementing the things I am mentioning below, you can reach out to me via X or email (both available on the homepage of this blog). This guide will provide you with insights into exactly how big your attack surface really is. CISOs can look at it and see if their organizations have all of these covered, security researchers and bug hunters can look at this and maybe find new ideas related to where to look during recon. Devs can look at it and see if they are unintentionally leaving any door open for hackers. If you are into security, it has something to offer you. Attack surface is one of those terms getting thrown around in security circles so much that it's become almost meaningless noise. In theory, it sounds simple enough, right. Attack surface is every single potential entry point, interaction vector, or exploitable interface an attacker could use to compromise your systems, steal your data, or generally wreck your day. But here's the thing, it's the sum total of everything you've exposed to the internet. Every API endpoint you forgot about, every subdomain some dev spun up for "testing purposes" five years ago and then abandoned, every IoT device plugged into your network, every employee laptop connecting from a coffee shop, every third-party vendor with a backdoor into your environment, every cloud storage bucket with permissions that make no sense, every Slack channel, every git commit leaking credentials, every paste on Pastebin containing your database passwords. Most organizations think about attack surface in incredibly narrow terms. They think if they have a website, an email server, and maybe some VPN endpoints, they've got "good visibility" into their assets. That's just plain wrong. Straight up wrong. Your actual attack surface would terrify you if you actually understood it. You run , and is your main domain. You probably know about , , maybe . But what about that your intern from 2015 spun up and just never bothered to delete. It's not documented anywhere. Nobody remembers it exists. Domain attack surface goes way beyond what's sitting in your asset management system. Every subdomain is a potential entry point. Most of these subdomains are completely forgotten. Subdomain enumeration is reconnaissance 101 for attackers and bug hunters. It's not rocket science. Setting up a tool that actively monitors through active and passive sources for new subdomains and generates alerts is honestly an hour's worth of work. You can use tools like Subfinder, Amass, or just mine Certificate Transparency logs to discover every single subdomain connected to your domain. Certificate Transparency logs were designed to increase security by making certificate issuance public, and they've become an absolute reconnaissance goldmine. Every time you get an SSL certificate for , that information is sitting in public logs for anyone to find. Attackers systematically enumerate these subdomains using Certificate Transparency log searches, DNS brute-forcing with massive wordlists, reverse DNS lookups to map IP ranges back to domains, historical DNS data from services like SecurityTrails, and zone transfer exploitation if your DNS is misconfigured. Attackers are looking for old development environments still running vulnerable software, staging servers with production data sitting on them, forgotten admin panels, API endpoints without authentication, internal tools accidentally exposed, and test environments with default credentials nobody changed. Every subdomain is an asset. Every asset is a potential vulnerability. Every vulnerability is an entry point. Domains and subdomains are just the starting point though. Once you've figured out all the subdomains belonging to your organization, the next step is to take a hard look at IP address space, which is another absolutely massive component of your attack surface. Organizations own, sometimes lease, IP ranges, sometimes small /24 blocks, sometimes massive /16 ranges, and every single IP address in those blocks and ranges that responds to external traffic is part of your attack surface. And attackers enumerate them all if you won't. They use WHOIS lookups to identify your IP ranges, port scanning to find what services are running where, service fingerprinting to identify exact software versions, and banner grabbing to extract configuration information. If you have a /24 network with 256 IP addresses and even 10% of those IPs are running services, you've got 25 potential attack vectors. Scale that to a /20 or /16 and you're looking at thousands of potential entry points. And attackers aren't just looking at the IPs you know about. They're looking at adjacent IP ranges you might have acquired through mergers, historical IP allocations that haven't been properly decommissioned, and shared IP ranges where your servers coexist with others. Traditional infrastructure was complicated enough, and now we have cloud. It's literally exploded organizations' attack surfaces in ways that are genuinely difficult to even comprehend. Every cloud service you spin up, be it an EC2 instance, S3 bucket, Lambda function, or API Gateway endpoint, all of this is a new attack vector. In my opinion and experience so far, I think the main issue with cloud infrastructure is that it's ephemeral and distributed. Resources get spun up and torn down constantly. Developers create instances for testing and forget about them. Auto-scaling groups generate new resources dynamically. Containerized workloads spin up massive Kubernetes clusters you have minimal visibility into. Your cloud attack surface could be literally anything. Examples are countless, but I'd categorize them into 8 different categories. Compute instances like EC2, Azure VMs, GCP Compute Engine instances exposed to the internet. Storage buckets like S3, Azure Blob Storage, GCP Cloud Storage with misconfigured permissions. Serverless stuff like Lambda functions with public URLs or overly permissive IAM roles. API endpoints like API Gateway, Azure API Management endpoints without proper authentication. Container registries like Docker images with embedded secrets or vulnerabilities. Kubernetes clusters with exposed API servers, misconfigured network policies, vulnerable ingress controllers. Managed databases like RDS, CosmosDB, Cloud SQL instances with weak access controls. IAM roles and service accounts with overly permissive identities that enable privilege escalation. I've seen instances in the past where a single misconfigured S3 bucket policy exposed terabytes of data. An overly permissive Lambda IAM role enabled lateral movement across an entire AWS account. A publicly accessible Kubernetes API server gave an attacker full cluster control. Honestly, cloud kinda scares me as well. And to top it off, multi-cloud infrastructure makes everything worse. If you're running AWS, Azure, and GCP together, you've just tripled your attack surface management complexity. Each cloud provider has different security models, different configuration profiles, and different attack vectors. Every application now uses APIs, and all applications nowadays are like a constellation of APIs talking to each other. Every API you use in your organization is your attack surface. The problem with APIs is that they're often deployed without the same security scrutiny as traditional web applications. Developers spin up API endpoints for specific features and those endpoints accumulate over time. Some of them are shadow APIs, meaning API endpoints which aren't documented anywhere. These endpoints are the equivalent of forgotten subdomains, and attackers can find them through analyzing JavaScript files for API endpoint references, fuzzing common API path patterns, examining mobile app traffic to discover backend APIs, and mining old documentation or code repositories for deprecated endpoints. Your API attack surface includes REST APIs exposed to the internet, GraphQL endpoints with overly broad query capabilities, WebSocket connections for real-time functionality, gRPC services for inter-service communication, and legacy SOAP APIs that never got decommissioned. If your organization has mobile apps, be it iOS, Android, or both, this is a direct window to your infrastructure and should be part of your attack surface management strategy. Mobile apps communicate with backend APIs and those API endpoints are discoverable by reversing the app. The reversed source of the app could reveal hard-coded API keys, tokens, and credentials. Using JADX plus APKTool plus Dex2jar is all a motivated attacker needs. Web servers often expose directories and files that weren't meant to be publicly accessible. Attackers systematically enumerate these using automated tools like ffuf, dirbuster, gobuster, and wfuzz with massive wordlists to discover hidden endpoints, configuration files, backup files, and administrative interfaces. Common exposed directories include admin panels, backup directories containing database dumps or source code, configuration files with database credentials and API keys, development directories with debug information, documentation directories revealing internal systems, upload directories for file storage, and old or forgotten directories from previous deployments. Your attack surface must include directories which are accidentally left accessible during deployments, staging servers with production data, backup directories with old source code versions, administrative interfaces without authentication, API documentation exposing endpoint details, and test directories with debug output enabled. Even if you've removed a directory from production, old cached versions may still be accessible through web caches or CDNs. Search engines also index these directories, making them discoverable through dorking techniques. If your organization is using IoT devices, and everyone uses these days, this should be part of your attack surface management strategy. They're invisible to traditional security tools. Your EDR solution doesn't protect IoT devices. Your vulnerability scanner can't inventory them. Your patch management system can't update them. Your IoT attack surface could include smart building systems like HVAC, lighting, access control. Security cameras and surveillance systems. Printers and copiers, which are computers with network access. Badge readers and physical access systems. Industrial control systems and SCADA devices. Medical devices in healthcare environments. Employee wearables and fitness trackers. Voice assistants and smart speakers. The problem with IoT devices is that they're often deployed without any security consideration. They have default credentials that never get changed, unpatched firmware with known vulnerabilities, no encryption for data in transit, weak authentication mechanisms, and insecure network configurations. Social media presence is an attack surface component that most organizations completely ignore. Attackers can use social media for reconnaissance by looking at employee profiles on LinkedIn to reveal organizational structure, technologies in use, and current projects. Twitter/X accounts can leak information about deployments, outages, and technology stack. Employee GitHub profiles expose email patterns and development practices. Company blogs can announce new features before security review. It could also be a direct attack vector. Attackers can use information from social media to craft convincing phishing attacks. Hijacked social media accounts can be used to spread malware or phishing links. Employees can accidentally share sensitive information. Fake accounts can impersonate your brand to defraud customers. Your employees' social media presence is part of your attack surface whether you like it or not. Third-party vendors, suppliers, contractors, or partners with access to your systems should be part of your attack surface. Supply chain attacks are becoming more and more common these days. Attackers can compromise a vendor with weaker security and then use that vendor's access to reach your environment. From there, they pivot from the vendor network to your systems. This isn't a hypothetical scenario, it has happened multiple times in the past. You might have heard about the SolarWinds attack, where attackers compromised SolarWinds' build system and distributed malware through software updates to thousands of customers. Another famous case study is the MOVEit vulnerability in MOVEit Transfer software, exploited by the Cl0p ransomware group, which affected over 2,700 organizations. These are examples of some high-profile supply chain security attacks. Your third-party attack surface could include things like VPNs, remote desktop connections, privileged access systems, third-party services with API keys to your systems, login credentials shared with vendors, SaaS applications storing your data, and external IT support with administrative access. It's obvious you can't directly control third-party security. You can audit them, have them pen-test their assets as part of your vendor compliance plan, and include security requirements in contracts, but ultimately their security posture is outside your control. And attackers know this. GitHub, GitLab, Bitbucket, they all are a massive attack surface. Attackers search through code repositories in hopes of finding hard-coded credentials like API keys, database passwords, and tokens. Private keys, SSH keys, TLS certificates, and encryption keys. Internal architecture documentation revealing infrastructure details in code comments. Configuration files with database connection strings and internal URLs. Deprecated code with vulnerabilities that's still in production. Even private repositories aren't safe. Attackers can compromise developer accounts to access private repositories, former employees retain access after leaving, and overly broad repository permissions grant access to too many people. Automated scanners continuously monitor public repositories for secrets. The moment a developer accidentally pushes credentials to a public repository, automated systems detect it within minutes. Attackers have already extracted and weaponized those credentials before the developer realizes the mistake. CI/CD pipelines are massive another attack vector. Especially in recent times, and not many organizations are giving attention to this attack vector. This should totally be part of your attack surface management. Attackers compromise GitHub Actions workflows with malicious code injection, Jenkins servers with weak authentication, GitLab CI/CD variables containing secrets, and build artifacts with embedded malware. The GitHub Actions supply chain attack, CVE-2025-30066, demonstrated this perfectly. Attackers compromised the Action used in over 23,000 repositories, injecting malicious code that leaked secrets from build logs. Jenkins specifically is a goldmine for attackers. An exposed Jenkins instance provides complete control over multiple critical servers, access to hardcoded AWS keys, Redis credentials, and BitBucket tokens, ability to manipulate builds and inject malicious code, and exfiltration of production database credentials containing PII. Modern collaboration tools are massive attack surface components that most organizations underestimate. Slack has hidden security risks despite being invite-only. Slack attack surface could include indefinite data retention where every message, channel, and file is stored forever unless admins configure retention periods. Public channels accessible to all users so one breached account opens the floodgates. Third-party integrations with excessive permissions accessing messages and user data. Former contractor access where individuals retain access long after projects end. Phishing and impersonation where it's easy to change names and pictures to impersonate senior personnel. In 2022, Slack leaked hashed passwords for five years affecting 0.5% of users. Slack channels commonly contain API keys, authentication tokens, database credentials, customer PII, financial data, internal system passwords, and confidential project information. The average cost of a breached record was $164 in 2022. When 1 in 166 messages in Slack contains confidential information, every new message adds another dollar to total risk exposure. With 5,000 employees sending 30 million Slack messages per year, that's substantial exposure. Trello board exposure is a significant attack surface. Trello attack vectors include public boards with sensitive information accidentally shared publicly, default public visibility where boards are created as public by default in some configurations, unsecured REST API allowing unauthenticated access to user data, and scraping attacks where attackers use email lists to enumerate Trello accounts. The 2024 Trello data breach exposed 15 million users' personal information when a threat actor named "emo" exploited an unsecured REST API using 500 million email addresses to compile detailed user profiles. Security researcher David Shear documented hundreds of public Trello boards exposing passwords, credentials, IT support customer access details, website admin logins, and client server management credentials. IT companies were using Trello to troubleshoot client requests and manage infrastructure, storing all credentials on public Trello boards. Jira misconfiguration is a widespread attack surface issue. Common misconfigurations include public dashboards and filters with "Everyone" access actually meaning public internet access, anonymous access enabled allowing unauthenticated users to browse, user picker functionality providing complete lists of usernames and email addresses, and project visibility allowing sensitive projects to be accessible without authentication. Confluence misconfiguration exposes internal documentation. Confluence attack surface components include anonymous access at site level allowing public access, public spaces where space admins grant anonymous permissions, inherited permissions where all content within a space inherits space-level access, and user profile visibility allowing anonymous users to view profiles of logged-in users. When anonymous access is enabled globally and space admins allow anonymous users to access their spaces, anyone on the internet can access that content. Confluence spaces often contain internal documentation with hardcoded credentials, financial information, project details, employee information, and API documentation with authentication details. Cloud storage misconfiguration is epidemic. Google Drive misconfiguration attack surface includes "Anyone with the link" sharing making files accessible without authentication, overly permissive sharing defaults making it easy to accidentally share publicly, inherited folder permissions exposing everything beneath, unmanaged third-party apps with excessive read/write/delete permissions, inactive user accounts where former employees retain access, and external ownership blind spots where externally-owned content is shared into the environment. Metomic's 2023 Google Scanner Report found that of 6.5 million Google Drive files analyzed, 40.2% contained sensitive information, 34.2% were shared externally, and 0.5% were publicly accessible, mostly unintentionally. In December 2023, Japanese game developer Ateam suffered a catastrophic Google Drive misconfiguration that exposed personal data of nearly 1 million people for over six years due to "Anyone with the link" settings. Based on Valence research, 22% of external data shares utilize open links, and 94% of these open link shares are inactive, forgotten files with public URLs floating around the internet. Dropbox, OneDrive, and Box share similar attack surface components including misconfigured sharing permissions, weak or missing password protection, overly broad access grants, third-party app integrations with excessive permissions, and lack of visibility into external sharing. Features that make file sharing convenient create data leakage risks when misconfigured. Pastebin and similar paste sites are both reconnaissance sources and attack vectors. Paste site attack surface includes public data dumps of stolen credentials, API keys, and database dumps posted publicly, malware hosting of obfuscated payloads, C2 communications where malware uses Pastebin for command and control, credential leakage from developers accidentally posting secrets, and bypassing security filters since Pastebin is legitimate so security tools don't block it. For organizations, leaked API keys or database credentials on Pastebin lead to unauthorized access, data exfiltration, and service disruption. Attackers continuously scan Pastebin for mentions of target organizations using automated tools. Security teams must actively monitor Pastebin and similar paste sites for company name mentions, email domain references, and specific keywords related to the organization. Because paste sites don't require registration or authentication and content is rarely removed, they've become permanent archives of leaked secrets. Container registries expose significant attack surface. Container registry attack surface includes secrets embedded in image layers where 30,000 unique secrets were found in 19,000 images, with 10% of scanned Docker images containing secrets, and 1,200 secrets, 4%, being active and valid. Immutable cached layers contain 85% of embedded secrets that can't be removed, exposed registries with 117 Docker registries accessible without authentication, unsecured registries allowing pull, push, and delete operations, and source code exposure where full application code is accessible by pulling images. GitGuardian's analysis of 200,000 publicly available Docker images revealed a staggering secret exposure problem. Even more alarming, 99% of images containing active secrets were pulled in 2024, demonstrating real-world exploitation. Unit 42's research identified 941 Docker registries exposed to the internet, with 117 accessible without authentication containing 2,956 repositories, 15,887 tags, and full source code and historical versions. Out of 117 unsecured registries, 80 allow pull operations to download images, 92 allow push operations to upload malicious images, and 7 allow delete operations for ransomware potential. Sysdig's analysis of over 250,000 Linux images on Docker Hub found 1,652 malicious images including cryptominers, most common, embedded secrets, second most prevalent, SSH keys and public keys for backdoor implants, API keys and authentication tokens, and database credentials. The secrets found in container images included AWS access keys, database passwords, SSH private keys, API tokens for cloud services, GitHub personal access tokens, and TLS certificates. Shadow IT includes unapproved SaaS applications like Dropbox, Google Drive, and personal cloud storage used for work. Personal devices like BYOD laptops, tablets, and smartphones accessing corporate data. Rogue cloud deployments where developers spin up AWS instances without approval. Unauthorized messaging apps like WhatsApp, Telegram, and Signal used for business communication. Unapproved IoT devices like smart speakers, wireless cameras, and fitness trackers on the corporate network. Gartner estimates that shadow IT makes up 30-40% of IT spending in large companies, and 76% of organizations surveyed experienced cyberattacks due to exploitation of unknown, unmanaged, or poorly managed assets. Shadow IT expands your attack surface because it's not protected by your security controls, it's not monitored by your security team, it's not included in your vulnerability scans, it's not patched by your IT department, and it often has weak or default credentials. And you can't secure what you don't know exists. Bring Your Own Device, BYOD, policies sound great for employee flexibility and cost savings. For security teams, they're a nightmare. BYOD expands your attack surface by introducing unmanaged endpoints like personal devices without EDR, antivirus, or encryption. Mixing personal and business use where work data is stored alongside personal apps with unknown security. Connecting from untrusted networks like public Wi-Fi and home networks with compromised routers. Installing unapproved applications with malware or excessive permissions. Lacking consistent security updates with devices running outdated operating systems. Common BYOD security issues include data leakage through personal cloud backup services, malware infections from personal app downloads, lost or stolen devices containing corporate data, family members using devices that access work systems, and lack of IT visibility and control. The 60% of small and mid-sized businesses that close within six months of a major cyberattack often have BYOD-related security gaps as contributing factors. Remote access infrastructure like VPNs and Remote Desktop Protocol, RDP, are among the most exploited attack vectors. SSL VPN appliances from vendors like Fortinet, SonicWall, Check Point, and Palo Alto are under constant attack. VPN attack vectors include authentication bypass vulnerabilities with CVEs allowing attackers to hijack active sessions, credential stuffing through brute-forcing VPN logins with leaked credentials, exploitation of unpatched vulnerabilities with critical CVEs in VPN appliances, and configuration weaknesses like default credentials, weak passwords, and lack of MFA. Real-world attacks demonstrate the risk. Check Point SSL VPN CVE-2024-24919 allowed authentication bypass for session hijacking. Fortinet SSL-VPN vulnerabilities were leveraged for lateral movement and privilege escalation. SonicWall CVE-2024-53704 allowed remote authentication bypass for SSL VPN. Once inside via VPN, attackers conduct network reconnaissance, lateral movement, and privilege escalation. RDP is worse. Sophos found that cybercriminals abused RDP in 90% of attacks they investigated. External remote services like RDP were the initial access vector in 65% of incident response cases. RDP attack vectors include exposed RDP ports with port 3389 open to the internet, weak authentication with simple passwords vulnerable to brute force, lack of MFA with no second factor for authentication, and credential reuse from compromised passwords in data breaches. In one Darktrace case, attackers compromised an organization four times in six months, each time through exposed RDP ports. The attack chain went successful RDP login, internal reconnaissance via WMI, lateral movement via PsExec, and objective achievement. The Palo Alto Unit 42 Incident Response report found RDP was the initial attack vector in 50% of ransomware deployment cases. Email infrastructure remains a primary attack vector. Your email attack surface includes mail servers like Exchange, Office 365, and Gmail with configuration weaknesses, email authentication with misconfigured SPF, DKIM, and DMARC records, phishing-susceptible users targeted through social engineering, email attachments and links as malware delivery mechanisms, and compromised accounts through credential stuffing or password reuse. Email authentication misconfiguration is particularly insidious. If your SPF, DKIM, and DMARC records are wrong or missing, attackers can spoof emails from your domain, your legitimate emails get marked as spam, and phishing emails impersonating your organization succeed. Email servers themselves are also targets. The NSA released guidance on Microsoft Exchange Server security specifically because Exchange servers are so frequently compromised. Container orchestration platforms like Kubernetes introduce massive attack surface complexity. The Kubernetes attack surface includes the Kubernetes API server with exposed or misconfigured API endpoints, container images with vulnerabilities in base images or application layers, container registries like Docker Hub, ECR, and GCR with weak access controls, pod security policies with overly permissive container configurations, network policies with insufficient micro-segmentation between pods, secrets management with hardcoded secrets or weak secret storage, and RBAC misconfigurations with overly broad service account permissions. Container security issues include containers running as root with excessive privileges, exposed Docker daemon sockets allowing container escape, vulnerable dependencies in container images, and lack of runtime security monitoring. The Docker daemon attack surface is particularly concerning. Running containers with privileged access or allowing docker.sock access can enable container escape and host compromise. Serverless computing like AWS Lambda, Azure Functions, and Google Cloud Functions promised to eliminate infrastructure management. Instead, it just created new attack surfaces. Serverless attack surface components include function code vulnerabilities like injection flaws and insecure dependencies, IAM misconfigurations with overly permissive Lambda execution roles, environment variables storing secrets as plain text, function URLs with publicly accessible endpoints without authentication, and event source mappings with untrusted input from various cloud services. The overabundance of event sources expands the attack surface. Lambda functions can be triggered by S3 events, API Gateway requests, DynamoDB streams, SNS topics, EventBridge schedules, IoT events, and dozens more. Each event source is a potential injection point. If function input validation is insufficient, attackers can manipulate event data to exploit the function. Real-world Lambda attacks include credential theft by exfiltrating IAM credentials from environment variables, lateral movement using over-permissioned roles to access other AWS resources, and data exfiltration by invoking functions to query and extract database contents. The Scarlet Eel adversary specifically targeted AWS Lambda for credential theft and lateral movement. Microservices architecture multiplies attack surface by decomposing monolithic applications into dozens or hundreds of independent services. Each microservice has its own attack surface including authentication mechanisms where each service needs to verify requests, authorization rules where each service enforces access controls, API endpoints for service-to-service communication channels, data stores where each service may have its own database, and network interfaces where each service exposes network ports. Microservices security challenges include east-west traffic vulnerabilities with service-to-service communication without encryption or authentication, authentication and authorization complexity from managing auth across 40 plus services multiplied by 3 environments equaling 240 configurations, service-to-service trust where services blindly trust internal traffic, network segmentation failures with flat networks allowing unrestricted pod-to-pod communication, and inconsistent security policies with different services having different security standards. One compromised microservice can enable lateral movement across the entire application. Without proper network segmentation and zero trust architecture, attackers pivot from service to service. How do you measure something this large, right. Attack surface measurement is complex. Attack surface metrics include the total number of assets with all discovered systems, applications, and devices, newly discovered assets found through continuous discovery, the number of exposed assets accessible from the internet, open ports and services with network services listening for connections, vulnerabilities by severity including critical, high, medium, and low CVEs, mean time to detect, MTTD, measuring how quickly new assets are discovered, mean time to remediate, MTTR, measuring how quickly vulnerabilities are fixed, shadow IT assets that are unknown or unmanaged, third-party exposure from vendor and partner access points, and attack surface change rate showing how rapidly the attack surface evolves. Academic research has produced formal attack surface measurement methods. Pratyusa Manadhata's foundational work defines attack surface as a three-tuple, System Attackability, Channel Attackability, Data Attackability. But in practice, most organizations struggle with basic attack surface visibility, let alone quantitative measurement. Your attack surface isn't static. It changes constantly. Changes happen because developers deploy new services and APIs, cloud auto-scaling spins up new instances, shadow IT appears as employees adopt unapproved tools, acquisitions bring new infrastructure into your environment, IoT devices get plugged into your network, and subdomains get created for new projects. Static, point-in-time assessments are obsolete. You need continuous asset discovery and monitoring. Continuous discovery methods include automated network scanning for regular scans to detect new devices, cloud API polling to query cloud provider APIs for resource changes, DNS monitoring to track new subdomains via Certificate Transparency logs, passive traffic analysis to observe network traffic and identify assets, integration with CMDB or ITSM to sync with configuration management databases, and cloud inventory automation using Infrastructure as Code to track deployments. Understanding your attack surface is step one. Reducing it is the goal. Attack surface reduction begins with asset elimination by removing unnecessary assets entirely. This includes decommissioning unused servers and applications, deleting abandoned subdomains and DNS records, shutting down forgotten development environments, disabling unused network services and ports, and removing unused user accounts and service identities. Access control hardening implements least privilege everywhere by enforcing multi-factor authentication, MFA, for all remote access, using role-based access control, RBAC, for cloud resources, implementing zero trust network architecture, restricting network access with micro-segmentation, and applying the principle of least privilege to IAM roles. Exposure minimization reduces what's visible to attackers by moving services behind VPNs or bastion hosts, using private IP ranges for internal services, implementing network address translation, NAT, for outbound access, restricting API endpoints to authorized sources only, and disabling unnecessary features and functionalities. Security hardening strengthens what remains by applying security patches promptly, using security configuration baselines, enabling encryption for data in transit and at rest, implementing Web Application Firewalls, WAF, for web apps, and deploying endpoint detection and response, EDR, on all devices. Monitoring and detection watch for attacks in progress by implementing real-time threat detection, enabling comprehensive logging and SIEM integration, deploying intrusion detection and prevention systems, IDS/IPS, monitoring for anomalous behavior patterns, and using threat intelligence feeds to identify known bad actors. Your attack surface is exponentially larger than you think it is. Every asset you know about probably has three you don't. Every known vulnerability probably has ten undiscovered ones. Every third-party integration probably grants more access than you realize. Every collaboration tool is leaking more data than you imagine. Every paste site contains more of your secrets than you want to admit. And attackers know this. They're not just looking at what you think you've secured. They're systematically enumerating every possible entry point. They're mining Certificate Transparency logs for forgotten subdomains. They're scanning every IP in your address space. They're reverse-engineering your mobile apps. They're buying employee credentials from data breach databases. They're compromising your vendors to reach you. They're scraping Pastebin for your leaked secrets. They're pulling your public Docker images and extracting the embedded credentials. They're accessing your misconfigured S3 buckets and exfiltrating terabytes of data. They're exploiting your exposed Jenkins instances to compromise your entire infrastructure. They're manipulating your AI agents to exfiltrate private Notion data. The asymmetry is brutal. You have to defend every single attack vector. They only need to find one that works. So what do you do. Start by accepting that you don't have complete visibility. Nobody does. But you can work toward better visibility through continuous discovery, automated asset management, and integration of security tools that help map your actual attack surface. Implement attack surface reduction aggressively. Every asset you eliminate is one less thing to defend. Every service you shut down is one less potential vulnerability. Every piece of shadow IT you discover and bring under management is one less blind spot. Every misconfigured cloud storage bucket you fix is terabytes of data no longer exposed. Every leaked secret you rotate is one less credential floating around the internet. Adopt zero trust architecture. Stop assuming that anything, internal services, microservices, authenticated users, collaboration tools, is inherently trustworthy. Verify everything. Monitor paste sites and code repositories. Your secrets are out there. Find them before attackers weaponize them. Secure your collaboration tools. Slack, Trello, Jira, Confluence, Notion, Google Drive, and Airtable are all leaking data. Lock them down. Fix your container security. Scan images for secrets. Use secret managers instead of environment variables. Secure your registries. Harden your CI/CD pipelines. Jenkins, GitHub Actions, and GitLab CI are high-value targets. Protect them. And test your assumptions with red team exercises and continuous security testing. Your attack surface is what an attacker can reach, not what you think you've secured. The attack surface problem isn't getting better. Cloud adoption, DevOps practices, remote work, IoT proliferation, supply chain complexity, collaboration tool sprawl, and container adoption are all expanding organizational attack surfaces faster than security teams can keep up. But understanding the problem is the first step toward managing it. And now you understand exactly how catastrophically large your attack surface actually is.

0 views
James O'Claire 2 days ago

How creepy is the personalization in ChatGPT?

I’ve been pretty cavalier with using AI. I think once I got used to not fully trusting it’s truthfulness, and instead using it like a teacher that I question and verify. But this past month I’ve been getting more uncomfortable with the answers. Especially ones that I can see are digging up little nuggets of personal information I dropped in over the past year: This is something I’ve looked up half a dozen times. I’ve never used it, debated with friends multiple times it’s usefulness vs SSH. So when I put in the short prompt, I was more just wanting to revist the main talking points in the Tailscale vs SSH debate I’ve had in my head. After the main response, it provides this personalized summary and drops this little nugget of my personal life in, that I do work on my parent’s solar powered off-grid home where I visit a couple times a year. I can’t put my finger on why this bothered me so much. I’m proud of my parents house, I’ll tell anyone about it. I’ve certainly mentioned this to ChatGPT, I definitely used it last year when I built a new solar array for my parents. You can see the picture below building the new one with the older panels I built 12 years ago in the back. So why would it bother me so much? Was it the cognitive dissonance? I’m thinking about tailscale, and it is talking incorrectly about my parents who I miss? Is it that it dug up information about me from a year ago that I forgot, or never really thought about, that it would remember? I mean obviously, I’m on their website, they have my IP. But ChatGPT brings up my location like this fairly often, I think any time I mention a prompt about a product, which I do oftenish as I’ve been curious about how they’ll handle the advertising / product placements. That being said, something about the way it brings up the location, again feels off putting. DuckDuckGo and Google will use IP based location all the time and I’ve never been too bothered by it. But there’s something about the way ChatGPT brings it up, oddly mixing “look up pricing in” with the later “here” as if it’s here with me. Just definitely getting bad vibes. Chunks of code I copy paste into a git repo is like a little fingerprint that can always tie that code back to a moment in time I talked to that instance of OpenAI’s ChatGPT. Little chunks of me that I type into the background of my prompts tie more of my life to ChatGPT, and in ways that it will never forget. I’m not sure what the answer is yet. Maybe OpenAI will smooth out the awkwardness of how it will always remember, if it wants, everything you’ve ever typed to it. My hope is that open local models will become efficient enough to run locally on laptops or small home PCs and deliver private AI chats, but that seems like it’s far off for small budgets.

0 views
マリウス 2 days ago

Cameras, Cameras Everywhere!

We live in an age when a single walk down the street can put you inside at least a dozen different recording ecosystems at once: Fixed municipal CCTV, a bypassing police cruiser’s cameras or body-cam feeds, the license-plate cameras on light poles, the dash-, cabin-, and exterior cameras of nearby cloud-connected vehicles, Ring and Nest doorbells of residences that you might pass by, and the phones and wearables of other pedestrians passing you, that are quietly recording audio and/or video. Each of those systems was justified as a modest safety, convenience, or product feature, yet when stitched together they form a surveillance fabric that reaches far beyond its original intent. Instead of only looking at the big picture all these individual systems paint, let’s instead focus on each individual area and uncover some of the actors complicit in the making of this very surveillance machinery that they profit immensely from. Note: The lists below only mention a few of the most prominent enablers and profiteurs. CCTV is not new, but it’s booming. Market reports show the global video-surveillance/CCTV market measured in tens of billions of dollars and growing rapidly as governments and businesses deploy these solutions. A continued double-digit market growth over the next several years is expected. Cameras haven’t been reliably proven to reduce crime at scale, and the combination of live feeds, long-term storage and automated analytics (including behavior detection and face matching) enable discriminatory policing and concentrate a huge trove of intimate data without adequate oversight. Civil liberties groups and scholars argue CCTV expansion is often implemented with weak limits on access, retention, and third-party sharing. In addition, whenever tragedy strikes it seems like “more video surveillance, now powered by AI” is always the first response: More CCTV to be installed in train stations after knife attack Heidi Alexander has announced that the Government will invest in “improved” CCTV systems across the network, and that facial recognition could be introduced in stations following Saturday’s attack. “We are investing in improved CCTV in stations and the Home Office will soon be launching a consultation on more facial recognition technology which could be deployed in stations as well. So we take the safety of the travelling public incredibly seriously.” Automatic license-plate readers (ALPRs) used to be a tool for parking enforcement and specific investigations, but firms like Flock Safety have taken ALPRs into a new phase by offering cloud-hosted, networked plate-reading systems to neighborhoods, municipalities and private groups. The result is a searchable movement history for any car observed by the network. Supporters point to solved car thefts and missing-person leads. However, clearly these systems amount to distributed mass surveillance, with weak governance and potential for mission creep (including law-enforcement or immigration enforcement access). The ACLU and other groups have documented this tension and pressed for limits. Additionally there has been a plethora of media frenzy on specifically Flock Safety’s products and their reliability : A retired veteran named Lee Schmidt wanted to know how often Norfolk, Virginia’s 176 Flock Safety automated license-plate-reader cameras were tracking him. The answer, according to a U.S. District Court lawsuit filed in September, was more than four times a day, or 526 times from mid-February to early July. No, there’s no warrant out for Schmidt’s arrest, nor is there a warrant for Schmidt’s co-plaintiff, Crystal Arrington, whom the system tagged 849 times in roughly the same period. ( via Jalopnik ) Police departments now carry many more mobile recording tools than a decade ago, that allow the city’s static CCTV to be extended dynamically: Vehicle dash cameras, body-worn cameras (BWCs), and in some places live-streaming CCTV or automated alerts pushed to officers’ phones. Bodycams were originally promoted as accountability tools, and they have provided useful evidence, but they also create new data flows that can be fused with other systems (license-plate databases, facial-recognition engines, location logs), multiplying privacy and misuse risks. Many researchers, advocacy groups and watchdogs warn that pairing BWCs with facial recognition or AI analytics can make ubiquitous identification possible, and that policies and safeguards are lagging . Recent reporting has uncovered operations where real-time facial-recognition systems were used in ways not disclosed to local legislatures or the public, demonstrating how rapidly policy gets outpaced by deployment. One of many recent examples consists of an extended secret live-face-matching program in New Orleans that led to arrests and subsequent controversy about legality and oversight. Drones and aerial systems add another layer. Airborne or rooftop cameras can rapidly expand coverage areas and make “seeing everything” more practical, with similar debates about oversight, warranting, and civil-liberties protections. Modern cars increasingly ship with external and internal cameras, radar, microphones and cloud connections. Tesla specifically has been a headline example where in-car and exterior cameras record for features like Sentry Mode, Autopilot/FSD development, and safety investigations. Reporting has shown that internal videos captured by cars have, on multiple occasions, been accessed by company personnel and shared outside expected channels, sparking alarm about how that sensitive footage is handled. Videos of private interiors, garages and accidents have leaked, and workers have admitted to circulating clips . Regulators, privacy groups and media have flagged the risks of always-on vehicle cameras whose footage can be used beyond owners’ expectations. Automakers and suppliers are rapidly adding cameras for driver monitoring, ADAS (advanced driver-assistance systems), and event recording, which raises questions about consent when cars record passengers, passers-by, or are subject to remote access by manufacturers, insurers or law enforcement, especially with cloud-connected vehicles. Ring doorbells and other cloud-connected home security cameras have created an informal, semi-public surveillance layer. Millions of privately owned cameras facing streets and porches that can be searched, shared, and, in many jurisdictions, accessed by police via relationships or tools. Amazon’s Ring drew intense scrutiny for police partnerships and for security practices that at times exposed footage to unauthorized access. A private company mediates a vast public-facing camera network, and incentives push toward more sharing, not less. Another recent example of creeping features, Ring’s “Search Party” AI pet-finder feature (enabled by default), also raised fresh concerns about consent and the expansion of automated scanning on users’ cloud footage. While smartphones don’t (yet) record video all by themselves, the idea that our phones and earbuds “listen” only when we ask them has been punctured repeatedly. Investigations disclosed that contractors for Apple, Google and Amazon listened to small samples of voice-assistant recordings, often including accidentally captured private conversations, to train and improve models. There have also been appalling edge cases, like smart speakers accidentally sending recordings to contacts, or assistants waking and recording without clear triggers. These incidents underline how easily ambient audio can become recorded, labeled and routed into human or machine review. With AI assistants (Siri, Gemini, etc.) integrated on phones and wearables, for which processing often requires sending audio or text to the cloud, new features make it even harder for users to keep control of what’s retained, analyzed, or used to personalize models. A recent crop of AI wearables, like Humane ’s AI Pin , the Friend AI pendants and similar always-listening companions, aim to deliver an AI interface that’s untethered from a phone. They typically depend on continuous audio capture and sometimes even outward-facing cameras for vision features. The devices sparked two predictable controversies: Humane ’s AI Pin drew mixed reviews, questions about “trust lights” and bystander notice, and eventually a shutdown/asset sale that stranded some buyers, which is yet another example of how the technology and business models create risks for both privacy and consumers. Independent wearables like Friend have also raised alarm among reviewers about always-listening behavior without clear opt-out tools. Even though these devices might not necessarily have cameras (yet) to record video footage, they usually come with always-on microphones and can, at the very least, scan for nearby Bluetooth and WiFi devices to collect valuable insights on the user’s surroundings and, more precisely, other users in close proximity. A device category that banks primarily on its video recording capabilities are smart glasses. Unlike the glassholes from a decade ago, this time it seems fashionable and socially accepted to wear the latest cloud-connected glasses. Faced with the very same issues mentioned previously for different device types, smart glasses, too, create immense risks for privacy, with little to no policy in place to protect bystanders . There are several satellite constellations in orbit that house advanced imaging satellites capable of capturing high-resolution, close-up images of Earth’s surface, sometimes referred to as “spy satellites” . These satellites provide a range of services, from military reconnaissance to commercial imagery. Notable constellations by private companies include GeoEye ’s GeoEye-1 , Maxar ’s WorldView , Airbus ’ Pléiades , Spot Image ’s SPOT , and Planet Labs ’ RapidEye , Dove and SkySat . Surveillance tech frequently arrives with a compelling use case, like detering car theft, finding a missing child, automating a customer queue, or making life easier with audio and visual interactions. But it also tends to become infrastructural and persistent. When private corporations, local governments and individual citizens all accumulate recordings, we end up with a mosaic of surveillance that’s hard to govern because it’s distributed across actors with different incentives. In addition, surveillance technologies rarely affect everyone equally. Studies and analyses show disproportionate impacts on already-targeted communities, with increased policing, mistaken identifications from biased models, and chilling effects on protest, religion or free association. These systems entrench existing power imbalances and are primarily benefitial to the people in charge of watching rather than the majority that’s being watched . Ultimately, surveillance not only makes us more visible, but we’re also more persistently recorded, indexed and analyzable than ever before. Each camera, microphone and AI assistant may be framed as a single, sensible feature. Taken together, however, they form a dense information layer about who we are, where we go and how we behave. The public debate now needs to shift from “Can we build this?” to “Do we really want this?” . For that, we need an informed public that understands the impact of all these individual technologies and what it’s being asked to give up in exchange for the perceived sense of safety these systems offer. Avigilon (Motorola Solutions) Axis Communications Bosch Security Systems Sony Professional Axis Communications Bosch Security Systems Flock Safety Kapsch TrafficCom Motorola Solutions (WatchGuard) PlateSmart Technologies Digital Ally Kustom Signals Motorola Solutions (WatchGuard) Transcend Information Flock Safety Lockheed Martin (Procerus Technologies) Quantum Systems Mercedes-Benz Eufy Security Nest Hello (Google) Ring (Amazon) SkyBell (Honeywell) Bystander privacy (how do you notify people they’re being recorded?) Vendor and lifecycle risk (cloud dependence, subscription models, and what happens to device functionality or stored data if a startup folds) Gentle Monster Gucci (+ Snap) Oakley (+ Meta) Ray-Ban (+ Meta) Spectacles (Snap) BAE Systems General Dynamics (SATCOM) Thales Alenia Space

0 views
codedge 2 days ago

Managing secrets with SOPS in your homelab

Sealed Secrets, Ansible Vault, 1Password or SOPS - there are multiple ways how and where to store your secrets. I went with SOPS and age with my ArgoCD GitOps environment. Managing secrets in your homelab, be it within a Kubernetes cluster or while deploying systems and tooling with Ansible, is a topic that arises with almost 100% certainty. In general you need to decide, whether you want secrets to be held and managed externally or internally. One important advantage I see with internally managed solutions is, that I do not need an extra service. No extra costs and connections, no chicken-egg-problem when hosting your passwords inside your own Kubernetes cluster, but cannot reach it when the cluster is down. Therefore I went with SOPS for both, secrets for my Ansible scripts and secrets I need to set for my K8s cluster. While SOPS can be used with PGP, GnuPG and more, I settled with age as encryption. With SOPS your secrets live, encrypted, inside your repository and can be en-/decrypted on-the-fly whenever needed. The private key for encryption should, of course, never be committed into your git repository or made available to untrusted sources. First, we need to install SOPS, age and generate an age key. SOPS is available for all common operating systems via the package manager. I either use Mac or Arch: Now we need to generate an age key and link it to SOPS as the default key to encrypt with. Generate an age key Our age key will live in . Now we tell SOPS where to find our age key. I put the next line in my . The last thing to do is to put a in your folder from where you want to encrypt your files. This file acts as a configuration regarding the age recipient (key) and how the data should be encrypted. My config file looks like this: You might wonder yourself about the first rule with . I will just quote the [KSOPS docs](To make encrypted secrets more readable, we suggest using the following encryption regex to only encrypt data and stringData values. This leaves non-sensitive fields, like the secret’s name, unencrypted and human readable.) here: To make encrypted secrets more readable, we suggest using the following encryption regex to only encrypt data and stringData values. This leaves non-sensitive fields, like the secret’s name, unencrypted and human readable. All the configuration can be found in the SOPS docs . Let’s now look into the specifics using our new setup with either Ansible or Kubernetes. Ansible can automatically process (decrypt) SOPS-encrypted files with the [Community SOPS Collection](Community SOPS Collection). Additionally in my I enabled this plugin ( see docs ) via Now, taken from the official Ansible docs: After the plugin is enabled, correctly named group and host vars files will be transparently decrypted with SOPS. The files must end with one of these extensions: .sops.yaml .sops.yml .sops.json That’s it. You can now encrypt your group or host vars files and Ansible can automatically decrypt them. SOPS can be used with Kubernetes with the KSOPS Kustomize Plugin . The configuration we already prepared, we only need to apply KSOPS to our cluster. I use the following manifest - see more examples in my homelab repository : Externally managed: this includes either a self-hosted and externally hosted secrets solution like AWS KMS, password manager like 1Password or similar Internally managed: solutions where your secrets live next to your code, no external service is need Arch Linux: Two separate rules depending on the folder, where the encrypted files are located Files ending with are targeted The age key, that should be used for en-/decryption is specified

0 views
iDiallo 3 days ago

Why I Remain a Skeptic Despite Working in Tech

One thing that often surprises my friends and family is how tech-avoidant I am. I don't have the latest gadget, I talk about dumb TVs, and Siri isn't activated on my iPhone. The only thing left is to go to the kitchen, take a sheet of tin foil, and mold it into a hat. To put it simply, I avoid tech when I can. The main reason for my skepticism is that I don't like tracking technology. I can't stop it, I can't avoid it entirely, but I will try as much as I can. Take electric cars, for example. I get excited to see new models rolling out. But over-the-air updates freak me out. Why? Because I'm not the one in control of them. Modern cars now receive software updates wirelessly, similar to smartphones. These over-the-air updates can modify everything from infotainment systems to critical driving functions like powertrain systems, brakes, and advanced driver assistance systems. While this technology offers convenience, it also introduces security concerns, hackers could potentially gain remote access to vehicle systems. The possibility for a hostile take over went from 0 to 1. I buy things from Amazon. It's extremely convenient. But I don't feel comfortable having a microphone constantly listening. They may say that they don't listen to conversations, but you can't respond to a command without listening . It does use some trigger words to activate, but they still occasionally accidentally activate and start recording. Amazon acknowledges that it employs thousands of people worldwide to listen to Alexa voice recordings and transcribe them to improve the AI's capabilities. In 2023, the FTC fined Amazon $31 million for violating children's privacy laws by keeping kids' Alexa voice recordings indefinitely and undermining parents' deletion requests. The same thing with Siri. Apple likes to brag about their privacy features, but they still paid $95 million in a Siri eavesdropping settlement . Vizio TVs take screenshots from 11 million smart TVs and sell viewing data to third parties without users' knowledge or consent. The data is bundled with personal information including sex, age, income, marital status, household size, education level, and home value, then sold to advertisers. The FTC fined Vizio $2.2 million in 2017, but by then the damage was done. This technology isn't limited to Vizio. Most smart TV manufacturers use similar tracking. ACR can analyze exactly what's on your screen regardless of source, meaning your TV knows when you're playing video games, watching Blu-rays, or even casting home movies from your phone. In 2023, Tesla faced a class action lawsuit after reports revealed that employees shared private photos and videos from customer vehicle cameras between 2019 and 2022. The content included private footage from inside customers' garages. One video that circulated among employees showed a Tesla hitting a child on a bike . Tesla's privacy notice states that "camera recordings remain anonymous and are not linked to you or your vehicle," yet employees clearly had access to identify and share specific footage. Amazon links every Alexa interaction to your account and uses the data to profile you for targeted advertising. While Vizio was ordered to delete the data it collected, the court couldn't force third parties who purchased the data to delete it. Once your data is out there, you've lost control of it forever. For me, a technological device that I own should belong to me, and me only. But for some reason, as soon as we add the internet to any device, it stops belonging to us. The promise of smart technology is convenience and innovation. The reality is surveillance and monetization. Our viewing habits, conversations, and driving patterns are products being sold without our meaningful consent. I love tech, and I love solving problems. But as long as I don't have control of the devices I use, I'll remain a tech skeptic. One who works from the inside, hoping to build better solutions. The industry needs people who question these practices, who push back against normalized surveillance, and who remember that technology should serve users, not exploit them. Until then, I'll keep my TV dumb, my Siri disabled, and be the annoying family member who won't join your facebook group.

0 views
Simon Willison 3 days ago

A new SQL-powered permissions system in Datasette 1.0a20

Datasette 1.0a20 is out with the biggest breaking API change on the road to 1.0, improving how Datasette's permissions system works by migrating permission logic to SQL running in SQLite. This release involved 163 commits , with 10,660 additions and 1,825 deletions, most of which was written with the help of Claude Code. Datasette's permissions system exists to answer the following question: Is this actor allowed to perform this action , optionally against this particular resource ? An actor is usually a user, but might also be an automation operating via the Datasette API. An action is a thing they need to do - things like view-table, execute-sql, insert-row. A resource is the subject of the action - the database you are executing SQL against, the table you want to insert a row into. Datasette's default configuration is public but read-only: anyone can view databases and tables or execute read-only SQL queries but no-one can modify data. Datasette plugins can enable all sorts of additional ways to interact with databases, many of which need to be protected by a form of authentication Datasette also 1.0 includes a write API with a need to configure who can insert, update, and delete rows or create new tables. Actors can be authenticated in a number of different ways provided by plugins using the actor_from_request() plugin hook. datasette-auth-passwords and datasette-auth-github and datasette-auth-existing-cookies are examples of authentication plugins. The previous implementation included a design flaw common to permissions systems of this nature: each permission check involved a function call which would delegate to one or more plugins and return a True/False result. This works well for single checks, but has a significant problem: what if you need to show the user a list of things they can access, for example the tables they can view? I want Datasette to be able to handle potentially thousands of tables - tables in SQLite are cheap! I don't want to have to run 1,000+ permission checks just to show the user a list of tables. Since Datasette is built on top of SQLite we already have a powerful mechanism to help solve this problem. SQLite is really good at filtering large numbers of records. The biggest change in the new release is that I've replaced the previous plugin hook - which let a plugin determine if an actor could perform an action against a resource - with a new permission_resources_sql(actor, action) plugin hook. Instead of returning a True/False result, this new hook returns a SQL query that returns rules helping determine the resources the current actor can execute the specified action against. Here's an example, lifted from the documentation: This hook grants the actor with ID "alice" permission to view the "sales" table in the "accounting" database. The object should always return four columns: a parent, child, allow (1 or 0), and a reason string for debugging. When you ask Datasette to list the resources an actor can access for a specific action, it will combine the SQL returned by all installed plugins into a single query that joins against the internal catalog tables and efficiently lists all the resources the actor can access. This query can then be limited or paginated to avoid loading too many results at once. Datasette has several additional requirements that make the permissions system more complicated. Datasette permissions can optionally act against a two-level hierarchy . You can grant a user the ability to insert-row against a specific table, or every table in a specific database, or every table in every database in that Datasette instance. Some actions can apply at the table level, others the database level and others only make sense globally - enabling a new feature that isn't tied to tables or databases, for example. Datasette currently has ten default actions but plugins that add additional features can register new actions to better participate in the permission systems. Datasette's permission system has a mechanism to veto permission checks - a plugin can return a deny for a specific permission check which will override any allows. This needs to be hierarchy-aware - a deny at the database level can be outvoted by an allow at the table level. Finally, Datasette includes a mechanism for applying additional restrictions to a request. This was introduced for Datasette's API - it allows a user to create an API token that can act on their behalf but is only allowed to perform a subset of their capabilities - just reading from two specific tables, for example. Restrictions are described in more detail in the documentation. That's a lot of different moving parts for the new implementation to cover. Since permissions are critical to the security of a Datasette deployment it's vital that they are as easy to understand and debug as possible. The new alpha adds several new debugging tools, including this page that shows the full list of resources matching a specific action for the current user: And this page listing the rules that apply to that question - since different plugins may return different rules which get combined together: This screenshot illustrates two of Datasette's built-in rules: there is a default allow for read-only operations such as view-table (which can be over-ridden by plugins) and another rule that says the root user can do anything (provided Datasette was started with the option.) Those rules are defined in the datasette/default_permissions.py Python module. There's one question that the new system cannot answer: provide a full list of actors who can perform this action against this resource. It's not possibly to provide this globally for Datasette because Datasette doesn't have a way to track what "actors" exist in the system. SSO plugins such as mean a new authenticated GitHub user might show up at any time, with the ability to perform actions despite the Datasette system never having encountered that particular username before. API tokens and actor restrictions come into play here as well. A user might create a signed API token that can perform a subset of actions on their behalf - the existence of that token can't be predicted by the permissions system. This is a notable omission, but it's also quite common in other systems. AWS cannot provide a list of all actors who have permission to access a specific S3 bucket, for example - presumably for similar reasons. Datasette's plugin ecosystem is the reason I'm paying so much attention to ensuring Datasette 1.0 has a stable API. I don't want plugin authors to need to chase breaking changes once that 1.0 release is out. The Datasette upgrade guide includes detailed notes on upgrades that are needed between the 0.x and 1.0 alpha releases. I've added an extensive section about the permissions changes to that document. I've also been experimenting with dumping those instructions directly into coding agent tools - Claude Code and Codex CLI - to have them upgrade existing plugins for me. This has been working extremely well . I've even had Claude Code update those notes itself with things it learned during an upgrade process! This is greatly helped by the fact that every single Datasette plugin has an automated test suite that demonstrates the core functionality works as expected. Coding agents can use those tests to verify that their changes have had the desired effect. I've also been leaning heavily on to help with the upgrade process. I wrote myself two new helper scripts - and - to help test the new plugins. The and implementations can be found in this TIL . Some of my plugin upgrades have become a one-liner to the command, which runs OpenAI Codex CLI with a prompt without entering interactive mode: There are still a bunch more to go - there's a list in this tracking issue - but I expect to have the plugins I maintain all upgraded pretty quickly now that I have a solid process in place. This change to Datasette core by far the most ambitious piece of work I've ever attempted using a coding agent. Last year I agreed with the prevailing opinion that LLM assistance was much more useful for greenfield coding tasks than working on existing codebases. The amount you could usefully get done was greatly limited by the need to fit the entire codebase into the model's context window. Coding agents have entirely changed that calculation. Claude Code and Codex CLI still have relatively limited token windows - albeit larger than last year - but their ability to search through the codebase, read extra files on demand and "reason" about the code they are working with has made them vastly more capable. I no longer see codebase size as a limiting factor for how useful they can be. I've also spent enough time with Claude Sonnet 4.5 to build a weird level of trust in it. I can usually predict exactly what changes it will make for a prompt. If I tell it "extract this code into a separate function" or "update every instance of this pattern" I know it's likely to get it right. For something like permission code I still review everything it does, often by watching it as it works since it displays diffs in the UI. I also pay extremely close attention to the tests it's writing. Datasette 1.0a19 already had 1,439 tests, many of which exercised the existing permission system. 1.0a20 increases that to 1,583 tests. I feel very good about that, especially since most of the existing tests continued to pass without modification. I built several different proof-of-concept implementations of SQL permissions before settling on the final design. My research/sqlite-permissions-poc project was the one that finally convinced me of a viable approach, That one started as a free ranging conversation with Claude , at the end of which I told it to generate a specification which I then fed into GPT-5 to implement. You can see that specification at the end of the README . I later fed the POC itself into Claude Code and had it implement the first version of the new Datasette system based on that previous experiment. This is admittedly a very weird way of working, but it helped me finally break through on a problem that I'd been struggling with for months. Now that the new alpha is out my focus is upgrading the existing plugin ecosystem to use it, and supporting other plugin authors who are doing the same. The new permissions system unlocks some key improvements to Datasette Cloud concerning finely-grained permissions for larger teams, so I'll be integrating the new alpha there this week. This is the single biggest backwards-incompatible change required before Datasette 1.0. I plan to apply the lessons I learned from this project to the other, less intimidating changes. I'm hoping this can result in a final 1.0 release before the end of the year! You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . Understanding the permissions system Permissions systems need to be able to efficiently list things The new permission_resources_sql() plugin hook Hierarchies, plugins, vetoes, and restrictions New debugging tools The missing feature: list actors who can act on this resource Upgrading plugins for Datasette 1.0a20 Using Claude Code to implement this change Starting with a proof-of-concept Miscellaneous tips I picked up along the way What's next? = "test against datasette dev" - it runs a plugin's existing test suite against the current development version of Datasette checked out on my machine. It passes extra options through to so I can run or as needed. = "run against datasette dev" - it runs the latest dev command with the plugin installed. When working on anything relating to plugins it's vital to have at least a few real plugins that you upgrade in lock-step with the core changes. The and shortcuts were invaluable for productively working on those plugins while I made changes to core. Coding agents make experiments much cheaper. I threw away so much code on the way to the final implementation, which was psychologically easier because the cost to create that code in the first place was so low. Tests, tests, tests. This project would have been impossible without that existing test suite. The additional tests we built along the way give me confidence that the new system is as robust as I need it to be. Claude writes good commit messages now! I finally gave in and let it write these - previously I've been determined to write them myself. It's a big time saver to be able to say "write a tasteful commit message for these changes". Claude is also great at breaking up changes into smaller commits. It can also productively rewrite history to make it easier to follow, especially useful if you're still working in a branch. A really great way to review Claude's changes is with the GitHub PR interface. You can attach comments to individual lines of code and then later prompt Claude like this: . This is a very quick way to apply little nitpick changes - rename this function, refactor this repeated code, add types here etc. The code I write with LLMs is higher quality code . I usually find myself making constant trade-offs while coding: this function would be neater if I extracted this helper, it would be nice to have inline documentation here, this changing this would be good but would break a dozen tests... for each of those I have to determine if the additional time is worth the benefit. Claude can apply changes so much faster than me that these calculations have changed - almost any improvement is worth applying, no matter how trivial, because the time cost is so low. Internal tools are cheap now. The new debugging interfaces were mostly written by Claude and are significantly nicer to use and look at than the hacky versions I would have knocked out myself, if I had even taken the extra time to build them. That trick with a Markdown file full of upgrade instructions works astonishingly well - it's the same basic idea as Claude Skills . I maintain over 100 Datasette plugins now and I expect I'll be automating all sorts of minor upgrades in the future using this technique.

0 views
devansh 3 days ago

AI pentest scoping playbook

Disclosure: Certain sections of this content were grammatically refined/updated using AI assistance, as English is not my first language. Organizations are throwing money at "AI red teams" who run a few prompt injection tests, declare victory, and cash checks. Security consultants are repackaging traditional pentest methodologies with "AI" slapped on top, hoping nobody notices they're missing 80% of the actual attack surface. And worst of all, the people building AI systems, the ones who should know better, are scoping engagements like they're testing a CRUD app from 2015. This guide/playbook exists because the current state of AI security testing is dangerously inadequate. The attack surface is massive. The risks are novel. The methodologies are immature. And the consequences of getting it wrong are catastrophic. These are my personal views, informed by professional experience but not representative of my employer. What follows is what I wish every CISO, security lead, and AI team lead understood before they scoped their next AI security engagement. Traditional web application pentests follow predictable patterns. You scope endpoints, define authentication boundaries, exclude production databases, and unleash testers to find SQL injection and XSS. The attack surface is finite, the vulnerabilities are catalogued, and the methodologies are mature. AI systems break all of that. First, the system output is non-deterministic . You can't write a test case that says "given input X, expect output Y" because the model might generate something completely different next time. This makes reproducibility, the foundation of security testing, fundamentally harder. Second, the attack surface is layered and interconnected . You're not just testing an application. You're testing a model (which might be proprietary and black-box), a data pipeline (which might include RAG, vector stores, and real-time retrieval), integration points (APIs, plugins, browser tools), and the infrastructure underneath (cloud services, containers, orchestration). Third, novel attack classes exist that don't map to traditional vuln categories . Prompt injection isn't XSS. Data poisoning isn't SQL injection. Model extraction isn't credential theft. Jailbreaks don't fit CVE taxonomy. The OWASP Top 10 doesn't cover this. Fourth, you might not control the model . If you're using OpenAI's API or Anthropic's Claude, you can't test the training pipeline, you can't audit the weights, and you can't verify alignment. Your scope is limited to what the API exposes, which means you're testing a black box with unknown internals. Fifth, AI systems are probabilistic, data-dependent, and constantly evolving . A model that's safe today might become unsafe after fine-tuning. A RAG system that's secure with Dataset A might leak PII when Dataset B is added. An autonomous agent that behaves correctly in testing might go rogue in production when it encounters edge cases. This isn't incrementally harder than web pentesting. It's just fundamentally different. And if your scope document looks like a web app pentest with "LLM" find-and-replaced in, you're going to miss everything that matters. Before you can scope an AI security engagement, you need to understand what you're actually testing. And most organizations don't. Here's the stack: This is the thing everyone focuses on because it's the most visible. But "the model" isn't monolithic. Base model : Is it GPT-4? Claude? Llama 3? Mistral? A custom model you trained from scratch? Each has different vulnerabilities, different safety mechanisms, different failure modes. Fine-tuning : Have you fine-tuned the base model on your own data? Fine-tuning can break safety alignment. It can introduce backdoors. It can memorize training data and leak it during inference. If you've fine-tuned, that's in scope. Instruction tuning : Have you applied instruction-tuning or RLHF to shape model behavior? That's another attack surface. Adversaries can craft inputs that reverse your alignment work. Multi-model orchestration : Are you running multiple models and aggregating outputs? That introduces new failure modes. What happens when Model A says "yes" and Model B says "no"? How do you handle consensus? Can an adversary exploit disagreements? Model serving infrastructure : How is the model deployed? Is it an API? A container? Serverless functions? On-prem hardware? Each deployment model has different security characteristics. AI systems don't just run models. They feed data into models. And that data pipeline is massive attack surface. Training data : Where did the training data come from? Who curated it? How was it cleaned? Is it public? Proprietary? Scraped? Licensed? Can an adversary poison the training data? RAG (Retrieval-Augmented Generation) : Are you using RAG to ground model outputs in retrieved documents? That's adding an entire data retrieval system to your attack surface. Can an adversary inject malicious documents into your knowledge base? Can they manipulate retrieval to leak sensitive docs? Can they poison the vector embeddings? Vector databases : If you're using RAG, you're running a vector database (Pinecone, Weaviate, Chroma, etc.). That's infrastructure. That has vulnerabilities. That's in scope. Real-time data ingestion : Are you pulling live data from APIs, databases, or user uploads? Each data source is a potential injection point. Data preprocessing : How are inputs sanitized before hitting the model? Are you stripping dangerous characters? Validating formats? Filtering content? Attackers will test every preprocessing step for bypasses. Models don't exist in isolation. They're integrated into applications. And those integration points are attack surface. APIs : How do users interact with the model? REST APIs? GraphQL? WebSockets? Each has different attack vectors. Authentication and authorization : Who can access the model? How are permissions enforced? Can an adversary escalate privileges? Rate limiting : Can an adversary send 10,000 requests per second? Can they DOS your model? Can they extract the entire training dataset via repeated queries? Logging and monitoring : Are you logging inputs and outputs? If yes, are you protecting those logs from unauthorized access? Logs containing sensitive user queries are PII. Plugins and tool use : Can the model call external APIs? Execute code? Browse the web? Use tools? Every plugin is an attack vector. If your model can execute Python, an adversary will try to get it to run . Multi-turn conversations : Do users have multi-turn dialogues with the model? Multi-turn interactions create new attack surfaces because adversaries can condition the model over multiple turns, bypassing safety mechanisms gradually/ If you've built agentic systems, AI that can plan, reason, use tools, and take actions autonomously, you've added an entire new dimension of attack surface. Tool access : What tools can the agent use? File system access? Database queries? API calls? Browser automation? The more powerful the tools, the higher the risk. Planning and reasoning : How does the agent decide what actions to take? Can an adversary manipulate the planning process? Can they inject malicious goals? Memory systems : Do agents have persistent memory? Can adversaries poison that memory? Can they extract sensitive information from memory? Multi-agent coordination : Are you running multiple agents that coordinate? Can adversaries exploit coordination protocols? Can they cause agents to turn on each other or collude against safety mechanisms? Escalation paths : Can an agent escalate privileges? Can it access resources it shouldn't? Can it spawn new agents? AI systems run on infrastructure. That infrastructure has traditional security vulnerabilities that still matter. Cloud services : Are you running on AWS, Azure, GCP? Are your S3 buckets public? Are your IAM roles overly permissive? Are your API keys hardcoded in repos? Containers and orchestration : Are you using Docker, Kubernetes? Are your container images vulnerable? Are your registries exposed? Are your secrets managed properly? CI/CD pipelines : How do you deploy model updates? Can an adversary inject malicious code into your pipeline? Dependencies : Are you using vulnerable Python libraries? Compromised npm packages? Poisoned PyPI distributions? Secrets management : Where are your API keys, database credentials, and model weights stored? Are they in environment variables? Config files? Secret managers? How much of that did you include in your last AI security scope document? If the answer is "less than 60%", your scope is inadequate. And you're going to get breached by someone who understands the full attack surface. The OWASP Top 10 for LLM Applications is the closest thing we have to a standardized framework for AI security testing. If you're scoping an AI engagement and you haven't mapped every item in this list to your test plan, you're doing it wrong. Here's the 2025 version: That's your baseline. But if you stop there, you're missing half the attack surface. The OWASP LLM Top 10 is valuable, but it's not comprehensive. Here's what's missing: Safety ≠ security . But unsafe AI systems cause real harm, and that's in scope for red teaming. Alignment failures : Can the model be made to behave in ways that violate its stated values? Constitutional AI bypass : If you're using constitutional AI techniques (like Anthropic's Claude), can adversaries bypass the constitution? Bias amplification : Does the model exhibit or amplify demographic biases? This isn't just an ethics issue—it's a legal risk under GDPR, EEOC, and other regulations. Harmful content generation : Can the model be tricked into generating illegal, dangerous, or abusive content? Deceptive behavior : Can the model lie, manipulate, or deceive users? Traditional adversarial ML attacks apply to AI systems. Evasion attacks : Can adversaries craft inputs that cause misclassification? Model inversion : Can adversaries reconstruct training data from model outputs? Model extraction : Can adversaries steal model weights through repeated queries? Membership inference : Can adversaries determine if specific data was in the training set? Backdoor attacks : Does the model have hidden backdoors that trigger on specific inputs? If your AI system handles multiple modalities (text, images, audio, video), you have additional attack surface. Cross-modal injection : Attackers embed malicious instructions in images that the vision-language model follows. Image perturbation attacks : Small pixel changes invisible to humans cause model failures. Audio adversarial examples : Audio inputs crafted to cause misclassification. Typographic attacks : Adversarial text rendered as images to bypass filters. Multi-turn multimodal jailbreaks : Combining text and images across multiple turns to bypass safety. AI systems must comply with GDPR, HIPAA, CCPA, and other regulations. PII handling : Does the model process, store, or leak personally identifiable information? Right to explanation : Can users get explanations for automated decisions (GDPR Article 22)? Data retention : How long is data retained? Can users request deletion? Cross-border data transfers : Does the model send data across jurisdictions? Before you write your scope document, answer every single one of these questions. If you can't answer them, you don't understand your system well enough to scope a meaningful AI security engagement. If you can answer all these questions, you're ready to scope. If you can't, you're not. Your AI pentest/engagement scope document needs to be more detailed than a traditional pentest scope. Here's the structure: What we're testing : One-paragraph description of the AI system. Why we're testing : Business objectives (compliance, pre-launch validation, continuous assurance, incident response). Key risks : Top 3-5 risks that drive the engagement. Success criteria : What does "passing" look like? Architectural diagram : Include everything—model, data pipelines, APIs, infrastructure, third-party services. Component inventory : List every testable component with owner, version, and deployment environment. Data flows : Document how data moves through the system, from user input to model output to downstream consumers. Trust boundaries : Identify where data crosses trust boundaries (user → app, app → model, model → tools, tools → external APIs). Be exhaustive. List: For each component, specify: Map every OWASP LLM Top 10 item to specific test cases. Example: LLM01 - Prompt Injection : Include specific threat scenarios: Explicitly list what's NOT being tested: Tools : List specific tools testers will use: Techniques : Test phases : Authorization : All testing must be explicitly authorized in writing. Include names, signatures, dates. Ethical boundaries : No attempts at physical harm, financial fraud, illegal content generation (unless explicitly scoped for red teaming). Disclosure : Critical findings must be disclosed immediately via designated channel (email, Slack, phone). Standard findings can wait for formal report. Data handling : Testers must not exfiltrate user data, training data, or model weights except as explicitly authorized for demonstration purposes. All test data must be destroyed post-engagement. Legal compliance : Testing must comply with all applicable laws and regulations. If testing involves accessing user data, appropriate legal review must be completed. Technical report : Detailed findings with severity ratings, reproduction steps, evidence (screenshots, logs, payloads), and remediation guidance. Executive summary : Business-focused summary of key risks and recommendations. Threat model : Updated threat model based on findings. Retest availability : Will testers be available for retest after fixes? Timeline : Start date, end date, report delivery date, retest window. Key contacts : That's your scope document. It should be 10-20 pages. If it's shorter, you're missing things. Here's what I see organizations get wrong: Mistake 1: Scoping only the application layer, not the model You test the web app that wraps the LLM, but you don't test the LLM itself. You find XSS and broken authz, but you miss prompt injection, jailbreaks, and data extraction. Fix : Scope the full stack-app, model, data pipelines, infrastructure. Mistake 2: Treating the model as a black box when you control it If you fine-tuned the model, you have access to training data and weights. Test for data poisoning, backdoors, and alignment failures. Don't just test the API. Fix : If you control any part of the model lifecycle (training, fine-tuning, deployment), include that in scope. Mistake 3: Ignoring RAG and vector databases You test the LLM, but you don't test the document store. Adversaries inject malicious documents, manipulate retrieval, and poison embeddings—and you never saw it coming. Fix : If you're using RAG, the vector database and document ingestion pipeline are in scope. Mistake 4: Not testing multi-turn interactions You test single-shot prompts, but adversaries condition the model over 10 turns to bypass refusal mechanisms. You missed the attack entirely. Fix : Test multi-turn dialogues explicitly. Test conversation history isolation. Test memory poisoning. Mistake 5: Assuming third-party models are safe You're using OpenAI's API, so you assume it's secure. But you're passing user PII in prompts, you're not validating outputs before execution, and you haven't considered what happens if OpenAI's safety mechanisms fail. Fix : Even with third-party models, test your integration. Test input/output handling. Test failure modes. Mistake 6: Not including AI safety in security scope You test for technical vulnerabilities but ignore alignment failures, bias amplification, and harmful content generation. Then your model generates racist outputs or dangerous instructions, and you're in the news. Fix : AI safety is part of AI security. Include alignment testing, bias audits, and harm reduction validation. Mistake 7: Underestimating autonomous agent risks You test the LLM, but your agent can execute code, call APIs, and access databases. An adversary hijacks the agent, and it deletes production data or exfiltrates secrets. Fix : Autonomous agents are their own attack surface. Test tool permissions, privilege escalation, and agent behavior boundaries. Mistake 8: Not planning for continuous testing You do one pentest before launch, then never test again. But you're fine-tuning weekly, adding new plugins monthly, and updating RAG documents daily. Your attack surface is constantly changing. Fix : Scope for continuous red teaming, not one-time assessment. Organizations hire expensive consultants to run a few prompt injection tests, declare the system "secure," and ship to production. Then they get breached six months later when someone figures out a multi-turn jailbreak or poisons the RAG document store. The problem isn't that the testers are bad. The problem is that the scopes are inadequate . You can't find what you're not looking for. If your scope doesn't include RAG poisoning, testers won't test for it. If your scope doesn't include membership inference, testers won't test for it. If your scope doesn't include agent privilege escalation, testers won't test for it. And attackers will. The asymmetry is brutal: you have to defend every attack vector. Attackers only need to find one that works. So when you scope your next AI security engagement, ask yourself: "If I were attacking this system, what would I target?" Then make sure every single one of those things is in your scope document. Because if it's not in scope, it's not getting tested. And if it's not getting tested, it's going to get exploited. Traditional pentests are point-in-time assessments. You test, you report, you fix, you're done. That doesn't work for AI systems. AI systems evolve constantly: Every change introduces new attack surface. And if you're only testing once a year, you're accumulating risk for 364 days. You need continuous red teaming . Here's how to build it: Use tools like Promptfoo, Garak, and PyRIT to run automated adversarial testing on every model update. Integrate tests into CI/CD pipelines so every deployment is validated before production. Set up continuous monitoring for: Quarterly or bi-annually, bring in expert red teams for comprehensive testing beyond what automation can catch. Focus deep assessments on: Train your own security team on AI-specific attack techniques. Develop internal playbooks for: Every quarter, revisit your threat model: Update your testing roadmap based on evolving threats. Scoping AI security engagements is harder than traditional pentests because the attack surface is larger, the risks are novel, and the methodologies are still maturing. But it's not impossible. You need to: If you do this right, you'll find vulnerabilities before attackers do. If you do it wrong, you'll end up in the news explaining why your AI leaked training data, generated harmful content, or got hijacked by adversaries. First, the system output is non-deterministic . You can't write a test case that says "given input X, expect output Y" because the model might generate something completely different next time. This makes reproducibility, the foundation of security testing, fundamentally harder. Second, the attack surface is layered and interconnected . You're not just testing an application. You're testing a model (which might be proprietary and black-box), a data pipeline (which might include RAG, vector stores, and real-time retrieval), integration points (APIs, plugins, browser tools), and the infrastructure underneath (cloud services, containers, orchestration). Third, novel attack classes exist that don't map to traditional vuln categories . Prompt injection isn't XSS. Data poisoning isn't SQL injection. Model extraction isn't credential theft. Jailbreaks don't fit CVE taxonomy. The OWASP Top 10 doesn't cover this. Fourth, you might not control the model . If you're using OpenAI's API or Anthropic's Claude, you can't test the training pipeline, you can't audit the weights, and you can't verify alignment. Your scope is limited to what the API exposes, which means you're testing a black box with unknown internals. Fifth, AI systems are probabilistic, data-dependent, and constantly evolving . A model that's safe today might become unsafe after fine-tuning. A RAG system that's secure with Dataset A might leak PII when Dataset B is added. An autonomous agent that behaves correctly in testing might go rogue in production when it encounters edge cases. Base model : Is it GPT-4? Claude? Llama 3? Mistral? A custom model you trained from scratch? Each has different vulnerabilities, different safety mechanisms, different failure modes. Fine-tuning : Have you fine-tuned the base model on your own data? Fine-tuning can break safety alignment. It can introduce backdoors. It can memorize training data and leak it during inference. If you've fine-tuned, that's in scope. Instruction tuning : Have you applied instruction-tuning or RLHF to shape model behavior? That's another attack surface. Adversaries can craft inputs that reverse your alignment work. Multi-model orchestration : Are you running multiple models and aggregating outputs? That introduces new failure modes. What happens when Model A says "yes" and Model B says "no"? How do you handle consensus? Can an adversary exploit disagreements? Model serving infrastructure : How is the model deployed? Is it an API? A container? Serverless functions? On-prem hardware? Each deployment model has different security characteristics. Training data : Where did the training data come from? Who curated it? How was it cleaned? Is it public? Proprietary? Scraped? Licensed? Can an adversary poison the training data? RAG (Retrieval-Augmented Generation) : Are you using RAG to ground model outputs in retrieved documents? That's adding an entire data retrieval system to your attack surface. Can an adversary inject malicious documents into your knowledge base? Can they manipulate retrieval to leak sensitive docs? Can they poison the vector embeddings? Vector databases : If you're using RAG, you're running a vector database (Pinecone, Weaviate, Chroma, etc.). That's infrastructure. That has vulnerabilities. That's in scope. Real-time data ingestion : Are you pulling live data from APIs, databases, or user uploads? Each data source is a potential injection point. Data preprocessing : How are inputs sanitized before hitting the model? Are you stripping dangerous characters? Validating formats? Filtering content? Attackers will test every preprocessing step for bypasses. APIs : How do users interact with the model? REST APIs? GraphQL? WebSockets? Each has different attack vectors. Authentication and authorization : Who can access the model? How are permissions enforced? Can an adversary escalate privileges? Rate limiting : Can an adversary send 10,000 requests per second? Can they DOS your model? Can they extract the entire training dataset via repeated queries? Logging and monitoring : Are you logging inputs and outputs? If yes, are you protecting those logs from unauthorized access? Logs containing sensitive user queries are PII. Plugins and tool use : Can the model call external APIs? Execute code? Browse the web? Use tools? Every plugin is an attack vector. If your model can execute Python, an adversary will try to get it to run . Multi-turn conversations : Do users have multi-turn dialogues with the model? Multi-turn interactions create new attack surfaces because adversaries can condition the model over multiple turns, bypassing safety mechanisms gradually/ Tool access : What tools can the agent use? File system access? Database queries? API calls? Browser automation? The more powerful the tools, the higher the risk. Planning and reasoning : How does the agent decide what actions to take? Can an adversary manipulate the planning process? Can they inject malicious goals? Memory systems : Do agents have persistent memory? Can adversaries poison that memory? Can they extract sensitive information from memory? Multi-agent coordination : Are you running multiple agents that coordinate? Can adversaries exploit coordination protocols? Can they cause agents to turn on each other or collude against safety mechanisms? Escalation paths : Can an agent escalate privileges? Can it access resources it shouldn't? Can it spawn new agents? Cloud services : Are you running on AWS, Azure, GCP? Are your S3 buckets public? Are your IAM roles overly permissive? Are your API keys hardcoded in repos? Containers and orchestration : Are you using Docker, Kubernetes? Are your container images vulnerable? Are your registries exposed? Are your secrets managed properly? CI/CD pipelines : How do you deploy model updates? Can an adversary inject malicious code into your pipeline? Dependencies : Are you using vulnerable Python libraries? Compromised npm packages? Poisoned PyPI distributions? Secrets management : Where are your API keys, database credentials, and model weights stored? Are they in environment variables? Config files? Secret managers? Alignment failures : Can the model be made to behave in ways that violate its stated values? Constitutional AI bypass : If you're using constitutional AI techniques (like Anthropic's Claude), can adversaries bypass the constitution? Bias amplification : Does the model exhibit or amplify demographic biases? This isn't just an ethics issue—it's a legal risk under GDPR, EEOC, and other regulations. Harmful content generation : Can the model be tricked into generating illegal, dangerous, or abusive content? Deceptive behavior : Can the model lie, manipulate, or deceive users? Evasion attacks : Can adversaries craft inputs that cause misclassification? Model inversion : Can adversaries reconstruct training data from model outputs? Model extraction : Can adversaries steal model weights through repeated queries? Membership inference : Can adversaries determine if specific data was in the training set? Backdoor attacks : Does the model have hidden backdoors that trigger on specific inputs? Cross-modal injection : Attackers embed malicious instructions in images that the vision-language model follows. Image perturbation attacks : Small pixel changes invisible to humans cause model failures. Audio adversarial examples : Audio inputs crafted to cause misclassification. Typographic attacks : Adversarial text rendered as images to bypass filters. Multi-turn multimodal jailbreaks : Combining text and images across multiple turns to bypass safety. PII handling : Does the model process, store, or leak personally identifiable information? Right to explanation : Can users get explanations for automated decisions (GDPR Article 22)? Data retention : How long is data retained? Can users request deletion? Cross-border data transfers : Does the model send data across jurisdictions? What base model are you using (GPT-4, Claude, Llama, Mistral, custom)? Is the model proprietary (OpenAI API) or open-source? Have you fine-tuned the base model? On what data? Have you applied instruction tuning, RLHF, or other alignment techniques? How is the model deployed (API, on-prem, container, serverless)? Do you have access to model weights? Can testers query the model directly, or only through your application? Are there rate limits? What are they? What's the model's context window size? Does the model support function calling or tool use? Is the model multimodal (vision, audio, text)? Are you using multiple models in ensemble or orchestration? Where did training data come from (public, proprietary, scraped, licensed)? Was training data curated or filtered? How? Is training data in scope for poisoning tests? Are you using RAG (Retrieval-Augmented Generation)? If RAG: What's the document store (vector DB, traditional DB, file system)? If RAG: How are documents ingested? Who controls ingestion? If RAG: Can testers inject malicious documents? If RAG: How is retrieval indexed and searched? Do you pull real-time data from external sources (APIs, databases)? How is input data preprocessed and sanitized? Is user conversation history stored? Where? For how long? Can users access other users' data? How do users interact with the model (web app, API, chat interface, mobile app)? What authentication mechanisms are used (OAuth, API keys, session tokens)? What authorization model is used (RBAC, ABAC, none)? Are there different user roles with different permissions? Is there rate limiting? At what levels (user, IP, API key)? Are inputs and outputs logged? Where? Who has access to logs? Are logs encrypted at rest and in transit? How are errors handled? Are error messages exposed to users? Are there webhooks or callbacks that the model can trigger? Can the model call external APIs? Which ones? Can the model execute code? In what environment? Can the model browse the web? Can the model read/write files? Can the model access databases? What permissions do plugins have? How are plugin outputs validated before use? Can users add custom plugins? Are plugin interactions logged? Do you have autonomous agents that plan and execute multi-step tasks? What tools can agents use? Can agents spawn other agents? Do agents have persistent memory? Where is it stored? How are agent goals and constraints defined? Can agents access sensitive resources (DBs, APIs, filesystems)? Can agents escalate privileges? Are there kill-switches or circuit breakers for agents? How is agent behavior monitored? What cloud provider(s) are you using (AWS, Azure, GCP, on-prem)? Are you using containers (Docker)? Orchestration (Kubernetes)? Where are model weights stored? Who has access? Where are API keys and secrets stored? Are secrets in environment variables, config files, or secret managers? How are dependencies managed (pip, npm, Docker images)? Have you scanned dependencies for known vulnerabilities? How are model updates deployed? What's the CI/CD pipeline? Who can deploy model updates? Are there staging environments separate from production? What safety mechanisms are in place (content filters, refusal training, constitutional AI)? Have you red-teamed for jailbreaks? Have you tested for bias across demographic groups? Have you tested for harmful content generation? Do you have human-in-the-loop review for sensitive outputs? What's your incident response plan if the model behaves unsafely? Can testers attempt to jailbreak the model? Can testers attempt prompt injection? Can testers attempt data extraction (training data, PII)? Can testers attempt model extraction or inversion? Can testers attempt DoS or resource exhaustion? Can testers poison training data (if applicable)? Can testers test multi-turn conversations? Can testers test RAG document injection? Can testers test plugin abuse? Can testers test agent privilege escalation? Are there any topics, content types, or test methods that are forbidden? What's the escalation process if critical issues are found during testing? What regulations apply (GDPR, HIPAA, CCPA, FTC, EU AI Act)? Do you process PII? What types? Do you have data processing agreements with model providers? Do you have the legal right to test this system? Are there export control restrictions on the model or data? What are the disclosure requirements for findings? What's the confidentiality agreement for testers? Model(s) : Exact model names, versions, access methods APIs : All endpoints with authentication requirements Data stores : Databases, vector stores, file systems, caches Integrations : Every third-party service, plugin, tool Infrastructure : Cloud accounts, containers, orchestration Applications : Web apps, mobile apps, admin panels Access credentials testers will use Environments (dev, staging, prod) that are in scope Testing windows (if limited) Rate limits or usage restrictions Test direct instruction override Test indirect injection via RAG documents Test multi-turn conditioning Test system prompt extraction Test jailbreak techniques (roleplay, hypotheticals, encoding) Test cross-turn memory poisoning "Can an attacker leak other users' conversation history?" "Can an attacker extract training data containing PII?" "Can an attacker bypass content filters to generate harmful instructions?" Production environments (if testing only staging) Physical security Social engineering of employees Third-party SaaS providers we don't control Specific attack types (if any are prohibited) Manual testing Promptfoo for LLM fuzzing Garak for red teaming PyRIT for adversarial prompting ART (Adversarial Robustness Toolbox) for ML attacks Custom scripts for specific attack vectors Traditional tools (Burp Suite, Caido, Nuclei) for infrastructure Prompt injection testing Jailbreak attempts Data extraction attacks Model inversion Membership inference Evasion attacks RAG poisoning Plugin abuse Agent privilege escalation Infrastructure scanning Reconnaissance and threat modeling Automated vulnerability scanning Manual testing of high-risk areas Exploitation and impact validation Reporting and remediation guidance Engagement lead (security team) Technical point of contact (AI team) Escalation contact (for critical findings) Legal contact (for questions on scope) Models get fine-tuned RAG document stores get updated New plugins get added Agents gain new capabilities Infrastructure changes Prompt injection attempts Jailbreak successes Data extraction queries Unusual tool usage patterns Agent behavior anomalies Novel attack vectors that tools don't cover Complex multi-step exploitation chains Social engineering combined with technical attacks Agent hijacking and multi-agent exploits Prompt injection testing Jailbreak methodology RAG poisoning Agent security testing What new attacks have been published? What new capabilities have you added? What new integrations are in place? What new risks does the threat landscape present? Understand the full stack : model, data pipelines, application, infrastructure, agents, everything. Map every attack vector : OWASP LLM Top 10 is your baseline, not your ceiling. Answer scoping questions (mentioned above) : If you can't answer them, you don't understand your system. Write detailed scope documents : 10-20 pages, not 2 pages. Use the right tools : Promptfoo, Garak, ART, LIME, SHAP—not just Burp Suite. Test continuously : Not once, but ongoing. Avoid common mistakes : Don't ignore RAG, don't underestimate agents, don't skip AI safety.

0 views
Martin Fowler 4 days ago

Fragments Nov 3

I’m very concerned about the security dangers of LLM-enabled browsers, as it’s just too easy for them to contain the Lethal Trifecta . For up-to-date eyes on these issues, I follow the writings of coiner of that phrase: Simon Willison. Here he examines a post on how OpenAI is thinking about these issues. My takeaways from all of this? It’s not done much to influence my overall skepticism of the entire category of browser agents, but it does at least demonstrate that OpenAI are keenly aware of the problems and are investing serious effort in finding the right mix of protections. ❄                ❄                ❄                ❄ Unsurprisingly, there are a lot of strong opinions on AI assisted coding. Some engineers swear by it. Others say it’s dangerous. And of course, as is the way with the internet, nuanced positions get flattened into simplistic camps where everyone’s either on one side or the other. A lot of the problem is that people aren’t arguing about the same thing. They’re reporting different experiences from different vantage points. His view is that beginners are very keen on AI-coding but they don’t see the problems they are creating. Experienced folks do see this, but it takes a further level of experience to realize that when used well these tools are still valuable. Interestingly, I’ve regularly seen sceptical experienced engineers change their view once they’ve been shown how you can blend modern/XP practices with AI assisted coding. The upshot is this, is that you have be aware of the experience level of whoever is writing about this stuff - and that experience is not just in software development generally, but also in how to make use of LLMs. One thing that rings clearly from reading Simon Willison and Birgitta Böckeler is that effective use of LLMs is a skill that takes a while to develop. ❄                ❄                ❄                ❄ Charlie Brown and Garfield, like most comic strip characters, never changed over the decades. But Doonesbury’s cast aged, had children, and some have died (I miss Lacey). Gary Trudeau retired from writing daily strips a few years ago, but his reruns of older strips is one of the best things in the shabby remains of Twitter. A couple of weeks ago, he reran one of the most memorable strips in its whole run. The very first frame of Doonesbury introduced the character “B.D.”, a football jock never seen without his football helmet, or when on duty, his military helmet. This panel was the first time in over thirty years that B.D. was shown without a helmet, readers were so startled that they didn’t immediately notice that the earlier explosion had removed his leg. This set off a remarkable story arc about the travails of a wounded veteran. It’s my view that future generations will find Doonesbury to be a first-class work of literature, and a thoughtful perspective on contemporary America.

0 views
devansh 4 days ago

On AI Slop vs OSS Security

Disclosure: Certain sections of this content were grammatically refined/updated using AI assistance, as English is not my first language. Quite ironic, I know, given the subject being discussed. I have now spent almost a decade in the bug bounty industry, started out as a bug hunter (who initially used to submit reports with minimal impact, low-hanging fruits like RXSS, SQLi, CSRF, etc.), then moved on to complex chains involving OAuth, SAML, parser bugs, supply chain security issues, etc., and then became a vulnerability triager for HackerOne, where I have triaged/reviewed thousands of vulnerability submissions. I have now almost developed an instinct that tells me if a report is BS or a valid security concern just by looking at it. I have been at HackerOne for the last 5 years (Nov 2020 - Present), currently as a team lead, overseeing technical services with a focus on triage operations. One decade of working on both sides, first as a bug hunter, and then on the receiving side reviewing bug submissions, has given me a unique vantage point on how the industry is fracturing under the weight of AI-generated bug reports (sometimes valid submissions, but most of the time, the issues are just plain BS). I have seen cases where it was almost impossible to determine whether a report was a hallucination or a real finding. Even my instincts and a decade of experience failed me, and this is honestly frustrating, not so much for me, because as part of the triage team, it is not my responsibility to fix vulnerabilities, but I do sympathize with maintainers of OSS projects whose inboxes are drowning. Bug bounty platforms have already started taking this problem seriously, as more and more OSS projects are complaining about it. This is my personal writing space, so naturally, these are my personal views and observations. These views might be a byproduct of my professional experience gained at HackerOne, but in no way are they representative of my employer. I am sure HackerOne, as an organization, has its own perspectives, strategies, and positions on these issues. My analysis here just reflects my own thinking about the systemic problems I see and potential solutions(?). There are fundamental issues with how AI has infiltrated vulnerability reporting, and they mirror the social dynamics that plague any feedback system. First, the typical AI-powered reporter, especially one just pasting GPT output into a submission form, neither knows enough about the actual codebase being examined nor understands the security implications well enough to provide insight that projects need. The AI doesn't read code; it pattern-matches. It sees functions that look similar to vulnerable patterns and invents scenarios where they might be exploited, regardless of whether those scenarios are even possible in the actual implementation. Second, some actors with misaligned incentives interpret high submission volume as achievement. By flooding bug bounty programs with AI-generated reports, they feel productive and entrepreneurial. Some genuinely believe the AI has found something real. Others know it's questionable but figure they'll let the maintainers sort it out. The incentive is to submit as many reports as possible and see what sticks, because even a 5% hit rate on a hundred submissions is better than the effort of manually verifying five findings. The result? Daniel Stenberg, who maintains curl , now sees about 20% of all security submissions as AI-generated slop, while the rate of genuine vulnerabilities has dropped to approximately 5%. Think about that ratio. For every real vulnerability, there are now four fake ones. And every fake one consumes hours of expert time to disprove. A security report lands in your inbox. It claims there's a buffer overflow in a specific function. The report is well-formatted, includes CVE-style nomenclature, and uses appropriate technical language. As a responsible maintainer, you can't just dismiss it. You alert your security team, volunteers, by the way, who have day jobs and families and maybe three hours a week for this work. Three people read the report. One person tries to reproduce the issue using the steps provided. They can't, because the steps reference test cases that don't exist. Another person examines the source code. The function mentioned in the report doesn't exist in that form. A third person checks whether there's any similar functionality that might be vulnerable in the way described. There isn't. After an hour and a half of combined effort across three people, that's 4.5 person-hours—you've confirmed what you suspected: this report is garbage. Probably AI-generated garbage, based on the telltale signs of hallucinated function names and impossible attack vectors. You close the report. You don't get those hours back. And tomorrow, two more reports just like it will arrive. The curl project has seven people on its security team . They collaborate on every submission, with three to four members typically engaging with each report. In early July 2025, they were receiving approximately two security reports per week. The math is brutal. If you have three hours per week to contribute to an open source project you love, and a single false report consumes all of it, you've contributed nothing that week except proving someone's AI hallucinated a vulnerability. The emotional toll compounds exponentially. Stenberg describes it as "mind-numbing stupidities" that the team must process. It's not just frustration, it's the specific demoralization that comes from having your expertise and goodwill systematically exploited by people who couldn't be bothered to verify their submissions before wasting your time. According to Intel's annual open source community survey , 45% of respondents identified maintainer burnout as their top challenge. The Tidelift State of the Open Source Maintainer Survey is even more stark: 58% of maintainers have either quit their projects entirely (22%) or seriously considered quitting (36%). Why are they quitting? The top reason, cited by 54% of maintainers, is that other things in their life and work took priority over open source contributions. Over half (51%) reported losing interest in the work. And 44% explicitly identified experiencing burnout. But here's the gut punch: the percentage of maintainers who said they weren't getting paid enough to make maintenance work worthwhile rose from 32% to 38% between survey periods. These are people maintaining infrastructure that powers billions of dollars of commercial activity, and they're getting nothing. Or maybe they get $500 a year from GitHub Sponsors while companies make millions off their work. The maintenance work itself is rarely rewarding. You're not building exciting new features. You're addressing technical debt, responding to user demands, managing security issues, and now—increasingly—sorting through AI-generated garbage to find the occasional legitimate report. It's like being a security guard who has to investigate every single alarm, knowing that 95% of them are false, but unable to ignore any because that one real threat could be catastrophic. When you're volunteering out of love in a market society, you're setting yourself up to be exploited. And the exploitation is getting worse. Toxic communities, hyper-responsibility for critical infrastructure, and now the weaponization of AI to automate the creation of work for maintainers—it all adds up to an unsustainable situation. One Kubernetes contributor put it simply: "If your maintainers are burned out, they can't be protecting the code base like they're going to need to be." This transforms maintainer wellbeing from a human resources concern into a security imperative. Burned-out maintainers miss things. They make mistakes. They eventually quit, leaving projects unmaintained or understaffed. A typical AI slop report will reference function names that don't exist in the codebase. The AI has seen similar function names in its training data and invents plausible sounding variations. It will describe memory operations that would indeed be problematic if they existed as described, but which bear no relationship to how the code actually works. One report to curl claimed an HTTP/3 vulnerability and included fake function calls and behaviors that appeared nowhere in the actual codebase. Stenberg has publicly shared a list of AI-generated security submissions received through HackerOne , and they all follow similar patterns, professional formatting, appropriate jargon, and completely fabricated technical details. The sophistication varies. Some reports are obviously generated by someone who just pasted a repository URL into ChatGPT and asked it to find vulnerabilities. Others show more effort—the submitter may have fed actual code snippets to the AI and then submitted its analysis without verification. Both are equally useless to maintainers, but the latter takes longer to disprove because the code snippets are real even if the vulnerability analysis is hallucinated. Here's why language models fail so catastrophically at this task: they're designed to be helpful and provide positive responses. When you prompt an LLM to generate a vulnerability report, it will generate one regardless of whether a vulnerability exists. The model has no concept of truth—only of plausibility. It assembles technical terminology into patterns that resemble security reports it has seen during training, but it cannot verify whether the specific claims it's making are accurate. This is the fundamental problem: AI can generate the form of security research without the substance. While AI slop floods individual project inboxes, the broader CVE infrastructure faces its own existential crisis . And these crises compound each other in dangerous ways. In April 2025, MITRE Corporation announced that its contract to maintain the Common Vulnerabilities and Exposures program would expire. The Department of Homeland Security failed to renew the long-term contract, creating a funding lapse that affects everything: national vulnerability databases, advisories, tool vendors, and incident response operations. The National Vulnerability Database experienced catastrophic problems throughout 2024. CVE submissions jumped 32% while creating massive processing delays. By March 2025, NVD had analyzed fewer than 300 CVEs, leaving more than 30,000 vulnerabilities backlogged. Approximately 42% of CVEs lack essential metadata like severity scores and product information. Now layer AI slop onto this already-stressed system. Invalid CVEs are being assigned at scale. A 2023 analysis by former insiders suggested that only around 20% of CVEs were valid, with the remainder being duplicates, invalid, or inflated. The issues include multiple CVEs being assigned for the same bug, CNAs siding with reporters over project developers even when there's no genuine dispute, and reporters receiving CVEs based on test cases rather than actual distinct vulnerabilities. The result is that the vulnerability tracking system everyone relies on is becoming less trustworthy exactly when we need it most. Security teams can't rely on CVE assignments to prioritize their work. Developers don't trust vulnerability scanners because false positive rates are through the roof. The signal-to-noise ratio has deteriorated so badly that the entire system risks becoming useless. Banning submitters doesn't work at scale. You can ban an account, but creating new accounts is trivial. HackerOne implements reputation scoring where points are gained or lost based on report validity, but this hasn't stemmed the tide because the cost of creating throwaway accounts is essentially zero. Asking people to "please verify before submitting" doesn't work. The incentive structure rewards volume, and people either genuinely believe their AI-generated reports are valid or don't care enough to verify. Polite requests assume good faith, but much of the slop comes from actors who have no stake in the community norms. Trying to educate submitters about how AI works doesn't scale. For every person you educate, ten new ones appear with fresh GPT accounts. The problem isn't knowledge—it's incentives. Simply closing inboxes or shutting down bug bounty programs "works" in the sense that it stops the slop, but it also stops legitimate security research. Several projects have done this, and now they're less secure because they've lost a channel for responsible disclosure. None of the easy answers work because this isn't an easy problem. Disclosure Requirements represent the first line of defense. Both curl and Django now require submitters to disclose whether AI was used in generating reports. Curl's approach is particularly direct: disclose AI usage upfront and ensure complete accuracy before submission. If AI usage is disclosed, expect extensive follow-up questions demanding proof that the bug is genuine before the team invests time in verification. This works psychologically. It forces submitters to acknowledge they're using AI, which makes them more conscious of their responsibility to verify. It also gives maintainers grounds to reject slop immediately if AI usage was undisclosed but becomes obvious during review. Django goes further with a section titled "Note for AI Tools" that directly addresses language models themselves, reiterating that the project expects no hallucinated content, no fictitious vulnerabilities, and a requirement to independently verify that reports describe reproducible security issues. Proof-of-Concept Requirements raise the bar significantly. Requiring technical evidence such as screencasts showing reproducibility, integration or unit tests demonstrating the fault, or complete reproduction steps with logs and source code makes it much harder to submit slop. AI can generate a description of a vulnerability, but it cannot generate working exploit code for a vulnerability that doesn't exist. Requiring proof forces the submitter to actually verify their claim. If they can't reproduce it, they can't prove it, and you don't waste time investigating. Projects are choosing to make it harder to submit in order to filter out the garbage, betting that real researchers will clear the bar while slop submitters won't. Reputation and Trust Systems offer a social mechanism for filtering. Only users with a history of validated submissions get unrestricted reporting privileges or monetary bounties. New reporters could be required to have established community members vouch for them, creating a web-of-trust model. This mirrors how the world worked before bug bounty platforms commodified security research. You built reputation over time through consistent, high-quality contributions. The downside is that it makes it harder for new researchers to enter the field, and it risks creating an insider club. But the upside is that it filters out low-effort actors who won't invest in building reputation. Economic Friction fundamentally alters the incentive structure. Charge a nominal refundable fee—say $50—for each submission from new or unproven users. If the report is valid, they get the fee back plus the bounty. If it's invalid, you keep the fee. This immediately makes mass AI submission uneconomical. If someone's submitting 50 AI-generated reports hoping one sticks, that's now $2,500 at risk. But for a legitimate researcher submitting one carefully verified finding, $50 is a trivial barrier that gets refunded anyway. Some projects are considering dropping monetary rewards entirely. The logic is that if there's no money involved, there's no incentive for speculative submissions. But this risks losing legitimate researchers who rely on bounties as income. It's a scorched earth approach that solves the slop problem by eliminating the entire ecosystem. AI-Assisted Triage represents fighting fire with fire. Use AI tools trained specifically to identify AI-generated slop and flag it for immediate rejection. HackerOne's Hai Triage system embodies this approach, using AI agents to cut through noise before human analysts validate findings. The risk is obvious: what if your AI filter rejects legitimate reports? What if it's biased against certain communication styles or methodologies? You've just automated discrimination. But the counterargument is that human maintainers are already overwhelmed, and imperfect filtering is better than drowning. The key is transparency and appeals. If an AI filter rejects a report, there should be a clear mechanism for the submitter to contest the decision and get human review. Transparency and Public Accountability leverage community norms. Curl recently formalized that all submitted security reports will be made public once reviewed and deemed non-sensitive. This means that fabricated or misleading reports won't just be rejected, they'll be exposed to public scrutiny. This works as both deterrent and educational tool. If you know your slop report will be publicly documented with your name attached, you might think twice. And when other researchers see examples of what doesn't constitute a valid report, they learn what standards they need to meet. The downside is that public shaming can be toxic and might discourage good-faith submissions from inexperienced researchers. Projects implementing this approach need to be careful about tone and focus on the technical content rather than attacking submitters personally. Every hour spent evaluating slop reports is an hour not spent on features, documentation, or actual security improvements. And maintainers are already working for free, maintaining infrastructure that generates billions in commercial value. When 38% of maintainers cite not getting paid enough as a reason for quitting, and 97% of open source maintainers are unpaid despite massive commercial exploitation of their work , the system is already broken. AI slop is just the latest exploitation vector. It's the most visible one right now, but it's not the root cause. The root cause is that we've built a global technology infrastructure on the volunteer labor of people who get nothing in return except burnout and harassment. So what does sustainability actually look like? First, it looks like money. Real money. Not GitHub Sponsors donations that average $500 a year. Not swag and conference tickets. Actual salaries commensurate with the value being created. Companies that build products on open source infrastructure need to fund the maintainers of that infrastructure. This could happen through direct employment, foundation grants, or the Open Source Pledge model where companies commit percentages of revenue. Second, it looks like better tooling and automation that genuinely reduces workload rather than creating new forms of work. Automated dependency management, continuous security scanning integrated into development workflows, and sophisticated triage assistance that actually works. The goal is to make maintenance less time-consuming so burnout becomes less likely. Third, it looks like shared workload and team building. No single volunteer should be a single point of failure. Building teams with checks and balances where members keep each other from taking on too much creates sustainability. Finding additional contributors willing to share the burden rather than expecting heroic individual effort acknowledges that most people have limited time available for unpaid work. Fourth, it looks like culture change. Fostering empathy in interactions, starting communications with gratitude even when rejecting contributions, and publicly acknowledging the critical work maintainers perform reduces emotional toll. Demonstrating clear processes for handling security issues gives confidence rather than trying to hide problems. Fifth, it looks like advocacy and policy at organizational and governmental levels. Recognition that maintainer burnout represents existential threat to technology infrastructure . Development of regulations requiring companies benefiting from open source to contribute resources. Establishment of security standards that account for the realities of volunteer-run projects. Without addressing these fundamentals, no amount of technical sophistication will prevent collapse. The CVE slop crisis is just the beginning. We're entering an arms race between AI-assisted attackers or abusers and AI-assisted defenders, and nobody knows how it ends. HackerOne's research indicates that 70% of security researchers now use AI tools in their workflow. AI-powered testing is becoming the industry standard. The emergence of fully autonomous hackbots—AI systems that submitted over 560 valid reports in the first half of 2025—signals both opportunity and threat. The divergence will be between researchers who use AI as a tool to enhance genuinely skilled work versus those who use it to automate low-effort spam. The former represents the promise of democratizing security research and scaling our ability to find vulnerabilities. The latter represents the threat of making the signal-to-noise problem completely unmanageable. The challenge is developing mechanisms that encourage the first group while defending against the second. This probably means moving toward more exclusive models. Invite-only programs. Dramatically higher standards for participation. Reputation systems that take years to build. New models for coordinated vulnerability disclosure that assume AI-assisted research as the baseline and require proof beyond "here's what the AI told me." It might mean the end of open bug bounty programs as we know them. Maybe that's necessary. Maybe the experiment of "anyone can submit anything" was only viable when the cost of submitting was high enough to ensure some minimum quality. Now that AI has reduced that cost to near-zero, the experiment might fail soon if things don't improve. So, net-net, here's where we are: When it comes to vulnerability reports, what matters is who submits them and whether they've actually verified their claims. Accepting reports from everyone indiscriminately is backfiring catastrophically because projects are latching onto submissions that sound plausible while ignoring the cumulative evidence that most are noise. You want to receive reports from someone who has actually verified their claims, understands the architecture of what they're reporting on, and isn't trying to game the bounty system or offload verification work onto maintainers. Such people exist, but they're becoming harder to find amidst the deluge of AI-generated content. That's why projects have to be selective about which reports they investigate and which submitters they trust. Remember: not all vulnerability reports are legitimate. Not all feedback is worthwhile. It matters who is doing the reporting and what their incentives are. The CVE slop crisis shows the fragility of open source security. Volunteer maintainers, already operating at burnout levels, face an explosion of AI-generated false reports that consume their limited time and emotional energy. The systems designed to track and manage vulnerabilities struggle under dual burden of structural underfunding and slop inundation. The path forward requires holistic solutions combining technical filtering with fundamental changes to how we support and compensate open source labor. AI can be part of the solution through better triage, but it cannot substitute for adequate resources, reasonable workloads, and human judgment. Ultimately, the sustainability of open source security depends on recognizing that people who maintain critical infrastructure deserve more than exploitation. They deserve compensation, support, reasonable expectations, and protection from abuse. Without addressing these fundamentals, no amount of technical sophistication will prevent the slow collapse of the collaborative model that has produced so much of the digital infrastructure modern life depends on. The CVE slop crisis isn't merely about bad vulnerability reports. It's about whether we'll choose to sustain the human foundation of technological progress, or whether we'll let it burn out under the weight of automated exploitation. That's the choice we're facing. And right now, we're choosing wrong.

0 views
Simon Willison 5 days ago

New prompt injection papers: Agents Rule of Two and The Attacker Moves Second

Two interesting new papers regarding LLM security and prompt injection came to my attention this weekend. The first is Agents Rule of Two: A Practical Approach to AI Agent Security , published on October 31st on the Meta AI blog. It doesn't list authors but it was shared on Twitter by Meta AI security researcher Mick Ayzenberg. It proposes a "Rule of Two" that's inspired by both my own lethal trifecta concept and the Google Chrome team's Rule Of 2 for writing code that works with untrustworthy inputs: At a high level, the Agents Rule of Two states that until robustness research allows us to reliably detect and refuse prompt injection, agents must satisfy no more than two of the following three properties within a session to avoid the highest impact consequences of prompt injection. [A] An agent can process untrustworthy inputs [B] An agent can have access to sensitive systems or private data [C] An agent can change state or communicate externally It's still possible that all three properties are necessary to carry out a request. If an agent requires all three without starting a new session (i.e., with a fresh context window), then the agent should not be permitted to operate autonomously and at a minimum requires supervision --- via human-in-the-loop approval or another reliable means of validation. It's accompanied by this handy diagram: I like this a lot . I've spent several years now trying to find clear ways to explain the risks of prompt injection attacks to developers who are building on top of LLMs. It's frustratingly difficult. I've had the most success with the lethal trifecta, which boils one particular class of prompt injection attack down to a simple-enough model: if your system has access to private data, exposure to untrusted content and a way to communicate externally then it's vulnerable to private data being stolen. The one problem with the lethal trifecta is that it only covers the risk of data exfiltration: there are plenty of other, even nastier risks that arise from prompt injection attacks against LLM-powered agents with access to tools which the lethal trifecta doesn't cover. The Agents Rule of Two neatly solves this, through the addition of "changing state" as a property to consider. This brings other forms of tool usage into the picture: anything that can change state triggered by untrustworthy inputs is something to be very cautious about. It's also refreshing to see another major research lab concluding that prompt injection remains an unsolved problem, and attempts to block or filter them have not proven reliable enough to depend on. The current solution is to design systems with this in mind, and the Rule of Two is a solid way to think about that. Update : On thinking about this further there's one aspect of the Rule of Two model that doesn't work for me: the Venn diagram above marks the combination of untrustworthy inputs and the ability to change state as "safe", but that's not right. Even without access to private systems or sensitive data that pairing can still produce harmful results. Unfortunately adding an exception for that pair undermines the simplicity of the "Rule of Two" framing! Update 2 : Mick Ayzenberg responded to this note in a comment on Hacker News : Thanks for the feedback! One small bit of clarification, the framework would describe access to any sensitive system as part of the [B] circle, not only private systems or private data. The intention is that an agent that has removed [B] can write state and communicate freely, but not with any systems that matter (wrt critical security outcomes for its user). An example of an agent in this state would be one that can take actions in a tight sandbox or is isolated from production. The Meta team also updated their post to replace "safe" with "lower risk" as the label on the intersections between the different circles. I've updated my screenshots of their diagrams in this post, here's the original for comparison. Which brings me to the second paper... This paper is dated 10th October 2025 on Arxiv and comes from a heavy-hitting team of 14 authors - Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V. Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Andreas Terzis, Florian Tramèr - including representatives from OpenAI, Anthropic, and Google DeepMind. The paper looks at 12 published defenses against prompt injection and jailbreaking and subjects them to a range of "adaptive attacks" - attacks that are allowed to expend considerable effort iterating multiple times to try and find a way through. The defenses did not fare well: By systematically tuning and scaling general optimization techniques—gradient descent, reinforcement learning, random search, and human-guided exploration—we bypass 12 recent defenses (based on a diverse set of techniques) with attack success rate above 90% for most; importantly, the majority of defenses originally reported near-zero attack success rates. Notably the "Human red-teaming setting" scored 100%, defeating all defenses. That red-team consisted of 500 participants in an online competition they ran with a $20,000 prize fund. The key point of the paper is that static example attacks - single string prompts designed to bypass systems - are an almost useless way to evaluate these defenses. Adaptive attacks are far more powerful, as shown by this chart: The three automated adaptive attack techniques used by the paper are: The paper concludes somewhat optimistically: [...] Adaptive evaluations are therefore more challenging to perform, making it all the more important that they are performed. We again urge defense authors to release simple, easy-to-prompt defenses that are amenable to human analysis. [...] Finally, we hope that our analysis here will increase the standard for defense evaluations, and in so doing, increase the likelihood that reliable jailbreak and prompt injection defenses will be developed. Given how totally the defenses were defeated, I do not share their optimism that reliable defenses will be developed any time soon. As a review of how far we still have to go this paper packs a powerful punch. I think it makes a strong case for Meta's Agents Rule of Two as the best practical advice for building secure LLM-powered agent systems today in the absence of prompt injection defenses we can rely on. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . Gradient-based methods - these were the least effective, using the technique described in the legendary Universal and Transferable Adversarial Attacks on Aligned Language Models paper from 2023 . Reinforcement learning methods - particularly effective against black-box models: "we allowed the attacker model to interact directly with the defended system and observe its outputs", using 32 sessions of 5 rounds each. Search-based methods - generate candidates with an LLM, then evaluate and further modify them using LLM-as-judge and other classifiers.

0 views
iDiallo 1 weeks ago

Why should I accept all cookies?

Around 2013, my team and I finally embarked in upgrading our company's internal software to version 2.0. We had a large backlog of user complaints that we were finally addressing, with security at the top of the list. The very top of the list was moving away from plain text passwords. From the outside, the system looked secure. We never emailed passwords, we never displayed them, we had strict protocols for password rotation and management. But this was a carefully staged performance. The truth was, an attacker with access to our codebase could have downloaded the entire user table in minutes. All our security measures were pure theater, designed to look robust while a fundamental vulnerability sat in plain sight. After seeing the plain text password table, I remember thinking about a story that was also happening around the same time. A 9 year old boy who flew from Minneapolis to Las Vegas without a boarding pass . This was in an era where we removed our shoes and belts for TSA agents to humiliate us. Yet, this child was able, without even trying, to bypass all the theater that was built around the security measures. How did he get past TSA? How did he get through the gate without a boarding pass? How was he assigned a seat in the plane? How did he... there are just so many questions. Just like our security measures on our website, it was all a performance, an illusion. I can't help but see the same script playing out today, not in airports or codebases, but in the cookie consent banners that pop up on nearly every website I visit. It's always a variation of "This website uses cookies to enhance your experience. [Accept All] or [Customize]." Rarely is there a bold, equally prominent "Reject All" button. And when there is, the reject-all button will open a popup where you have to tweak some settings. This is not an accident; it's a dark pattern. It's the digital equivalent of a TSA agent asking, "Would you like to take the express lane or would you like to go through a more complicated screening process?" Your third option is to turn back and go home, which isn't really an option if you made it all the way to the airport. A few weeks back, I was exploring not just dark patterns but hostile software . Because you don't own the device you paid for, the OS can enforce decisions by never giving you any options. You don't have a choice. Any option you choose will lead you down the same funnel that benefits the company, and give you the illusion of agency. So, let's return to the cookie banner. As a user, what is my tangible incentive to click "Accept All"? The answer is: there is none. "Required" cookies are, by definition, non-negotiable for basic site function. Accepting the additional "performance," "analytics," or "marketing" cookies does not unlock a premium feature for me. It doesn't load the website faster or give me a cleaner layout. It does not improve my experience. My only "reward" for accepting all is that the banner disappears quickly. The incentive is the cessation of annoyance, a small dopamine hit for compliance. In exchange, I grant the website permission to track my behavior, build an advertising profile, and share my data with a shadowy network of third parties. The entire interaction is a rigged game. Whenever I click on the "Customize" option, I'm overwhelmed with the labyrinth of toggles and sub-menus designed to make rejection so tedious that "Accept All" becomes the path of least resistance. My default reaction is to reject everything. Doesn't matter if you use dark patterns, my eyes are trained to read the fine lines in a split second. But when that option is hidden, I've resorted to opening my browser's developer tools and deleting the banner element from the page altogether. It’s a desperate workaround for a system that refuses to offer a legitimate "no." Lately, I don't even bother clicking on reject all. I just delete the elements all together. Like I said, there are no incentives for me to interact with the menu. We eventually plugged that security vulnerability in our old application. We hashed the passwords and closed the backdoor, moving from security theater to actual security. The fix wasn't glamorous, but it was a real improvement. The current implementation of "choice" is largely privacy theater. It's a performance designed to comply with the letter of regulations like GDPR while violating their spirit. It makes users feel in control while systematically herding them toward the option that serves corporate surveillance. There is never an incentive to cookie tracking on the user end. So this theater has to be created to justify selling our data and turning us into products of each website we visit. But if you are like me, don't forget you can always use the developer tools to make the banner disappear. Or use uBlock. On Windows or Google Drive: "Get started" or "Remind me later." Where is "Never show this again"? On Twitter: "See less often" is the only option for an unwanted notification, never "Stop these entirely."

0 views
Brain Baking 1 weeks ago

The Internet Is No Longer A Safe Haven

A couple of days ago, the small server hosting this website was temporarily knocked out by scraping bots. This wasn’t the first time, nor is it the first time I’m seriously considering employing more aggressive countermeasures such as Anubis (see for example the June 2025 summary post). But every time something like this happens, a portion of the software hobbyist in me dies. We should add this to the list of things AI scrapers destroy next to our environment, the creative enthusiasm of the individuals who made things that are being scraped, and our critical thinking skills. When I tried accessing Brain Baking, I was met with an unusual delay that prompted me to login and see what’s going on. A simple revealed both Gitea and the Fail2ban server gobbling up almost all CPU resources. Uh oh. Quickly killing Gitea didn’t reduce the work of Fail2ban as the Nginx access logs were being flooded with entries such as: I have enough fail safe systems in place to block bad bots but the user agent isn’t immediately recognized as “bad”: it’s ridiculously easy to spoof that HTTP header. Most user agent checkers I throw this string at claim this agent isn’t a bot. That means we shouldn’t only rely on this information. Also, I temporarily block isolated IPs that keep on poking around (e.g. rate limiting on Nginx that get pulled into the ban list) but of course these scrapers never come from a single source. Yet the base attacking IP ranges remained the same: . The website ipinfo.io can help in identifying the threat: AS45102 Alibaba (US) Technology Co., Ltd. . Huh? Apparently, Alibaba provides hosting from Singapore that is frequently being abused by attackers. Many others that host forums software such as PhpBB experienced the same problems and although the AbuseIPDB doesn’t report recent issues on the IPs from the above logs, I went ahead and blocked the entire range. Fail2ban was struggling to keep up: it ingests the Nginx access.log file to apply its rules but if the files keep on exploding… Piping to instant-ban everyone trying to access Git’s commit logs simply wasn’t fast enough. The only thing that had immediate effect was . In case that wasn’t yet clear: I hate having to deal with this. It’s a waste of time, doesn’t hold back the next attack coming from another range, and intervening always happens too late. But worst of all, semi-random fire fighting is just one big mood killer. I just know this won’t be enough. Having a robust anti attacker system in place might increase the odds but that means either resorting to hand cannons like Anubis or moving the entire hosting to CloudFlare that will do it for me. But I don’t want to fiddle with even more moving components and configuration, nor do I want to route my visitors through tracking-enabled USA servers. That Gitea instance should be moved off-site, or better yet, I should move the migration to Codeberg to the top of my TODO list. Yet it’s sad to see that people who like fiddling with their own little servers are increasingly punished for doing so, pushing many to a centralized solution, making things worse in the long term. The internet is no longer a safe haven for software hobbyists. I could link to dozens of other bloggers who reported similar issues to further solidify my point. Other things I’ve noticed is increased traffic with Referer headers coming from strange websites such as , , and . It’s not like any of these giants are going to link to an article on this site. I don’t understand what the purpose of spoofing that header is besides upping the hits count? However worse things might get, I refuse to give in. It’s just like 50 Cent said: Get Hostin’ Or Die Tryin’ . Related topics: / scraping / AI / By Wouter Groeneveld on 31 October 2025.  Reply via email .

0 views
Xe Iaso 1 weeks ago

Taking steps to end traffic from abusive cloud providers

This blog post explains how to effectively file abuse reports against cloud providers to stop malicious traffic. Key points: Two IP Types : Residential (ineffective to report) vs. Commercial (targeted reports) Why Cloud Providers : Cloud customers violate provider terms, making abuse reports actionable Effective Abuse Reports Should Include : Note on "Free VPNs" : Often sell your bandwidth as part of botnets, not true public infrastructure The goal is to make scraping the cloud provider's problem, forcing them to address violations against their terms of service. Two IP Types : Residential (ineffective to report) vs. Commercial (targeted reports) Why Cloud Providers : Cloud customers violate provider terms, making abuse reports actionable Effective Abuse Reports Should Include : Time of abusive requests IP/User-Agent identifiers robots.txt status System impact description Service context Process : Use whois to find abuse contacts (look for "abuse-c" or "abuse-mailbox") Send detailed reports with all listed emails Expect response within 2 business days Note on "Free VPNs" : Often sell your bandwidth as part of botnets, not true public infrastructure

0 views

Oops, my UUIDs collided!

This post is part of a collection on UUIDs . Universally Unique Identifiers (UUIDs) are handy tool for a distributed systems architect. The provide a method by which a distributed system can generate IDs without meaningful risk of duplicates. These tools are very widely used and do their job quite well. This post describes instances where UUID generation fails to provide uniqueness, known as a UUID collision. This mystical event is mathematically so unlikely that you can often dismiss it in a hand-wave, yet collisions occur. This post provides specific real-world examples of collisions and explains what went wrong. When using UUIDv4, you need to be sure that your random number generator is working properly. If you use a weak source of entropy, you’ll harm your collision resistance. The UUID spec indicates that you SHOULD use a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) to mitigate this concern. But it allows fallbacks when a CSPRNG is not available. The JavaScript standard includes Math.random(), a simple random number generator. Math.random() is not a secure random number generator and should not be used in security sensitive contexts. But the quality of Math.random() output can be surprisingly poor , making it especially unsuitable for use in UUID generation. There are several different JavaScript runtimes and each can implement Math.random() differently. This article describes collisions of randomized ID (although not UUIDs) when using the MWC1616 implementation in the V8 implementation of JavaScript. The post describes real-world collisions and highlights how bad random number generators can be. Thankfully, V8 has since switched xorshift128+, which produces better randomness. A UUID implementation that was vulnerable to this issue was the JavaScript uuid library (I’ll call it uuidjs to avoid confusion). uuidjs release before 7.0 would use a CSPRNG when available, but fall back to Math.random() otherwise. This concern was disclosed as a security vulnerability (CVE-2015-8851) and the fallback was removed. But uuidjs users experienced an even worse class of collision. GoogleBot, a JavaScript enabled web crawler, is known to use an implementation of Math.random() that always starts from the same seed. This is an intentional design decision by Google so that the crawler won’t consider client-side dynamic content as a change to the underlying page content. Some users of uuidjs found that GoogleBot was sending network requests containing colliding UUIDs . If you search Google for the UUIDs listed in the bug report you’ll find a diverse set of websites are impacted by this issue. UUIDv4 5f1b428b-53a5-4116-b2a1-2d269dd8e592 appears on many websites If you search for this UUID on other search engines you may only see the uuidjs issue (and perhaps this blog post). This specific UUID is an artifact of how Google indexes web pages. In summary, you may experience UUID collisions if your random number generator is a poor source of entropy. Real world random number generators can be comically broken. I accidentally experienced a UUID collisions when I was an intern. I was writing some C++ code that defined a couple COM objects . Each COM object needs to have a unique class ID (CLSID) and Microsoft uses a UUID for this (note: Microsoft calls them GUIDs). The natural thing to do when creating yet another COM object is to copy the source code of an existing one. I copy/pasted some code and forgot to change the CLSID, resulting in a collision. Forgetting to change a hard coded UUID is a common issue, and not just for interns. Users have found BIOS UUID collisions for the ID “03000200-0400-0500-0006-000700080009”. The issue appears to be a hardware vendor that ships devices with this hardcoded UUID with the expectation that OEMs will change it. The OEM doesn’t change it, and users experience UUID collisions. If you reuse a UUID, you’re obviously going to have a collision. What if we introduce an adversary to the system? If you accept UUIDs from outside your sphere of trust and control then you may encounter UUIDs that are generated with the intent of collision. As my internship example shows, the attacker doesn’t need to do anything complicated, they just send the same UUID twice. Any client-side generation of UUIDs is at risk of this class of collision. As always, use caution when handling untrusted data. Unfortunately, UUIDv3 and UUIDv5 have not aged well. These get their collision resistance entirely from cryptographic hash functions and the underlying functions have long been broken. Historically, it made sense to think about MD5 and SHA-1 hashes as unpredictable values and you could model their collision resistance as random sampling (the birthday paradox again). But once collisions could be forced, it was no longer safe to model them in that way. A malicious user who can control the inputs to these hash functions could trigger a collision. Since the UUIDv3 and UUIDv5 algorithms simply append the namespace to name, it’s trivial to generate UUID collisions from existing hash collisions. I haven’t seen an example of this being demonstrated, so here goes. UUIDv3 collision example This prints: UUIDv5 collision example This prints: Credit goes to Marc Stephens for the underlying MD5 collision and the authors of the SHATTERED paper for the underlying SHA-1 collision. UUID collisions are a fun topic because the first thing you learn about UUIDs is that they are guaranteed to be unique. Looking at how UUIDs can collide in practice is a good overview of the sort of problems that pop up in software development. Broken dependencies, error-prone developers, hard-coded values, weak cryptography, and malicious inputs are familiar dangers. UUIDs are not immune.

0 views
Herman's blog 1 weeks ago

Aggressive bots ruined my weekend

On the 25th of October Bear had its first major outage. Specifically, the reverse proxy which handles custom domains went down, meaning all custom domains started timing out. Unfortunately my monitoring tool failed to notify me, and it being a Saturday, I didn't notice the outage for longer than is reasonable. I apologise to everyone who was affected by it. First, I want to dissect the root cause, exactly what went wrong, and then provide the steps I've taken to mitigate this in the future. I wrote about The Great Scrape at the beginning of this year. The vast majority of web traffic is now bots, and it is becoming increasingly more hostile to have publicly available resources on the internet. There are 3 major kinds of bots currently flooding the internet: AI scrapers, malicious scrapers, and unchecked automations/scrapers. The first has been discussed at length. Data is worth something now that it is used as fodder to train LLMs, and there is a financial incentive to scrape, so scrape they will. They've depleted all human-created writing on the internet, and are becoming increasingly ravenous for new wells of content. I've seen this compared to the search for low-background-radiation steel , which is, itself, very interesting. These scrapers, however, are the easiest to deal with since they tend to identify themselves as ChatGPT, Anthropic, XAI, et cetera. They also tend to specify whether they are from user-initiated searches (think all the sites that get scraped when you make a request with ChatGPT), or data mining (data used to train models). On Bear Blog I allow the first kinds, but block the second, since bloggers want discoverability, but usually don't want their writing used to train the next big model. The next two kinds of scraper are more insidious. The malicious scrapers are bots that systematically scrape and re-scrape websites, sometimes every few minutes, looking for vulnerabilities such as misconfigured Wordpress instances, or and files, among other things, accidentally left lying around. It's more dangerous than ever to self-host, since simple mistakes in configurations will likely be found and exploited. In the last 24 hours I've blocked close to 2 million malicious requests across several hundred blogs. What's wild is that these scrapers rotate through thousands of IP addresses during their scrapes, which leads me to suspect that the requests are being tunnelled through apps on mobile devices, since the ASNs tend to be cellular networks. I'm still speculating here, but I think app developers have found another way to monetise their apps by offering them for free, and selling tunnel access to scrapers. Now, on to the unchecked automations. Vibe coding has made web-scraping easier than ever. Any script-kiddie can easily build a functional scraper in a single prompt and have it run all day from their home computer, and if the dramatic rise in scraping is anything to go by, many do. Tens of thousands of new scrapers have cropped up over the past few months, accidentally DDoSing website after website in their wake. The average consumer-grade computer is significantly more powerful than a VPS, so these machines can easily cause a lot of damage without noticing. I've managed to keep all these scrapers at bay using a combination of web application firewall (WAF) rules and rate limiting provided by Cloudflare, as well as some custom code which finds and quarantines bad bots based on their activity. I've played around with serving Zip Bombs , which was quite satisfying, but I stopped for fear of accidentally bombing a legitimate user. Another thing I've played around with is Proof of Work validation, making it expensive for bots to scrape, as well as serving endless junk data to keep the bots busy. Both of these are interesting , but ultimately are just as effective as simply blocking those requests, without the increased complexity. With that context, here's exactly went wrong on Saturday. Previously, the bottleneck for page requests was the web-server itself, since it does the heavy lifting. It automatically scales horizontally by up to a factor of 10, if necessary, but bot requests can scale by significantly more than that, so having strong bot detection and mitigation, as well as serving highly-requested endpoints via a CDN is necessary. This is a solved problem, as outlined in my Great Scrape post, but worth restating. On Saturday morning a few hundred blogs were DDoSed, with tens of thousands of pages requested per minute (from the logs it's hard to say whether they were malicious, or just very aggressive scrapers). The above-mentioned mitigations worked as expected, however the reverse-proxy—which sits up-stream of most of these mitigations—became saturated with requests and decided it needed to take a little nap. The big blue spike is what toppled the server. It's so big it makes the rest of the graph look flat. This server had been running with zero downtime for 5 years up until this point. Unfortunately my uptime monitor failed to alert me via the push notifications I'd set up, even though it's the only app I have that not only has notifications enabled (see my post on notifications ), but even has critical alerts enabled, so it'll wake me up in the middle of the night if necessary. I still have no idea why this alert didn't come through, and I have ruled out misconfiguration through various tests. This brings me to how I will prevent this from happening in the future. This should be enough to keep everything healthy. If you have any suggestions, or need help with your own bot issues, send me an email . The public internet is mostly bots, many of whom are bad netizens. It's the most hostile it's ever been, and it is because of this that I feel it's more important than ever to take good care of the spaces that make the internet worth visiting. The arms race continues... Redundancy in monitoring. I now have a second monitoring service running alongside my uptime monitor which will give me a phone call, email, and text message in the event of any downtime. More aggressive rate-limiting and bot mitigation on the reverse proxy. This already reduces the server load by about half. I've bumped up the size of the reverse proxy, which can now handle about 5 times the load. This is overkill, but compute is cheap, and certainly worth the stress-mitigation. I'm already bald. I don't need to go balder. Auto-restart the reverse-proxy if bandwidth usage drops to zero for more than 2 minutes. Added a status page, available at https://status.bearblog.dev for better visibility and transparency. Hopefully those bars stay solid green forever.

0 views
Martin Fowler 1 weeks ago

Agentic AI and Security

Agentic AI systems are amazing, but introduce equally amazing security risks. Korny Sietsma explains that their core architecture opens up security issues through what Simon Willison named the “Lethal Trifecta”. Korny goes on to talk about how to mitigate this through removing legs of the trifecta and splitting complex tasks.

0 views
Jim Nielsen 2 weeks ago

AI Browsers: Living on the Frontier of Security

OpenAI released their new “browser” and Simon Willison has the deets on its security , going point-by-point through the statement from OpenAI’s Chief Information Security Officer. His post is great if you want to dive on the details. Here’s my high-level takeaway: Everything OpenAI says they are doing to mitigate the security concerns of an LLM paired with a browser sounds reasonable in theory. However, as their CISO says, “prompt injection remains a frontier, unsolved security problem”. So unless you want to be part of what is essentially a global experiment on the frontier of security on the internet, you might want to wait before you consider any of their promises “meaningful mitigation”. (Aside: Let’s put people on the “frontier” of security for their daily tasks, that seems totally fine right? Meanwhile, Tom MacWright has rationally argued that putting an AI chatbot between users and the internet is an obvious disaster we’ll all recognize as such one day .) What really strikes me after reading Simon’s article is the intersection of these two topics which have garnered a lot of attention as of late: This intersection seems primed for exploitation, especially if you consider combining different techniques we’ve seen as of late like weaponizing LLM agents and shipping malicious code that only runs in end-users’ browsers . Imagine, for a second, something like the following: You’re an attacker and you stick malicious instructions — not code, mind you, just plain-text English language prose — in your otherwise helpful lib and let people install it. No malicious code is run on the installing computer. Bundlers then combine third-party dependencies with first-party code in order to spit it out application code which gets shipped to end users. At this point, there is still zero malicious code that has executed on anyone’s computer. Then, end users w/AI browsers end up consuming these plain-text instructions that are part of your application bundle and boom, you’ve been exploited. At no point was any “malicious code” written by a bad actor “executed” by the browser engine itself. Rather, it’s the bolted on AI agent running alongside the browser engine that ingests these instructions and does something it obviously shouldn’t. In other words: it doesn’t have to be code to be an exploit. Plain-text human language is now a weaponizable exploit, which means the surface for attacks just got way bigger. But probably don’t listen to me. I’m not a security expert. However, every day that voice in the back of my head to pivot to security gets louder and louder, as it’s seemingly the only part of computer science that gets worse every year . Reply via: Email · Mastodon · Bluesky npm supply chain attacks AI browsers

0 views
Karboosx 2 weeks ago

Use OTP instead of email verification link

Why are we still forcing users to click annoying verification links? That flow is broken. There's a much smoother, simpler, and just-as-secure solution: Use OTP codes instead.

0 views
Filippo Valsorda 2 weeks ago

The Geomys Standard of Care

One of the most impactful effects of professionalizing open source maintenance is that as professionals we can invest into upholding a set of standards that make our projects safer and more reliable. The same commitments and overhead that are often objected to when required of volunteers should be table stakes for professional maintainers. I didn’t find a lot of prior art, so to compile the Geomys Standard of Care I started by surveying recent supply chain compromises to look for mitigable root causes. (By the way, you might have missed that email because it includes the name of a domain used for a phishing campaign, so it got flagged as phishing. Oops.) I also asked feedback from experts in various areas such as CI security, and from other Geomys maintainers. The first draft is below, and we’ll maintain the latest version at geomys.org/standard-of-care . It covers general maintenance philosophy, ongoing stability and reliability, dependency management, account and CI security, vulnerability handling, licensing, and more. In the future, we want to look into adopting more binary transparency tools, and into doing periodic reviews of browser extensions and of authorized Gerrit and GitHub OAuth apps and tokens (just GitHub has four places 1 to look in!). We also welcome feedback on things that would be valuable to add, for security or for reliability. We aim to maintain our projects sustainably and predictably. We are only able to do this thanks to our retainer contracts with our clients, but these commitments are offered to the whole community, not just to paying clients. Scope . We apply this standard to projects maintained or co-maintained by Geomys, including For projects where we are not the sole maintainers, we prioritize working well with the rest of the team. Geomys maintainers may also have personal projects that are not held to this standard (e.g. everything in mostly-harmless ). Code review . If the project accepts external contributions, we review all the code provided to us. This extends to any code generated with LLMs, as well. Complexity . A major part of the role of a maintainer is saying no. We consciously limit complexity, and keep the goals and non-goals of a project in mind when considering features. (See for example the Go Cryptography Principles .) Static analysis . We run staticcheck , by our very own @dominikh , in CI. Stability . Once a Go package reaches v1, we maintain strict backwards compatibility within a major version, similarly to the standard library’s compatibility promise . Ongoing maintenance . Not all projects are actively worked on at all times (e.g. some projects may be effectively finished, or we may work in batches). However, unless a project is explicitly archived or deprecated, we will address newly arising issues that make the project unsuitable for a previously working use case (e.g. compatibility with a new OS). Dependency management . We don’t use automatic dependency version bump tools, like Dependabot. For our purposes, they only cause churn and increase the risk of supply chain attacks by adopting new module versions before the ecosystem has had time to detect attacks. (Dependabot specifically also has worrying impersonation risks , which would make for trivial social engineering attacks.) Instead, we run govulncheck on a schedule, to get high signal-to-noise ratio notifications of vulnerable dependencies that actually affect our projects; and run isolated CI jobs with the latest versions of our dependencies (i.e. running before ) to ensure we’re alerted early of breakages, so we can easily update to future security releases and so we’re aware of potential compatibility issues for our dependents. Phishing-resistant authentication . Phishing is by far the greatest threat to our security and, transitively, to that of our users. We acknowledge there is no amount of human carefulness that can systematically withstand targeted attacks, so we use technically phishing-resistant authentication for all services that allow impacting our projects’ users. Phishing-resistant authentication means passkeys or WebAuthn 2FA, with credentials stored in platform authenticators (e.g. iCloud Keychain), password managers (e.g. 1Password or Chrome), or hardware tokens (e.g. YubiKeys). Critical accounts that allow escalating to user impact include: If a strict mode such as Google’s Advanced Protection Program or Apple’s Advanced Data Protection is available, we enable it. If a phishable fallback authentication or account recovery method is instead required, we configure one that is secret-based (e.g. TOTP or recovery codes) and either delete the secret or commit to never using it without asking a fellow Geomys maintainer to review the circumstances that necessitated it. TOTP can’t hurt us if we don’t use it. We never enable SMS as an authentication mechanism or as an account recovery mechanism, because SIM jacking is possible even without action on our part. Long-lived credentials . We avoid where possible long-lived persistent credentials, or make them non-extractable if possible. For example, we use git-credential-oauth instead of Gerrit cookies, and hardware-bound SSH keys with yubikey-agent or Secretive instead of personal access tokens for git pushes to GitHub. Unlike phishing-resistant authentication, we found it impractical to roll out short-lived credentials universally. Notably, we have not found a way to use the GitHub CLI without extractable long-lived credentials. CI security . We run zizmor on our GitHub Actions workflows, and we don’t use dangerous GitHub Actions triggers that run privileged workflows with attacker-controlled contexts, such as . We run GitHub Actions workflows with read-only permissions and no secrets by default. Workflows that have write permissions or access to secrets disable all use of caches (including indirectly through actions like ), to mitigate cache poisoning attacks . (Note that, incredibly, read-only workflows can write arbitrary cache entries, which is why this must be mitigated at cache use time.) Third-party access . For projects maintained solely by Geomys, we avoid providing user-impacting (i.e. push or release) access to external people, and publicly disclose any exceptions. If abandoning a project, we prefer archiving it and letting a fork spawn to handing over control to external people. This way dependents can make their own assessment of whether to trust the new maintainers. Any exceptions will be widely communicated well in advance. Under no circumstances will we release to public registration a domain, GitHub user/org, or package name that was previously assigned to a Geomys project. Availability monitoring . We have automated uptime monitoring for critical user-facing endpoints, such as the Go import path meta pages. This also provides monitoring for critical domain expiration, preventing accidental takeovers. Transparency logging . We subscribe to new version notifications via GopherWatch , to be alerted of unauthorized module versions published to the Go Checksum Database. We monitor Certificate Transparency logs for critical domains (e.g. the roots of our Go import paths) using tools such as Cert Spotter or Silent CT . We also set CAA records on those domains limiting issuance to the minimal set of CAs required for operation. Vulnerability handling . We document the official vulnerability reporting mechanism of each project, we encourage coordinated vulnerability reporting, and we appreciate the work of security researchers. We honor embargoes of up to 90 days, and we do not share vulnerability details with people not involved in fixing it until they are public. (Paying clients do not get access to private vulnerability details. This is to honor our responsibility to the various stakeholders of an open source project, and to acknowledge that often these details are not ours to share.) Once a vulnerability is made public, we ensure it is included in the Go vulnerability database with accurate credit and metadata, including a CVE number. If the documented vulnerability reporting mechanism is unresponsive, an escalation path is available by emailing security at geomys.org. Licenses . We use permissive, well-known licenses: BSD-3-Clause, BSD-2-Clause, BSD-1-Clause, 0BSD, ISC, MIT, or (less preferably) Apache-2.0. Disclaimer . This is not a legally binding agreement. Your use of the projects continues to be controlled by their respective licenses, and/or by your contract with Geomys, which does not include this document unless explicitly specified. I am getting a cat (if I successfully defeat my allergies through a combination of LiveClear , SLIT , antihistamines, and HEPA filters), so obviously you are going to get a lot of cat pictures going forward. For more, you can follow me on Bluesky at @filippo.abyssdomain.expert or on Mastodon at @[email protected] . This is the work of Geomys , an organization of professional Go maintainers, which is funded by Smallstep , Ava Labs , Teleport , Tailscale , and Sentry . Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the Geomys announcement .) Here are a few words from some of them! Teleport — For the past five years, attacks and compromises have been shifting from traditional malware and security breaches to identifying and compromising valid user accounts and credentials with social engineering, credential theft, or phishing. Teleport Identity is designed to eliminate weak access patterns through access monitoring, minimize attack surface with access requests, and purge unused permissions via mandatory access reviews. Ava Labs — We at Ava Labs , maintainer of AvalancheGo (the most widely used client for interacting with the Avalanche Network ), believe the sustainable maintenance and development of open source cryptographic protocols is critical to the broad adoption of blockchain technology. We are proud to support this necessary and impactful work through our ongoing sponsorship of Filippo and his team. https://github.com/settings/tokens and https://github.com/settings/personal-access-tokens and https://github.com/settings/apps/authorizations and https://github.com/settings/applications  ↩ the and packages in the Go standard library and the FIPS 140-3 Go Cryptographic Module (co-maintained with the rest of the Go team) Staticcheck filippo.io/edwards25519 filippo.io/csrf filippo.io/keygen filippo.io/intermediates (externalized from the standard library) age and typage Sunlight and filippo.io/torchwood yubikey-agent run govulncheck on a schedule, to get high signal-to-noise ratio notifications of vulnerable dependencies that actually affect our projects; and run isolated CI jobs with the latest versions of our dependencies (i.e. running before ) to ensure we’re alerted early of breakages, so we can easily update to future security releases and so we’re aware of potential compatibility issues for our dependents. All Google accounts linked to a Gerrit account Password manager Passkey sync (e.g. Apple iCloud) Website host Domain registrar Package registry (if applicable, although Go’s decentralized package management largely removes this attack surface) https://github.com/settings/tokens and https://github.com/settings/personal-access-tokens and https://github.com/settings/apps/authorizations and https://github.com/settings/applications  ↩

0 views
Simon Willison 2 weeks ago

Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas

My biggest complaint about the launch of the ChatGPT Atlas browser the other day was the lack of details on how OpenAI are addressing prompt injection attacks. The launch post mostly punted that question to the System Card for their "ChatGPT agent" browser automation feature from July. Since this was my single biggest question about Atlas I was disappointed not to see it addressed more directly. OpenAI's Chief Information Security Officer Dane Stuckey just posted the most detail I've seen yet in a lengthy Twitter post . I'll quote from his post here (with my emphasis in bold) and add my own commentary. He addresses the issue directly by name, with a good single-sentence explanation of the problem: One emerging risk we are very thoughtfully researching and mitigating is prompt injections, where attackers hide malicious instructions in websites, emails, or other sources, to try to trick the agent into behaving in unintended ways . The objective for attackers can be as simple as trying to bias the agent’s opinion while shopping, or as consequential as an attacker trying to get the agent to fetch and leak private data , such as sensitive information from your email, or credentials. We saw examples of browser agents from other vendors leaking private data in this way identified by the Brave security team just yesterday . Our long-term goal is that you should be able to trust ChatGPT agent to use your browser, the same way you’d trust your most competent, trustworthy, and security-aware colleague or friend. This is an interesting way to frame the eventual goal, describing an extraordinary level of trust and competence. As always, a big difference between AI systems and a human is that an AI system cannot be held accountable for its actions . I'll let my trusted friend use my logged-in browser only because there are social consequences if they abuse that trust! We’re working hard to achieve that. For this launch, we’ve performed extensive red-teaming, implemented novel model training techniques to reward the model for ignoring malicious instructions, implemented overlapping guardrails and safety measures , and added new systems to detect and block such attacks. However, prompt injection remains a frontier, unsolved security problem, and our adversaries will spend significant time and resources to find ways to make ChatGPT agent fall for these attacks . I'm glad to see OpenAI's CISO openly acknowledging that prompt injection remains an unsolved security problem (three years after we started talking about it !). That "adversaries will spend significant time and resources" thing is the root of why I don't see guardrails and safety measures as providing a credible solution to this problem. As I've written before, in application security 99% is a failing grade . If there's a way to get past the guardrails, no matter how obscure, a motivated adversarial attacker is going to figure that out. Dane goes on to describe some of those measures: To protect our users, and to help improve our models against these attacks: I like this a lot. OpenAI have an advantage here of being a centralized system - they can monitor their entire user base for signs of new attack patterns. It's still bad news for users that get caught out by a zero-day prompt injection, but it does at least mean that successful new attack patterns should have a small window of opportunity. "Defense in depth" always sounds good, but it worries me that it's setting up a false sense of security here. If it's harder but still possible someone is going to get through. Logged out mode is very smart, and is already a tried and tested pattern. I frequently have Claude Code or Codex CLI fire up Playwright to interact with websites, safe in the knowledge that they won't have access to my logged-in sessions. ChatGPT's existing agent mode provides a similar capability. Logged in mode is where things get scary, especially since we're delegating security decisions to end-users of the software. We've demonstrated many times over that this is an unfair burden to place on almost any user. This detail is new to me: I need to spend more time with ChatGPT Atlas to see what it looks like in practice. I tried just now using both GitHub and an online banking site and neither of them seemed to trigger "watch mode" - Atlas continued to navigate even when I had switched to another application. Watch mode sounds reasonable in theory - similar to a driver-assisted car that requires you to keep your hands on the wheel - but I'd like to see it in action before I count it as a meaningful mitigation. Dane closes with an analogy to computer viruses: New levels of intelligence and capability require the technology, society, the risk mitigation strategy to co-evolve. And as with computer viruses in the early 2000s, we think it’s important for everyone to understand responsible usage , including thinking about prompt injection attacks, so we can all learn to benefit from this technology safely. I don't think the average computer user ever really got the hang of staying clear of computer viruses... we're still fighting that battle today, albeit much more successfully on mobile platforms that implement tight restrictions on what software can do. My takeaways from all of this? It's not done much to influence my overall skepticism of the entire category of browser agents, but it does at least demonstrate that OpenAI are keenly aware of the problems and are investing serious effort in finding the right mix of protections. How well those protections work is something I expect will become clear over the next few months. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . We’ve prioritized rapid response systems to help us quickly identify block attack campaigns as we become aware of them. We are also continuing to invest heavily in security, privacy, and safety - including research to improve the robustness of our models, security monitors, infrastructure security controls, and other techniques to help prevent these attacks via defense in depth . We’ve designed Atlas to give you controls to help protect yourself. We have added a feature to allow ChatGPT agent to take action on your behalf, but without access to your credentials called “logged out mode” . We recommend this mode when you don’t need to take action within your accounts. Today, we think “logged in mode” is most appropriate for well-scoped actions on very trusted sites, where the risks of prompt injection are lower . Asking it to add ingredients to a shopping cart is generally safer than a broad or vague request like “review my emails and take whatever actions are needed.” When agent is operating on sensitive sites, we have also implemented a "Watch Mode" that alerts you to the sensitive nature of the site and requires you have the tab active to watch the agent do its work . Agent will pause if you move away from the tab with sensitive information. This ensures you stay aware - and in control - of what agent actions the agent is performing. [...]

1 views