There is a document in almost every technology company's legal folder called something like "Proprietary Information and Inventions Agreement."
New employees sign it during onboarding, usually on the same day they set up their laptop and fill out their tax forms. The document states, in language that has been reviewed by a lawyer and refined over multiple drafts, that all code written by the employee in the course of their employment belongs to the company.
The company's founders read this document and feel reassured. The legal team files it and considers the matter settled. The developer signs it and forgets it exists within the week.
What nobody in that onboarding process discusses is the more interesting question: what does "ownership" actually mean for software in 2026, and under what conditions does it hold?
The answer, for a growing number of organisations, is that it holds less firmly than anyone assumed. The mechanism eroding it is not a competitor, a disgruntled employee, or a legal challenge. It is a browser tab with a chat window that the company's own developers opened of their own free will.
The Number That Should End the Conversation
Research compiled across multiple workforce surveys puts the figure at between 37% and 38%: the proportion of employees who admit to sharing confidential internal company data, including private source code, with AI platforms their organisations have not approved.
Admit to. That qualifier matters. The actual figure is almost certainly higher. Surveys that ask people whether they have done something that could be characterised as a policy violation reliably undercount, because a meaningful portion of respondents either do not recognise that what they did constitutes the behaviour being described, or they recognise it and decline to say so.
Apply that logic to the 37% figure. In an engineering team of twenty developers, seven or eight openly acknowledge doing this. The real number is probably ten or eleven. In a company of one hundred employees, thirty-seven admit it. The company has no mechanism to know which thirty-seven, or what they shared, or when.
What is not captured in this figure, and what matters more than the behaviour itself, is what was shared. "Confidential internal company data" is a category that includes meeting notes, customer lists, financial projections, and HR records. Source code is specifically called out in the research. Source code that represents the technical implementation of the company's product, the logic that determines how it behaves, the architecture decisions that took years of iteration to arrive at.
The company that had those proprietary information agreements signed and filed confidently believes it owns that code. The code is in its repository. The developers are on its payroll. The legal paperwork is in order.
But the code has also passed through the servers of at least one commercial AI provider, under terms that most organisations have never read, on an account that carries no data processing agreement, potentially in a jurisdiction where the company has no legal presence and no enforcement rights.
The ownership claim has not been extinguished. But it has been complicated in ways that the proprietary information agreement was never designed to address.
What IP Ownership Actually Requires
Intellectual property protection for software is not a single thing. It is a stack of overlapping legal mechanisms, each with different conditions and different vulnerabilities.
Copyright arises automatically. From the moment a developer writes original code, it is protected by copyright. No registration required. This sounds comprehensive, and for the specific expression in the code it mostly is. Copyright protects the exact way something was written. It does not protect the underlying approach, the algorithm, or the business decision embedded in the code. A competitor who independently arrives at a similar implementation through different code does not infringe the copyright.
This limitation is why trade secret law matters more to most technology companies than copyright. Trade secret protection covers the algorithm itself, the architectural choice, the performance optimisation that took three months to get right. It covers the things that make the product work better than an alternative, independent of how they happen to be expressed in code on any given day.
But trade secret protection has a condition that copyright does not. It requires the owner to take reasonable measures to maintain secrecy. The protection exists because and only because the information was kept confidential. The moment it is disclosed to a third party without adequate protections, trade secret status is gone. Not weakened. Gone. Permanently.
This is not a technicality. It is the structural foundation of the entire legal concept. Trade secrets are valuable because they are secret. Once they are not, the law has nothing left to protect.
So the question becomes: does submitting proprietary source code to a free-tier commercial AI service, under terms of service that permit the provider to use that content to improve its models, without a data processing agreement, and without any confidentiality obligation binding the provider, constitute disclosure of the trade secret?
No court has delivered a definitive ruling on this specific fact pattern as of early 2025. But the academic and legal consensus forming around the question points in a consistent direction. The reasonable measures standard will be applied to what the company actually did to protect its code, not to what its proprietary information agreement said about confidentiality. A company with no AI usage policy, no monitoring, and no technical controls, whose developers were systematically submitting code to unapproved services, will face a challenging argument that it took reasonable measures.
The proprietary information agreement the developer signed does not help here. That agreement creates obligations for the developer toward the company. It does not create obligations for the AI provider. It does not prevent the disclosure from occurring. It does not restore trade secret status after the disclosure has happened.
The Ownership Myth Has a Second Layer
Assume for a moment that the trade secret question is resolved in the company's favour. There is still a second problem with the IP ownership claim, and it runs in the opposite direction.
Not all of the code in the company's repository was written by human developers. An increasing portion of it was generated by AI tools. Research from GitHub's Octoverse report suggests that in some codebases, AI tools contributed to nearly half of the committed code. For individual features or modules, the proportion can be higher.
Copyright law in most major jurisdictions requires human authorship. The US Copyright Office has been explicit: works generated by AI without sufficient human creative contribution cannot be registered and do not receive copyright protection. Several court decisions have reinforced this position. The UK and EU are working through equivalent questions without yet reaching settled conclusions, but the direction of travel is consistent.
What this means for a company that has been using AI tools for development without governance is that a meaningful portion of what it calls its intellectual property may not attract copyright protection at all. The code exists. The company can prevent its employees from copying it. But if a competitor independently wrote similar code, the copyright claim that would normally give the original company a basis for enforcement may not exist.
The company does not know which portions of its codebase were substantially AI-generated without significant human creative input. It probably has not thought to ask. The proprietary information agreement does not address the question. The developer onboarding process does not address it. The security policy does not address it.
The code sits in the repository, covered by a legal claim that may be thinner than anyone in the company has been told to consider.
The Third Problem: The Code That Came Back Different
There is a dimension to this that operates entirely in the background and that no policy document, however well-crafted, can retroactively address.
AI models trained on large datasets of code learn patterns. Those patterns influence the output the model generates. When a developer submits proprietary code and the model incorporates what it learned into its weights, subsequent users of the model may receive suggestions that statistically reflect the patterns in the submitted code, without any direct copying occurring, without any deliberate action by the provider, and without any ability for anyone to trace the connection.
A competitor using the same AI service and asking for suggestions on a similar technical problem may receive output that happens to align with the original company's proprietary approach. Not because anyone copied anything. Because the model generates statistically likely outputs based on everything it has learned, including, potentially, the code that the original company's developers submitted through a free-tier account.
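To make the mechanism concrete, the toy sketch below uses a character-level Markov chain as a deliberate oversimplification of how a model absorbs patterns. Production models work very differently, and the snippet, function names, and constant in the example are invented purely for illustration. The narrow point it shows is that once the statistics of submitted text are inside a model, fragments of those patterns can surface in output for another user without any copy of the original being stored or looked up.

    # Toy illustration only: a character-level Markov chain standing in for
    # a large model. The "submitted" snippet and its identifiers are made up.
    import random
    from collections import defaultdict

    def train(corpus: str, order: int = 4) -> dict:
        """Record which character tends to follow each short context."""
        model = defaultdict(list)
        for i in range(len(corpus) - order):
            context = corpus[i:i + order]
            model[context].append(corpus[i + order])
        return model

    def generate(model: dict, seed: str, order: int = 4, length: int = 60) -> str:
        """Emit statistically likely continuations of the seed text."""
        out = seed
        for _ in range(length):
            choices = model.get(out[-order:])
            if not choices:
                break
            out += random.choice(choices)
        return out

    # A proprietary-looking line "submitted" by one user's debugging session...
    submitted = "def rate_limit(key): return tokens[key] > BURST_THRESHOLD\n" * 20
    model = train(submitted)

    # ...later resurfaces, statistically, in a completion offered to someone else.
    print(generate(model, seed="def "))

Nothing in that sketch copies or retrieves the submitted text; the output simply reflects what the model learned, which is exactly why the leakage is so difficult to trace.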
The legal framework for addressing this scenario does not yet exist in any jurisdiction. The harm is real: the competitive advantage embedded in the IP has been diminished. But the mechanism does not fit into trade secret misappropriation, because the competitor did not misappropriate anything. It does not fit into copyright infringement, because no copying occurred. It is a form of value erosion that the existing IP framework was not designed to capture.
Companies that believe their IP ownership claim is solid because they have the paperwork in order have not reckoned with the possibility that the value protected by that paperwork is being quietly redistributed through the normal operation of tools their own employees chose to use.
What the Myth Costs in Practice
The IP ownership claim is tested hardest precisely when it matters most. Three situations make this concrete.
The first is acquisition due diligence. When a company is being acquired, technical due diligence includes IP chain of title: can the company demonstrate that it owns what it says it owns, free of encumbrances, with no third-party claims? A due diligence team that asks about AI tool usage and receives a vague answer, or that discovers through developer interviews that free-tier tools were used extensively without governance, will flag the IP as uncertain. Uncertain IP at an acquisition reduces valuation, introduces indemnification requirements, or in some cases causes deals not to close.
The second is enterprise customer onboarding. Sophisticated enterprise customers in financial services, healthcare, and government require vendor assessments that increasingly include questions about how development tools are governed. A vendor that cannot demonstrate clean AI governance is becoming a vendor that does not pass the assessment. The contract goes elsewhere.
The third is litigation. If the company ever needs to assert its IP rights in court, the history of how that IP was treated matters. A company that systematically allowed its most sensitive code to pass through unapproved AI services, with no policy and no monitoring, will have this fact used against it in any proceeding where it claims to have taken reasonable measures to protect the IP.
The myth costs nothing until one of these situations arises. When they arise, it costs a great deal.
What Organisations Are Getting Wrong in Their Response
The most common response when organisations become aware of this problem is to reach for a policy document. Write something that says "do not share proprietary code with unapproved AI services." Have the legal team review it. Send it to all staff. File it.
This response treats the problem as a communication gap. The developers did not know the rule. Now they do. Problem solved.
The problem is not a communication gap. It is a structural one. Picture a developer at eleven at night: production is down, a specific function is failing in a way that is not obvious, and there are two paths available. Path one: open the AI tool that solves this class of problem in two minutes. Path two: check whether the code involved falls into the category that requires an approved tool, determine which approved tools are available for that category, ensure the account type is enterprise-grade, submit the query through the approved channel, and wait for the response.
The developer will take path one. Not because they are careless. Because they are solving a real problem under real pressure, and path one works. The policy document exists in a different mental context than the production incident. They do not intersect at eleven PM.
Governance that actually changes behaviour combines written policy with the right tools at the right account tiers, so that the path of least resistance and the compliant path are the same path. A developer with access to a properly configured enterprise AI account, with clear guidance that this is the approved tool for this category of work, does not face a choice. They use the tool. The usage is governed. The IP is protected.
The companies that solve this problem are the ones that remove the friction from the right choice rather than adding friction to the wrong one.
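In concrete terms, that usually means putting an approved route in front of developers and making it the shortest path. The sketch below is illustrative only: the gateway URL, environment variable, and response format are hypothetical placeholders, and the gateway itself is assumed to enforce the enterprise account, the usage logging, and the data processing terms the organisation has actually negotiated.

    # Minimal sketch of an "approved path" helper for AI queries.
    # All names are hypothetical: ai-gateway.internal.example stands in for
    # whatever enterprise endpoint the organisation has approved.
    import json
    import os
    import urllib.request

    APPROVED_GATEWAY = os.environ.get(
        "AI_GATEWAY_URL", "https://ai-gateway.internal.example/v1/chat"
    )

    def ask(prompt: str, *, purpose: str) -> str:
        """Send a prompt through the approved gateway, never directly to a
        public free-tier service. The gateway is assumed to hold the
        enterprise account and log the request."""
        payload = json.dumps({"prompt": prompt, "purpose": purpose}).encode()
        request = urllib.request.Request(
            APPROVED_GATEWAY,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["answer"]

    if __name__ == "__main__":
        # At eleven at night this is one import and one call, which is the
        # whole point: the compliant path is also the shortest one.
        print(ask("Why does this retry loop deadlock?", purpose="incident-debug"))

The specifics will differ from one organisation to the next; the design point is that the governed call is the easy call, so it is the one that actually gets made under pressure.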
The Report
The Invisible Risk is Bithost's research report on this topic. It covers the mechanics of how AI tools transmit and retain data, the legal framework for IP protection and where it holds and where it does not, the governance gap in organisations with 10 to 200 engineers, the regulatory exposure under GDPR and the EU AI Act, sector-specific risk profiles, and what a practical governance programme looks like.
It is built for technical leaders and founders who need to understand this clearly enough to make decisions, not for legal specialists who already do.
The report includes a self-assessment scorecard. Fifteen minutes with that scorecard gives a clearer picture of where your organisation sits than most legal teams have ever been asked to provide.
To request the full report, click here.
To speak with the Bithost team about an assessment for your organisation, write to sales@bithost.in.