Subversion Obliterate, the forgotten feature
Sometimes it is best to know beforehand some of the more obscure features and limitations of a software product, before diving in. One such obscure limitation is the infamous Obliterate feature of the version control software Subversion, which people only learn of when they discover it is not there. Will you need it? Most likely yes, at some point. Especially if you are dealing with a large codebase, you will want to be wary of Subversion, however great its existing features are. Read on.
In a nutshell: Subversion is Open Source, therefore it is free; it is time-tested; and it is actively developed. For many, this alone is enough reason to use Subversion for their version control needs. It is the supposed successor to CVS and therefore boasts most of the features CVS had, only better, and more besides. According to the feature list on the Subversion homepage:
Most current CVS features.
- Subversion is meant to be a better CVS, so it has most of CVS’s features. Generally, Subversion’s interface to a particular feature is similar to CVS’s, except where there’s a compelling reason to do otherwise.
This sounds really good, and this is only one item of the list. Yet still, one feature is missing. Can you guess it?
The need to Obliterate
Obliterate means ‘really removing stuff from the repository, seriously’. When you check in files and at some point decide you need them removed, you would normally just use Subversion’s delete command, either manually on the command line or through a tool such as TortoiseSVN. However, that does not really delete them from the repository; they are only marked as deleted. Now why would you want to completely obliterate anything from a repository, including its version history?
There are a couple of situations that might arise:
- You accidentally checked in sensitive information and need it removed
- You are running a large codebase and need to archive old repository entries to free up disk space
- You need to split a repository into several others for some reason; maybe a project needs to be split up
- You accidentally checked in large chunks of (unused) data (perhaps some .iso was somehow included), which are sitting there bloating your repository for no good reason
All these cases have already happened. Most notably, the first case happened with the Apache codebase, where third-party code that Apache did not hold the intellectual property rights to was checked into the Subversion repository. They simply couldn’t get it out, because there was no command like Obliterate. Deleting it the normal way wasn’t enough, since you could still get to the pre-deletion version. There is no way, at all, to accomplish this… save for one workaround that is not bulletproof (we’ll come to that).
These are not use cases I came up with. They have been suggested many times already and the Subversion dev team knows about them. In fact, and here’s something that will blow your mind, they have known about these problems for 7 years. Say what? Take a look at the Obliterate feature request in the Subversion issue tracker. All these use cases have been discussed and summarized there, and are acknowledged as valid grounds for implementing the feature. Notice the first post, from October 2001. That is when it was first ‘officially’ reported as an issue.
Also take a look at the feature requests on the mailing lists.
The broken workaround
There exists a feature in Subversion called ‘dump’ (svnadmin dump). It allows you to dump the entire repository into a single file. Then there is the feature ‘load’ (svnadmin load), which can fill an existing (empty) repository from such a dump file. The workaround revolves around a third feature called ‘dumpfilter’ (the svndumpfilter tool). With this, you can filter a dump file and so remove files that match a path-based pattern.
This works up to a certain point:
- It is cumbersome, and behaves inconsistently across environments (related to filesystems and trailing slashes)
- It is very slow (can end up in tens of hours for very large codebases)
- It does not always work (the infamous ‘Invalid copy source path’ error)
So while it is very nice to have at least some workaround to fall back on, it sometimes simply isn’t a viable option. At least with CVS you could dive into the repository itself and remove the files manually. Not so in Subversion (though I’m not implying CVS is a better alternative).
7 years, where the hell is it?
That’s the billion dollar question. Or to be more accurate, that’s the X dollar question, where X is the bounty that some suggested should be offered for whoever gets the feature in (cit.). Apparently the problem lies in the nature of Subversion’s filesystem; working in an Obliterate command supposedly requires tons of requirement specifications, design reports and even more debates on these before even a capable programmer could be put on the matter, with a salary from a sponsor. All this may be true, but… let me reiterate:
Apparently Subversion’s codebase is so complex that it simply hasn’t been accessible enough for the public to come up with a patch in all this time. Remember that Subversion has been one of the accepted de facto standards for revision control for many years; it should be one of the more prolific Open Source projects running to date. Yet after seven years there has been no activity on an Obliterate command, save a whole lot of brain activity. In fact it apparently is so inaccessible that the Subversion development team itself doesn’t feel confident it can pull it off without rewriting the filesystem (cit.). I’m not trying to trivialize the situation; it’s no small task and documentation certainly is important. The truth is that the issue has been dead in the water for many years, while question marks keep popping up around the web.
How is it possible that such a widely acknowledged feature is still missing after all these years? Is it perhaps already planned? Let’s see what Subversion’s (official) standpoint on this is:
- It is not on the roadmap, medium or long term
- The Subversion guide (note 35) (SVN Guide by red-bean) mentions that this feature is planned, though posts on the web suggest that note has been in there since the beginning of time
- The issue tracker promises no timeframe and in fact states the development team is reluctant to do anything until design documents have been produced
Finally there is the ubiquitous FAQ entry about version history removal that states:
How do I completely remove a file from the repository’s history?
There are special cases where you might want to destroy all evidence of a file or commit. (Perhaps somebody accidentally committed a confidential document.) This isn’t so easy, because Subversion is deliberately designed to never lose information. Revisions are immutable trees which build upon one another. Removing a revision from history would cause a domino effect, creating chaos in all subsequent revisions and possibly invalidating all working copies.
The project has plans, however, to someday implement an svnadmin obliterate command which would accomplish the task of permanently deleting information. (See issue tracker, issue 516.)
In the meantime, your only recourse is to svnadmin dump your repository, then pipe the dumpfile through svndumpfilter (excluding the bad path) into an svnadmin load command. See chapter 5 of the Subversion book for details about this.
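The recipe from the FAQ can be sketched roughly as follows. This is my own minimal Python wrapper, not anything that ships with Subversion: the function names are hypothetical, and the only real commands used are svnadmin dump, svndumpfilter exclude and svnadmin load. It assumes the target repository has already been created (empty) with svnadmin create, and that all three tools are on your PATH.

```python
import subprocess


def build_filter_commands(old_repo, new_repo, excluded_paths):
    """Build the three stages: svnadmin dump | svndumpfilter exclude | svnadmin load.

    Hypothetical helper; it only assembles argument lists, it runs nothing.
    """
    if not excluded_paths:
        raise ValueError("give at least one path to exclude")
    return [
        ["svnadmin", "dump", old_repo],
        ["svndumpfilter", "exclude", *excluded_paths],
        ["svnadmin", "load", new_repo],
    ]


def run_pipeline(commands):
    """Chain the commands stdout-to-stdin like a shell pipe.

    Raises RuntimeError if any stage exits with a non-zero status.
    """
    procs = []
    prev_out = None
    for index, argv in enumerate(commands):
        is_last = index == len(commands) - 1
        proc = subprocess.Popen(
            argv,
            stdin=prev_out,
            stdout=None if is_last else subprocess.PIPE,
        )
        # Close our copy of the pipe so the upstream stage notices
        # if the downstream stage exits early.
        if prev_out is not None:
            prev_out.close()
        prev_out = proc.stdout
        procs.append(proc)

    exit_codes = [proc.wait() for proc in procs]
    for argv, code in zip(commands, exit_codes):
        if code != 0:
            raise RuntimeError(f"{' '.join(argv)} failed with exit code {code}")


# Example (paths are placeholders; the new repository must already exist):
# run_pipeline(build_filter_commands("/srv/svn/old", "/srv/svn/new", ["trunk/big.iso"]))
```

The dump is streamed from one stage to the next instead of being written to disk first, which matters for a large repository. Afterwards the old repository is still untouched, so you can compare the two before switching over.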
It all seems a case of a design flaw that is almost irreparable due to a complex filesystem implementation and seven years of labor on this implementation.
What to do now
There seems little we can do. Try downloading the source code and see if you can make sense of it all. Be aware, even the core developers are afraid to touch it without refined documentation backing them up. Maybe go along with the bounty idea and attract Cowboy Coders that lack the compulsive need for documentation.
Meanwhile, take a look at the comparison list of revision control software.
Finally, here’s a semi-interview from 27 February 2008 I found, where Karl Fogel states what needs to be done to get the ball rolling.
Thank you for drawing more attention to this problem (maybe it will encourage people to try to solve it). I think you may have misanalyzed, though.
The delay is not due to the supposed complexity of the codebase. Subversion’s repository code is not *that* horrifyingly complex :-). I mean, sure, you need to know a thing or two, but grokking the code is not the real obstacle.
The real obstacle is that no one has stepped up to design this and see it through to completion. Figuring out the desired behavior (with enough specificity to actually implement) is what’s hard here, not the implementation itself.
You misunderstood the citation that said “… the ability to permanently remove a file, dir, or revision from history forever. This means rewriting the whole filesystem; a big deal. …” What that commenter meant was that Subversion would have to rewrite repository data when an ‘obliterate’ is performed, not that programmers would have to rewrite the filesystem code in order to implement ‘obliterate’! (Even then it’s not completely true, and in any case it was carelessly worded; sorry you got misled.)
Why haven’t we implemented ‘obliterate’? Well, because you haven’t :-). That is to say, there is no “we” here; work gets done by those who do it. Each person working on Subversion has their own reasons for doing it: some are paid full-time, some are paid part-time, and some are volunteering their time. In the first two categories, you can expect that those paying the bills will have something to say about their developers’ priorities, and so far they haven’t seen ‘obliterate’ as a compelling enough feature to fund. The volunteers apparently have other itches to scratch.
But there can be new volunteers! We’d all like to see the feature happen. We’re willing to help with the design. (It’s true that I recently recommended discussion wait until after 1.5, but that’s a temporary thing, and certainly hasn’t applied for the last seven years.)
I do think this comment was gratuitous:
“There seems little we can do. Try downloading the source code and see if you can make sense of it all. Be aware, even the core developers are afraid to touch it without refined documentation backing them up. Maybe go along with the bounty idea and attract Cowboy Coders that lack the compulsive need for documentation.”
What are you talking about? The core developers, and even occasional patch contributors, touch that code all the time. Just run ‘svn log’ on our repository to see. The idea that fear of changing the code is somehow responsible for the delay is… how can I say this politely? … absolutely wrong.
The code is not the obstacle here, the behavioral specification is. (There have recently been some good suggestions for incremental implementation requiring a less detailed spec; they would still require fleshing out, but they hold promise.)
I guess the obvious question is, do people really want this feature? I don’t want it.
I think most publicly reporting companies wouldn’t be able to use Subversion anymore without the audit trail that Subversion provides. If people could just overwrite bits whenever they wanted to, what good is the audit trail? So much for SOX compliance. Everything that could materially affect the finances must have controlled access, including the software.
A dump/load cycle, and hacking the dump, is a reasonable workaround when things go disastrously wrong. It is an extreme measure, but it is for extreme cases.
Your SOX compliance vanishes in a puff of smoke with ‘rm -rf repo’. Being able to remove items from the repository pales in comparison to the power of shredding the file system, which in turn can be bypassed with a sledgehammer to the hard disk. Besides, you *can* dump the repository and filter the files anyway. It’s just harder and error-prone.
> do people really want this feature? I don’t want it.
I guess you haven’t worked in the same SVN repos with non-technical people.
After a consultant had been working two years against our svn, I noticed that she had added gigabytes of MS-installation packages into it, with a version number in the file names (her way to version them)… I want to obliterate them.
“Subversion is Open Source, therefore it is free.”
Sorry to point this out… but open source quite literally means that the source code is open and available (to all or some people). It does not necessarily mean that it’s free.
“Open source. Seven years.”
A commercial company can still create a product that is open source, and still charge for it… please don’t try to link the phrase “open source” as a label for software that is only developed by a community of (hard working) developers, for free.
If you don’t like the fact that subversion is lacking a single feature, then perhaps you should try some other product… possibly git (not sure if that has this feature)… alternatively, you could try paying for SourceSafe (and if you do, I look forward to the complaints for something you have paid for – like the issues with file locking).
I found your post pretty useful and informative, but if you ask for my personal stand on this, the feature is not that big a requirement: I have been using Subversion for a long time now and have fixed such problems using dump and dumpfilter.
It’s a good-to-have feature, but not a feature to crib about.
You may differ, but that is what I feel.
Karl, thank you for clarifying some of the issues I’ve mentioned, that is after all what this is about.
“Why haven’t we implemented ‘obliterate’? Well, because you haven’t :-). “
I actually thought of that and I’m not sure where I stand on this. In some way it feels a little bit like saying “Politicians are always right, because you elected them [and they represent you]”, but that of course doesn’t compute. The truth is that the vast majority of developers out there, who do not have the spare time or the budget to work on Open Source projects, are represented by the enlightened souls who *are* in the fortunate position to spend a lot of time on it. As such a bystander of Subversion’s OS community, I can only conclude that the reason it is not implemented isn’t that I (‘We’) haven’t done so, but that it seems an impossible task, as evidenced by the many years that have gone by and the fact that Subversion is a popular framework.
Relatively speaking, it’s hard to believe that Subversion, being as popular as it is and with seven years of opportunity, didn’t attract enough OS developers who might’ve implemented it. If it did, which I think it did, it apparently is too hard to accomplish. If it didn’t, then apparently the feature isn’t as wished for as I would’ve thought.
Karl said: “The code is not the obstacle here, the behavioral specification is. (There have recently been some good suggestions for incremental implementation requiring a less detailed spec; they would still require fleshing out, but they hold promise.)”
That sounds promising, do you have some reference where we might find this?
I agree the feature is good to have and obviously not critical. But just in case you are in a tight spot and really need it removed and the dumpfilter isn’t an option, it would be *really* nice to have it. In terms of larger companies with large codebases (I’ve worked for one), this is a risk that needs weighing… and as evidenced even by some contributors on the issue 516 tracker, it can be a showstopper.
All open source software is free software, and vice versa. The two terms are effectively synonymous. Any software license that required people to pay when they copy the software would not meet the Open Source Definition and would not be open source. “Open Source” does not merely mean “You can view the source”. It means “You can view, modify, redistribute, and redistribute modified versions of the source.” It means no one gets a monopoly on distribution and modification, not even the original authors or copyright holders.
There are hundreds of proposed features in Subversion for which “years have gone by” and yet they are still not implemented. Would you conclude from this that they must all be really difficult (“impossible”)? Or would you conclude that those with the inclination or funding to work on Subversion simply chose to work on other tasks, for a variety of reasons?
I don’t have time to dig up the archived thread from the dev@ list, unfortunately. If I were actively working on this, I would, but I have other commitments in Subversion myself right now. It’s recent, though; you can find it if you look for it.
@ Karl Fogel
You seem to be referring to a specific open source licensing model.
The term is more general than that… for example, I create software for companies, they pay me for that service… the code is open (aka “Open Source”) and available for them, but is _not_ available to other companies, and it is not free (I’ve got to earn money for the mortgage). An alternative would be distributing the binary version of the software (aka “Closed Source”).
Don’t assume that anything “Open Source” falls under licensing that means that it is free… and saying otherwise is doing a great disservice… I personally think we should go back to the way software used to be distributed when mainframes were in use… you got the software and the source code, and you could modify it yourself… but ultimately, you paid for the software in the first place… I’m not saying that free open source software should not exist (hell, I use it every day), but it does not mean that a company cannot make money from having the source open… this is where I think companies like Microsoft are causing problems, as it makes it virtually impossible to work with them (networking stack, etc).
It’s a good feature. Maybe instead of obliterate, you could implement hide. That way, it remains in the database and yet isn’t displayed to anyone who doesn’t have access to the database.
I was planning to apply to Subversion in this year’s Summer of Code to attempt to fix some of the SVN annoyances I encountered during last year’s. Maybe I’ll add obliterate to the list. Though your talk about the complexity makes me think of reconsidering (of course, I have a very small chance of being accepted, so why not?).
BTW, open-source is free as in libre but not necessarily as in gratis. GNU itself states that you can charge for free (open-source) software, but you can’t charge extra for the source code. I don’t know of any projects that actually do this, but it’s theoretically possible.
I quote the first two comments, look at the date:
Some people want the ability to permanently remove a file, dir, or revision from
history forever. This means rewriting the whole filesystem; a big deal. It
would also break existing working copies.
But hey, some folks want security. 🙂
——- Additional comments from Ben Collins-Sussman Thu Oct 4 10:56:37 -0700 2001 ——-
1 week estimated, post-1.0
I believe this article was VERY INFORMATIVE and really hits the nail on the head. Believe me, this feature is wanted (I wouldn’t even call it a feature, it is basic design if you ask me)
and for the record @TOM
“A dump/load cycle, and hacking the dump, is a reasonable workaround when things go disastrously wrong. It is an extreme measure, but it is for extreme cases.”
this is nonsense, and if this horrible “workaround” didn’t exist we might have an existing obliterate today.
Not that I’m impressed a lot, but this is more than I expected when I found a link on Delicious saying that the info here is quite decent. Thanks.
Thanks for posting this. I’m really surprised that no one has wrapped the dump/dumpfilter/load command as a little batch file and called it obliterate. The procedure seems to work 80-90% of the time and could be fixed to work all of the time.
I don’t see why the workaround couldn’t become the sanctioned way to do this.
As an open source developer, I think it’s ridiculous when someone uses errr… abuses the term “open source” to absolve responsibility.
If you’re lucky enough to have a community and they are pretty unanimously requesting a feature, there’s no excuse not to implement it OTHER THAN that the code is too big of a mess. Fine, rewrite it. Don’t just say “Hey, it’s open source. Why don’t you download it and add the feature.” That’s just ridiculous.
I’m also wondering why the SVN folks decided not to implement this from the beginning. I mean, who would decide *not* to implement a feature like this? One has to wonder if it was just a huge, gigantic, mother of all “woops”.
I need this feature right now. When can it be ready? And no I won’t use the 75% chance of success batch process.
Creativity instead of moaning might do the trick. The number one use case (for me) is the need to remove weighty old content – not old source. I like old source code. No need to DELETE a revision POINTER – Modify revision tree with 0 length files and flag it, allowing UI’s downstream to filter or note the change. This is software, right? Once obliterate is in, system admins should be able to disable it if their auditing requirements call for such. It isn’t like, even without the feature, people are prevented from bypassing auditing requirements by deleting the entire repository. Sorry for all the obfuscatory double negatives in the last sentence, just wanted to blend in with the thread.
That was sheer nonsense.
Just a very quick note to say that development for Obliterate is ongoing and is going well.
Julian Foad blogged about his progress here:
He’s also been updating our wiki here:
Benny Bottema Post author
This is excellent news indeed, thank you for the heads up!
Benny Bottema Post author
Alas, it seems Obliterate has been dumped off the wagon. That’s a bit harsh on my side, since there actually has been an effort to get some work done on this feature. For now though (that’s optimistic), the work proves to be too difficult to implement (read: integrate) in the existing code base.
Release status (from the roadmap):
Remove obliterate code – Not Started: The obliterate feature work made only extremely minimal progress, and does not carry enough practical usefulness to warrant release. See this thread for discussion.
You gotta love their sense of humor though:
Some of the problem cases would be addressed with a much simpler way to “prevent access to a revision”. Our typical problem case is that we committed a revision with a problem, or that we committed a revision marked as a new release version and decided that it should *not* be the release version (needs an extra touch-up / menu word change / log notes / etc.) Any takers?
But why would we implement a feature in a proprietary code base and not get paid?
Today we implement and tomorrow you start charging for svn because you can.
It may be open source, but it’s not gpl.
So anyone who really cares, use git.
Even the ability to obliterate/roll back the top, most recent commit into the repository would be welcome. I know that this isn’t the same as fixing a repository inflated several years ago by the inclusions of gigabytes of porn, but then it also shouldn’t be the same level of difficulty.
Rollback is the standard base level behavior we expect from most of our software. And as I read the repeated requests, the most common request is from people who want to do just this simple task.
Hey, SVN is also a haven for users who are unwanted in other repos.
As a doco, I use Subversion with a Madcap Flare front-end as a “Poor Man’s CMS”. It is excellent in that respect. I use SVN because my git-using engineering colleagues scorn my binary blobs, unwanted in their compact, unicode repo. [As long as I have a final-format file in an e-library somewhere, SOX is satisfied.]
I don’t really need to keep old screenshots, videos, and AI files forever, and some of my blobs are very big indeed, so I run out of repo space. Every year or so, I export my files, obliterate the old repo, archive a minimum set, and then reload the working files into a new repo.
Maybe they could just make it easier to obliterate final-format and other binary files.
I’ll propose an alternative theory here: ‘svn obliterate’ has not been implemented even after all of these years because dumpfilter, as painful as it is, is a good enough solution.
The only thing obliterate would add over dumpfilter is user-friendliness. However, the software development community’s experience with git has taught us that rewriting history on a collaborative project is a decision *not* to be taken lightly and should be done in serious emergency situations, only. How many leaders of projects stored in git accept non-fast-forward pull requests, despite git’s full support of history rewriting? Roughly zero, the last I checked. And that’s because rewriting history in a multi-user repository is a really bad idea, 99.999% of the time.
For that 0.001% of the time, we have dumpfilter, and that’s good enough. That’s why there is no obliterate, and there probably never will be.
correct the link
I read through all these postings with great interest and pleasure because I learned a lot about Open Source projects, the concerns of the SVN users and contributors, and the intricacies of such an undertaking as a whole.
Let me just add my 2 cents to the perspective of this discussion: I eventually got sick of the issue of backing up (and even more so: restoring) data on my Win7 laptop (I’m an IT consultant). So I finally installed an SVN server on my laptop. Voilà, all I have to do is back up the root directory where all my SVN repositories reside: a foolproof no-brainer. I’m totally, totally happy with this idea.
In such a scenario you do make mistakes and put things under version control that you shouldn’t have (e.g. pictures from a company event).
If more people embraced such a use case, the need to implement the “obliterate” feature would seem much more pressing than it does now.
Having wasted days trying to obliterate some folders and getting the dreadful svndumpfilter’s Invalid copy source path error, I’ve stumbled across a pretty awesome tool called Subdivision that managed to obliterate the required folders in one pass. Essentially you choose what files or folders you want to obliterate and it seems to analyze the repository structure and work out what other files or folders need to be obliterated as well and which files or folders need to stay so that the filtering succeeds. It can also extract files or split a repository in two parts. Here’s the website in case you can’t find it http://subdi.vision
Look at that. Subdivision seems to do the trick just fine.
It’s funny how much you can accomplish when you have paid developers who are actually motivated to try.
I understand the problem: Subversion isn’t designed to “forget”, and therefore it isn’t easy to remove a revision, a file in the history of differential changes, etc.
All use cases would be solved if svn obliterate just emptied the file and all connected files before and afterwards. That means you would see that it is there, that it was there, that it was changed, that it was moved and who committed it, but the content would be 0 bytes.
I’m suffering right now while trying to remove an iso file from a repo. Unfortunately it has already been in there for a couple of months and hundreds of commits :(
Subdivision Free Edition did the job.