Ask how one can create a Debian mirror and you will get
a dozen different responses. Those who are used to mirroring content, whether it is distribution packages, software in general, documents, etc., will usually come up with one answer:
rsync.
Truth is: for Debian's archive, rsync is not enough.
Due to the design of the archive, a single call to rsync leaves the mirror in an inconsistent state during most of the sync process.
The explanation is simple: index files are stored in dists/, package files are stored in pool/. Sort those directories by name (just like rsync does) and you will see that the indexes are updated before the actual packages are downloaded.
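To make the ordering problem concrete, here is a toy model in Python. It is only a sketch: the package names are made up, each directory is reduced to a set of file names, and the "index" is simply the set of packages it references. It does not call rsync at all.

```python
# Toy model: each tree maps a top-level directory to a set of file names.
# The index in dists/ is modelled as the set of package files it references.
upstream = {
    "dists": {"hello_2.0.deb"},  # index already points at the new version
    "pool": {"hello_2.0.deb"},
}
mirror = {
    "dists": {"hello_1.0.deb"},  # index still points at the old version
    "pool": {"hello_1.0.deb"},
}

def is_consistent(tree):
    """A tree is consistent when every package named by the index is in pool/."""
    return tree["dists"] <= tree["pool"]

def sync_in_name_order(src, dst):
    """Copy one directory at a time, sorted by name, recording consistency."""
    states = []
    for name in sorted(src):
        dst[name] = set(src[name])
        states.append((name, is_consistent(dst)))
    return states

states = sync_in_name_order(upstream, mirror)
print(states)  # [('dists', False), ('pool', True)]
```

Once dists/ has been copied, the index references a package the mirror does not have yet, and it stays that way until pool/ catches up.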
There are multiple scripts out there that do exactly that (a single rsync call), one of them in
Ubuntu's wiki. Plenty more if you search the web.
Now, addressing that issue shouldn't be so difficult, right? After all, all the index files are in dists/, so syncing in two stages should be enough. It's not that simple.
With the dists/ directory containing over 8.5 GB worth of indexes and, erm,
installer files, even a two-stage sync will usually leave the mirror in an inconsistent state for a while.
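For reference, the naive two-stage idea could look roughly like the following Python sketch, which only builds the two rsync command lines without running them. The source URL and target path are placeholders, and the option set is an assumption chosen for illustration, not what any particular mirror script uses.

```python
import shlex

def two_stage_commands(source, target):
    """Build two rsync invocations: everything except dists/, then the rest."""
    base = ["rsync", "--recursive", "--links", "--hard-links", "--times"]
    # Stage 1: fetch packages and everything else, but leave dists/ untouched.
    stage1 = base + ["--exclude", "/dists/", source, target]
    # Stage 2: fetch dists/ as well, and remove stale files only at the end.
    stage2 = base + ["--delete-after", source, target]
    return [stage1, stage2]

# Placeholder locations, assumed for the example.
commands = two_stage_commands("rsync://mirror.example.org/debian/", "/srv/mirror/debian/")
for command in commands:
    print(shlex.join(command))
```

The second stage still has to move all of dists/, which is exactly why the window of inconsistency remains.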
How about deferring only the bare minimum to the second stage? I hear you ask.
That is the
current approach, but it leads to
some errors when new index files are added and used. The fact that people insist on writing their
own scripts doesn't help.
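A small Python sketch shows why a fixed list of deferred files is fragile. The pattern list below is hypothetical, picked only for illustration, and the last file name stands in for an index type that appeared after the list was written.

```python
from fnmatch import fnmatch

# Hypothetical patterns for index files held back until the second stage.
DEFERRED_PATTERNS = ["Packages*", "Sources*", "Release*"]

def is_deferred(filename, patterns=DEFERRED_PATTERNS):
    """Return True if the file would wait for the second stage."""
    return any(fnmatch(filename, pattern) for pattern in patterns)

# File names as they might appear under dists/.
names = ["Packages.xz", "Sources.xz", "Release.gpg", "InRelease"]
result = {name: is_deferred(name) for name in names}
print(result)
```

Any index that no pattern matches is copied in the first stage, ahead of the packages it describes, and every home-grown script carries its own copy of that list to keep up to date.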
Hopefully, some ideas like
moving the installer stuff out of dists/ and
overhauling the repository layout are being considered. An alternative is to make the users of the mirrors more robust and fault-tolerant, but we would be talking about tens if not hundreds of tools that would need to be improved.
In any case, the one script that is actively maintained, rather portable, and improved from time to time is the
ftpsync script. Please do yourself and your users a favour: don't attempt to reinvent the wheel (and forget about calling rsync just once).