[devel] RFC: girar: optimize rebuild
Vladimir D. Seleznev
vseleznv at altlinux.org
Sat Apr 11 18:33:49 MSK 2020
On Sat, Apr 11, 2020 at 01:36:31PM +0300, Andrey Savchenko wrote:
> On Sat, 11 Apr 2020 02:10:42 +0300 Vladimir D. Seleznev wrote:
> >
> > Hi!
> >
> > This is the first part of the rebuilt-package optimization for girar. It
> > introduces pkg_identity() and a simple optimization for a rebuilt sourcerpm.
> >
> > pkg_identity() takes an RPM package and returns a value called the package
> > identity: a hash of a subset of the RPM package header. That subset is the
> > entire header minus nonessential artifacts such as buildhost, buildtime,
> > the header hashsum, etc.
> >
> > Two builds of the same NEVR may have equal or different package
> > identities. Equal identities mean that the build results of these
> > packages are equal too, which allows the build to be optimized. A practical
> > example, a simple optimization for a rebuilt sourcerpm, is also introduced.
> >
> > Future work could cover optimizing a sourcerpm "copied" to another branch
> > using the sourcerpm retrieved from the archive, and optimizing binary
> > packages (this case has an issue when binary subpackages have mixed
> > archs, i.e. arch and noarch; it could probably work only with single-arch
> > builds).
> >
> > Please review and discuss.
>
> I see two problems with the proposed approach:
>
> 1) It assumes there will be no pkg_identity hash collisions. This
> is wrong. They may occur sooner or later and the code *must*
> deal with such collisions correctly. Remember what happened to
> subversion when a collision occurred in a repository, while git was
> resilient.
Any hashsum function has collisions by definition. The only way to avoid
them is not to use hashsums.
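
To put a rough number on the accidental case, here is a minimal sketch of
the standard birthday bound, p <= n(n-1)/2^(b+1), for an ideal b-bit hash.
The package count of ten million is an illustrative assumption, not a
figure taken from girar, and the bound says nothing about deliberately
constructed collisions.

```python
from fractions import Fraction


def collision_probability(n_items: int, digest_bits: int) -> Fraction:
    """Upper bound on the chance that any two of n_items share a digest.

    Union bound over all pairs, assuming an ideal hash whose output is
    uniformly distributed over 2**digest_bits values.
    """
    pairs = n_items * (n_items - 1) // 2
    return Fraction(pairs, 2 ** digest_bits)


# Hypothetical workload: ten million distinct package headers, 256-bit digest.
p = collision_probability(10_000_000, 256)
print(f"upper bound on accidental collision probability: {float(p):.3e}")
```
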
> As the proposal stands, an identity hash collision will lead to
> a degraded repository at best and a broken one at worst.
No, it will not, because any issues that such a collision might cause
will be caught by later build checks.
> I see no easy way to fix this problem, but either it must be fixed
> or the proposed optimization must be rejected.
>
> 2) The choice of hash function, sha256, is very unfortunate: it has
> a longer digest than sha1, but is otherwise vulnerable to the same
> attack; so right now it is still marginally secure, but that will not
> last long. Moreover, sha256 is quite slow.
The good news: it is not about security.
> It is better to use a newer generation of hash functions, e.g.
> blake2b, which is based on the chacha stream cipher. It is more
> future-proof and faster at the same time. You can just use the b2sum
> implementation from GNU coreutils.
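
For illustration only, the identity idea with a swappable hash function
could be sketched as below. This is not the code from the patch: the
header is modelled as a plain dict, the tag names in VOLATILE_TAGS are
hypothetical stand-ins for the real exclusion list, and the serialization
is a simplification. The algorithm name is passed through to Python's
hashlib, so "sha256" and "blake2b" are both available.

```python
import hashlib

# Hypothetical set of header fields that vary between otherwise identical
# builds; the actual exclusion list is whatever the patch defines.
VOLATILE_TAGS = frozenset({"buildhost", "buildtime", "sha1header"})


def pkg_identity(header: dict, algorithm: str = "sha256") -> str:
    """Hash every header field except the volatile ones.

    Fields are processed in sorted order so that the result does not
    depend on the order in which they were read.
    """
    digest = hashlib.new(algorithm)
    for tag in sorted(header):
        if tag in VOLATILE_TAGS:
            continue
        # Simplified serialization: one "tag=repr(value)" line per field.
        digest.update(f"{tag}={header[tag]!r}\n".encode("utf-8"))
    return digest.hexdigest()


first = {"name": "foo", "version": "1.0", "release": "alt1",
         "buildhost": "builder-a", "buildtime": 1586560000}
rebuilt = dict(first, buildhost="builder-b", buildtime=1586600000)
changed = dict(first, requires=["libbar"])

print(pkg_identity(first) == pkg_identity(rebuilt))   # only volatile fields differ
print(pkg_identity(first) == pkg_identity(changed))   # an essential field differs
print(pkg_identity(first, "blake2b"))                 # same idea, different hash
```

With this shape, switching the hash is a one-argument change, so the
choice of algorithm can be settled independently of the identity logic.
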
--
WBR,
Vladimir D. Seleznev