[devel] RFC: girar: optimize rebuild

Vladimir D. Seleznev vseleznv at altlinux.org
Sat Apr 11 18:33:49 MSK 2020


On Sat, Apr 11, 2020 at 01:36:31PM +0300, Andrey Savchenko wrote:
> On Sat, 11 Apr 2020 02:10:42 +0300 Vladimir D. Seleznev wrote:
> > 
> > Hi!
> > 
> > This is the first part of the rebuilt-package optimization for girar. It
> > introduces pkg_identity() and a simple optimization for rebuilt sourcerpms.
> > 
> > pkg_identity() takes an RPM package and returns a value called the package
> > identity: a hash of a subset of the RPM package header. That subset is the
> > entire header minus nonessential artifacts such as buildhost, buildtime,
> > the header hashsum, etc.
> > 
> > Two builds of a package with the same NEVR may have equal or different
> > package identities. Equal identities mean that the build results of these
> > packages are equal too, which allows the build to be optimized. A practical
> > example, a simple optimization for rebuilt sourcerpms, is also introduced.
> > 
> > Future work may cover optimizing a sourcerpm "copied" to another branch by
> > using the sourcerpm retrieved from the archive, as well as optimizing binary
> > packages (this case has an issue when binary subpackages have mixed
> > architectures, i.e. arch and noarch; it would probably work only with
> > single-arch builds).
> > 
> > Please review and discuss.
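
To illustrate the idea described above, here is a minimal sketch in Python.
It is not the girar implementation: the header is modelled as a plain dict,
the function name pkg_identity_sketch and the exact tag names in
VOLATILE_TAGS are assumptions made for illustration, and the JSON
canonicalization is just one possible way to get a stable byte string.

```python
import hashlib
import json

# Hypothetical list of volatile tags. The proposal only names buildhost,
# buildtime and the header hashsum, followed by "etc."
VOLATILE_TAGS = frozenset({"BUILDHOST", "BUILDTIME", "SHA1HEADER"})


def pkg_identity_sketch(header: dict) -> str:
    """Hash every header field except the volatile ones."""
    stable = {tag: value for tag, value in header.items()
              if tag not in VOLATILE_TAGS}
    # Sort keys so the result does not depend on field order.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up headers: the rebuild differs only in volatile fields.
first = {"NAME": "foo", "VERSION": "1.0", "RELEASE": "alt1",
         "REQUIRES": ["libc.so.6"],
         "BUILDHOST": "builder-a", "BUILDTIME": 1586559000}
rebuilt = dict(first, BUILDHOST="builder-b", BUILDTIME=1586612000)
changed = dict(rebuilt, REQUIRES=["libc.so.6", "libz.so.1"])

same = pkg_identity_sketch(first) == pkg_identity_sketch(rebuilt)
differs = pkg_identity_sketch(first) != pkg_identity_sketch(changed)
```

With these made-up headers the first two identities match, since only
volatile fields differ, while the changed dependency list yields a
different identity.
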
> 
> I see two problems with the proposed approach:
> 
> 1) It assumes there will be no pkg_identity hash collisions. This
> is wrong. They may occur sooner or later, and the code *must*
> deal with such collisions correctly. Remember what happened to
> subversion when a collision occurred in a repository, while git was
> resilient.

Any hashsum function has collisions by definition. The only way to avoid
them is not to use hashsums.
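
For a sense of scale, here is a minimal sketch of the standard birthday-bound
approximation for accidental collisions. The 256-bit width corresponds to the
sha256 digest discussed in this thread; the item count is a made-up number
for illustration, not a figure taken from the repository.

```python
def collision_probability(n_items: int, digest_bits: int) -> float:
    """Birthday-bound approximation p ~= n*(n-1)/2 / 2**bits, valid while p << 1."""
    pairs = n_items * (n_items - 1) // 2
    return pairs / 2 ** digest_bits


# Hypothetical: ten million distinct headers hashed with a 256-bit digest.
p = collision_probability(10_000_000, 256)
print(f"{p:.3e}")
```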

> As the proposal stands now, an identity hash collision will lead to
> a degraded repository at best and a broken one at worst.

No, it will not, because any issues that such a collision might cause
will be caught by later build checks.

> I see no easy way to fix this problem, but it must either be fixed
> or the proposed optimization must be rejected.
> 
> 2) The choice of hash function, sha256, is very unfortunate: it has
> a longer digest than sha1, but otherwise is vulnerable to the same
> attack; so right now it is still marginally secure, but that will not
> last long. Moreover, sha256 is quite slow.

The good news: it is not about security.

> It is better to use a newer generation of hash functions, e.g.
> blake2b, which is based on the chacha stream cipher. It is more
> future-proof and faster at the same time. You can just use the b2sum
> implementation from the GNU coreutils.

-- 
   WBR,
   Vladimir D. Seleznev

