[devel] RFC: girar: optimize rebuild

Andrey Savchenko bircoph at altlinux.org
Sat Apr 11 13:36:31 MSK 2020


On Sat, 11 Apr 2020 02:10:42 +0300 Vladimir D. Seleznev wrote:
> 
> Hi!
> 
> This is the first part of the rebuilt-package optimization for girar. It
> introduces pkg_identity() and a simple optimization for rebuilt sourcerpms.
> 
> pkg_identity() takes an RPM package and returns a value called the package
> identity, a hash of a subset of the RPM package header. That subset is the
> entire header without some nonessential artifacts such as buildhost,
> buildtime, header hashsum, etc.
> 
> Two builds of a package with the same NEVR might have equal or different
> package identities. Equal identities mean that the build results of these
> packages are equal too, which allows build optimization. A practical
> example, a simple optimization of a rebuilt sourcerpm, is also introduced.
> 
> Future work could cover optimizing a sourcerpm "copied" to another branch
> by reusing the sourcerpm retrieved from the archive, and optimizing binary
> packages (this case has an issue when binary subpackages have mixed
> archs, i.e. arch and noarch, so it could probably work only with
> single-arch builds).
> 
> Please review and discuss.
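
A minimal sketch of the idea quoted above, for illustration only. It is
not the code from the patch: the header is modelled as a plain dict, and
the set of excluded tags (VOLATILE_TAGS) is an assumption based on the
"buildhost, buildtime, header hashsum" wording.

```python
import hashlib

# Assumed set of "nonessential" tags; the real list is defined by the patch.
VOLATILE_TAGS = {"BUILDHOST", "BUILDTIME", "SHA1HEADER"}

def pkg_identity(header: dict) -> str:
    """Hash every header tag except the volatile ones (hypothetical model)."""
    h = hashlib.sha256()
    for tag in sorted(header):  # fixed order, so dict ordering does not matter
        if tag in VOLATILE_TAGS:
            continue
        # repr() of the (tag, value) pair keeps tag and value boundaries explicit
        h.update(repr((tag, header[tag])).encode("utf-8"))
    return h.hexdigest()

build_a = {"NAME": "foo", "VERSION": "1.0", "RELEASE": "alt1",
           "BUILDHOST": "host1", "BUILDTIME": 1586559042}
# Same package rebuilt elsewhere at another time: only volatile tags differ.
build_b = dict(build_a, BUILDHOST="host2", BUILDTIME=1586600000)
# A genuinely different build: an essential tag differs.
build_c = dict(build_a, RELEASE="alt2")

same = pkg_identity(build_a) == pkg_identity(build_b)
different = pkg_identity(build_a) != pkg_identity(build_c)
```

Under these assumptions build_a and build_b share an identity, while
build_c does not.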

I see two problems with the proposed approach:

1) It assumes there will be no pkg_identity hash collisions. This
is wrong. They may occur sooner or later, and the code *must*
deal with such collisions correctly. Remember what happened to
subversion when a collision occurred in a repository, while git was
resilient.

As the proposal stands now, an identity hash collision will lead to
a degraded repository at best and a broken one at worst.

I see no easy way to fix this problem, but either it must be fixed
or the proposed optimization must be rejected.
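
To make the failure mode concrete, here is a small sketch. It is not
girar code. The hash is deliberately truncated to one hex digit so that
a collision is easy to find, and BuildCache is a hypothetical store
that keeps the normalized header bytes next to the result. It assumes
those bytes are actually retained, which the proposal does not say. A
lookup keyed on the hash alone would return the wrong result for the
second input; comparing the full content avoids that, at the cost of
storing it.

```python
import hashlib

def weak_identity(data: bytes) -> str:
    # Truncated to 1 hex digit (16 possible values) purely to force collisions.
    return hashlib.sha256(data).hexdigest()[:1]

class BuildCache:
    """Hypothetical cache: the hash is only an index, never proof of equality."""

    def __init__(self):
        self._store = {}  # identity -> list of (normalized bytes, result)

    def put(self, normalized: bytes, result: str) -> None:
        self._store.setdefault(weak_identity(normalized), []).append(
            (normalized, result))

    def get(self, normalized: bytes):
        for stored, result in self._store.get(weak_identity(normalized), []):
            if stored == normalized:  # full comparison, not just the hash
                return result
        return None

def find_collision():
    seen = {}
    for i in range(17):  # 17 inputs, 16 buckets: a collision is guaranteed
        data = b"pkg-%d" % i
        key = weak_identity(data)
        if key in seen:
            return seen[key], data
        seen[key] = data

a, b = find_collision()      # distinct inputs with the same weak identity
cache = BuildCache()
cache.put(a, "result-of-a")
hit = cache.get(a)           # found: same content
miss = cache.get(b)          # colliding hash, different content: not reused
```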

2) The choice of hash function, sha256, is very unfortunate: it has a
longer digest than sha1, but is otherwise vulnerable to the same
attack; so right now it is still marginally secure, but that will not
last long. Moreover, sha256 is quite slow.

It is better to use a newer generation of hash functions, e.g.
blake2b, which is based on the chacha stream cipher. It is more
future-proof and faster at the same time. You can just use the b2sum
implementation from GNU coreutils.
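
For comparison, a short sketch of how the two digests look from
Python's standard hashlib module. The input bytes are a placeholder,
not real header data, and no timing claim is made here.

```python
import hashlib

data = b"normalized header bytes"  # placeholder input, not a real RPM header

sha = hashlib.sha256(data).hexdigest()
# blake2b produces a 64-byte (512-bit) digest by default.
b2 = hashlib.blake2b(data).hexdigest()
# The digest can be shortened, e.g. to the same length as sha256.
b2_short = hashlib.blake2b(data, digest_size=32).hexdigest()

lengths = (len(sha), len(b2), len(b2_short))
```

On the command line, b2sum run on a file with the same contents should
print the same 128-hex-digit value as the default blake2b call above.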

Best regards,
Andrew Savchenko