[devel] stopping a cascade of rebuilds

Mon Apr 27 08:38:33 MSK 2020

On Thu, Apr 23, 2020 at 10:21 PM Vladimir D. Seleznev
<vseleznv at altlinux.org> wrote:
> > So for src.rpm packages, it's a solved problem. For binary packages,
> > the identity should specifically exclude disttag. It will no longer
> > satisfy the definition of ID for rpm (substitution will break for
> > subpackages with strict dependencies). Therefore for binary packages,
> > we need to track <ID,disttag> tuples.
>
> Why should we track them? If we rebuild a package, we should check
> whether identity of its binary packages had changed. If it had not, we
> shouldn't replace its binary packages by rebuilt packages. That simple.

Because in the identity-addressable storage, we can have a few
packages with the same ID but different disttags, as in the example
below.  The fact that foo-data-ID1 hasn't changed doesn't mean you can
immediately grab foo-data-filehash1.
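
To make the point concrete, here is a rough sketch of the lookup I have
in mind. The function name id2f_lookup is made up for illustration, and
the sample data is just the foo-data mapping with the angle brackets
dropped. The filehash is selected by the <ID,disttag> pair, not by the
ID alone:

```shell
# Hypothetical helper: print the filehash recorded for an <ID,disttag> pair.
# Assumed line format, as in the id2f example: <ID> <disttag> <filehash>
id2f_lookup() {
    # $1 = mapping file, $2 = ID, $3 = disttag
    while read -r _id _disttag _filehash; do
        if [ "$_id" = "$2" ] && [ "$_disttag" = "$3" ]; then
            printf '%s\n' "$_filehash"
            return 0
        fi
    done <"$1"
    return 1    # no entry for this <ID,disttag> pair
}

# Sample data: one ID, two disttags, two different files.
work=$(mktemp -d)
cat >"$work/foo-data" <<'EOF'
foo-data-ID1 disttag1 foo-data-filehash1
foo-data-ID1 disttag2 foo-data-filehash2
EOF

# The ID is the same in both lines, so the disttag decides which file we get.
id2f_lookup "$work/foo-data" foo-data-ID1 disttag2
```

Looking up by ID alone would return whichever line happens to come
first, which is exactly the mistake I am warning about.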

> > $ cat id2f/libfoo
> > <libfoo-ID1> <disttag1> <libfoo-filehash1>
> > <libfoo-ID2> <disttag2> <libfoo-filehash2>
> >
> > $ cat id2f/foo-data
> > <foo-data-ID1> <disttag1> <foo-data-filehash1>
> > <foo-data-ID1> <disttag2> <foo-data-filehash2>

> The things get more complicated in case of "copying" packages. In that
> case this schema could help. But we need also track buildtime. Just to
> prevent package replacing with earlier build. So, it is triple now.

Hmm, haven't thought of buildtime. We've got a package in the repo, a
freshly rebuilt package with the same EVR, and an identical package
(or a few identical packages) in the identity-addressable storage. We
need to prevent the outcome in which the buildtime of the package in
the repo goes down.  That's what you mean by "prevent package
replacing with earlier build", right?  (Otherwise replacing with an
earlier build is not a problem.)  Such an outcome is unlikely to occur
in practice, but not impossible.

Three parties are now involved.  You cannot simply ask, "can I
overwrite build/100 with packages from identity-addressable storage?"
The third piece of information is what you already have in the repo.
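
A minimal sketch of such a three-way check, under assumptions that are
mine and not part of any agreed design: the mapping gets a hypothetical
extra buildtime column (seconds since the epoch), and the function name
id2f_pick is invented. A stored package is only offered as a substitute
for the fresh build if taking it would not lower the buildtime that the
repo already has:

```shell
# Assumed extended line format (the buildtime column is hypothetical):
#   <ID> <disttag> <buildtime> <filehash>
# Print "<disttag> <filehash>" of the first stored package that has the
# same ID as the fresh build and a buildtime not lower than the repo's.
id2f_pick() {
    # $1 = mapping file, $2 = ID of the fresh build, $3 = buildtime in the repo
    while read -r _id _disttag _buildtime _filehash; do
        [ -n "$_buildtime" ] || continue    # skip blank or short lines
        if [ "$_id" = "$2" ] && [ "$_buildtime" -ge "$3" ]; then
            printf '%s %s\n' "$_disttag" "$_filehash"
            return 0
        fi
    done <"$1"
    return 1    # nothing suitable: keep the fresh build
}

work=$(mktemp -d)
cat >"$work/foo-data" <<'EOF'
foo-data-ID1 disttag1 1587000000 foo-data-filehash1
foo-data-ID1 disttag2 1587900000 foo-data-filehash2
EOF

# The repo holds a package built at 1587500000, so the disttag1 entry is
# too old and only the disttag2 entry qualifies.
id2f_pick "$work/foo-data" foo-data-ID1 1587500000 || echo "keep the fresh build"
```

The disttag is printed along with the filehash because the caller still
has to make sure that all substituted subpackages share one disttag.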

> > It may even make sense to group the mappings by src.rpm name instead
> > of package name. At first it seems less intuitive, but in return it
> > can give you a consistent view similar to MVCC snapshot.  Of course,
> > these files should be updated atomically, with rename(2). To check a
> > set subpackages, you first need to copy the file to a local dir. This
> > should rule out the case in which some subpackages have been added to
> > the file and some not.
>
> I don't get this idea, please expand it.

These are implementation details (that are not all that important
unless we agree on the data model).  How do you access the
identity-addressable storage concurrently?  The worst case is that
id2f/libfoo is written and read simultaneously, so you read a
half-written file.  To avoid this, files should be replaced
atomically:

$ mv local/libfoo id2f/.tmp$$.libfoo && mv id2f/.tmp$$.libfoo id2f/libfoo
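
The same replacement could be wrapped in a small function, roughly like
this. It is only a sketch: the name id2f_publish is made up, and I use
mktemp(1) instead of $$ for the temporary name and cp instead of the
first mv so that the local file is kept. The important part is that the
temporary file lives in the destination directory, so the final mv is a
plain rename(2) on one filesystem:

```shell
# Hypothetical helper: publish file $1 as $2 without exposing a partial file.
id2f_publish() {
    _dir=$(dirname "$2") || return 1
    # The temporary file must be in the destination directory, otherwise
    # the final mv may turn into copy+unlink, which is not atomic.
    _tmp=$(mktemp "$_dir/.tmp.XXXXXX") || return 1
    # The copy may be slow, but nobody reads the dot-file in the meantime.
    # mktemp creates the file with mode 0600, so open it up for readers.
    if cp -- "$1" "$_tmp" && chmod 0644 "$_tmp" && mv -f -- "$_tmp" "$2"; then
        return 0
    fi
    rm -f -- "$_tmp"
    return 1
}

work=$(mktemp -d)
mkdir "$work/local" "$work/id2f"
echo "libfoo-ID1 disttag1 libfoo-filehash1" >"$work/id2f/libfoo"
echo "libfoo-ID2 disttag2 libfoo-filehash2" >"$work/local/libfoo"

id2f_publish "$work/local/libfoo" "$work/id2f/libfoo"
cat "$work/id2f/libfoo"
```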

You are now guaranteed that you open either old id2f/libfoo or new
id2f/libfoo, but not something in between. But this is not enough,
because there is a higher-order inconsistency: you don't want to
process new id2f/libfoo with old id2f/foo-data (when id2f/foo-data is
just being replaced).  To get a "consistent view" across subpackages,
you can combine id2f/libfoo and id2f/foo-data into a single file. To
process subpackages, you first need to make a local copy of that file,
because command-line tools typically open the file several times, and
each open could see a different version if you work directly on the
file under id2f.
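
On the reader side this could look roughly as follows. The combined
file id2f/foo (named after the src.rpm), its four-column format and the
function name id2f_snapshot are all assumptions made for the sake of
the example. The shared file is opened exactly once, by cp, and every
later pass works on the private copy:

```shell
# Hypothetical helper: copy the shared mapping file to a private snapshot
# and print the snapshot's path.
id2f_snapshot() {
    _snap=$(mktemp) || return 1
    # cp opens the source once; if a writer renames a new file into place
    # meanwhile, cp keeps reading the old, complete file.
    cp -- "$1" "$_snap" || { rm -f -- "$_snap"; return 1; }
    printf '%s\n' "$_snap"
}

# Assumed combined format: <subpackage> <ID> <disttag> <filehash>
work=$(mktemp -d)
mkdir "$work/id2f"
cat >"$work/id2f/foo" <<'EOF'
libfoo libfoo-ID2 disttag2 libfoo-filehash2
foo-data foo-data-ID1 disttag2 foo-data-filehash2
EOF

snap=$(id2f_snapshot "$work/id2f/foo")

# Several independent passes over the same snapshot: all of them see the
# same version, no matter what happens to id2f/foo in the meantime.
awk '$1 == "libfoo"   { print $3, $4 }' "$snap"
awk '$1 == "foo-data" { print $3, $4 }' "$snap"
```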

This is an alternative to the "single writer, multiple readers" model,
in which the reader first has to obtain a shared lock on id2f.
In what I describe, the reader can proceed without any locking.