[devel] [PATCH 2/2] gb: optimize rebuilt srpm if its identity is equal to identity of srpm in the repo

Alexey Tourbin alexey.tourbin at gmail.com
Sat Apr 11 14:29:55 MSK 2020


On Sat, Apr 11, 2020 at 2:11 AM Vladimir D. Seleznev
<vseleznv at altlinux.org> wrote:
> +osrpm_identity=
> +osrpm="$GB_REPO_DIR/files/SRPMS/$srpmsu"
> +if [ -f "$osrpm" ]; then
> +       echo >&2 "$I: Found $srpmsu in the repo, this means the package was rebuilt"
> +       osrpm_identity="$(pkg_identity "$osrpm")"
> +fi
> +
>  for arch in $GB_ARCH; do
>         [ -d "$arch/srpm" -o ! -s "$arch/excluded" ] || continue
>         f="$arch/srpm/$srpmsu"
>         [ -f "$f" ] || continue
> +       srpm_identity="$(pkg_identity "$f")"
> +       echo >&2 "$I: $arch $srpmsu identity = $srpm_identity"
> +       # non-empty $osrpm_identity means the NEVR was rebuilt
> +       # optimize rebuilt sourcerpm if identities of original and rebuilt sourcerpms are equal
> +       if [ -n "$osrpm_identity" ] &&
> +                  [ "$osrpm_identity" = "$srpm_identity" ]; then
> +               echo >&2 "$I: $arch: optimize rebuilt $srpmsu cause its identity is equal to $srpmsu in the repo"
> +               install -p "$osrpm" "$f"
> +       fi
>         built_pkgname="$(rpmquery --qf '%{name}' -p -- "$f")"
>         echo "$built_pkgname" > pkgname
>         break

So how does it work in practice? Suppose I first uploaded a .src.rpm
package. Do we store the original .src.rpm, the one with the uploader's
signature? When it gets rebuilt, the rebuild should not affect the
original .src.rpm (it is as if it were uploaded again), so no special
handling is required in this case.

Then suppose I build a gearified package from Sisyphus for p9. Your
code only handles GB_REPO_DIR, not the NEIGHBOUR_REPO_DIR the
package comes from. And that information is in fact lost: when you
request a build of a signed tag from /gears, nothing implies that
there is a corresponding .src.rpm in any REPO_DIR.

There is already a problem with cross-repo copying: done properly, it
requires locking both repos, which is of course deadlock-prone. You
can do better without any locking if you index every package in all
repos by your new identity hash. This is relatively easy to do, since
you already have that big content-addressable storage: you can
hardlink it into a shadow identity-addressable storage. Once you have
done that, you obtain the global / beatific vision: given a package,
you instantly know whether you have already seen something like it.
(On second thought: you don't need locking because the -f test is
atomic and files cannot be removed from the storage, but there will
still be race conditions. This is not too bad in practice; moreover,
those race conditions can be detected at the task-commit stage.)

There is one specific problem with the outlined approach: the notion
of identity is flawed, because the disttag may or may not matter.
Sometimes you cannot substitute a package for another package with the
same identity but a different disttag. Specifically this is the case
with strict dependencies between subpackages. You cannot substitute a
subpackage unless you also substitute all the other subpackages. This
is further complicated by noarch subpackages: you need to coordinate
substitution across architectures.
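The all-or-nothing rule can be sketched as a single check over every binary package built from one .src.rpm, across all architectures and including noarch subpackages. The input format (one `<arch> <name> <new-identity> <old-identity>` line per package) and the function name can_substitute_all are assumptions for illustration only. One mismatch, or one package with no counterpart in the repo, vetoes substitution for the whole set, so subpackages tied by strict dependencies never end up with mixed disttags.

```shell
# can_substitute_all MAPFILE
# MAPFILE: one line per binary package built from a single .src.rpm,
# over all architectures including noarch:
#   <arch> <name> <new-identity> <old-identity>
# An empty <old-identity> means the repo has no counterpart.
# Succeeds only if every package can be replaced by its repo twin.
can_substitute_all() {
	local arch name new old n=0
	# The "|| [ -n ... ]" also accepts a last line without a newline.
	while read -r arch name new old || [ -n "$arch" ]; do
		[ -n "$old" ] || return 1       # nothing to substitute with
		[ "$new" = "$old" ] || return 1 # one mismatch vetoes the set
		n=$((n + 1))
	done < "$1"
	[ "$n" -gt 0 ]                      # an empty set is not a match
}
```

Checking all architectures in one pass is what covers the noarch case: a noarch subpackage shared between architectures cannot end up substituted on one architecture and left rebuilt on another.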

