Bitbake returning non-zero due to sstate errors


 

In my new CI setup I'm using an sstate mirror which seems to have some
occasional download issues. This results in the setscene task failing.
For example:

ERROR: qt3d-5.13.2+gitAUTOINC+93361f1a59-r0
do_package_write_ipk_setscene: Fetcher failure: Unable to find file
file://fd/sstate:qt3d:armv7at2hf-neon-linux-gnueabi:5.13.2+gitAUTOINC+93361f1a59:r0:armv7at2hf-neon:3:fda6c3edff0205b07ff176cf16771247117fa310bc65a6a1df6befc4230e0a74_package_write_ipk.tgz;downloadfilename=fd/sstate:qt3d:armv7at2hf-neon-linux-gnueabi:5.13.2+gitAUTOINC+93361f1a59:r0:armv7at2hf-neon:3:fda6c3edff0205b07ff176cf16771247117fa310bc65a6a1df6befc4230e0a74_package_write_ipk.tgz
anywhere. The paths that were searched were:
/builds/SanCloudLtd/sancloud-arago/build/sstate-cache
/builds/SanCloudLtd/sancloud-arago/build/sstate-cache
ERROR: qt3d-5.13.2+gitAUTOINC+93361f1a59-r0
do_package_write_ipk_setscene: No suitable staging package found
ERROR: Logfile of failure stored in:
/builds/SanCloudLtd/sancloud-arago/build/tmp/work/armv7at2hf-neon-linux-gnueabi/qt3d/5.13.2+gitAUTOINC+93361f1a59-r0/temp/log.do_package_write_ipk_setscene.10524
NOTE: recipe qt3d-5.13.2+gitAUTOINC+93361f1a59-r0: task
do_package_write_ipk_setscene: Failed
WARNING: Setscene task
(/builds/SanCloudLtd/sancloud-arago/sources/meta-qt5/recipes-qt/qt5/qt3d_git.bb:do_package_write_ipk_setscene)
failed with exit code '1' - real task will be run instead

As indicated in the final warning message there the real tasks run
since no sstate artifact is available. These tasks succeed:

NOTE: recipe qt3d-5.13.2+gitAUTOINC+93361f1a59-r0: task
do_package_write_ipk: Succeeded

The result is a successful build of the desired images. However, the
build is marked as a failure due to those sstate errors:

Summary: There were 11 ERROR messages shown, returning a non-zero exit code.

Is this the expected behaviour? The final images are built correctly.
I can't see any simple way to mask those setscene errors but I might
be missing something.

The full log can be seen at
https://gitlab.com/SanCloudLtd/sancloud-arago/-/jobs/443901140/raw.
I'm on the zeus branch here, I'll try to re-test on master later if I
can.

Thanks,
Paul


Richard Purdie
 

On Thu, 2020-02-20 at 11:26 +0000, Paul Barker wrote:
In my new CI setup I'm using an sstate mirror which seems to have
some
occasional download issues. This results in the setscene task
failing.
For example:

ERROR: qt3d-5.13.2+gitAUTOINC+93361f1a59-r0
do_package_write_ipk_setscene: Fetcher failure: Unable to find file
file://fd/sstate:qt3d:armv7at2hf-neon-linux-
gnueabi:5.13.2+gitAUTOINC+93361f1a59:r0:armv7at2hf-
neon:3:fda6c3edff0205b07ff176cf16771247117fa310bc65a6a1df6befc4230e0a
74_package_write_ipk.tgz;downloadfilename=fd/sstate:qt3d:armv7at2hf-
neon-linux-gnueabi:5.13.2+gitAUTOINC+93361f1a59:r0:armv7at2hf-
neon:3:fda6c3edff0205b07ff176cf16771247117fa310bc65a6a1df6befc4230e0a
74_package_write_ipk.tgz
anywhere. The paths that were searched were:
/builds/SanCloudLtd/sancloud-arago/build/sstate-cache
/builds/SanCloudLtd/sancloud-arago/build/sstate-cache
ERROR: qt3d-5.13.2+gitAUTOINC+93361f1a59-r0
do_package_write_ipk_setscene: No suitable staging package found
ERROR: Logfile of failure stored in:
/builds/SanCloudLtd/sancloud-arago/build/tmp/work/armv7at2hf-neon-
linux-gnueabi/qt3d/5.13.2+gitAUTOINC+93361f1a59-
r0/temp/log.do_package_write_ipk_setscene.10524
NOTE: recipe qt3d-5.13.2+gitAUTOINC+93361f1a59-r0: task
do_package_write_ipk_setscene: Failed
WARNING: Setscene task
(/builds/SanCloudLtd/sancloud-arago/sources/meta-qt5/recipes-
qt/qt5/qt3d_git.bb:do_package_write_ipk_setscene)
failed with exit code '1' - real task will be run instead

As indicated in the final warning message there the real tasks run
since no sstate artifact is available. These tasks succeed:

NOTE: recipe qt3d-5.13.2+gitAUTOINC+93361f1a59-r0: task
do_package_write_ipk: Succeeded

The result is a successful build of the desired images. However, the
build is marked as a failure due to those sstate errors:

Summary: There were 11 ERROR messages shown, returning a non-zero
exit code.

Is this the expected behaviour? The final images are built correctly.
I can't see any simple way to mask those setscene errors but I might
be missing something.

The full log can be seen at
https://gitlab.com/SanCloudLtd/sancloud-arago/-/jobs/443901140/raw.
I'm on the zeus branch here, I'll try to re-test on master later if I
can.
We've discussed this before and it can be argued either way.

Personally, I worry about why artefacts "disappear" and this is why its
an error, files should not be disappearing part way through a build.

From a bitbake perspective, a task really did fail and task failures
are errors. The fact it was able to recover is a bonus.

Perhaps it should be a warning now we have levels of warnings that are
meaningful. Previously we threw so many, this would have been one more
lost amongst many. I know many people don't like the behaviour.

Cheers,

Richard


 

On Thu, 20 Feb 2020 at 11:36, Richard Purdie
<richard.purdie@...> wrote:

On Thu, 2020-02-20 at 11:26 +0000, Paul Barker wrote:
In my new CI setup I'm using an sstate mirror which seems to have
some
occasional download issues. This results in the setscene task
failing.
For example:

ERROR: qt3d-5.13.2+gitAUTOINC+93361f1a59-r0
do_package_write_ipk_setscene: Fetcher failure: Unable to find file
file://fd/sstate:qt3d:armv7at2hf-neon-linux-
gnueabi:5.13.2+gitAUTOINC+93361f1a59:r0:armv7at2hf-
neon:3:fda6c3edff0205b07ff176cf16771247117fa310bc65a6a1df6befc4230e0a
74_package_write_ipk.tgz;downloadfilename=fd/sstate:qt3d:armv7at2hf-
neon-linux-gnueabi:5.13.2+gitAUTOINC+93361f1a59:r0:armv7at2hf-
neon:3:fda6c3edff0205b07ff176cf16771247117fa310bc65a6a1df6befc4230e0a
74_package_write_ipk.tgz
anywhere. The paths that were searched were:
/builds/SanCloudLtd/sancloud-arago/build/sstate-cache
/builds/SanCloudLtd/sancloud-arago/build/sstate-cache
ERROR: qt3d-5.13.2+gitAUTOINC+93361f1a59-r0
do_package_write_ipk_setscene: No suitable staging package found
ERROR: Logfile of failure stored in:
/builds/SanCloudLtd/sancloud-arago/build/tmp/work/armv7at2hf-neon-
linux-gnueabi/qt3d/5.13.2+gitAUTOINC+93361f1a59-
r0/temp/log.do_package_write_ipk_setscene.10524
NOTE: recipe qt3d-5.13.2+gitAUTOINC+93361f1a59-r0: task
do_package_write_ipk_setscene: Failed
WARNING: Setscene task
(/builds/SanCloudLtd/sancloud-arago/sources/meta-qt5/recipes-
qt/qt5/qt3d_git.bb:do_package_write_ipk_setscene)
failed with exit code '1' - real task will be run instead

As indicated in the final warning message there the real tasks run
since no sstate artifact is available. These tasks succeed:

NOTE: recipe qt3d-5.13.2+gitAUTOINC+93361f1a59-r0: task
do_package_write_ipk: Succeeded

The result is a successful build of the desired images. However, the
build is marked as a failure due to those sstate errors:

Summary: There were 11 ERROR messages shown, returning a non-zero
exit code.

Is this the expected behaviour? The final images are built correctly.
I can't see any simple way to mask those setscene errors but I might
be missing something.

The full log can be seen at
https://gitlab.com/SanCloudLtd/sancloud-arago/-/jobs/443901140/raw.
I'm on the zeus branch here, I'll try to re-test on master later if I
can.
We've discussed this before and it can be argued either way.

Personally, I worry about why artefacts "disappear" and this is why its
an error, files should not be disappearing part way through a build.

From a bitbake perspective, a task really did fail and task failures
are errors. The fact it was able to recover is a bonus.

Perhaps it should be a warning now we have levels of warnings that are
meaningful. Previously we threw so many, this would have been one more
lost amongst many. I know many people don't like the behaviour.
I'm now looking into this...

In sstate_checkhashes() we mark sstate as available if
fetcher.checkstatus() succeeds. Then at a later point
sstate_setscene() calls sstate_installpkg() calls pstaging_fetch()
calls fetcher.download() to actually get the sstate artifact. If the
artifact is removed from the mirror between these two accesses (due to
an sstate mirror clean up running in parallel to a build), or if there
is an intermittent download failure we could see checkstatus() succeed
then download() fail.

I don't think we should ignore all setscene errors but in the specific
case where it's the download step that fails I think that should be a
warning. Or it could be an error by default with a variable we can set
to turn it into a warning. Does that sound reasonable? If so I'll work
up a patch.

Thanks,
Paul


Richard Purdie
 

On Thu, 2020-02-20 at 11:59 +0000, Paul Barker wrote:
I'm now looking into this...

In sstate_checkhashes() we mark sstate as available if
fetcher.checkstatus() succeeds. Then at a later point
sstate_setscene() calls sstate_installpkg() calls pstaging_fetch()
calls fetcher.download() to actually get the sstate artifact. If the
artifact is removed from the mirror between these two accesses (due
to an sstate mirror clean up running in parallel to a build), or if
there is an intermittent download failure we could see checkstatus()
succeed then download() fail.

I don't think we should ignore all setscene errors but in the
specific case where it's the download step that fails I think that
should be a warning. Or it could be an error by default with a
variable we can set to turn it into a warning. Does that sound
reasonable? If so I'll work up a patch.
Thinking about the code, I'm not sure how you're generically going to
tell the difference between a setscene task that fails as the file
disappeared compared to a setscene failure with another real error? :/

We could make all failed setscene tasks warnings but I think that
buries actual real errors.

This is probably why I've not changed the code before now.

Special exit code values? :/

I'm open to proposals.

I know we could put in some configuration option but in general I hate
these as it just means more test matrix combinations and more ways for
people to see different behaviours. They have a time/place but I'm not
sure its here.

Cheers,

Richard


Martin Jansa
 

On Thu, Feb 20, 2020 at 11:26:54AM +0000, Paul Barker wrote:
In my new CI setup I'm using an sstate mirror which seems to have some
occasional download issues. This results in the setscene task failing.
For example:

ERROR: qt3d-5.13.2+gitAUTOINC+93361f1a59-r0
do_package_write_ipk_setscene: Fetcher failure: Unable to find file
file://fd/sstate:qt3d:armv7at2hf-neon-linux-gnueabi:5.13.2+gitAUTOINC+93361f1a59:r0:armv7at2hf-neon:3:fda6c3edff0205b07ff176cf16771247117fa310bc65a6a1df6befc4230e0a74_package_write_ipk.tgz;downloadfilename=fd/sstate:qt3d:armv7at2hf-neon-linux-gnueabi:5.13.2+gitAUTOINC+93361f1a59:r0:armv7at2hf-neon:3:fda6c3edff0205b07ff176cf16771247117fa310bc65a6a1df6befc4230e0a74_package_write_ipk.tgz
anywhere. The paths that were searched were:
/builds/SanCloudLtd/sancloud-arago/build/sstate-cache
/builds/SanCloudLtd/sancloud-arago/build/sstate-cache
ERROR: qt3d-5.13.2+gitAUTOINC+93361f1a59-r0
do_package_write_ipk_setscene: No suitable staging package found
ERROR: Logfile of failure stored in:
/builds/SanCloudLtd/sancloud-arago/build/tmp/work/armv7at2hf-neon-linux-gnueabi/qt3d/5.13.2+gitAUTOINC+93361f1a59-r0/temp/log.do_package_write_ipk_setscene.10524
NOTE: recipe qt3d-5.13.2+gitAUTOINC+93361f1a59-r0: task
do_package_write_ipk_setscene: Failed
WARNING: Setscene task
(/builds/SanCloudLtd/sancloud-arago/sources/meta-qt5/recipes-qt/qt5/qt3d_git.bb:do_package_write_ipk_setscene)
failed with exit code '1' - real task will be run instead

As indicated in the final warning message there the real tasks run
since no sstate artifact is available. These tasks succeed:

NOTE: recipe qt3d-5.13.2+gitAUTOINC+93361f1a59-r0: task
do_package_write_ipk: Succeeded

The result is a successful build of the desired images. However, the
build is marked as a failure due to those sstate errors:

Summary: There were 11 ERROR messages shown, returning a non-zero exit code.

Is this the expected behaviour? The final images are built correctly.
I can't see any simple way to mask those setscene errors but I might
be missing something.

The full log can be seen at
https://gitlab.com/SanCloudLtd/sancloud-arago/-/jobs/443901140/raw.
I'm on the zeus branch here, I'll try to re-test on master later if I
can.
See this previous discussion which includes patches for bitbake and
oe-core to change them to warnings:
https://marc.info/?l=openembedded-core&m=150403687120408&w=2

Because it was rejected I'm parsing the bitbake output to see if all
ERROR: messages were only about setscene tasks as mentioned in the same
thread much later:
https://marc.info/?l=openembedded-core&m=157504616302317&w=4


 

On Thu, 20 Feb 2020 at 12:04, Richard Purdie
<richard.purdie@...> wrote:

On Thu, 2020-02-20 at 11:59 +0000, Paul Barker wrote:
I'm now looking into this...

In sstate_checkhashes() we mark sstate as available if
fetcher.checkstatus() succeeds. Then at a later point
sstate_setscene() calls sstate_installpkg() calls pstaging_fetch()
calls fetcher.download() to actually get the sstate artifact. If the
artifact is removed from the mirror between these two accesses (due
to an sstate mirror clean up running in parallel to a build), or if
there is an intermittent download failure we could see checkstatus()
succeed then download() fail.

I don't think we should ignore all setscene errors but in the
specific case where it's the download step that fails I think that
should be a warning. Or it could be an error by default with a
variable we can set to turn it into a warning. Does that sound
reasonable? If so I'll work up a patch.
Thinking about the code, I'm not sure how you're generically going to
tell the difference between a setscene task that fails as the file
disappeared compared to a setscene failure with another real error? :/

We could make all failed setscene tasks warnings but I think that
buries actual real errors.

This is probably why I've not changed the code before now.

Special exit code values? :/

I'm open to proposals.

I know we could put in some configuration option but in general I hate
these as it just means more test matrix combinations and more ways for
people to see different behaviours. They have a time/place but I'm not
sure its here.
I agree - I really don't want to have to add additional complexity
here. But I do think we need to fix this in some way, others are
affected by this as can be seen from previous discussions. And in the
case of a public sstate mirror we can't control when users decide to
run builds, there will always be the chance of a user running a build
on an old commit while old sstate artifacts are cleaned or starting a
build just as the mirror is taken offline for some maintenance.

I think we might be able to make this work if we can avoid adding any
new conditional logic to the fetcher itself. I can see that almost
every call to logger.error() is followed by raising an error - perhaps
we could rework the code to include all the relevant info in the
raised error object and allow higher level code to catch the exception
and decide what to do with it. Because once logger.error() is called,
knotty counts an error and bitbake will exit non-zero even if the
error is safely handled. Once the fetcher simply raises exceptions in
the case of failed downloads we could handle this neatly in
sstate.bbclass. Would that be a viable way forward? Or would that
break the other fetcher use cases?

Thanks,
Paul


Joshua Watt
 



On Thu, Feb 20, 2020, 9:22 AM Paul Barker <pbarker@...> wrote:
On Thu, 20 Feb 2020 at 12:04, Richard Purdie
<richard.purdie@...> wrote:
>
> On Thu, 2020-02-20 at 11:59 +0000, Paul Barker wrote:
> > I'm now looking into this...
> >
> > In sstate_checkhashes() we mark sstate as available if
> > fetcher.checkstatus() succeeds. Then at a later point
> > sstate_setscene() calls sstate_installpkg() calls pstaging_fetch()
> > calls fetcher.download() to actually get the sstate artifact. If the
> > artifact is removed from the mirror between these two accesses (due
> > to an sstate mirror clean up running in parallel to a build), or if
> > there is an intermittent download failure we could see checkstatus()
> > succeed then download() fail.
> >
> > I don't think we should ignore all setscene errors but in the
> > specific case where it's the download step that fails I think that
> > should be a warning. Or it could be an error by default with a
> > variable we can set to turn it into a warning. Does that sound
> > reasonable? If so I'll work up a patch.
>
> Thinking about the code, I'm not sure how you're generically going to
> tell the difference between a setscene task that fails as the file
> disappeared compared to a setscene failure with another real error? :/
>
> We could make all failed setscene tasks warnings but I think that
> buries actual real errors.
>
> This is probably why I've not changed the code before now.
>
> Special exit code values? :/
>
> I'm open to proposals.
>
> I know we could put in some configuration option but in general I hate
> these as it just means more test matrix combinations and more ways for
> people to see different behaviours. They have a time/place but I'm not
> sure its here.

I agree - I really don't want to have to add additional complexity
here. But I do think we need to fix this in some way, others are
affected by this as can be seen from previous discussions. And in the
case of a public sstate mirror we can't control when users decide to
run builds, there will always be the chance of a user running a build
on an old commit while old sstate artifacts are cleaned or starting a
build just as the mirror is taken offline for some maintenance.

I think we might be able to make this work if we can avoid adding any
new conditional logic to the fetcher itself. I can see that almost
every call to logger.error() is followed by raising an error - perhaps
we could rework the code to include all the relevant info in the
raised error object and allow higher level code to catch the exception
and decide what to do with it. Because once logger.error() is called,
knotty counts an error and bitbake will exit non-zero even if the
error is safely handled. Once the fetcher simply raises exceptions in
the case of failed downloads we could handle this neatly in
sstate.bbclass. Would that be a viable way forward? Or would that
break the other fetcher use cases?

FWIW we also have this problem because our CI nodes all update the sstate cache via rsync after they finish, which causes races. This hasn't affected our developers, but I suspect that is only because they aren't doing builds at 1 AM.

The way we worked around it was to split up the build into two invocations of bitbake:

 bitbake --setscene-only <target> || true
 bitbake --skip-setscene <target>

Although this will likely not work very well with hash equivalence.


Thanks,
Paul