[Libguestfs] [PATCH nbdkit v3 2/3] Add new retry filter.

Eric Blake eblake at redhat.com
Thu Sep 19 16:32:47 UTC 2019


On 9/19/19 10:26 AM, Richard W.M. Jones wrote:
> This filter can be used to transparently reopen/retry when a plugin
> fails.  The connection is closed and reopened which for most plugins
> causes them to attempt to reconnect to their source.
> 
> For example if doing a long or slow SSH copy:
> 
>   nbdkit -U - ssh host=remote /var/tmp/test.iso \
>     --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
> 
> if the SSH connection or network goes down in the middle then the
> whole operation will fail.
> 
> By adding the retry filter:
> 
>   nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \
>     --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
> 
> this operation can recover from temporary failures in at least some
> circumstances.  The NBD connection (a local Unix domain socket in the
> example above) is not interrupted during retries, so NBD clients don't
> need to be taught how to retry - everything is handled internally by
> nbdkit.
> ---

Pretty cool how fast we've gone from idea to mostly-working code.
nbdkit is awesome for fast prototyping.

> +++ b/filters/retry/nbdkit-retry-filter.pod
> @@ -0,0 +1,103 @@
> +=head1 NAME
> +
> +nbdkit-retry-filter - reopen connection on error
> +
> +=head1 SYNOPSIS
> +
> + nbdkit --filter=retry PLUGIN

Worth adding [retry-parameter]... here?

> +
> +=head1 DESCRIPTION
> +
> +C<nbdkit-retry-filter> is a filter that transparently reopens the
> +plugin connection when an error is encountered.  It can be used to
> +make long-running copy operations reliable in the presence of
> +temporary network failures, without requiring any changes to the
> +plugin.
> +
> +A number of parameters are available to control:
> +
> +=over 4
> +
> +=item *
> +
> +How many times we retry.
> +
> +=item *
> +
> +The delay between retries, and whether we wait longer each time (known
> +as "exponential back-off").
> +
> +=item *
> +
> +If we reopen the plugin in read-only mode after the first failure.
> +
> +=back
> +
> +The default (no parameters) is designed to offer a happy medium
> +between recovering from short temporary failures but not doing
> +anything too bad when permanent or unrecoverable failures happen: We
> +retry 5 times with exponential back-off, waiting in total about 1
> +minute before we give up.
> +
> +=head1 EXAMPLE
> +
> +In this example we copy and convert a large file using
> +L<nbdkit-ssh-plugin(1)>, L<qemu-img(1)> and L<nbdkit-captive(1)>.
> +
> + nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \
> +   --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
> +
> +Without I<--filter=retry> a temporary failure would cause the copy to
> +fail (for example, the remote host’s firewall is restarted causing the
> +SSH connection to be closed).  Adding this filter means that it may be
> +possible to transparently recover.
> +
> +=head1 PARAMETERS

EXAMPLE before PARAMETERS is unusual.  Not necessarily wrong, though.

> +
> +=over 4
> +
> +=item B<retries=>N
> +
> +The number of times any single operation will be retried before we
> +give up and fail the operation.  The default is 5.
> +
> +=item B<retry-delay=>N
> +
> +The number of seconds to wait before retrying.  The default is 2
> +seconds.

Worth allowing sub-second times?

> +
> +=item B<retry-exponential=yes>
> +
> +Use exponential back-off.  The retry delay is doubled between each
> +retry.  This is the default.
> +
> +=item B<retry-exponential=no>
> +
> +Do not use exponential back-off.  The retry delay is the same between
> +each retry.
> +
> +=item B<retry-readonly=yes>
> +
> +As soon as a failure occurs and we retry, switch the underlying plugin
> +permanently to read-only mode.
> +
> +=item B<retry-readonly=no>
> +
> +Do not change the read-write/read-only mode of the plugin when
> +retrying.  This is the default.

Worth an intermediate mode that only forces readonly if a write
operation fails, but tries to preserve read/write if a read operation fails?


> +
> +static int retries = 5;         /* 0 = filter is disabled */
> +static int initial_delay = 2;

Would need a different unit here if you allow sub-second retry.

> +static bool exponential_backoff = true;
> +static bool force_readonly = false;

Initializing a static variable to 0 is sometimes worth avoiding (as it
can force .data instead of .bss for a slightly larger binary), but here
it makes sense.

> +
> +static int
> +retry_config (nbdkit_next_config *next, void *nxdata,
> +              const char *key, const char *value)
> +{
> +  int r;
> +
> +  if (strcmp (key, "retries") == 0) {
> +    if (sscanf (value, "%d", &retries) != 1 || retries < 0) {
> +      nbdkit_error ("cannot parse retries: %s", value);
> +      return -1;
> +    }
> +  }
> +  else if (strcmp (key, "retry-delay") == 0) {
> +    if (sscanf (value, "%d", &initial_delay) != 1 || initial_delay <= 0) {

sscanf("%d") cannot detect overflow; should this use strtol with errno
checking instead?


> +#define retry_config_help \
> +  "retries=<N>              Number of retries (default: 5).\n" \
> +  "retry-delay=<N>          Second to wait before retry (default: 2).\n" \

Seconds

> +  "retry-exponential=yes|no Exponential back-off (default: yes).\n" \
> +  "retry-readonly=yes|no    Force read-only on failure (default: no).\n"
> +
> +struct retry_handle {
> +  int readonly;                 /* Save original readonly setting. */
> +};
> +
> +static void *
> +retry_open (nbdkit_next_open *next, void *nxdata, int readonly)


Out of review time today, I'll get back to the rest of this file later...



-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190919/c7db7d5a/attachment.sig>


More information about the Libguestfs mailing list