[Pulp-dev] Asyncio I/O Downloaders

Thu Aug 24 23:16:39 UTC 2017

Thank you for writing these points so clearly. I was working from the other
requirements you wrote [2], but it mostly does the ones below. I'll try to
give answers inline.

[2]: https://pulp.plan.io/issues/2951#note-17

On Thu, Aug 24, 2017 at 4:58 PM, Jeff Ortel <jortel at redhat.com> wrote:

>
>
> On 08/24/2017 09:51 AM, Brian Bouterse wrote:
> > The next step in considering the asyncio downloaders is done and ready
> to be looked at. The PR made [0]
> > replaces the existing downloaders and is "merge-ready". The best way to
> read about them is through their docs
> > [1]. Here are some of its highlights:
> >
> > # Code size and docs
> > * huge code savings by switching to asyncio. removes 2427 lines and adds
> 558 lines.
>
> Replacing concurrent.futures with asyncio would remove a fair amount of
> code in batch.py but the remainder of
> lines of code removed discarded features.  Each can be correlated to use
> cases [2].
>
> - Custom error recovery.
>     Supported recoverable error recovery.
>
If an exception is raised in a co-routine it propagates up to allow for
custom handling. Meanwhile you can re-enter the coroutine to allow all
other non-errored downloading can occur.

    Supported unavailable mirrors.

Was mirroring in the previous code? I didn't see it. This can be
accomplished by subclassing a HttpDownloader and implementing 1 method. I
can show an example if that helps. Mirroring is content specific so I think
we can't offer this in the plugin API anyway.

>
> - Event notifications.
>
I can see how the threaded design needs one, but with asyncio you don't
need one. All of the use cases are done as needed, easily either in core or
by plugin writers. The main goal with this design is that its highly
extensible to add custom behaviors either in core or a plugin with little
extra work. All Python developers know how to subclass things to customize
behaviors, but not all of them know how to work with a custom event system.
This is one of the main reasons to standardize on a common technology.

>     Supported streamer forwarding of HTTP headers.
>     Supported ChangeSet knowing when an artifact was actually downloaded.
>     Supported calculating digests and size needed ONLY for creating
> Artifacts.
>     Supported (future) progress reporting when desired.
>     Supported (future) mirror lists.
>
> - Custom Writers - handling of downloaded bits.
>     Supported writing files to disk.
>
^ is the default behavior

>     Supported the streamer streaming downloaded bit stream in the Twisted
> response instead of writing
>     the file to disk.
>
We didn't make any changes to the streamer code so this is unrelated. When
we do go to change it though, we can probably update the streamer to work
with this in a day. Swapping out the write-to-disk behavior with a stream
via twisted behavior is a single method to override in a subclass.

>
> - File URLs.
>     supported: "file:///myfile"
>
^ This can be easily added. Is that something I should do? I think it would
be another 100 lines or so because asyncio does most of the work.

>
> - Custom validations without having to subclass the downloader.
>     Included: digest and size validations but plugin writer could
> implement additional validations.
>
It validates size and digests currently. Adding custom validations is easy.
Doing it without subclassing is not something the plugin writer really
cares about, I think. We didn't talk about that requirement.

> - Standardized settings for SSL and Auth.
>     Supported consistent settings across downloader types.
>
^ it does

    Supported validation of settings.
>
Validation of settings happens in the serializer. In terms of the download
API, if the settings are invalid they will be rejected and errors logged so
I think everything will work as expected without this requirement.

> - Shared Contexts - collaborative sharing of resources between downloaders.
>     Supported shared: sessions (connection pools), auth tokens and
> resolved mirror lists.
>
It has a shared context. The mirror lists don't exist in either code base
but they can easily be added here.

>     Supported the sharing of resources without requiring the user of the
> downloader to know
>     which resources are shared or how they are shared.  Resource sharing
> needs to happen in the
>     streamer too.
>
I don't understand this new requirement can you elaborate?

>
> - Support for HEAD requests (HttpDownload).
>     Just fetches the HTTP headers.
>
We can do this even thought this is a new requirement. Why is this a
requirement?

>
> - Standard exceptions.
>     Supported catching DownloadError, NotFound and NotAuthorized.
>     Encapsulated client lib specific exceptions and error behaviors.
>
^ What are the desired behaviors here exactly? I think launching with fewer
behaviors is better for an alpha. These requirements are not written.

> - Downloader Attachments - attach any user defined object to the
> downloader.
>     Supported easy correlation between a downloader and an artifact being
> downloaded.
>
^ is supported using the 'url' of the DownloadReport or the 'id' of the
GroupDownloader

>
> > * ^ is surprising considering the majority of lines added are docs
> > * Read the compiled docs here [1]
> >
> > # Features
> > * It has 100% feature parity with the existing http/https downloaders.
>
> See gaps above.
>
> Are authenticated proxies supported?
>
I think it is supported from this ticket.
https://github.com/aio-libs/aiohttp/issues/845#issuecomment-278727807 Even
if it doesn't though, it will at some point. aiohttp is a very active
project that works closely with the requests community. Pulp is at an alpha
and I don't think proxy ssl verification is anything we've talked about as
a hard requirement for the 3.0 release.

>
> Ftp can be added, but purposely left
> > out for now. The requirements being fulfilled are written here [2].
> > * It has more features than the existing downloaders including
> additional ssl config, proxy_auth, additional
> > connection options, and more.
>
> Can you enumerate them?
>
* You can add authentication credentials when authenticating with a proxy.
* You can configure security settings such as what SSL TLS versions are
allowed now.
* You can have finer control over flow rate timeouts and other connection
options
* you can provide fingerprint verification of ssl certificates
* you can use non-system dns easily through a simple configuration. This is
another security feature
* enable/disable DNS caching
There are even more actually ^

>
> >
> > # Easier for plugin writers
> > * An easier download customization experience by having to subclass
> fewer objects than the existing
> > downloaders.
>
> Can you be specific?
>
This design has so many fewer objects that the plugin writer probably only
has to subclass one thing, HttpDownloader. It's less for them to think
about and have to customize.

> > This design has one downloader for both synchronous or asynchronous
> downloading
>
> But they need to execute the downloader in an Asynio loop for both
> synchronous and asynchronous, correct?
>
Correct. They only have one thing to use and one way to do it. Also since
it's 3 lines of code even a single download is easily handled.

> > * Makes some functionality contained only in the changeset available in
> the downloaders directly. Specifically
> > the GroupDownloader was functionality you could only get via changeset
> usage. Changesets could easily be
> > retooled to use this instead which would be great.
> > * Each object can be used independently or in conjunction with others
> >
> > # Demo
> > * works with the HEAD of master of pulp_example [3] (not pulp_file)
> which shows that it actually runs
> >
> > [0]: https://github.com/pulp/pulp/pull/3129
> > [1]: http://file.rdu.redhat.com/~bbouters/plugins/plugin-api/
> download.html
> > [2]: https://pulp.plan.io/issues/2951#note-17
> > [3]: https://github.com/dkliban/pulp_example/
> >
> > -Brian
> >
> >
> > _______________________________________________
> > Pulp-dev mailing list
> > Pulp-dev at redhat.com
> > https://www.redhat.com/mailman/listinfo/pulp-dev
> >
>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20170824/ed8ecf9f/attachment.htm>