McSinyx

Outro

Steamed fish was amazing, matter of fact
Let me get some jerk chicken to go
Grabbed me one of them lemon pie theories
And let me get some of them benchmarks you theories too

  1. The Look
  2. The Benchmark
    1. Average Distribution
    2. Large Distribution
    3. Distribution with Conflicting Dependencies
  3. What Now?

The Look

At the time of writing, implementation-wise parallel download is ready:

asciicast

Does this mean I've finished everything just-in-time? This sounds to good to be true! And how does it perform? Welp...

The Benchmark

Here comes the bad news: under a decent connection to the package index, using fast-deps does not make pip faster. For best comparison, I will time pip download on the following cases:

Average Distribution

For convenience purposes, let's refer to the commands to be used as follows

$ pip --no-cache-dir download {requirement}  # legacy-resolver
$ pip --use-feature=2020-resolver \
   --no-cache-dir download {requirement}  # 2020-resolver
$ pip --use-feature=2020-resolver --use-feature=fast-deps \
   --no-cache-dir download {requirement}  # fast-deps

In the first test, I used axuy and obtained the following results

legacy-resolver2020-resolverfast-deps
7.709s7.888s10.993s
7.068s7.127s11.103s
8.556s6.972s10.496s

Funny enough, running pip download with fast-deps in a directory with downloaded files already took around 7-8 seconds. This is because to lazily download a wheel, pip has to make many requests which are apparently more expensive than actual data transmission on my network.

When is it useful then?

With unstable connection to PyPI (for some reason I am not confident enough to state), this is what I got

2020-resolverfast-deps
1m16.134s0m54.894s
1m0.384s0m40.753s
0m50.102s0m41.988s

As the connection was unstable and that the majority of pip networking is performed as CI/CD with large and stable bandwidth, I am unsure what this result is supposed to tell (-;

Large Distribution

In this test, I used TensorFlow as the requirement and obtained the following figures:

legacy-resolver2020-resolverfast-deps
0m52.135s0m58.809s1m5.649s
0m50.641s1m14.896s1m28.168s
0m49.691s1m5.633s1m22.131s

Distribution with Conflicting Dependencies

Some requirement that will trigger a decent amount of backtracking by the current implementation of the new resolver oslo-utils==1.4.0:

2020-resolverfast-deps
14.497s24.010s
17.680s28.884s
16.541s26.333s

What Now?

I don't know, to be honest. At this point I'm feeling I've failed my own (and that of other stakeholders of pip) expectation and wasted the time and effort of pip's maintainers reviewing dozens of PRs I've made in the last three months.

On the bright side, this has been an opportunity for me to explore the codebase of package manager and discovered various edge cases where the new resolver has yet to cover (e.g. I've just noticed that pip download would save to-be-discarded distributions, I'll file an issue on that soon). Plus I got to know many new and cool people and idea, which make me a more helpful individual to work on Python packaging in the future, I hope.

Tags: gsoc pip python Nguyễn Gia Phong, 2020-08-31

Comments

Follow the anchor in an author's name to reply. Please read the rules before commenting.