uv: An Extremely Fast Python Package Manager

00:40:33
https://www.youtube.com/watch?v=gSKTfG1GXYQ

Summary

TL;DR: Charlie Marsh, founder of Astral, presents UV, a high-performance Python package manager that aims to unify and streamline Python tooling. Building on the earlier success of Ruff, a Python linter and formatter, Marsh highlights UV's ability to manage Python environments, resolve packages rapidly, and stay fast through efficient cache design and zero-copy serialization. UV stands out for its comprehensive scope and its speed. Unlike fragmented tools such as pip or Poetry, UV offers an all-in-one solution modeled on Rust's Cargo. It ships as a single static binary that does not itself depend on Python, although it still needs to locate a Python interpreter to create environments and install packages. Because it is so fast, virtual environments can be treated as ephemeral: cheap to destroy and recreate. The talk details the inner workings of UV and the hard problems solved during its development, such as resolving dependencies when Python cannot install two versions of the same package at once, and creating universal lock files that work across platforms. Handling Python's rich dependency marker syntax and the performance cost of dependency resolution are core themes. Rust's role is also emphasized: it gives low-level control and efficient memory usage, which contribute to UV’s speed. Reading metadata efficiently from zip files and designing a global cache from which files can be linked quickly are discussed as major optimization strategies.
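
The zip-metadata optimization mentioned above can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: UV itself is written in Rust, the archive and file names here (`demo/big_module.bin`, `demo-1.0.dist-info/METADATA`) are made up, and the `CountingBytesIO` class is a hypothetical stand-in for a remote file where every seek-plus-read would correspond to one HTTP range request. It shows that a zip reader can find one small member by reading the index at the end of the archive, without touching the large payload.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """In-memory stand-in for a remote file; counts the bytes actually read."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


def build_fake_wheel() -> bytes:
    """Build a zip with one large payload and one small METADATA file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("demo/big_module.bin", b"\0" * 1_000_000)
        archive.writestr("demo-1.0.dist-info/METADATA", "Name: demo\nVersion: 1.0\n")
    return buffer.getvalue()


wheel_bytes = build_fake_wheel()
remote = CountingBytesIO(wheel_bytes)

# zipfile locates the central directory at the end of the archive,
# then jumps straight to the requested member.
with zipfile.ZipFile(remote) as archive:
    metadata = archive.read("demo-1.0.dist-info/METADATA").decode()

print(metadata)
print(f"read {remote.bytes_read} of {len(wheel_bytes)} bytes")
```

Only a small fraction of the archive is read, which is the property that makes fetching metadata over range requests attractive for large packages.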

Key takeaways

  • 💻 UV is a unified, fast Python package manager.
  • 🚀 Built with Rust for high performance.
  • 🔄 Handles Python environments and dependencies efficiently.
  • 📦 Universal lock files for cross-platform use.
  • ⚡ Optimized for speed and user experience.
  • 🔧 Designed to replace fragmented Python tools.
  • 🛠️ Solves hard problems in dependency resolution.
  • 📁 Utilizes efficient cache and IO operations.
  • 🚀 Changes traditional Python workflows with speed.
  • 📈 Rapidly adopted in the Python community.
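
To make the universal lock file takeaway concrete, here is a minimal Python sketch of the fork-and-merge idea described in the talk: resolve separately under disjoint platform markers, then merge the results, keeping a marker only where the forks disagree. The function name `merge_forks`, the marker strings, and all version numbers are illustrative assumptions, not UV's implementation or lock file format.

```python
from __future__ import annotations

WINDOWS = 'sys_platform == "win32"'
NOT_WINDOWS = 'sys_platform != "win32"'


def merge_forks(forks: dict[str, dict[str, str]]) -> list[tuple[str, str, str | None]]:
    """Merge per-marker resolutions into (name, version, marker) entries.

    A package pinned to the same version in every fork needs no marker;
    otherwise each entry keeps the marker of the fork it came from.
    """
    names = sorted({name for resolution in forks.values() for name in resolution})
    merged = []
    for name in names:
        pins = {marker: res[name] for marker, res in forks.items() if name in res}
        if len(pins) == len(forks) and len(set(pins.values())) == 1:
            merged.append((name, next(iter(pins.values())), None))
        else:
            for marker, version in pins.items():
                merged.append((name, version, marker))
    return merged


# Two independently solved forks (version numbers are illustrative only).
forks = {
    WINDOWS: {
        "pydantic": "2.9.2",
        "annotated-types": "0.7.0",
        "typing-extensions": "4.12.2",
    },
    NOT_WINDOWS: {
        "pydantic": "1.10.18",
        "typing-extensions": "4.12.2",
    },
}

merged = merge_forks(forks)
for name, version, marker in merged:
    print(name, version, marker or "(all platforms)")
```

Because the two markers are disjoint, any given machine matches exactly one of the two `pydantic` entries. A fuller version would also combine markers when several forks agree on a version; that step is left out to keep the sketch short.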

Timeline

  • 00:00:00 - 00:05:00

    Charlie Marsh, founder of Astral, discusses his company's high-performance Python developer tools, Ruff and UV, which have each grown to tens of millions of downloads per month. Ruff is a linter and code formatter, while UV is a fast all-in-one package manager aimed at unifying the Python ecosystem akin to Rust’s tooling.

  • 00:05:00 - 00:10:00

    UV aims to provide a streamlined package management experience, replacing tools like pip and Poetry. Marsh emphasizes that UV’s speed transforms user interactions, allowing for faster creation and destruction of virtual environments without concern for their complexity, marking a shift in workflow and user expectations.

  • 00:10:00 - 00:15:00

    Marsh details the operations of UV, including resolving user requirements and generating a lock file for package management. He mentions the complexities in achieving a universal lock file that functions seamlessly across different systems, highlighting Python’s lack of multiversion support as a challenge they overcome with a SAT solver.

  • 00:15:00 - 00:20:00

    The presentation addresses the challenges of solving dependency graphs, especially Python's one-version-per-package limitation and its complex dependency markers. UV uses a SAT solver based on conflict-driven clause learning (CDCL). Version solving is a Boolean satisfiability problem, which is NP-hard, so there is no guarantee the solver finishes in a reasonable amount of time.

  • 00:20:00 - 00:25:00

    Further complexities arise with Python conditional dependencies, requiring UV's resolver to construct universal lock files that accommodate varied Python environments. This involves complex logical operations and marker algebra to ensure cross-platform compatibility in package resolutions.

  • 00:25:00 - 00:30:00

    Marsh introduces the architecture behind UV’s fast performance, focusing on Rust programming for efficient resource utilization. Rust's strengths contribute to UV's performance, though Marsh also attributes the speed to strategic IO operations and data caching techniques that minimize redundant processing.

  • 00:30:00 - 00:35:00

    In demonstrating IO optimization, Marsh describes UV's use of range requests to efficiently access package metadata, avoiding full downloads of large files. He explains UV’s cache infrastructure, optimizing speed and storage by hard linking files and using zero-copy techniques for metadata processing.

  • 00:35:00 - 00:40:33

    The talk concludes by reinforcing UV's impact on Python development, summarizing how its innovations in cache design and version handling offer not just speed but a new approach to managing Python environments effectively.
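
The single-version constraint discussed in the timeline can be sketched with a toy resolver. Note the swap: the talk says UV's solver is based on conflict-driven clause learning, while the code below is plain chronological backtracking, which is much simpler and only meant to show why a conflict forces the resolver to go back and try older versions. The package names (`server`, `chain`, `models`), the `INDEX` table, and every version number are hypothetical stand-ins for the three-package example in the talk.

```python
from __future__ import annotations

Version = tuple[int, ...]
Range = tuple[Version, Version]  # half-open range: lo <= version < hi
Requirement = tuple[str, Range]

ANY: Range = ((0,), (999,))

# Hypothetical package index: name -> version -> dependency ranges.
INDEX: dict[str, dict[Version, dict[str, Range]]] = {
    "server": {
        (2, 0): {"models": ((2,), (3,))},  # newest release needs models 2.x
        (1, 0): {"models": ((1,), (3,))},  # older release accepts 1.x or 2.x
    },
    "chain": {
        (0, 1): {"models": ((1,), (2,))},  # old release only works with models 1.x
    },
    "models": {(2, 9): {}, (1, 10): {}},
}


def resolve(todo: list[Requirement], pinned: dict[str, Version]) -> dict[str, Version] | None:
    """Pick exactly one version per package, or return None if impossible."""
    if not todo:
        return pinned
    (name, (lo, hi)), rest = todo[0], todo[1:]
    if name in pinned:
        # Single-version rule: the existing pin must satisfy the new constraint.
        return resolve(rest, pinned) if lo <= pinned[name] < hi else None
    for version in sorted(INDEX[name], reverse=True):  # prefer newest
        if lo <= version < hi:
            deps = list(INDEX[name][version].items())
            solution = resolve(rest + deps, {**pinned, name: version})
            if solution is not None:
                return solution
    return None  # every candidate conflicted, so the caller backtracks


# Any version of server is acceptable: the resolver backtracks to server 1.0.
solution = resolve([("server", ANY), ("chain", ANY)], {})
print(solution)

# Insisting on server 2.x makes the graph unsolvable.
unsolvable = resolve([("server", ((2,), (3,))), ("chain", ANY)], {})
print(unsolvable)
```

In an ecosystem with multiversion support, the second case could be satisfied by installing both `models` 1.x and 2.x; in Python that escape valve does not exist, so the only options are backtracking or reporting a conflict.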

Video Q&A

  • What is UV?

    UV is a fast, all-in-one Python package manager that simplifies dependency management and environment handling, similar to Cargo for Rust.

  • What makes UV different from other Python tools?

    UV stands out due to its speed, comprehensive scope, and ability to unify fragmented Python tooling into a single, efficient tool.

  • Why was Rust used to build UV?

    Rust provides efficient control over memory allocation and performance, which are crucial for building fast and reliable software like UV.

  • How does UV improve Python developer workflows?

    UV allows for rapid installation and management of Python environments and packages, changing workflows by making tasks that were previously slow, like environment recreation, much faster.

  • Can UV be used with existing Python projects?

    Yes, UV can be used as a drop-in alternative to tools like pip, handling package management with greater speed and efficiency.
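
One reason environment recreation can be cheap, according to the summary above, is that installed files are hard linked out of a global cache rather than copied. The following Python sketch shows the mechanism only. The directory layout, the file name `demo.py`, and the helper `install_from_cache` are assumptions made for illustration, and hard links require a filesystem that supports them with the cache and the environment on the same volume.

```python
import os
import tempfile
from pathlib import Path


def install_from_cache(cache_file: Path, site_packages: Path) -> Path:
    """Hard-link a cached file into an environment instead of copying it."""
    site_packages.mkdir(parents=True, exist_ok=True)
    target = site_packages / cache_file.name
    os.link(cache_file, target)  # both names now refer to the same on-disk data
    return target


with tempfile.TemporaryDirectory() as root:
    # A hypothetical global cache holding one unpacked file.
    cache_file = Path(root, "cache", "demo.py")
    cache_file.parent.mkdir()
    cache_file.write_text("VALUE = 1\n")

    # "Install" the same cached file into two separate environments.
    env_a = install_from_cache(cache_file, Path(root, "env_a", "site-packages"))
    env_b = install_from_cache(cache_file, Path(root, "env_b", "site-packages"))

    same_inode = env_a.stat().st_ino == env_b.stat().st_ino == cache_file.stat().st_ino
    link_count = cache_file.stat().st_nlink
    print(same_inode, link_count)
```

No file contents are duplicated: each environment gets a new directory entry pointing at the data already in the cache, so installing a package that is already cached costs roughly one link operation per file.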

Subtitles
en
  • 00:00:05
    hi
  • 00:00:10
    everyone Charlie Marsh is the founder of
  • 00:00:13
    Astral a company building high
  • 00:00:15
    performance developer tools for the
  • 00:00:16
    python ecosystem over the past two years
  • 00:00:19
    he's released Ruff a python linter and auto
  • 00:00:21
    formatter and UV a python package and
  • 00:00:24
    project manager both projects have grown
  • 00:00:26
    to tens of millions of downloads per
  • 00:00:28
    month and have seen rapid adoption
  • 00:00:30
    across both open source and
  • 00:00:34
    Enterprise okay everyone can hear me
  • 00:00:36
    okay great excellent um wow very nice
  • 00:00:38
    intro uh I now what am I gonna do with
  • 00:00:41
    my own intro slides um yeah my name is
  • 00:00:44
    Charlie I'm the founder of Astral um yeah so I
  • 00:00:47
    uh me and my team we spend our time
  • 00:00:48
    trying to build really fast python
  • 00:00:50
    tooling in rust um we're primarily
  • 00:00:52
    known for two tools right the first of
  • 00:00:55
    which is Ruff uh which is a linter
  • 00:00:57
    formatter and code transformation tool
  • 00:01:00
    so you can use it to format your code
  • 00:01:02
    but also to identify issues like unused
  • 00:01:04
    Imports and fix them
  • 00:01:05
    automatically um and the second which is
  • 00:01:08
    going to be the focus of what I'm
  • 00:01:09
    talking about today is UV which is uh
  • 00:01:11
    our python package manager um uh I gave
  • 00:01:15
    a talk at PyCon about Ruff and it sort
  • 00:01:17
    of covered like what Ruff is a little
  • 00:01:20
    bit of how it works and then what makes
  • 00:01:21
    it fast um this is going to be uh
  • 00:01:24
    similar but focused on UV so I want to
  • 00:01:26
    talk through what UV is I'll try to keep
  • 00:01:29
    that short because maybe the least
  • 00:01:30
    interesting part uh why we built it um
  • 00:01:33
    some of the hard problems that went into
  • 00:01:35
    building it and then uh end with a
  • 00:01:37
    couple examples or sort of case studies
  • 00:01:39
    of things we did that uh make it really
  • 00:01:42
    fast because that's the that's the thing
  • 00:01:43
    that we tend to get the most questions
  • 00:01:45
    about is why is it
  • 00:01:47
    fast um so UV is what I would call a
  • 00:01:50
    fast all-in-one python package manager
  • 00:01:52
    so uh you can use UV to install python
  • 00:01:55
    itself create virtual environments
  • 00:01:57
    resolve dependencies install packages of
  • 00:01:59
    course it is a package manager uh you
  • 00:02:01
    can use it to build your own python
  • 00:02:03
    packages that you would then like upload
  • 00:02:04
    and
  • 00:02:05
    redistribute so UV is if you're familiar
  • 00:02:08
    with the python ecosystem you could
  • 00:02:09
    think of it as a drop-in alternative to
  • 00:02:11
    tools like pip pipx pyenv virtualenv
  • 00:02:15
    poetry uh and it's not a replacement for
  • 00:02:17
    any one of these it's really intended to
  • 00:02:18
    be a replacement for all of them uh so
  • 00:02:21
    we model what we're trying to do with UV
  • 00:02:24
    after cargo so if you've worked in the
  • 00:02:26
    rust ecosystem the tooling is very
  • 00:02:28
    streamlined and very Unified
  • 00:02:30
    um and the way I would describe it is I
  • 00:02:31
    feel like rust tooling oh
  • 00:02:38
    no
  • 00:02:40
    okay hold on okay the way I think of it
  • 00:02:44
    is that uh rust rust tooling is very
  • 00:02:48
    high confidence like when I clone a rust
  • 00:02:50
    project I'm very confident that I can
  • 00:02:52
    run it it will run successfully and I
  • 00:02:54
    know how to run it and we're trying to
  • 00:02:55
    get to a similar experience with UV for
  • 00:02:57
    python tooling so UV is this single
  • 00:03:00
    static binary that ideally gives you
  • 00:03:01
    everything you need to be productive
  • 00:03:03
    with python you install UV and then
  • 00:03:04
    everything is sort of taken care of for
  • 00:03:06
    you so UV does a lot of stuff um and it
  • 00:03:10
    does it all while being just way way
  • 00:03:12
    faster than a lot of the other tools in
  • 00:03:13
    the ecosystem um and when we started I I
  • 00:03:17
    knew when we started working on a
  • 00:03:18
    package manager and this is was some of
  • 00:03:21
    the reaction there would be a little bit
  • 00:03:22
    of this because in the python ecosystem
  • 00:03:24
    there's just a lot of different tools
  • 00:03:25
    for packaging um and so when you come
  • 00:03:27
    out and say hey we built we finally built
  • 00:03:30
    the python package manager um you know
  • 00:03:32
    there's a lot of well this cartoon um
  • 00:03:36
    and we see we still see a little bit of
  • 00:03:37
    this actually which is kind of funny to
  • 00:03:38
    me because Tool's pretty popular um but
  • 00:03:42
    uh I think UV is actually pretty
  • 00:03:45
    different from a lot of the other things
  • 00:03:46
    that exist in the ecosystem in the
  • 00:03:47
    previous attempts to build this tool for
  • 00:03:50
    python primarily for two reasons one is
  • 00:03:53
    just like the scope of what we're trying
  • 00:03:55
    to do um so I mentioned this before or
  • 00:03:58
    hinted at at least like python tooling
  • 00:04:00
    is very fragmented and I think of that
  • 00:04:02
    in like two ways one is for anything you
  • 00:04:04
    want to do there's like a bunch of
  • 00:04:06
    different options and it's very hard to
  • 00:04:07
    choose and then know which one you
  • 00:04:09
    should use and the second dimension is
  • 00:04:11
    like for anything that you're trying to
  • 00:04:12
    do that's non-trivial you have to like
  • 00:04:14
    chain together a bunch of tools um and
  • 00:04:17
    for UV it's meant to be like a totally
  • 00:04:20
    unified stack so we didn't like build on
  • 00:04:22
    top of anything else right we didn't
  • 00:04:23
    like build on top of pip or inherit any
  • 00:04:25
    of the baggage that comes from existing
  • 00:04:27
    python tooling we build like everything
  • 00:04:29
    from scratch
  • 00:04:30
    and that creates a really powerful model
  • 00:04:32
    both in terms of like the user
  • 00:04:33
    experience we can deliver uh but also um
  • 00:04:38
    uh how how good it can be I guess
  • 00:04:40
    internally uh the second reason I think
  • 00:04:42
    that it's a little bit different is just
  • 00:04:43
    the performance right like UV as I
  • 00:04:45
    mentioned it's very very fast um and the
  • 00:04:48
    thing that I've seen in my time working
  • 00:04:50
    on this stuff is that when things are
  • 00:04:52
    like way way
  • 00:04:54
    faster they really change like the
  • 00:04:56
    user's relationship to the tool and even
  • 00:04:58
    like the way that they work with it so
  • 00:04:59
    like we saw this in Ruff
  • 00:05:01
    where jobs that people like used to only
  • 00:05:04
    be able to run in CI they could now make
  • 00:05:06
    like pre-commit hooks because like the
  • 00:05:07
    speed differences were just so different
  • 00:05:09
    and so suddenly this thing that you like
  • 00:05:10
    Dread running is something that you can
  • 00:05:11
    just like do locally and get in pass um
  • 00:05:15
    and with UV another analogy would be
  • 00:05:16
    like virtual environments like in Python
  • 00:05:18
    often like you have a virtual
  • 00:05:19
    environment on your machine it's in like
  • 00:05:21
    a certain State and you really don't
  • 00:05:23
    want to mess it up because you won't be
  • 00:05:24
    able to recreate it and like get it into
  • 00:05:26
    that place or it's really expensive to
  • 00:05:27
    recreate because it has all these
  • 00:05:28
    packages installed with UV like
  • 00:05:30
    destroying and creating a virtual
  • 00:05:31
    environment is extremely fast like we
  • 00:05:33
    try to view them as totally ephemeral
  • 00:05:35
    like you can just destroy them and
  • 00:05:36
    recreate them because it's so cheap so
  • 00:05:38
    we're actually trying to change like
  • 00:05:40
    it's not just about you know there's a
  • 00:05:42
    lot of obvious nice things that come
  • 00:05:43
    with being much faster but I also think
  • 00:05:45
    that it can just change a lot of the
  • 00:05:47
    workflows around how people uh work with
  • 00:05:50
    python um so over yeah over we released
  • 00:05:55
    UV in like
  • 00:05:56
    mid-February um and since then yeah it's
  • 00:05:58
    just grown a lot lot so it's at like 16
  • 00:06:00
    million downloads a month um it's now
  • 00:06:04
    more than 10% of the requests to PyPI
  • 00:06:07
    come from people running UV which is
  • 00:06:09
    kind of crazy like I I would have been
  • 00:06:11
    happy maybe with like
  • 00:06:13
    1% um just because the sheer volume of
  • 00:06:16
    what people are doing with python um is
  • 00:06:18
    is is is pretty wild so um that's been a
  • 00:06:21
    cool thing to see um yeah we have a lot
  • 00:06:23
    of stars your contributors blah blah
  • 00:06:25
    blah but uh you know the point is we
  • 00:06:27
    built this thing I consider it fairly
  • 00:06:29
    well battle-tested it's been used across the
  • 00:06:30
    industry so hopefully the stuff I say
  • 00:06:32
    has some credibility behind
  • 00:06:34
    it um all right so the main job right or
  • 00:06:40
    the main thing you do with a package
  • 00:06:41
    manager is you install packages so I
  • 00:06:43
    just want to talk through the life cycle
  • 00:06:44
    of what happens when you run a command
  • 00:06:47
    to install packages with UV um and UV
  • 00:06:50
    has two primary interfaces that you can
  • 00:06:53
    um uh use to to to engage with it the
  • 00:06:57
    first is that we have like a pip
  • 00:06:58
    compatible interface so if you've run
  • 00:07:00
    like pip install you can just run like
  • 00:07:02
    UV pip install and we Implement a lot of
  • 00:07:04
    the same commands that pip would which
  • 00:07:06
    is great for people who want to adopt it
  • 00:07:08
    without changing their workflow although
  • 00:07:10
    we would like their workflow to change
  • 00:07:11
    obviously um and the second is we have
  • 00:07:14
    these higher level commands like UV sync
  • 00:07:16
    and UV lock that um you know you sort of
  • 00:07:18
    declare your dependencies and then we
  • 00:07:20
    just take care of everything and make
  • 00:07:21
    sure the environment is in the right
  • 00:07:22
    State whenever you want to do anything
  • 00:07:25
    um but they both operate uh under a
  • 00:07:28
    fairly similar life cycle right so the
  • 00:07:30
    first thing we have to do is we have to
  • 00:07:32
    like find the user's python interpreter
  • 00:07:34
    UV does not depend on python um but we
  • 00:07:37
    do like need a python interpreter in
  • 00:07:39
    order to do a lot of useful things um so
  • 00:07:42
    like if you want to create a virtual
  • 00:07:43
    environment for example a virtual
  • 00:07:45
    environment has to symlink in a python
  • 00:07:47
    interpreter so like we can't create a we
  • 00:07:49
    can create a virtual environment but we
  • 00:07:50
    wouldn't have a python to put in it um
  • 00:07:52
    and we need to know things like what
  • 00:07:54
    version of python are you running like
  • 00:07:56
    what platform are you on uh all that
  • 00:07:58
    kind of stuff um so this is actually
  • 00:08:00
    pretty hard but really not very
  • 00:08:02
    interesting um
  • 00:08:04
    so um next thing we need to do right is we
  • 00:08:06
    need to discover the actual user
  • 00:08:08
    requirements this is the user telling us
  • 00:08:09
    like the state they want to be in at the
  • 00:08:11
    end of the command and that could be you
  • 00:08:13
    know they gave it to us directly maybe
  • 00:08:14
    we read it from a requirements.txt file
  • 00:08:17
    something similar given those
  • 00:08:19
    requirements we resolve them into you
  • 00:08:22
    know this is the core job of a package
  • 00:08:23
    manager you give us some requirements
  • 00:08:25
    and we try and figure out uh a set of
  • 00:08:27
    versions that satisfy those requirements
  • 00:08:29
    so you know the user might say uh I want
  • 00:08:32
    Pydantic and um that's not really enough
  • 00:08:35
    information on its own right for us to
  • 00:08:37
    like uh like what does it mean for the
  • 00:08:39
    user to want Pydantic so the first thing
  • 00:08:40
    we have to do is we have to resolve that
  • 00:08:43
    into a set of versions such that
  • 00:08:45
    everyone's dependencies are satisfied
  • 00:08:47
    and all the versions are compatible and
  • 00:08:48
    ideally it's like the latest version of
  • 00:08:50
    Pydantic too because that's what the user
  • 00:08:51
    asked for
  • 00:08:54
    um so even this isn't like quite enough
  • 00:08:57
    information for us to really do anything
  • 00:08:59
    because this just just describes the
  • 00:09:00
    packages and the versions but it doesn't
  • 00:09:02
    really tell us anything about like where
  • 00:09:03
    to get them um so ultimately this is not
  • 00:09:06
    what we're trying to produce we're
  • 00:09:07
    trying to produce something that looks
  • 00:09:08
    more like this um so UV ultimately will
  • 00:09:11
    create a lock file um and that lock file
  • 00:09:13
    represents a resolution and in that lock
  • 00:09:16
    file we have information like this is
  • 00:09:17
    one entry from a lock file so we have
  • 00:09:20
    like the package name the version but we
  • 00:09:21
    also have information about where it
  • 00:09:23
    came from also the packages it depends
  • 00:09:25
    on we have like a sha we have file size
  • 00:09:27
    etc etc so ultimately like when we
  • 00:09:30
    resolve we're trying to create something
  • 00:09:32
    like this um and I'll talk more about
  • 00:09:33
    like how this is structured in a bit
  • 00:09:36
    once we have that graph we come up with
  • 00:09:39
    term I made up like an install plan uh
  • 00:09:41
    the idea is like we know the state that
  • 00:09:43
    the user wants to get to which is like
  • 00:09:44
    represented by the lock file we have to
  • 00:09:46
    look at the current state of the user's
  • 00:09:48
    machine like maybe they have like an old
  • 00:09:50
    version of Pydantic installed so we need
  • 00:09:52
    to like uninstall it and then install
  • 00:09:54
    the newer version of Pydantic um so most
  • 00:09:58
    of the well not but most of the
  • 00:10:00
    interesting work happens in here uh in
  • 00:10:02
    the actual resolver and uh this was also
  • 00:10:06
    I think the hardest part of building UV
  • 00:10:10
    so I want to talk about a couple of the
  • 00:10:12
    hard problems that are maybe like
  • 00:10:13
    nonobvious
  • 00:10:15
    um especially if you don't spend a lot
  • 00:10:17
    of time like thinking about python
  • 00:10:20
    packaging which I hope like most of you
  • 00:10:22
    don't
  • 00:10:24
    um so okay so the first thing that makes
  • 00:10:28
    this problem quite hard is that python
  • 00:10:31
    has no multiversion support so you
  • 00:10:34
    cannot have two versions of the same
  • 00:10:36
    package installed at the same time um
  • 00:10:39
    this might sound like a very obvious so
  • 00:10:40
    you can't have like Pydantic version one
  • 00:10:42
    and Pydantic version two installed at the
  • 00:10:44
    same time this might sound obvious but
  • 00:10:46
    actually like a lot of languages do
  • 00:10:48
    support this so like rust and node will
  • 00:10:50
    let you do this without any issues um in
  • 00:10:52
    Python it's it's basically a limitation
  • 00:10:54
    of the runtime um there's like Imports
  • 00:10:57
    are like a global cache keyed on
  • 00:10:59
    module name so like you can't have
  • 00:11:00
    multiple modules with the same name um
  • 00:11:03
    so as like a concrete example let's say
  • 00:11:06
    the root is like our project and we
  • 00:11:08
    depend on like a specific version of
  • 00:11:11
    vLLM and we also depend on a specific
  • 00:11:14
    version of LangChain
  • 00:11:16
    and vLLM depends on Pydantic 2 but
  • 00:11:20
    like this old old version of LangChain
  • 00:11:22
    does not work with Pydantic 2 it requires
  • 00:11:23
    Pydantic version one so like this is not a
  • 00:11:26
    solvable graph in Python you cannot like
  • 00:11:28
    you cannot satisfy these dependencies
  • 00:11:30
    and if you try to give those to UV
  • 00:11:32
    you'll get you know this pretty error
  • 00:11:34
    message that tells you this doesn't
  • 00:11:37
    work because you have these two
  • 00:11:38
    dependencies and they have an
  • 00:11:39
    incompatible Pydantic
  • 00:11:42
    requirement um so you know instead
  • 00:11:45
    imagine that the user says like I'll
  • 00:11:46
    accept any version of vLLM but I still
  • 00:11:48
    need this like old version of
  • 00:11:50
    Pydantic in that case what we need to do
  • 00:11:52
    right is we need to backtrack we need to
  • 00:11:54
    test out all the versions of vLLM and try
  • 00:11:56
    to find a version of vLLM that does work
  • 00:11:59
    So eventually we go and find the
  • 00:12:01
    previous version was like vLLM 0.6.1 I
  • 00:12:04
    think so we tried out a bunch of
  • 00:12:05
    versions eventually we find you know a
  • 00:12:07
    set of compatible
  • 00:12:09
    requirements and when we like when we do
  • 00:12:11
    the solve right it's not it's typically
  • 00:12:13
    not just like these four packages like
  • 00:12:15
    this is just a snapshot of like
  • 00:12:17
    ultimately the resolved graph from those
  • 00:12:18
    set of dependencies right like it's
  • 00:12:20
    typically a sprawling thing with lots of
  • 00:12:22
    different requirements and there's lots
  • 00:12:23
    of different ways to satisfy it and you
  • 00:12:26
    know ultimately I mean this like the
  • 00:12:28
    shape of this might look familiar to
  • 00:12:30
    some of you we're trying to do version
  • 00:12:32
    solving so like given we have a universe
  • 00:12:34
    of package versions they have
  • 00:12:36
    constraints like some things depend on
  • 00:12:38
    different versions of
  • 00:12:39
    Pydantic we need to find the set such that
  • 00:12:42
    like all the dependencies are satisfied
  • 00:12:44
    we can only have one version of every
  • 00:12:46
    package um and also we don't want to
  • 00:12:48
    have like extraneous packages um and
  • 00:12:51
    this yeah this is a Boolean
  • 00:12:52
    satisfiability problem um it is NP hard
  • 00:12:55
    so uh you know it's I like think that is
  • 00:12:59
    quite hard
  • 00:13:02
    um and uh I'm not going to go into the
  • 00:13:04
    details of like exactly what our solver
  • 00:13:06
    looks like but if you maybe if you think
  • 00:13:09
    back to school um you know we use a SAT
  • 00:13:11
    solver it's based on cdcl which is like
  • 00:13:13
    conflict driven Clause learning it's
  • 00:13:14
    basically just a fancy thing that tries to
  • 00:13:15
    solve those graphs in as efficient a way
  • 00:13:17
    as it can by exploiting heuristics and
  • 00:13:20
    things that it can learn but it you know
  • 00:13:21
    it can be exponential like there's no
  • 00:13:23
    guarantee that it's actually going to
  • 00:13:24
    solve it in a reasonable amount of
  • 00:13:26
    time um so because we don't have
  • 00:13:29
    multiversion support we have to do this SAT
  • 00:13:30
    solve um if we had multiversion support
  • 00:13:33
    by the way like we wouldn't necessarily
  • 00:13:34
    have to do that like rust like cargo's
  • 00:13:36
    solver like it's not a SAT solver like
  • 00:13:38
    it does like a graph traversal but if you
  • 00:13:41
    get to a hard place where like things
  • 00:13:42
    are not quite working like you can kind
  • 00:13:44
    of just bail out and say like let's add
  • 00:13:46
    two versions of this package so that
  • 00:13:47
    like Escape valve exists but it does not
  • 00:13:49
    exist in Python and this is also true of
  • 00:13:51
    other
  • 00:13:52
    languages okay second thing um and I've
  • 00:13:56
    never tried to explain this this new
  • 00:13:59
    material by the way and this in
  • 00:14:00
    particular I've never tried to explain
  • 00:14:01
    to a group of people so I might you know
  • 00:14:03
    let me know afterwards if it makes any
  • 00:14:05
    sense um but um this was like
  • 00:14:09
    surprisingly or parts of this were
  • 00:14:12
    surprising but um this is maybe like the
  • 00:14:14
    hardest part of building this resolver
  • 00:14:16
    which is python has this like very rich
  • 00:14:18
    Syntax for declaring requirements that
  • 00:14:22
    should only be installed on certain
  • 00:14:23
    python versions or only on certain
  • 00:14:24
    platforms etc etc um so like just as an
  • 00:14:28
    example
  • 00:14:29
    um these are the dependencies of a real
  • 00:14:31
    package I can't remember what maybe
  • 00:14:32
    flask um and you see the last one has uh
  • 00:14:36
    the importlib-metadata package should only be
  • 00:14:38
    installed if the user's python version
  • 00:14:40
    is 3.10 or
  • 00:14:41
    earlier and if we look at some of the
  • 00:14:43
    transitive dependencies here um like
  • 00:14:46
    click itself depends on colorama but
  • 00:14:49
    only on Windows and it also depends on
  • 00:14:52
    importlib-metadata but only if the python
  • 00:14:53
    version is less than 3.8 um so when you
  • 00:14:57
    see this like set of requirements
  • 00:14:59
    there's kind of two ways to think about
  • 00:15:00
    solving the graph like one is solving
  • 00:15:03
    the graph for like a specific user at a
  • 00:15:05
    specific point in time that's on a
  • 00:15:06
    specific computer right so maybe a user
  • 00:15:09
    comes up and they're using Windows on
  • 00:15:10
    python 3.12 so like some things here are
  • 00:15:12
    relevant and some things aren't um
  • 00:15:15
    that's actually pretty easy because
  • 00:15:16
    you're basically just filtering things
  • 00:15:17
    out while you solve it's not a huge
  • 00:15:18
    problem um but we want to solve a
  • 00:15:21
    slightly different problem which is we
  • 00:15:23
    want to generate a lock file that like
  • 00:15:26
    any user on any machine can then use to
  • 00:15:28
    get a reproducible install um and what that
  • 00:15:33
    means is like you know if a user is on
  • 00:15:35
    Windows and a user on Mac they may not
  • 00:15:37
    get the exact same set of packages like
  • 00:15:39
    the user on Windows would get colorama
  • 00:15:41
    the user on Mac would not but like all
  • 00:15:43
    the users on Windows on the same python
  • 00:15:45
    version should get the same set of
  • 00:15:46
    packages and ideally like the
  • 00:15:48
    differences between those users are as
  • 00:15:50
    small as possible that's not actually
  • 00:15:52
    something we guarantee but you know the
  • 00:15:53
    gist of it is you want to be able to
  • 00:15:55
    take the lock file and like any user on
  • 00:15:57
    any machine should be able to take it
  • 00:15:58
    and install like we don't just want a
  • 00:16:00
    lock file for Windows 3.12 we want what
  • 00:16:02
    we would call like a universal lock
  • 00:16:05
    file um and that problem is like a lot
  • 00:16:07
    harder um so again like the core of our
  • 00:16:11
    solver is the SAT
  • 00:16:12
    solver and then there are kind of like
  • 00:16:15
    two pieces that go into trying to build
  • 00:16:16
    this Universal
  • 00:16:18
    resolution so one is that at a high
  • 00:16:22
    level we kind of try to find a solution
  • 00:16:24
    that works on all platforms like
  • 00:16:26
    effectively we assume that all of those
  • 00:16:27
    markers are true and try to see if we
  • 00:16:30
    can find a solution so like the marker
  • 00:16:32
    that said colorama like only on Windows
  • 00:16:33
    we would basically say let's just assume
  • 00:16:35
    that's true and see if we can find a
  • 00:16:36
    solution and then afterwards we'll like
  • 00:16:38
    filter out the packages that are only
  • 00:16:40
    for Windows so that's that's good but
  • 00:16:42
    the problem is right you can have
  • 00:16:43
    conflict these conflicting dependencies
  • 00:16:45
    so like this is totally valid like you
  • 00:16:47
    could have a user say it has to be
  • 00:16:49
    Pydantic 2 on Windows but it cannot be
  • 00:16:51
    Pydantic 2 on all other platforms and
  • 00:16:53
    again we're trying to find a solution
  • 00:16:55
    such that a user shows up and they're on
  • 00:16:57
    Windows they install they get Pydantic version
  • 00:16:59
    two a user shows up and
  • 00:17:00
    they're on Mac they get Pydantic version
  • 00:17:02
    less than
  • 00:17:03
    two um okay so the way that we solve
  • 00:17:06
    this is uh we and again we've made up
  • 00:17:10
    all this terminology because I didn't I
  • 00:17:12
    don't really know if there was good
  • 00:17:13
    terminology for it that existed um but
  • 00:17:15
    what we do is we basically try and fork
  • 00:17:17
    and solve the two graphs separately so
  • 00:17:21
    you know in this case on the left we
  • 00:17:23
    would try to solve like pydantic greater
  • 00:17:25
    than two uh assume we're on Windows
  • 00:17:27
    basically and just solve the rest
  • 00:17:29
    of the
  • 00:17:30
    graph on the right we would do the same
  • 00:17:32
    thing the graph ends up being like a lot
  • 00:17:34
    simpler um but we would solve these two
  • 00:17:36
    graphs like effectively
  • 00:17:38
    independently and then we merge the
  • 00:17:40
    results back together so this is like
  • 00:17:44
    the merged resolution of taking those
  • 00:17:46
    two um and oh wow great oh it's all okay
  • 00:17:51
    I messed up the transitions on this
  • 00:17:53
    slide but I think we'll be okay um okay
  • 00:17:56
    this is what this is supposed to look
  • 00:17:57
    like so basically right the thing on the
  • 00:17:59
    bottom right is like the merged
  • 00:18:00
    resolution and on the two sides we have
  • 00:18:02
    like the platform specific resolutions
  • 00:18:04
    so annotated-types needs to be included
  • 00:18:07
    but only on Windows because it was only
  • 00:18:08
    present in the windows resolution um I
  • 00:18:11
    think this is going to do it on like all
  • 00:18:13
    of these okay uh pydantic is included twice
  • 00:18:16
    right but once for Windows and once for
  • 00:18:18
    non-Windows and those markers are
  • 00:18:20
    disjoint like there's no overlap on them
  • 00:18:22
    so everyone will get one version of
  • 00:18:23
    pydantic but it will be like one of these
  • 00:18:26
    two um and then
  • 00:18:29
    importantly there's also a package
  • 00:18:31
    that's included in both resolutions um
  • 00:18:34
    so typing-extensions is included both on
  • 00:18:36
    Windows and on not Windows and sorry I
  • 00:18:39
    know this is like super annoying um so
  • 00:18:41
    if you look at like the way that that
  • 00:18:43
    marker gets uh
  • 00:18:45
    constructed we have typing extensions
  • 00:18:47
    and we're saying we want to include it
  • 00:18:48
    on like Windows but also include it
  • 00:18:51
    on not Windows right and that marker
  • 00:18:53
    like these come from these two different
  • 00:18:55
    places and that marker is always true
  • 00:18:58
    right so we can actually just ignore it
  • 00:19:00
    completely that's why it doesn't it's
  • 00:19:01
    not present in the final resolution um
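The fork-and-merge step described above can be sketched in a few lines of Python. This is only an illustration, not uv's implementation: markers are modeled as plain sets of platforms, the universe of platforms is a hypothetical three-element set, and the version numbers are made up. Only the package names come from the talk.

```python
# Hypothetical, tiny universe of platforms. Real markers range over many
# more variables (Python version, architecture, and so on).
ALL_PLATFORMS = frozenset({"win32", "linux", "darwin"})
WINDOWS = frozenset({"win32"})
NOT_WINDOWS = ALL_PLATFORMS - WINDOWS

# One resolution per fork: (platforms the fork covers, {package: version}).
# Package names come from the talk; the version numbers are made up.
FORKS = [
    (WINDOWS, {"pydantic": "2.9.0",
               "annotated-types": "0.7.0",
               "typing-extensions": "4.12.0"}),
    (NOT_WINDOWS, {"pydantic": "1.10.0",
                   "typing-extensions": "4.12.0"}),
]


def merge_forks(forks):
    """Union the fork markers for every (package, version) pair."""
    merged = {}
    for platforms, resolution in forks:
        for name, version in resolution.items():
            key = (name, version)
            merged[key] = merged.get(key, frozenset()) | platforms
    # A marker that covers every platform is always true, so drop it (None).
    return {k: (None if v == ALL_PLATFORMS else v) for k, v in merged.items()}


merged = merge_forks(FORKS)
for (name, version), marker in sorted(merged.items()):
    suffix = "" if marker is None else f"  # only on {sorted(marker)}"
    print(f"{name}=={version}{suffix}")
```

In this toy model, typing-extensions ends up with "Windows or not Windows", which is the whole universe, so its marker disappears, while the two pydantic versions keep disjoint markers.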
  • 00:19:05
    so not only do we have to like solve
  • 00:19:07
    these graphs in this way but we end up
  • 00:19:08
    doing a lot of different uh there's this
  • 00:19:12
    whole marker algebra that we have to
  • 00:19:13
    consider like the ORs and the ANDs that
  • 00:19:15
    you're seeing there we have to do a lot
  • 00:19:16
    of operations like here evaluating that
  • 00:19:19
    that marker always figuring out that
  • 00:19:21
    that marker always evaluates to true
  • 00:19:22
    and that we can just omit it like you
  • 00:19:24
    know I mean it's easy in that case but
  • 00:19:27
    like we'll see you know some hard cases
  • 00:19:29
    similarly we have to be able to test for
  • 00:19:31
    disjointness with these um I mentioned
  • 00:19:33
    that like we had the two pydantic
  • 00:19:35
    requirements and they were disjoint that
  • 00:19:37
    just means that like they can never both
  • 00:19:38
    be true effectively um so you know for
  • 00:19:42
    example maybe we're like doing the solve
  • 00:19:45
    on the left side of the previous slide
  • 00:19:46
    like we're solving for Windows and then
  • 00:19:48
    we like see a dependency that has this
  • 00:19:50
    marker on it we want to know like is
  • 00:19:53
    this dependency relevant like we know
  • 00:19:55
    we're solving for Windows so like should
  • 00:19:57
    we even care about this dependency
  • 00:19:58
    because it it's only applicable on these
  • 00:20:00
    platforms right and that question is
  • 00:20:02
    basically can they both be true are they
  • 00:20:05
    disjoint this is also a Boolean
  • 00:20:07
    satisfiability problem um and the
  • 00:20:09
    markers can be like pretty complicated
  • 00:20:10
    right this is also NP-hard like totally
  • 00:20:12
    separate NP-hard problem which is we
  • 00:20:15
    have to be able to test uh I'll go back
  • 00:20:18
    on we have to be able to test whether
  • 00:20:19
    like these two these two uh you know
  • 00:20:21
    Boolean expressions are disjoint um
  • 00:20:24
    and we're doing this like all the time
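To make the disjointness question concrete, here is a brute-force sketch in Python. It is an assumption-laden toy: markers are ordinary functions over a hypothetical two-variable environment, and the check enumerates every environment. That enumeration grows exponentially with the number of variables, which is why this naive approach does not scale to real markers.

```python
from itertools import product

# Hypothetical marker variables and the values each may take.
DOMAIN = {
    "sys_platform": ["win32", "linux", "darwin"],
    "python_version": ["3.11", "3.12"],
}


def environments(domain):
    """Yield every possible environment as a dict."""
    names = list(domain)
    for values in product(*(domain[n] for n in names)):
        yield dict(zip(names, values))


def are_disjoint(marker_a, marker_b, domain=DOMAIN):
    """Two markers are disjoint if no environment satisfies both."""
    return not any(marker_a(env) and marker_b(env)
                   for env in environments(domain))


def on_windows(env):
    return env["sys_platform"] == "win32"


def on_unix(env):
    return env["sys_platform"] in ("linux", "darwin")


def on_py312(env):
    return env["python_version"] == "3.12"


# While solving the Windows fork, a dependency that only applies on
# Linux or macOS can be skipped; one gated on Python 3.12 cannot.
print(are_disjoint(on_windows, on_unix))
print(are_disjoint(on_windows, on_py312))
```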
  • 00:20:26
    um now most markers are
  • 00:20:29
    pretty simple which is great um but like
  • 00:20:32
    these are just some examples of these
  • 00:20:34
    are like the fully simplified markers
  • 00:20:36
    for like a real example from our test
  • 00:20:38
    case this is resolving uh if any of you
  • 00:20:40
    use like Transformers like the Hugging
  • 00:20:42
    Face project
  • 00:20:43
    um one of our test cases is we take that
  • 00:20:46
    project and we enable all of the
  • 00:20:48
    optional dependencies which like no one
  • 00:20:49
    should do but it creates like a very
  • 00:20:51
    very large graph and so it's one of our
  • 00:20:53
    harder test cases and like these are the
  • 00:20:54
    fully simplified markers um and uh you
  • 00:20:58
    can see like some of them are pretty
  • 00:21:00
    large um and by the way before we did
  • 00:21:04
    this before we did this marker
  • 00:21:05
    simplification of trying to get to like
  • 00:21:07
    these simplified normalized
  • 00:21:10
    forms we we would do that resolution and
  • 00:21:12
    like each dependency would have like
  • 00:21:14
    tens of kilobytes of markers like the
  • 00:21:16
    marker Expressions were huge um and that
  • 00:21:18
    was even we even had like some very
  • 00:21:20
    basic heuristics like you know just kind
  • 00:21:22
    of like simple stuff for trying to
  • 00:21:25
    normalize and filter them out ultimately
  • 00:21:28
    um someone on the team wrote this like
  • 00:21:31
    marker normalizer based on a
  • 00:21:33
    technique called algebraic decision
  • 00:21:34
    diagrams um it's like a totally separate
  • 00:21:37
    solver that we had to build to try and
  • 00:21:39
    normalize those markers and ask
  • 00:21:41
    questions like are they disjoint um so
  • 00:21:43
    these were both like very very hard
  • 00:21:45
    problems um a third that I'll just
  • 00:21:47
    mention briefly is that and it's not I'm
  • 00:21:51
    not actually like going to talk about
  • 00:21:52
    this one that much but I do like to
  • 00:21:54
    complain about it a little bit there's
  • 00:21:56
    so in the python ecosystem there there's
  • 00:21:58
    really like no guarantee that you have
  • 00:22:01
    static metadata for a package or like a
  • 00:22:03
    dependency you want to resolve um and by
  • 00:22:05
    that I mean like if we're trying to
  • 00:22:07
    resolve pydantic version two um we're
  • 00:22:09
    going to go to the registry and we're
  • 00:22:10
    going to say like what's the metadata
  • 00:22:12
    for pydantic version two and it's actually
  • 00:22:14
    not guaranteed that they will be able to
  • 00:22:16
    give us an answer um ultimately what
  • 00:22:19
    might happen is we might have to run
  • 00:22:20
    some sort of arbitrary python code in
  • 00:22:23
    order to get the dependencies um like
  • 00:22:27
    basically if they publish a built
  • 00:22:28
    distribution it'll have dependencies but
  • 00:22:31
    if they only publish the source for the
  • 00:22:32
    package we might have to effectively
  • 00:22:35
    like pull that down and run if you've
  • 00:22:37
    ever seen like a setup.py file before we
  • 00:22:39
    have to like run a setup.py file we
  • 00:22:40
    might have to build the whole package
  • 00:22:41
    even just to get the dependencies um so
  • 00:22:44
    we do a lot of things to try and avoid
  • 00:22:46
    doing that um while still being correct
  • 00:22:49
    uh and I'll talk about some of those in
  • 00:22:50
    a bit um but kind of like just
  • 00:22:54
    concluding on this section like
  • 00:22:55
    ultimately what we're trying to build
  • 00:22:56
    here um we model it as a graph right the
  • 00:23:00
    nodes are packages at specific versions
  • 00:23:03
    and the edges are weighted by markers um
  • 00:23:05
    so you know in this case we have like
  • 00:23:07
    the two versions of pydantic but one edge
  • 00:23:09
    is weighted by only being on Windows the
  • 00:23:12
    other is like never being on Windows um
  • 00:23:14
    and you can see they like share some
  • 00:23:15
    common nodes etc etc and like the nice
  • 00:23:18
    thing about this representation is when
  • 00:23:19
    a user comes along and wants to install
  • 00:23:21
    on Linux or whatever we just are
  • 00:23:23
    traversing we're just doing a graph
  • 00:23:24
    traversal and saying like which edges
  • 00:23:27
    are relevant and which are not
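The install-time traversal can be sketched as a plain breadth-first walk. This is a minimal illustration under stated assumptions, not uv's lock file format: the graph layout, the marker functions, and the version numbers are all hypothetical, and only the package names come from the talk.

```python
from collections import deque


def is_windows(env):
    return env["sys_platform"] == "win32"


def not_windows(env):
    return env["sys_platform"] != "win32"


def always(env):
    return True


# Nodes are packages at specific versions; each edge carries a marker.
LOCK_GRAPH = {
    "project": [("pydantic==2.9.0", is_windows),
                ("pydantic==1.10.0", not_windows)],
    "pydantic==2.9.0": [("annotated-types==0.7.0", always),
                        ("typing-extensions==4.12.0", always)],
    "pydantic==1.10.0": [("typing-extensions==4.12.0", always)],
    "annotated-types==0.7.0": [],
    "typing-extensions==4.12.0": [],
}


def packages_to_install(graph, root, env):
    """Follow only the edges whose marker holds for this environment."""
    selected, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for child, marker in graph[node]:
            if marker(env) and child not in selected:
                selected.add(child)
                queue.append(child)
    return sorted(selected)


print(packages_to_install(LOCK_GRAPH, "project", {"sys_platform": "linux"}))
print(packages_to_install(LOCK_GRAPH, "project", {"sys_platform": "win32"}))
```

No solver runs here: every version was already chosen when the graph was built, so installing only means deciding which edges apply.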
  • 00:23:29
    um so there are some tools in the python
  • 00:23:30
    ecosystem that try to do this but they
  • 00:23:33
    then have to do like a separate SAT
  • 00:23:35
    solve at install time so they have like
  • 00:23:38
    a set of packages and then when you
  • 00:23:39
    install they actually have to like run a
  • 00:23:40
    SAT solver to figure out the right
  • 00:23:42
    versions. We have the nice property that
  • 00:23:44
    like it's just there there's no like
  • 00:23:46
    second resolution when you install we're
  • 00:23:47
    just like traversing this graph and
  • 00:23:49
    figuring out the things to include so
  • 00:23:50
    this is like the ultimate goal like all
  • 00:23:52
    that work goes into trying to produce
  • 00:23:53
    this thing um okay so I talked a lot
  • 00:23:58
    about
  • 00:23:59
    um some of the hard problems we had to
  • 00:24:01
    solve to build this now I want to talk a
  • 00:24:04
    little bit about things we did to make
  • 00:24:06
    it fast um and uh you know the first
  • 00:24:09
    thing that that comes to everyone's mind
  • 00:24:11
    and also that I just talk about a lot is
  • 00:24:13
    rust like rust is a big part of uh of UV
  • 00:24:18
    and of how we've made it so fast um like
  • 00:24:20
    UV is written in Rust uh like I said we
  • 00:24:23
    don't have a dependency on python although
  • 00:24:24
    you do need to have python installed um
  • 00:24:28
    but the the observation for me this
  • 00:24:31
    slide is slightly sort of just opinions
  • 00:24:33
    um UV has gotten like faster and faster
  • 00:24:35
    over time like right like despite being
  • 00:24:36
    written in Rust the whole time so it's
  • 00:24:38
    not just about being written in Rust
  • 00:24:40
    like from my perspective I think rust
  • 00:24:42
    gives you a really fast
  • 00:24:44
    Baseline um and then it gives you a lot
  • 00:24:47
    of tools that you need if you want to
  • 00:24:49
    write really really fast programs and
  • 00:24:51
    like other programming languages can
  • 00:24:52
    expose these too but for example it's
  • 00:24:55
    pretty hard to like care deeply about
  • 00:24:57
    memory allocation if you're writing
  • 00:24:59
    python um like you just don't really
  • 00:25:01
    have a lot of control over what's
  • 00:25:03
    happening whereas in Rust you're
  • 00:25:05
    actually like forced to care about a lot
  • 00:25:07
    of those things right which some people
  • 00:25:10
    will complain
  • 00:25:11
    about but it's one of it is ultimately
  • 00:25:13
    one of the strengths of the language um
  • 00:25:16
    so like rust is part of it and I'm going
  • 00:25:18
    to talk about some things some parts of
  • 00:25:19
    rust that we use but UV is also like
  • 00:25:22
    most package managers like a lot of what
  • 00:25:24
    we do is IO and rust is like only so
  • 00:25:26
    helpful with IO there's like a lot of
  • 00:25:28
    other things we need to do so it's not
  • 00:25:30
    rust is a big part of UV but it's not
  • 00:25:31
    all about rust I do want to start though
  • 00:25:34
    with an example that I think illustrates
  • 00:25:36
    like why rust is uh helpful and
  • 00:25:38
    important and and you can do this in
  • 00:25:40
    other languages too but uh we do it in
  • 00:25:42
    Rust so that's what I'm going to talk
  • 00:25:43
    about um okay version
  • 00:25:47
    parsing okay so like in Python like
  • 00:25:50
    every package has a version right um and
  • 00:25:53
    you could have a very simple version
  • 00:25:54
    like 1.0.0 um but they can get very
  • 00:25:56
    complicated so you can have like uh
  • 00:25:58
    pre-releases that can be like Alpha Beta
  • 00:26:00
    or RC which is a release candidate and
  • 00:26:02
    the pre-release can have a number you
  • 00:26:03
    can have like beta 1 beta 2 Beta 3 um
  • 00:26:06
    you can also have post releases so like
  • 00:26:09
    if you need to update this would
  • 00:26:10
    typically be like if you need to update
  • 00:26:12
    the documentation but not the source
  • 00:26:14
    code you might do like a Post Release so
  • 00:26:15
    like the contents are the same but you
  • 00:26:17
    had to release it again for some
  • 00:26:18
    reason um okay there's also there's also
  • 00:26:22
    this piece called like the local version
  • 00:26:25
    identifier um which if you've ever
  • 00:26:27
    worked with PyTorch
  • 00:26:29
    you will probably be familiar with this
  • 00:26:31
    um this is intended for I'll probably
  • 00:26:34
    get this wrong but this in the spec at
  • 00:26:36
    least it's intended for like you're
  • 00:26:38
    building a package locally and you want
  • 00:26:40
    to be able to tag it in some way like on
  • 00:26:42
    your local
  • 00:26:43
    machine um pytorch has now used this due
  • 00:26:48
    to other limitations in the packaging
  • 00:26:49
    ecosystem to Mark uh packages as being
  • 00:26:53
    compatible with certain accelerators so
  • 00:26:55
    like you might have to build a lot of
  • 00:26:56
    different versions of pytorch that support
  • 00:26:58
    different versions of Cuda and they now
  • 00:27:00
    use this part of the identifier just
  • 00:27:02
    because it was sort of an open like a
  • 00:27:04
    free space um uh to indicate that
  • 00:27:07
    because there's no other support in the
  • 00:27:08
    standards for like marking packages as
  • 00:27:11
    compatible with an accelerator it's sort
  • 00:27:12
    of a hole in the standards so anyway
  • 00:27:13
    this has become very popular now in the
  • 00:27:15
    python ecosystem um okay this one I
  • 00:27:19
    actually like forgot this one existed
  • 00:27:20
    and then I was doing the slides you can
  • 00:27:22
    do this like this like Epoch thing I
  • 00:27:24
    actually don't really remember what this
  • 00:27:25
    is for but you can put like a number and
  • 00:27:27
    then an exclamation mark um and that's
  • 00:27:30
    a valid that's a valid python version
  • 00:27:32
    and of course you can like you can
  • 00:27:34
    actually like compose these things
  • 00:27:35
    together like you can have like you can
  • 00:27:37
    have like a post-release of a
  • 00:27:39
    pre-release I'm pretty sure you can have
  • 00:27:41
    like a local version of a pre-release
  • 00:27:42
    etc etc so like representing these is
  • 00:27:47
    pretty hard like it's a very rich syntax
  • 00:27:50
    so like the full representation of this
  • 00:27:53
    is something like this right you have we
  • 00:27:55
    have like multiple vectors um which
  • 00:27:57
    means we're going to be allocating
  • 00:27:59
    memory um because the release segments
  • 00:28:01
    there can be more than three there can
  • 00:28:02
    be like as many as you want actually
  • 00:28:04
    like it can be
  • 00:28:05
    1.1.1.1.1 like that's fine um you
  • 00:28:08
    can have multiple of those local
  • 00:28:10
    segments like you can have like plus
  • 00:28:11
    something plus something blah blah blah
  • 00:28:13
    so like this is like this is pretty
  • 00:28:15
    heavy and we are dealing with these
  • 00:28:17
    things like all over the place so
  • 00:28:20
    someone on our team um uh he goes by
  • 00:28:24
    Burnt Sushi online so I should credit
  • 00:28:26
    him because he figured this out uh he
  • 00:28:28
    noticed that we can represent like over
  • 00:28:31
    90% of versions with like a single
  • 00:28:34
    u64 um which is great because one it's
  • 00:28:38
    like fully stack allocated um and
  • 00:28:41
    there's a second property that's really
  • 00:28:42
    nice about it that I'll get to in a
  • 00:28:43
    second but this is actually what we use
  • 00:28:45
    internally so like internally we have a
  • 00:28:46
    version and then we have an enum and we
  • 00:28:48
    try to represent most versions as like
  • 00:28:50
    the small this version small um and then
  • 00:28:53
    we represent things with the full
  • 00:28:55
    version if they don't fit into that
  • 00:28:56
    scheme so think
  • 00:28:59
    um if so it's a u64 so we have like
  • 00:29:02
    eight bytes to work with okay eight bytes
  • 00:29:05
    of space um so the first two or the the
  • 00:29:09
    the first or last two bytes however you
  • 00:29:11
    want to think about it bytes six and
  • 00:29:12
    seven refer to the first release segment
  • 00:29:14
    so like when we had 1.0.0 that would be
  • 00:29:16
    the one and that's because calendar
  • 00:29:19
    versioning is still fairly popular in
  • 00:29:21
    the python ecosystem so uh we need two
  • 00:29:24
    bytes for the for the first release
  • 00:29:26
    segment because people would have
  • 00:29:27
    packages that have a version like 2023.1
  • 00:29:30
    2023.4
  • 00:29:31
    etc um okay the next three bytes just
  • 00:29:34
    represent like the second third and
  • 00:29:37
    fourth release segment and then the
  • 00:29:39
    three bytes at the end represent one of
  • 00:29:42
    a pre-release specifier or a
  • 00:29:44
    post-release specifier um we do not
  • 00:29:47
    try to even capture both of them in this
  • 00:29:49
    representation um but the really nice
  • 00:29:52
    thing about this it's not just that it's
  • 00:29:55
    cheap to uh to to sorry it's not just
  • 00:29:58
    that it doesn't have to allocate memory
  • 00:29:59
    like the really great thing is that
  • 00:30:01
    greater versions map to larger integers
  • 00:30:05
    so like in addition to parsing and
  • 00:30:07
    creating these all the time we're also
  • 00:30:08
    comparing them constantly because we
  • 00:30:10
    want to know like is this version
  • 00:30:11
    greater than this version does this
  • 00:30:12
    version satisfy like this version
  • 00:30:14
    specifier um and now like in this
  • 00:30:17
    representation you have to be very very
  • 00:30:19
    careful in how we constructed the
  • 00:30:20
    representation but in this
  • 00:30:22
    representation answering that question
  • 00:30:24
    is just a single memcmp it's just
  • 00:30:25
    like is this u64 greater than this other
  • 00:30:27
    u64 as opposed to dealing with those two
  • 00:30:29
    big version things that have vectors and
  • 00:30:31
    we have to like understand like blah
  • 00:30:32
    blah blah so like most of the
  • 00:30:33
    implementation of this is actually like
  • 00:30:35
    a huge comment explaining how the um
  • 00:30:39
    explaining the representation how it
  • 00:30:41
    works it does have limitations right we
  • 00:30:43
    can't support that epoch thing you
  • 00:30:45
    can't have more than four release
  • 00:30:47
    segments like 1.1.1.1.1
  • 00:30:49
    but again over 90% of versions can be
  • 00:30:51
    modeled this way um and yeah this is
  • 00:30:54
    actually something we did and like when
  • 00:30:57
    when there's when there's minimal IO so
  • 00:30:59
    like everything's fully cached and we
  • 00:31:00
    have this very hard resolution that has
  • 00:31:02
    to do a lot of package version testing
  • 00:31:05
    It sped things up like three or four
  • 00:31:06
    times it was like super super impactful
  • 00:31:08
    because this is where we were just
  • 00:31:09
    spending a ton of time parsing versions
  • 00:31:12
    uh allocating memory for them and
  • 00:31:13
    comparing them so again this is just
  • 00:31:16
    like you can do this with other
  • 00:31:17
    languages too but rust I think is very
  • 00:31:19
    amenable to doing this kind of thing um
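The packing idea can be approximated in Python, although the real implementation is in Rust and its exact bit layout is not spelled out in the talk. The sketch below follows the described split (two bytes for the first release segment, one byte each for the next three, three bytes for a pre- or post-release), but the encoding of those last three bytes, the `pack_small` name, and the accepted syntax are assumptions made for illustration. Anything that does not fit returns `None`, standing in for the fallback to the full representation.

```python
import re

# Assumed ordering of the suffix kinds: alpha < beta < rc < final < post.
_KIND = {"a": 0, "b": 1, "rc": 2, None: 3, "post": 4}
# Up to four release segments, then at most one pre- or post-release.
_PATTERN = re.compile(r"^(\d+(?:\.\d+){0,3})(?:(a|b|rc|\.post)(\d+))?$")


def pack_small(version):
    """Pack a simple version into one 64-bit integer, or return None."""
    match = _PATTERN.match(version)
    if match is None:
        return None  # epochs, local versions, five or more segments, ...
    release = [int(part) for part in match.group(1).split(".")]
    release += [0] * (4 - len(release))  # 1.0 and 1.0.0 pack identically
    kind = match.group(2).lstrip(".") if match.group(2) else None
    number = int(match.group(3)) if match.group(3) else 0
    if release[0] > 0xFFFF or number > 0xFFFF:
        return None
    if any(segment > 0xFF for segment in release[1:]):
        return None
    packed = release[0]                    # 16 bits: first release segment
    for segment in release[1:]:
        packed = (packed << 8) | segment   # 8 bits each: segments 2 to 4
    packed = (packed << 8) | _KIND[kind]   # 8 bits: suffix kind
    packed = (packed << 16) | number       # 16 bits: suffix number
    return packed


# Comparing versions is now plain integer comparison.
print(pack_small("1.0rc1") < pack_small("1.0") < pack_small("1.0.post1"))
print(pack_small("1!1.0"), pack_small("1.0+cu121"))
```

Because the most significant bits hold the most significant parts of the version, the integer order matches the version order for everything this sketch accepts.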
  • 00:31:21
    and it made a really big difference for
  • 00:31:22
    us um I mentioned that most uh you know
  • 00:31:27
    a lot of what we have to do with package
  • 00:31:28
    manager is actually IO so I want to have
  • 00:31:30
    a go through one or two examples of ways
  • 00:31:32
    that we try and cheat a little bit with
  • 00:31:35
    IO um so I I hinted at this before but
  • 00:31:39
    when you want to understand the metadata
  • 00:31:40
    for a package like you need to know its
  • 00:31:43
    dependencies um it's not guaranteed that
  • 00:31:45
    you can actually get that information
  • 00:31:47
    without like running some python code um
  • 00:31:49
    but often you can so when you publish a
  • 00:31:52
    package to the index there are two kinds
  • 00:31:54
    of packages one is a source package and
  • 00:31:56
    one is a built distribution um and the
  • 00:31:58
    built distributions are probably like
  • 00:32:00
    most of what you interact with um and
  • 00:32:02
    those are really important because when
  • 00:32:03
    you interact with python a lot of what
  • 00:32:05
    you're doing is actually interacting
  • 00:32:06
    with native code right so if you use
  • 00:32:07
    like NumPy or SciPy or whatever those
  • 00:32:10
    have to be built on a bunch of different
  • 00:32:11
    platforms because they are not pure
  • 00:32:12
    python um so python has this extensive
  • 00:32:15
    support for built distributions and
  • 00:32:17
    built distributions do include the
  • 00:32:19
    metadata which is great
  • 00:32:22
    um the built distributions are actually
  • 00:32:25
    just zip archives they're called Wheels
  • 00:32:27
    I don't fully understand why um but the
  • 00:32:31
    the suffix is .whl but it's actually
  • 00:32:32
    just a zip file and like somewhere in
  • 00:32:35
    the zip file there's a metadata file
  • 00:32:37
    like literally a file called METADATA um
  • 00:32:39
    that contains the metadata for the
  • 00:32:42
    package um some Registries will let you
  • 00:32:45
    ask for this directly but like a lot of
  • 00:32:47
    them won't um it just depends on the
  • 00:32:49
    registry like PyPI the public index will
  • 00:32:52
    let you just say like give me the
  • 00:32:53
    metadata but like for whatever reason a
  • 00:32:55
    lot of the commercial Registries do not
  • 00:32:57
    support this yet so we want to like
  • 00:32:59
    get the metadata but we don't want to
  • 00:33:02
    download the whole wheel because the
  • 00:33:05
    wheel like the PyTorch wheels are like
  • 00:33:06
    hundreds of megabytes um and we don't
  • 00:33:09
    want to download them just to know the
  • 00:33:10
    metadata because we might have to test
  • 00:33:11
    like a bunch of versions too so what we
  • 00:33:15
    do instead um this is a representation
  • 00:33:19
    of a zip file um I used to be very
  • 00:33:22
    scared of file formats um but but zip is
  • 00:33:25
    very simple um it's sort of just a
  • 00:33:28
    series of entries like each entry has
  • 00:33:29
    like a header and then it has the
  • 00:33:30
    contents of the file and then at the
  • 00:33:32
    very end there is what's called the
  • 00:33:34
    central directory it's kind of like an
  • 00:33:35
    index so the central directory knows
  • 00:33:39
    what all the files are and where they're
  • 00:33:41
    located like you can think of the zip
  • 00:33:43
    file it's just like you know a stream of
  • 00:33:44
    of bytes and all the files are somewhere
  • 00:33:46
    the central directory knows where all
  • 00:33:48
    the files
  • 00:33:49
    are so what we do is we first make a
  • 00:33:53
    range request for the central directory
  • 00:33:55
    so we we guess where it is we say it's
  • 00:33:58
    probably within this you know this many
  • 00:34:00
    bytes at the end of the file and then we
  • 00:34:02
    grab the central directory which is
  • 00:34:04
    basically an index of information and
  • 00:34:06
    that does not require downloading the
  • 00:34:07
    whole wheel we can ask the registry to
  • 00:34:09
    just give us those you know N bytes at
  • 00:34:11
    the end of the file we then find the
  • 00:34:14
    metadata file in the central directory
  • 00:34:15
    and then we make a second range request
  • 00:34:17
    just for that metadata file because the
  • 00:34:18
    central directory knows where it is so
  • 00:34:20
    we grab the central directory we figure
  • 00:34:22
    out what we need to request and then we
  • 00:34:23
    go and get the metadata file um yeah
  • 00:34:25
    this has nothing to do with rust right
  • 00:34:27
    by the way like other python tools
  • 00:34:29
    can do this too but it does save a lot
  • 00:34:31
    of time because we don't have to
  • 00:34:32
    download these huge files just to answer
  • 00:34:33
    the question of like what packages does
  • 00:34:36
    it depend on
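One way to see the effect without a network is to hand Python's standard `zipfile` module a file object that counts the bytes it is asked for. In this sketch the `CountingReader` class, the fake wheel, and its file names are hypothetical; a real client would turn each `read` at a given offset into an HTTP range request instead of reading from memory.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """Stand-in for a remote wheel: each read() models a range request."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_fake_wheel():
    """Build a zip with one large payload and a small METADATA file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("demo/native_lib.so", b"\0" * 1_000_000)
        archive.writestr(
            "demo-1.0.dist-info/METADATA",
            "Name: demo\nVersion: 1.0\nRequires-Dist: rich\n",
        )
    return buffer.getvalue()


def read_metadata(remote):
    """Locate METADATA via the central directory, then read only that entry."""
    with zipfile.ZipFile(remote) as archive:
        name = next(entry for entry in archive.namelist()
                    if entry.endswith(".dist-info/METADATA"))
        return archive.read(name).decode()


wheel_bytes = build_fake_wheel()
remote = CountingReader(wheel_bytes)
metadata = read_metadata(remote)
print(metadata)
print(f"read {remote.bytes_read} of {len(wheel_bytes)} bytes")
```

`zipfile` seeks to the end of the archive for the central directory and then jumps straight to the requested entry, so the large payload is never touched.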
  • 00:34:39
    um the probably the biggest contributor
  • 00:34:42
    to like why UV is so fast and why it
  • 00:34:45
    feels so fast is the cache design um so
  • 00:34:50
    UV is like the cache itself is all
  • 00:34:53
    optimized for like warm operations um as
  • 00:34:56
    in operations where uh you have the data
  • 00:34:59
    you want in the cache and you just need
  • 00:35:00
    to like get it into your environment and
  • 00:35:02
    that's because like one uh like most of
  • 00:35:06
    the time when you're installing a
  • 00:35:07
    package like you've probably installed
  • 00:35:09
    it already on your machine at some point
  • 00:35:10
    in time that may not be true for like
  • 00:35:12
    a continuous integration environment but
  • 00:35:14
    like on your machine you probably have a
  • 00:35:15
    lot of copies of the same packages that
  • 00:35:17
    you've installed in different places um
  • 00:35:19
    and so we try to optimize for those
  • 00:35:21
    kinds of interactions where you have
  • 00:35:22
    data in the cache and we want to make it
  • 00:35:24
    really fast for you to do something with
  • 00:35:26
    it so the way that we model this is we
  • 00:35:28
    have this sort of global cache of
  • 00:35:31
    unpacked archives recall like every
  • 00:35:33
    archive is a ZIP file we don't actually
  • 00:35:35
    store the zip files in the cache what we
  • 00:35:38
    do instead is like while we download
  • 00:35:40
    those files we just unzip them directly
  • 00:35:42
    into the cache so the cache contains
  • 00:35:44
    like the fully unzipped contents of the
  • 00:35:46
    files and when we need to install the
  • 00:35:50
    installation operation is basically that
  • 00:35:52
    we just you know we use reflinking
  • 00:35:54
    where we can or hard linking we just
  • 00:35:56
    link the files into your environment so
  • 00:35:58
    like if you're using UV and you need
  • 00:36:00
    numpy in like a bunch of different
  • 00:36:01
    environments we just install it in one
  • 00:36:03
    place and then when you install it in
  • 00:36:05
    your environment we are basically just
  • 00:36:06
    creating links to the files in the cache
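A minimal sketch of that install step, assuming a cache laid out as a directory of already-unpacked packages. The `install_from_cache` helper and the directory names are hypothetical, and the fallback to copying is an assumption for file systems where hard links are unavailable.

```python
import os
import shutil
import tempfile


def install_from_cache(cache_dir, site_packages):
    """Link every file of an unpacked package into an environment."""
    for root, _dirs, files in os.walk(cache_dir):
        relative = os.path.relpath(root, cache_dir)
        target_dir = os.path.join(site_packages, relative)
        os.makedirs(target_dir, exist_ok=True)
        for filename in files:
            source = os.path.join(root, filename)
            target = os.path.join(target_dir, filename)
            try:
                os.link(source, target)  # hard link: no file data is copied
            except OSError:
                shutil.copy2(source, target)  # fall back to a real copy


# Demo: one cached package linked into two separate environments.
workspace = tempfile.mkdtemp()
cache = os.path.join(workspace, "cache", "rich-1.0")
os.makedirs(os.path.join(cache, "rich"))
with open(os.path.join(cache, "rich", "__init__.py"), "w") as handle:
    handle.write("VERSION = '1.0'\n")

environments = [os.path.join(workspace, name, "site-packages")
                for name in ("env-a", "env-b")]
for environment in environments:
    install_from_cache(cache, environment)
    print(os.listdir(os.path.join(environment, "rich")))
```

When the hard link succeeds, both environments share the same data on disk as the cache, which is where the speed and the space savings come from.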
  • 00:36:09
    um that's like really really fast and
  • 00:36:12
    it's also very space efficient because
  • 00:36:14
    it means that you're not installing the
  • 00:36:16
    same contents like over and over in all
  • 00:36:17
    your different projects um so again most
  • 00:36:20
    installs are just like hard linking a
  • 00:36:22
    bunch of files from the cache into your
  • 00:36:24
    environment um so like this is just this
  • 00:36:27
    is just literally just a screenshot of
  • 00:36:29
    like my file system of the cache um the
  • 00:36:31
    cache looks like this right there's like
  • 00:36:32
    packages package versions and then it's
  • 00:36:34
    just the unzipped contents and so when
  • 00:36:35
    you want to install rich in your virtual
  • 00:36:38
    environment we're just like creating
  • 00:36:39
    symlinks to all these files
  • 00:36:41
    effectively um and that's really really
  • 00:36:43
    fast which is great so like this this
  • 00:36:46
    alone contributes to a lot of the
  • 00:36:48
    feeling of things being instant like if
  • 00:36:49
    you've installed something on your
  • 00:36:50
    machine you reinstall with UV um uh this
  • 00:36:53
    is like a lot of it is due to how we
  • 00:36:55
    design this cache and the fact we try to
  • 00:36:57
    optimize for those kinds of
  • 00:36:59
    operations um okay last thing so this
  • 00:37:03
    this is really good but it only works
  • 00:37:06
    for this only applies to like files a
  • 00:37:09
    lot of what we need to store in the
  • 00:37:10
    cache is like metadata um so maybe uh
  • 00:37:13
    like Blobs of data right maybe we need
  • 00:37:15
    to know like what are all the available
  • 00:37:17
    versions of like this package um we
  • 00:37:20
    cannot this does not apply to that so we
  • 00:37:23
    use a slightly different trick for those
  • 00:37:25
    cases which is we use a technique called
  • 00:37:26
    zero copy
  • 00:37:28
    um and this will require the most rust
  • 00:37:30
    knowledge but I'll try to I'll try to
  • 00:37:31
    avoid it um the intuition here is like
  • 00:37:34
    let's say that you have a struct like
  • 00:37:36
    this blob struct and it has a
  • 00:37:38
    field and uh it's being stored as Json
  • 00:37:41
    so like if you want to deserialize this
  • 00:37:44
    you read the Json file from disk into
  • 00:37:46
    memory and then uh you know you run like
  • 00:37:49
    a a uh basically you run a parser and
  • 00:37:52
    then like you grab the contents and put
  • 00:37:54
    them in the struct um the observation of
  • 00:37:57
    zero copy though is like and I don't
  • 00:38:00
    know if anyone will know the difference
  • 00:38:01
    between these two things but um but uh
  • 00:38:04
    when you're trying to create the struct
  • 00:38:06
    you already read the Json file into
  • 00:38:08
    memory right so you already like
  • 00:38:10
    allocated memory to read that file into
  • 00:38:12
    a string so you don't actually need to
  • 00:38:15
    like allocate more memory to create that
  • 00:38:17
    blob struct the version on the left that
  • 00:38:19
    requires an allocation so if we want to
  • 00:38:22
    go from the thing at the top to the
  • 00:38:23
    thing at the bottom we have to allocate
  • 00:38:24
    once to read the contents into
  • 00:38:26
    memory and then again to create
  • 00:38:28
    the struct instead like we already know
  • 00:38:32
    that the string is basically there
  • 00:38:33
    verbatim in the contents so ideally
  • 00:38:36
    instead we could just create a pointer
  • 00:38:37
    to it so we read the JSON into memory we
  • 00:38:40
    parse it and then we ideally this is
  • 00:38:43
    sort of theoretical right we just like
  • 00:38:44
    create a pointer to it rather than
  • 00:38:46
    reallocating memory to create
  • 00:38:48
    everything um so this is what we do but
  • 00:38:50
    we do it like sort
  • 00:38:53
    of uh uh on steroids I guess like um the
  • 00:38:57
    way it works is we store the data um on
  • 00:39:01
    disk in effectively the same
  • 00:39:03
    representation that it will have in
  • 00:39:04
    memory and when we read it back
  • 00:39:08
    we're basically just doing like a
  • 00:39:09
    pointer cast to go from read data to
  • 00:39:12
    fully realized structs we use a library
  • 00:39:14
    for this called rkyv which is very
  • 00:39:15
    good um and there are some safety checks
  • 00:39:18
    around this of course I mean it's
  • 00:39:19
    totally unsafe Rust but like there are
  • 00:39:22
    there are safety and validation checks
  • 00:39:24
    you can do around this but the really
  • 00:39:26
    cool thing about this is um like the
  • 00:39:30
    deserialization does not scale with your
  • 00:39:32
    data so like you have to read the
  • 00:39:35
    contents from disk and like as you have
  • 00:39:37
    more and more data that file will be
  • 00:39:38
    bigger and bigger and like you have to
  • 00:39:39
    read more and more but going from the
  • 00:39:42
    data that you read to like the fully
  • 00:39:44
    like the deserialized struct
  • 00:39:47
    that does not scale as the data gets
  • 00:39:48
    larger unlike with JSON right like with
  • 00:39:50
    JSON you would have to uh you'd have to
  • 00:39:52
    like parse it right you would have to go
  • 00:39:54
    through all these operations that would
  • 00:39:55
    get slower and slower as the data got
  • 00:39:56
    bigger the really cool thing about in my
  • 00:39:58
    opinion at least about zero copy
  • 00:40:00
    serialization is that the
  • 00:40:03
    deserialization does not scale with your
  • 00:40:04
    data so like it doesn't matter really
  • 00:40:06
    like how big the struct is or how large
  • 00:40:09
    that string is like it does matter for
  • 00:40:10
    reading the data from disk but it does
  • 00:40:12
    not matter for deserializing it
  • 00:40:14
    into
  • 00:40:15
    memory um okay that was the last thing I
  • 00:40:18
    was going to cover I had a bunch of
  • 00:40:19
    other things I want to talk about um but
  • 00:40:21
    I just put links to them and maybe I can
  • 00:40:22
    share the slides um and I think that's
  • 00:40:25
    it
  • 00:40:26
    [Applause]
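
The zero-copy intuition from the end of the talk can be sketched in a few lines of Rust. This is a minimal illustration using only the standard library, not uv's actual cache code and not the rkyv API. The on-disk layout (a 4-byte little-endian length followed by UTF-8 bytes) and the names `OwnedBlob`, `BorrowedBlob`, `encode`, `payload`, `read_owned`, and `read_borrowed` are all hypothetical. Note that this sketch still validates UTF-8, which scans the payload; that plays the role of the optional validation checks mentioned in the talk, while the construction of the struct itself is just a pointer and a length.

```rust
/// Owned version: building it copies the string into a new heap allocation.
#[derive(Debug, PartialEq)]
struct OwnedBlob {
    name: String,
}

/// Borrowed version: `name` points into the buffer that was already read.
#[derive(Debug, PartialEq)]
struct BorrowedBlob<'a> {
    name: &'a str,
}

/// Hypothetical layout: 4-byte little-endian length, then the UTF-8 bytes.
fn encode(name: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + name.len());
    out.extend_from_slice(&(name.len() as u32).to_le_bytes());
    out.extend_from_slice(name.as_bytes());
    out
}

/// Returns the payload slice, or None if the buffer is truncated.
fn payload(buf: &[u8]) -> Option<&[u8]> {
    let header = buf.get(..4)?;
    let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    buf.get(4..end)
}

/// Allocates a second time (after the read) to build the struct.
fn read_owned(buf: &[u8]) -> Option<OwnedBlob> {
    let name = std::str::from_utf8(payload(buf)?).ok()?;
    Some(OwnedBlob { name: name.to_owned() })
}

/// No new allocation: the struct borrows directly from `buf`.
fn read_borrowed(buf: &[u8]) -> Option<BorrowedBlob<'_>> {
    let name = std::str::from_utf8(payload(buf)?).ok()?;
    Some(BorrowedBlob { name })
}

fn main() {
    // Stand-in for bytes read from a cache file on disk.
    let buf = encode("flask");

    let owned = read_owned(&buf).expect("valid buffer");
    let borrowed = read_borrowed(&buf).expect("valid buffer");

    // Same contents either way.
    assert_eq!(owned.name, borrowed.name);
    // The borrowed string lives inside the original buffer.
    assert!(std::ptr::eq(borrowed.name.as_ptr(), buf[4..].as_ptr()));
    // The owned string was copied somewhere else.
    assert!(!std::ptr::eq(owned.name.as_ptr(), buf[4..].as_ptr()));

    println!("owned = {:?}, borrowed = {:?}", owned, borrowed);
}
```

The trade-off in the borrowed form is the lifetime: `BorrowedBlob` cannot outlive the buffer it points into, so the buffer has to be kept around for as long as the struct is in use.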
Tags
  • Charlie Marsh
  • Astral
  • Python
  • UV
  • Dependency Management
  • Rust
  • Performance Optimization
  • Package Manager
  • Virtual Environments
  • Tooling