uv: An Extremely Fast Python Package Manager
Summary
TL;DR: Charlie Marsh, founder of Astral, presents UV, a high-performance Python package manager that aims to unify and streamline Python tooling. Building on the success of Ruff, a Python linter and formatter, Marsh highlights UV's ability to manage Python environments, resolve packages rapidly, and stay fast through efficient cache design and zero-copy serialization. UV stands out for its comprehensive scope and its speed. Unlike fragmented tools such as pip or Poetry, UV offers an all-in-one solution modeled on Rust's Cargo, focusing on rapid installation and management of Python environments without itself depending on Python. This lets users treat virtual environments as fast and ephemeral. The talk details the inner workings of UV and the hard problems solved during its development, such as dependency resolution in the absence of multiversion support and the creation of universal lock files that apply across platforms. Handling Python's complex dependency marker syntax and keeping dependency management fast are core themes. Rust's role in UV is also emphasized: it provides low-level control and efficient memory usage, which contribute to UV’s speed. Reading metadata efficiently from zip files and designing a global cache for fast file linking are discussed as major optimization strategies.
Takeaways
- 💻 UV is a unified, fast Python package manager.
- 🚀 Built with Rust for high performance.
- 🔄 Handles Python environments and dependencies efficiently.
- 📦 Universal lock files for cross-platform use.
- ⚡ Optimized for speed and user experience.
- 🔧 Designed to replace fragmented Python tools.
- 🛠️ Solves hard problems in dependency resolution.
- 📁 Utilizes efficient cache and IO operations.
- 🚀 Changes traditional Python workflows with speed.
- 📈 Rapidly adopted in the Python community.
Timeline
- 00:00:00 - 00:05:00
Charlie Marsh, founder of Astral, discusses his company's high-performance Python developer tools, Ruff and UV, which have each grown to tens of millions of monthly downloads. Ruff is a linter and code formatter, while UV is a fast all-in-one package manager aimed at unifying the Python ecosystem akin to Rust’s tooling.
- 00:05:00 - 00:10:00
UV aims to provide a streamlined package management experience, replacing tools like pip and Poetry. Marsh emphasizes that UV’s speed transforms user interactions, allowing for faster creation and destruction of virtual environments without concern for their complexity, marking a shift in workflow and user expectations.
- 00:10:00 - 00:15:00
Marsh details the operations of UV, including resolving user requirements and generating a lock file for package management. He mentions the complexities in achieving a universal lock file that functions seamlessly across different systems, highlighting Python’s lack of multiversion support as a challenge they overcome with a SAT solver.
- 00:15:00 - 00:20:00
The presentation addresses the challenges in solving dependency graphs, especially Python's single-version limitation and complex dependency markers. UV employs a conflict-driven SAT solver, since version solving is a Boolean satisfiability problem and therefore NP-hard.
- 00:20:00 - 00:25:00
Further complexities arise with Python conditional dependencies, requiring UV's resolver to construct universal lock files that accommodate varied Python environments. This involves complex logical operations and marker algebra to ensure cross-platform compatibility in package resolutions.
- 00:25:00 - 00:30:00
Marsh introduces the architecture behind UV’s fast performance, focusing on Rust programming for efficient resource utilization. Rust's strengths contribute to UV's performance, though Marsh also attributes the speed to strategic IO operations and data caching techniques that minimize redundant processing.
- 00:30:00 - 00:35:00
In demonstrating IO optimization, Marsh describes UV's use of range requests to efficiently access package metadata, avoiding full downloads of large files. He explains UV’s cache infrastructure, optimizing speed and storage by hard linking files and using zero-copy techniques for metadata processing.
- 00:35:00 - 00:40:33
The talk concludes by reinforcing UV's impact on Python development, summarizing how its innovations in cache design and version handling offer not just speed but a new approach to managing Python environments effectively.
Video Q&A
What is UV?
UV is a fast, all-in-one Python package manager that simplifies dependency management and environment handling, similar to Cargo for Rust.
What makes UV different from other Python tools?
UV stands out due to its speed, comprehensive scope, and ability to unify fragmented Python tooling into a single, efficient tool.
Why was Rust used to build UV?
Rust provides efficient control over memory allocation and performance, which are crucial for building fast and reliable software like UV.
How does UV improve Python developer workflows?
UV allows for rapid installation and management of Python environments and packages, changing workflows by making tasks that were previously slow, like environment recreation, much faster.
Can UV be used with existing Python projects?
Yes, UV can be used as a drop-in alternative to tools like pip, handling package management with greater speed and efficiency.
- 00:00:05hi
- 00:00:10everyone Charlie Marsh is the founder of
- 00:00:13Astral a company building high
- 00:00:15performance developer tools for the
- 00:00:16python ecosystem over the past two years
- 00:00:19he's released Ruff a Python linter and auto
- 00:00:21formatter and UV a Python package and
- 00:00:24project manager both projects have grown
- 00:00:26to tens of millions of downloads per
- 00:00:28month and have seen rapid adoption
- 00:00:30across both open source and
- 00:00:34Enterprise okay everyone can hear me
- 00:00:36okay great excellent um wow very nice
- 00:00:38intro uh I now what am I gonna do with
- 00:00:41my own intro slides um yeah my name is
- 00:00:44Charlie I'm the founder of Astral um yeah so I
- 00:00:47uh me and my team we spend our time
- 00:00:48trying to build really fast python
- 00:00:50tooling in Rust um we're primarily
- 00:00:52known for two tools right the first of
- 00:00:55which is Ruff uh which is a linter
- 00:00:57formatter and code transformation tool
- 00:01:00so you can use it to format your code
- 00:01:02but also to identify issues like unused
- 00:01:04Imports and fix them
- 00:01:05automatically um and the second which is
- 00:01:08going to be the focus of what I'm
- 00:01:09talking about today is UV which is uh
- 00:01:11our python package manager um uh I gave
- 00:01:15a talk at PyCon about Ruff and it sort
- 00:01:17of covered like what Ruff is a little
- 00:01:20bit of how it works and then what makes
- 00:01:21it fast um this is going to be uh
- 00:01:24similar but focused on UV so I want to
- 00:01:26talk through what UV is I'll try to keep
- 00:01:29that short because maybe the least
- 00:01:30interesting part uh why we built it um
- 00:01:33some of the hard problems that went into
- 00:01:35building it and then uh end with a
- 00:01:37couple examples or sort of case studies
- 00:01:39of things we did that uh make it really
- 00:01:42fast because that's the that's the thing
- 00:01:43that we tend to get the most questions
- 00:01:45about is why is it
- 00:01:47fast um so UV is what I would call a
- 00:01:50fast all-in-one python package manager
- 00:01:52so uh you can use UV to install python
- 00:01:55itself create virtual environments
- 00:01:57resolve dependencies install packages of
- 00:01:59course it is a package manager uh you
- 00:02:01can use it to build your own python
- 00:02:03packages that you would then like upload
- 00:02:04and
- 00:02:05redistribute so UV is if you're familiar
- 00:02:08with the python ecosystem you could
- 00:02:09think of it as a drop-in alternative to
- 00:02:11tools like pip pipx pyenv virtualenv
- 00:02:15Poetry uh and it's not a replacement for
- 00:02:17any one of these it's really intended to
- 00:02:18be a replacement for all of them uh so
- 00:02:21we model what we're trying to do with UV
- 00:02:24after Cargo so if you've worked in the
- 00:02:26Rust ecosystem the tooling is very
- 00:02:28streamlined and very unified
- 00:02:30um and the way I would describe it is I
- 00:02:31feel like rust tooling oh
- 00:02:38no
- 00:02:40okay hold on okay the way I think of it
- 00:02:44is that uh rust rust tooling is very
- 00:02:48high confidence like when I clone a rust
- 00:02:50project I'm very confident that I can
- 00:02:52run it it will run successfully and I
- 00:02:54know how to run it and we're trying to
- 00:02:55get to a similar experience with UV for
- 00:02:57python tooling so UV is this single
- 00:03:00static binary that ideally gives you
- 00:03:01everything you need to be productive
- 00:03:03with python you install UV and then
- 00:03:04everything is sort of taken care of for
- 00:03:06you so UV does a lot of stuff um and it
- 00:03:10does it all while being just way way
- 00:03:12faster than a lot of the other tools in
- 00:03:13the ecosystem um and when we started I I
- 00:03:17knew when we started working on a
- 00:03:18package manager and this is was some of
- 00:03:21the reaction there would be a little bit
- 00:03:22of this because in the python ecosystem
- 00:03:24there's just a lot of different tools
- 00:03:25for packaging um and so when you come
- 00:03:27out and say hey we built we finally built
- 00:03:30the Python package manager um you know
- 00:03:32there's a lot of well this cartoon um
- 00:03:36and we see we still see a little bit of
- 00:03:37this actually which is kind of funny to
- 00:03:38me because the tool's pretty popular um but
- 00:03:42uh I think UV is actually pretty
- 00:03:45different from a lot of the other things
- 00:03:46that exist in the ecosystem in the
- 00:03:47previous attempts to build this tool for
- 00:03:50python primarily for two reasons one is
- 00:03:53just like the scope of what we're trying
- 00:03:55to do um so I mentioned this before or
- 00:03:58hinted at at least like python tooling
- 00:04:00is very fragmented and I think of that
- 00:04:02in like two ways one is for anything you
- 00:04:04want to do there's like a bunch of
- 00:04:06different options and it's very hard to
- 00:04:07choose and then know which one you
- 00:04:09should use and the second dimension is
- 00:04:11like for anything that you're trying to
- 00:04:12do that's non-trivial you have to like
- 00:04:14chain together a bunch of tools um and
- 00:04:17for UV it's meant to be like a totally
- 00:04:20unified stack so we didn't like build on
- 00:04:22top of anything else right we didn't
- 00:04:23like build on top of pip or inherit any
- 00:04:25of the baggage that comes from existing
- 00:04:27python tooling we build like everything
- 00:04:29from scratch
- 00:04:30and that creates a really powerful model
- 00:04:32both in terms of like the user
- 00:04:33experience we can deliver uh but also um
- 00:04:38uh how how good it can be I guess
- 00:04:40internally uh the second reason I think
- 00:04:42that it's a little bit different is just
- 00:04:43the performance right like UV as I
- 00:04:45mentioned it's very very fast um and the
- 00:04:48thing that I've seen in my time working
- 00:04:50on this stuff is that when things are
- 00:04:52like way way
- 00:04:54faster they really change like the
- 00:04:56user's relationship to the tool and even
- 00:04:58like the way that they work with it so
- 00:04:59like we saw this in Ruff
- 00:05:01where jobs that people like used to only
- 00:05:04be able to run in CI they could now make
- 00:05:06like pre-commit hooks because like the
- 00:05:07speed differences were just so different
- 00:05:09and so suddenly this thing that you like
- 00:05:10dread running is something that you can
- 00:05:11just like do locally and get it to pass um
- 00:05:15and with UV another analogy would be
- 00:05:16like virtual environments like in Python
- 00:05:18often like you have a virtual
- 00:05:19environment on your machine it's in like
- 00:05:21a certain State and you really don't
- 00:05:23want to mess it up because you won't be
- 00:05:24able to recreate it and like get it into
- 00:05:26that place or it's really expensive to
- 00:05:27recreate because it has all these
- 00:05:28packages installed with UV like
- 00:05:30destroying and creating a virtual
- 00:05:31environment is extremely fast like we
- 00:05:33try to view them as totally ephemeral
- 00:05:35like you can just destroy them and
- 00:05:36recreate them because it's so cheap so
- 00:05:38we're actually trying to change like
- 00:05:40it's not just about you know there's a
- 00:05:42lot of obvious nice things that come
- 00:05:43with being much faster but I also think
- 00:05:45that it can just change a lot of the
- 00:05:47workflows around how people uh work with
- 00:05:50python um so over yeah over we released
- 00:05:55UV in like
- 00:05:56mid-February um and since then yeah it's
- 00:05:58just grown a lot lot so it's at like 16
- 00:06:00million downloads a month um it's now
- 00:06:04more than 10% of the requests to PyPI
- 00:06:07come from people running UV which is
- 00:06:09kind of crazy like I I would have been
- 00:06:11happy maybe with like
- 00:06:131% um just because the sheer volume of
- 00:06:16what people are doing with python um is
- 00:06:18is is is pretty wild so um that's been a
- 00:06:21cool thing to see um yeah we have a lot
- 00:06:23of stars contributors blah blah
- 00:06:25blah but uh you know the point is we
- 00:06:27built this thing I consider it fairly
- 00:06:29well battle-tested it's been used across the
- 00:06:30industry so hopefully the stuff I say
- 00:06:32has some credibility behind
- 00:06:34it um all right so the main job right or
- 00:06:40the main thing you do with a package
- 00:06:41manager is you install packages so I
- 00:06:43just want to talk through the life cycle
- 00:06:44of what happens when you run a command
- 00:06:47to install packages with UV um and UV
- 00:06:50has two primary interfaces that you can
- 00:06:53um uh use to to to engage with it the
- 00:06:57first is that we have like a pip
- 00:06:58compatible interface so if you've run
- 00:07:00like pip install you can just run like
- 00:07:02UV pip install and we Implement a lot of
- 00:07:04the same commands that pip would which
- 00:07:06is great for people who want to adopt it
- 00:07:08without changing their workflow although
- 00:07:10we would like their workflow to change
- 00:07:11obviously um and the second is we have
- 00:07:14these higher level commands like UV sync
- 00:07:16and UV lock that um you know you sort of
- 00:07:18declare your dependencies and then we
- 00:07:20just take care of everything and make
- 00:07:21sure the environment is in the right
- 00:07:22State whenever you want to do anything
- 00:07:25um but they both operate uh under a
- 00:07:28fairly similar life cycle right so the
- 00:07:30first thing we have to do is we have to
- 00:07:32like find the user's python interpreter
- 00:07:34UV does not depend on python um but we
- 00:07:37do like need a python interpreter in
- 00:07:39order to do a lot of useful things um so
- 00:07:42like if you want to create a virtual
- 00:07:43environment for example a virtual
- 00:07:45environment has to symlink in a Python
- 00:07:47interpreter so like we can't create a we
- 00:07:49can create a virtual environment but we
- 00:07:50wouldn't have a python to put in it um
- 00:07:52and we need to know things like what
- 00:07:54version of python are you running like
- 00:07:56what platform are you on uh all that
- 00:07:58kind of stuff um so this is actually
- 00:08:00pretty hard but really not very
- 00:08:02interesting um
- 00:08:04so um next thing we need to do right is we
- 00:08:06need to discover the actual user
- 00:08:08requirements this is the user telling us
- 00:08:09like the state they want to be in at the
- 00:08:11end of the command and that could be you
- 00:08:13know they gave it to us directly maybe
- 00:08:14we read it from a requirements.txt file
- 00:08:17something similar given those
- 00:08:19requirements we resolve them into you
- 00:08:22know this is the core job of a package
- 00:08:23manager you give us some requirements
- 00:08:25and we try and figure out uh a set of
- 00:08:27versions that satisfy those requirements
- 00:08:29so you know the user might say uh I want
- 00:08:32Pydantic and um that's not really enough
- 00:08:35information on its own right for us to
- 00:08:37like uh like what does it mean for the
- 00:08:39user to want Pydantic so the first thing
- 00:08:40we have to do is we have to resolve that
- 00:08:43into a set of versions such that
- 00:08:45everyone's dependencies are satisfied
- 00:08:47and all the versions are compatible and
- 00:08:48ideally it's like the latest version of
- 00:08:50Pydantic too because that's what the user
- 00:08:51asked for
- 00:08:54um so even this isn't like quite enough
- 00:08:57information for us to really do anything
- 00:08:59because this just just describes the
- 00:09:00packages and the versions but it doesn't
- 00:09:02really tell us anything about like where
- 00:09:03to get them um so ultimately this is not
- 00:09:06what we're trying to produce we're
- 00:09:07trying to produce something that looks
- 00:09:08more like this um so UV ultimately will
- 00:09:11create a lock file um and that lock file
- 00:09:13represents a resolution and in that lock
- 00:09:16file we have information like this is
- 00:09:17one entry from a lock file so we have
- 00:09:20like the package name the version but we
- 00:09:21also have information about where it
- 00:09:23came from also the packages it depends
- 00:09:25on we have like a sha we have file size
- 00:09:27etc etc so ultimately like when we
- 00:09:30resolve we're trying to create something
- 00:09:32like this um and I'll talk more about
- 00:09:33like how this is structured in a bit
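The fields named for a lock file entry (name, version, source, dependencies, hash, file size) can be pictured as a small record. The sketch below is only an illustration of that shape: `LockedPackage`, `matches`, the example package names, and the `example.invalid` URL are all hypothetical, and this is not the actual lock file schema. It also assumes, as one plausible use, that the hash and size are there so a downloaded artifact can be checked against the lock.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LockedPackage:
    """Hypothetical in-memory view of one lock file entry (not uv's schema)."""

    name: str
    version: str
    source: str  # where the artifact came from, e.g. an index URL
    sha256: str  # hex digest of the artifact
    size: int  # artifact size in bytes
    dependencies: tuple[str, ...] = ()


def matches(entry: LockedPackage, payload: bytes) -> bool:
    """Return True if downloaded bytes agree with the recorded size and hash."""
    return (
        len(payload) == entry.size
        and hashlib.sha256(payload).hexdigest() == entry.sha256
    )


# Placeholder data only: the bytes stand in for a downloaded wheel.
payload = b"placeholder wheel contents"
entry = LockedPackage(
    name="example-package",
    version="1.0.0",
    source="https://index.example.invalid/simple",
    sha256=hashlib.sha256(payload).hexdigest(),
    size=len(payload),
    dependencies=("example-dependency",),
)
```

Recording where a package came from and what its bytes should hash to is what lets a later install be checked against the resolution instead of being re-resolved.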
- 00:09:36once we have that graph we come up with
- 00:09:39term I made up like an install plan uh
- 00:09:41the idea is like we know the state that
- 00:09:43the user wants to get to which is like
- 00:09:44represented by the lock file we have to
- 00:09:46look at the current state of the user's
- 00:09:48machine like maybe they have like an old
- 00:09:50version of Pydantic installed so we need
- 00:09:52to like uninstall it and then install
- 00:09:54the newer version of Pydantic um so most
- 00:09:58of the well not but most of the
- 00:10:00interesting work happens in here uh in
- 00:10:02the actual resolver and uh this was also
- 00:10:06I think the hardest part of building UV
- 00:10:10so I want to talk about a couple of the
- 00:10:12hard problems that are maybe like
- 00:10:13nonobvious
- 00:10:15um especially if you don't spend a lot
- 00:10:17of time like thinking about python
- 00:10:20packaging which I hope like most of you
- 00:10:22don't
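The "install plan" idea described above, comparing the state the lock file asks for with what is currently installed, can be sketched as a simple diff. This is a minimal illustration, not uv's implementation: `make_install_plan` is a hypothetical helper, both states are reduced to name-to-version mappings, the versions are made up, and removing packages that the lock does not mention is an assumption of this sketch.

```python
def make_install_plan(
    locked: dict[str, str], installed: dict[str, str]
) -> dict[str, list[str]]:
    """Diff the desired (locked) state against the current (installed) state."""
    plan: dict[str, list[str]] = {"uninstall": [], "install": []}
    # Anything installed at a version the lock does not ask for has to go.
    for name, version in sorted(installed.items()):
        if locked.get(name) != version:
            plan["uninstall"].append(f"{name}=={version}")
    # Anything the lock asks for that is not already present gets installed.
    for name, version in sorted(locked.items()):
        if installed.get(name) != version:
            plan["install"].append(f"{name}=={version}")
    return plan


# Illustrative versions only.
locked = {"pydantic": "2.9.0", "typing-extensions": "4.12.0"}
installed = {"pydantic": "1.10.0", "typing-extensions": "4.12.0"}
plan = make_install_plan(locked, installed)
```

With the old Pydantic installed, the plan uninstalls it, installs the locked version, and leaves the already-correct package untouched.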
- 00:10:24um so okay so the first thing that makes
- 00:10:28this problem quite hard is that python
- 00:10:31has no multiversion support so you
- 00:10:34cannot have two versions of the same
- 00:10:36package installed at the same time um
- 00:10:39this might sound like a very obvious so
- 00:10:40you can't have like Pydantic version one
- 00:10:42and Pydantic version two installed at the
- 00:10:44same time this might sound obvious but
- 00:10:46actually like a lot of languages do
- 00:10:48support this so like rust and node will
- 00:10:50let you do this without any issues um in
- 00:10:52Python it's it's basically a limitation
- 00:10:54of the runtime um there's like imports
- 00:10:57are like a global cache keyed by
- 00:10:59module name so like you can't have
- 00:11:00multiple modules with the same name um
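The runtime limitation mentioned here can be seen directly: imported modules live in `sys.modules`, a dictionary keyed by module name, so a second "version" registered under the same name replaces the first. The sketch below uses a made-up module name, `demo_pkg_sketch`, and a hypothetical `register` helper to stand in for installing a package.

```python
import importlib
import sys
import types

MODULE_NAME = "demo_pkg_sketch"  # hypothetical module name for illustration


def register(name: str, version: str) -> types.ModuleType:
    """Put a fake module in the import cache, as if it had been imported."""
    module = types.ModuleType(name)
    module.__version__ = version
    sys.modules[name] = module  # one slot per name: this overwrites
    return module


first = register(MODULE_NAME, "1.0")
second = register(MODULE_NAME, "2.0")

# Importing by name finds the cached entry, and there is only one.
loaded = importlib.import_module(MODULE_NAME)
```

After both registrations, `loaded` is the second module object; the first one is no longer reachable by name, which is why two packages cannot each get their own version of a shared dependency.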
- 00:11:03so as like a concrete example let's say
- 00:11:06the root is like our project and we
- 00:11:08depend on like a specific version of
- 00:11:11vLLM and we also depend on a specific
- 00:11:14version of LangChain
- 00:11:16and vLLM depends on Pydantic 2 but
- 00:11:20like this old old version of LangChain
- 00:11:22does not work with Pydantic 2 it requires
- 00:11:23Pydantic version one so like this is not a
- 00:11:26solvable graph in Python you cannot like
- 00:11:28you cannot satisfy these dependencies
- 00:11:30and if you try to give those to UV
- 00:11:32you'll get you know this pretty error
- 00:11:34message that tells you this doesn't
- 00:11:37work because you have these two
- 00:11:38dependencies and they have an
- 00:11:39incompatible Pydantic
- 00:11:42requirement um so you know instead
- 00:11:45imagine that the user says like I'll
- 00:11:46accept any version of vLLM but I still
- 00:11:48need this like old version of
- 00:11:50Pydantic in that case what we need to do
- 00:11:52right is we need to backtrack we need to
- 00:11:54test out all the versions of vLLM and try
- 00:11:56to find a version of vLLM that does work
- 00:11:59So eventually we go and find the
- 00:12:01previous version was like vLLM 0.6.1 I
- 00:12:04think so we tried out a bunch of
- 00:12:05versions eventually we find you know a
- 00:12:07set of compatible
- 00:12:09requirements and when we like when we do
- 00:12:11the solve right it's not it's typically
- 00:12:13not just like these four packages like
- 00:12:15this is just a snapshot of like
- 00:12:17ultimately the resolved graph from those
- 00:12:18set of dependencies right like it's
- 00:12:20typically a sprawling thing with lots of
- 00:12:22different requirements and there's lots
- 00:12:23of different ways to satisfy it and you
- 00:12:26know ultimately I mean this like the
- 00:12:28shape of this might look familiar to
- 00:12:30some of you we're trying to do version
- 00:12:32solving so like given we have a universe
- 00:12:34of package versions they have
- 00:12:36constraints like some things depend on
- 00:12:38different versions of
- 00:12:39Pydantic we need to find the set such that
- 00:12:42like all the dependencies are satisfied
- 00:12:44we can only have one version of every
- 00:12:46package um and also we don't want to
- 00:12:48have like extraneous packages um and
- 00:12:51this yeah this is a Boolean
- 00:12:52satisfiability problem um it is NP-hard
- 00:12:55so uh you know it's I like think that is
- 00:12:59quite hard
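The backtracking described above (try the newest candidate, and if a later requirement conflicts, go back and try an older one) can be sketched with a naive recursive search. This is deliberately not a CDCL solver and not uv's resolver: the package names, the versions, the `INDEX` layout, and the `resolve` function are all hypothetical, and constraints are reduced to sets of allowed version strings.

```python
from typing import Optional

# package -> version -> {dependency name: set of allowed versions}
Index = dict[str, dict[str, dict[str, set[str]]]]
Requirement = tuple[str, set[str]]


def version_key(version: str) -> tuple[int, ...]:
    """Sort key so that '0.10.0' orders after '0.9.0'."""
    return tuple(int(part) for part in version.split("."))


def resolve(
    index: Index,
    requirements: list[Requirement],
    chosen: Optional[dict[str, str]] = None,
) -> Optional[dict[str, str]]:
    """Depth-first search for one version per package, newest first."""
    chosen = dict(chosen or {})
    if not requirements:
        return chosen
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in chosen:
        # No multiversion support: the existing pick must satisfy this too.
        return resolve(index, rest, chosen) if chosen[name] in allowed else None
    for version in sorted(index[name], key=version_key, reverse=True):
        if version not in allowed:
            continue
        deps = list(index[name][version].items())
        result = resolve(index, rest + deps, {**chosen, name: version})
        if result is not None:
            return result  # first success wins; otherwise backtrack
    return None


# Made-up universe: the newest "server" needs models 2.0, "chain" needs 1.0.
INDEX: Index = {
    "server": {"0.6.2": {"models": {"2.0"}}, "0.6.1": {"models": {"1.0", "2.0"}}},
    "chain": {"0.1.0": {"models": {"1.0"}}},
    "models": {"1.0": {}, "2.0": {}},
}

pinned = resolve(INDEX, [("server", {"0.6.2"}), ("chain", {"0.1.0"})])
flexible = resolve(INDEX, [("server", {"0.6.1", "0.6.2"}), ("chain", {"0.1.0"})])
```

With `server` pinned to its newest version there is no solution, so `pinned` is `None`. When any `server` version is accepted, the search backtracks to the older release and finds a compatible set. The worst case of this kind of search is exponential, which is the behavior the talk attributes to the problem being NP-hard.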
- 00:13:02um and uh I'm not going to go into the
- 00:13:04details of like exactly what our solver
- 00:13:06looks like but if you maybe if you think
- 00:13:09back to school um you know we use a SAT
- 00:13:11solver it's based on CDCL which is like
- 00:13:13conflict-driven clause learning it's
- 00:13:14basically just a fancy thing that tries to
- 00:13:15solve those graphs in as efficient a way
- 00:13:17as it can by exploiting heuristics and
- 00:13:20things that it can learn but it you know
- 00:13:21it can be exponential like there's no
- 00:13:23guarantee that it's actually going to
- 00:13:24solve it in a reasonable amount of
- 00:13:26time um so because we don't have
- 00:13:29multiversion support we have to do this SAT
- 00:13:30solve um if we had multiversion support
- 00:13:33by the way like we wouldn't necessarily
- 00:13:34have to do that like Rust like Cargo's
- 00:13:36solver like it's not a SAT solver like
- 00:13:38it does like a graph traversal but if you
- 00:13:41get to a hard place where like things
- 00:13:42are not quite working like you can kind
- 00:13:44of just bail out and say like let's add
- 00:13:46two versions of this package so that
- 00:13:47like Escape valve exists but it does not
- 00:13:49exist in Python and this is also true of
- 00:13:51other
- 00:13:52languages okay second thing um and I've
- 00:13:56never tried to explain this this new
- 00:13:59material by the way and this in
- 00:14:00particular I've never tried to explain
- 00:14:01to a group of people so I might you know
- 00:14:03let me know afterwards if it makes any
- 00:14:05sense um but um this was like
- 00:14:09surprisingly or parts of this were
- 00:14:12surprising but um this is maybe like the
- 00:14:14hardest part of building this resolver
- 00:14:16which is python has this like very rich
- 00:14:18syntax for declaring requirements that
- 00:14:22should only be installed on certain
- 00:14:23python versions or only on certain
- 00:14:24platforms etc etc um so like just as an
- 00:14:28example
- 00:14:29um these are the dependencies of a real
- 00:14:31package I can't remember what maybe
- 00:14:32flask um and you see the last one has uh
- 00:14:36the importlib-metadata package should only be
- 00:14:38installed if the user's python version
- 00:14:40is 3.10 or
- 00:14:41earlier and if we look at some of the
- 00:14:43transitive dependencies here um like
- 00:14:46click itself depends on colorama but
- 00:14:49only on Windows and it also depends on
- 00:14:52importlib-metadata but only if the Python
- 00:14:53version is less than 3.8 um so when you
- 00:14:57see this like set of requirements
- 00:14:59there's kind of two ways to think about
- 00:15:00solving the graph like one is solving
- 00:15:03the graph for like a specific user at a
- 00:15:05specific point in time that's on a
- 00:15:06specific computer right so maybe a user
- 00:15:09comes up and they're using Windows on
- 00:15:10python 3.12 so like some things here are
- 00:15:12relevant and some things aren't um
- 00:15:15that's actually pretty easy because
- 00:15:16you're basically just filtering things
- 00:15:17out while you solve it's not a huge
- 00:15:18problem um but we want to solve a
- 00:15:21slightly different problem which is we
- 00:15:23want to generate a lock file that like
- 00:15:26any user on any machine can then use to
- 00:15:28get a reproducible install um and what that
- 00:15:33means is like you know if a user is on
- 00:15:35Windows and a user on Mac they may not
- 00:15:37get the exact same set of packages like
- 00:15:39the user on Windows would get colorama
- 00:15:41the user on Mac would not but like all
- 00:15:43the users on Windows on the same python
- 00:15:45version should get the same set of
- 00:15:46packages and ideally like the
- 00:15:48differences between those users are as
- 00:15:50small as possible that's not actually
- 00:15:52something we guarantee but you know the
- 00:15:53gist of it is you want to be able to
- 00:15:55take the lock file and like any user on
- 00:15:57any machine should be able to take it
- 00:15:58and install like we don't just want a
- 00:16:00lock file for Windows 3.12 we want what
- 00:16:02we would call like a universal lock
- 00:16:05file um and that problem is like a lot
- 00:16:07harder um so again like the core of our
- 00:16:11solver is the SAT
- 00:16:12solver and then there are kind of like
- 00:16:15two pieces that go into trying to build
- 00:16:16this Universal
- 00:16:18resolution so one is that at a high
- 00:16:22level we kind of try to find a solution
- 00:16:24that works on all platforms like
- 00:16:26effectively we assume that all of those
- 00:16:27markers are true and try to see if we
- 00:16:30can find a solution so like the marker
- 00:16:32that said colorama like only on Windows
- 00:16:33we would basically say let's just assume
- 00:16:35that's true and see if we can find a
- 00:16:36solution and then afterwards we'll like
- 00:16:38filter out the packages that are only
- 00:16:40for Windows so that's that's good but
- 00:16:42the problem is right you can have
- 00:16:43conflict these conflicting dependencies
- 00:16:45so like this is totally valid like you
- 00:16:47could have a user say it has to be
- 00:16:49Pydantic 2 on Windows but it cannot be
- 00:16:51Pydantic 2 on all other platforms and
- 00:16:53again we're trying to find a solution
- 00:16:55such that a user shows up and they're on
- 00:16:57Windows they install they get Pydantic version
- 00:16:59two a user shows up and
- 00:17:00they're on Mac they get Pydantic version
- 00:17:02less than
- 00:17:03two um okay so the way that we solve
- 00:17:06this is uh we and again we've made up
- 00:17:10all this terminology because I didn't I
- 00:17:12don't really know if there was good
- 00:17:13terminology for it that existed um but
- 00:17:15what we do is we basically try and fork
- 00:17:17and solve the two graphs separately so
- 00:17:21you know in this case on the left we
- 00:17:23would try to solve like Pydantic greater
- 00:17:25than two uh assume we're on Windows
- 00:17:27basically and do and just solve the rest
- 00:17:29of the
- 00:17:30graph on the right we would do the same
- 00:17:32thing the graph ends up being like a lot
- 00:17:34simpler um but we would solve these two
- 00:17:36graphs like effectively
- 00:17:38independently and then we merge the
- 00:17:40results back together so this is like
- 00:17:44the merged resolution of taking those
- 00:17:46two um and oh wow great oh it's all okay
- 00:17:51I messed up the transitions on this
- 00:17:53slide but I think we'll be okay um okay
- 00:17:56this is what this is supposed to look
- 00:17:57like so basically right the thing on the
- 00:17:59bottom right is like the merged
- 00:18:00resolution and on the two sides we have
- 00:18:02like the platform specific resolutions
- 00:18:04so annotated types needs to be included
- 00:18:07but only on Windows because it was only
- 00:18:08present in the windows resolution um I
- 00:18:11think this is going to do it on like all
- 00:18:13of these okay uh Pydantic is included twice
- 00:18:16right but once for Windows and once for
- 00:18:18non- Windows and those markers are
- 00:18:20disjoint like there's no overlap on them
- 00:18:22so everyone will get one version of
- 00:18:23Pydantic but it will be like one of these
- 00:18:26two um and then
- 00:18:29importantly there's also a package
- 00:18:31that's included in both resolutions um
- 00:18:34so typing extensions is included both on
- 00:18:36Windows and on not Windows and sorry I
- 00:18:39know this is like super annoying um so
- 00:18:41if you look at like the way that that
- 00:18:43marker gets uh
- 00:18:45constructed we have typing extensions
- 00:18:47and we're saying we want to include it
- 00:18:48on like Windows or but also include it
- 00:18:51on not Windows right and that marker
- 00:18:53like these come from these two different
- 00:18:55places and that marker is always true
- 00:18:58right so we can actually just ignore it
- 00:19:00completely that's why it doesn't it's
- 00:19:01not present in the final resolution um
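The fork-and-merge step described above can be sketched by modeling a marker as the set of platforms on which it is true, over a small fixed universe. That modeling choice, the `PLATFORMS` set, the `merge_forks` helper, and the version numbers are assumptions made for illustration; real markers range over more variables and this is not how uv represents them.

```python
from typing import Optional

# Toy universe of environments; a marker is the subset where it holds.
PLATFORMS = frozenset({"windows", "linux", "macos"})
Marker = frozenset

WINDOWS = Marker({"windows"})
NOT_WINDOWS = PLATFORMS - WINDOWS


def merge_forks(
    forks: list[tuple[Marker, dict[str, str]]],
) -> dict[tuple[str, str], Optional[Marker]]:
    """Combine per-fork resolutions into one set of marked packages."""
    merged: dict[tuple[str, str], Marker] = {}
    for marker, resolution in forks:
        for name, version in resolution.items():
            key = (name, version)
            # The same package in several forks: OR the fork markers together.
            merged[key] = merged.get(key, Marker()) | marker
    # A marker that holds everywhere carries no information, so drop it.
    return {
        key: (None if marker == PLATFORMS else marker)
        for key, marker in merged.items()
    }


# Illustrative versions only.
windows_fork = {
    "pydantic": "2.9.0",
    "annotated-types": "0.7.0",
    "typing-extensions": "4.12.0",
}
other_fork = {"pydantic": "1.10.0", "typing-extensions": "4.12.0"}

merged = merge_forks([(WINDOWS, windows_fork), (NOT_WINDOWS, other_fork)])
```

In the merged result `typing-extensions` has no marker, because "Windows or not Windows" covers every platform, while the two Pydantic entries keep disjoint markers so every environment gets exactly one of them.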
- 00:19:05so not only do we have to like solve
- 00:19:07these graphs in this way but we end up
- 00:19:08doing a lot of different uh there's this
- 00:19:12whole marker algebra that we have to
- 00:19:13consider like the ORs and the ANDs that
- 00:19:15you're seeing there we have to do a lot
- 00:19:16of operations like here evaluating that
- 00:19:19that marker always figuring out that
- 00:19:21that marker always evaluates to true
- 00:19:22and that we can just omit it like you
- 00:19:24know I mean it's easy in that case but
- 00:19:27like we'll see you know some hard cases
- 00:19:29similarly we have to be able to test for
- 00:19:31disjointness with these um I mentioned
- 00:19:33that like we had the two Pydantic
- 00:19:35requirements and they were disjoint that
- 00:19:37just means that like they can never both
- 00:19:38be true effectively um so you know for
- 00:19:42example maybe we're like doing the solve
- 00:19:45on the left side of the previous slide
- 00:19:46like we're solving for Windows and then
- 00:19:48we like see a dependency that has this
- 00:19:50marker on it we want to know like is
- 00:19:53this dependency relevant like we know
- 00:19:55we're solving for Windows so like should
- 00:19:57we even care about this dependency
- 00:19:58because it it's only applicable on these
- 00:20:00platforms right and that question is
- 00:20:02basically can they both be true are they
- 00:20:05disjoint this is also a Boolean
- 00:20:07satisfiability problem um and the
- 00:20:09markers can be like pretty complicated
- 00:20:10right this is also NP-hard like totally
- 00:20:12separate NP-hard problem which is we
- 00:20:15have to be able to test uh I'll go back
- 00:20:18on we have to be able to test whether
- 00:20:19like these two these two uh you know
- 00:20:21Boolean expressions are disjoint um
- 00:20:24and we're doing this like all the time
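The two marker questions described above (is a marker always true, and are two markers disjoint) can be illustrated with a brute-force sketch. This is not how uv does it: the talk goes on to describe a dedicated normalizer built on decision diagrams. Here markers are modelled as plain Python predicates and checked against a tiny, assumed universe of environments; the names `ENVIRONMENTS`, `is_always_true`, and `are_disjoint` are hypothetical.

```python
from itertools import product

# Assumption: a toy universe of environments. Real marker environments have
# many more variables (implementation, machine, OS name, and so on).
PLATFORMS = ["win32", "linux", "darwin"]
PYTHON_VERSIONS = [(3, 8), (3, 9), (3, 10), (3, 11), (3, 12)]
ENVIRONMENTS = [
    {"sys_platform": platform, "python_version": version}
    for platform, version in product(PLATFORMS, PYTHON_VERSIONS)
]


# A marker is modelled as a predicate over one environment.
def on_windows(env):
    return env["sys_platform"] == "win32"


def not_windows(env):
    return env["sys_platform"] != "win32"


def python_at_least_310(env):
    return env["python_version"] >= (3, 10)


def any_of(*markers):
    """OR of several markers."""
    return lambda env: any(marker(env) for marker in markers)


def is_always_true(marker):
    """True if the marker holds in every environment of the toy universe."""
    return all(marker(env) for env in ENVIRONMENTS)


def are_disjoint(left, right):
    """True if no environment satisfies both markers at once."""
    return not any(left(env) and right(env) for env in ENVIRONMENTS)


# typing-extensions is required on Windows OR on not-Windows: always true,
# so the marker can be dropped from the final resolution.
typing_extensions_marker = any_of(on_windows, not_windows)
print(is_always_true(typing_extensions_marker))

# The two Pydantic forks can never apply at the same time.
print(are_disjoint(on_windows, not_windows))

# A Windows fork still has to consider a dependency gated on Python >= 3.10.
print(are_disjoint(on_windows, python_at_least_310))
```

Enumeration like this only works because the toy universe is finite and small; with real markers the space of environments is effectively unbounded, which is why a symbolic representation is needed.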
- 00:20:26um now most markers are
- 00:20:29pretty simple which is great um but like
- 00:20:32these are just some examples of these
- 00:20:34are like the fully simplified markers
- 00:20:36for like a real example from our test
- 00:20:38case this is resolving uh if any of you
- 00:20:40use like Transformers like the Hugging
- 00:20:42Face project
- 00:20:43um one of our test cases is we take that
- 00:20:46project and we enable all of the
- 00:20:48optional dependencies which like no one
- 00:20:49should do but it creates like a very
- 00:20:51very large graph and so it's one of our
- 00:20:53harder test cases and like these are the
- 00:20:54fully simplified markers um and uh you
- 00:20:58can see like some of them are pretty
- 00:21:00large um and by the way before we did
- 00:21:04this before we did this marker
- 00:21:05simplification of trying to get to like
- 00:21:07these simplified normalized
- 00:21:10forms we we would do that resolution and
- 00:21:12like each dependency would have like
- 00:21:14tens of kilobytes of markers like the
- 00:21:16marker Expressions were huge um and that
- 00:21:18was even we even had like some very
- 00:21:20basic heuristics like you know just kind
- 00:21:22of like simple stuff for trying to
- 00:21:25normalize and filter them out ultimately
- 00:21:28um someone on the team wrote this like
- 00:21:31marker normalizer based on a
- 00:21:33technique called algebraic decision
- 00:21:34diagrams um it's like a totally separate
- 00:21:37solver that we had to build to try and
- 00:21:39normalize those markers and ask
- 00:21:41questions like are they disjoint um so
- 00:21:43these were both like very very hard
- 00:21:45problems um a third that I'll just
- 00:21:47mention briefly is that and it's not I'm
- 00:21:51not actually like going to talk about
- 00:21:52this one that much but I do like to
- 00:21:54complain about it a little bit there's
- 00:21:56so in the python ecosystem there there's
- 00:21:58really like no guarantee that you have
- 00:22:01static metadata for a package or like a
- 00:22:03dependency you want to resolve um and by
- 00:22:05that I mean like if we're trying to
- 00:22:07resolve Pydantic version two um we're
- 00:22:09going to go to the registry and we're
- 00:22:10going to say like what's the metadata
- 00:22:12for Pydantic version two and it's actually
- 00:22:14not guaranteed that they will be able to
- 00:22:16give us an answer um ultimately what
- 00:22:19might happen is we might have to run
- 00:22:20some sort of arbitrary python code in
- 00:22:23order to get the dependencies um like
- 00:22:27basically if they publish a a built
- 00:22:28distribution it'll have dependencies but
- 00:22:31if they only publish the source for the
- 00:22:32package we might have to effectively
- 00:22:35like pull that down and run if you've
- 00:22:37ever seen like a setup.py file before we
- 00:22:39have to like run a setup.py file we
- 00:22:40might have to build the whole package
- 00:22:41even just to get the dependencies um so
- 00:22:44we do a lot of things to try and avoid
- 00:22:46doing that um while still being correct
- 00:22:49uh and I'll talk about some of those in
- 00:22:50a bit um but kind of like just
- 00:22:54concluding on this section like
- 00:22:55ultimately what we're trying to build
- 00:22:56here um we model it as a graph right the
- 00:23:00nodes are packages at specific versions
- 00:23:03and the edges are weighted by markers um
- 00:23:05so you know in this case we have like
- 00:23:07the two versions of Pydantic but one edge
- 00:23:09is weighted by only being on Windows the
- 00:23:12other is like never being on Windows um
- 00:23:14and you can see they like share some
- 00:23:15common nodes etc etc and like the nice
- 00:23:18thing about this representation is when
- 00:23:19a user comes along and wants to install
- 00:23:21on Linux or whatever we just are
- 00:23:23traversing we're just doing a graph
- 00:23:24traversal and saying like which edges
- 00:23:27are relevant and which are not
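As a rough illustration of that install-time traversal, here is a minimal sketch. The graph contents, version numbers, and the function name `packages_to_install` are hypothetical and only mirror the Pydantic example from the talk; markers are again modelled as plain predicates rather than real marker expressions.

```python
from collections import deque


def always(env):
    return True


def windows_only(env):
    return env["sys_platform"] == "win32"


def not_windows(env):
    return env["sys_platform"] != "win32"


# Assumption: a hand-written "lock file" graph. Nodes are packages pinned to
# a version; each edge carries the marker under which it applies.
LOCKED_GRAPH = {
    "root": [
        ("pydantic==1.10.0", windows_only),
        ("pydantic==2.0.0", not_windows),
    ],
    "pydantic==1.10.0": [("typing-extensions==4.0.0", always)],
    "pydantic==2.0.0": [
        ("typing-extensions==4.0.0", always),
        ("annotated-types==0.5.0", always),
    ],
    "typing-extensions==4.0.0": [],
    "annotated-types==0.5.0": [],
}


def packages_to_install(graph, env):
    """Walk the graph from the root, following only edges whose marker holds."""
    selected = set()
    queue = deque(["root"])
    while queue:
        node = queue.popleft()
        for child, marker in graph[node]:
            if marker(env) and child not in selected:
                selected.add(child)
                queue.append(child)
    return sorted(selected)


print(packages_to_install(LOCKED_GRAPH, {"sys_platform": "win32"}))
print(packages_to_install(LOCKED_GRAPH, {"sys_platform": "linux"}))
```

No solver runs here: all version choices were already made when the graph was built, so installing for a given platform is a plain breadth-first walk.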
- 00:23:29um so there are some tools in the python
- 00:23:30ecosystem that try to do this but they
- 00:23:33then have to do like a separate sat
- 00:23:35solve at install time so they have like
- 00:23:38a set of packages and then when you
- 00:23:39install they actually have to like run a
- 00:23:40SAT solver to figure out the right
- 00:23:42versions We have the nice property that
- 00:23:44like it's just there there's no like
- 00:23:46second resolution when you install we're
- 00:23:47just like traversing this graph and
- 00:23:49figuring out the things to include so
- 00:23:50this is like the ultimate goal like all
- 00:23:52that work goes into trying to produce
- 00:23:53this thing um okay so I talked a lot
- 00:23:58about
- 00:23:59um some of the hard problems we had to
- 00:24:01solve to build this now I want to talk a
- 00:24:04little bit about things we did to make
- 00:24:06it fast um and uh you know the first
- 00:24:09thing that that comes to everyone's mind
- 00:24:11and also that I just talk about a lot is
- 00:24:13rust like rust is a big part of uh of UV
- 00:24:18and of how we've made it so fast um like
- 00:24:20UV is written in Rust uh like I said we
- 00:24:23don't have a dependency on python although
- 00:24:24you do need to have python installed um
- 00:24:28but the the observation for me this
- 00:24:31slide is slightly sort of just opinions
- 00:24:33um UV has gotten like faster and faster
- 00:24:35over time like right like despite being
- 00:24:36written in Rust the whole time so it's
- 00:24:38not just about being written in Rust
- 00:24:40like from my perspective I think rust
- 00:24:42gives you a really fast
- 00:24:44Baseline um and then it gives you a lot
- 00:24:47of tools that you need if you want to
- 00:24:49write really really fast programs and
- 00:24:51like other programming languages can
- 00:24:52expose these too but for example it's
- 00:24:55pretty hard to like care deeply about
- 00:24:57memory allocation if you're writing
- 00:24:59python um like you just don't really
- 00:25:01have a lot of control over what's
- 00:25:03happening whereas in Rust you're
- 00:25:05actually like forced to care about a lot
- 00:25:07of those things right which some people
- 00:25:10will complain
- 00:25:11about but it's one of it is ultimately
- 00:25:13one of the strengths of the language um
- 00:25:16so like rust is part of it and I'm going
- 00:25:18to talk about some things some parts of
- 00:25:19rust that we use but UV is also like
- 00:25:22most package managers like a lot of what
- 00:25:24we do is IO and rust is like only so
- 00:25:26helpful with IO there's like a lot of
- 00:25:28other things we need to do so it's not
- 00:25:30rust is a big part of UV but it's not
- 00:25:31all about rust I do want to start though
- 00:25:34with an example that I think illustrates
- 00:25:36like why rust is uh helpful and
- 00:25:38important and and you can do this in
- 00:25:40other languages too but uh we do it in
- 00:25:42Rust so that's what I'm going to talk
- 00:25:43about um okay version
- 00:25:47parsing okay so like in Python like
- 00:25:50every package has a version right um and
- 00:25:53you could have a very simple version
- 00:25:54like 1.0.0 um but they can get very
- 00:25:56complicated so you can have like uh
- 00:25:58pre-releases that can be like Alpha Beta
- 00:26:00or RC which is a release candidate and
- 00:26:02the pre-release can have a number you
- 00:26:03can have like beta 1 beta 2 Beta 3 um
- 00:26:06you can also have post releases so like
- 00:26:09if you need to update this would
- 00:26:10typically be like if you need to update
- 00:26:12the documentation but not the source
- 00:26:14code you might do like a Post Release so
- 00:26:15like the contents are the same but you
- 00:26:17had to release it again for some
- 00:26:18reason um okay there's also there's also
- 00:26:22this piece called like the local version
- 00:26:25identifier um which if you've ever
- 00:26:27worked with PyTorch
- 00:26:29you will probably be familiar with this
- 00:26:31um this is intended for I'll probably
- 00:26:34get this wrong but this in the spec at
- 00:26:36least it's intended for like you're
- 00:26:38building a package locally and you want
- 00:26:40to be able to tag it in some way like on
- 00:26:42your local
- 00:26:43machine um pytorch has now used this due
- 00:26:48to other limitations in the packaging
- 00:26:49ecosystem to Mark uh packages as being
- 00:26:53compatible with certain accelerators so
- 00:26:55like you might have to build a lot of
- 00:26:56different versions of pytorch that support
- 00:26:58different versions of Cuda and they now
- 00:27:00use this part of the identifier just
- 00:27:02because it was sort of an open like a
- 00:27:04free space um uh to indicate that
- 00:27:07because there's no other support in the
- 00:27:08standards for like marking packages as
- 00:27:11compatible with an accelerator it's sort
- 00:27:12of a hole in the standards so anyway
- 00:27:13this has become very popular now in the
- 00:27:15python ecosystem um okay this one I
- 00:27:19actually like forgot this one existed
- 00:27:20and then I was doing the slides you can
- 00:27:22do this like this like Epoch thing I
- 00:27:24actually don't really remember what this
- 00:27:25is for but you can put like a number and
- 00:27:27then an exclamation mark um and that's
- 00:27:30a valid that's a valid python version
- 00:27:32and of course you can like you can
- 00:27:34actually like compose these things
- 00:27:35together like you can have like you can
- 00:27:37have like a pre a post-release of a
- 00:27:39pre-release I'm pretty sure you can have
- 00:27:41like a local version of a pre-release
- 00:27:42etc etc so like representing these is
- 00:27:47pretty hard like it's a very rich syntax
- 00:27:50so like the full representation of this
- 00:27:53is something like this right you have we
- 00:27:55have like multiple vectors um which
- 00:27:57means we're going to be allocating
- 00:27:59memory um because the release segments
- 00:28:01there can be more than three there can
- 00:28:02be like as many as you want actually
- 00:28:04like it can be
- 00:28:051.1.1.1.1 like that's fine um you
- 00:28:08can have multiple of those local
- 00:28:10segments like you can have like plus
- 00:28:11something plus something blah blah blah
- 00:28:13so like this is like this is pretty
- 00:28:15heavy and we are dealing with these
- 00:28:17things like all over the place so
- 00:28:20someone on our team um uh he goes by
- 00:28:24Burnt Sushi online so I should credit
- 00:28:26him because he figured this out uh he
- 00:28:28noticed that we can represent like over
- 00:28:3190% of versions with like a single
- 00:28:34u64 um which is great because one it's
- 00:28:38like fully stack allocated um and
- 00:28:41there's a second property that's really
- 00:28:42nice about it that I'll get to in a
- 00:28:43second but this is actually what we use
- 00:28:45internally so like internally we have a
- 00:28:46version and then we have an enum and we
- 00:28:48try to represent most versions as like
- 00:28:50the small this version small um and then
- 00:28:53we represent things with the full
- 00:28:55version if they don't fit into that
- 00:28:56scheme so think
- 00:28:59um if so it's a u64 so we have like
- 00:29:02eight bytes to work with okay eight bytes
- 00:29:05of space um so the first two or the the
- 00:29:09the first or last two bytes however you
- 00:29:11want to think about it bytes six and
- 00:29:12seven refer to the first release segment
- 00:29:14so like when we had 1.0.0 that would be
- 00:29:16the one and that's because calendar
- 00:29:19versioning is still fairly popular in
- 00:29:21the python ecosystem so uh we need two
- 00:29:24bytes for the for the first release
- 00:29:26segment because people would have
- 00:29:27packages that have a version like 2023.1
- 00:29:302023.4
- 00:29:31etc um okay the next three bytes just
- 00:29:34represent like the second third and
- 00:29:37fourth release segment and then the
- 00:29:39three bytes at the end represent one of
- 00:29:42a pre-release specifier or a
- 00:29:44post-release specifier um we cannot we do not
- 00:29:47try to even capture both of them in this
- 00:29:49representation um but the really nice
- 00:29:52thing about this it's not just that it's
- 00:29:55cheap to uh to to sorry it's not just
- 00:29:58that it doesn't have to allocate memory
- 00:29:59like the really great thing is that
- 00:30:01greater versions map to larger integers
- 00:30:05so like in addition to parsing and
- 00:30:07creating these all the time we're also
- 00:30:08comparing them constantly because we
- 00:30:10want to know like is this version
- 00:30:11greater than this version does this
- 00:30:12version satisfy like this version
- 00:30:14specifier um and now like in this
- 00:30:17representation you have to be very very
- 00:30:19careful in how we constructed the
- 00:30:20representation but in this
- 00:30:22representation answering that question
- 00:30:24is just a single memcmp it's just
- 00:30:25like is this u64 greater than this other
- 00:30:27u64 as opposed to dealing with those two
- 00:30:29big version things that have vectors and
- 00:30:31we have to like understand like blah
- 00:30:32blah blah so like most of the
- 00:30:33implementation of this is actually like
- 00:30:35a huge comment explaining how the um
- 00:30:39explaining the representation how it
- 00:30:41works it does have limitations right we
- 00:30:43can't support that Epoch thing we you
- 00:30:45can't have more than four
- 00:30:47release segments like 1.1.1.1.1
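The packing idea can be sketched in a few lines. This is an illustration under stated assumptions, not uv's actual bit layout: it follows the description in the talk (two bytes for the first release segment, one byte each for the next three, three bytes for a pre-release or post-release), but the exact suffix encoding, the regular expression, and the name `pack_version` are hypothetical. Versions that do not fit return `None`, standing in for the fallback to the full representation.

```python
import re

# Assumption: only plain releases with an optional pre-release (a/b/rc) or
# post-release are handled; epochs, local versions, dev releases, and
# combinations are rejected and would use the full representation instead.
_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+)|\.post(\d+))?$")

# Suffix kinds are ordered so that alpha < beta < rc < final < post.
_KIND = {"a": 0, "b": 1, "rc": 2, "final": 3, "post": 4}


def pack_version(text):
    """Pack a version string into one 64-bit integer, or return None."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    release = [int(part) for part in match.group(1).split(".")]
    if len(release) > 4:
        return None
    release += [0] * (4 - len(release))  # 1.0 and 1.0.0 compare equal
    first, rest = release[0], release[1:]
    if first > 0xFFFF or any(segment > 0xFF for segment in rest):
        return None

    if match.group(2) is not None:
        kind, number = match.group(2), int(match.group(3))
    elif match.group(4) is not None:
        kind, number = "post", int(match.group(4))
    else:
        kind, number = "final", 0
    if number > 0xFFFF:
        return None

    packed = first << 48              # bytes 6 and 7: first release segment
    packed |= rest[0] << 40           # byte 5: second segment
    packed |= rest[1] << 32           # byte 4: third segment
    packed |= rest[2] << 24           # byte 3: fourth segment
    packed |= _KIND[kind] << 16       # byte 2: kind of suffix
    packed |= number                  # bytes 0 and 1: suffix number
    return packed


ordered = ["1.0a1", "1.0b1", "1.0rc1", "1.0", "1.0.post1", "1.0.1", "2023.4"]
packed = [pack_version(version) for version in ordered]
print(packed == sorted(packed))       # comparison is plain integer comparison
print(pack_version("1!2.0"))          # epoch does not fit
print(pack_version("1.1.1.1.1"))      # more than four release segments
```

Because the most significant bytes hold the most significant parts of the version, comparing two versions reduces to comparing two integers.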
- 00:30:49but again over 90% of versions can be
- 00:30:51modeled this way um and yeah this is
- 00:30:54actually something we did and like when
- 00:30:57when there's when there's minimal IO so
- 00:30:59like everything's fully cached and we
- 00:31:00have this very hard resolution that has
- 00:31:02to do a lot of package version testing
- 00:31:05It sped things up like three or four
- 00:31:06times it was like super super impactful
- 00:31:08because this is where we were just
- 00:31:09spending a ton of time parsing versions
- 00:31:12uh allocating memory for them and
- 00:31:13comparing them so again this is just
- 00:31:16like you can do this with other
- 00:31:17languages too but rust I think is very
- 00:31:19amenable to doing this kind of thing um
- 00:31:21and it made a really big difference for
- 00:31:22us um I mentioned that most uh you know
- 00:31:27a lot of what we have to do with package
- 00:31:28manager is actually IO so I want to have
- 00:31:30a go through one or two examples of ways
- 00:31:32that we try and cheat a little bit with
- 00:31:35IO um so I I hinted at this before but
- 00:31:39when you want to understand the metadata
- 00:31:40for a package like you need to know its
- 00:31:43dependencies um it's not guaranteed that
- 00:31:45you can actually get that information
- 00:31:47without like running some python code um
- 00:31:49but often you can so when you publish a
- 00:31:52package to the index there are two kinds
- 00:31:54of packages one is a source package and
- 00:31:56one is a built distribution um and the
- 00:31:58built distributions are probably like
- 00:32:00most of what you interact with um and
- 00:32:02those are really important because when
- 00:32:03you interact with python a lot of what
- 00:32:05you're doing is actually interacting
- 00:32:06with native code right so if you use
- 00:32:07like NumPy or SciPy or whatever those
- 00:32:10have to be built on a bunch of different
- 00:32:11platforms because they are not pure
- 00:32:12python um so python has this extensive
- 00:32:15support for built distributions and
- 00:32:17built distributions do include the
- 00:32:19metadata which is great
- 00:32:22um the built distributions are actually
- 00:32:25just zip archives they're called wheels
- 00:32:27I don't fully understand why um but the
- 00:32:31the suffix is .whl but it's actually
- 00:32:32just a zip file and like somewhere in
- 00:32:35the zip file there's a metadata file
- 00:32:37like literally a file called METADATA um
- 00:32:39that contains the metadata for the
- 00:32:42package um some registries will let you
- 00:32:45ask for this directly but like a lot of
- 00:32:47them won't um it just depends on the
- 00:32:49registry like PyPI the public index will
- 00:32:52let you just say like give me the
- 00:32:53metadata but like for whatever reason a
- 00:32:55lot of the commercial registries do not
- 00:32:57support this yet so we want to like
- 00:32:59get the metadata but we don't want to
- 00:33:02download the whole wheel because the
- 00:33:05wheel like the PyTorch wheels are like
- 00:33:06hundreds of megabytes um and we don't
- 00:33:09want to download them just to know the
- 00:33:10metadata because we might have to test
- 00:33:11like a bunch of versions too so what we
- 00:33:15do instead um this is a representation
- 00:33:19of a zip file um I used to be very
- 00:33:22scared of file formats um but but zip is
- 00:33:25very simple um it's sort of just a
- 00:33:28series of entries like each entry has
- 00:33:29like a header and then it has the
- 00:33:30contents of the file and then at the
- 00:33:32very end there is what's called the
- 00:33:34central directory it's kind of like an
- 00:33:35index so the central directory knows
- 00:33:39what all the files are and where they're
- 00:33:41located like you can think of the zip
- 00:33:43file it's just like you know a stream of
- 00:33:44of bytes and all the files are somewhere
- 00:33:46the central directory knows where all
- 00:33:48the files
- 00:33:49are so what we do is we first make a
- 00:33:53range request for the central directory
- 00:33:55so we we guess where it is we say it's
- 00:33:58probably within this you know this many
- 00:34:00bytes at the end of the file and then we
- 00:34:02grab the central directory which is
- 00:34:04basically an index of information and
- 00:34:06that does not require downloading the
- 00:34:07whole wheel we can ask the registry to
- 00:34:09just give us those you know N bytes at
- 00:34:11the end of the file we then find the
- 00:34:14metadata file in the central directory
- 00:34:15and then we make a second range request
- 00:34:17just for that metadata file because the
- 00:34:18central directory knows where it is so
- 00:34:20we grab the central directory we figure
- 00:34:22out what we need to request and then we
- 00:34:23go and get the metadata file um yeah
- 00:34:25this has nothing to do with rust right
- 00:34:27by the way like like other python tools
- 00:34:29can do this too but it does save a lot
- 00:34:31of time because we don't have to
- 00:34:32download these huge files just to answer
- 00:34:33the question of like what packages does
- 00:34:36it depend on
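The two-request approach can be sketched without any network access by simulating range requests as slices of an in-memory archive. Everything here is an assumption for illustration: the function name `read_member`, the 4096-byte tail guess, and the fake wheel contents. The record layouts follow the standard zip format (end-of-central-directory record, central directory headers, local file headers); zip64 archives are not handled.

```python
import io
import os
import struct
import zipfile
import zlib

EOCD = struct.Struct("<4sHHHHIIH")           # end of central directory record
CDFH = struct.Struct("<4sHHHHHHIIIHHHHHII")  # central directory file header
LFH = struct.Struct("<4sHHHHHIIIHH")         # local file header


def read_member(fetch, total_size, wanted_suffix, tail_guess=4096):
    """Read one file from a remote zip using range requests only.

    `fetch(start, end)` stands in for an HTTP range request.
    """
    # Request 1: guess that the central directory lives in the last few KB.
    tail_start = max(0, total_size - tail_guess)
    tail = fetch(tail_start, total_size)
    eocd_at = tail.rfind(b"PK\x05\x06")
    if eocd_at < 0:
        raise ValueError("end of central directory record not found")
    fields = EOCD.unpack_from(tail, eocd_at)
    entry_count, cd_size, cd_offset = fields[4], fields[5], fields[6]
    if cd_offset < tail_start:
        # The guess was too small: fetch the directory explicitly.
        directory = fetch(cd_offset, cd_offset + cd_size)
    else:
        begin = cd_offset - tail_start
        directory = tail[begin:begin + cd_size]

    # Walk the central directory: it records where every file is located.
    entries = {}
    position = 0
    for _ in range(entry_count):
        header = CDFH.unpack_from(directory, position)
        method, compressed_size = header[4], header[8]
        name_len, extra_len, comment_len = header[10], header[11], header[12]
        name_start = position + CDFH.size
        name = directory[name_start:name_start + name_len].decode("utf-8")
        entries[name] = (header[16], method, compressed_size)
        position = name_start + name_len + extra_len + comment_len

    name = next(n for n in entries if n.endswith(wanted_suffix))
    local_offset, method, compressed_size = entries[name]
    # The entry ends where the next entry (or the central directory) starts.
    boundaries = [offset for offset, _, _ in entries.values()] + [cd_offset]
    entry_end = min(b for b in boundaries if b > local_offset)

    # Request 2: fetch just that one entry and decompress it.
    raw = fetch(local_offset, entry_end)
    local = LFH.unpack_from(raw, 0)
    data_start = LFH.size + local[9] + local[10]
    payload = raw[data_start:data_start + compressed_size]
    if method == 8:   # deflate
        return zlib.decompress(payload, -15)
    if method == 0:   # stored
        return payload
    raise ValueError("unsupported compression method")


# Build a fake wheel: one large incompressible file plus a METADATA file.
EXPECTED = b"Name: demo\nVersion: 1.0\nRequires-Dist: rich\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as wheel:
    wheel.writestr("demo/native_lib.bin", os.urandom(500_000))
    wheel.writestr("demo-1.0.dist-info/METADATA", EXPECTED)
archive = buffer.getvalue()

fetched = []


def fetch(start, end):
    fetched.append(end - start)
    return archive[start:end]


metadata = read_member(fetch, len(archive), ".dist-info/METADATA")
print(metadata.decode())
print(f"fetched {sum(fetched)} of {len(archive)} bytes in {len(fetched)} requests")
```

Against a real registry, `fetch` would issue an HTTP request with a `Range` header, which only works if the server supports range requests.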
- 00:34:39um the probably the biggest contributor
- 00:34:42to like why UV is so fast and why it
- 00:34:45feels so fast is the cache design um so
- 00:34:50UV is like the cache itself is all
- 00:34:53optimized for like warm operations um as
- 00:34:56in operations where uh you have the data
- 00:34:59you want in the cache and you just need
- 00:35:00to like get it into your environment and
- 00:35:02that's because like one uh like most of
- 00:35:06the time when you're installing a
- 00:35:07package like you've probably installed
- 00:35:09it already on your machine at some point
- 00:35:10in time that may not be true for like
- 00:35:12a continuous integration environment but
- 00:35:14like on your machine you probably have a
- 00:35:15lot of copies of the same packages that
- 00:35:17you've installed in different places um
- 00:35:19and so we try to optimize for those
- 00:35:21kinds of interactions where you have
- 00:35:22data in the cache and we want to make it
- 00:35:24really fast for you to do something with
- 00:35:26it so the way that we model this is we
- 00:35:28have this sort of global cache of un of
- 00:35:31unpacked archives recall like every
- 00:35:33archive is a ZIP file we don't actually
- 00:35:35store the zip files in the cache what we
- 00:35:38do instead is like while we download
- 00:35:40those files we just unzip them directly
- 00:35:42into the cache so the cache contains
- 00:35:44like the fully unzipped contents of the
- 00:35:46files and when we need to install the
- 00:35:50installation operation is basically that
- 00:35:52we just you know we we use reflinking
- 00:35:54where we can or hard linking we just
- 00:35:56link the files into your environment so
- 00:35:58like if you're using UV and you need
- 00:36:00numpy in like a bunch of different
- 00:36:01environments we just install it in one
- 00:36:03place and then when you install it in
- 00:36:05your environment we are basically just
- 00:36:06creating links to the files in the cache
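A minimal sketch of that install step, assuming a cache directory that already holds the unpacked package: each file is hard-linked into the target environment, with a plain copy as a fallback when linking is not possible (for example across filesystems). The function name `install_from_cache` and the directory layout are hypothetical, and reflinking is left out because the Python standard library has no portable call for it.

```python
import os
import shutil
import tempfile
from pathlib import Path


def install_from_cache(cached_package: Path, site_packages: Path) -> None:
    """Link every file of an unpacked, cached package into an environment."""
    for source in cached_package.rglob("*"):
        target = site_packages / source.relative_to(cached_package)
        if source.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            os.link(source, target)        # hard link: no file data is copied
        except OSError:
            shutil.copy2(source, target)   # fallback when linking fails


# Demo: one cached package installed into two separate environments.
workdir = Path(tempfile.mkdtemp())
cached = workdir / "cache" / "rich-13.0.0"
(cached / "rich").mkdir(parents=True)
(cached / "rich" / "__init__.py").write_text("VERSION = '13.0.0'\n")

env_a = workdir / "env_a" / "site-packages"
env_b = workdir / "env_b" / "site-packages"
install_from_cache(cached, env_a)
install_from_cache(cached, env_b)

installed = env_a / "rich" / "__init__.py"
print(installed.read_text())
print(os.stat(installed).st_nlink)  # greater than 1 when hard links were used
```

With hard links, both environments point at the same bytes on disk, so the second install costs almost no extra space.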
- 00:36:09um that's like really really fast and
- 00:36:12it's also very space efficient because
- 00:36:14it means that you're not installing the
- 00:36:16same contents like over and over in all
- 00:36:17your different projects um so again most
- 00:36:20installs are just like hard linking a
- 00:36:22bunch of files from the cache into your
- 00:36:24environment um so like this is just this
- 00:36:27is just literally just a screenshot of
- 00:36:29like my file system of the cache um the
- 00:36:31cache looks like this right there's like
- 00:36:32packages package versions and then it's
- 00:36:34just the unzipped contents and so when
- 00:36:35you want to install rich in your virtual
- 00:36:38environment we're just like creating
- 00:36:39symlinks to all these files
- 00:36:41effectively um and that's really really
- 00:36:43fast which is great so like this this
- 00:36:46alone contributes to a lot of the
- 00:36:48feeling of things being instant like if
- 00:36:49you've installed something on your
- 00:36:50machine you reinstall with UV um uh this
- 00:36:53is like a lot of it is due to how we
- 00:36:55design this cache and the fact we try to
- 00:36:57optimize for those kinds of
- 00:36:59operations um okay last thing so this
- 00:37:03this is really good but it only works
- 00:37:06for this only applies to like files a
- 00:37:09lot of what we need to store in the
- 00:37:10cache is like metadata um so maybe uh
- 00:37:13like Blobs of data right maybe we need
- 00:37:15to know like what are all the available
- 00:37:17versions of like this package um we
- 00:37:20cannot this does not apply to that so we
- 00:37:23use a slightly different trick for those
- 00:37:25cases which is we use a technique called
- 00:37:26zero copy
- 00:37:28um and this will require the most rust
- 00:37:30knowledge but I'll try to I'll try to
- 00:37:31avoid it um the intuition here is like
- 00:37:34let's say that you have a struct like
- 00:37:36this blob struct and it has a
- 00:37:38field and uh it's being stored as Json
- 00:37:41so like if you want to deserialize this
- 00:37:44you read the Json file from disk into
- 00:37:46memory and then uh you know you run like
- 00:37:49a a uh basically you run a parser and
- 00:37:52then like you grab the contents and put
- 00:37:54them in the struct um the observation of
- 00:37:57zero copy though is like and I don't
- 00:38:00know if anyone will know the difference
- 00:38:01between these two things but um but uh
- 00:38:04when you're trying to create the struct
- 00:38:06you already read the Json file into
- 00:38:08memory right so you already like
- 00:38:10allocated memory to read that file into
- 00:38:12a string so you don't actually need to
- 00:38:15like allocate more memory to create that
- 00:38:17blob struct the version on the left that
- 00:38:19requires an allocation so if we want to
- 00:38:22go from the thing at the top to the
- 00:38:23thing at the bottom we have to allocate
- 00:38:24once to read the contents into
- 00:38:26memory and then again to create
- 00:38:28the struct instead like we already know
- 00:38:32that the string is basically there
- 00:38:33verbatim in the contents so ideally
- 00:38:36instead we could just create a pointer
- 00:38:37to it so we read the Json into memory we
- 00:38:40parse it and then we ideally this is
- 00:38:43sort of theoretical right we just like
- 00:38:44create a pointer to it rather than
- 00:38:46reallocating memory to create
- 00:38:48everything um so this is what we do but
- 00:38:50we do it like sort
- 00:38:53of uh uh on steroids I guess like um the
- 00:38:57way it works is we store the data um on
- 00:39:01disk in effectively the same
- 00:39:03representation that it will have in
- 00:39:04memory and we read when we read it back
- 00:39:08we're basically just doing like a
- 00:39:09pointer cast to go from read data to
- 00:39:12fully realized structs we use a library
- 00:39:14for this called rkyv which is very
- 00:39:15good um and there are some safety checks
- 00:39:18around this of course I mean it's
- 00:39:19totally unsafe rust but like there are
- 00:39:22there are safety and validation checks
- 00:39:24you can do around this but the really
- 00:39:26cool thing about this is um like the
- 00:39:30deserialization does not scale with your
- 00:39:32data so like you have to read the the
- 00:39:35contents from disk and like as you have
- 00:39:37more and more data that file will be
- 00:39:38bigger and bigger and like you have to
- 00:39:39read more and more but going from the
- 00:39:42data that you read to like the fully
- 00:39:44like the the deserialized struct
- 00:39:47that does not scale as the data gets
- 00:39:48larger unlike with Json right like with
- 00:39:50Json you would have to uh you'd have to
- 00:39:52like parse it right you would have to go
- 00:39:54through all these operations that would
- 00:39:55get slower and slower as the data got
- 00:39:56bigger the really cool thing about in my
- 00:39:58opinion at least about zero copy
- 00:40:00serialization is that the
- 00:40:03deserialization does not scale with your
- 00:40:04data so like it doesn't matter really
- 00:40:06like how big the struct is or how large
- 00:40:09that string is like it does matter for
- 00:40:10reading the data from disk but it does
- 00:40:12not matter for D serializing it into
- 00:40:14into
- 00:40:15memory um okay that was the last thing I
- 00:40:18was going to cover I had a bunch of
- 00:40:19other things I want to talk about um but
- 00:40:21I just put links to them and maybe I can
- 00:40:22share the slides um and I think that's
- 00:40:25it
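The zero-copy idea from the last part of the talk can be approximated in Python, as a loose analogy only. uv does this in Rust with a dedicated library; the sketch below instead stores a fixed-layout array of 64-bit integers on disk and "deserializes" it by reinterpreting the bytes that were read, using `memoryview.cast`. The file name and the choice of data are assumptions, and the file is only valid on a machine with the same byte order that wrote it.

```python
import array
import os
import tempfile

# Write one million unsigned 64-bit integers in their in-memory layout.
values = array.array("Q", range(1_000_000))
path = os.path.join(tempfile.mkdtemp(), "versions.bin")
with open(path, "wb") as handle:
    values.tofile(handle)

# Reading the file scales with its size; that cost cannot be avoided.
with open(path, "rb") as handle:
    raw = handle.read()

# "Deserializing" does not scale: the cast reinterprets the existing buffer
# as 64-bit integers without parsing or copying any element.
view = memoryview(raw).cast("Q")

print(len(view))
print(view[10])
print(view.obj is raw)  # the view still points at the bytes that were read
```

A JSON file holding the same numbers would have to be parsed element by element, so that step would get slower as the data grows.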
- 00:40:26[Applause]
- 00:40:32he
- Charlie Marsh
- Astral
- Python
- UV
- Dependency Management
- Rust
- Performance Optimization
- Package Manager
- Virtual Environments
- Tooling