Data + AI Summit 2025 - Keynote Recap

00:45:11
https://www.youtube.com/watch?v=Kqx4eeSDtAI

Summary

TLDR: This video summarizes the key announcements from the Data and AI Summit held in San Francisco, including the launch of new products such as Lakebase, which introduces OLTP with separation of storage and compute, and Agent Bricks, which makes it easier to build AI agents. Spark 4.0 now enables ANSI SQL mode by default, along with a range of new features that have been open sourced. A free edition of Databricks was also introduced, letting students and individuals learn and experiment with the tools. There are also updates to Unity Catalog and Genie that improve data management and how you interact with data. The video gives an in-depth look at all the changes and innovations happening in the Databricks ecosystem.

Takeaways

  • 🚀 Lakebase introduces OLTP with separation of storage and compute.
  • 🤖 Agent Bricks makes it easier to build AI agents.
  • 📊 Spark 4.0 now enables ANSI SQL mode by default.
  • 🎓 Databricks Free Edition for students and individuals.
  • 📁 Unity Catalog improves data management.
  • 💬 Genie now has new features, including data sampling.
  • 🔄 Lakebridge helps migrate data to Databricks.
  • 🔧 Lakeflow is the new ETL framework.
  • 🌐 AI Gateway manages APIs for AI models.
  • 📈 MLflow 3.0 introduces version management for prompts.

Timeline

  • 00:00:00 - 00:05:00

    The video opens with the latest update from Advancing Spark, covering the key announcements from the Data and AI Summit in San Francisco. The presenter explains that they will condense nearly 7 hours of announcements into the key points you need to know.

  • 00:05:00 - 00:10:00

    The presenter discusses the new vision for the Databricks platform, including elements such as AI, SQL, and the data marketplace. Emphasis is placed on Unity Catalog, which now supports Iceberg, letting users choose between Delta and Iceberg for managing their data.

  • 00:10:00 - 00:15:00

    The biggest announcement is Lakebase, which marks Databricks entering the OLTP (Online Transaction Processing) market. It enables separation of storage and compute, which differs from the traditional OLTP approach, and is built on Postgres.

  • 00:15:00 - 00:20:00

    Lakebase lets users spin up working databases quickly and efficiently, with the ability to handle millions of queries concurrently. This is a big step in the world of data computing.

  • 00:20:00 - 00:25:00

    Agent Bricks is introduced as a tool for building agent-based systems more easily. It lets users create agents that automate workflows without requiring deep technical knowledge.

  • 00:25:00 - 00:30:00

    Spark 4.0 is released with many new features, including improved SQL syntax and stricter error handling. These are significant changes that users accustomed to the old Spark behaviour need to watch out for.

  • 00:30:00 - 00:35:00

    Declarative Pipelines, previously known as Delta Live Tables, has now been open sourced and integrated into Lakeflow, a suite of tools for data ingestion and processing.

  • 00:35:00 - 00:40:00

    Lakehouse Apps (Databricks Apps) is now generally available, letting users build user interfaces for interacting with AI. This is an important step toward making AI more accessible to everyday users.

  • 00:40:00 - 00:45:11

    Genie, the data interaction tool that lets users converse with their data, is now also generally available, with a range of new features that improve the user experience.


Video Q&A

  • What is Lakebase?

    Lakebase is a new product from Databricks that introduces OLTP with separation of storage and compute.

  • What is Agent Bricks?

    Agent Bricks is a tool for building AI agents in an easier, low-code way.

  • What is new in Spark 4.0?

    Spark 4.0 now enables ANSI SQL mode by default and includes a range of new features that have been open sourced.

  • What is Databricks Free Edition?

    Databricks Free Edition is a fully featured version that students and individuals can use to learn and experiment.

  • What is Unity Catalog?

    Unity Catalog is a data governance tool that lets users access and manage data more effectively.

  • What is new in Genie?

    Genie now has new features such as data sampling and suggested queries.

  • What is Lakebridge?

    Lakebridge is a migration tool that helps move data and code from other systems into Databricks.

  • What is Lakeflow?

    Lakeflow is the new ETL framework that brings together a range of tools for data management.

  • What is AI Gateway?

    AI Gateway is a tool for managing and optimizing APIs for AI models.

  • What is MLflow 3.0?

    MLflow 3.0 is the new version that introduces version management for prompts in AI workflows.

Transcript

  • 00:00:02
    Hello Spark fans. Welcome back to
  • 00:00:04
    Advancing Spark brought to you by
  • 00:00:06
    Advancing Analytics, the only people who
  • 00:00:08
    actually understand what's going on in
  • 00:00:10
    the world of Databricks currently. Now
  • 00:00:12
    for you is the big question cuz last
  • 00:00:15
    week was the data and AI summit over in
  • 00:00:17
    San Francisco and me and the team were
  • 00:00:19
    over there for so many hours of keynotes
  • 00:00:23
    and announcements and things going on in
  • 00:00:25
    the wacky world that is Databricks. So,
  • 00:00:27
    what I thought I would do is take an
  • 00:00:29
    almost 7 hours worth of announcements
  • 00:00:31
    and crush it right down to just tell
  • 00:00:34
    you what I think is important, what I
  • 00:00:35
    think of it, and how to understand it.
  • 00:00:37
    Now, that is still going to take us a
  • 00:00:38
    little while. So, strap yourselves in.
  • 00:00:40
    This is not going to be a short video.
  • 00:00:42
    So, yeah, I've got a lot of marketing.
  • 00:00:44
    I've taken some screenshots of slides.
  • 00:00:46
    There's some dodgy quality in the things
  • 00:00:47
    I've snipped, but that's fine. We can go
  • 00:00:49
    and have a look through that. If it's
  • 00:00:51
    your first time around here, well,
  • 00:00:52
    welcome. Don't forget to like and
  • 00:00:54
    subscribe. And yeah, just buckle up.
  • 00:00:56
    We'll talk about a ton of stuff that is
  • 00:00:58
    happening in the world of data
  • 00:01:00
    intelligence on Databricks. So let's go
  • 00:01:05
    and have a look. So this is like the big
  • 00:01:06
    opening slide we saw them going back to.
  • 00:01:09
    Essentially we always see this new
  • 00:01:12
    vision each year of this is what the
  • 00:01:14
    platform looks like. Similar idea you
  • 00:01:17
    tend to have bit AI bit SQL. The data
  • 00:01:20
    marketplace is a new thing. Apps is kind
  • 00:01:22
    of a new thing in this uh circle. Lake
  • 00:01:24
    flow we'll talk about more in a minute.
  • 00:01:26
    and the AIBI. Loads of stuff going on.
  • 00:01:28
    All underpinned by Unity Catalog with a
  • 00:01:31
    special new entrance down there in terms
  • 00:01:33
    of Unity Catalog is underpinned not just
  • 00:01:36
    by Delta but also by Iceberg. One of the
  • 00:01:38
    big announcements this year was that
  • 00:01:40
    Iceberg is now fully managed and available
  • 00:01:43
    for Unity Catalog. You can just create a
  • 00:01:45
    table, decide ah it's going to be a
  • 00:01:47
    delta table. This one's going to be an
  • 00:01:48
    iceberg table. They both work together.
  • 00:01:50
    No one really cares what's under the
  • 00:01:51
    hood.
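
As a rough illustration of that Delta-or-Iceberg choice, here is a minimal sketch. It assumes a Unity Catalog-enabled workspace with an existing main.demo schema and a notebook where spark is already defined; the managed Iceberg clause reflects the announcement rather than verified syntax, so check the docs for the exact form.

    # Sketch only: catalog/schema/table names are hypothetical.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.demo.sales_delta (id BIGINT, amount DOUBLE)
        USING DELTA
    """)
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.demo.sales_iceberg (id BIGINT, amount DOUBLE)
        USING ICEBERG
    """)

    # Downstream code reads both the same way; the table format is an implementation detail.
    spark.table("main.demo.sales_delta").show()
    spark.table("main.demo.sales_iceberg").show()
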
  • 00:01:53
    all full parity of support for both of
  • 00:01:56
    those two. So very very cool in terms of
  • 00:01:58
    what we're seeing there. But there's a
  • 00:02:01
    ton of new things. Lake Flow itself is
  • 00:02:04
    new, but we'll get on to that. So I want
  • 00:02:06
    to take a while just talking through the
  • 00:02:09
    new concepts, what I thought of it when
  • 00:02:10
    I first heard about it and what I think
  • 00:02:12
    about it now, which is different. So,
  • 00:02:15
    the first one, probably the the biggest
  • 00:02:16
    one, and word of warning, there's a lot
  • 00:02:19
    of products that are called either lake
  • 00:02:21
    something or something bricks. So,
  • 00:02:23
    you're going to have to get used to
  • 00:02:24
    remembering all these different things
  • 00:02:25
    cuz there's a lot. So, number one, Lakebase.
  • 00:02:28
    Probably the most surprising
  • 00:02:30
    announcement that we actually had this
  • 00:02:32
    year. Essentially, Databricks are
  • 00:02:34
    entering the OLTP market. Now, if
  • 00:02:37
    you're not from a databasey background,
  • 00:02:39
    uh, OLTP is online transactional
  • 00:02:41
    processing. And you've also got the idea
  • 00:02:43
    of OLAP, online analytical processing.
  • 00:02:45
    OLAP looks after big chunky set based
  • 00:02:48
    queries. I'm looking to aggregate over
  • 00:02:50
    millions, billions of rows and give me
  • 00:02:52
    an aggregate answer. OLTP is looking for
  • 00:02:54
    lots of small little concurrent uh
  • 00:02:58
    singleton reads and writes. Very
  • 00:02:59
    different type of technology. Most of
  • 00:03:01
    the time it's still a database. You
  • 00:03:02
    still use SQL. You still work with it in
  • 00:03:04
    the same way. But actually the tools
  • 00:03:06
    that have grown and evolved in the big
  • 00:03:08
    data ecosystem, the big crunchy parallel
  • 00:03:11
    distributed compute aka Spark has always
  • 00:03:14
    been about analytics. It's always been
  • 00:03:17
    about doing things at scale to many many
  • 00:03:19
    rows at the same time. So Databricks
  • 00:03:21
    going hey we now do OLTP was always
  • 00:03:24
    a bit of a weird one. When I first heard
  • 00:03:26
    about it I was like why? Who's that for?
  • 00:03:30
    Is that just to tick a box saying well
  • 00:03:33
    other tools have got a database in there
  • 00:03:35
    we've now got a database. So I honestly
  • 00:03:37
    I held my hand up going all right and I
  • 00:03:41
    will happily eat my words cuz what what
  • 00:03:43
    makes the difference if they were just
  • 00:03:45
    ticking a box and going hey we've got
  • 00:03:46
    managed Postgres you can now go and
  • 00:03:48
    build a database and it works like a
  • 00:03:49
    database looks like a database that
  • 00:03:51
    would not be exciting and I'd struggle
  • 00:03:53
    to see why they're actually trying to do
  • 00:03:54
    anything different.
  • 00:03:56
    There's there's a lot of stuff actually
  • 00:03:58
    behind what's going on. So firstly this
  • 00:04:00
    new Lakebase yes it is managed OLTP it
  • 00:04:02
    is Postgres working inside the thing
  • 00:04:05
    it's but it's Postgres with a split of
  • 00:04:08
    separation of storage and compute and
  • 00:04:11
    that's that's you don't get that with
  • 00:04:12
    OLTP OLTP to get it fast to get it
  • 00:04:15
    millisecond latency you have to have the
  • 00:04:17
    data stored in a very highly indexed way
  • 00:04:19
    so you can do that very very fast record
  • 00:04:22
    return that gives you the low latency
  • 00:04:24
    that OLTP needs so that announcement
  • 00:04:26
    going hey look we've got a database
  • 00:04:28
    that's now part of Databricks but
  • 00:04:30
    we've separated storage from compute is
  • 00:04:32
    nuts.
  • 00:04:34
    So there's a few things behind it.
  • 00:04:36
    There's a few things that they were
  • 00:04:36
    saying well this is what we need to get
  • 00:04:38
    this to actually work. So one being open
  • 00:04:40
    source yes of course it's Databricks
  • 00:04:42
    they are all about open sourcing things.
  • 00:04:44
    Uh so it's built on Postgres, the biggest
  • 00:04:47
    open source database there is that just
  • 00:04:49
    kind of makes sense. Uh the separated
  • 00:04:51
    storage compute yes is nuts. Uh so
  • 00:04:54
    getting a huge like QPS queries per
  • 00:04:56
    second that's concurrency like most of
  • 00:04:59
    the time if you're talking about
  • 00:05:00
    analytics you're going to have tens
  • 00:05:02
    maybe hundreds of users running queries
  • 00:05:05
    not crazy if you talk about a website
  • 00:05:08
    talk about e-commerce talk about an
  • 00:05:09
    actual application used at scale you're
  • 00:05:11
    talking about millions of queries per
  • 00:05:14
    second concurrency is king in the world
  • 00:05:17
    of any kind of OLTP system there's just
  • 00:05:20
    a lot going on
  • 00:05:23
    Now the other thing is AI. So actually
  • 00:05:26
    in this world of agents in the way that
  • 00:05:28
    we're actually going, we're expecting
  • 00:05:30
    all this stuff to be so much more
  • 00:05:31
    ephemeral than it used to be. Uh someone
  • 00:05:33
    writes what writes a query that spins
  • 00:05:36
    off an agent. The agent creates a
  • 00:05:38
    database that it uses to fulfill that
  • 00:05:40
    request and then it trashes it. That's a
  • 00:05:42
    weird idea. We're used to provisioning
  • 00:05:44
    compute and maybe the compute scales up
  • 00:05:46
    and down. But the idea of spinning up a
  • 00:05:48
    whole separate database to fulfill the
  • 00:05:50
    work and then trashing it isn't really
  • 00:05:52
    something we've ever done in the
  • 00:05:53
    application style of things because
  • 00:05:54
    we've never really separated storage and
  • 00:05:56
    compute in the application style of
  • 00:05:57
    things. So yeah, it's just it's just
  • 00:06:00
    different. It's way more different than
  • 00:06:02
    I actually thought it was. Dropping my
  • 00:06:05
    clicker.
  • 00:06:06
    So why is it different? I mean it makes
  • 00:06:09
    a little bit more sense if you roll back
  • 00:06:10
    a couple of weeks before summit. We
  • 00:06:12
    actually got the announcement that data
  • 00:06:13
    bricks had acquired a company called
  • 00:06:15
    Neon. Now Neon are well known. They're a
  • 00:06:17
    startup for creating serverless Postgres
  • 00:06:20
    that champions separation of storage and
  • 00:06:22
    compute. That's what they do. So this
  • 00:06:24
    shouldn't come as that much of a
  • 00:06:25
    surprise. Now the data bricks obviously
  • 00:06:29
    they've just acquired Neon. They've been
  • 00:06:30
    working on Lakebase for a good year or
  • 00:06:33
    so. But they were working with Neon.
  • 00:06:35
    They've said actually they've been
  • 00:06:36
    working in partnership very heavily and
  • 00:06:37
    then now they've acquired them. So Lake
  • 00:06:40
    Base isn't Neon, but it's heavily worked
  • 00:06:42
    with and informed by that. And I'm
  • 00:06:44
    sure we'll see them come closer together
  • 00:06:45
    in the future. But that just makes a
  • 00:06:48
    little bit more sense about where they
  • 00:06:49
    got the idea from, where it's come from,
  • 00:06:51
    where the technical expertise has come
  • 00:06:53
    from. So it is it it's it's more novel
  • 00:06:56
    than I thought when I first heard the
  • 00:06:58
    announcement. Oh, hey, we're doing OLTP.
  • 00:07:00
    It's oh, hey, we're doing OLTP, but very
  • 00:07:02
    very different.
  • 00:07:05
    Now, how much that'll go, oh yeah,
  • 00:07:07
    that's different enough. I'm gonna take
  • 00:07:09
    my existing e-commerce system. I'm gonna
  • 00:07:11
    run it on top of Lakebase. I don't know.
  • 00:07:13
    I mean, there's a maturity piece out
  • 00:07:15
    there. This has just been announced
  • 00:07:16
    before people actually use it in anger
  • 00:07:18
    to run their entire business on.
  • 00:07:20
    There'll be a little bit of maturing, a
  • 00:07:22
    little bit of migration. There'll be an
  • 00:07:23
    adoption period, right? It's very new,
  • 00:07:26
    but the idea is novel enough that
  • 00:07:27
    actually, yeah, it's interesting.
  • 00:07:31
    So this is one of my comedy clipped
  • 00:07:34
    slides that's terribly low res but that
  • 00:07:36
    whole idea of going well there is the
  • 00:07:38
    object store actually in the lake
  • 00:07:40
    there's compute can just be ephemeral I
  • 00:07:42
    can spin up a new database that's
  • 00:07:43
    looking at existing data and just spin
  • 00:07:45
    that out each as new instances which
  • 00:07:47
    have their own controls around it but
  • 00:07:48
    it's that buffer store and concurrency
  • 00:07:50
    management that sits in the middle which
  • 00:07:52
    is essentially doing some of the magic
  • 00:07:53
    that's enabling this to actually work at
  • 00:07:55
    that level of scale now the actual
  • 00:07:57
    technical how that works I don't know
  • 00:08:00
    we'll we'll stick around. I'm sure
  • 00:08:01
    there's much much deeper videos we're
  • 00:08:03
    going to do about Lakebase in the
  • 00:08:04
    future. But yeah, that's probably the
  • 00:08:06
    biggest announcement, the biggest like,
  • 00:08:08
    oh, that's different.
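
Because Lakebase is Postgres under the hood, the expectation is that standard Postgres tooling works against it. A minimal sketch, assuming a provisioned Lakebase instance and using psycopg2; the host, database, table, and credentials below are placeholders, not real values.

    import psycopg2

    # Placeholder connection details for a hypothetical Lakebase instance.
    conn = psycopg2.connect(
        host="my-lakebase-instance.example.com",
        dbname="demo",
        user="app_user",
        password="REDACTED",
        sslmode="require",
    )
    with conn, conn.cursor() as cur:
        # A typical OLTP-style singleton read: fetch one row by primary key.
        cur.execute("SELECT id, status FROM orders WHERE id = %s", (42,))
        print(cur.fetchone())
    conn.close()
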
  • 00:08:12
    It's number one. See, that's the first
  • 00:08:14
    nice gentle announcement just to ease us
  • 00:08:16
    into things. Number two is a little
  • 00:08:18
    crazier. So, we got this thing, which
  • 00:08:20
    I'm so happy they finally decided the
  • 00:08:22
    name for is agent bricks. Yes, it's
  • 00:08:24
    another combination of lake and bricks.
  • 00:08:26
    So agent bricks everything obviously the
  • 00:08:29
    world has gone agent based world is
  • 00:08:31
    agentic these days two years ago
  • 00:08:33
    everyone was mad for ChatGPT and the idea
  • 00:08:35
    of LLMs last year it was all about
  • 00:08:37
    vector databases and RAG this year has
  • 00:08:39
    heavily been defined by everything is an
  • 00:08:42
    agent I'm automating all of my workflows
  • 00:08:44
    build an agent to go and automate it
  • 00:08:46
    you're migrating some data you build
  • 00:08:47
    an agent everything has gone agentic but
  • 00:08:49
    there's been this barrier to entry like
  • 00:08:51
    it's you can't go and just pick up and
  • 00:08:53
    use an agent you have to go and
  • 00:08:54
    understand well autogen and all the
  • 00:08:56
    other libraries that actually build up.
  • 00:08:58
    It's quite a technical high barrier to
  • 00:09:00
    entry. It's not crazy, but yeah, you got
  • 00:09:03
    to learn some stuff. Now, Agent Bricks
  • 00:09:06
    is going to tackle that. Agent Bricks is
  • 00:09:07
    trying to say, how do we do kind of low
  • 00:09:10
    code for building out an agent? Now,
  • 00:09:13
    it's this idea. Oh, that looks how do we
  • 00:09:16
    build an agentic system that answers a
  • 00:09:18
    certain question. Essentially, writing a
  • 00:09:20
    prompt that says this is what this is
  • 00:09:23
    what good looks like. Now you got that
  • 00:09:24
    LLM judge and that was a whole thing
  • 00:09:26
    that came out around the idea about how
  • 00:09:29
    how do you evaluate a large language
  • 00:09:31
    model. If we're looking at machine
  • 00:09:32
    learning and we say I'm going to predict
  • 00:09:34
    this figure I can then actually find out
  • 00:09:36
    what the figure actually was and I can
  • 00:09:37
    compare the two and I can say
  • 00:09:39
    scientifically mathematically this is
  • 00:09:41
    how accurate that model was. We can put
  • 00:09:44
    an actual percentage figure on the
  • 00:09:45
    accuracy of traditional machine
  • 00:09:46
    learning. When a large language model is
  • 00:09:49
    bringing back some text I can go well
  • 00:09:51
    that that looks right or that doesn't
  • 00:09:53
    look right. Well, that's factually
  • 00:09:54
    incorrect. I can't put a number on it. I
  • 00:09:56
    can't quantify as much. So, how do you
  • 00:09:58
    do that at scale? How do you make that
  • 00:10:00
    repeatable? Well, the answer is get a
  • 00:10:02
    large language model to judge the output
  • 00:10:04
    of another large language model. That
  • 00:10:05
    whole LLMs as judges. And that's
  • 00:10:08
    actually grown. There's technology
  • 00:10:09
    around it. There's ratings. There's
  • 00:10:10
    there's now frameworks to allow you to
  • 00:10:12
    do it built into MLflow. So this whole
  • 00:10:14
    idea of essentially we can type some
  • 00:10:16
    sentences to tell the judge what it's
  • 00:10:18
    looking for and then actually under the
  • 00:10:20
    hood allow agent bricks to then go and
  • 00:10:22
    build an agent that satisfies that and
  • 00:10:24
    uses it as acceptance criteria and then
  • 00:10:26
    it goes right I've got some options that
  • 00:10:28
    meet it. Which one do you want to go
  • 00:10:29
    with? Essentially it's an agentic
  • 00:10:31
    workflow to build agents. The robots are
  • 00:10:33
    now building robots. This is where we've
  • 00:10:36
    got to. That's essentially what agent
  • 00:10:37
    bricks is. Now to get this to work
  • 00:10:40
    they've had to kind of collapse it down.
  • 00:10:41
    They can't just do anything in the world
  • 00:10:43
    you can think of. Essentially, there's
  • 00:10:45
    some out-of-the-box pre-canned solutions.
  • 00:10:47
    So, I want to do information
  • 00:10:48
    extraction. So, here's a load of
  • 00:10:50
    documents inside a Databricks volume. I
  • 00:10:52
    want to build an agent that will go and
  • 00:10:54
    figure some of that stuff out. Do a
  • 00:10:55
    quick rag lookup and answer some
  • 00:10:56
    questions. Uh, I want to do like, yeah,
  • 00:10:58
    the Q&A style things. Multi-agent
  • 00:11:01
    supervisor super. I was not expecting
  • 00:11:02
    that so early. One of the things people
  • 00:11:04
    have got with Databricks Genie, the whole chat
  • 00:11:06
    over your data thing, is because you
  • 00:11:08
    build them scoped around domains of your
  • 00:11:10
    data, you can't really have a user come
  • 00:11:12
    just to ask a question across anything.
  • 00:11:14
    You need to build an agent that sits on
  • 00:11:16
    the top of a load of Genie APIs that
  • 00:11:18
    then decides which Genie room to ask a
  • 00:11:19
    question to or use a multi-agent
  • 00:11:22
    supervisor, put that on top of Genie,
  • 00:11:24
    give it the ability to ask a question to
  • 00:11:26
    a load of Genie or a separate agent.
  • 00:11:28
    That's just an out of the box thing as
  • 00:11:30
    part of agent bricks, which is crazy. So
  • 00:11:32
    there's loads of really cool stuff sat
  • 00:11:34
    underneath there in terms of how it
  • 00:11:36
    works. There are obviously lots of cool
  • 00:11:38
    demo videos that you've got, but this
  • 00:11:40
    this is the kind of workflow of
  • 00:11:41
    describing things essentially writing a
  • 00:11:43
    prompt so it understands how to actually
  • 00:11:45
    do it, picking that off, looking at the
  • 00:11:47
    response, evaluating it and says that's
  • 00:11:49
    not good enough, that's good enough,
  • 00:11:51
    letting it iterate over that stuff and
  • 00:11:53
    then actually just coming out and go
  • 00:11:54
    well there's a production agent we can
  • 00:11:56
    go and work with.
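
To make the judge idea concrete, here is a minimal LLM-as-judge sketch. This is not the Agent Bricks or MLflow API; call_llm is a hypothetical helper standing in for whatever model endpoint you use, and the scoring format is invented purely for illustration.

    # A toy LLM-as-judge loop. call_llm(prompt) -> str is a hypothetical helper.
    JUDGE_PROMPT = (
        "You are grading an answer against acceptance criteria.\n"
        "Criteria: {criteria}\nQuestion: {question}\nAnswer: {answer}\n"
        "Reply with a score from 1-5, then one sentence of justification."
    )

    def judge(question, answer, criteria, call_llm):
        # The judge model scores the candidate model's output.
        return call_llm(JUDGE_PROMPT.format(
            criteria=criteria, question=question, answer=answer))

    def keep_passing(examples, criteria, call_llm, threshold=4):
        # Keep only (question, answer) pairs the judge scores at or above threshold.
        kept = []
        for question, answer in examples:
            verdict = judge(question, answer, criteria, call_llm)
            score = int(verdict.strip()[0]) if verdict.strip()[:1].isdigit() else 0
            if score >= threshold:
                kept.append((question, answer, verdict))
        return kept
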
  • 00:11:59
    So I we're expecting huge amounts of
  • 00:12:02
    uptake in terms of agent bricks just
  • 00:12:04
    because everyone's trying to build
  • 00:12:06
    agents right now, but not everyone can
  • 00:12:07
    build an agent unless it's now one of
  • 00:12:09
    those agents in which case you can
  • 00:12:11
    actually build a fairly decent
  • 00:12:12
    production grade agent out of it. It's
  • 00:12:15
    better than things like AutoML. AutoML
  • 00:12:18
    is doing a similar thing with
  • 00:12:19
    traditional machine learning when we're
  • 00:12:20
    saying I've got this problem suggest a
  • 00:12:22
    load of models. But nearly always would
  • 00:12:24
    need to take that model and then
  • 00:12:25
    actually give it to a data scientist to
  • 00:12:26
    actually turn into a production grade
  • 00:12:28
    thing. Some of these agents are
  • 00:12:30
    production grade when you build them,
  • 00:12:32
    which is madness, but very cool. All
  • 00:12:35
    right, let's take a step back from all
  • 00:12:37
    the AI and hype. We've got a release of
  • 00:12:39
    Spark 4.0, which we've been hotly
  • 00:12:41
    anticipating. Loads of stuff going into
  • 00:12:42
    it, but it's an interesting one if
  • 00:12:45
    you've been working inside Databricks
  • 00:12:46
    for a long time cuz it's essentially
  • 00:12:49
    here's a load of Databricks features
  • 00:12:50
    that are now open source as part of
  • 00:12:52
    Spark. So, if we get the big wall of
  • 00:12:54
    what's gone into it, loads of things in
  • 00:12:56
    there that we've seen before. You got
  • 00:12:57
    the SQL pipe syntax. Uh you've got the
  • 00:12:59
    improvements to the SQL UDFs in there.
  • 00:13:01
    You got the variant data type. You've
  • 00:13:02
    got the uh DataFrame.plot, so you
  • 00:13:05
    can do some native plotting rather than
  • 00:13:06
    having to call a separate library for
  • 00:13:08
    it. Loads of things we've seen inside
  • 00:13:10
    Databricks runtimes over the past 6-9 months
  • 00:13:13
    have now been bundled up and actually put
  • 00:13:14
    inside uh the main Spark 4.0 release. So
  • 00:13:18
    huge release just open sourcing all of
  • 00:13:20
    this good stuff. Now there is the other
  • 00:13:22
    thing that's in there and this is going
  • 00:13:23
    to catch out so many people in that as
  • 00:13:26
    of Spark 4.0, ANSI SQL mode is standard. So
  • 00:13:30
    if you're using a runtime that has Spark
  • 00:13:32
    4.0 baked into it then it is going to
  • 00:13:34
    use ANSI SQL mode. Generally that's good.
  • 00:13:38
    SQL behaves more like SQL in other
  • 00:13:40
    database systems. That's generally what
  • 00:13:42
    people want. However, Spark has always
  • 00:13:45
    had this idea of I'm just going to run
  • 00:13:47
    and I'm just if something happens, I'm
  • 00:13:50
    just going to ignore it. I'm going to
  • 00:13:51
    get to the end. I'm not going to fail.
  • 00:13:53
    So, if you had a divide by zero error, it
  • 00:13:55
    could just return a null and carry on
  • 00:13:57
    going. If you had a data type collision,
  • 00:13:59
    it would just return a null and keep on
  • 00:14:01
    going. Now, in ANSI standard mode, that is
  • 00:14:03
    going to throw an error. So if you've
  • 00:14:05
    got a lot of pipelines that were nulling
  • 00:14:08
    out data, but you weren't really aware
  • 00:14:09
    of it or you're doing that deliberately
  • 00:14:11
    and you're just letting that run
  • 00:14:12
    through, these are now going to error.
  • 00:14:14
    So be careful when you adopt Spark 4.0.
  • 00:14:16
    There is a behavioral breaking change
  • 00:14:18
    around how it actually uses SQL because
  • 00:14:21
    of ANSI standard mode. But otherwise,
  • 00:14:23
    loads and loads of good stuff in there.
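
The behaviour change is easy to reproduce. A minimal sketch using the spark.sql.ansi.enabled setting, which is the flag Spark 4.0 now defaults to true:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Legacy behaviour: invalid operations silently return NULL.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT 1 / 0 AS x, CAST('abc' AS INT) AS y").show()  # both columns come back NULL

    # Spark 4.0 default: ANSI mode raises errors for the same expressions instead.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    spark.sql("SELECT 1 / 0 AS x").show()  # now fails with a divide-by-zero error rather than returning NULL
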
  • 00:14:24
    I would say check it out. But that's not
  • 00:14:27
    the biggest thing that has happened in
  • 00:14:29
    the world of open source spark. The
  • 00:14:31
    biggest thing is this declarative
  • 00:14:34
    pipelines for spark previously known as
  • 00:14:37
    Delta Live Tables. So DLT, Delta Live
  • 00:14:41
    tables is an ETL framework that's been
  • 00:14:42
    baked into Databricks for ages and it
  • 00:14:45
    allows you to you can define some tables
  • 00:14:47
    as PySpark or as SQL and it will then
  • 00:14:50
    just wrap it in dependency management in
  • 00:14:54
    restartability in full telemetry and
  • 00:14:56
    logging with data quality expectations
  • 00:14:59
    all of the stuff that you would expect a
  • 00:15:01
    good data engineer to actually build
  • 00:15:03
    into any of their pipelines it just does
  • 00:15:05
    automatically. So DLT is a fairly robust
  • 00:15:08
    now fairly mature uh processing
  • 00:15:11
    framework and they've just open sourced
  • 00:15:14
    it. So in ter in open sourcing it they
  • 00:15:16
    have changed the name. So you will see
  • 00:15:18
    loads of people talking about
  • 00:15:20
    declarative pipelines for spark
  • 00:15:23
    DP DPFS declarative pipelines for spark
  • 00:15:27
    is the new name for delta live tables.
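
For anyone who has not seen the framework, here is a minimal sketch using the existing Databricks dlt Python module; the table names and source path are hypothetical, and the import path in the open-source Spark Declarative Pipelines release may differ.

    import dlt  # Databricks' DLT module; the open-source package name may differ
    from pyspark.sql import functions as F

    @dlt.table(comment="Raw orders ingested from a landing path (hypothetical)")
    def orders_raw():
        return spark.read.format("json").load("/Volumes/demo/landing/orders")

    @dlt.table(comment="Cleaned orders with a data quality expectation applied")
    @dlt.expect_or_drop("non_negative_amount", "amount >= 0")
    def orders_clean():
        # Dependencies between tables are inferred from these reads, and the
        # framework handles orchestration, retries, and logging around them.
        return dlt.read("orders_raw").withColumn("ingested_at", F.current_timestamp())
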
  • 00:15:29
    You will not see DLT referred to in any
  • 00:15:32
    of the docs in any of the the libraries
  • 00:15:33
    anymore. It is now declarative
  • 00:15:35
    pipelines. Now there is a slight
  • 00:15:37
    difference. You got declarative
  • 00:15:38
    pipelines open source version but you've
  • 00:15:41
    also got this how is it implemented
  • 00:15:43
    inside of Databricks? Well the answer
  • 00:15:45
    is to this new product group called
  • 00:15:47
    lakeflow. Now they announced lakeflow
  • 00:15:49
    last year and they said this is what's
  • 00:15:51
    coming. We're going to build all these
  • 00:15:52
    different steps but essentially DLT
  • 00:15:55
    implemented inside of Databricks is part of
  • 00:15:57
    lakeflow. You got other things like
  • 00:15:58
    lakeflow connect is part of lakeflow.
  • 00:16:00
    There's this whole essentially how you
  • 00:16:03
    go about doing ETL inside of Databricks
  • 00:16:05
    is now under this umbrella of tools
  • 00:16:08
    known as Lakeflow. So two things: declarative
  • 00:16:12
    pipelines now open source in Spark, all
  • 00:16:15
    the other components inside of data
  • 00:16:16
    bricks rebranded relaunched with a load
  • 00:16:18
    of new stuff as Databricks Lakeflow
  • 00:16:21
    which is now generally available so
  • 00:16:23
    there's three different aspects to it
  • 00:16:26
    most of which we've kind of seen before.
  • 00:16:27
    So lakeflow connect where part of it
  • 00:16:29
    went GA earlier this year that is the
  • 00:16:32
    low code ability to go and get data from
  • 00:16:34
    somewhere go get data from Salesforce
  • 00:16:36
    from workday a bunch of fairly common
  • 00:16:39
    data sources click some buttons and just
  • 00:16:41
    have it automatically start doing
  • 00:16:43
    incremental CDC of that data into my
  • 00:16:45
    lake great declarative pipeline sitting
  • 00:16:49
    in the middle well that's Delta live
  • 00:16:50
    tables but with a few updates and some
  • 00:16:52
    more stuff going into it and some other
  • 00:16:54
    cool stuff on top we'll look at in a
  • 00:16:56
    minute and then lakeflow jobs
  • 00:16:58
    essentially a rebranded revamped version
  • 00:17:00
    of uh Databricks Workflows sitting around
  • 00:17:04
    there. So this this suite of tools data
  • 00:17:06
    acquisition data transformation and then
  • 00:17:08
    data orchestration sits under that
  • 00:17:11
    umbrella of LakeFlow. So if you've not
  • 00:17:13
    seen LakeFlow connect it's this kind of
  • 00:17:15
    thing. I've got a little nice clean easy
  • 00:17:17
    gooey to go off get some data. This is
  • 00:17:20
    the source pull it in. Now, they've
  • 00:17:22
    added a load more sources that are
  • 00:17:24
    compatible in terms of the release and
  • 00:17:26
    the announcements last week, but there
  • 00:17:28
    just still needs to be more. So, we're
  • 00:17:29
    going to see this like running
  • 00:17:31
    incremental addition of we've added this
  • 00:17:33
    data source, this data source, this data
  • 00:17:34
    source. So, just gradually we'll just be
  • 00:17:36
    able to do 100% of all the usual sources
  • 00:17:39
    that you're doing. You'll be able to do
  • 00:17:40
    from within LakeFlow. Not yet. There's a
  • 00:17:42
    bunch of sources it can currently do out
  • 00:17:44
    of the box. Some of which are GA, some
  • 00:17:46
    of which are still in preview. So, yeah,
  • 00:17:48
    lots of stuff going on there. uh changes
  • 00:17:51
    to how DLT, now known as declarative
  • 00:17:54
    pipelines actually works. You've got
  • 00:17:55
    this a multifile editor view. So if
  • 00:17:58
    you've been using DLT for a long time, it
  • 00:18:00
    used to be pretty awful in that you'd
  • 00:18:02
    write a notebook and then have to go to
  • 00:18:03
    an entirely different screen to test it
  • 00:18:05
    and run it and then go back to your
  • 00:18:06
    notebook and the dev experience was a
  • 00:18:08
    bit mad. Now we've got this idea of one
  • 00:18:10
    an opinionated project structure going
  • 00:18:13
    actually this is how you should
  • 00:18:14
    structure your Python and SQL code
  • 00:18:16
    actually into your layers. fairly fairly
  • 00:18:18
    decent way of actually building out a
  • 00:18:20
    declarative job. You've got your editor in
  • 00:18:23
    there that has part of your result set.
  • 00:18:25
    It's got your DLT uh like execution graph
  • 00:18:28
    all baked in. So basically the IDE uh
  • 00:18:32
    declarative pipelines has now changed in
  • 00:18:34
    line with these announcements. So there
  • 00:18:36
    just more stuff you can do in there. So
  • 00:18:38
    yeah, lots of lots of nice stuff.
  • 00:18:39
    Essentially it's going to be nicer to
  • 00:18:41
    work in that environment. The big
  • 00:18:43
    announcement around this area is this
  • 00:18:45
    thing flow designer. And now this is
  • 00:18:47
    something that they demoed in the
  • 00:18:48
    keynote and everyone just came out kind
  • 00:18:50
    of gobsmacked openmouth going oh that's
  • 00:18:53
    going to change things. Now this is on
  • 00:18:56
    the face of things just a low code
  • 00:18:58
    editor for building out declarative
  • 00:19:00
    pipelines. So I can drag and drop and
  • 00:19:02
    say well get some data from that source
  • 00:19:04
    transform it write it over there. That's
  • 00:19:06
    fairly straightforward. Um but they've
  • 00:19:09
    gone a step further. So, the thing that
  • 00:19:11
    they demoed at the keynote last week is
  • 00:19:14
    essentially saying, well, what if we did
  • 00:19:15
    that, but we just slapped um essentially
  • 00:19:18
    a low code like a natural language
  • 00:19:20
    editor on top of it. So, what if we had
  • 00:19:22
    something like this where we had our
  • 00:19:24
    things and we just tell it what to do
  • 00:19:26
    and it will go and build it for us.
  • 00:19:28
    That's terrifying. That absolutely
  • 00:19:31
    terrifying actually how that changes the
  • 00:19:33
    workflow of what we do. We are in the
  • 00:19:36
    era of vibe engineering my friends. you
  • 00:19:39
    are going to be able to just actually
  • 00:19:40
    build out pipelines component by
  • 00:19:42
    component by just saying what you want
  • 00:19:44
    it to do and it will go and build it
  • 00:19:45
    out. Now I've had interesting
  • 00:19:48
    conversations about this going around
  • 00:19:49
    the are we going to use this forever
  • 00:19:51
    thing? Well, probably not. It would take
  • 00:19:52
    me longer to ask that question of each
  • 00:19:54
    and every single table I'm going to load
  • 00:19:56
    than it would for me to define a generic
  • 00:19:58
    workflow and say and then do that a
  • 00:19:59
    thousand times. But this really really
  • 00:20:03
    lowers the technical barrier. So like
  • 00:20:05
    anyone can go and build these things.
  • 00:20:07
    I'm just slightly terrified that in over
  • 00:20:09
    a year or two essentially we're going to
  • 00:20:11
    have this mad spaghetti mess of
  • 00:20:13
    pipelines that have been built slightly
  • 00:20:14
    differently depending on the syntax and
  • 00:20:16
    the context of how the question was
  • 00:20:17
    asked for each and every separate
  • 00:20:19
    pipeline. So how we use it going to be
  • 00:20:22
    interesting but actually the fact that
  • 00:20:24
    we can use it so much power behind it so
  • 00:20:26
    much use behind it pretty pretty crazy
  • 00:20:29
    and cool. So yeah, LakeFlow designer low
  • 00:20:32
    code and agentic pipeline building on
  • 00:20:36
    top of declarative pipelines. It still
  • 00:20:38
    builds it using declarative pipelines
  • 00:20:40
    for Spark.
  • 00:20:42
    All right, next up we got this thing
  • 00:20:44
    Lakebridge, previously known as Blade
  • 00:20:46
    Bridge, which is a company that uh Databricks
  • 00:20:48
    acquired, which is an AI-fueled migration
  • 00:20:51
    tool. So this is something saying if I'm
  • 00:20:53
    going from maybe I've got like an old
  • 00:20:55
    SQL server, I've got Oracle, I've got
  • 00:20:57
    Synapse and I'm trying to say actually
  • 00:20:59
    take it out of that and put it into data
  • 00:21:01
    bricks. It is a lift and shift migrator.
  • 00:21:03
    Now I need to be real real specific
  • 00:21:06
    there. It's a lift and shift migrator.
  • 00:21:08
    So if I had a thousand stored procs
  • 00:21:10
    inside Snowflake and I wanted to get
  • 00:21:12
    that into Databricks, it would pick that
  • 00:21:14
    up. It would evaluate it. it would
  • 00:21:16
    translate all of that code into Spark
  • 00:21:18
    SQL compatible code and then create it
  • 00:21:21
    in Databricks as a thousand separate um
  • 00:21:24
    Spark SQL notebooks. So it's not going
  • 00:21:27
    to refactor. It's not going to change it
  • 00:21:28
    into a metadata driven framework. It's
  • 00:21:30
    not necessarily going to modernize any
  • 00:21:32
    of my code. It's just going to be able
  • 00:21:34
    to take my code, make it compatible, and
  • 00:21:37
    get it onto the new platform. It's about
  • 00:21:39
    getting onto the new platform by any
  • 00:21:41
    means possible. So that might not be the
  • 00:21:44
    end. It might be you do a two-phase
  • 00:21:45
    migration. First, you lift and shift to
  • 00:21:47
    get it onto data bricks so you can turn
  • 00:21:49
    off the other system. Then you do
  • 00:21:51
    refactor into the proper end state the
  • 00:21:54
    the way you want it to work eventually.
  • 00:21:55
    But yeah, massive use case for Lake
  • 00:21:57
    Bridge there in terms of just getting
  • 00:21:59
    stuff onto the same platform so you can
  • 00:22:01
    get it inside Unity Catalog and then you
  • 00:22:03
    have a single control plane across
  • 00:22:04
    everything. So few steps to go through.
  • 00:22:07
    It'll do a scan. It'll make a report.
  • 00:22:08
    It'll tell you here's all the migration
  • 00:22:10
    stats. This is what's going to work.
  • 00:22:11
    This is what's not going to work. It
  • 00:22:13
    will do the conversion for you and spit
  • 00:22:14
    it out from a whole bunch of different
  • 00:22:16
    potential languages and sources into uh
  • 00:22:19
    Databricks SQL code and then it'll do a load
  • 00:22:21
    of testing. It'll do source target uh
  • 00:22:24
    comparisons. It will validate it. It
  • 00:22:26
    will go yes this is successfully
  • 00:22:27
    migrated. Really really cool tool that
  • 00:22:29
    is now available as this thing called
  • 00:22:31
    Lakebridge.
  • 00:22:33
    Moving on, we got a few
  • 00:22:36
    uh GA announcements. So lakehouse apps
  • 00:22:39
    has gone generally available. So
  • 00:22:41
    obviously part of the whole story in
  • 00:22:43
    terms of where Databricks has been going
  • 00:22:44
    over the past year is saying well yeah
  • 00:22:46
    you build AI but people need something
  • 00:22:48
    to be able to interact with AI you built
  • 00:22:50
    an agent great but where do they go to
  • 00:22:52
    type things you you need some kind of
  • 00:22:54
    user interface that's essentially what
  • 00:22:55
    lakehouse apps is so it is that thing
  • 00:22:58
    that allows us to use a load a bunch of
  • 00:23:00
    um common tooling and frameworks such as
  • 00:23:03
    Streamlit and Python Gradio, those kind of
  • 00:23:05
    things and now also JavaScript if you
  • 00:23:07
    want to build a React web app front end
  • 00:23:10
    you can do that in lakehouse apps and
  • 00:23:12
    Lakehouse Apps itself, sorry, Databricks
  • 00:23:14
    apps is now generally available so you
  • 00:23:16
    can go and do that big piece of the
  • 00:23:18
    puzzle kind of fix there and then
  • 00:23:20
    there's another one on the data clean
  • 00:23:21
    rooms side clean rooms if you're not
  • 00:23:24
    familiar with it is this essentially
  • 00:23:26
    this third party data collaboration idea
  • 00:23:29
    so I can have two data rich workspaces
  • 00:23:31
    that each delta share data into this no
  • 00:23:33
    man's land in the middle I can run some
  • 00:23:36
    scripts on it in a very controlled way
  • 00:23:38
    and get some outputs the important part
  • 00:23:40
    is neither collaborator can see the
  • 00:23:42
    other person's data. We can take two
  • 00:23:45
    massively sensitive bits of data from
  • 00:23:46
    both sides, put it together, get the
  • 00:23:49
    outputs, and view the aggregate level
  • 00:23:50
    data without the other person actually
  • 00:23:52
    seeing the low-level transactional data.
  • 00:23:54
    I don't have to give away my sensitive
  • 00:23:56
    information in order to collaborate with
  • 00:23:58
    people. That's clean rooms itself. So,
  • 00:24:00
    that went GA back in January I think this
  • 00:24:02
    year, but it only allowed for two
  • 00:24:04
    collaborators and it that wasn't the
  • 00:24:07
    that wasn't the dream. That wasn't the
  • 00:24:08
    story we were sold. So as part of the
  • 00:24:10
    announcements last week, you can have up
  • 00:24:12
    to nine different collaborators from
  • 00:24:14
    different clouds. So we can have Azure,
  • 00:24:17
    AWS, GCP all collaborating
  • 00:24:20
    in the same clean room with the same
  • 00:24:22
    level of security and controls around
  • 00:24:24
    it. So it's massively more powerful in
  • 00:24:26
    terms of the kind of things you can do.
  • 00:24:28
    And previously you could never run the
  • 00:24:31
    code if you wrote the code. I could
  • 00:24:33
    submit my notebook and the other
  • 00:24:35
    collaborator would have to execute it.
  • 00:24:36
    you were never allowed to self-run your
  • 00:24:38
    code. They've added the ability to do
  • 00:24:40
    that now if it is approved by the other
  • 00:24:42
    collaborator. So if the other person
  • 00:24:44
    trusts you, they go, "Yeah, yeah, you
  • 00:24:45
    can run your own code. You just go and
  • 00:24:46
    do it." Just so you got a faster dev
  • 00:24:48
    cycle, but only if that's the way you
  • 00:24:51
    want to work. Otherwise, it will work in
  • 00:24:52
    the normal way, which is a person cannot
  • 00:24:55
    write and run the code. So yeah, some
  • 00:24:58
    nice updates to clean room just to make
  • 00:24:59
    it a little bit more accessible, more
  • 00:25:00
    functional, more more powerful.
  • 00:25:03
    Now the next thing that's gone GA is one of
  • 00:25:05
    my favorite things: Databricks Genie or
  • 00:25:06
    AI/BI Genie, that is the talk with your
  • 00:25:09
    data. It is the chatbot that writes SQL
  • 00:25:12
    for you and interacts with your data
  • 00:25:14
    that has gone generally available. You
  • 00:25:16
    can go and use it in anger, use it in
  • 00:25:17
    production, put it out in the world.
  • 00:25:19
    Most people I know are already using it
  • 00:25:20
    in production. Would be surprised that
  • 00:25:22
    it's now just gone GA. But there's a
  • 00:25:25
    bunch of things in there. There's a load
  • 00:25:26
    of new features that kind of snuck out.
  • 00:25:28
    I didn't really see them in some of the
  • 00:25:29
    announcements, but the kind of uh I've
  • 00:25:31
    scraped this off some of their
  • 00:25:32
    documentation. They put a bunch of
  • 00:25:33
    things in there. So firstly they've
  • 00:25:35
    added some of this stuff which allows
  • 00:25:36
    you to do data scanning. So if I've got
  • 00:25:39
    if I was writing a filter statement and
  • 00:25:42
    I think the example they've got on the
  • 00:25:43
    website if I was going talking about the
  • 00:25:45
    country and I got genie to generate the
  • 00:25:47
    SQL script for me like well go tell me
  • 00:25:49
    all the results from Florida and that
  • 00:25:51
    would go it would write it out and it go
  • 00:25:52
    well where country equals Florida. But
  • 00:25:55
    actually what it doesn't know because
  • 00:25:57
    Genie normally never sees your data is
  • 00:25:59
    actually have stored all that data as
  • 00:26:02
    state codes. So yeah, country state
  • 00:26:04
    state equals Florida. Florida's not a
  • 00:26:06
    country. So I'd have it so that actually
  • 00:26:09
    state should be FL, not Florida. But it
  • 00:26:11
    didn't know that. But I can turn this
  • 00:26:13
    thing on now. I can say go do some data
  • 00:26:15
    sampling. It will build a value
  • 00:26:16
    dictionary. And now it knows exactly
  • 00:26:18
    what the categories it has to choose
  • 00:26:19
    from. And it will use that to make
  • 00:26:21
    better, more intelligent decisions in
  • 00:26:22
    the SQL it generates. So data sampling
  • 00:26:24
    one thing. But you are going to be
  • 00:26:26
    sending your data through to the large
  • 00:26:27
    language model. You need to accept that
  • 00:26:30
    if you're going to use data sampling.
  • 00:26:32
    That's number one new thing. Now, number
  • 00:26:35
    two, new thing is just an improvement to
  • 00:26:36
    the way we're actually getting feedback.
  • 00:26:38
    So, previously we had like a thumbs up,
  • 00:26:39
    thumbs down kind of thing and people
  • 00:26:40
    would give it a thumbs down, they'd
  • 00:26:41
    never give it a thumbs up. Um, but it's
  • 00:26:44
    a really hard thing to go, well, I can
  • 00:26:45
    say thumbs down cuz I'm not sure, but I
  • 00:26:47
    can't really articulate why. So, this
  • 00:26:50
    whole thing of saying send it for
  • 00:26:51
    review, give it like a medium, I'm not
  • 00:26:54
    really sure, give it an explanation, and
  • 00:26:56
    there's a whole review workflow. So,
  • 00:26:57
    it's just making the idea, the act of
  • 00:27:00
    looking after a genie space just a
  • 00:27:02
    little bit more reactive, a little bit
  • 00:27:03
    more collaborative, a little bit of back
  • 00:27:04
    and forth. Oh, actually, they flagged it
  • 00:27:07
    for a review because they didn't quite
  • 00:27:08
    understand what it was doing. It's
  • 00:27:09
    actually doing the right thing. It's an
  • 00:27:10
    education issue. Or they flagged it for
  • 00:27:11
    review. No, they're right to that's
  • 00:27:13
    wrong. Let's go and fix it. Just just a
  • 00:27:15
    better feedback mechanism baked into
  • 00:27:17
    Genie itself.
  • 00:27:18
    Uh, we've got this thing suggested
  • 00:27:21
    queries. So not from the users within
  • 00:27:24
    Genie itself but actually from how
  • 00:27:25
    people are using those tables in Unity
  • 00:27:27
    catalog. So looking at popular queries
  • 00:27:29
    looking at the most recent queries it'll
  • 00:27:31
    actually come back and go hey look guys
  • 00:27:33
    here are smart ways people are actually using
  • 00:27:35
    this data elsewhere why don't you add
  • 00:27:38
    these as the sample queries inside the
  • 00:27:40
    genie room just to give people a bit
  • 00:27:41
    more inspiration just to tweak it a
  • 00:27:43
    little bit give it a bit more of a nudge
  • 00:27:44
    in the right direction. So,
  • 00:27:46
    automatically generated query
  • 00:27:48
    suggestions inside of Genie that's going
  • 00:27:51
    to be coming soon. We're going to see
  • 00:27:52
    that appearing, which is cool. But it
  • 00:27:54
    gets crazier. So, we got more stuff that
  • 00:27:57
    we've got uh which is this idea for like
  • 00:28:00
    clarification. So, we've had
  • 00:28:01
    clarifications in the past. It comes
  • 00:28:02
    back and it asks a question. And if we
  • 00:28:04
    clarify, that's then just just used
  • 00:28:07
    within that chat. But actually, now
  • 00:28:09
    we've got this whole idea of going,
  • 00:28:11
    well, there's there's more stuff we can
  • 00:28:13
    do. I can say, yes, that was right. It's
  • 00:28:15
    going to look at the fact that was right
  • 00:28:16
    and go, "Wow, well, if that was right,
  • 00:28:18
    why don't I actually remember the fact
  • 00:28:21
    that there's this measure, this filter,
  • 00:28:23
    this this whole way of working. Why
  • 00:28:25
    don't I remember this next time? Do you
  • 00:28:27
    want to curate this idea of a set of
  • 00:28:29
    metrics, set of dimensions that we're
  • 00:28:32
    actually going to build up over time?"
  • 00:28:34
    again that's coming into Genie kind of
  • 00:28:36
    some of the announcements some of the
  • 00:28:37
    blogs they've put out there just a
  • 00:28:39
    better way to build up better semantic
  • 00:28:42
    definitions inside Genie based on user
  • 00:28:44
    interactions and again that can have a
  • 00:28:46
    whole approval workflow and people to
  • 00:28:47
    reject it and people to curate it over
  • 00:28:49
    time interesting stuff happening not as
  • 00:28:52
    crazy uh as this whole new deep research
  • 00:28:55
    mode, which is the yeah I'm going to ask a
  • 00:28:58
    complex how do you optimize our
  • 00:28:59
    marketing funnel that's a complex
  • 00:29:01
    question it's going to go well I'd break
  • 00:29:03
    it down I'd run all these different
  • 00:29:04
    queries. I get data from there. I look
  • 00:29:06
    at it. I make data from there. I then
  • 00:29:07
    analyze it and make data from there.
  • 00:29:09
    It's basically doing a research
  • 00:29:11
    function. It's then going to run all
  • 00:29:13
    those queries. It's going to compile the
  • 00:29:15
    results of it. It's going to interpret
  • 00:29:17
    the results of it. It's then going to
  • 00:29:18
    actually build it into basically a white
  • 00:29:21
    paper essentially a here's all the ways
  • 00:29:24
    that we can actually work. Now, this is
  • 00:29:26
    very different to how genies worked
  • 00:29:27
    previously. Genie previously has just
  • 00:29:29
    been a SQL generator where it never sees
  • 00:29:31
    the the data. It just runs the code and
  • 00:29:33
    you see the results. This is it running
  • 00:29:35
    the results, interpreting the results,
  • 00:29:37
    and actually writing recommendations and
  • 00:29:39
    building that out for you as something
  • 00:29:41
    you can then take away and use. So, deep
  • 00:29:43
    research mode is huge. Not out yet, but
  • 00:29:47
    announced as something that is coming
  • 00:29:49
    and go find it on the Genie websites.
  • 00:29:51
    Yeah, bunch of stuff going on in the
  • 00:29:54
    world of Genie. We've got to crack on.
  • 00:29:56
    There are so many other things we need
  • 00:29:57
    to talk about. Now one of the big
  • 00:29:59
    announcements which is an interesting
  • 00:30:01
    one is Databricks Free Edition. Now
  • 00:30:04
    that was met with some confusion. So
  • 00:30:06
    there has historically always been data
  • 00:30:08
    bricks community edition. Now that was a
  • 00:30:11
    very cut down version of Databricks
  • 00:30:13
    that was great if you were learning
  • 00:30:14
    PySpark or Scala. You could
  • 00:30:17
    essentially log on have like a single
  • 00:30:19
    box single core little spark cluster.
  • 00:30:21
    You could write some queries on it and
  • 00:30:23
    it was great for if you're just learning
  • 00:30:24
    how PySpark works. It wasn't great if
  • 00:30:26
    you're learning Databricks. So it
  • 00:30:28
    didn't have all the other features. You
  • 00:30:30
    couldn't go and use DLT, now declarative
  • 00:30:32
    pipelines. You couldn't go and actually
  • 00:30:33
    play around with any of the AI stuff. It
  • 00:30:36
    was purely a little Spark playground to
  • 00:30:38
    help you learn Spark. So last week data
  • 00:30:40
    bricks announced free edition which is a
  • 00:30:42
    completely free edition of fully
  • 00:30:45
    featured Databricks. Now it only has a
  • 00:30:47
    tiny sliver of compute still. They're
  • 00:30:48
    not giving away their entire product for
  • 00:30:50
    free but actually it is much more fully
  • 00:30:53
    featured than community edition. So if
  • 00:30:55
    you're learning, if you're a student, if
  • 00:30:57
    you're trying to do some sandboxing in
  • 00:30:59
    your spare time to just try and get
  • 00:31:00
    ahead and understand these things, you
  • 00:31:02
    can use Databricks Free Edition to go and do
  • 00:31:04
    that. Now, it's not meant for
  • 00:31:06
    businesses. It's not meant to be you're
  • 00:31:08
    a data influencer running a boot camp
  • 00:31:11
    and you want this to actually pay for
  • 00:31:12
    all your costs. That's not what it's
  • 00:31:14
    for. This is for students, for personal
  • 00:31:16
    things to use to be able to teach
  • 00:31:18
    yourself. Maybe you're going and like
  • 00:31:20
    you're learning and you're following
  • 00:31:21
    some training. Absolutely. as a student,
  • 00:31:22
    you can use this free edition to
  • 00:31:24
    actually do your learning. If you're
  • 00:31:25
    doing any of the Databricks certs, you
  • 00:31:27
    can go and use this to do your learning.
  • 00:31:29
    Loads of good stuff in there. So, that
  • 00:31:30
    is Databricks Free Edition. Subtly
  • 00:31:33
    different to community edition cuz it's
  • 00:31:35
    actually got all the features in there,
  • 00:31:37
    which is cool. That's available now.
  • 00:31:40
    Now, the other thing that was announced
  • 00:31:42
    is one of the biggest complaints that
  • 00:31:44
    there's always been about Databricks is
  • 00:31:46
    a yeah, but we can't show it to the
  • 00:31:48
    business users. We can't let our execs
  • 00:31:50
    log into it. So even with AIBI
  • 00:31:52
    dashboards, even with Genie, with all
  • 00:31:54
    the the new features which are trying to
  • 00:31:56
    take that data consumer role, it's
  • 00:31:58
    trying to engage directly with the
  • 00:31:59
    business, we've always had this
  • 00:32:01
    complaints that we hear from our clients
  • 00:32:03
    going, "Yeah, but I need to pick it up
  • 00:32:05
    and put it somewhere else because I
  • 00:32:08
    can't show that to my exec. It looks too
  • 00:32:09
    technical. It's too intimidating. Got
  • 00:32:11
    too much stuff in there. It's too busy
  • 00:32:13
    on the screen. It feels intimidating and
  • 00:32:16
it doesn't feel welcoming."
  • 00:32:19
    That's where this thing comes in. So
  • 00:32:22
Brand spanking new is this idea of Databricks
  • 00:32:24
One: essentially a rounder,
  • 00:32:27
nicer, softer, more business-facing
  • 00:32:29
portal into Databricks. Think about
  • 00:32:31
    this as your reporting portal where
  • 00:32:33
    you've built all your code. You've built
  • 00:32:35
    all your reports. Other users can log in
  • 00:32:37
through Databricks One and just see this nice
  • 00:32:40
    clean shiny version to access things
  • 00:32:42
    they've been given access to. So I can
  • 00:32:44
go in, I've got things by domain. I
  • 00:32:46
    can go view my dashboards. I can do my
  • 00:32:47
    cross filtering and things. It's still
  • 00:32:49
Lakeview dashboards. It's still a
  • 00:32:51
    dashboard that I built inside a
  • 00:32:52
fully fledged Databricks. It's just that this
  • 00:32:54
    is a way for my business users to come
  • 00:32:55
    in and interact with these things. I've
  • 00:32:58
    got the whole idea of being able to work
  • 00:33:00
    via my different domains. I can see
  • 00:33:02
    different ways of organizing my data. I
  • 00:33:04
    can go and do my data discovery and
  • 00:33:05
    actually find out different things
  • 00:33:06
inside Unity Catalog. I can use it
  • 00:33:08
as a discovery tool. Just far
  • 00:33:12
    better ways of interacting with our
  • 00:33:14
    users.
  • 00:33:15
    We can have Genie baked into it. So if
  • 00:33:17
    you're seeing all those new features
  • 00:33:18
    about Genie going, "Oh, that's cool. Ah,
  • 00:33:20
    but my users would never want to use
  • 00:33:21
    it." Well, they can use it through data
  • 00:33:24
One, and it's just a cleaner, much more
  • 00:33:26
    streamlined experience. Still using
  • 00:33:28
    Genie under the hood, still works the
  • 00:33:30
    same way any of their responses and
  • 00:33:32
    their kind of review quotes, which feed
  • 00:33:34
    back into the normal Genie room that you
  • 00:33:35
administer through fully fledged
  • 00:33:38
Databricks. It's just how business
  • 00:33:41
    users can interact with it. Huge amounts
  • 00:33:43
    of stuff going on there. So we're
  • 00:33:44
expecting Databricks One to be
  • 00:33:46
    absolutely massive in terms of uptake
  • 00:33:48
    from our clients just because it is a
  • 00:33:50
    much more engaging much more
  • 00:33:53
approachable way of using Databricks.
  • 00:33:56
Right, a bunch of other things. Moving on,
  • 00:33:58
we've got Unity Catalog itself, which has a
  • 00:34:00
load of new things going in there. Now, I
  • 00:34:02
kind of mentioned this when we were
  • 00:34:03
talking about Genie. Last year
  • 00:34:05
they mentioned this thing called, uh,
  • 00:34:07
Unity Catalog metrics, and we've got a
  • 00:34:10
load more information. We've actually seen
  • 00:34:11
it out in the wild now; we can go and
  • 00:34:12
have a play with it. You've got Unity
  • 00:34:14
Catalog metric views, which is how it's
  • 00:34:16
    eventually been released. Essentially,
  • 00:34:18
    you define a metric view with saying,
  • 00:34:20
    well, here are my measures. Here are my
  • 00:34:22
    dimensions. Here are my filter
  • 00:34:23
    statements. And that just means it's
  • 00:34:25
    much more like a cube. It means, well,
  • 00:34:26
I've got this measure, and I can
  • 00:34:28
    cut and slice my data, and it will
  • 00:34:30
    calculate the measure based on the
  • 00:34:32
    filter context of the various different
  • 00:34:33
    dimensions I'm using, much like any
  • 00:34:35
    other semantic model in other tools.
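To make that measures-and-dimensions idea concrete, here's a minimal sketch of defining and querying a metric view from a Databricks notebook. The catalog, table, and column names (sales.gold.orders, amount, region, order_date) are invented, and the CREATE VIEW ... WITH METRICS YAML shape plus the MEASURE() aggregate follow the preview documentation as I understand it, so treat the exact syntax as an assumption and check the docs before relying on it.

```python
# Hypothetical sketch of a Unity Catalog metric view; names are made up and the
# YAML schema / MEASURE() usage are assumptions based on the preview docs.
spark.sql("""
CREATE VIEW IF NOT EXISTS sales.gold.orders_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: sales.gold.orders
dimensions:
  - name: order_date
    expr: order_date
  - name: region
    expr: region
measures:
  - name: total_revenue
    expr: SUM(amount)
  - name: order_count
    expr: COUNT(1)
$$
""")

# The measure is evaluated in whatever filter/dimension context the query supplies,
# which is what makes it behave like a cube rather than a fixed aggregate.
display(spark.sql("""
SELECT region, MEASURE(total_revenue) AS total_revenue
FROM sales.gold.orders_metrics
WHERE order_date >= '2025-01-01'
GROUP BY region
"""))
```

The point of the pattern is that the SUM(amount) logic lives once, next to the data, and whatever consumes it downstream inherits that same definition.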
  • 00:34:38
    Now, those are coming and going to be
  • 00:34:40
baked into, uh, Unity Catalog. They'll be
  • 00:34:42
    consumed by Genie so you can start to
  • 00:34:44
see that, and you saw Genie picking some of
  • 00:34:46
it up itself, and then we can go and push that
  • 00:34:48
    out. So loads of things happening on
  • 00:34:49
    that space. I forgot the slides. Yeah,
  • 00:34:51
Iceberg's in there. We talked about that
  • 00:34:52
    already. This is what I want to talk
  • 00:34:55
about: the idea of, uh, Unity Catalog
  • 00:34:57
    metrics. Now there's already a load of
  • 00:35:00
    announcements about a load of different
  • 00:35:01
    BI tools that are going to be able to
  • 00:35:02
    consume it. So if you're in Sigma,
  • 00:35:05
Tableau, you're in ThoughtSpot, you'll be
  • 00:35:07
    able to bring in data from Unity Catalog
  • 00:35:10
    and it will have awareness of those
  • 00:35:11
    metrics. Now there's a big green F
  • 00:35:14
    missing from this diagram where we don't
  • 00:35:17
see anything to do with Power BI or
  • 00:35:18
Fabric. So we're really curious where
  • 00:35:20
    that is on the road map. I've not found
  • 00:35:21
    out where it is on the road map. I will
  • 00:35:23
    hopefully do soon. Um but that's a
  • 00:35:25
    missing piece currently. But most BI
  • 00:35:28
    tools in the stack can use these
  • 00:35:29
    metrics. So it's basically you can kind
  • 00:35:31
    of rather than have all your metrics and
  • 00:35:32
    logic defined downstream in your
  • 00:35:34
    different tools. And if you're using
  • 00:35:36
ThoughtSpot and Tableau and Sigma, you might
  • 00:35:39
    have those metrics defined three times
  • 00:35:41
    in each of those different ones. You can
  • 00:35:43
    pull that upstream into your lake into
  • 00:35:45
    your gold layer, store it actually next
  • 00:35:47
    to your data and then whatever's
  • 00:35:49
    consuming it downstream, you've just got
  • 00:35:50
    one definition of your KPIs, which is
  • 00:35:52
    what we want in life. So really cool.
  • 00:35:55
    Loads of stuff happening there. I've got
  • 00:35:57
    this idea of domains. So looking at
  • 00:35:59
Unity Catalog being able to actually say,
  • 00:36:00
well, these different objects are all part of
  • 00:36:02
    this domain helping us with the whole
  • 00:36:04
    kind of product ownership idea helping
  • 00:36:06
    us with the slightly uh distributed mesh
  • 00:36:08
idea that people are going for. Lots of
  • 00:36:10
stuff in there, which just helps.
  • 00:36:13
    We got this whole idea of a data quality
  • 00:36:15
    monitoring tool that will go and
  • 00:36:17
    actively monitor my data quality in my
  • 00:36:19
    different tables. We've seen that with
  • 00:36:21
Lakehouse Monitoring, but that was like
  • 00:36:23
    such a super deep data profiling scan.
  • 00:36:25
It wasn't really just the "I just want a
  • 00:36:27
    little bit of information about the
  • 00:36:29
quality of everything" kind of thing. This is
  • 00:36:31
    absolutely that kind of just keep track
  • 00:36:33
    of my data quality. Look at the
  • 00:36:35
    freshness. Look at the completeness.
  • 00:36:36
    Like nice good chunky data quality
  • 00:36:38
    metrics. It can just run on top of
  • 00:36:40
everything and give me a nice dashboard.
  • 00:36:42
That is coming. Loads of stuff we've
  • 00:36:43
    seen about that. And then yeah, a bunch
  • 00:36:45
    of other stuff inside Unity. We've got
  • 00:36:48
    the ability to certify a data set and
  • 00:36:50
    say, "Oh, that's that's certified. That
  • 00:36:52
    table is not certified. That table used
  • 00:36:54
to be the good one. It's now deprecated."
  • 00:36:56
So, we're just adding a bit more
  • 00:36:58
    information for our users so they can
  • 00:37:01
    use it as a data discovery tool. It's
  • 00:37:03
moving Unity Catalog from being a
  • 00:37:04
    technical catalog to actually a data
  • 00:37:07
discovery tool, a business-facing
  • 00:37:09
catalog through Databricks One, all
  • 00:37:11
that good stuff. Uh, we've got a request-for-
  • 00:37:13
access workflow. So, if I discover
  • 00:37:15
a table I don't have access to, well,
  • 00:37:18
I can request access, much like any other
  • 00:37:19
    governance tool. Just seeing these
  • 00:37:21
things come in. Uh, ABAC, attribute-based
  • 00:37:24
access control, we've heard talked
  • 00:37:26
about by Databricks in previous summits.
  • 00:37:28
That's the thing where if I tag a table
  • 00:37:29
as sensitive, I can have a control role
  • 00:37:32
    saying you have access to sensitive data
  • 00:37:34
    no one else has access to sensitive data
  • 00:37:36
    and then just by tagging that table that
  • 00:37:38
security will be applied to it. So ABAC:
  • 00:37:40
    we're actually seeing loads of different
  • 00:37:42
    use cases for all things like GDPR and
  • 00:37:44
sensitive data, sure, but even tagging it to
  • 00:37:46
a domain, actually, you can then have
  • 00:37:48
    controls around the domains that
  • 00:37:50
separates it, gives you a different
  • 00:37:51
dimension of control away from, oh, this
  • 00:37:54
catalog, this schema, these tables. You
  • 00:37:56
    might have tables across all your
  • 00:37:57
    different schemas that you want to give
  • 00:37:58
one role access to, and you don't want to
  • 00:38:00
have to do that manually on each table. You
  • 00:38:02
can do that via ABAC, which is currently
  • 00:38:04
in beta, so it's not actually GA.
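As a rough sketch of that tag-driven pattern: the table, tag, and group names below are invented; the ALTER TABLE ... SET TAGS and GRANT statements are standard Unity Catalog SQL, while the ABAC piece itself (binding an access rule to the tag) is still in beta, so it is only described in a comment rather than guessed at.

```python
# Illustration of the tagging pattern described above, run from a notebook.
# Table, tag, and group names are invented for the example.

# Tag the table (tags can also be applied at column level).
spark.sql("ALTER TABLE hr.silver.employees SET TAGS ('sensitivity' = 'pii')")

# The pre-ABAC way: one grant per object, repeated on every sensitive table.
spark.sql("GRANT SELECT ON TABLE hr.silver.employees TO `pii_readers`")

# With ABAC (currently beta), you would instead define a single policy along the
# lines of "members of pii_readers may read objects tagged sensitivity = 'pii'",
# and the control follows the tag wherever it is applied -- across catalogs and
# schemas -- without per-table grants. The policy syntax is not shown here
# because it may still change before GA.
```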
  • 00:38:07
Tag policies: you're not allowed to put a
  • 00:38:09
    table in unless you've actually followed
  • 00:38:10
all the policies; that's good. Data
  • 00:38:11
classification scanning: just, oh,
  • 00:38:14
    someone loaded that table but that's an
  • 00:38:15
email address, that's PII data, or
  • 00:38:17
that's a bank account, don't share that
  • 00:38:20
via Genie. Having that automatically tag
  • 00:38:23
    it means you can do that reactive
  • 00:38:24
    governance. It means you can just be sat
  • 00:38:26
    there with a control panel and having
  • 00:38:28
    things flag up and link those two
  • 00:38:30
    together. Oh, it's going to classify
  • 00:38:32
    something as sensitive because it's
  • 00:38:33
    found a bank number. It's going to apply
  • 00:38:35
attribute-based access control and lock
  • 00:38:37
    that table down before it gets exposed
  • 00:38:39
and exfiltrated. All of that is coming on
  • 00:38:42
    the catalog road map that we've had
  • 00:38:44
    talked through. Oh, we are nearly there
  • 00:38:46
my friends. Do not worry. The final section
  • 00:38:49
    is talking about the changes in Mosaic
  • 00:38:51
AI, and there are probably eight chunky
  • 00:38:53
    areas that we need to briefly talk
  • 00:38:55
    through. There's a load of stuff going
  • 00:38:57
on, one of which I've talked about already,
  • 00:38:58
which is the whole Agent Bricks thing.
  • 00:39:00
That is huge, but it was big enough that
  • 00:39:02
    I just pulled it out and did a separate
  • 00:39:03
little run-through of it. So Agent Bricks
  • 00:39:05
is coming, currently only available in
  • 00:39:07
certain Databricks regions. We're going
  • 00:39:08
    to see that slowly rolling out much like
  • 00:39:10
any other Databricks feature. Uh, AI
  • 00:39:12
    functions. They're the SQL functions
  • 00:39:14
    that you write inside of uh Spark SQL in
  • 00:39:17
Databricks. So, using all of your
  • 00:39:19
like, uh, AI extract, AI parse document,
  • 00:39:22
    all those kind of things. We're actually
  • 00:39:24
    seeing they've done a load of work to
  • 00:39:26
    make them much faster. So we're seeing
  • 00:39:28
them, what, three times faster, four times
  • 00:39:29
    cheaper, doing them in bulk mode rather
  • 00:39:32
than calling them incrementally. We're
  • 00:39:33
    just seeing a load of improvements
  • 00:39:35
around them. We'll see people adopt them more and
  • 00:39:37
do more stuff with them.
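For anyone who hasn't used them, the AI functions are just SQL. Here's a small, hedged example using ai_extract, which does exist in Databricks SQL, against an invented docs.bronze.support_emails table; which AI functions (and which of the new batch-oriented speedups) you get depends on your workspace and region.

```python
# Sketch: calling a built-in AI function from Spark SQL in a Databricks notebook.
# The source table and its `body` column are invented for illustration.
extracted = spark.sql("""
SELECT
  body,
  ai_extract(body, array('customer_name', 'order_id')) AS entities
FROM docs.bronze.support_emails
LIMIT 100
""")
display(extracted)
```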
  • 00:39:40
Uh, vector search: just before summit kicked off, we saw an
  • 00:39:42
announcement of storage-optimized vector
  • 00:39:44
search, which carves a load of it out and
  • 00:39:45
puts it down, uh, into cheaper storage,
  • 00:39:48
    which just makes the size and scale of
  • 00:39:50
it so much bigger, but also means it's much,
  • 00:39:53
much cheaper. So, what, a 7x cost
  • 00:39:55
    reduction and a massive increase in the
  • 00:39:57
    number of vectors you can have inside
  • 00:39:58
    your vector database. So if you're
  • 00:40:00
    building a rag architecture and you
  • 00:40:02
    struggle with how many embeddings you're
  • 00:40:04
    trying to get into that vector database,
  • 00:40:05
    well, you've now suddenly got a load
  • 00:40:08
more space you can use if you're using
  • 00:40:09
that storage-optimized mode. So loads of
  • 00:40:11
stuff around there.
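If you want to try the storage-optimized flavour, the flow looks roughly like the usual databricks-vectorsearch setup. The endpoint, index, and table names below are invented, and the "STORAGE_OPTIMIZED" endpoint type is my assumption from the announcement, so check the current client documentation for the exact value.

```python
# Sketch of a Delta Sync vector index on a (hypothetical) storage-optimized endpoint.
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Endpoint type value is assumed from the announcement; "STANDARD" is the existing one.
client.create_endpoint(name="rag-endpoint", endpoint_type="STORAGE_OPTIMIZED")

# Embeddings are computed from a text column and kept in sync with the source table.
client.create_delta_sync_index(
    endpoint_name="rag-endpoint",
    index_name="docs.gold.kb_chunks_index",
    source_table_name="docs.gold.kb_chunks",
    pipeline_type="TRIGGERED",
    primary_key="chunk_id",
    embedding_source_column="chunk_text",
    embedding_model_endpoint_name="databricks-gte-large-en",
)
```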
  • 00:40:14
Oh, my little picture-in-picture is hiding the title of this one:
  • 00:40:16
    Serverless GPUs
  • 00:40:18
have been a big deal if you're dealing
  • 00:40:20
    with uh Spark clusters and you're trying
  • 00:40:22
to speed them up with a GPU because you're
  • 00:40:24
    trying to do something that would
  • 00:40:25
    actually benefit from having it in
  • 00:40:26
there. Lots of your neural network kind
  • 00:40:28
of AI workloads need a GPU to go fast. It's been
  • 00:40:32
really hard to get hold of them,
  • 00:40:33
whereas now actually we're seeing Databricks
  • 00:40:36
are rolling out serverless GPUs. So if
  • 00:40:37
you're using serverless and you're like, "Oh, I
  • 00:40:39
wish that was a GPU," you're now going to
  • 00:40:41
be able to start using it. Again, only in
  • 00:40:42
    certain regions so far but we'll see
  • 00:40:44
    that rolled out as they find more GPUs
  • 00:40:47
    hidden down the back of the sofa
  • 00:40:48
somewhere. Uh, model serving has gotten
  • 00:40:51
much faster, so we're looking at, what, 25,000
  • 00:40:54
QPS, queries per second. Just the size and
  • 00:40:58
    scale of how many things we can serve at
  • 00:41:01
    the same time has kind of exploded.
  • 00:41:02
    Loads and loads of stuff going on there.
  • 00:41:04
Uh, AI Gateway is a massive thing for
  • 00:41:07
    productionizing. So many people just
  • 00:41:09
build a POC and they don't get it into
  • 00:41:11
production. AI Gateway is allowing us to
  • 00:41:13
do things like, um, throughput, uh,
  • 00:41:16
bottlenecking. It's allowing us to do
  • 00:41:17
    failover in case we get a bounce back
  • 00:41:19
from an endpoint. Just all of the good
  • 00:41:22
    security and management essentially
  • 00:41:24
around managing an API, baked into
  • 00:41:27
Databricks, allowing us to host several
  • 00:41:29
    different models and put certain bits of
  • 00:41:31
    traffic through to different models and
  • 00:41:34
manage that over time, all in AI Gateway.
  • 00:41:36
Loads of stuff going on inside there. Really,
  • 00:41:38
    really important if you're building
  • 00:41:40
    agents if you're building large language
  • 00:41:41
    model integrations if you're building
  • 00:41:43
traditional ML. Lots of stuff inside there.
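The traffic-management piece is easiest to picture as config on a serving endpoint. Below is a hedged sketch using the databricks-sdk: the endpoint and model names are invented, the traffic split uses the existing TrafficConfig/Route objects, and the AI Gateway extras (rate limits, fallbacks, guardrails, usage logging) layer on top of the same endpoint via gateway settings whose exact SDK fields I haven't confirmed, so they're only mentioned in a comment.

```python
# Sketch: one serving endpoint fronting two models with a 90/10 traffic split.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    EndpointCoreConfigInput,
    Route,
    ServedEntityInput,
    TrafficConfig,
)

w = WorkspaceClient()

w.serving_endpoints.create(
    name="support-agent-llm",  # invented endpoint name
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="system.ai.some_primary_llm",  # invented model reference
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
                name="primary",
            ),
            ServedEntityInput(
                entity_name="system.ai.some_challenger_llm",  # invented model reference
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
                name="challenger",
            ),
        ],
        # Route most traffic to the primary model and a slice to the challenger;
        # AI Gateway rate limits, fallbacks, and guardrails attach to this same
        # endpoint through its gateway settings (field names omitted on purpose).
        traffic_config=TrafficConfig(
            routes=[
                Route(served_model_name="primary", traffic_percentage=90),
                Route(served_model_name="challenger", traffic_percentage=10),
            ]
        ),
    ),
)
```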
  • 00:41:45
MLflow 3.0. Oh, so there's a new
  • 00:41:48
release of MLflow. I feel sad that MLflow is
  • 00:41:50
    so far down the list of the things that
  • 00:41:52
    we're doing. Loads of stuff uh going in
  • 00:41:54
    there like lots of nice new features. Uh
  • 00:41:56
    one of the big ones that we called out
  • 00:41:58
    is prompt versioning. So if I was like
  • 00:42:00
writing, crafting the perfect prompt to
  • 00:42:03
    actually put into my agentic workflow,
  • 00:42:05
    uh and then I went back and I tweaked it
  • 00:42:07
    and I changed it and I changed it, I
  • 00:42:08
    didn't really have a way of tracking
  • 00:42:10
    over time what happened as I changed
  • 00:42:12
    that prompt. So the automatic storing of
  • 00:42:14
    different versions of a prompt as you go
  • 00:42:16
    through that workflow is now just part
  • 00:42:18
    of MLflow. Much like when they first
  • 00:42:20
    brought in automatic uh versioning of
  • 00:42:23
    notebooks when I was in an experiment,
  • 00:42:24
they're now actually just getting more
  • 00:42:26
and more of the whole LLMOps story into
  • 00:42:30
    MLflow. Lots of stuff happening there.
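A minimal sketch of that prompt-versioning workflow, assuming MLflow's prompt registry: the prompt name and templates are invented, and the exact namespace (mlflow.genai.register_prompt in MLflow 3 versus the earlier top-level helpers) may differ between releases, so treat this as the shape of the idea rather than the definitive API.

```python
# Sketch: registering two versions of a prompt and loading a pinned one back.
import mlflow

# Each register call with a changed template creates a new immutable version,
# so you can later compare how the agent behaved under v1 versus v2.
mlflow.genai.register_prompt(
    name="support-triage",
    template="Classify this ticket into one of {{categories}}: {{ticket_text}}",
    commit_message="first draft",
)
mlflow.genai.register_prompt(
    name="support-triage",
    template=(
        "You are a support triage bot. Pick exactly one of {{categories}} "
        "for this ticket: {{ticket_text}}"
    ),
    commit_message="tightened the instructions",
)

# Pin a specific version when running the agent, so evaluation runs record
# exactly which prompt text produced which results.
prompt = mlflow.genai.load_prompt("prompts:/support-triage/1")
print(prompt.format(categories="billing, bug, feature", ticket_text="..."))
```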
  • 00:42:32
And finally, MCP support is coming in. So
  • 00:42:34
    MCP support that kind of like common
  • 00:42:37
    language to get agents to talk to each
  • 00:42:39
    other. So I can just really really
  • 00:42:40
    quickly add more and more different
  • 00:42:42
    integrations to other APIs to other
  • 00:42:44
    tools to other things using MCP as that
  • 00:42:46
    common integration protocol is being
  • 00:42:48
    rolled out. So that's now available
  • 00:42:50
within Databricks.
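To show why a common protocol matters, here's a toy MCP server using the open-source mcp Python SDK. The tool it exposes is invented and has nothing Databricks-specific in it, but the idea is that any MCP-aware agent can call it without a bespoke integration being written for it.

```python
# Toy MCP server: exposes one tool over the standard protocol.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("warehouse-tools")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Look up the status of an order (stubbed out for the example)."""
    # A real server would query an API or a warehouse/Lakebase table here.
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    # Serve over stdio so an MCP-capable agent or client can attach.
    mcp.run()
```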
  • 00:42:53
Bunch of stuff there. Now, all those
  • 00:42:55
different ones: again, AI Gateway and MLflow
  • 00:42:58
3.0, they are GA, they are out in
  • 00:43:00
    anger. Everything else is preview or
  • 00:43:02
    incremental changes. They're still
  • 00:43:03
    rolling out. Some of them are in beta.
  • 00:43:05
    Do be aware not everything is absolutely
  • 00:43:07
    out in production currently. But yeah,
  • 00:43:10
    just huge amounts of stuff going on
  • 00:43:12
    currently in the world.
  • 00:43:16
    Okay,
  • 00:43:18
    we made it. That is my big old list of
  • 00:43:20
    everything that was announced across six
  • 00:43:22
    hours of keynotes, two different
  • 00:43:24
keynotes at the Data and AI Summit last week. As I'm
  • 00:43:28
    sure you're with me going, "Wow, that's
  • 00:43:30
a lot of stuff." Lakebase alone,
  • 00:43:33
    bringing OLTP using separation of
  • 00:43:35
storage and compute, is a massively huge
  • 00:43:38
step in what Databricks are trying to do.
  • 00:43:41
Lake Bridge: amazing for trying to do
  • 00:43:42
    super quick migrations although we need
  • 00:43:44
    to then refactor it and get rid of the
  • 00:43:46
technical debt. Agent Bricks, just
  • 00:43:49
    completely blowing open the idea of
  • 00:43:51
    agents and meaning anyone could build an
  • 00:43:53
agent really easily. DLT suddenly being
  • 00:43:56
open sourced and renamed into Declarative
  • 00:43:58
Pipelines.
  • 00:44:00
    Genie getting loads of extra features
  • 00:44:01
    announced that we're slowly going to see
  • 00:44:03
over the next few months, getting
  • 00:44:05
    better and better and better and better.
  • 00:44:07
    Loads and loads and loads of things
  • 00:44:08
happening. Databricks One I'm expecting
  • 00:44:10
    to be fairly huge even though it's just
  • 00:44:12
    a nicer UI. It's just a nicer UI for
  • 00:44:15
    business people but that is going to
  • 00:44:18
drive mass business adoption. And rather
  • 00:44:21
than it being you do a load of stuff in
  • 00:44:22
Databricks and then someone else accesses it
  • 00:44:24
from somewhere else, it's just bringing
  • 00:44:26
    people onto that same platform so we're
  • 00:44:28
    all working in the same place.
  • 00:44:30
Huge amounts of stuff going on. Now,
  • 00:44:32
    obviously, we have skimmed the surface.
  • 00:44:34
Some of that, we've just got some
  • 00:44:35
    marketing slides and that's all we know
  • 00:44:37
    about it. Some of it uh we've been
  • 00:44:39
    tinkering and playing with and we were
  • 00:44:40
    now actually allowed to talk about it.
  • 00:44:42
    So, we've got follow-up videos planned
  • 00:44:43
    with a whole bunch of these features.
  • 00:44:45
    So, bear with me. I'll be back on making
  • 00:44:48
    videos over the next few weeks trying to
  • 00:44:50
    catch up with all of this stuff. And
  • 00:44:52
    yeah, if there's anything that you
  • 00:44:53
really want to go, "Tell us more about
  • 00:44:55
that," if I can, we will. Let us know
  • 00:44:57
    down in the comments what you think,
  • 00:44:58
    which is your favorite feature that's
  • 00:45:00
    been announced. And as always, don't
  • 00:45:02
    forget to like and subscribe. Cheers.
Tags
  • Lakebase
  • Agent Bricks
  • Spark 4.0
  • Databricks Free Edition
  • Unity Catalog
  • Genie
  • Lake Bridge
  • Lakeflow
  • AI Gateway
  • MLflow 3.0