Data governance in the AI era

00:45:30
https://www.youtube.com/watch?v=3A855rN_9pE

Summary

TL;DR: The session on data governance emphasizes the vital role of quality data in successfully leveraging AI. Speakers Cynthia Gums from Ford and Steve Jared from Orange share insights on the challenges companies face today, such as dark data and governance complexity. Solutions like Dataplex from Google Cloud are highlighted, showcasing capabilities such as automated cataloging and intelligent data management that help organizations better manage their data. The session also covers both organizations' evolving approaches to data discovery and governance through user-centric interfaces and AI enhancements.

Key Takeaways

  • 🌐 Data governance is critical in the age of AI.
  • 🔍 66% of organizations report having dark data.
  • 🔑 Dataplex automates data governance at scale.
  • 📊 Quality data is vital for effective AI outputs.
  • 🚀 Ford and Orange share their data governance journeys.
  • 🛠️ Automated cataloging enhances data discovery.
  • 🌍 Data democracy improves data accessibility.
  • ⚙️ Governance rules simplify compliance management.
  • 🤖 AI aids in metadata enrichment and quality checks.
  • 📈 Continuous user feedback improves data platforms.
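
The completeness and timeliness checks mentioned in the takeaways (and in Ford's data quality dashboards later in the session) can be illustrated with a minimal sketch. This is not Dataplex's actual API; the row layout, column names, and thresholds below are hypothetical, purely to show the shape of such rule evaluations:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record batch: each row is a dict; None marks a missing value.
rows = [
    {"vin": "ABC123", "updated_at": datetime.now(timezone.utc)},
    {"vin": None,     "updated_at": datetime.now(timezone.utc) - timedelta(days=3)},
]

def completeness(rows, column):
    """Fraction of rows whose `column` is populated."""
    filled = sum(1 for r in rows if r.get(column) is not None)
    return filled / len(rows)

def timeliness(rows, column, max_age):
    """Fraction of rows whose `column` timestamp is within `max_age`."""
    cutoff = datetime.now(timezone.utc) - max_age
    fresh = sum(1 for r in rows if r[column] >= cutoff)
    return fresh / len(rows)

print(completeness(rows, "vin"))                          # 0.5
print(timeliness(rows, "updated_at", timedelta(days=1)))  # 0.5
```

A real system would evaluate rules like these on a schedule and surface the scores on a dashboard, as Ford describes.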

Timeline

  • 00:00:00 - 00:05:00

    The session on data governance in the age of AI features speakers from Ford and Orange, discussing the importance of data governance as AI technology evolves. The agenda includes an introduction, case studies, and updates on data governance tools.

  • 00:05:00 - 00:10:00

    Data is essential for AI, but many organizations struggle with 'dark data' and data quality issues. A significant percentage of organizations report that much of their data is not utilized, leading to challenges in data governance due to the complexity of data landscapes.

  • 00:10:00 - 00:15:00

    Google Cloud's Dataplex is introduced as a solution for automating data governance and management. It integrates with various services to provide unified metadata, centralized security, and intelligent data management features, helping organizations build trust in their data.

  • 00:15:00 - 00:20:00

    Cynthia from Ford discusses her role in data discovery and classification, emphasizing the importance of data governance in creating a single source of truth. Ford's data platform, powered by Google Cloud, aims to organize data sources and improve data accessibility.

  • 00:20:00 - 00:25:00

    Ford's data governance strategy includes using Dataplex for capturing metadata and implementing data lineage to understand data origins and life cycles. They are also working on automating metadata enrichment to enhance data discovery.

  • 00:25:00 - 00:30:00

    Cynthia highlights the challenges of user experience in data discovery, noting the need for tailored interfaces for different user personas. Ford has developed a custom data discovery hub to improve user engagement and data accessibility.

  • 00:30:00 - 00:35:00

    Steve from Orange shares their journey in data governance, emphasizing the need to break down data silos and improve data accessibility across their operations in 26 countries. They aim to create a data democracy to enhance data utilization.

  • 00:35:00 - 00:40:00

    Orange's approach includes using policy as code for data governance, enabling better management of data quality and access control. They have established a centralized team to define architecture and support data product development across regions.

  • 00:40:00 - 00:45:30

    The session concludes with updates on new features in Dataplex, including automated cataloging, lineage tracking, and governance rules, aimed at enhancing data discovery and governance across organizations.
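
Orange's "policy as code" approach from the 00:35:00 segment can be sketched minimally: access policies are declared as data, evaluated programmatically, and therefore versioned, reviewed, and tested like any other code artifact. The policy format and field names below are hypothetical, not Orange's actual implementation:

```python
# Minimal policy-as-code sketch: policies are plain data, checked in code.
POLICIES = [
    {"dataset": "customer_billing",  "allowed_roles": {"finance", "data_steward"}},
    {"dataset": "network_telemetry", "allowed_roles": {"network_ops"}},
]

def is_allowed(dataset, role):
    """Grant access only if some policy lists the role for the dataset."""
    return any(
        p["dataset"] == dataset and role in p["allowed_roles"]
        for p in POLICIES
    )

print(is_allowed("customer_billing", "finance"))   # True
print(is_allowed("network_telemetry", "finance"))  # False
```

Because the policies live in a repository rather than in ad-hoc console settings, a central team can define the architecture once and regional teams can propose changes through normal code review, which matches the centralized-team model Orange describes.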

Video Q&A

  • What is data governance?

    Data governance refers to the management of data availability, usability, integrity, and security in an organization.

  • What are some challenges in data governance?

    Challenges include dark data (data that is not discovered or used), data quality issues, and the complexity of enterprise data landscapes.

  • What is Dataplex?

    Dataplex is a Google Cloud service for automating data governance and management at scale.

  • How does Ford use Dataplex?

    Ford utilizes Dataplex for capturing metadata, enhancing data discovery, and maintaining data quality across its platform.

  • What innovations are being implemented in data discovery?

    Innovations include automated cataloging, natural language querying, and enhanced metadata management.

  • What is the importance of metadata?

    Metadata provides context and enables effective data search, discovery, and governance.

  • How does Orange approach data governance?

    Orange adopts a data democracy approach, utilizing policy as code to manage access and maintain data quality across its operations.

  • What role does AI play in data governance?

    AI assists in automating data classification, anomaly detection, and enhancing the overall data discovery process.

  • What are governance rules in Dataplex?

    Governance rules allow organizations to define and enforce data governance policies at scale based on existing metadata.

  • What are the next steps for data discovery at Ford?

    Ford is piloting a data marketplace experience to enhance user access and streamline data requests.
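
As an illustration of the "governance rules based on existing metadata" idea from the Q&A above, here is a minimal sketch, not Dataplex's actual API: a rule scans cataloged column metadata and auto-tags likely-sensitive columns for restricted access. The catalog layout, tag names, and patterns are hypothetical:

```python
import re

# Hypothetical catalog entries: column name plus its cataloged metadata tags.
catalog = [
    {"column": "customer_email", "tags": []},
    {"column": "part_number",    "tags": ["public"]},
    {"column": "driver_ssn",     "tags": []},
]

# A "governance rule": any column whose name matches a sensitive pattern
# and lacks an explicit "public" tag gets auto-tagged as restricted.
SENSITIVE = re.compile(r"(email|ssn|phone|address)", re.IGNORECASE)

def apply_rule(entries):
    for e in entries:
        if SENSITIVE.search(e["column"]) and "public" not in e["tags"]:
            e["tags"].append("restricted")
    return entries

flagged = [e["column"] for e in apply_rule(catalog) if "restricted" in e["tags"]]
print(flagged)  # ['customer_email', 'driver_ssn']
```

The point of evaluating rules against metadata rather than hand-labeling each table is scale: one rule covers every current and future column the catalog knows about.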

Transcript

  • 00:00:00
    [Music]
  • 00:00:11
    good morning everyone thank you so much
  • 00:00:13
    for coming to this breakout session on
  • 00:00:16
    data governance in the age of
  • 00:00:20
    AI we're very excited to have with us
  • 00:00:23
    two customer speakers today we have
  • 00:00:26
    Cynthia gums who is the manager of
  • 00:00:29
    global data insights and analytics at
  • 00:00:31
    Ford Driving key initiatives around the
  • 00:00:34
    new data Factory at Ford we also have
  • 00:00:37
    Steve Jared who is the chief AI officer
  • 00:00:41
    at Orange leading Ai and data strategy
  • 00:00:45
    for orange across 26 countries and my
  • 00:00:49
    name is louan I'm a product manager for
  • 00:00:51
    data Plex here at Google Cloud so we're
  • 00:00:55
    very excited to be sharing with you how
  • 00:00:58
    we think about data governance in this
  • 00:01:00
    age of AI and what do our Journeys each
  • 00:01:04
    look
  • 00:01:06
    like so here's our agenda for today
  • 00:01:09
    we're going to start with an
  • 00:01:10
    introduction and a product overview
  • 00:01:13
    followed by case studies from Ford and
  • 00:01:16
    orange and then we'll talk about what's
  • 00:01:19
    new what's upcoming in
  • 00:01:24
    datax so as all of us have experienced
  • 00:01:27
    recently generative AI is really this
  • 00:01:30
    paradigm shift that is
  • 00:01:33
    revolutionizing how we operate as
  • 00:01:36
    businesses whether it is generating
  • 00:01:38
    creative content whether it is working
  • 00:01:40
    with complex data whether it's improving
  • 00:01:43
    your customer experiences or even
  • 00:01:46
    training your own large language models
  • 00:01:48
    for Enterprise use cases the impact of
  • 00:01:51
    AI is really profound and all
  • 00:01:57
    encompassing at the same time we know
  • 00:02:00
    that data is the fuel that feeds into
  • 00:02:03
    the engine of AI it is really the
  • 00:02:06
    critical foundation for training and
  • 00:02:09
    grounding your models and in return this
  • 00:02:13
    rapid growth in AI Innovation is really
  • 00:02:17
    creating an accelerating demand for data
  • 00:02:21
    that is well governed high quality and
  • 00:02:24
    easy to
  • 00:02:27
    discover so given the strong need and
  • 00:02:30
    Northstar Vision what are the challenges
  • 00:02:33
    that companies are actually facing in
  • 00:02:37
    reality well first and foremost we have
  • 00:02:40
    the challenge of dark data which I'm
  • 00:02:43
    sure most of you could resonate with in
  • 00:02:46
    fact what we know that
  • 00:02:49
    66% of organizations have reported that
  • 00:02:52
    at least half of their data is dark
  • 00:02:56
    which means that it is data that is not
  • 00:02:58
    even discovered or used in the first
  • 00:03:03
    place and even if you're able to
  • 00:03:05
    discover and use that data there's still
  • 00:03:08
    a lot of questions about data quality
  • 00:03:11
    whether this data is valid whether this
  • 00:03:14
    data is
  • 00:03:16
    trustworthy we learned from our survey
  • 00:03:18
    that only
  • 00:03:20
    44% of data leaders are fully confident
  • 00:03:24
    in the quality of their organization's
  • 00:03:26
    data and as we all know local quality
  • 00:03:30
    data would only result in low quality
  • 00:03:33
    output which you really cannot trust for
  • 00:03:36
    any inside generation or
  • 00:03:41
    decision-making now the reason why
  • 00:03:44
    managing and governing your data it's so
  • 00:03:47
    difficult it's due to the complexity of
  • 00:03:50
    the Enterprise data
  • 00:03:52
    landscape as you can see here on the
  • 00:03:54
    diagram data is really coming in from
  • 00:03:57
    various different sources the are stored
  • 00:04:00
    and processed in different Services
  • 00:04:04
    whether it's data warehouse data lakes
  • 00:04:06
    or
  • 00:04:07
    databases they reside in different
  • 00:04:10
    formats and they're used by different
  • 00:04:13
    personas across different
  • 00:04:16
    workflows now please raise your hand if
  • 00:04:19
    this complex situation ever seemed
  • 00:04:21
    familiar to
  • 00:04:23
    you yes I see a lot of hands
  • 00:04:26
    raised definitely that's what we see all
  • 00:04:29
    the time as well so this challenge is
  • 00:04:33
    exactly what's keeping us busy here at
  • 00:04:35
    Google cloud and as you may have learned
  • 00:04:38
    from this conference so far we're really
  • 00:04:40
    evolving bigquery into a unified data
  • 00:04:44
    and AI governance platform a unified
  • 00:04:47
    data and AI platform and this platform
  • 00:04:49
    is designed with data governance as a
  • 00:04:52
    central builting consideration that is
  • 00:04:55
    contextual and pervasive across the
  • 00:04:58
    different layers of the text
  • 00:05:01
    deck now at the heart of providing this
  • 00:05:05
    unified data and AI governance is datax
  • 00:05:09
    which is our native offering for
  • 00:05:11
    automating data governance and data
  • 00:05:13
    management at
  • 00:05:15
    scale there are several key value
  • 00:05:17
    propositions of datax we have seen truly
  • 00:05:21
    resonating with our customers based on
  • 00:05:25
    interactions first and foremost dataplex
  • 00:05:27
    deeply integrates with various products
  • 00:05:30
    and services to really provide this
  • 00:05:33
    UniFi metadata across distributed data
  • 00:05:36
    and based on this you're able to perform
  • 00:05:38
    search across different projects across
  • 00:05:42
    different regions and across different
  • 00:05:44
    data
  • 00:05:46
    silos and based on that you're also able
  • 00:05:49
    to further enrich and organize your data
  • 00:05:52
    as needed so that's number one number
  • 00:05:56
    two on top of this wealth of metadata
  • 00:05:58
    datax offers centralized security and
  • 00:06:02
    governance
  • 00:06:04
    features this really allows you to
  • 00:06:06
    easily manage your data governance
  • 00:06:09
    policies based on understanding of the
  • 00:06:12
    metadata
  • 00:06:13
    context and last but not least datax has
  • 00:06:16
    a rich set of features around
  • 00:06:19
    intelligent data management from
  • 00:06:21
    tracking data lineage to assessing data
  • 00:06:25
    profile and to automating data quality
  • 00:06:28
    checks so really helping you build
  • 00:06:30
    better trust in your data and helping
  • 00:06:33
    you optimize data related
  • 00:06:38
    Roi now since GA launch back in 2022
  • 00:06:42
    datax has been widely adopted by
  • 00:06:46
    customers across different geographies
  • 00:06:49
    and different industry verticals as of
  • 00:06:52
    now over
  • 00:06:53
    95% of the top data analytics customers
  • 00:06:57
    at Google Cloud are all already using
  • 00:07:00
    data Plex for managing and governing
  • 00:07:02
    their data at
  • 00:07:04
    scale So today we're very excited to be
  • 00:07:07
    hearing from two of them Ford and orange
  • 00:07:11
    so please join me in first welcoming
  • 00:07:14
    Cynthia from Ford to talk about her
  • 00:07:17
    journey with data
  • 00:07:20
    [Applause]
  • 00:07:28
    governance
  • 00:07:33
    all right good day how's everybody doing
  • 00:07:38
    today so my name is Cynthia gums and I'm
  • 00:07:40
    responsible for data Discovery and
  • 00:07:42
    classification at Ford Motor Company
  • 00:07:45
    welcome to my TED
  • 00:07:48
    talk now I'm just
  • 00:07:53
    kidding and
  • 00:07:57
    so while this may not be a t talk I
  • 00:08:01
    promise you that this is an important
  • 00:08:02
    topic and I'm really honored to have the
  • 00:08:05
    opportunity to speak with you today
  • 00:08:07
    about it
  • 00:08:09
    okay so before we start o that's loud
  • 00:08:14
    before we start I'm curious about who's
  • 00:08:16
    in the crowd today so how many of you
  • 00:08:20
    consider yourselves a data governance
  • 00:08:22
    professional you can just wave your hand
  • 00:08:25
    all right there's a lot of you out there
  • 00:08:27
    and if you didn't raise your hand how
  • 00:08:29
    many of you have an appreciation for
  • 00:08:30
    what data governance
  • 00:08:33
    is okay so most of you awesome so I as
  • 00:08:37
    it and data professionals I'm sure
  • 00:08:39
    you've had the
  • 00:08:40
    opportunity to explain to nonata or
  • 00:08:43
    non-te people what you do for a living
  • 00:08:46
    right so when you tell someone I work in
  • 00:08:49
    it Information Technology they look at
  • 00:08:52
    you and they think oh you must be doing
  • 00:08:54
    tech support and then they start calling
  • 00:08:56
    you to help them with their Wi-Fi right
  • 00:08:59
    then you you tell them well I work in
  • 00:09:00
    the data analytics department and then
  • 00:09:03
    they think oh she just creates charts
  • 00:09:05
    and things all day she doesn't really do
  • 00:09:08
    a whole lot of anything because that's
  • 00:09:09
    what data analytics is right then you
  • 00:09:12
    tell them well I'm in data
  • 00:09:14
    governance and now they think you have
  • 00:09:17
    the magic key to give everybody access
  • 00:09:19
    to data data governance is so much more
  • 00:09:21
    than that and then for me when I say I
  • 00:09:23
    work in data Discovery and
  • 00:09:25
    classification they have no clue what
  • 00:09:27
    I'm talking about so then I have to
  • 00:09:29
    explain to them I am responsible for
  • 00:09:33
    making sure people can find the data
  • 00:09:35
    they need to solve business problems and
  • 00:09:39
    then I might give an analogy about a
  • 00:09:41
    shopping experience or being in a
  • 00:09:44
    library and then they get it okay so
  • 00:09:47
    data governance is a broad topic today
  • 00:09:50
    I'm just going to scratch the surface of
  • 00:09:51
    it but I'm really going to dig a little
  • 00:09:53
    bit deeper into Data Discovery so our
  • 00:09:56
    Ford data platform is powered by Google
  • 00:09:59
    cloud and we believe that data is
  • 00:10:02
    awesome valuable and worthy of respect
  • 00:10:07
    our platform objective is to establish a
  • 00:10:10
    single source of truth of data to enable
  • 00:10:12
    data fusion and
  • 00:10:15
    responsible data
  • 00:10:17
    usage our platform helps data Engineers
  • 00:10:21
    organize many many data sources across
  • 00:10:24
    every aspect of our business we have a
  • 00:10:27
    very complex environment
  • 00:10:29
    and we have a number of key capabilities
  • 00:10:32
    that allow us to meet our objective data
  • 00:10:35
    governance is a foundational capability
  • 00:10:37
    for our
  • 00:10:40
    platform as a part of our migration to
  • 00:10:43
    Google Cloud we have enabled datax data
  • 00:10:46
    catalog to capture Technical and
  • 00:10:48
    business metadata about our projects and
  • 00:10:51
    data sets each of our governance teams
  • 00:10:55
    benefits from data capabilities but the
  • 00:10:58
    main one is tag templates for the
  • 00:11:00
    collection of business metadata we also
  • 00:11:03
    benefit from dataplex apis to help
  • 00:11:07
    expose that metadata in our custom user
  • 00:11:09
    interfaces as well as our backend
  • 00:11:13
    processes additionally our data quality
  • 00:11:15
    team recently launched data lineage and
  • 00:11:18
    data lineage allows us to understand
  • 00:11:20
    where the data came from and its life
  • 00:11:22
    cycle we also use lineage to understand
  • 00:11:26
    how to troubleshoot the data how to
  • 00:11:28
    identify dependencies and also for data
  • 00:11:33
    Discovery so as I previously previously
  • 00:11:36
    mentioned we use Tag templates as the
  • 00:11:38
    foundation for our data catalog
  • 00:11:41
    experience and this metadata is
  • 00:11:43
    collected as a part of our endtoend data
  • 00:11:47
    process so when you onboard the data
  • 00:11:50
    when you create your project all the way
  • 00:11:52
    through access enablement we're
  • 00:11:54
    collecting the data along the way we
  • 00:11:56
    collect the metadata at every level of
  • 00:11:59
    our project hierarchy starting with
  • 00:12:01
    projects then data sets tables views and
  • 00:12:04
    columns we have tag templates for each
  • 00:12:06
    of those levels today we're collecting
  • 00:12:08
    roughly 70 Business metadata tags that
  • 00:12:12
    might not sound like a lot but it's it's
  • 00:12:15
    important data that we need to help
  • 00:12:16
    people discover the data and actually
  • 00:12:18
    dataplex lets you capture thousands of
  • 00:12:22
    tags most of this metadata is captured
  • 00:12:25
    manually so you can imagine that it's
  • 00:12:27
    kind of tedious time consuming and
  • 00:12:29
    resource
  • 00:12:31
    intensive but we're working with um
  • 00:12:35
    we're experimenting with Gen to see if
  • 00:12:37
    we can do some Automation in the
  • 00:12:39
    metadata enrichment space we have custom
  • 00:12:42
    user interfaces to allow our users to
  • 00:12:45
    input and extract the metadata from the
  • 00:12:48
    data
  • 00:12:49
    catalog now you might be wondering well
  • 00:12:51
    why do you need a custom user interface
  • 00:12:54
    dataplex has an interface via the
  • 00:12:56
    console however there's many reasons
  • 00:13:00
    why we decided to go this route but the
  • 00:13:01
    main one is we need to control the way
  • 00:13:05
    that metadata is input and the way it is
  • 00:13:08
    exposed to our users right now today in
  • 00:13:11
    dataplex which is perfect for a platform
  • 00:13:13
    team it shows you everything staging
  • 00:13:17
    tables temp tables that's great for the
  • 00:13:19
    platform team but your end users don't
  • 00:13:21
    want to see that and so we try to modify
  • 00:13:23
    the experience so that we can curate it
  • 00:13:25
    for
  • 00:13:27
    excellence okay
  • 00:13:30
    another reason why we created a custom
  • 00:13:33
    user experience is because we have so
  • 00:13:35
    many different personas that we're
  • 00:13:38
    trying to satisfy so last year we
  • 00:13:41
    launched our custom data Discovery
  • 00:13:43
    experience this was an exciting time for
  • 00:13:46
    us because we've been on quite a journey
  • 00:13:47
    trying to solve the data Discovery
  • 00:13:51
    challenge this experience makes use of
  • 00:13:54
    the data Plex
  • 00:13:55
    apis and we call it our data Discovery
  • 00:13:58
    hub
  • 00:14:00
    this interface currently has over 9,000
  • 00:14:03
    users in
  • 00:14:04
    growing our users are able to search the
  • 00:14:08
    data catalog and when the results come
  • 00:14:11
    back they're also able to see all the
  • 00:14:13
    metadata related to that data asset they
  • 00:14:16
    can also see the table and view schemas
  • 00:14:19
    as well we've enabled an export
  • 00:14:22
    capability that allows users to export
  • 00:14:25
    their search results the table schemas
  • 00:14:28
    and the metadata if they need to do any
  • 00:14:30
    offline analysis now I need to be clear
  • 00:14:33
    this is not an export of the data this
  • 00:14:36
    is export of the metadata right the
  • 00:14:38
    security folks are probably looking for
  • 00:14:40
    me
  • 00:14:41
    now and so our user interface also
  • 00:14:45
    creates or provides linkages to other
  • 00:14:48
    tools that we have we have our data
  • 00:14:51
    Activation Portal which allows users to
  • 00:14:54
    request access to the data that they've
  • 00:14:56
    discovered then we also have our data
  • 00:14:59
    quality dashboards so they can see the
  • 00:15:01
    completeness the timeliness and the
  • 00:15:03
    other measures for data
  • 00:15:09
    quality all
  • 00:15:13
    right all right so we are
  • 00:15:16
    currently uh oh sorry guys all right so
  • 00:15:20
    one of the key challenges for our data
  • 00:15:24
    Discovery activity is that um the user
  • 00:15:28
    experience is really important and our
  • 00:15:30
    challenge was that we have all these
  • 00:15:32
    different user personas we have data
  • 00:15:35
    analysts data stewards data scientists
  • 00:15:38
    software analysts data engineers and
  • 00:15:42
    then you have your business users all
  • 00:15:43
    these people have a need to discover
  • 00:15:45
    data but their needs are a little bit
  • 00:15:47
    different and so if I return to a
  • 00:15:49
    business user this big long list of
  • 00:15:53
    tables and columns they're not going to
  • 00:15:54
    know what to do with that right and so
  • 00:15:57
    to solve this problem we've engaged
  • 00:15:59
    project product designers to help us go
  • 00:16:02
    through this full effort of
  • 00:16:05
    understanding our personas what do they
  • 00:16:07
    want what are they thinking about when
  • 00:16:09
    they search for data how could we return
  • 00:16:11
    the results such that they have an
  • 00:16:13
    appreciation for what those um what the
  • 00:16:16
    data is and what it can do for them
  • 00:16:19
    we've done user interviews sessions
  • 00:16:21
    where we sit down with them and watch
  • 00:16:23
    them use the tool we've done surveys and
  • 00:16:25
    all of that has resulted in excellent
  • 00:16:27
    feedback that we're using to inform how
  • 00:16:30
    we modify our product another challenge
  • 00:16:33
    is ensuring that we have high quality
  • 00:16:35
    relevant
  • 00:16:37
    metadata if your columns don't have
  • 00:16:39
    descriptions how can someone search and
  • 00:16:42
    find that your column has the data that
  • 00:16:43
    they need to solve their business
  • 00:16:46
    problem however I already mentioned that
  • 00:16:48
    populating metadata is time consuming
  • 00:16:50
    and resource intensive and so how do you
  • 00:16:53
    solve that problem well one thing that
  • 00:16:55
    we've done is we created a tool that
  • 00:16:58
    allows data teams to update their
  • 00:17:02
    descriptions in bulk using their data
  • 00:17:05
    dictionaries as input and so now they
  • 00:17:07
    already have a data dictionary they can
  • 00:17:10
    upload that it goes through terraform
  • 00:17:12
    and it updates their descriptions behind
  • 00:17:13
    the scenes so that was a great
  • 00:17:16
    enhancement that we
  • 00:17:18
    enabled so what's next for data
  • 00:17:21
    Discovery at
  • 00:17:22
    Ford we are currently piloting a data
  • 00:17:25
    Marketplace experience which allows
  • 00:17:28
    users to search find and access data
  • 00:17:32
    well request access to data in the same
  • 00:17:34
    user experience today it's separate
  • 00:17:37
    experiences but we're merging them right
  • 00:17:40
    and so this was a highly requested item
  • 00:17:42
    from our users that we're trying to
  • 00:17:45
    satisfy so far the pilot is going really
  • 00:17:48
    well and we're getting excellent
  • 00:17:49
    feedback that will inform our our
  • 00:17:53
    product and now we're also looking to to
  • 00:17:56
    launch some additional enhancements to
  • 00:17:57
    the experience so pre previously it was
  • 00:17:59
    just a simple search experience but now
  • 00:18:01
    it's search that will have Best Bets
  • 00:18:05
    enabled and that's what we're calling it
  • 00:18:07
    Best Bets Best Bets means um it's a
  • 00:18:11
    keyword focused activity where we have
  • 00:18:14
    popular data sets keywords that describe
  • 00:18:16
    them when users go in to do their search
  • 00:18:18
    if they hit one of those keywords those
  • 00:18:21
    items will bubble to the top of the
  • 00:18:22
    search we're also allowing our users to
  • 00:18:25
    add comments and ratings for the data so
  • 00:18:29
    if you already have access to the data
  • 00:18:30
    you can find it in the catalog and say
  • 00:18:33
    this data set was awesome it helped me
  • 00:18:34
    do X Y or
  • 00:18:36
    Z try it out right and then if they
  • 00:18:38
    thumbs it up that also influences where
  • 00:18:41
    it shows up in the search results the
  • 00:18:44
    next one is around data
  • 00:18:47
    collections so it's kind of like a
  • 00:18:49
    storefront for the data at Ford we have
  • 00:18:51
    a number of different subject areas and
  • 00:18:54
    they want to see their data together
  • 00:18:56
    right they don't want to see it mixed up
  • 00:18:58
    with every every body else's data and so
  • 00:19:00
    we're looking to create these
  • 00:19:02
    collections or storefronts so those
  • 00:19:04
    different subject areas can say you want
  • 00:19:07
    to use my data go to this place in the
  • 00:19:09
    data catalog to find it
  • 00:19:12
    okay so we're also experimenting with
  • 00:19:15
    Gen I'm sure you've been hearing a lot
  • 00:19:17
    about gen during the conference this
  • 00:19:19
    week and so we're looking at using gen
  • 00:19:22
    to assist with metadata enrichment
  • 00:19:25
    automating it as well as helping to
  • 00:19:28
    gener generate business descriptions for
  • 00:19:30
    the data I already mentioned that it's
  • 00:19:32
    time you know time consuming to do so
  • 00:19:34
    manually but how cool would it be if you
  • 00:19:37
    could just throw gen at a table and say
  • 00:19:40
    what's in this table and it figures it
  • 00:19:42
    out because they can tell what's in the
  • 00:19:44
    table from the data and it will describe
  • 00:19:45
    it and you move on now you might have to
  • 00:19:48
    verify that the descriptions are decent
  • 00:19:50
    but once you get that confidence I think
  • 00:19:52
    it'll be a powerful data Discovery
  • 00:19:56
    experience all right and then we're also
  • 00:19:58
    looking at doing an
  • 00:19:59
    llm and we're training it on
  • 00:20:02
    documentation about the data as well as
  • 00:20:05
    the data catalog
  • 00:20:06
    itself now you're taking data Discovery
  • 00:20:09
    to another level right because you can
  • 00:20:11
    do natural language queries ask it
  • 00:20:14
    questions and it's going to combine that
  • 00:20:16
    documentation with that metadata and
  • 00:20:19
    give you a really good answer about what
  • 00:20:20
    data is available to solve your business
  • 00:20:23
    problems lastly we're really looking
  • 00:20:26
    forward to implementing data plexes
  • 00:20:29
    business terms and glossery this will
  • 00:20:32
    allow us to give common understanding to
  • 00:20:35
    those key terms that are the same across
  • 00:20:37
    the company so if you have 50 different
  • 00:20:41
    data sources and they all mention part
  • 00:20:42
    number do we have to Define part number
  • 00:20:45
    50 times in 50 different ways no but the
  • 00:20:48
    B business terms will allow us to have
  • 00:20:50
    that be consistent across all the
  • 00:20:53
    data so we've been on this
  • 00:20:56
    journey for a little over two years
  • 00:20:59
    years and it's been challenging but
  • 00:21:02
    rewarding at the same time especially
  • 00:21:04
    with this data Marketplace launch and so
  • 00:21:07
    I want to show appreciation to Google my
  • 00:21:10
    team and also all the teams involved for
  • 00:21:13
    their engagement and collaboration
  • 00:21:15
    because it's it's been fun I was um in
  • 00:21:18
    another session and the gentleman said
  • 00:21:20
    data governance was
  • 00:21:22
    boring really we're having so much fun
  • 00:21:26
    trying to figure out how to solve this
  • 00:21:28
    data Lex problem or not dataplex problem
  • 00:21:30
    but data data Discovery problem so I'll
  • 00:21:33
    ask you all who here thinks that data
  • 00:21:36
    Discovery is
  • 00:21:39
    easy I don't see a single hand raised
  • 00:21:42
    and thank you for validating us it's not
  • 00:21:46
    easy my manager's in here too it's not
  • 00:21:49
    easy all right so with that I'm going to
  • 00:21:53
    hand it over to Steve from Orange and
  • 00:21:55
    he's going to tell us how they've
  • 00:21:58
    enabled data Discovery with
  • 00:22:09
    gcp thanks Cynthia and we're on a very
  • 00:22:12
    very similar Journey so orange we're one
  • 00:22:16
    of the largest telecoms providers in the
  • 00:22:18
    world we also sell a lot of IT services
  • 00:22:21
    across 26 countries so we were
  • 00:22:23
    originally France Telecom and then we
  • 00:22:26
    acquired operations in many other
  • 00:22:28
    countries ranging from Belgium and Spain
  • 00:22:31
    and Poland as well as uh African
  • 00:22:34
    countries ranging from Sagal and Ivory
  • 00:22:37
    Coast to to the Democratic Republic of
  • 00:22:39
    Congo so we have an enormously diverse
  • 00:22:42
    set of challenges with almost 300
  • 00:22:45
    million customers across those countries
  • 00:22:48
    and the company is really a very proud
  • 00:22:51
    technological company a lot of the
  • 00:22:53
    reasons why you have power saving modes
  • 00:22:56
    in 5G today is because Orange cared
  • 00:22:58
    about the impact uh of these
  • 00:23:01
    Technologies on the environment um for
  • 00:23:03
    for decades and we're also at the
  • 00:23:05
    Forefront of a lot of AI work I'm I'm
  • 00:23:08
    really lucky to have a dedicated AI
  • 00:23:11
    research team that came from France
  • 00:23:14
    Telecom labs and so in my central team
  • 00:23:17
    we have not only all the data
  • 00:23:18
    engineering data science and ml
  • 00:23:20
    engineering but also as I mentioned the
  • 00:23:22
    pure research team and we're really
  • 00:23:24
    focused in three domains uh and we use
  • 00:23:27
    superpower interchangeably with AI so
  • 00:23:30
    we say that we're trying to superpower
  • 00:23:32
    our employees daily lives we're trying
  • 00:23:35
    to superpower all of our networks and
  • 00:23:37
    we're trying to superpower our customer
  • 00:23:39
    experiences and as Cynthia was
  • 00:23:41
    describing there's many challenges in
  • 00:23:44
    providing these kinds of services at
  • 00:23:46
    scale so before we started to use Google
  • 00:23:50
    Cloud we had data in organizational
  • 00:23:55
    silos that were mapped to the physical
  • 00:23:58
    infrastructure so each of these
  • 00:23:59
    teams within the countries like the
  • 00:24:01
    network team the finance team they had
  • 00:24:03
    built and maintained their own data
  • 00:24:05
    infrastructures that led to these silos
  • 00:24:07
    that map to these cultural and
  • 00:24:10
    uh and operational silos that we faced
  • 00:24:13
    and across these 26 countries the
  • 00:24:17
    data infrastructure that we had was
  • 00:24:18
    incredibly
  • 00:24:19
    heterogeneous and mostly had been um
  • 00:24:22
    self-integrated uh and managed and so
  • 00:24:25
    that the level of complexity of
  • 00:24:26
    maintaining that infrastructure and the
  • 00:24:29
    skills necessary for the teams to manage
  • 00:24:31
    that infrastructure were extremely
  • 00:24:33
    complex and a lot of the time
  • 00:24:35
    taken by the data engineering
  • 00:24:37
    teams was just spent keeping the
  • 00:24:39
    lights on on our
  • 00:24:41
    infrastructure and data governance was
  • 00:24:43
    also managed uh very very manually uh
  • 00:24:47
    with these very basic systems and we had
  • 00:24:49
    many different security and Regulatory
  • 00:24:52
    risks that we had to mitigate um through
  • 00:24:54
    these systems and as Cynthia was saying
  • 00:24:57
    a lot of our Executives didn't really
  • 00:25:00
    see data governance as strategic they
  • 00:25:04
    saw it as something that was just like a
  • 00:25:06
    Regulatory Compliance requirement they
  • 00:25:08
    didn't see it as
  • 00:25:10
    enabling our ability to reach AI
  • 00:25:13
    at scale and so that was really
  • 00:25:15
    preventing us from taking advantage of
  • 00:25:17
    these enormous volumes of data that we
  • 00:25:18
    generate across the business to generate
  • 00:25:21
    value from
  • 00:25:22
    that so we established a few years ago
  • 00:25:26
    now this vision of a data democracy
  • 00:25:29
    where we make this data widely available
  • 00:25:32
    within each country by breaking these
  • 00:25:34
    silos that we have between the
  • 00:25:37
    organizations by having very rich data
  • 00:25:40
    Discovery and to do this at scale we use
  • 00:25:43
    policy as code to not only enforce
  • 00:25:45
    access control but also things um like
  • 00:25:48
    all of the data processes that we have
  • 00:25:50
    for maintaining quality through the
  • 00:25:52
    pipeline and that allowed us to really
  • 00:25:55
    use standard CI/CD techniques and tooling
  • 00:25:58
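Policy as code, as described here, treats access rules and pipeline checks as versioned data that CI/CD tooling can lint and test before they are enforced. A minimal illustrative sketch, assuming a hypothetical rule shape, role names, and datasets (not Orange's actual schema):

```python
# Illustrative policy-as-code sketch: access rules live in version control
# as plain data, so a CI job can validate them before they are enforced.
# Role names, dataset names, and the rule shape are all hypothetical.

POLICIES = [
    {"role": "analyst", "dataset": "network_telemetry",
     "deny_columns": ["subscriber_id", "imsi"]},
    {"role": "data_engineer", "dataset": "network_telemetry",
     "deny_columns": []},
]

def is_allowed(role: str, dataset: str, column: str) -> bool:
    """Return True if `role` may read `column` of `dataset` under POLICIES."""
    for policy in POLICIES:
        if policy["role"] == role and policy["dataset"] == dataset:
            return column not in policy["deny_columns"]
    return False  # default-deny: no matching policy means no access

def validate_policies(policies) -> list:
    """A check a CI pipeline could run: every policy must be fully specified."""
    errors = []
    for i, p in enumerate(policies):
        for key in ("role", "dataset", "deny_columns"):
            if key not in p:
                errors.append(f"policy {i} missing '{key}'")
    return errors
```

Because the policies are plain data under version control, a pull request that changes them can be reviewed and tested like any other code change.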
    to dramatically improve the way that we
  • 00:25:59
    manage our data and also using AI itself
  • 00:26:03
    to identify anomalies in the pipelines
  • 00:26:06
    has been very very useful and so now our
  • 00:26:10
    CEOs also because of the tsunami of AI
  • 00:26:13
    they're really seeing data uh itself as
  • 00:26:15
    being really foundational and crucial to
  • 00:26:17
    the business and the thing that we did
  • 00:26:19
    to encourage that was we set two things
  • 00:26:23
    one was we set a a uniform way to
  • 00:26:25
    measure value on use cases across all 26
  • 00:26:28
    countries and that's widely available on
  • 00:26:31
    a dashboard that every CEO and employee
  • 00:26:35
    can see so they can see which use cases
  • 00:26:37
    are generating a lot of value but also
  • 00:26:40
    we have other operational kpis that
  • 00:26:43
    relate to our data migrations and data
  • 00:26:45
    quality that are also public and so what
  • 00:26:48
    that did was it created a
  • 00:26:50
    competitive dynamic between our
  • 00:26:54
    CEOs which was really effective and it
  • 00:26:56
    also led individual people in
  • 00:26:58
    the company to see which countries were
  • 00:27:00
    being really successful at different
  • 00:27:02
    parts of their Journey towards AI at
  • 00:27:04
    scale and encourage them to learn from
  • 00:27:06
    one another and so we took inspiration
  • 00:27:09
    from data mesh to then build a set of
  • 00:27:11
    data products but our approach to data
  • 00:27:14
    products is that we have a centralized
  • 00:27:16
    team my team that defines architecture
  • 00:27:20
    with Partners like Google and that
  • 00:27:22
    allows us to have uniform infrastructure
  • 00:27:25
    that's provided to each of these teams
  • 00:27:26
    that's generating data
  • 00:27:28
    and then they're responsible for
  • 00:27:31
    maintaining the freshness and the
  • 00:27:32
    documentation and the quality of that
  • 00:27:34
    data and the level of
  • 00:27:37
    automation that we're providing uh
  • 00:27:38
    enables this really rich um set of
  • 00:27:41
    outputs and value that's getting
  • 00:27:42
    generated by the use cases in these
  • 00:27:44
    countries so this is what it looks like
  • 00:27:47
    there's really three pillars the first
  • 00:27:50
    is the data products that relate to data
  • 00:27:52
    management and data quality and also
  • 00:27:55
    role-based access with policy as code so
  • 00:27:59
    one of the things that's really powerful
  • 00:28:01
    uh with Dataplex and BigQuery in
  • 00:28:03
    Partnership is that you can have very
  • 00:28:06
    clear role-based access to data but also
  • 00:28:08
    on a column level uh which is really
  • 00:28:10
    powerful because there's certain users
  • 00:28:13
    that we have that we don't want them to
  • 00:28:14
    be able to see the really sensitive
  • 00:28:16
    data but we want to enable them to use
  • 00:28:18
    that data for data
  • 00:28:21
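Column-level access of the kind described, where certain roles cannot see sensitive columns but can still operate on the rest of the row, can be illustrated with a small masking sketch. The column and role names here are hypothetical, not actual Dataplex or BigQuery policy syntax:

```python
# Sketch of column-level access: every role sees the same table, but
# sensitive columns are masked unless the role is cleared, so uncleared
# users can still run operational queries on the remaining columns.

SENSITIVE_COLUMNS = {"msisdn", "email"}     # hypothetical sensitive fields
CLEARED_ROLES = {"privacy_officer"}         # hypothetical cleared role

def read_row(role: str, row: dict) -> dict:
    """Return `row` with sensitive columns masked unless `role` is cleared."""
    if role in CLEARED_ROLES:
        return dict(row)
    return {col: ("***" if col in SENSITIVE_COLUMNS else val)
            for col, val in row.items()}
```

An uncleared operations role would see `{"msisdn": "***", "country": "FR"}` for a row whose country field is `"FR"`, which is enough for aggregate work without exposing the identifier.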
    operations the second part is the
  • 00:28:23
    self-service platform that we built
  • 00:28:25
    based on GitLab that leverages this great
  • 00:28:28
    interaction between BigQuery, Dataplex, and
  • 00:28:30
    Vertex AI so for me it's this Golden
  • 00:28:34
    Triangle of the ability to use the best
  • 00:28:38
    we think the best data infrastructure in
  • 00:28:39
    the world on BigQuery with Vertex AI where
  • 00:28:42
    we get not only the best of the Google
  • 00:28:45
    first-party models and
  • 00:28:46
    tools also through the model Garden we
  • 00:28:49
    get leading state-of-the-art open-weight
  • 00:28:53
    models as well as state-of-the-art open
  • 00:28:56
    source tooling for managing the model
  • 00:28:58
    life cycle and I've never seen a faster pace of
  • 00:29:03
    innovation in my entire career than what
  • 00:29:05
    we see in open source tooling as well as
  • 00:29:08
    open source and open-weight LLMs so
  • 00:29:11
    being able to manage that as policy is
  • 00:29:15
    code between all of the business
  • 00:29:17
    decision makers whether they be the
  • 00:29:20
    data governors our engineers and
  • 00:29:23
    the business owners has allowed us to
  • 00:29:24
    really operate this at a much higher
  • 00:29:26
    scale than was possible before
  • 00:29:29
    and then lastly we Federate all of that
  • 00:29:32
    with governance to harmonize not only
  • 00:29:34
    access to the data and the documentation
  • 00:29:36
    but also the catalogs and the forms of
  • 00:29:39
    Discovery so let me talk next about what
  • 00:29:41
    we want to do
  • 00:29:43
    next we've built this early
  • 00:29:46
    implementation of data mesh on top of
  • 00:29:49
    the Google platform and what we've seen
  • 00:29:52
    is that the dataplex tool is incredibly
  • 00:29:54
    useful for us for things like data
  • 00:29:57
    catalog
  • 00:29:58
    auto data quality and data loss
  • 00:30:01
    prevention and the other thing that's
  • 00:30:03
    been great about working with the Dataplex
  • 00:30:05
    team in addition to the fact that
  • 00:30:07
    they've been extremely reactive to
  • 00:30:09
    understanding our challenges and
  • 00:30:11
    providing us great interaction with the
  • 00:30:14
    roadmap and influencing the
  • 00:30:16
    engineering team but also they've been
  • 00:30:18
    really good about providing open apis to
  • 00:30:21
    our other partners to allow them to have
  • 00:30:23
    Rich synchronization between their
  • 00:30:26
    tooling environment and Dataplex itself
  • 00:30:28
    so for example uh in the case of Collibra we
  • 00:30:31
    use Collibra today across the company to
  • 00:30:34
    manage data that's not on gcp that's on
  • 00:30:36
    a lot of existing um
  • 00:30:38
    infrastructure um that hasn't been
  • 00:30:40
    migrated yet so having the ability to
  • 00:30:42
    have really rich interaction between our
  • 00:30:45
    existing data governance infrastructure
  • 00:30:47
    and what we're building in Dataplex has been
  • 00:30:49
    extremely
  • 00:30:51
    powerful so where we want to go is we
  • 00:30:54
    have this vision of using AI itself
  • 00:30:58
    um to have a Marketplace for both the
  • 00:31:01
    data that's within BigQuery but also
  • 00:31:03
    within Vertex AI and so the idea that we
  • 00:31:05
    can use natural language as a way for
  • 00:31:08
    anyone in the company within this
  • 00:31:10
    Marketplace to query what data is
  • 00:31:12
    available to have very very quick
  • 00:31:15
    business intelligence visualizations of
  • 00:31:17
    that data and be able to answer really
  • 00:31:20
    direct simple questions and have a
  • 00:31:22
    dialogue with the data even before they
  • 00:31:24
    engage a data scientist or a data
  • 00:31:26
    engineer uh really unlocks an enormous
  • 00:31:29
    amount of value in the company and and
  • 00:31:32
    we think that it's a fundamental shift
  • 00:31:34
    in the technological interaction with
  • 00:31:36
    computers where you can use natural
  • 00:31:38
    language at the level of
  • 00:31:41
    power that previously you needed to be a
  • 00:31:43
    programmer to achieve and and then
  • 00:31:46
    lastly using AI to detect anomalies in
  • 00:31:49
    our pipelines uh to help us fill uh
  • 00:31:52
    where we have gaps in our data and
  • 00:31:54
    otherwise to make sure that the data
  • 00:31:56
    that we're generating is of high quality
  • 00:31:58
    because it's clear that without
  • 00:32:00
    extremely high quality data we're not
  • 00:32:02
    going to have high quality outputs from
  • 00:32:04
    our AI systems and then that applies not
  • 00:32:07
    only to systems where we're just doing
  • 00:32:09
    inference on existing large language
  • 00:32:11
    models but it's also very true when we
  • 00:32:14
    try to fine-tune models for them
  • 00:32:17
    to be much smaller and to operate
  • 00:32:20
    much faster so again having extremely
  • 00:32:23
    high quality data where we're managing the
  • 00:32:25
    lineage of that data and that's really
  • 00:32:28
    easily accessible to the teams that
  • 00:32:29
    are working on the AI fine-tuning uh has
  • 00:32:32
    been really transformative and so this
  • 00:32:34
    data democracy for us is all about
  • 00:32:37
    having this data easily accessible in
  • 00:32:40
    extremely high quality that's well
  • 00:32:42
    documented including by having
  • 00:32:44
    generative AI fill gaps in
  • 00:32:47
    documentation and identify missing
  • 00:32:50
    elements and having that integrated
  • 00:32:53
    extremely well into the workflow of our
  • 00:32:55
    employees and we think that this data
  • 00:32:58
    democracy will unlock an enormous
  • 00:33:00
    amount of value across the company
  • 00:33:03
    because the amount of data that we're
  • 00:33:04
    generating today that's been very hard
  • 00:33:06
    to manage in the past now with this more
  • 00:33:09
    uniform infrastructure that's not only
  • 00:33:11
    available for us on public Cloud but
  • 00:33:14
    also we've been working very closely
  • 00:33:16
    with Google over the last few years to
  • 00:33:18
    have on premise data infrastructure
  • 00:33:20
    which we announced today so we have a
  • 00:33:22
    GDC Edge uh infrastructure that we can
  • 00:33:25
    deploy in our own data centers in each
  • 00:33:27
    country that also has uh data management
  • 00:33:31
    and AI capability so it gives us this
  • 00:33:33
    really rich environment between hybrid
  • 00:33:36
    between on-prem and public Cloud because
  • 00:33:39
    we have to respond not only to very
  • 00:33:41
    varying regulatory requirements across
  • 00:33:43
    our countries that change often
  • 00:33:46
    unpredictably um but also we have
  • 00:33:48
    commercial constraints because the
  • 00:33:50
    amount of data that's coming off our
  • 00:33:51
    network is enormous so for example just
  • 00:33:55
    the network telemetry data which
  • 00:33:57
    is the data we use to operate the
  • 00:33:59
    network it's over a petabyte a day so having
  • 00:34:02
    something sophisticated on premise to
  • 00:34:04
    allow us to filter that data before we
  • 00:34:07
    send to public cloud and to do that in a
  • 00:34:09
    way that maintains quality and this
  • 00:34:11
    policy-as-code mechanism is
  • 00:34:13
    extremely
  • 00:34:15
    transformative so let me bring Lou back
  • 00:34:17
    up to talk about what's on the road
  • 00:34:23
    map
  • 00:34:26
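The on-premise filtering Steve describes, cutting a petabyte a day of network telemetry down before it is sent to the public cloud while preserving the useful signal, amounts to a filter-and-aggregate pass at the edge. A minimal sketch, with field names and thresholds that are purely illustrative:

```python
# Sketch of edge filtering: forward the interesting records (errors,
# anomalous latency) as-is and fold routine records into an aggregate,
# so the volume shipped to the cloud drops without losing the signal.

def filter_telemetry(records, latency_threshold_ms=500):
    """Split records into forwarded raw records and a summary aggregate."""
    forwarded, normal_count, latency_sum = [], 0, 0
    for rec in records:
        if rec["status"] == "error" or rec["latency_ms"] > latency_threshold_ms:
            forwarded.append(rec)        # interesting: send unchanged
        else:
            normal_count += 1            # routine: keep only the aggregate
            latency_sum += rec["latency_ms"]
    summary = {
        "normal_count": normal_count,
        "avg_latency_ms": latency_sum / normal_count if normal_count else 0,
    }
    return forwarded, summary
```

In a real deployment the quality of this pass matters as much as the volume reduction: the aggregate must stay faithful enough that downstream analytics and model training are not biased by what was dropped.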
    thanks all right thank you so much Steve
  • 00:34:29
    and Cynthia for sharing your use cases
  • 00:34:32
    and perspective those are really really
  • 00:34:34
    wonderful insights and I think we can
  • 00:34:37
    all resonate with just how critical it
  • 00:34:40
    is to have this platform with
  • 00:34:42
    self-served Discovery and well-governed
  • 00:34:45
    data it's not easy but that's what we're
  • 00:34:48
    here for so next let's take a look at
  • 00:34:51
    what are the new launches we're very
  • 00:34:53
    excited to announce this
  • 00:34:56
    time first and foremost everything
  • 00:34:58
    in Dataplex starts from having this unified
  • 00:35:01
    metadata across distributed data and
  • 00:35:05
    that's exactly where automated
  • 00:35:07
    cataloging comes in we have worked very
  • 00:35:10
    closely with various gcp services and
  • 00:35:13
    products in order to ingest that
  • 00:35:16
    metadata to harvest metadata and index
  • 00:35:19
    metadata for search and based on this
  • 00:35:23
    you will be able to discover your assets
  • 00:35:25
    across analytics data lakes databases AI
  • 00:35:30
    and bi
  • 00:35:32
    Services you're also able to enrich and
  • 00:35:35
    organize this data to track lineage to
  • 00:35:38
    enforce governance policies and really
  • 00:35:41
    having this solid foundation for data to
  • 00:35:44
    AI
  • 00:35:46
    governance already Dataplex supports
  • 00:35:49
    a rich set of data sources such as
  • 00:35:52
    BigQuery and Pub/Sub and today we're super
  • 00:35:55
    excited to announce a host of new
  • 00:35:57
    Integrations as you can see
  • 00:36:00
    here first are the Vertex AI related
  • 00:36:02
    launches we're very excited to be
  • 00:36:04
    announcing the GA of automated
  • 00:36:07
    cataloging for Vertex AI models and data
  • 00:36:10
    sets and also the preview of automated
  • 00:36:13
    cataloging for vertex AI features with
  • 00:36:16
    those Integrations in place as soon as
  • 00:36:18
    you create a new artifact in vertex AI
  • 00:36:21
    they will be made searchable in Dataplex in
  • 00:36:24
    near real time and this is really
  • 00:36:27
    critical because we truly believe that
  • 00:36:30
    data and AI should be managed and
  • 00:36:33
    governed in a consistent and coherent
  • 00:36:36
    way next are operational databases
  • 00:36:39
    including the GA of Bigtable
  • 00:36:42
    integration spanner integration as well
  • 00:36:45
    as the preview of automated metadata
  • 00:36:48
    cataloging from cloud SQL and it's super
  • 00:36:51
    important to have this coverage for
  • 00:36:53
    operational databases as well to really
  • 00:36:56
    provide end-to-end
  • 00:36:59
    visibility next we're also actively
  • 00:37:01
    working on looker integration and it's a
  • 00:37:04
    launch that's coming soon so please stay
  • 00:37:06
    tuned and with all of those launches in
  • 00:37:09
    place our goal is to really provide a
  • 00:37:12
    powerful metadata Foundation to you to
  • 00:37:15
    enable automated metadata Discovery
  • 00:37:18
    management and
  • 00:37:21
    governance next is lineage so Dataplex
  • 00:37:25
    already provides the ability for you to
  • 00:37:28
    automatically track and visualize
  • 00:37:30
    lineage as your data artifacts flow
  • 00:37:33
    through your distributed data
  • 00:37:35
    landscape now this capability also works
  • 00:37:38
    nicely with other Dataplex features such as
  • 00:37:41
    data quality checks where as soon as a
  • 00:37:44
    data quality issue is discovered you
  • 00:37:47
    will be able to trace upstream and
  • 00:37:49
    downstream in order to understand what
  • 00:37:51
    is the root cause and impact of a
  • 00:37:54
    particular data quality
  • 00:37:56
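The upstream and downstream tracing just described is, at its core, a transitive traversal of a lineage graph: walk upstream from a failing asset to find candidate root causes, and downstream to find what it impacts. A toy sketch with hypothetical asset names:

```python
# Toy lineage graph: each asset maps to the assets it is derived from.
UPSTREAM = {
    "report": ["sales_clean"],
    "sales_clean": ["sales_raw"],
    "sales_raw": [],
    "dashboard": ["report"],
}

def walk(asset, edges):
    """Transitively collect all assets reachable from `asset` via `edges`."""
    seen, stack = set(), list(edges.get(asset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen

# Downstream edges are just the upstream edges reversed.
DOWNSTREAM = {}
for child, parents in UPSTREAM.items():
    for parent in parents:
        DOWNSTREAM.setdefault(parent, []).append(child)
```

If a quality check fails on `sales_clean`, walking `UPSTREAM` points at `sales_raw` as the candidate root cause, while walking `DOWNSTREAM` shows that `report` and `dashboard` are impacted.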
    breach now with lineage parsing there's
  • 00:37:59
    already native integration with services
  • 00:38:02
    like BigQuery Dataproc and Composer and
  • 00:38:06
    we also have the Dataplex API and OpenLineage
  • 00:38:09
    integration to really provide that
  • 00:38:12
    extensibility and today we're really
  • 00:38:14
    excited to announce the lineage support
  • 00:38:17
    for vertex AI pipelines really allowing
  • 00:38:20
    for this end-to-end traceability from data
  • 00:38:23
    processing to data analytics to machine
  • 00:38:26
    learning training and deployment and
  • 00:38:29
    providing you with this endtoend picture
  • 00:38:32
    that is critical for data to AI
  • 00:38:34
    governance and
  • 00:38:37
    compliance now at the same time in
  • 00:38:39
    addition to extending the type of data
  • 00:38:41
    sources being covered we're also
  • 00:38:43
    enhancing the granularity of lineage
  • 00:38:45
    tracking we're very excited to introduce
  • 00:38:48
    the preview for column level lineage in
  • 00:38:53
    BigQuery oh hey thank
  • 00:38:56
    you
  • 00:38:59
    so ever since introducing table lineage
  • 00:39:01
    in BigQuery as well as other services we
  • 00:39:04
    have seen strong customer enthusiasm
  • 00:39:07
    and adoption thanks to all of you and we're
  • 00:39:09
    also getting very strong demand for the
  • 00:39:12
    next level granularity which is what
  • 00:39:15
    we're very excited to bring to you today
  • 00:39:17
    so now you're able to perform root cause
  • 00:39:19
    analysis and impact analysis at the
  • 00:39:22
    column level in addition to at the table
  • 00:39:25
    level and imagine when you have a column
  • 00:39:28
    that's identified to contain personal
  • 00:39:30
    identifiable information this is where
  • 00:39:32
    column level lineage really shines right
  • 00:39:35
    where you're able to then control its
  • 00:39:37
    propagation and then be able to comply
  • 00:39:39
    with different
  • 00:39:42
    regulations now there's also more ease
  • 00:39:45
    of use features that we're launching
  • 00:39:48
    together with this so for example
  • 00:39:50
    there's the ability to help you pull up
  • 00:39:52
    all the upstreams and all the
  • 00:39:54
    downstreams of a particular node in the
  • 00:39:56
    lineage graph
  • 00:39:57
    there's also the ability to filter by
  • 00:40:00
    different transformation types to make
  • 00:40:02
    lineage graph more consumable and
  • 00:40:04
    there's also the ability to export
  • 00:40:06
    lineage for offline analysis so all of
  • 00:40:09
    this is to enhance our user experience
  • 00:40:13
    and to make it easier to work with
  • 00:40:15
    Dataplex
  • 00:40:17
    lineage next are two Gemini-powered
  • 00:40:21
    launches from Dataplex so first of all
  • 00:40:24
    we know that searching over metadata is
  • 00:40:27
    a really critical experience with Dataplex
  • 00:40:30
    and it's really at the core of what we
  • 00:40:32
    do here at dataplex now in addition to
  • 00:40:35
    doing keyword search with dataplex
  • 00:40:37
    you're able to just ask us a question in
  • 00:40:40
    natural language and Dataplex will be able
  • 00:40:43
    to interpret your intent and be able to
  • 00:40:46
    retrieve the most relevant search
  • 00:40:49
    results this can really go a long
  • 00:40:51
    way to lower this entry barrier as we
  • 00:40:53
    have discussed earlier and to really
  • 00:40:55
    democratize the experience of data
  • 00:40:58
    Discovery to your entire
  • 00:41:01
    organization now once the data is
  • 00:41:03
    discovered there's another really
  • 00:41:05
    exciting Gemini-powered feature from
  • 00:41:07
    Dataplex to help which is Data
  • 00:41:10
    Insights now a lot of us working with
  • 00:41:13
    data must have experienced the cold start
  • 00:41:16
    problem now which is once you find a
  • 00:41:19
    valuable data asset you're sometimes not
  • 00:41:21
    sure what is the best SQL queries to
  • 00:41:24
    write in order to really extract that
  • 00:41:26
    meaningful insight from the data so
  • 00:41:29
    that's exactly where Data Insights is here
  • 00:41:32
    to help it would automatically generate
  • 00:41:35
    and suggest SQL queries as well as a
  • 00:41:38
    list of questions you can ask of a table
  • 00:41:41
    in natural language and it will provide
  • 00:41:44
    validated SQL queries to you as well that
  • 00:41:47
    is ready to run in BigQuery
  • 00:41:50
    Studio so this could really help give
  • 00:41:53
    you a jump start into your analysis
  • 00:41:55
    journey and to really help
  • 00:41:57
    accelerate time to Insight for all of
  • 00:42:00
    us next is data governance our favorite
  • 00:42:04
    topic so as we know metadata is the core
  • 00:42:07
    of everything we do here at Dataplex
  • 00:42:10
    right so we're constantly thinking in
  • 00:42:13
    addition to help you better discover and
  • 00:42:15
    better understand this data can we also
  • 00:42:18
    make metadata more actionable to help
  • 00:42:20
    you drive actions in terms of
  • 00:42:24
    data governance
  • 00:42:25
    operations so this is exactly the
  • 00:42:27
    motivation for governance rules where we
  • 00:42:30
    start from the metadata you already have
  • 00:42:33
    in Dataplex whether it's technical metadata
  • 00:42:35
    or business metadata and then you will
  • 00:42:39
    be able to Define and enforce governance
  • 00:42:42
    policies at scale with the help of
  • 00:42:44
    dataplex
  • 00:42:45
    so here's how it works first of all you
  • 00:42:48
    start by writing a search query in
  • 00:42:50
    dataplex to identify all the entries and
  • 00:42:53
    Fields that are relevant for a
  • 00:42:56
    particular governance policy to be
  • 00:42:58
    applied and then you can Define your
  • 00:43:00
    policy in the form of governance rules
  • 00:43:03
    with the help of Dataplex and then
  • 00:43:05
    Dataplex will help you apply and enforce
  • 00:43:08
    this policy across your distributed data
  • 00:43:11
    landscape with proper monitoring
  • 00:43:13
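The three-step flow just described, a search query to select matching entries, a rule that attaches a policy to them, and an enforcement pass with monitoring, can be sketched in miniature. The catalog entries, tag names, and rule here are hypothetical, not the actual Dataplex governance-rule syntax:

```python
# Miniature governance-rules flow: search the catalog for matching
# entries, define a rule as a predicate, then enforce it and report
# which assets are still out of compliance.

CATALOG = [
    {"name": "customers", "tags": {"pii"}, "encrypted": True},
    {"name": "orders", "tags": set(), "encrypted": False},
    {"name": "support_tickets", "tags": {"pii"}, "encrypted": False},
]

def search(catalog, tag):
    """Step 1: find all catalog entries carrying `tag`."""
    return [e for e in catalog if tag in e["tags"]]

def enforce(entries, predicate):
    """Steps 2-3: apply a rule predicate; return names of violating assets."""
    return [e["name"] for e in entries if not predicate(e)]

pii_rule = lambda e: e["encrypted"]  # example rule: PII tables must be encrypted
violations = enforce(search(CATALOG, "pii"), pii_rule)
```

Because the rule is defined once against a search rather than per table, adding a newly tagged table to the catalog automatically brings it under the same policy at the next enforcement pass.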
    included so in summary what we're
  • 00:43:16
    providing here is a single pane of glass
  • 00:43:18
    for you to indicate and enforce your
  • 00:43:21
    governance intent at scale across
  • 00:43:23
    different types of data no matter where
  • 00:43:25
    they're stored
  • 00:43:27
    now as you can imagine the possibility
  • 00:43:30
    of governance rules is really endless
  • 00:43:33
    right the rules could be about access
  • 00:43:34
    control could be about data life cycle
  • 00:43:36
    management could also be about running
  • 00:43:38
    data quality checks and many
  • 00:43:40
    more so today to start this journey
  • 00:43:43
    we're very excited to announce the
  • 00:43:45
    initial launch of governance rules
  • 00:43:47
    starting from fine-grained access control
  • 00:43:49
    across BigQuery and GCS so that instead
  • 00:43:53
    of having to configure governance
  • 00:43:55
    policies one table at a time or one
  • 00:43:57
    column at a time you can now leverage
  • 00:44:00
    Dataplex to apply them automatically
  • 00:44:02
    for you at scale and this would work
  • 00:44:05
    across BigQuery and Google Cloud Storage
  • 00:44:09
    assets as
  • 00:44:10
    described so the goal here is to really
  • 00:44:12
    help you streamline the governance
  • 00:44:14
    operation and to really minimize any
  • 00:44:16
    potential risk to your security
  • 00:44:20
    posture last but not least we're very
  • 00:44:22
    excited to announce the latest key
  • 00:44:24
    launches driven by the partnership
  • 00:44:27
    between dataplex and
  • 00:44:30
    Collibra specifically this is the preview
  • 00:44:33
    of metadata sync from dataplex to
  • 00:44:35
    Collibra including technical metadata
  • 00:44:38
    business metadata as well as table level
  • 00:44:41
    lineage from
  • 00:44:42
    BigQuery so for this joint effort our goal
  • 00:44:45
    here is to really provide this unified
  • 00:44:49
    data Discovery experience spanning
  • 00:44:51
    multicloud and hybrid Cloud
  • 00:44:54
    environments and this is only the
  • 00:44:56
    beginning there are more exciting and
  • 00:44:59
    deeper Integrations that are being
  • 00:45:01
    planned and being worked on and
  • 00:45:04
    ultimately our goal is to provide the
  • 00:45:07
    flexibility of options and to be able to
  • 00:45:09
    help you combine the best of both
  • 00:45:12
    worlds for our
  • 00:45:14
    customers so with that thank you so much
  • 00:45:17
    for joining our session today thank you
  • 00:45:19
    so much Cynthia and Steve for the
  • 00:45:21
    wonderful insights you have shared thank
  • 00:45:23
    you everybody for
  • 00:45:25
    coming
Tags
  • Data Governance
  • AI
  • Data Discovery
  • Data Quality
  • Google Cloud
  • Dataplex
  • Ford
  • Orange
  • Metadata
  • Data Management