Data governance in the AI era
Summary
TL;DR: The session on data governance emphasizes the vital role of quality data in successfully leveraging AI technologies. Speakers Cynthia Gums from Ford and Steve Jared from Orange share insights on the challenges companies face today, such as dark data and governance complexity. Solutions like Dataplex from Google Cloud are highlighted, with capabilities such as automated cataloging and intelligent data management that help organizations better manage their data. The session also discusses how both organizations are evolving their approaches to data discovery and governance through user-centric interfaces and AI enhancements.
Key takeaways
- Data governance is critical in the age of AI.
- 66% of organizations report that at least half of their data is dark.
- Dataplex automates data governance at scale.
- Quality data is vital for effective AI outputs.
- Ford and Orange share their data governance journeys.
- Automated cataloging enhances data discovery.
- Data democracy improves data accessibility.
- Governance rules simplify compliance management.
- AI aids in metadata enrichment and quality checks.
- Continuous user feedback improves data platforms.
Timeline
- 00:00:00 - 00:05:00
The session on data governance in the age of AI features speakers from Ford and Orange, discussing the importance of data governance as AI technology evolves. The agenda includes an introduction, case studies, and updates on data governance tools.
- 00:05:00 - 00:10:00
Data is essential for AI, but many organizations struggle with 'dark data' and data quality issues. In the figures cited, 66% of organizations report that at least half of their data is dark, and only 44% of data leaders are fully confident in the quality of their organization's data. The complexity of enterprise data landscapes makes governance harder still.
- 00:10:00 - 00:15:00
Google Cloud's Dataplex is introduced as a solution for automating data governance and management. It integrates with various services to provide unified metadata, centralized security, and intelligent data management features, helping organizations build trust in their data.
- 00:15:00 - 00:20:00
Cynthia from Ford discusses her role in data discovery and classification, emphasizing the importance of data governance in creating a single source of truth. Ford's data platform, powered by Google Cloud, aims to organize data sources and improve data accessibility.
- 00:20:00 - 00:25:00
Ford's data governance strategy includes using Dataplex for capturing metadata and implementing data lineage to understand data origins and life cycles. They are also working on automating metadata enrichment to enhance data discovery.
- 00:25:00 - 00:30:00
Cynthia highlights the challenges of user experience in data discovery, noting the need for tailored interfaces for different user personas. Ford has developed a custom data discovery hub to improve user engagement and data accessibility.
- 00:30:00 - 00:35:00
Steve from Orange shares their journey in data governance, emphasizing the need to break down data silos and improve data accessibility across their operations in 26 countries. They aim to create a data democracy to enhance data utilization.
- 00:35:00 - 00:40:00
Orange's approach includes using policy as code for data governance, enabling better management of data quality and access control. They have established a centralized team to define architecture and support data product development across regions.
- 00:40:00 - 00:45:30
The session concludes with updates on new features in Dataplex, including automated cataloging, lineage tracking, and governance rules, aimed at enhancing data discovery and governance across organizations.
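The lineage tracking mentioned in the timeline can be pictured as a directed graph of data assets. The sketch below is a minimal, hypothetical illustration in Python: the asset names and the `upstream_of` / `downstream_of` helpers are assumptions made for this example and are not part of any Google Cloud API or of Ford's platform. It shows how walking the graph answers the two questions raised in the talk: where the data came from, and what depends on it.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges as (upstream, downstream) pairs.
# The asset names are illustrative only.
EDGES = [
    ("raw.orders", "staging.orders_clean"),
    ("raw.customers", "staging.customers_clean"),
    ("staging.orders_clean", "mart.sales_summary"),
    ("staging.customers_clean", "mart.sales_summary"),
    ("mart.sales_summary", "dashboard.weekly_sales"),
]


def build_graph(edges, reverse=False):
    """Build an adjacency map; reverse=True points from downstream to upstream."""
    graph = defaultdict(set)
    for up, down in edges:
        if reverse:
            graph[down].add(up)
        else:
            graph[up].add(down)
    return graph


def reachable(graph, start):
    """Breadth-first walk returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def upstream_of(asset, edges=EDGES):
    """Everything the asset was derived from (its origins)."""
    return reachable(build_graph(edges, reverse=True), asset)


def downstream_of(asset, edges=EDGES):
    """Everything that would be affected if the asset changed or broke."""
    return reachable(build_graph(edges), asset)
```

Calling `upstream_of("mart.sales_summary")` supports troubleshooting (which sources feed this table), while `downstream_of("raw.orders")` supports dependency analysis before a change.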
Video Q&A
What is data governance?
Data governance refers to the management of data availability, usability, integrity, and security in an organization.
What are some challenges in data governance?
Challenges include dark data (data that is not discovered or used), data quality issues, and the complexity of enterprise data landscapes.
What is Dataplex?
Dataplex is Google Cloud's native offering for automating data governance and data management at scale.
How does Ford use Dataplex?
Ford uses Dataplex to capture metadata, enhance data discovery, and maintain data quality across its platform.
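In the talk, Ford describes a tool that lets data teams update column descriptions in bulk from an existing data dictionary, with the change applied through Terraform. The following is only a sketch of the parsing half of that idea, assuming a CSV dictionary with `table`, `column`, and `description` headers. The file layout, function names, and sample rows are hypothetical, and the Terraform step is not shown.

```python
import csv
import io

# Hypothetical data dictionary; a real one would be read from a file.
SAMPLE_DICTIONARY = """table,column,description
parts,part_number,Unique identifier for a part
parts,supplier_id,
orders,order_date,Date the order was placed
"""

# Hypothetical schema: table name -> list of column names.
SAMPLE_SCHEMA = {
    "parts": ["part_number", "supplier_id"],
    "orders": ["order_date"],
}


def load_descriptions(dictionary_csv):
    """Parse the dictionary into {(table, column): description}."""
    updates = {}
    for row in csv.DictReader(io.StringIO(dictionary_csv)):
        description = (row.get("description") or "").strip()
        if not description:
            continue  # skip blanks so existing descriptions are not wiped
        key = (row["table"].strip(), row["column"].strip())
        updates[key] = description
    return updates


def missing_descriptions(schema, updates):
    """List (table, column) pairs that the dictionary does not describe."""
    return sorted(
        (table, column)
        for table, columns in schema.items()
        for column in columns
        if (table, column) not in updates
    )
```

Reporting the gaps with `missing_descriptions` is a design choice made for this sketch: it gives data teams a worklist of undescribed columns instead of failing silently.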
What innovations are being implemented in data discovery?
Innovations include automated cataloging, natural language querying, and enhanced metadata management.
What is the importance of metadata?
Metadata provides context and enables effective data search, discovery, and governance.
How does Orange approach data governance?
Orange adopts a data democracy approach, utilizing policy as code to manage access and maintain data quality across its operations.
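To make "policy as code" concrete, here is a minimal sketch under stated assumptions: access rules and a quality threshold are declared as plain Python data that can live in version control and be checked by standard CI/CD tooling. The dataset names, roles, thresholds, and function names are all hypothetical and do not describe Orange's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    dataset: str
    allowed_roles: frozenset
    max_null_ratio: float  # quality threshold checked in the pipeline


# Hypothetical policies, reviewed and versioned like any other code.
POLICIES = {
    "network_events": DataPolicy(
        "network_events", frozenset({"network_engineer", "data_scientist"}), 0.05
    ),
    "billing": DataPolicy("billing", frozenset({"finance_analyst"}), 0.01),
}


def can_read(role, dataset, policies=POLICIES):
    """Deny by default: unknown datasets and unlisted roles get no access."""
    policy = policies.get(dataset)
    return policy is not None and role in policy.allowed_roles


def passes_quality(dataset, values, policies=POLICIES):
    """Check a column sample against the dataset's null-ratio threshold."""
    policy = policies[dataset]
    if not values:
        return False  # an empty sample cannot demonstrate quality
    null_ratio = sum(value is None for value in values) / len(values)
    return null_ratio <= policy.max_null_ratio
```

Because the rules are ordinary data structures, a change to who can read `billing` becomes a reviewable diff, and unit tests on `can_read` can run on every commit.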
What role does AI play in data governance?
AI assists in automating data classification, anomaly detection, and enhancing the overall data discovery process.
What are governance rules in Dataplex?
Governance rules allow organizations to define and enforce data governance policies at scale based on existing metadata.
What are the next steps for data discovery at Ford?
Ford is piloting a data marketplace experience to enhance user access and streamline data requests.
- 00:00:00[Music]
- 00:00:11good morning everyone thank you so much
- 00:00:13for coming to this breakout session on
- 00:00:16data governance in the age of
- 00:00:20AI we're very excited to have with us
- 00:00:23two customer speakers today we have
- 00:00:26Cynthia gums who is the manager of
- 00:00:29global data insights and analytics at
- 00:00:31Ford Driving key initiatives around the
- 00:00:34new data Factory at Ford we also have
- 00:00:37Steve Jared who is the chief AI officer
- 00:00:41at Orange leading Ai and data strategy
- 00:00:45for orange across 26 countries and my
- 00:00:49name is louan I'm a product manager for
- 00:00:51Dataplex here at Google Cloud so we're
- 00:00:55very excited to be sharing with you how
- 00:00:58we think about data governance in this
- 00:01:00age of AI and what do our Journeys each
- 00:01:04look
- 00:01:06like so here's our agenda for today
- 00:01:09we're going to start with an
- 00:01:10introduction and a product overview
- 00:01:13followed by case studies from Ford and
- 00:01:16orange and then we'll talk about what's
- 00:01:19new what's upcoming in
- 00:01:24Dataplex so as all of us have experienced
- 00:01:27recently generative AI is really this
- 00:01:30paradigm shift that is
- 00:01:33revolutionizing how we operate as
- 00:01:36businesses whether it is generating
- 00:01:38creative content whether it is working
- 00:01:40with complex data whether it's improving
- 00:01:43your customer experiences or even
- 00:01:46training your own large language models
- 00:01:48for Enterprise use cases the impact of
- 00:01:51AI is really profound and all
- 00:01:57encompassing at the same time we know
- 00:02:00that data is the fuel that feeds into
- 00:02:03the engine of AI it is really the
- 00:02:06critical foundation for training and
- 00:02:09grounding your models and in return this
- 00:02:13rapid growth in AI Innovation is really
- 00:02:17creating an accelerating demand for data
- 00:02:21that is well governed high quality and
- 00:02:24easy to
- 00:02:27discover so given the strong need and
- 00:02:30Northstar Vision what are the challenges
- 00:02:33that companies are actually facing in
- 00:02:37reality well first and foremost we have
- 00:02:40the challenge of dark data which I'm
- 00:02:43sure most of you could resonate with in
- 00:02:46fact we know that
- 00:02:4966% of organizations have reported that
- 00:02:52at least half of their data is dark
- 00:02:56which means that it is data that is not
- 00:02:58even discovered or used in the first
- 00:03:03place and even if you're able to
- 00:03:05discover and use that data there's still
- 00:03:08a lot of questions about data quality
- 00:03:11whether this data is valid whether this
- 00:03:14data is
- 00:03:16trustworthy we learned from our survey
- 00:03:18that only
- 00:03:2044% of data leaders are fully confident
- 00:03:24in the quality of their organization's
- 00:03:26data and as we all know low quality
- 00:03:30data would only result in low quality
- 00:03:33output which you really cannot trust for
- 00:03:36any insight generation or
- 00:03:41decision-making now the reason why
- 00:03:44managing and governing your data is so
- 00:03:47difficult is the complexity of
- 00:03:50the Enterprise data
- 00:03:52landscape as you can see here on the
- 00:03:54diagram data is really coming in from
- 00:03:57various different sources they are stored
- 00:04:00and processed in different Services
- 00:04:04whether it's data warehouses data lakes
- 00:04:06or
- 00:04:07databases they reside in different
- 00:04:10formats and they're used by different
- 00:04:13personas across different
- 00:04:16workflows now please raise your hand if
- 00:04:19this complex situation ever seemed
- 00:04:21familiar to
- 00:04:23you yes I see a lot of hands
- 00:04:26raised definitely that's what we see all
- 00:04:29the time as well so this challenge is
- 00:04:33exactly what's keeping us busy here at
- 00:04:35Google cloud and as you may have learned
- 00:04:38from this conference so far we're really
- 00:04:40evolving BigQuery into a unified data
- 00:04:44and AI governance platform a unified
- 00:04:47data and AI platform and this platform
- 00:04:49is designed with data governance as a
- 00:04:52central built-in consideration that is
- 00:04:55contextual and pervasive across the
- 00:04:58different layers of the tech
- 00:05:01stack now at the heart of providing this
- 00:05:05unified data and AI governance is Dataplex
- 00:05:09which is our native offering for
- 00:05:11automating data governance and data
- 00:05:13management at
- 00:05:15scale there are several key value
- 00:05:17propositions of Dataplex we have seen truly
- 00:05:21resonating with our customers based on
- 00:05:25interactions first and foremost dataplex
- 00:05:27deeply integrates with various products
- 00:05:30and services to really provide this
- 00:05:33unified metadata across distributed data
- 00:05:36and based on this you're able to perform
- 00:05:38search across different projects across
- 00:05:42different regions and across different
- 00:05:44data
- 00:05:46silos and based on that you're also able
- 00:05:49to further enrich and organize your data
- 00:05:52as needed so that's number one number
- 00:05:56two on top of this wealth of metadata
- 00:05:58Dataplex offers centralized security and
- 00:06:02governance
- 00:06:04features this really allows you to
- 00:06:06easily manage your data governance
- 00:06:09policies based on understanding of the
- 00:06:12metadata
- 00:06:13context and last but not least Dataplex has
- 00:06:16a rich set of features around
- 00:06:19intelligent data management from
- 00:06:21tracking data lineage to assessing data
- 00:06:25profile and to automating data quality
- 00:06:28checks so really helping you build
- 00:06:30better trust in your data and helping
- 00:06:33you optimize data related
- 00:06:38ROI now since GA launch back in 2022
- 00:06:42Dataplex has been widely adopted by
- 00:06:46customers across different geographies
- 00:06:49and different industry verticals as of
- 00:06:52now over
- 00:06:5395% of the top data analytics customers
- 00:06:57at Google Cloud are already using
- 00:07:00Dataplex for managing and governing
- 00:07:02their data at
- 00:07:04scale So today we're very excited to be
- 00:07:07hearing from two of them Ford and orange
- 00:07:11so please join me in first welcoming
- 00:07:14Cynthia from Ford to talk about her
- 00:07:17journey with data
- 00:07:20[Applause]
- 00:07:28governance
- 00:07:33all right good day how's everybody doing
- 00:07:38today so my name is Cynthia gums and I'm
- 00:07:40responsible for data Discovery and
- 00:07:42classification at Ford Motor Company
- 00:07:45welcome to my TED
- 00:07:48talk now I'm just
- 00:07:53kidding and
- 00:07:57so while this may not be a TED talk I
- 00:08:01promise you that this is an important
- 00:08:02topic and I'm really honored to have the
- 00:08:05opportunity to speak with you today
- 00:08:07about it
- 00:08:09okay so before we start oh that's loud
- 00:08:14before we start I'm curious about who's
- 00:08:16in the crowd today so how many of you
- 00:08:20consider yourselves a data governance
- 00:08:22professional you can just wave your hand
- 00:08:25all right there's a lot of you out there
- 00:08:27and if you didn't raise your hand how
- 00:08:29many of you have an appreciation for
- 00:08:30what data governance
- 00:08:33is okay so most of you awesome so as
- 00:08:37IT and data professionals I'm sure
- 00:08:39you've had the
- 00:08:40opportunity to explain to non-data or
- 00:08:43non-tech people what you do for a living
- 00:08:46right so when you tell someone I work in
- 00:08:49IT information technology they look at
- 00:08:52you and they think oh you must be doing
- 00:08:54tech support and then they start calling
- 00:08:56you to help them with their Wi-Fi right
- 00:08:59then you tell them well I work in
- 00:09:00the data analytics department and then
- 00:09:03they think oh she just creates charts
- 00:09:05and things all day she doesn't really do
- 00:09:08a whole lot of anything because that's
- 00:09:09what data analytics is right then you
- 00:09:12tell them well I'm in data
- 00:09:14governance and now they think you have
- 00:09:17the magic key to give everybody access
- 00:09:19to data data governance is so much more
- 00:09:21than that and then for me when I say I
- 00:09:23work in data Discovery and
- 00:09:25classification they have no clue what
- 00:09:27I'm talking about so then I have to
- 00:09:29explain to them I am responsible for
- 00:09:33making sure people can find the data
- 00:09:35they need to solve business problems and
- 00:09:39then I might give an analogy about a
- 00:09:41shopping experience or being in a
- 00:09:44library and then they get it okay so
- 00:09:47data governance is a broad topic today
- 00:09:50I'm just going to scratch the surface of
- 00:09:51it but I'm really going to dig a little
- 00:09:53bit deeper into Data Discovery so our
- 00:09:56Ford data platform is powered by Google
- 00:09:59cloud and we believe that data is
- 00:10:02awesome valuable and worthy of respect
- 00:10:07our platform objective is to establish a
- 00:10:10single source of truth of data to enable
- 00:10:12data fusion and
- 00:10:15responsible data
- 00:10:17usage our platform helps data Engineers
- 00:10:21organize many many data sources across
- 00:10:24every aspect of our business we have a
- 00:10:27very complex environment
- 00:10:29and we have a number of key capabilities
- 00:10:32that allow us to meet our objective data
- 00:10:35governance is a foundational capability
- 00:10:37for our
- 00:10:40platform as a part of our migration to
- 00:10:43Google Cloud we have enabled Dataplex Data
- 00:10:46Catalog to capture Technical and
- 00:10:48business metadata about our projects and
- 00:10:51data sets each of our governance teams
- 00:10:55benefits from data capabilities but the
- 00:10:58main one is tag templates for the
- 00:11:00collection of business metadata we also
- 00:11:03benefit from dataplex apis to help
- 00:11:07expose that metadata in our custom user
- 00:11:09interfaces as well as our backend
- 00:11:13processes additionally our data quality
- 00:11:15team recently launched data lineage and
- 00:11:18data lineage allows us to understand
- 00:11:20where the data came from and its life
- 00:11:22cycle we also use lineage to understand
- 00:11:26how to troubleshoot the data how to
- 00:11:28identify dependencies and also for data
- 00:11:33Discovery so as I previously
- 00:11:36mentioned we use Tag templates as the
- 00:11:38foundation for our data catalog
- 00:11:41experience and this metadata is
- 00:11:43collected as a part of our end-to-end data
- 00:11:47process so when you onboard the data
- 00:11:50when you create your project all the way
- 00:11:52through access enablement we're
- 00:11:54collecting the data along the way we
- 00:11:56collect the metadata at every level of
- 00:11:59our project hierarchy starting with
- 00:12:01projects then data sets tables views and
- 00:12:04columns we have tag templates for each
- 00:12:06of those levels today we're collecting
- 00:12:08roughly 70 Business metadata tags that
- 00:12:12might not sound like a lot but it's
- 00:12:15important data that we need to help
- 00:12:16people discover the data and actually
- 00:12:18dataplex lets you capture thousands of
- 00:12:22tags most of this metadata is captured
- 00:12:25manually so you can imagine that it's
- 00:12:27kind of tedious time consuming and
- 00:12:29resource
- 00:12:31intensive but we're working with um
- 00:12:35we're experimenting with gen AI to see if
- 00:12:37we can do some Automation in the
- 00:12:39metadata enrichment space we have custom
- 00:12:42user interfaces to allow our users to
- 00:12:45input and extract the metadata from the
- 00:12:48data
- 00:12:49catalog now you might be wondering well
- 00:12:51why do you need a custom user interface
- 00:12:54dataplex has an interface via the
- 00:12:56console however there's many reasons
- 00:13:00why we decided to go this route but the
- 00:13:01main one is we need to control the way
- 00:13:05that metadata is input and the way it is
- 00:13:08exposed to our users right now today in
- 00:13:11dataplex which is perfect for a platform
- 00:13:13team it shows you everything staging
- 00:13:17tables temp tables that's great for the
- 00:13:19platform team but your end users don't
- 00:13:21want to see that and so we try to modify
- 00:13:23the experience so that we can curate it
- 00:13:25for
- 00:13:27excellence okay
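The curation idea just described (the platform team sees everything, end users do not see staging or temp tables) can be sketched as a simple filter. This is a hypothetical illustration: the `stg_` and `tmp_` prefixes, the persona names, and the `curated` function are assumptions for this example, not Ford's actual naming convention or code.

```python
# Assumed naming convention for tables that end users should not see.
HIDDEN_PREFIXES = ("stg_", "tmp_")

# Hypothetical catalog entries returned by a metadata search.
ENTRIES = [
    {"name": "stg_orders"},
    {"name": "tmp_load_01"},
    {"name": "orders"},
    {"name": "vehicle_parts"},
]


def curated(entries, persona):
    """Return the catalog entries appropriate for the given persona."""
    if persona == "platform":
        return list(entries)  # platform team sees staging and temp tables too
    return [
        entry
        for entry in entries
        if not entry["name"].startswith(HIDDEN_PREFIXES)
    ]
```

In practice the hide/show decision would more likely come from a metadata tag than from a name prefix. The prefix check is used here only to keep the sketch self-contained.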
- 00:13:30another reason why we created a custom
- 00:13:33user experience is because we have so
- 00:13:35many different personas that we're
- 00:13:38trying to satisfy so last year we
- 00:13:41launched our custom data Discovery
- 00:13:43experience this was an exciting time for
- 00:13:46us because we've been on quite a journey
- 00:13:47trying to solve the data Discovery
- 00:13:51challenge this experience makes use of
- 00:13:54the data Plex
- 00:13:55apis and we call it our data Discovery
- 00:13:58hub
- 00:14:00this interface currently has over 9,000
- 00:14:03users and
- 00:14:04growing our users are able to search the
- 00:14:08data catalog and when the results come
- 00:14:11back they're also able to see all the
- 00:14:13metadata related to that data asset they
- 00:14:16can also see the table and view schemas
- 00:14:19as well we've enabled an export
- 00:14:22capability that allows users to export
- 00:14:25their search results the table schemas
- 00:14:28and the metadata if they need to do any
- 00:14:30offline analysis now I need to be clear
- 00:14:33this is not an export of the data this
- 00:14:36is export of the metadata right the
- 00:14:38security folks are probably looking for
- 00:14:40me
- 00:14:41now and so our user interface also
- 00:14:45creates or provides linkages to other
- 00:14:48tools that we have we have our data
- 00:14:51Activation Portal which allows users to
- 00:14:54request access to the data that they've
- 00:14:56discovered then we also have our data
- 00:14:59quality dashboards so they can see the
- 00:15:01completeness the timeliness and the
- 00:15:03other measures for data
- 00:15:09quality all
- 00:15:13right all right so we are
- 00:15:16currently uh oh sorry guys all right so
- 00:15:20one of the key challenges for our data
- 00:15:24Discovery activity is that um the user
- 00:15:28experience is really important and our
- 00:15:30challenge was that we have all these
- 00:15:32different user personas we have data
- 00:15:35analysts data stewards data scientists
- 00:15:38software analysts data engineers and
- 00:15:42then you have your business users all
- 00:15:43these people have a need to discover
- 00:15:45data but their needs are a little bit
- 00:15:47different and so if I return to a
- 00:15:49business user this big long list of
- 00:15:53tables and columns they're not going to
- 00:15:54know what to do with that right and so
- 00:15:57to solve this problem we've engaged
- 00:15:59product designers to help us go
- 00:16:02through this full effort of
- 00:16:05understanding our personas what do they
- 00:16:07want what are they thinking about when
- 00:16:09they search for data how could we return
- 00:16:11the results such that they have an
- 00:16:13appreciation for what those um what the
- 00:16:16data is and what it can do for them
- 00:16:19we've done user interviews sessions
- 00:16:21where we sit down with them and watch
- 00:16:23them use the tool we've done surveys and
- 00:16:25all of that has resulted in excellent
- 00:16:27feedback that we're using to inform how
- 00:16:30we modify our product another challenge
- 00:16:33is ensuring that we have high quality
- 00:16:35relevant
- 00:16:37metadata if your columns don't have
- 00:16:39descriptions how can someone search and
- 00:16:42find that your column has the data that
- 00:16:43they need to solve their business
- 00:16:46problem however I already mentioned that
- 00:16:48populating metadata is time consuming
- 00:16:50and resource intensive and so how do you
- 00:16:53solve that problem well one thing that
- 00:16:55we've done is we created a tool that
- 00:16:58allows data teams to update their
- 00:17:02descriptions in bulk using their data
- 00:17:05dictionaries as input and so now they
- 00:17:07already have a data dictionary they can
- 00:17:10upload that it goes through terraform
- 00:17:12and it updates their descriptions behind
- 00:17:13the scenes so that was a great
- 00:17:16enhancement that we
- 00:17:18enabled so what's next for data
- 00:17:21Discovery at
- 00:17:22Ford we are currently piloting a data
- 00:17:25Marketplace experience which allows
- 00:17:28users to search find and access data
- 00:17:32well request access to data in the same
- 00:17:34user experience today it's separate
- 00:17:37experiences but we're merging them right
- 00:17:40and so this was a highly requested item
- 00:17:42from our users that we're trying to
- 00:17:45satisfy so far the pilot is going really
- 00:17:48well and we're getting excellent
- 00:17:49feedback that will inform our
- 00:17:53product and now we're also looking to
- 00:17:56launch some additional enhancements to
- 00:17:57the experience so previously it was
- 00:17:59just a simple search experience but now
- 00:18:01it's search that will have Best Bets
- 00:18:05enabled and that's what we're calling it
- 00:18:07Best Bets Best Bets means um it's a
- 00:18:11keyword focused activity where we have
- 00:18:14popular data sets keywords that describe
- 00:18:16them when users go in to do their search
- 00:18:18if they hit one of those keywords those
- 00:18:21items will bubble to the top of the
- 00:18:22search we're also allowing our users to
- 00:18:25add comments and ratings for the data so
- 00:18:29if you already have access to the data
- 00:18:30you can find it in the catalog and say
- 00:18:33this data set was awesome it helped me
- 00:18:34do X Y or
- 00:18:36Z try it out right and then if they
- 00:18:38thumbs it up that also influences where
- 00:18:41it shows up in the search results the
- 00:18:44next one is around data
- 00:18:47collections so it's kind of like a
- 00:18:49storefront for the data at Ford we have
- 00:18:51a number of different subject areas and
- 00:18:54they want to see their data together
- 00:18:56right they don't want to see it mixed up
- 00:18:58with every every body else's data and so
- 00:19:00we're looking to create these
- 00:19:02collections or storefronts so those
- 00:19:04different subject areas can say you want
- 00:19:07to use my data go to this place in the
- 00:19:09data catalog to find it
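The Best Bets and thumbs-up behaviour described a little earlier can be sketched as a ranking function. This is a minimal illustration under stated assumptions: the `Asset` fields, the sample assets, and the ordering (Best Bets first, then thumbs-up count, then text matches) are hypothetical choices for this example. The talk only says that Best Bets bubble to the top and that ratings influence position.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    description: str
    best_bet_keywords: set = field(default_factory=set)  # lowercase keywords
    thumbs_up: int = 0


# Hypothetical catalog entries.
ASSETS = [
    Asset("warranty_claims", "Warranty claims by vehicle", {"warranty"}, 3),
    Asset("warranty_raw_feed", "Raw warranty feed", set(), 10),
    Asset("dealer_sales", "Sales by dealer", {"sales"}, 5),
]


def search(query, assets):
    """Return matching asset names, Best Bets first, then by thumbs-up."""
    terms = set(query.lower().split())
    scored = []
    for asset in assets:
        text_hits = sum(
            term in asset.description.lower() or term in asset.name.lower()
            for term in terms
        )
        is_best_bet = bool(terms & asset.best_bet_keywords)
        if text_hits == 0 and not is_best_bet:
            continue  # no match at all
        scored.append((is_best_bet, asset.thumbs_up, text_hits, asset.name))
    # Best Bets first, then more thumbs-up, then more text hits, then name.
    scored.sort(key=lambda s: (not s[0], -s[1], -s[2], s[3]))
    return [s[3] for s in scored]
```

With these sample assets, a search for `warranty` places `warranty_claims` ahead of `warranty_raw_feed` even though the latter has more thumbs-up, because the Best Bet keyword takes priority in this sketch.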
- 00:19:12okay so we're also experimenting with
- 00:19:15gen AI I'm sure you've been hearing a lot
- 00:19:17about gen AI during the conference this
- 00:19:19week and so we're looking at using gen AI
- 00:19:22to assist with metadata enrichment
- 00:19:25automating it as well as helping to
- 00:19:28generate business descriptions for
- 00:19:30the data I already mentioned that it's
- 00:19:32time you know time consuming to do so
- 00:19:34manually but how cool would it be if you
- 00:19:37could just throw gen AI at a table and say
- 00:19:40what's in this table and it figures it
- 00:19:42out because it can tell what's in the
- 00:19:44table from the data and it will describe
- 00:19:45it and you move on now you might have to
- 00:19:48verify that the descriptions are decent
- 00:19:50but once you get that confidence I think
- 00:19:52it'll be a powerful data Discovery
- 00:19:56experience all right and then we're also
- 00:19:58looking at doing an
- 00:19:59LLM and we're training it on
- 00:20:02documentation about the data as well as
- 00:20:05the data catalog
- 00:20:06itself now you're taking data Discovery
- 00:20:09to another level right because you can
- 00:20:11do natural language queries ask it
- 00:20:14questions and it's going to combine that
- 00:20:16documentation with that metadata and
- 00:20:19give you a really good answer about what
- 00:20:20data is available to solve your business
- 00:20:23problems lastly we're really looking
- 00:20:26forward to implementing Dataplex's
- 00:20:29business terms and glossary this will
- 00:20:32allow us to give common understanding to
- 00:20:35those key terms that are the same across
- 00:20:37the company so if you have 50 different
- 00:20:41data sources and they all mention part
- 00:20:42number do we have to Define part number
- 00:20:4550 times in 50 different ways no but the
- 00:20:48business terms will allow us to have
- 00:20:50that be consistent across all the
- 00:20:53data so we've been on this
- 00:20:56journey for a little over two years
- 00:20:59and it's been challenging but
- 00:21:02rewarding at the same time especially
- 00:21:04with this data Marketplace launch and so
- 00:21:07I want to show appreciation to Google my
- 00:21:10team and also all the teams involved for
- 00:21:13their engagement and collaboration
- 00:21:15because it's been fun I was um in
- 00:21:18another session and the gentleman said
- 00:21:20data governance was
- 00:21:22boring really we're having so much fun
- 00:21:26trying to figure out how to solve this
- 00:21:28Dataplex problem or not Dataplex problem
- 00:21:30but data Discovery problem so I'll
- 00:21:33ask you all who here thinks that data
- 00:21:36Discovery is
- 00:21:39easy I don't see a single hand raised
- 00:21:42and thank you for validating us it's not
- 00:21:46easy my manager's in here too it's not
- 00:21:49easy all right so with that I'm going to
- 00:21:53hand it over to Steve from Orange and
- 00:21:55he's going to tell us how they've
- 00:21:58enabled data Discovery with
- 00:22:09GCP thanks Cynthia and we're on a very
- 00:22:12very similar Journey so orange we're one
- 00:22:16of the largest telecoms providers in the
- 00:22:18world we also sell a lot of IT services
- 00:22:21across 26 countries so we were
- 00:22:23originally France Telecom and then we
- 00:22:26acquired operations in many other
- 00:22:28countries ranging from Belgium and Spain
- 00:22:31and Poland as well as uh African
- 00:22:34countries ranging from Senegal and Ivory
- 00:22:37Coast to the Democratic Republic of
- 00:22:39Congo so we have an enormously diverse
- 00:22:42set of challenges with almost 300
- 00:22:45million customers across those countries
- 00:22:48and the company is really a very proud
- 00:22:51technological company a lot of the
- 00:22:53reasons why you have power saving modes
- 00:22:56in 5G today is because Orange cared
- 00:22:58about the impact uh of these
- 00:23:01Technologies on the environment um for
- 00:23:03for decades and we're also at the
- 00:23:05Forefront of a lot of AI work I'm I'm
- 00:23:08really lucky to have a dedicated AI
- 00:23:11research team that came from France
- 00:23:14Telecom labs and so in my central team
- 00:23:17we have not only all the data
- 00:23:18engineering data science and ml
- 00:23:20engineering but also as I mentioned the
- 00:23:22pure research team and we're really
- 00:23:24focused in three domains uh and we use
- 00:23:27superpower interchangeably with AI so
- 00:23:30we say that we're trying to superpower
- 00:23:32our employees daily lives we're trying
- 00:23:35to superpower all of our networks and
- 00:23:37we're trying to superpower our customer
- 00:23:39experiences and as Cynthia was
- 00:23:41describing there's many challenges in
- 00:23:44providing these kinds of services at
- 00:23:46scale so before we started to use Google
- 00:23:50Cloud we had data in organizational
- 00:23:55silos that were mapped to the physical
- 00:23:58infrastructure so each of these
- 00:23:59teams within the countries like the
- 00:24:01network team the finance team they had
- 00:24:03built and maintained their own data
- 00:24:05infrastructures that led to these silos
- 00:24:07that map to these cultural and
- 00:24:10operational silos that we faced
- 00:24:13and across these 26 countries the
- 00:24:17data infrastructure that we had was
- 00:24:18incredibly
- 00:24:19heterogeneous and mostly had been um
- 00:24:22self-integrated uh and managed and so
- 00:24:25the level of complexity of
- 00:24:26maintaining that infrastructure and the
- 00:24:29skills necessary for the teams to manage
- 00:24:31that infrastructure were extremely
- 00:24:33complex and a lot of the time
- 00:24:35that was taken by the data engineering
- 00:24:37teams was just keeping the
- 00:24:39lights on on our
- 00:24:41infrastructure and data governance was
- 00:24:43also managed uh very very manually uh
- 00:24:47with these very basic systems and we had
- 00:24:49many different security and Regulatory
- 00:24:52risk that we had to mitigate um through
- 00:24:54these systems and as Cynthia was saying
- 00:24:57a lot of our Executives didn't really
- 00:25:00see data governance as strategic they
- 00:25:04saw it as something that was just like a
- 00:25:06Regulatory Compliance requirement they
- 00:25:08didn't see it as
- 00:25:10enabling our ability to reach AI
- 00:25:13at scale and so that was really
- 00:25:15preventing us from taking advantage of
- 00:25:17these enormous volumes of data that we
- 00:25:18generate across the business to generate
- 00:25:21value from
- 00:25:22that so we established a few years ago
- 00:25:26now this vision of a data democracy
- 00:25:29where we make this data widely available
- 00:25:32within each country by breaking these
- 00:25:34silos that we have between the
- 00:25:37organizations by having very rich data
- 00:25:40Discovery and to do this at scale we use
- 00:25:43policy as code to not only enforce
- 00:25:45access control but also things um like
- 00:25:48all of the data processes that we have
- 00:25:50for maintaining quality through the
- 00:25:52pipeline and that allowed us to really
- 00:25:55use standard CI/CD techniques and tooling
- 00:25:58to dramatically improve the way that we
- 00:25:59manage our data and also using AI itself
- 00:26:03to identify anomalies in the pipelines
- 00:26:06has been very very useful and so now our
- 00:26:10CEOs also because of the tsunami of AI
- 00:26:13they're really seeing data uh itself as
- 00:26:15being really foundational and crucial to
- 00:26:17the business and the thing that we did
- 00:26:19to encourage that was we set two things
- 00:26:23one was we set a uniform way to
- 00:26:25measure value on use cases across all 26
- 00:26:28countries and that's widely available on
- 00:26:31a dashboard that every CEO and employee
- 00:26:35can see so they can see which use cases
- 00:26:37are generating a lot of value but also
- 00:26:40we have other operational KPIs that
- 00:26:43relate to our data migrations and data
- 00:26:45quality that are also public and so what
- 00:26:48that did was it created a
- 00:26:50competitive dynamic between our
- 00:26:54CEOs which was really effective and it
- 00:26:56also led individual people in
- 00:26:58the company to see which countries were
- 00:27:00being really successful at different
- 00:27:02parts of their journey towards AI at
- 00:27:04scale and encouraged them to learn from
- 00:27:06one another and so we took inspiration
- 00:27:09from data mesh to then build a set of
- 00:27:11data products but our approach to data
- 00:27:14products is that we have a centralized
- 00:27:16team my team that defines architecture
- 00:27:20with partners like Google and that
- 00:27:22allows us to have uniform infrastructure
- 00:27:25that's provided to each of these teams
- 00:27:26that's generating data
- 00:27:28and then they're responsible for
- 00:27:31maintaining the freshness and the
- 00:27:32documentation and the quality of that
- 00:27:34data and the level of
- 00:27:37automation that we're providing
- 00:27:38enables this really rich set of
- 00:27:41outputs and value that's getting
- 00:27:42generated by the use cases in these
- 00:27:44countries so this is what it looks like
- 00:27:47there's really three pillars the first
- 00:27:50is the data products that relate to data
- 00:27:52management and data quality and also
- 00:27:55role-based access with policy as code so
- 00:27:59one of the things that's really powerful
- 00:28:01with Dataplex and BigQuery in
- 00:28:03partnership is that you can have very
- 00:28:06clear role-based access to data but also
- 00:28:08on a column level which is really
- 00:28:10powerful because there's certain users
- 00:28:13that we have that we don't want to
- 00:28:14be able to see the really sensitive
- 00:28:16data but we want to enable them to use
- 00:28:18that data for data
- 00:28:21operations the second part is the
- 00:28:23self-service platform that we built
- 00:28:25based on GitLab and leverages this great
- 00:28:28interaction between BigQuery Dataplex and
- 00:28:30Vertex so for me it's this golden
- 00:28:34triangle of the ability to use the best
- 00:28:38we think the best data infrastructure in
- 00:28:39the world on BigQuery with Vertex where
- 00:28:42we get not only the best of the Google
- 00:28:45first-party models and
- 00:28:46tools but also through the Model Garden we
- 00:28:49get leading state-of-the-art open-weight
- 00:28:53models as well as state-of-the-art open
- 00:28:56source tooling for managing the model
- 00:28:58life cycle and I've never seen a pace of
- 00:29:03innovation in my entire career like what
- 00:29:05we see in open source tooling as well as
- 00:29:08open source and open-weight LLMs so
- 00:29:11being able to manage that as policy as
- 00:29:15code between all of the business
- 00:29:17decision makers whether they be
- 00:29:20the data governors our engineers and
- 00:29:23the business owners has allowed us to
- 00:29:24really operate this at a much higher
- 00:29:26scale than was possible before
- 00:29:29and then lastly we federate all of that
- 00:29:32with governance to harmonize not only
- 00:29:34access to the data and the documentation
- 00:29:36but also the catalogs and the forms of
- 00:29:39discovery so let me talk next about what
- 00:29:41we want to do
- 00:29:43next we've built this early
- 00:29:46implementation of data mesh on top of
- 00:29:49the Google platform and what we've seen
- 00:29:52is that the Dataplex tool is incredibly
- 00:29:54useful for us for things like data
- 00:29:57catalog
- 00:29:58AutoDQ and data loss
- 00:30:01prevention and the other thing that's
- 00:30:03been great about working with the Dataplex
- 00:30:05team in addition to the fact that
- 00:30:07they've been extremely reactive to
- 00:30:09understanding our challenges and
- 00:30:11providing us great interaction with the
- 00:30:14the roadmap and influencing the
- 00:30:16engineering team but also they've been
- 00:30:18really good about providing open APIs to
- 00:30:21our other partners to allow them to have
- 00:30:23rich synchronization between their
- 00:30:26tooling environment and Dataplex itself
- 00:30:28so for example in the case of Collibra we
- 00:30:31use Collibra today across the company to
- 00:30:34manage data that's not on GCP that's on
- 00:30:36a lot of existing
- 00:30:38infrastructure that hasn't been
- 00:30:40migrated yet so having the ability to
- 00:30:42have really rich interaction between our
- 00:30:45existing data governance infrastructure
- 00:30:47and what we're building on Dataplex has been
- 00:30:49extremely
- 00:30:51powerful so where we want to go is we
- 00:30:54have this vision of using AI itself
- 00:30:58to have a marketplace for both the
- 00:31:01data that's within BigQuery but also
- 00:31:03within Vertex and so the idea that we
- 00:31:05can use natural language as a way for
- 00:31:08anyone in the company within this
- 00:31:10marketplace to query what data is
- 00:31:12available to have very very quick
- 00:31:15business intelligence visualizations of
- 00:31:17that data and be able to answer really
- 00:31:20direct simple questions and have a
- 00:31:22dialogue with the data even before they
- 00:31:24engage a data scientist or a data
- 00:31:26engineer really unlocks an enormous
- 00:31:29amount of value in the company and
- 00:31:32we think that it's a fundamental shift
- 00:31:34in the technological interaction with
- 00:31:36computers where you can use natural
- 00:31:38language at the level of
- 00:31:41power that previously you needed to be a
- 00:31:43programmer to achieve and then
- 00:31:46lastly using AI to detect anomalies in
- 00:31:49our pipelines to help us fill
- 00:31:52where we have gaps in our data and
- 00:31:54otherwise to make sure that the data
- 00:31:56that we're generating is of high quality
- 00:31:58because it's clear that without
- 00:32:00extremely high quality data we're not
- 00:32:02going to have high quality outputs from
- 00:32:04our AI systems and that applies not
- 00:32:07only to systems where we're just doing
- 00:32:09inference on existing large language
- 00:32:11models but it's also very true when we
- 00:32:14try to fine-tune models for them
- 00:32:17to be much smaller and to operate
- 00:32:20much faster so again having extremely
- 00:32:23high quality data where we're managing the
- 00:32:25lineage of that data and that's really
- 00:32:28easily accessible to the teams that
- 00:32:29are working on the AI fine-tuning has
- 00:32:32been really transformative and so this
- 00:32:34data democracy for us is all about
- 00:32:37having this data easily accessible in
- 00:32:40extremely high quality that's well
- 00:32:42documented including by having
- 00:32:44generative AI fill gaps in
- 00:32:47documentation and identify missing
- 00:32:50elements and having that integrated
- 00:32:53extremely well into the workflow of our
- 00:32:55employees and we think that this data
- 00:32:58democracy will unlock an enormous
- 00:33:00amount of value across the company
- 00:33:03because the amount of data that we're
- 00:33:04generating today that's been very hard
- 00:33:06to manage in the past now with this more
- 00:33:09uniform infrastructure that's not only
- 00:33:11available for us on public Cloud but
- 00:33:14also we've been working very closely
- 00:33:16with Google over the last few years to
- 00:33:18have on premise data infrastructure
- 00:33:20which we announced today so we have a
- 00:33:22GDC Edge infrastructure that we can
- 00:33:25deploy in our own data centers in each
- 00:33:27country that also has data management
- 00:33:31and AI capability so it gives us this
- 00:33:33really rich hybrid environment
- 00:33:36between on-prem and public cloud because
- 00:33:39we have to respond not only to very
- 00:33:41varying regulatory requirements across
- 00:33:43our countries that change often
- 00:33:46unpredictably but also we have
- 00:33:48commercial constraints because the
- 00:33:50amount of data that's coming off our
- 00:33:51network is enormous so for example just
- 00:33:55the network telemetry data which
- 00:33:57is the data we use to operate the
- 00:33:59network it's over a petabyte a day so having
- 00:34:02something sophisticated on premise to
- 00:34:04allow us to filter that data before we
- 00:34:07send it to public cloud and to do that in a
- 00:34:09way that maintains quality and this
- 00:34:11policy as code mechanism is
- 00:34:13extremely
- 00:34:15transformative so let me bring Lou back
- 00:34:17up to talk about what's on the road
- 00:34:23map
- 00:34:26thanks all right thank you so much Steve
- 00:34:29and Cynthia for sharing your use cases
- 00:34:32and perspective those are really really
- 00:34:34wonderful insights and I think we can
- 00:34:37all resonate with just how critical it
- 00:34:40is to have this platform with
- 00:34:42self-service discovery and well-governed
- 00:34:45data it's not easy but that's what we're
- 00:34:48here for so next let's take a look at
- 00:34:51what are the new launches we're very
- 00:34:53excited to announce this
- 00:34:56time first and foremost most everything
- 00:34:58in datax starts from having this unified
- 00:35:01metadata across distributed data and
- 00:35:05that's exactly where automated
- 00:35:07cataloging comes in we have worked very
- 00:35:10closely with various GCP services and
- 00:35:13products in order to ingest that
- 00:35:16metadata to harvest metadata and index
- 00:35:19metadata for search and based on this
- 00:35:23you will be able to discover your assets
- 00:35:25across analytics data lakes databases AI
- 00:35:30and BI
- 00:35:32services you're also able to enrich and
- 00:35:35organize this data to track lineage to
- 00:35:38enforce governance policies and really
- 00:35:41have this solid foundation for data to
- 00:35:44AI
- 00:35:46governance already Dataplex supports
- 00:35:49a rich set of data sources such as
- 00:35:52BigQuery and Pub/Sub and today we're super
- 00:35:55excited to announce a host of new
- 00:35:57integrations as you can see
- 00:36:00here first are the vertex related
- 00:36:02launches we're very excited to be
- 00:36:04announcing the ga of automated
- 00:36:07cataloging for vertex models and data
- 00:36:10sets and also the preview of automated
- 00:36:13cataloging for vertex AI features with
- 00:36:16those Integrations in place as soon as
- 00:36:18you create a new artifact in vertex AI
- 00:36:21they will be made searchable in datax in
- 00:36:24near real time and this is really
- 00:36:27critical because we truly believe that
- 00:36:30data and AI should be managed and
- 00:36:33governed in a consistent and coherent
- 00:36:36way next are operational databases
- 00:36:39including the GA of Bigtable
- 00:36:42integration Spanner integration as well
- 00:36:45as the preview of automated metadata
- 00:36:48cataloging from Cloud SQL and it's super
- 00:36:51important to have this coverage for
- 00:36:53operational databases as well to really
- 00:36:56provide end-to-end
- 00:36:59visibility next we're also actively
- 00:37:01working on Looker integration and it's a
- 00:37:04launch that's coming soon so please stay
- 00:37:06tuned and with all of those launches in
- 00:37:09place our goal is to really provide a
- 00:37:12powerful metadata foundation to you to
- 00:37:15enable automated metadata discovery
- 00:37:18management and
- 00:37:21governance next is lineage so datax
- 00:37:25already provides the ability for you to
- 00:37:28automatically track and visualize
- 00:37:30lineage as your data artifact flow
- 00:37:33through your distributed data
- 00:37:35landscape now this capability also work
- 00:37:38nicely with other datax features such as
- 00:37:41data quality checks where as soon as a
- 00:37:44data quality issue is discovered you
- 00:37:47will be able to trace upstream and
- 00:37:49downstream in order to understand what
- 00:37:51is the root cause and impact of a
- 00:37:54particular data quality
- 00:37:56breach now with lineage parsing there's
- 00:37:59already native integration with services
- 00:38:02like BigQuery Dataproc and Composer and
- 00:38:06we also have Dataplex API and OpenLineage
- 00:38:09integration to really provide that
- 00:38:12extensibility and today we're really
- 00:38:14excited to announce the lineage support
- 00:38:17for Vertex AI Pipelines really allowing
- 00:38:20for this end-to-end traceability from data
- 00:38:23processing to data analytics to machine
- 00:38:26learning training and deployment and
- 00:38:29providing you with this end-to-end picture
- 00:38:32that is critical for data to AI
- 00:38:34governance and
- 00:38:37compliance now at the same time in
- 00:38:39addition to extending the type of data
- 00:38:41sources being covered we're also
- 00:38:43enhancing the granularity of lineage
- 00:38:45tracking we're very excited to introduce
- 00:38:48the preview for column-level lineage in
- 00:38:53BigQuery oh hey thank
- 00:38:56you
- 00:38:59so ever since introducing table lineage
- 00:39:01in BigQuery as well as other services we
- 00:39:04have seen strong customer enthusiasm and
- 00:39:07adoption thanks to all of you and we're
- 00:39:09also getting very strong demand for the
- 00:39:12next level granularity which is what
- 00:39:15we're very excited to bring to you today
- 00:39:17so now you're able to perform root cause
- 00:39:19analysis and impact analysis at the
- 00:39:22column level in addition to at the table
- 00:39:25level and imagine when you have a column
- 00:39:28that's identified to contain personally
- 00:39:30identifiable information this is where
- 00:39:32column-level lineage really shines right
- 00:39:35where you're able to then control its
- 00:39:37propagation and then be able to comply
- 00:39:39with different
- 00:39:42regulations now there's also more ease
- 00:39:45of use features that we're launching
- 00:39:48together with this so for example
- 00:39:50there's the ability to help you pull up
- 00:39:52all the upstreams and all the
- 00:39:54downstreams of a particular node in the
- 00:39:56lineage graph
- 00:39:57there's also the ability to filter by
- 00:40:00different transformation types to make
- 00:40:02the lineage graph more consumable and
- 00:40:04there's also the ability to export
- 00:40:06lineage for offline analysis so all of
- 00:40:09this is to enhance our user experience
- 00:40:13and to make it easier to work with
- 00:40:15Dataplex
- 00:40:17lineage next are two gen AI powered Gemini
- 00:40:21launches from Dataplex so first of all
- 00:40:24we know that searching over metadata is
- 00:40:27a really critical experience with Dataplex
- 00:40:30and it's really at the core of what we
- 00:40:32do here at Dataplex now in addition to
- 00:40:35doing keyword search with Dataplex
- 00:40:37you're able to just ask us a question in
- 00:40:40natural language and Dataplex will be able
- 00:40:43to interpret your intent and be able to
- 00:40:46retrieve the most relevant search
- 00:40:49results this can really go a long
- 00:40:51way to lower this entry barrier as we
- 00:40:53have discussed earlier and to really
- 00:40:55democratize the experience of data
- 00:40:58discovery to your entire
- 00:41:01organization now once the data is
- 00:41:03discovered there's another really
- 00:41:05exciting gen AI powered feature from
- 00:41:07Dataplex to help which is data
- 00:41:10insights now a lot of us working with
- 00:41:13data must have experienced the cold start
- 00:41:16problem which is once you find a
- 00:41:19valuable data asset you're sometimes not
- 00:41:21sure what the best SQL queries are to
- 00:41:24write in order to really extract that
- 00:41:26meaningful insight from the data so
- 00:41:29that's exactly where data insights is here
- 00:41:32to help it will automatically generate
- 00:41:35and suggest SQL queries as well as a
- 00:41:38list of questions you can ask of a table
- 00:41:41in natural language and it will provide
- 00:41:44validated SQL queries to you as well that
- 00:41:47are ready to run in BigQuery
- 00:41:50Studio so this could really help give
- 00:41:53you a jump start into your analysis
- 00:41:55journey and to really help
- 00:41:57accelerate time to Insight for all of
- 00:42:00us next is data governance our favorite
- 00:42:04topic so as we know metadata is the core
- 00:42:07of everything we do here at Dataplex
- 00:42:10right so we're constantly thinking in
- 00:42:13addition to help you better discover and
- 00:42:15better understand this data can we also
- 00:42:18make metadata more actionable to help
- 00:42:20you drive active actions in terms of
- 00:42:24data governance
- 00:42:25operations so this is exactly the
- 00:42:27motivation for governance rules where we
- 00:42:30start from the metadata you already have
- 00:42:33in Dataplex whether it's technical metadata
- 00:42:35or business metadata and then you will
- 00:42:39be able to define and enforce governance
- 00:42:42policies at scale with the help of
- 00:42:44Dataplex
- 00:42:45so here's how it works first of all you
- 00:42:48start by writing a search query in
- 00:42:50Dataplex to identify all the entries and
- 00:42:53fields that are relevant for a
- 00:42:56particular governance policy to be
- 00:42:58applied and then you can define your
- 00:43:00policy in the form of governance rules
- 00:43:03with the help of Dataplex and then
- 00:43:05Dataplex will help you apply and enforce
- 00:43:08this policy across your distributed data
- 00:43:11landscape with proper monitoring
- 00:43:13included so in summary what we're
- 00:43:16providing here is a single pane of glass
- 00:43:18for you to indicate and enforce your
- 00:43:21governance intent at scale across
- 00:43:23different types of data no matter where
- 00:43:25they're stored
- 00:43:27now as you can imagine the possibility
- 00:43:30of governance rules is really endless
- 00:43:33right the rules could be about access
- 00:43:34control could be about data life cycle
- 00:43:36management could also be about running
- 00:43:38data quality checks and many
- 00:43:40more so today to start this journey
- 00:43:43we're very excited to announce the
- 00:43:45initial launch of governance rules
- 00:43:47starting from fine-grained access control
- 00:43:49across BigQuery and GCS so that instead
- 00:43:53of having to configure governance
- 00:43:55policies one table at a time or one
- 00:43:57column at a time you can now leverage
- 00:44:00Dataplex to apply them automatically
- 00:44:02for you at scale and this would work
- 00:44:05across BigQuery and Google Cloud Storage
- 00:44:09assets as
- 00:44:10described so the goal here is to really
- 00:44:12help you streamline the governance
- 00:44:14operation and to really minimize any
- 00:44:16potential risk to your security
- 00:44:20posture last but not least we're very
- 00:44:22excited to announce the latest key
- 00:44:24launches driven by the partnership
- 00:44:27between Dataplex and
- 00:44:30Collibra specifically this is the preview
- 00:44:33of metadata sync from Dataplex to
- 00:44:35Collibra including technical metadata
- 00:44:38business metadata as well as table level
- 00:44:41lineage from
- 00:44:42BigQuery so for this joint effort our goal
- 00:44:45here is to really provide this unified
- 00:44:49data discovery experience spanning
- 00:44:51multicloud and hybrid cloud
- 00:44:54environments and this is only the
- 00:44:56beginning there are more exciting and
- 00:44:59deeper integrations that are being
- 00:45:01planned and being worked on and
- 00:45:04ultimately our goal is to provide the
- 00:45:07flexibility of options and to be able to
- 00:45:09help you combine the best of both
- 00:45:12worlds for our
- 00:45:14customers so with that thank you so much
- 00:45:17for joining our session today thank you
- 00:45:19so much Cynthia and Steve for the
- 00:45:21wonderful insights you have shared thank
- 00:45:23you everybody for
- 00:45:25coming
- Data Governance
- AI
- Data Discovery
- Data Quality
- Google Cloud
- Dataplex
- Ford
- Orange
- Metadata
- Data Management
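
The governance rules flow described around 00:42:45 (a search query selects the relevant entries and fields, a rule states the policy, and the policy is then applied to every match) can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: it does not use the Dataplex API, and every name in it (`Column`, `Entry`, `GovernanceRule`, `search`, `apply_rule`, `can_read`, the `pii` tag and the role names) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    tags: set = field(default_factory=set)  # business metadata, e.g. {"pii"}


@dataclass
class Entry:
    system: str    # hypothetical source label, e.g. "bigquery" or "gcs"
    name: str
    columns: list


@dataclass
class GovernanceRule:
    name: str
    tag: str             # the "search query": select columns carrying this tag
    allowed_roles: set   # the policy: only these roles may read matches


def search(catalog, tag):
    """Step 1: find every (entry, column) pair whose metadata matches."""
    return [(e, c) for e in catalog for c in e.columns if tag in c.tags]


def apply_rule(catalog, rule):
    """Steps 2 and 3: expand one rule into a per-column access policy."""
    return {
        f"{e.system}:{e.name}.{c.name}": set(rule.allowed_roles)
        for e, c in search(catalog, rule.tag)
    }


def can_read(policies, role, column_id):
    # In this toy model, columns without a policy stay readable.
    allowed = policies.get(column_id)
    return allowed is None or role in allowed


# Toy catalog spanning two storage systems (all values are made up).
catalog = [
    Entry("bigquery", "sales.customers",
          [Column("email", {"pii"}), Column("country")]),
    Entry("gcs", "exports/customers.parquet",
          [Column("email", {"pii"})]),
]
rule = GovernanceRule("restrict-pii", tag="pii", allowed_roles={"data_governor"})
policies = apply_rule(catalog, rule)

for column_id in sorted(policies):
    print(column_id, sorted(policies[column_id]))
```

The point of the sketch is that the rule is written once against metadata, and the per-table or per-column configuration is derived from it rather than maintained by hand.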