IPL Data Exploration SQL Portfolio Project - Part 1 | Analytics | Ashutosh Kumar
Summary
TLDR: The video analyzes an IPL (Indian Premier League) data set covering matches from 2008 to 2020 using SQL. The presenter works through data analysis problems such as counting matches per season, identifying top-performing players, and computing batting statistics. An import problem caused by the size of the ball-by-ball file was resolved by splitting it into smaller sheets and recombining them in SQL with UNION; the analysis itself relies on GROUP BY, COUNT, and ranking functions. The presenter also answers viewer questions about improving SQL skills, practicing data analysis, and preparing for job interviews in data roles. Viewers are encouraged to practice SQL on platforms like HackerRank and to work through practical exercises to solidify their skills.
Takeaways
- 👉 Understanding the structure and purpose of IPL data sets from 2008-2020.
- 🎯 Techniques to solve SQL data analysis problems.
- 💡 Utilizing SQL functions: SELECT, COUNT, GROUP BY, UNION.
- 🔥 Analyzing player performance and match statistics.
- ⏸️ Handling large datasets by breaking and reassembling them.
- 🧩 Addressing data import challenges and solutions.
- 📊 Importance of SQL skills for data analytics roles.
- 🚀 Practical SQL exercises on platforms like HackerRank.
- 🛠️ Solutions to common SQL interview questions.
- 🔍 Data cleaning insights using real-world datasets.
Timeline
- 00:00:00 - 00:05:00
The speaker introduces the topic of IPL datasets available for download, where they have shared links in the community tab for access. The main focus will be on solving various data points using these datasets, specifically two main datasets containing IPL matches data from 2008 to 2020. The speaker explains the dataset's structure, including details such as match ID, city, date, player of the match, venue, and match results.
- 00:05:00 - 00:10:00
The speaker discusses two datasets: one with general match information and a second with ball-by-ball data, which contains about two lakh (200,000) rows. They highlight the difficulty of importing a dataset this large into SQL Server Management Studio (SSMS), where only part of the rows arrived, and describe how they split the data into smaller sheets to import it successfully. They encourage the audience to try importing the file directly first and offer the splitting approach as a fallback if issues arise.
- 00:10:00 - 00:15:00
The speaker confirms their audio is working and shares links to the dataset for the audience to download, emphasizing they are still available in the community tab. They address a question about SQL competencies needed for interviews, suggesting that the content in their beginner to advanced SQL playlists is sufficient for analytics roles.
- 00:15:00 - 00:20:00
The speaker outlines how they divided the ball-by-ball dataset into smaller datasets for easier import into SQL, noting issues with row limits. They explain using a union operation to combine the smaller datasets back into one table within the SQL environment, successfully preparing the data for analysis.
- 00:20:00 - 00:25:00
The speaker prepares to start querying the datasets to solve specific problems, like counting matches per season, by utilizing SQL functions such as SELECT and GROUP BY. They reflect on the necessity of understanding dataset details before conducting any analysis and adjusting methodologies as required.
- 00:25:00 - 00:30:00
The speaker begins solving a data exploration task: counting the number of IPL matches played per season. They describe extracting the year from match dates to achieve this. This process involves SQL operations that filter and group data, then use SELECT commands to display results in a structured format.
- 00:30:00 - 00:35:00
The speaker solves for the player with the most 'Player of the Match' awards using SQL commands to count occurrences in the dataset, filtering and sorting results to highlight the player with the most awards. They demonstrate using SELECT and ORDER BY in SQL queries to find these statistics efficiently.
- 00:35:00 - 00:40:00
The speaker attempts to find the player with the most 'Player of the Match' awards per season using SQL. They run into issues with query syntax, particularly with ORDER BY in subqueries, but resolve these to achieve sorted data demonstrating the use of ranking functions in getting desired outputs.
- 00:40:00 - 00:45:00
The speaker continues solving dataset queries, moving to finding the team with the most wins in IPL history, using simple group-by logic over the 'winner' column to list teams and their win counts. They refine sorting results by descending order to show top team performers.
- 00:45:00 - 00:50:00
Attention is given to finding the locations where the most matches were played, illustrating the query process by grouping data based on venues and counting occurrences to identify top venues. They navigate SQL limitations on sorting and query structure to obtain a ranked list of venues.
- 00:50:00 - 00:55:00
The speaker extracts batsman performance data from ball-by-ball details, counting runs and sixes to identify players with top scoring. They instruct on joining tables and aggregating necessary performance data, applying SQL functions to extract, sort, and present summarized cricket statistics.
- 00:55:00 - 01:00:00
The task involves calculating the percentage of total runs scored by each batsman. The speaker describes using SQL to combine each batsman's run total with the overall total and compare individual performance against the cumulative figure, emphasizing SQL's aggregation capabilities for statistical summaries.
- 01:00:00 - 01:05:00
The query process extends to counting sixes scored by players, using a similar logic to previous scoring counts, filtering based on run values associated with each play action. The speaker builds complex counting and sorting operations to identify top six-hitters throughout recorded matches.
- 01:05:00 - 01:10:00
The speaker demonstrates calculating the highest strike rate among batsmen with over 3000 runs and ranking the results. They emphasize filtering with sub-query conditions and using calculated metrics for performance evaluation.
- 01:10:00 - 01:24:16
In closing, the speaker addresses questions on dataset sources and the preparation for competition in data roles, while giving advice on improving SQL skills through practice and problem-solving on platforms like HackerRank and LeetCode. They discuss future live streams focusing on remaining queries and encourage sharing of alternative solutions.
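The batsman-level queries described in the timeline (total runs, sixes, share of all runs, and strike rate above a run threshold) can be sketched roughly as follows. This is a minimal illustration rather than the presenter's actual query: it uses Python's built-in sqlite3 module instead of SQL Server, the table name `ipls` and the columns `batsman` and `batsman_runs` are assumed from the transcript, the rows are made-up toy data, and every row is counted as a ball faced (a simplification that ignores extras such as wides).

```python
import sqlite3

# In-memory database with a tiny, made-up ball-by-ball table.
# Table and column names are assumptions based on the transcript.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipls (id INTEGER, batsman TEXT, batsman_runs INTEGER)")
conn.executemany(
    "INSERT INTO ipls VALUES (?, ?, ?)",
    [
        (1, "A", 6), (1, "A", 4), (1, "A", 0), (1, "A", 6),
        (1, "B", 1), (1, "B", 1), (1, "B", 6), (1, "B", 0),
        (2, "C", 4),
    ],
)

MIN_RUNS = 5  # stand-in for the 3000-run threshold used in the video

query = """
SELECT batsman,
       SUM(batsman_runs) AS runs,
       SUM(CASE WHEN batsman_runs = 6 THEN 1 ELSE 0 END) AS sixes,
       -- strike rate: runs per 100 balls, counting every row as one ball
       ROUND(SUM(batsman_runs) * 100.0 / COUNT(*), 2) AS strike_rate,
       -- share of all runs in the table, via a scalar sub-query
       ROUND(SUM(batsman_runs) * 100.0 /
             (SELECT SUM(batsman_runs) FROM ipls), 2) AS pct_of_all_runs
FROM ipls
GROUP BY batsman
HAVING SUM(batsman_runs) > ?
ORDER BY strike_rate DESC
"""
rows = conn.execute(query, (MIN_RUNS,)).fetchall()

for row in rows:
    print(row)
```

The HAVING clause filters on the aggregated run total after grouping, which is why the threshold cannot go in a WHERE clause.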
Frequently asked questions
What is the main content of the IPL data set discussed in the video?
Indian Premier League (IPL) data analysis, focusing on matches and player statistics from 2008 to 2020.
What skills does the video teach?
Using SQL queries to solve various data analysis problems related to an IPL data set.
How does the video approach solving data analysis problems?
By writing SQL queries with SELECT, GROUP BY, and COUNT to analyze player performance and match statistics in the data.
How does the presenter suggest improving SQL skills?
Participants should practice SQL queries on platforms like HackerRank or LeetCode and understand the concepts from the video tutorials.
What other exercises guide viewers in building their SQL proficiency?
Working through common SQL interview problems and analytical exercises such as those discussed in the video.
How can the learned skills be utilized in real-life projects?
As practice projects for SQL and data analysis, participants can create dashboards using the insights obtained from data sets.
Where can one practice data cleaning skills?
Data cleaning exercises can be practiced using Kaggle or by addressing inconsistencies like NA values in a dataset.
What challenges might viewers face while learning SQL?
Likely challenges include understanding advanced functions and fixing syntax errors in SQL scripts; regular practice on coding platforms helps with both.
What time frame and data does the video explore?
2008 to 2020 IPL matches, focused on details such as player stats, venues, and match outcomes.
What kind of data sets are utilized in the video?
The data sets are substantial, consisting of match information and ball-by-ball details for analysis exercises.
- 00:00:01so friends today we'll be discussing on
- 00:00:03to the ipl data set
- 00:00:06so
- 00:00:06i've got few data sets
- 00:00:09onto the ipl
- 00:00:11uh from google
- 00:00:13i've attached few links
- 00:00:16into the
- 00:00:17community tab also
- 00:00:20i have given the link so you can
- 00:00:22download the data sets from the link
- 00:00:25which i have mentioned into the
- 00:00:27community tab and
- 00:00:29here we'll be solving
- 00:00:31multiple data points
- 00:00:34which i have gathered for this purpose
- 00:00:38so you'll be seeing
- 00:00:40multiple sheets multiple links which are
- 00:00:42provided into the
- 00:00:44community tab post for this particular
- 00:00:47live stream
- 00:00:48i will simply explain what all these
- 00:00:51data sets are all about
- 00:00:54and then we'll be proceeding ahead from
- 00:00:56there so mainly i have two
- 00:00:59data sets as you can see here
- 00:01:02uh
- 00:01:03so from the name you can just pick up
- 00:01:06so these two are main data sets ipl
- 00:01:09matches
- 00:01:10uh 2008 to 2020. so basically both of
- 00:01:14these sheets or all of these sheets they
- 00:01:17contain the data from 2008 to 2020
- 00:01:22so
- 00:01:22only
- 00:01:23these particular date ranges data are
- 00:01:26present into all of these sheets so into
- 00:01:29this sheet you can see here i'll just
- 00:01:31quickly explain what is this data all
- 00:01:33about
- 00:01:34so into the column a you can see i have
- 00:01:37the id so this can be considered as the
- 00:01:40id of the different matches which are
- 00:01:42which would have been played
- 00:01:44into the column b i have the
- 00:01:47city so into which city this particular
- 00:01:49match was being played into the column c
- 00:01:52i have the date date on which this match
- 00:01:55was being played the player of match of
- 00:01:58this particular match
- 00:02:00the venue in which the match was
- 00:02:02conducted uh whether this venue was a
- 00:02:05neutral venue or not
- 00:02:06the team one uh similarly the team two
- 00:02:10who won the toss
- 00:02:13and what was the decision which was
- 00:02:15taken by the team who had won the toss
- 00:02:19similarly the winner who was the winner
- 00:02:21of this particular match
- 00:02:23so into a row you can see
- 00:02:25uh the data is present for just a match
- 00:02:29also we have the result like what was
- 00:02:31the result of this particular match so
- 00:02:34as you can see the results it was based
- 00:02:36on to the wickets or it was based on to
- 00:02:38the run so and also the result margin so
- 00:02:41you can see 140 so this particular match
- 00:02:44was won by kolkata knight riders by 140
- 00:02:47runs so this particular data is present
- 00:02:49or whether this was an eliminator or not
- 00:02:51so eliminator it simply means that
- 00:02:54either it would have been a semi-final
- 00:02:56or a final match
- 00:02:58uh in method you can see na is present
- 00:03:00but will be coming to this later what is
- 00:03:03present into this method into the column
- 00:03:06p you can see the umpire one and the
- 00:03:07umpire two information is present here
- 00:03:10so i hope the data set is all clear so
- 00:03:13you can
- 00:03:14just mention it
- 00:03:16whether the data set is clear or not
- 00:03:20and then we'll be moving ahead from here
- 00:03:26so
- 00:03:27so this was the very first data set
- 00:03:29coming to the second data set you can
- 00:03:32see
- 00:03:33so this data set is ipl ball by ball
- 00:03:362008 to 2020 now this particular data
- 00:03:40set is massive
- 00:03:41massive in the sense like
- 00:03:43this particular sheet it contains a lot
- 00:03:45of data so you can just see the count
- 00:03:47so the count is close to two lakhs so
- 00:03:50two lakh rows is contained into this
- 00:03:52particular sheet similarly you can see
- 00:03:54into the column i have the id of the
- 00:03:55match into the column b i have the
- 00:03:57inning so uh there are two innings so
- 00:04:00whether this innings whether this
- 00:04:02information is from the innings one or
- 00:04:04whether this information is from the
- 00:04:06innings two
- 00:04:07and also the over is mentioned here so
- 00:04:10which ball like
- 00:04:12so this is into the information so the
- 00:04:15information which is present into the
- 00:04:16column d you can see this is the ball so
- 00:04:19this is the fifth ball of the sixth over
- 00:04:21so this information is present here
- 00:04:23similarly the batsman name
- 00:04:26uh the non striker the bowler name
- 00:04:30like for this particular ball who was
- 00:04:33the
- 00:04:34non-striker who was the batsman who was
- 00:04:36the bowler so what was the batsman runs
- 00:04:39what was the extra runs which were
- 00:04:41considered
- 00:04:42what was the total runs
- 00:04:44uh non-boundary runs is wicket so was
- 00:04:47this uh did they get a wicket or not
- 00:04:51into this particular ball
- 00:04:53so we have all such information which is
- 00:04:55present here similarly we have the
- 00:04:56batting team bowling team and all these
- 00:04:59other informations into this particular
- 00:05:01data set now tell me ashutosh why do we
- 00:05:04have so many different data sets which
- 00:05:06we need to download
- 00:05:08so the thing was as you can see like
- 00:05:11there is a lot of
- 00:05:12count of rows which is present here
- 00:05:15so i'll just see
- 00:05:18so if anybody of you can please let me
- 00:05:20know if i'm audible or not
- 00:05:31okay i'll just check this
- 00:05:35i'll copy the video link
- 00:05:41i'll paste this here
- 00:05:49i'll paste this here
- 00:05:51yes i am audible
- 00:05:54great
- 00:05:58paste this here
- 00:06:00yes i am audible
- 00:06:01[Music]
- 00:06:04great
- 00:06:07oh i think i'm not able to see the chat
- 00:06:10here so
- 00:06:12no worries so can you tell me how much
- 00:06:15sql is enough for interviews so
- 00:06:18the number of content which i have
- 00:06:20covered till now into my uh zero to one
- 00:06:24advanced sql playlist and also on to
- 00:06:26the advanced sql playlist so both of
- 00:06:29these playlists they are kind of enough
- 00:06:32for cracking the interviews for any
- 00:06:35analytics purpose so you could check out
- 00:06:38both of these
- 00:06:39playlists and all of these questions
- 00:06:42which i take into account into all of
- 00:06:44different playlists they are
- 00:06:47kind of you can say
- 00:06:49pretty much whatever questions they have
- 00:06:50been asked into the interviews
- 00:06:53okay link to the data set you can find
- 00:06:56into the community tab so you can go to
- 00:06:59community tab of my channel and you can
- 00:07:01download it from there
- 00:07:03i have pasted all the links of the data
- 00:07:06sets
- 00:07:07which i am using okay let me just
- 00:07:10provide once more so i'll just open my
- 00:07:12github
- 00:07:16so
- 00:07:17i would give this link here
- 00:07:21you can just
- 00:07:22i would tell you what all want to
- 00:07:24download and
- 00:07:26you can download it from here
- 00:07:29please open this particular link and the
- 00:07:31name of the files
- 00:07:34or else wait i'll just give it
- 00:07:42one by one so there are
- 00:07:44six files which we need to download to
- 00:07:46understand the problem statement
- 00:07:48basically
- 00:07:49so i'm just providing it here
- 00:07:54i'll just provide into the
- 00:07:57chat
- 00:08:14please download the datasets from here
- 00:08:18let me know if you are facing any issues
- 00:08:20while you're downloading these datasets
- 00:08:34and i'll be providing the links of all
- 00:08:36of these data sets into the description
- 00:08:39box of this video also so
- 00:08:41no need to worry
- 00:08:43about that
- 00:08:48yeah i think it's
- 00:08:51fine
- 00:09:00okay so
- 00:09:02i would go back to the video
- 00:09:04do you have any questions
- 00:09:10i hope the data
- 00:09:12set is quite clear now
- 00:09:14yeah hello turning point
- 00:09:20so i've just explained the data sets
- 00:09:22i would uh now explain like why i
- 00:09:24provided so many data sets so
- 00:09:26why data analysts are not getting job
- 00:09:30uh can you please elaborate on this
- 00:09:35i don't think so they are not getting
- 00:09:37job
- 00:09:40i think practice is the key so we need
- 00:09:42to practice a lot on to hackerrank
- 00:09:45leetcode
- 00:09:49we need to practice the different
- 00:09:50questions
- 00:09:51i think we can get the job
- 00:09:59okay i'll just go back to the sheet and
- 00:10:02i'll come back to the questions after
- 00:10:04this so
- 00:10:05uh i hope the data set is quite clear
- 00:10:07now so the data set is quite simple
- 00:10:11if you would see now why i had
- 00:10:14to break or why do i need to have so
- 00:10:17many sheets so the thing was as you
- 00:10:19could see like there are close to two
- 00:10:21lakh rows which are present into this
- 00:10:24particular data set
- 00:10:25now when i was trying to import this
- 00:10:27particular data set uh into my ssms so
- 00:10:31it was throwing an error so basically it
- 00:10:32was not throwing an error so the number
- 00:10:35of rows which you can see here is 2 lakh
- 00:10:36so
- 00:10:37when i was doing select star or select
- 00:10:40count star from this particular data set
- 00:10:42i was not getting the exact count like
- 00:10:45you can see here one nine three four six
- 00:10:47nine is the total number of data entries
- 00:10:49which are present into this particular
- 00:10:50sheet i was not getting this particular
- 00:10:52count rather the entry was quite less
- 00:10:55i think it was close to around six
- 00:10:57thousand so i thought of or i think it
- 00:11:00was close to sixty thousand so i thought
- 00:11:01of breaking this particular sheet
- 00:11:05into sixty thousand data points or sixty
- 00:11:07thousand rows and then i did break this
- 00:11:10particular sheet into four different
- 00:11:13sheets as you can see here so 60 into 3 is
- 00:11:16one lakh 80 000 and the rest of the
- 00:11:20rows i think i've just
- 00:11:22occupied into one of the sheets here so
- 00:11:24you can see here i have
- 00:11:26thirteen thousand
- 00:11:27four hundred and seventy two rows here
- 00:11:29which is present here so this is how i
- 00:11:31have uh accommodated all of the
- 00:11:34different uh sheets here or you can see
- 00:11:37now or you can see the entire sheet here
- 00:11:40by breaking the entire data set into
- 00:11:42multiple small small
- 00:11:44data sets which could be accommodated
- 00:11:46into my sql server management studio and
- 00:11:50then uh i just imported into into my
- 00:11:52ssms so
- 00:11:54now you would just ask me like
- 00:11:56did i follow any procedure to break this
- 00:11:58particular data set it was no i i just
- 00:12:01simply took the very first 60 000 rows i
- 00:12:04copy pasted the data into a separate
- 00:12:06sheet then i moved to the next 60 000
- 00:12:08rows i just copy pasted the data into a
- 00:12:10different sheet so this is how i have
- 00:12:13done simply and then i've just made
- 00:12:15these particular different sheets so
- 00:12:18uh you can just see from your end if you
- 00:12:20are able to just import it
- 00:12:23just at one go if you're not able to do
- 00:12:25so break this data set into sheets uh
- 00:12:29two sheets or three sheets on the basis of
- 00:12:31your
- 00:12:32problem which you are getting and then
- 00:12:34you can import it into the sql server
- 00:12:36management studio
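The splitting step described here was done by hand with copy and paste in a spreadsheet. As a rough alternative, the same idea (consecutive blocks of at most 60,000 rows, in the original order) can be sketched in Python. The helper name `chunk_rows` is hypothetical, and a plain range of row numbers stands in for the real ball-by-ball rows; the total of 193,469 is the row count quoted in the video.

```python
from itertools import islice


def chunk_rows(rows, chunk_size):
    """Yield consecutive lists of at most chunk_size rows, preserving order."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


TOTAL_ROWS = 193_469   # row count quoted in the video
CHUNK_SIZE = 60_000    # rows per sheet, as described in the video

# range() stands in for the real data rows in this sketch.
sizes = [len(chunk) for chunk in chunk_rows(range(TOTAL_ROWS), CHUNK_SIZE)]
print(sizes)
print(sum(sizes) == TOTAL_ROWS)
```

The per-sheet counts shown later in the video (about 59,999 rows each) differ slightly from what this sketch produces, so treat the numbers here as illustrative. The check that the chunk sizes add up to the original total mirrors the verification done after import.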
- 00:12:38i have imported from before i have all
- 00:12:41the
- 00:12:42different you can see tables
- 00:12:46i'll just go to tables here
- 00:12:48so you can see here i have imported all
- 00:12:50of
- 00:12:51these different data sets ipl 1 ipl 2
- 00:12:54ipl 3 ipl 4 and ipl so into this
- 00:12:57particular ipl it contains the
- 00:13:01information about the matches from 2008
- 00:13:04to 2020
- 00:13:06and into ipl one ipl two ipl three
- 00:13:09ipl four basically i broke uh this
- 00:13:13particular sheet which contained uh the
- 00:13:15ball by ball information and it
- 00:13:18contained around 2 lakh rows i've just
- 00:13:20broken down this particular sheet into
- 00:13:21multiple sheets into multiple data sets
- 00:13:24and then i have
- 00:13:26copied the data into different sheets
- 00:13:28and i've presented it here imported it
- 00:13:30here sequentially
- 00:13:32so okay let me know if you have any
- 00:13:34questions
- 00:13:36till this moment
- 00:13:39hi uh everybody whoever is joining where
- 00:13:42can we practice data cleaning
- 00:13:46okay
- 00:13:47so right now i don't have any such video
- 00:13:50onto data cleaning i can think of so
- 00:13:55i
- 00:13:55am planning to prepare a dedicated two
- 00:13:58or three videos on to data cleaning so i
- 00:14:01guess that would be enough and from
- 00:14:03there we you can practice
- 00:14:07and
- 00:14:08you can you can also practice from you
- 00:14:10can just simply download the data sets
- 00:14:12from kaggle
- 00:14:13and try to identify how i can clean this
- 00:14:17particular data set or how i can make
- 00:14:20use of this particular data set what are
- 00:14:22the problems which i could encounter
- 00:14:25when i'm doing certain data analysis
- 00:14:27tasks so
- 00:14:28how i can get rid of all of these uh
- 00:14:31kind of different data cleaning process
- 00:14:33or dirty data which i am having right
- 00:14:35now
- 00:14:36so from where did you get this data set
- 00:14:39i got this data set from google to be
- 00:14:42exact i got this data set from kaggle
- 00:14:45by the way i would just provide i've
- 00:14:47provided all the links of
- 00:14:50all these data sets here so you can just
- 00:14:51download it from here also
- 00:14:55my question was what are companies really
- 00:14:58expecting from data analyst is cutthroat
- 00:15:01competition on yep that is true like
- 00:15:03competition is
- 00:15:05uh has increased a lot into
- 00:15:08the past one year or the two years so
- 00:15:12you are very much to
- 00:15:15what are companies really expecting
- 00:15:19i guess the company they
- 00:15:22are expecting you to have expertise uh
- 00:15:25into the skills they are wanting so
- 00:15:28if they are asking sql python so
- 00:15:31you should be pretty much
- 00:15:33good at it
- 00:15:35and also they look into the aptitude
- 00:15:37part your analytical skills so
- 00:15:40uh
- 00:15:42they are pretty much focusing onto the
- 00:15:45aptitude part as well
- 00:15:46also your problem solving part so
- 00:15:50i think practice is the key again you
- 00:15:52should practice from
- 00:15:54hackerrank leetcode
- 00:15:56all of these different websites so that
- 00:15:58might help
- 00:16:02we should uh move forward now so
- 00:16:09so you can see here
- 00:16:11i have
- 00:16:13so
- 00:16:15i just need to get two data sets
- 00:16:18as you can see here
- 00:16:21this is one of the data set and this is
- 00:16:23pretty much clear this is into this
- 00:16:26dbo dot ipl
- 00:16:27table but
- 00:16:29all of these four
- 00:16:31data sets you can see it contained
- 00:16:33the same data but the data has been
- 00:16:35split into multiple sheets so
- 00:16:37let us do one thing we can use the union
- 00:16:40operator
- 00:16:41to join all of these different data sets
- 00:16:44into a single data set
- 00:16:46and we can just have
- 00:16:48a single i mean a single table which
- 00:16:52could contain all of these data sets so
- 00:16:54i've just used the union operator you
- 00:16:56can see here
- 00:16:57first of all i have created a
- 00:17:00temporary table so the temporary table name is
- 00:17:03ipls app so i've just renamed it
- 00:17:05for now ipls
- 00:17:07and i've just mentioned here create
- 00:17:09table ipls and i've mentioned all the
- 00:17:11different columns which i require and
- 00:17:14also the data type of all of these
- 00:17:16different columns and then i am
- 00:17:19after this i have created a table i am
- 00:17:21inserting into this particular
- 00:17:24table the different data sets which i
- 00:17:26have split so you can see here i am
- 00:17:30telling select star from tutorial dot
- 00:17:32dbo dot ipl one
- 00:17:34so here i have just imported and i'm
- 00:17:37doing a union so first of all the ipl
- 00:17:40one so the data which was contained into
- 00:17:43the ipl one sheet it has been imported
- 00:17:47after this i'm doing a union so it will
- 00:17:49join vertically so basically we could
- 00:17:51also join uh multiple tables or data
- 00:17:54sets vertically so union is used for
- 00:17:56that case so
- 00:17:57after i have got the ipl1 dataset i am
- 00:18:00just joining it vertically with ipl 2
- 00:18:03similarly for ipl3 and ipl4
- 00:18:06now if you would see if i would do a
- 00:18:08simple select count star from ipls so
- 00:18:11this is the exact number of rows which
- 00:18:13are present into my
- 00:18:15uh
- 00:18:16the second data set so i would just
- 00:18:18count the number of rows i think it must
- 00:18:20be exact one nine three four six nine
- 00:18:24so i think it's close
- 00:18:26yeah so it is exactly the same
- 00:18:29the number of sheets or the number of
- 00:18:32rows which are present into our second
- 00:18:34sheet
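The recombination step above can be sketched as follows, using Python's built-in sqlite3 module rather than SQL Server. Two things are assumptions: the table names `ipl1` to `ipl4` and `ipls` follow the transcript, while the columns and rows are made-up toy data, and the sketch uses UNION ALL where the video says "union". The difference matters because plain UNION removes duplicate rows while stacking, whereas UNION ALL keeps every row, so the toy data includes one deliberately repeated row to make the effect visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = ("id INTEGER, inning INTEGER, over_no INTEGER, "
           "ball_no INTEGER, batsman_runs INTEGER")

# Four small part tables standing in for the split sheets (toy data).
parts = {
    "ipl1": [(1, 1, 0, 1, 4), (1, 1, 0, 2, 0)],
    "ipl2": [(1, 1, 0, 3, 6)],
    "ipl3": [(1, 1, 0, 4, 1)],
    "ipl4": [(1, 1, 0, 4, 1), (1, 1, 0, 5, 2)],  # first row repeats ipl3's row
}
for name, rows in parts.items():
    conn.execute(f"CREATE TABLE {name} ({columns})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?)", rows)

# Stack the parts vertically into one table, keeping every row.
conn.execute(f"CREATE TABLE ipls ({columns})")
stacked = " UNION ALL ".join(f"SELECT * FROM {name}" for name in parts)
conn.execute(f"INSERT INTO ipls {stacked}")

# Verify the combined count against the sum of the part counts.
combined_count = conn.execute("SELECT COUNT(*) FROM ipls").fetchone()[0]
part_total = sum(len(rows) for rows in parts.values())

# For comparison: plain UNION drops the duplicated row.
deduped = " UNION ".join(f"SELECT * FROM {name}" for name in parts)
distinct_count = conn.execute(f"SELECT COUNT(*) FROM ({deduped})").fetchone()[0]

print(combined_count, part_total, distinct_count)
```

If the combined count matches the original sheet's row count, as it does in the video, either operator gives the same result for that data; UNION ALL is simply the safer choice when the goal is to reassemble a file without losing rows.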
- 00:18:35so this is how we have uh solved a
- 00:18:39particular problem so why i did this
- 00:18:41because the entire data set it was not
- 00:18:44i was not able to import it completely
- 00:18:46into my ssms only 60 000 rows at once or
- 00:18:50close to around 80 000 rows was being
- 00:18:52imported so i just simply break down the
- 00:18:55entire data set which had two lakh rows
- 00:18:58into multiple sheets and each of the
- 00:19:00sheet i just kept around 60 000 rows so
- 00:19:04all of these sheets you can see ipl
- 00:19:06one ipl two ipl three ipl four
- 00:19:07so if i would copy this and i would just
- 00:19:12simply
- 00:19:13do a select count star for
- 00:19:15each of these sheets here
- 00:19:21you will
- 00:19:22get to see it is
- 00:19:26exactly the same ipl one
- 00:19:32ipl two
- 00:19:35ipl three
- 00:19:38ipl four so
- 00:19:40just select this and execute this
- 00:19:43invalid
- 00:19:44okay
- 00:19:57i forgot to paste this also
- 00:20:06just run this again
- 00:20:11so you can see here close to 59
- 00:20:14999 rows are present here and if i would
- 00:20:16simply
- 00:20:17sum up this
- 00:20:19we will be getting the count uh the
- 00:20:22number of rows which are present into
- 00:20:24our
- 00:20:25the master data set so i could just
- 00:20:27simply call this as a master data set
- 00:20:29and i've just broken this master data
- 00:20:31set into multiple data sets so
- 00:20:34you if you could just see from your end
- 00:20:36if you are able to get the entire data
- 00:20:38set at a single go
- 00:20:41if not you can just simply do the same
- 00:20:43process which i am doing right now
- 00:20:46and then
- 00:20:47let me
- 00:20:49see if
- 00:20:50you have any doubts
- 00:21:00oh from where do you practice domain
- 00:21:02knowledge and develop analytical skills
- 00:21:06uh
- 00:21:07so basically the
- 00:21:09analytical skills
- 00:21:11it is close to the problem solving
- 00:21:13skills so the more number of problems
- 00:21:15you are solving the better you get at
- 00:21:18your analytical skills so
- 00:21:20you'll be getting different viewpoint
- 00:21:22like okay this particular problem can be
- 00:21:24solved by this process also or this
- 00:21:27particular problem could have been
- 00:21:28solved by the second process so you'll
- 00:21:30be getting multiple points
- 00:21:32multiple views when you are solving a
- 00:21:34lot of different problems
- 00:21:37so that is what i think you can develop
- 00:21:39your analytical skills
- 00:21:41and regarding the domain knowledge uh i
- 00:21:44think that is not pretty much important
- 00:21:47but you are having some kind of domain
- 00:21:49knowledge like for example
- 00:21:51if we are
- 00:21:53dealing with the stock market data set
- 00:21:55so we know okay
- 00:21:57there are certain sharp points
- 00:21:59so we can smoothen the data set by
- 00:22:01using the moving average thing so these
- 00:22:03are kind of uh different domain
- 00:22:05knowledge
- 00:22:06is required
- 00:22:09but
- 00:22:10practicing domain knowledge
- 00:22:12i don't think it is pretty much that
- 00:22:14important
- 00:22:15important part is you are able to solve
- 00:22:17the problem into the interviews that is
- 00:22:20pretty much fine if not you need to
- 00:22:22practice a lot
- 00:22:24so you should
- 00:22:26practice a lot of questions i should say
- 00:22:29let us move to the ssms again
- 00:22:33okay so what are the different problems
- 00:22:35which we are going to solve now before
- 00:22:37that let us see
- 00:22:38once i'll do a select star from
- 00:22:44ipls and
- 00:22:46the second
- 00:22:47table is select star from
- 00:22:53ipl okay
- 00:22:56i would execute this statement
- 00:23:03okay
- 00:23:12execute so
- 00:23:14okay
- 00:23:21okay you can see here both of these data
- 00:23:23sets they have been imported finally
- 00:23:25into our ssms uh both of these excel
- 00:23:29sheets i have imported finally so
- 00:23:33one of the sheet it contains the details
- 00:23:35of the matches which have been
- 00:23:37played
- 00:23:38and the other sheet it contains the
- 00:23:40information about the ball by ball
- 00:23:42information for each of the matches
- 00:23:44which have been
- 00:23:46played okay
- 00:23:47so let us move to the problem set so i
- 00:23:50have a lot of different problems which i
- 00:23:52could think of or
- 00:23:54i would find onto the google so i would
- 00:23:57just simply note down here into this
- 00:23:58particular sheet so i've just prepared
- 00:24:00this particular sheet
- 00:24:01let us see how many problems which we
- 00:24:04are able to solve today
- 00:24:06and then
- 00:24:08the rest of the problems we can solve on
- 00:24:10a different day so
- 00:24:12you could just simply take a screenshot
- 00:24:14of this
- 00:24:15for now
- 00:24:16you can practice from your end also
- 00:24:19and then or you if you want to follow
- 00:24:22along this video with me you can do that
- 00:24:24also right now
- 00:24:26all the data sets which i have provided
- 00:24:29into the
- 00:24:30chat box so you can just download the
- 00:24:32data set from there
- 00:24:34let us start
- 00:24:35so the very first
- 00:24:38question you can see here which we need
- 00:24:40to find is the number of matches which
- 00:24:43have been played per season so we need
- 00:24:46to find this
- 00:24:48now how to go about this like
- 00:24:51we need to count the number of matches
- 00:24:53which have been played
- 00:24:55into per season
- 00:24:56now
- 00:24:57uh i have not solved all of these
- 00:25:00questions i've just prepared the data
- 00:25:01set i've just prepared
- 00:25:05the
- 00:25:06problems from before i haven't solved it
- 00:25:09yet
- 00:25:10i wonder if i have if we have the season
- 00:25:14information or not
- 00:25:17um okay
- 00:25:21okay we have the dates
- 00:25:23and
- 00:25:24a single season would be
- 00:25:27played in a single
- 00:25:29year so we could just simply count the
- 00:25:32number of times a year is coming so for
- 00:25:34example 2008 so i would just simply grab
- 00:25:38the year from all of these
- 00:25:40dates and then i would simply count the
- 00:25:44number of times
- 00:25:46this particular year is coming
- 00:25:48and that would simply give the number of
- 00:25:50matches which have been played
- 00:25:52into this season
- 00:25:55okay let us simply use this particular
- 00:25:58logic to get the total number of matches
- 00:26:00which have been played into the
- 00:26:02each of the season
- 00:26:04because i could not find a column where
- 00:26:07the season information is present here
- 00:26:10it is just the date on which the match
- 00:26:13was played so i'm just using this
- 00:26:15particular logic that
- 00:26:17uh the entire matches in a particular
- 00:26:19season would be played into
- 00:26:22the same year so i would just simply
- 00:26:25grab the year information from all of
- 00:26:27these dates
- 00:26:28and i would simply count the number of
- 00:26:30distinct match ids which are
- 00:26:33coming
- 00:26:34and that would simply give the number of
- 00:26:36matches which have been played into the
- 00:26:38each of the season now
- 00:26:40uh i don't know how accurate this
- 00:26:42particular data set is
- 00:26:44i haven't verified from my end but i
- 00:26:47think i've downloaded it from kaggle so
- 00:26:48it must be pretty much
- 00:26:51very much correct data
- 00:26:52at least 98 percent of the data might be pretty
- 00:26:55much correct so let us start writing the
- 00:26:58query for this
- 00:27:00you could also write from your end and
- 00:27:03you can paste
- 00:27:04okay
- 00:27:07sorry
- 00:27:08is enough for getting job
- 00:27:10yes
- 00:27:11it is pretty much enough for getting a job
- 00:27:14uh the advanced sql playlist i would
- 00:27:16request you to know it so please
- 00:27:21go through all the videos which i have
- 00:27:23been posting around so
- 00:27:25you should
- 00:27:27practice a lot of different questions on
- 00:27:29leetcode hackerrank hackerearth
- 00:27:32to strengthen your concepts
- 00:27:34and that is pretty much enough for the
- 00:27:36concepts which i'm covering so i'll be
- 00:27:38covering a lot of different concepts uh
- 00:27:40from now like afterwards also but
- 00:27:43whatever is present into my playlist
- 00:27:46right now it is pretty much enough what
- 00:27:48do you think we must select domain at
- 00:27:50the start of the career just jump into
- 00:27:52any domain and then
- 00:27:58i think you should pretty much jump just
- 00:28:01right now just learn these skills
- 00:28:04and then you can just think of
- 00:28:06changing domains if you're not liking
- 00:28:09and some
- 00:28:10projects to practice and build my
- 00:28:12portfolio
- 00:28:13so i have prepared a portfolio project
- 00:28:16onto the indian census data so it is uh
- 00:28:20i've broken down the video into two
- 00:28:23parts part one and part two so you can
- 00:28:25watch that
- 00:28:26and pretty much have covered a lot of
- 00:28:28different concepts so you
- 00:28:30can get to learn a lot of different
- 00:28:32concepts
- 00:28:33different concepts in sql the
- 00:28:35advanced window function the rank
- 00:28:37function so i've tried to cover a lot
- 00:28:39of different concepts from the indian
- 00:28:41census data so you can watch that video
- 00:28:44uh i don't know
- 00:28:47i think you will get to learn a lot from
- 00:28:49that particular video so i also have
- 00:28:51prepared a video on the shark tank india
- 00:28:53data set you can watch that video also
- 00:28:58okay i think
- 00:29:00we could just move
- 00:29:02to our screen ssms so the thing which we
- 00:29:06need is the total number of matches
- 00:29:07which have been played into each of the
- 00:29:09season and the logic which i have
- 00:29:11created here is i would just simply
- 00:29:14get like all the matches would be played
- 00:29:16into a single year like a single season
- 00:29:18would take place into a year so from
- 00:29:21this particular date column you can see
- 00:29:22i would just simply grab the
- 00:29:24the year and i would just simply count
- 00:29:27the distinct match ids which are coming
- 00:29:30so that would serve my purpose so i
- 00:29:31would just write here select
- 00:29:36year from
- 00:29:40date
- 00:29:44and id
- 00:29:45from
- 00:29:49tutorial.dbo.ipl so i will just copy
- 00:29:50paste here
- 00:29:57so you can see i'm getting the year as
- 00:29:59well as i'm getting the
- 00:30:01id of the matches so i would simply
- 00:30:02count the number of
- 00:30:06now i think uh number of
- 00:30:09different ids which are coming up for
- 00:30:12each of the year so
- 00:30:14i just put this into a sub query so this
- 00:30:16was pretty much simple
- 00:30:18i just tried to gather a lot of
- 00:30:20different metrics which i could solve
- 00:30:22in this live session on this
- 00:30:24particular data set
- 00:30:26even in the title you can see this is a
- 00:30:28simple data exploration project so
- 00:30:32just select year comma count
- 00:30:36of
- 00:30:37distinct so
- 00:30:40this counts the number of distinct ids which are
- 00:30:42coming here
- 00:30:45from
- 00:30:46this
- 00:30:49and at the last we need to
- 00:30:52group by on to the basis of the year
- 00:30:57that's it
- 00:30:58let us execute the code
- 00:31:00so friends you can see here for each of
- 00:31:03the year i am just getting the number of
- 00:31:05matches which have been played so into
- 00:31:072010 60 matches were played 2011 73
- 00:31:11matches have been played so you can just
- 00:31:14see the uh the logic which i have
- 00:31:16applied here to solve this particular
- 00:31:19problem so number of matches
- 00:31:22i would just rename this column
- 00:31:26just execute the code again
- 00:31:29so yeah this is pretty much
- 00:31:32the thing which we had
- 00:31:35been asked into the problem so i just
- 00:31:38needed to get the number of matches
- 00:31:40which have been played into each of the
- 00:31:42season
- 00:31:44so this is the result which we are
- 00:31:46getting so in the year 2008
- 00:31:49uh 58 matches were played 2009 57 matches
- 00:31:52were played so this is the information
- 00:31:55which we are getting let us move to the
- 00:31:57second problem
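A minimal sketch of the matches-per-season query described above. It assumes a table named `ipl` with `id` and `date` columns; those column names and the toy rows are assumptions, not the real dataset. SQLite (through Python) stands in for SQL Server here, so `strftime('%Y', date)` takes the place of T-SQL's `YEAR(date)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl (id INTEGER, date TEXT)")
# Made-up rows for illustration only, not real IPL fixtures.
conn.executemany(
    "INSERT INTO ipl VALUES (?, ?)",
    [
        (1, "2008-04-18"),
        (2, "2008-04-19"),
        (3, "2009-04-18"),
        (4, "2009-04-19"),
        (5, "2009-04-20"),
    ],
)

# Extract the year from each match date, then count distinct match ids per year.
query = """
SELECT strftime('%Y', date) AS year,
       COUNT(DISTINCT id)   AS number_of_matches
FROM ipl
GROUP BY strftime('%Y', date)
ORDER BY year
"""
matches_per_season = conn.execute(query).fetchall()
print(matches_per_season)
```

Grouping on the year only matches the season if every season falls inside one calendar year, which is the assumption the speaker states.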
- 00:32:01this second problem says most player of
- 00:32:04match
- 00:32:05so these are all very much simple so
- 00:32:07here also
- 00:32:09which is the data set which we require
- 00:32:13number of with the most player of
- 00:32:15matches
- 00:32:16latest
- 00:32:17most player of match we want okay
- 00:32:21so
- 00:32:22batsman uh
- 00:32:26all by matches here also we'll be using
- 00:32:29this sheet
- 00:32:33and which column is it
- 00:32:36player of match you can see into the
- 00:32:37column d i have player of the match so
- 00:32:40we could just simply count the number of
- 00:32:42times the player is coming into this
- 00:32:44particular column because for each row a
- 00:32:46single match information is present here
- 00:32:49no duplicate information is present here
- 00:32:52so we could just simply count the number
- 00:32:54of times this particular player is
- 00:32:56coming into this particular row and the
- 00:32:59player who is coming the most number of
- 00:33:01times that simply means that
- 00:33:03the highest
- 00:33:06man of the match has been won by that
- 00:33:08player so i would just simply run this
- 00:33:11code again once
- 00:33:14and we'll be executing onto the column d
- 00:33:16so i will just write here select
- 00:33:18player
- 00:33:20of
- 00:33:22match comma count of
- 00:33:26the same column
- 00:33:29player
- 00:33:30of
- 00:33:32match
- 00:33:34from
- 00:33:38i'll just copy this
- 00:33:40paste this
- 00:33:42and at the last i'll
- 00:33:44do the same thing i'll group by on to
- 00:33:46the basis of the
- 00:33:48player of match
- 00:33:50so i hope you're getting this point
- 00:33:55i would simply execute this statement
- 00:33:58so you can see here for each of the
- 00:34:00player i'm getting the number of times
- 00:34:02they have got the player of match so
- 00:34:04i would now do this
- 00:34:08rename this column so this would be
- 00:34:11uh just give any name number
- 00:34:16man of match or just right here man of
- 00:34:18match and at the last i need to get
- 00:34:24the
- 00:34:25player who has scored the most
- 00:34:27or who have won the most player of the
- 00:34:29match so i would order by on the basis
- 00:34:31of this particular column into the
- 00:34:33descending order let us see if this
- 00:34:35produces the result or not
- 00:34:38so you can see here ab de villiers has won
- 00:34:40the man of the match award 23 times the
- 00:34:44highest number of times and then
- 00:34:46the rest of the player you can see here
- 00:34:49the information is present here so this
- 00:34:51is how we can solve this particular
- 00:34:53problem like getting the player who has
- 00:34:56won the most player of the match in
- 00:34:59just
- 00:35:02hey thanks an image
- 00:35:06okay you're proud okay that's
- 00:35:09so much nice of you
- 00:35:14oh hey
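The player-of-the-match count can be sketched roughly as follows. The table `ipl`, the column name `player_of_match`, and the sample rows are assumptions for illustration; the query is run on SQLite from Python rather than on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl (id INTEGER, player_of_match TEXT)")
# Made-up rows: one row per match, as in the matches sheet.
conn.executemany(
    "INSERT INTO ipl VALUES (?, ?)",
    [
        (1, "player a"),
        (2, "player b"),
        (3, "player a"),
        (4, "player c"),
        (5, "player a"),
    ],
)

# Count how often each name appears, highest count first.
query = """
SELECT player_of_match,
       COUNT(player_of_match) AS man_of_match
FROM ipl
GROUP BY player_of_match
ORDER BY man_of_match DESC, player_of_match
"""
awards = conn.execute(query).fetchall()
print(awards[0])
```

This works because each match appears exactly once in the table, so a plain count per name equals the number of awards.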
- 00:35:19let us
- 00:35:21go back to the
- 00:35:23ssms
- 00:35:27the third problem which we need to do
- 00:35:30here is
- 00:35:31let us move to the third problem so the
- 00:35:33third problem says most player of the
- 00:35:35match per season
- 00:35:38so
- 00:35:38in season one which player has got
- 00:35:43the highest player of the match in
- 00:35:45season two which player has got the
- 00:35:47highest player of the match we want to
- 00:35:50get this particular information let's
- 00:35:51see how to do this
- 00:35:53so
- 00:35:54um okay
- 00:35:58so
- 00:35:59now this time i'll be uh using
- 00:36:02another column to group by
- 00:36:05and i would simply get the
- 00:36:10number of player of match won by each of
- 00:36:13the player in each year
- 00:36:15so
- 00:36:16i'll just do this here year
- 00:36:19from date i would extract the year from
- 00:36:21date
- 00:36:22and i will present this
- 00:36:24into the
- 00:36:29this particular column itself i'll
- 00:36:31execute the code
- 00:36:34because it's not been pasted
- 00:36:40okay
- 00:36:41my bad so i'll just
- 00:36:44ctrl x
- 00:36:59i'll execute the code again so you can
- 00:37:01see here for example chris gayle into
- 00:37:04the year 2011
- 00:37:05he has won the player of the match six
- 00:37:09times
- 00:37:10similarly chris gayle into the year 2012
- 00:37:13he has won the player of the match five
- 00:37:15times so this is the information which
- 00:37:17we wanted now for each of the season i
- 00:37:20want to know like which player has won
- 00:37:22the highest player of the match so uh
- 00:37:25from here you must have understood what
- 00:37:26is the thing which you need to do here
- 00:37:28so we'll be applying the rank function
- 00:37:29here to do
- 00:37:32provide the ranking so i'll do here
- 00:37:33select
- 00:37:35and i'll be partitioning on to the basis
- 00:37:37of the year column so
- 00:37:39i would rename this column as year
- 00:37:43so do here
- 00:37:46select
- 00:37:50player of match
- 00:37:54comma
- 00:37:57year comma count
- 00:38:00okay this is man of match
- 00:38:03and i'll be using the rank functions
- 00:38:05rank over
- 00:38:06i'll be partitioning my data set or the
- 00:38:09result which i've got
- 00:38:11onto the basis of the year column and
- 00:38:15for each year
- 00:38:17i'll be
- 00:38:22ranking on the basis of the man of the
- 00:38:24match column into the descending order
- 00:38:26so that the player who has won the
- 00:38:28highest number of man of the match into
- 00:38:30a particular year would get the rank as
- 00:38:32one so i'll be giving this
- 00:38:35and this is the rank column from
- 00:38:42let us execute the code
- 00:38:53rank over partition by year order by
- 00:38:56man of the match
- 00:38:58the order by clause is invalid in
- 00:39:00views
- 00:39:06there's something wrong we have done
- 00:39:08here i'll just execute this particular
- 00:39:10code
- 00:39:16so player of match and into the year the
- 00:39:20number of man of the match they've got
- 00:39:29partition by year and ordered by
- 00:39:34man of the match
- 00:39:38i think it's pretty much correct
- 00:39:41why is it showing error
- 00:39:43use
- 00:39:55okay uh
- 00:39:58rank over partition by okay let me just
- 00:40:01simply remove this
- 00:40:03let's see if we are getting any output
- 00:40:05here or not
- 00:40:14okay
- 00:40:16so i think the pretty much the concept
- 00:40:18which we are applying here is correct
- 00:40:21but
- 00:40:21there might be certain syntax error
- 00:40:23which we are getting right now
- 00:40:25we need to correct that
- 00:40:30okay
- 00:40:32i'll do a control x
- 00:40:43let us
- 00:40:47do it again so
- 00:40:48just provide this into a sub query
- 00:40:56right here select
- 00:40:58player
- 00:41:01of match
- 00:41:09comma year of
- 00:41:13man of match
- 00:41:18over partition
- 00:41:24so i'll be partitioning my data set onto
- 00:41:26the basis of the year
- 00:41:29and
- 00:41:33order by onto the basis of
- 00:41:51okay let me try
- 00:41:52this
- 00:41:54for the final time
- 00:41:55or
- 00:41:56unless we'll be moving to the next
- 00:41:58question
- 00:41:59just throwing error order by clause is
- 00:42:01invalid in views inline function derived
- 00:42:04tables
- 00:42:06okay oh
- 00:42:11i think we could
- 00:42:15oh
- 00:42:17we could use a temporary table to do
- 00:42:20this because
- 00:42:23we cannot
- 00:42:25use the order by clause
- 00:42:28as it is throwing an error like the
- 00:42:29order by clause is invalid in views
- 00:42:31inline functions
- 00:42:33so
- 00:42:39i think i would remove this
- 00:42:42because
- 00:42:43of the
- 00:42:45and it works pretty much fine it was all
- 00:42:48because into the sub query you cannot
- 00:42:50use the order by clause and that is why
- 00:42:53it was throwing the error so
- 00:42:55i just read the error which i was
- 00:42:58getting here into a lot of details and i
- 00:43:02could get the answer so you can see here
- 00:43:04order by clause is invalid into the sub
- 00:43:06query so into the sub query you can see
- 00:43:08i was using the order by clause and this
- 00:43:10is pretty much invalid to use so i
- 00:43:13removed this
- 00:43:15and i got the rank now after i did get
- 00:43:19the rank
- 00:43:22okay one more error
- 00:43:30so i'll just execute the statement so
- 00:43:32you can see here i'm getting the rank
- 00:43:33here
- 00:43:34now if you
- 00:43:36would see the result closely
- 00:43:39the rank is being provided on to the
- 00:43:42basis of the
- 00:43:44highest number of man of the matches
- 00:43:45which have been won now we are only
- 00:43:47concerned with the highest man of the
- 00:43:49match
- 00:43:51award winner so
- 00:43:53i would simply filter out the data
- 00:43:55select star from this
- 00:43:58and i would simply filter out the data
- 00:44:00where
- 00:44:03rank is equal to 1
- 00:44:06and that would pretty much give the data
- 00:44:09set of all the
- 00:44:11players who have won the highest number
- 00:44:13of man of the match award in each of the
- 00:44:15year so you can see
- 00:44:18i'm getting the result here
- 00:44:21shaun marsh into the year 2008 won the
- 00:44:24highest number of man of the match award
- 00:44:25that was five chris gayle similarly
- 00:44:28for all the players you can see i'm
- 00:44:29getting the result
- 00:44:31let me see
- 00:44:32for
- 00:44:34yeah this can be considered as a
- 00:44:36portfolio project so
- 00:44:38this is simple data exploration which
- 00:44:40i'm doing right now
- 00:44:42and you can use this particular result
- 00:44:45whichever i'm getting to build simply
- 00:44:48beautiful dashboards and include into
- 00:44:50your portfolio
- 00:44:52so you can do it definitely
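The per-season ranking worked out above might look like the sketch below. The table and column names and the toy rows are assumptions; SQLite 3.25 or newer is assumed for window functions, and `strftime('%Y', date)` stands in for T-SQL's `YEAR(date)`. The inner subquery deliberately has no ORDER BY, since SQL Server rejects ORDER BY in derived tables unless TOP, OFFSET or FOR XML is also specified, which is the error the speaker ran into.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl (id INTEGER, date TEXT, player_of_match TEXT)")
# Made-up rows for illustration only, not real IPL results.
conn.executemany(
    "INSERT INTO ipl VALUES (?, ?, ?)",
    [
        (1, "2008-04-18", "player a"),
        (2, "2008-04-19", "player a"),
        (3, "2008-04-20", "player b"),
        (4, "2009-04-18", "player b"),
        (5, "2009-04-19", "player b"),
        (6, "2009-04-20", "player c"),
    ],
)

query = """
SELECT player_of_match, year, man_of_match
FROM (
    SELECT player_of_match, year, man_of_match,
           RANK() OVER (PARTITION BY year ORDER BY man_of_match DESC) AS rnk
    FROM (
        -- awards per player per year; no ORDER BY inside the subquery
        SELECT player_of_match,
               strftime('%Y', date)   AS year,
               COUNT(player_of_match) AS man_of_match
        FROM ipl
        GROUP BY player_of_match, strftime('%Y', date)
    ) AS per_year
) AS ranked
WHERE rnk = 1
ORDER BY year
"""
top_per_season = conn.execute(query).fetchall()
print(top_per_season)
```

Because RANK gives tied players the same rank, a season with two joint leaders would return both rows.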
- 00:44:56moving on to the third question
- 00:45:02moving on to the
- 00:45:04fourth question so most wins by any
- 00:45:08team
- 00:45:09so
- 00:45:10here we need to solve or find the
- 00:45:14team which has won the highest number of
- 00:45:17matches till date
- 00:45:20so which of the sheets will we be using
- 00:45:23here
- 00:45:26so here you can see here i have a winner
- 00:45:27column and simply i'll be using the same
- 00:45:29concept i'll be grouping by on to the
- 00:45:31basis of the winner column
- 00:45:34and that would do the thing
- 00:45:38so i would just copy this particular
- 00:45:42table name
- 00:45:44for we
- 00:45:52select star from
- 00:45:58and i'll be using this particular winner
- 00:45:59column as you can see here winner
- 00:46:02so i would just simply select winner
- 00:46:05comma count of
- 00:46:07winner
- 00:46:11and at the last i'll be grouping by on
- 00:46:15to the basis of the winner
- 00:46:17that's it let us
- 00:46:21execute the command
- 00:46:23so you can see here i am getting
- 00:46:25the number of times
- 00:46:28number of matches which have been won
- 00:46:30by each of the team i am getting the
- 00:46:32information here
- 00:46:34this is kind of dirty data so you
- 00:46:35can see here na is present four times so
- 00:46:38these are certain data points which we need
- 00:46:41to study beforehand before we are doing
- 00:46:43the analysis and we should definitely
- 00:46:46remove all of this dirty data because
- 00:46:48na is not any team so
- 00:46:52definitely that is dirty data or it
- 00:46:54might be a certain data point
- 00:46:57which is kind of exception so might be
- 00:46:59the result was not decided for four
- 00:47:01matches
- 00:47:02so that is why we are getting na and we
- 00:47:04are getting four so
- 00:47:06we these are kind of the data cleaning
- 00:47:08points which you need to take into
- 00:47:10account while we're doing the data
- 00:47:11analysis but for now
- 00:47:13this is a simple data exploration
- 00:47:14project with the help of this simple sql
- 00:47:16query so
- 00:47:17that is the motive of this particular
- 00:47:19live session
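A small sketch of the wins-per-team count with the NA rows left out. The `winner` column name, the literal string 'NA', and the toy rows are assumptions; whether NA really means "no result" in the dataset is the speaker's guess and is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl (id INTEGER, winner TEXT)")
# Made-up rows for illustration; 'NA' marks a match without a recorded winner.
conn.executemany(
    "INSERT INTO ipl VALUES (?, ?)",
    [
        (1, "team x"),
        (2, "team y"),
        (3, "team x"),
        (4, "NA"),
        (5, "team x"),
        (6, "team y"),
    ],
)

# Same GROUP BY idea as before, but skipping the placeholder value.
query = """
SELECT winner, COUNT(winner) AS wins
FROM ipl
WHERE winner <> 'NA'
GROUP BY winner
ORDER BY wins DESC
"""
wins = conn.execute(query).fetchall()
print(wins)
```

Filtering in the WHERE clause keeps the placeholder out of the ranking without deleting anything from the source table.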
- 00:47:20let us move to the
- 00:47:21uh next question so top five venues
- 00:47:25where match is played
- 00:47:27okay so top five venues
- 00:47:31here you can see venue is present
- 00:47:34here so the same concept i will be using
- 00:47:37i would
- 00:47:39be
- 00:47:39removing this winner i'll be
- 00:47:42pasting this venue column
- 00:47:48venue
- 00:47:50and execute
- 00:47:52so you can see here for each of the
- 00:47:54stadiums i am getting the number of
- 00:47:57uh matches which have been played into
- 00:47:59each of the venue i am getting the
- 00:48:00information
- 00:48:02so friends uh this is pretty much clear
- 00:48:05but here they wanted top five venues so
- 00:48:08in ssms the limit function
- 00:48:11is not supported so let me see if using
- 00:48:14the top
- 00:48:16this particular thing we are getting the
- 00:48:18result or not okay we are getting the
- 00:48:21result but yeah
- 00:48:24group by
- 00:48:25and order by
- 00:48:27we need to do
- 00:48:29because you can see i'm not getting the
- 00:48:32top five result or the top five venues
- 00:48:34where the highest number of matches
- 00:48:36were played
- 00:48:38here we need to do into the descending
- 00:48:41order here they're printing into the
- 00:48:43ascending order so friends into the eden
- 00:48:46gardens you can see 77 matches highest
- 00:48:49number of matches are being played into
- 00:48:51the history of the ipl
- 00:48:53so this is the data set just till 2020
- 00:48:55so uh till 2020 data is present here so
- 00:48:59you can see here eden gardens
- 00:49:00feroz shah kotla all these different
- 00:49:02stadiums the highest number of matches
- 00:49:04they have been played till now
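The top-five-venues step can be sketched like this. The `venue` column name and the toy rows are assumptions. SQLite is used here, so the row cap is written as `LIMIT 5`; in SQL Server the same cap is written as `SELECT TOP 5 ...`, which is what the speaker switches to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl (id INTEGER, venue TEXT)")

# Made-up data: six venues hosting 6, 5, 4, 3, 2 and 1 matches.
rows = []
match_id = 0
for venue, n in [("venue a", 6), ("venue b", 5), ("venue c", 4),
                 ("venue d", 3), ("venue e", 2), ("venue f", 1)]:
    for _ in range(n):
        match_id += 1
        rows.append((match_id, venue))
conn.executemany("INSERT INTO ipl VALUES (?, ?)", rows)

# Without ORDER BY ... DESC the first five rows are arbitrary, not the busiest.
query = """
SELECT venue, COUNT(venue) AS matches_played
FROM ipl
GROUP BY venue
ORDER BY matches_played DESC
LIMIT 5
"""
top_venues = conn.execute(query).fetchall()
print(top_venues)
```

The descending sort is what makes the cap meaningful, which is why the first attempt in the session did not return the busiest venues.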
- 00:49:06let us move to the next data point here
- 00:49:10most runs by any batsman
- 00:49:12this is also i think most run so this
- 00:49:16particular sheet it just contains the
- 00:49:18information of the you could say
- 00:49:21uh the matches but for getting the runs
- 00:49:23and
- 00:49:24the batsman or the bowler information we
- 00:49:26need to take into account the second
- 00:49:28sheet the sheet which contains ball by
- 00:49:30ball information so from this particular
- 00:49:32sheet we'll be
- 00:49:34getting the data of the most number of
- 00:49:36runs
- 00:49:37which has been scored by
- 00:49:39the batsman so let me just repeat
- 00:49:42or look at the question once again so
- 00:49:44most runs
- 00:49:45by any batsman i need to see this
- 00:49:49most runs by any batsman so
- 00:49:52oh
- 00:49:53batsman and onto the total runs column
- 00:49:56i'll be doing a group by
- 00:49:58now so
- 00:49:59i'll do a select
- 00:50:01or rather
- 00:50:03oh first of all let's see select star
- 00:50:05from
- 00:50:07hashtag ipls
- 00:50:10okay great uh from here i'll do
- 00:50:14select
- 00:50:17batsman comma count of
- 00:50:22total
- 00:50:25runs
- 00:50:31from uh
- 00:50:32i'll just provide this hashtag ipls
- 00:50:38and group by
- 00:50:41batsman i'll do
- 00:50:45i'll execute the code
- 00:50:48so you can see here i'm getting the
- 00:50:49details of the total number of runs
- 00:50:51which have been scored by each of the
- 00:50:53batsman what is the result which we
- 00:50:55wanted so we wanted most runs by any
- 00:50:58batsman
- 00:50:59so we need to
- 00:51:01do the same thing which we have done
- 00:51:03into the previous problem so i'll do a
- 00:51:06simple order by
- 00:51:08count of
- 00:51:14total
- 00:51:17runs into the descending order
- 00:51:20and here i'll just mention top
- 00:51:23one
- 00:51:25i'll execute the code
- 00:51:27and you can see virat kohli is the highest
- 00:51:30run scorer into the history of the
- 00:51:32ipl
- 00:51:33till 2020 we have the data set so
- 00:51:36this is the answer
- 00:51:38let's move to the next problem
- 00:51:41okay let me see if any more questions
- 00:51:43are there
- 00:51:45okay
- 00:51:47so percentage of total runs
- 00:51:50scored by each batsman so
- 00:51:54what does this problem it is saying so
- 00:51:56we need to get the percentage of the
- 00:51:58total runs scored by each of the
- 00:52:00batsmen so let's say
- 00:52:02if there are five players
- 00:52:05let's say the player ids are one two
- 00:52:08three four and five
- 00:52:10and the total number of runs which have
- 00:52:12been scored into the history of the ipl
- 00:52:14is hundred
- 00:52:16so
- 00:52:17these hundred runs definitely would have
- 00:52:18been scored by these five players only
- 00:52:20let's say there are five players into
- 00:52:22the ipl so i would just redistribute
- 00:52:25this 100 runs here 20 30
- 00:52:28ready 20 okay 10 and then this would be
- 00:52:3125
- 00:52:3225 so i guess from this particular
- 00:52:35problem which they mentioned a
- 00:52:37percentage of total runs scored by each
- 00:52:39batsman it simply means they want a
- 00:52:41percentage value of the runs scored by
- 00:52:43the batsman divided by the total runs
- 00:52:46which have been scored into the history
- 00:52:47of the ipl so they basically want this
- 00:52:50particular kind of number
- 00:52:53and into the percentage terms
- 00:52:56so this is the kind of
- 00:52:59value they want
- 00:53:01from this particular problem let us see
- 00:53:03how to solve this
- 00:53:06so basically we would require to get the
- 00:53:08total runs which have been scored
- 00:53:11into the
- 00:53:13ipl and then we
- 00:53:15need to divide the runs
- 00:53:17which have been scored by each of the
- 00:53:18batsmen by this number
- 00:53:19to get the percentage value
- 00:53:23let us see how to do this
- 00:53:27so we had had a column of the total
- 00:53:31number of runs scored by
- 00:53:33each of the batsmen oh okay
- 00:53:37i would just remove this top one batsman
- 00:53:39count of runs
- 00:53:41group by batsman and
- 00:53:43pretty much fine
- 00:53:46i'll just execute this statement
- 00:53:49so you can see here
- 00:53:52this is the information of the batsman
- 00:53:54and the runs which have been scored by
- 00:53:56the batsman
- 00:53:59okay so
- 00:54:00into
- 00:54:02another column we need to somehow get
- 00:54:04the total number of runs which have been
- 00:54:05scored till date in ipl
- 00:54:09there are many ways to do this but a
- 00:54:11simple way would be using the window
- 00:54:12function so i would do a sum of runs
- 00:54:17and over
- 00:54:24i would do this order by
- 00:54:28runs
- 00:54:29rows
- 00:54:31between
- 00:54:34unbounded
- 00:54:37preceding and
- 00:54:39unbounded following now why i'm doing
- 00:54:41this
- 00:54:42you could check my
- 00:54:44running sum video to understand this
- 00:54:46particular window function into lot of
- 00:54:48more detail
- 00:54:50i hope this would work so sum of runs
- 00:54:52over order by runs rows between unbounded
- 00:54:55preceding and unbounded following
- 00:55:01um okay let me simply run this code
- 00:55:12count of total runs
- 00:55:15hold v
- 00:55:17we
- 00:55:23just execute the code here
- 00:55:32oh it is saying hashtag total runs is
- 00:55:36invalid into the
- 00:55:38a
- 00:55:40i would need to remove this particular
- 00:55:43column from here it cannot be used here
- 00:55:45obviously
- 00:55:47it should be made into a sub query and
- 00:55:50then i can use this
- 00:55:52so i'll just put this into a sub query
- 00:55:58i'll do a select
- 00:55:59star comma
- 00:56:03paste this here
- 00:56:30let us execute the code
- 00:56:32so yes we are getting the count of the
- 00:56:34total run so
- 00:56:36is this the total number of runs which
- 00:56:38have been scored
- 00:56:39into the history of ipl
- 00:56:41let us verify this
- 00:56:45so i would just paste this code
- 00:56:48execute the code
- 00:56:50we must know like the answer which we
- 00:56:52are getting is correct or not like the
- 00:56:54total number of runs into the history of
- 00:56:56ipl which have been scored is
- 00:56:58which is coming right now it is correct
- 00:57:00on or not so we should definitely check
- 00:57:03this so i'll do a select
- 00:57:06sum of
- 00:57:10total runs
- 00:57:13from
- 00:57:15let's see if we are getting the correct
- 00:57:17answer yes
- 00:57:19the answer is pretty much correct
- 00:57:20whichever we are getting one nine three
- 00:57:22four six seven
- 00:57:30okay
- 00:57:38okay
- 00:57:39this should be sum of total runs
- 00:57:44i was just doing wrong
- 00:57:47from before i'm so sorry for that so
- 00:57:52now this is pretty much correct i'll
- 00:57:54just execute the code
- 00:57:58so
- 00:58:00i've done wrong in
- 00:58:02the previous problem also so that
- 00:58:04should not be count
- 00:58:06that should be sum
- 00:58:08of total runs
- 00:58:11this is the correct answer
- 00:58:13what we are getting right now
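To make the count-versus-sum correction concrete, here is a small sketch on made-up ball-by-ball rows. The table name `ipls` follows the temp table mentioned in the session; the column names and values are assumptions. COUNT returns the number of balls recorded for a batsman, while SUM adds up the runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipls (batsman TEXT, total_runs INTEGER)")
# Made-up rows: one row per ball.
conn.executemany(
    "INSERT INTO ipls VALUES (?, ?)",
    [
        ("batsman a", 4),
        ("batsman a", 6),
        ("batsman b", 1),
        ("batsman b", 0),
        ("batsman b", 1),
    ],
)

# COUNT counts rows (balls); LIMIT 1 plays the role of TOP 1 in SQL Server.
by_count = conn.execute(
    "SELECT batsman, COUNT(total_runs) AS n FROM ipls "
    "GROUP BY batsman ORDER BY n DESC LIMIT 1"
).fetchone()

# SUM adds the runs, which is what "most runs" actually asks for.
by_sum = conn.execute(
    "SELECT batsman, SUM(total_runs) AS runs FROM ipls "
    "GROUP BY batsman ORDER BY runs DESC LIMIT 1"
).fetchone()

print(by_count, by_sum)
```

On this toy data the two queries name different batsmen, which shows why the earlier count-based answer needed to be rechecked.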
- 00:58:16okay so
- 00:58:20batsman and total runs we have got
- 00:58:23and and now simply it is very much
- 00:58:25simple we
- 00:58:28just need to
- 00:58:30divide the total runs
- 00:58:32column with the runs
- 00:58:34which are getting so i would just total
- 00:58:38runs
- 00:58:44would do this
- 00:58:49let us execute the code
- 00:58:56so you can see here i am getting
- 00:58:59into decimal points
- 00:59:14and you can shorten this number which
- 00:59:16you are getting to certain number of
- 00:59:18decimal points obviously
- 00:59:20but the pretty much the concept uh
- 00:59:23remains the same so
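The percentage calculation can be sketched as below, assuming the same hypothetical `ipls` table with `batsman` and `total_runs` columns and made-up rows that add up to 100 runs. It needs SQLite 3.25 or newer for window functions. The first window uses the frame the speaker describes; an empty `OVER ()` gives the same grand total and is used for the percentage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipls (batsman TEXT, total_runs INTEGER)")
# Made-up rows, one per ball, totalling 100 runs.
rows = (
    [("batsman a", r) for r in (6, 6, 4, 4)]       # 20 runs
    + [("batsman b", 6)] * 5                       # 30 runs
    + [("batsman c", 6)] * 8 + [("batsman c", 2)]  # 50 runs
)
conn.executemany("INSERT INTO ipls VALUES (?, ?)", rows)

query = """
SELECT batsman,
       runs,
       SUM(runs) OVER (
           ORDER BY runs
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS all_runs,
       100.0 * runs / SUM(runs) OVER () AS pct_of_runs
FROM (
    -- runs per batsman first, then the window adds the grand total
    SELECT batsman, SUM(total_runs) AS runs
    FROM ipls
    GROUP BY batsman
) AS per_batsman
ORDER BY batsman
"""
share = conn.execute(query).fetchall()
for row in share:
    print(row)
```

Multiplying by `100.0` rather than `100` keeps the division in floating point; wrapping the expression in `ROUND(..., 2)` would shorten the decimals.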
- 00:59:25this is how we can solve or get the
- 00:59:27percentage runs which have been scored
- 00:59:29by each of the player next one is most
- 00:59:33sixes by any batsman so if you would
- 00:59:35look closely into this particular data
- 00:59:37set
- 00:59:38uh
- 00:59:39we have a column here batsman runs so
- 00:59:42the number of runs which was scored into
- 00:59:44this particular ball by this particular
- 00:59:45batsman it is mentioned here so wherever
- 00:59:47six is coming that simply means that a
- 00:59:49six was scored into that particular ball
- 00:59:52so we just need to count the number of
- 00:59:54times and this particular
- 00:59:56uh
- 00:59:57six is coming corresponding to a batsman
- 00:59:59and that would
- 01:00:01give us the total number of sixes which
- 01:00:04uh
- 01:00:05or the most sixes hit by any
- 01:00:08batsman so this is how we need to do
- 01:00:11this so i would simply
- 01:00:14okay before that let me check the chat
- 01:00:20uh so hi
- 01:00:21please attach the link of the data sets
- 01:00:23and the questions which
- 01:00:25you solved so
- 01:00:27i have attached the link into the chat
- 01:00:30box here you can see here if you scroll
- 01:00:32up into the chat box so i've attached
- 01:00:34the link also you know i would just
- 01:00:37mention all the links into my into the
- 01:00:40description box of this video
- 01:00:42after ending this particular live stream
- 01:00:44so i'll just mention all the links of
- 01:00:45the
- 01:00:46uh data sets which i'm using currently
- 01:00:50and also the questions so these are the
- 01:00:52questions which i'll be solving so i'll
- 01:00:54be taking multiple live streams so
- 01:00:57i'll be solving all of these different
- 01:00:59problems in the live stream only so
- 01:01:01you can just take a screenshot of this
- 01:01:03particular screen and then
- 01:01:05solve it beforehand or
- 01:01:08how you would like so you could just do
- 01:01:11it
- 01:01:11as per your comfort level so
- 01:01:14the
- 01:01:15question which we are solving right now
- 01:01:17is the most sixes by any batsman so i've
- 01:01:19explained the concept which i'll be
- 01:01:21using here so
- 01:01:23let us run the
- 01:01:26statement so i'll just write a select
- 01:01:28star from hashtag
- 01:01:32ipls
- 01:01:34let's see the output which is coming out
- 01:01:39and we need to count the
- 01:01:41number of times
- 01:01:43sixes so
- 01:01:45each row contains information for each
- 01:01:48ball
- 01:01:49which was played so whenever you are
- 01:01:52seeing the batsman runs is equal to six
- 01:01:55that means a six was scored
- 01:01:57so first of all let us just filter out
- 01:01:59the data for
- 01:02:01the balls on which the six was scored so i
- 01:02:03just write a select star from
- 01:02:07ipls where
- 01:02:11batsman
- 01:02:15runs is equal to 6
- 01:02:18so
- 01:02:18let us execute the code here
- 01:02:21so you can see i'm just getting
- 01:02:22information of the balls where a 6 was
- 01:02:25scored now it's pretty much simple from
- 01:02:26here so
- 01:02:27i would present this into a sub query
- 01:02:30and after this i would write a select
- 01:02:34batsman
- 01:02:35comma count of
- 01:02:39let's say batsman only count of
- 01:02:43batsman
- 01:02:44from
- 01:02:46this and at the last i'll be grouping by
- 01:02:50on to the basis of
- 01:02:52the batsman so how many times this
- 01:02:54batsman is coming into the batsman
- 01:02:56column so those many number of sixes
- 01:02:58must have been scored by the batsman so
- 01:03:00you can see here uh matthew hayden 44
- 01:03:03sixes
- 01:03:05uh similarly for all the batsman which
- 01:03:07we are getting here right now so what is
- 01:03:08the problem which they wanted us to
- 01:03:10solve so they wanted us to solve the
- 01:03:12most sixes so after this i'll be
- 01:03:15ordering by on to the basis of
- 01:03:18the
- 01:03:19count of batsmen
- 01:03:21and this should be
- 01:03:23into the descending order and at the
- 01:03:26above i'll be using the top function so
- 01:03:28top one batsman
- 01:03:31i'll again execute the code here
- 01:03:34so you can see chris gayle has scored
- 01:03:36349 sixes
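A rough sketch of the sixes query, following the filter-then-group approach described above. The table name `ipls` comes from the session; the `batsman_runs` column name and the rows are assumptions, and `LIMIT 1` replaces SQL Server's `TOP 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipls (batsman TEXT, batsman_runs INTEGER)")
# Made-up rows: one row per ball faced.
conn.executemany(
    "INSERT INTO ipls VALUES (?, ?)",
    [
        ("batsman a", 6),
        ("batsman a", 6),
        ("batsman a", 1),
        ("batsman b", 6),
        ("batsman b", 4),
        ("batsman b", 4),
    ],
)

# Inner query keeps only balls hit for six; outer query counts them per batsman.
query = """
SELECT batsman, COUNT(batsman) AS sixes
FROM (
    SELECT * FROM ipls WHERE batsman_runs = 6
) AS six_balls
GROUP BY batsman
ORDER BY sixes DESC
LIMIT 1
"""
most_sixes = conn.execute(query).fetchone()
print(most_sixes)
```

The subquery is optional; putting the `WHERE batsman_runs = 6` filter directly on `ipls` before the GROUP BY gives the same result.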
- 01:03:38similarly the next question is very much
- 01:03:41similar
- 01:03:44most fours which have been scored now
- 01:03:47you can just argue like
- 01:03:49six was okay but
- 01:03:51there might be a case where four runs
- 01:03:53have
- 01:03:54come from a particular ball and the
- 01:03:57let's say
- 01:03:58that was not a boundary that was this
- 01:04:01externs which were accommodated into the
- 01:04:04particular
- 01:04:07ball so you can see here
- 01:04:09i have a column which is the non
- 01:04:12boundary and here
- 01:04:13the runs are present here
- 01:04:16but we are not getting a pretty much
- 01:04:17deep information
- 01:04:19on to whether a 4 was like 4 which was
- 01:04:23mentioned here is a 4 which has been
- 01:04:25scored
- 01:04:26into that particular ball or not so
- 01:04:29we should take into account the batsman
- 01:04:32runs
- 01:04:34and whenever the batsman runs is equal
- 01:04:36to 4 that means that a 4 was scored by
- 01:04:39the batsman the extra runs column or the
- 01:04:41total runs column
- 01:04:43we need not take into account we
- 01:04:45should only be concerned with the
- 01:04:47batsman runs column
- 01:04:49i hope the difference is pretty much
- 01:04:50clear
- 01:04:51between all of these three columns like
- 01:04:53what is the main difference
- 01:04:55i hope that is pretty much clear so we
- 01:04:56are only concerned with the batsman runs
- 01:04:59so i would take this into account
- 01:05:02and instead of
- 01:05:04six i would mention here four
- 01:05:07and similar information we'll be getting
- 01:05:09for the highest number of fours which
- 01:05:10have been scored by any player so you
- 01:05:12can see
- 01:05:13shikhar dhawan has scored 591 fours so
- 01:05:16this is pretty much the information which we
- 01:05:18wanted
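To make the difference between the three run columns concrete, here is a small T-SQL sketch. The column names `batsman_runs`, `extra_runs` and `total_runs` are assumed from the narration, and `#ipls_demo` with its made-up rows is a hypothetical stand-in for the `#ipls` temp table. One sample ball has four runs in `total_runs` that came entirely from extras, so it should not count as a four for the batsman.

```sql
-- Hypothetical sample rows; the second row is four extras, not a four off the bat.
CREATE TABLE #ipls_demo (
    batsman      VARCHAR(50),
    batsman_runs INT,
    extra_runs   INT,
    total_runs   INT
);
INSERT INTO #ipls_demo (batsman, batsman_runs, extra_runs, total_runs) VALUES
    ('Player A', 4, 0, 4),
    ('Player A', 0, 4, 4),
    ('Player A', 4, 0, 4),
    ('Player B', 4, 0, 4),
    ('Player B', 1, 0, 1);

-- Filtering on batsman_runs (not total_runs) ignores the extras-only ball.
SELECT TOP 1 batsman, COUNT(batsman) AS fours
FROM #ipls_demo
WHERE batsman_runs = 4
GROUP BY batsman
ORDER BY COUNT(batsman) DESC;

DROP TABLE #ipls_demo;
```

On the sample rows Player A is credited with 2 fours; filtering on `total_runs = 4` instead would wrongly give 3.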
- 01:05:20moving ahead onto the next problem and
- 01:05:23it's pretty much interesting the problem
- 01:05:24says 3000 runs
- 01:05:26club
- 01:05:28the highest strike rate so this
- 01:05:30particular problem mainly asks of the
- 01:05:32players who scored more than 3000 runs
- 01:05:36who is having the highest strike rate
- 01:05:39among all of these players
- 01:05:42so we'll be solving this particular
- 01:05:43problem let us see how to solve this but
- 01:05:45before that
- 01:05:46let me see if the chat is there okay no
- 01:05:50so three thousand runs so first of all
- 01:05:52we are only concerned with the players
- 01:05:53who have scored
- 01:05:55three thousand runs along with that if
- 01:05:57we somehow get the number of balls which
- 01:05:59they have faced
- 01:06:01to score the three thousand runs
- 01:06:05i think our problem is pretty much clear
- 01:06:07then
- 01:06:08so first of all let us just
- 01:06:11filter out or get the
- 01:06:14total number of runs which have been
- 01:06:15scored by the players so i would just do
- 01:06:18this
- 01:06:19#ipls and group by batsman
- 01:06:26instead of total runs i guess we should
- 01:06:29use
- 01:06:30the batsman runs
- 01:06:38yes
- 01:06:39after executing you can see
- 01:06:42i'm getting the
- 01:06:44batsman run so
- 01:06:47also if i could get the total number of
- 01:06:50balls which they have faced
- 01:06:52so if you look closely at
- 01:06:55this particular sheet
- 01:06:57each row contains
- 01:07:00information for a particular ball
- 01:07:02so the number of times the batsman name
- 01:07:04appears in this particular column
- 01:07:06that is column e that is the number
- 01:07:08of balls they have faced so we could just
- 01:07:10simply count the number of times the
- 01:07:12batsman name appears in the batsman
- 01:07:13column
- 01:07:14so that simply means that we will be
- 01:07:17getting the total number of balls which
- 01:07:18have been faced by the batsman
- 01:07:21i would mention this in the group by
- 01:07:24clause also
- 01:07:26okay no need why would we need to mention
- 01:07:29this
- 01:07:30i'll just execute this statement
- 01:07:39okay count of batsman this is the total
- 01:07:42balls
- 01:07:45ctrl x ctrl v
- 01:07:50and this is the total balls
- 01:07:53faced
- 01:07:59mention this
- 01:08:01execute the command once again and
- 01:08:05you can see we are getting the result
- 01:08:07now i'll be using this into a sub query
- 01:08:10and from this particular sub query i'll
- 01:08:13be getting
- 01:08:14the strike rate so i'll just do a select
- 01:08:16batsman so strike rate is basically
- 01:08:18total number of runs divided by the
- 01:08:20total number of balls which were faced so
- 01:08:24batsman
- 01:08:26runs divided by the total
- 01:08:32balls which have been faced and i would
- 01:08:34rename this column as the
- 01:08:36strike rate column from
- 01:08:41okay let us execute the command
- 01:08:46i should
- 01:08:50oh okay so you can see
- 01:08:53we are getting zeros
- 01:08:55i think to remove these zeros i need to
- 01:08:58multiply either the denominator or the
- 01:09:00numerator
- 01:09:02by 1.0
- 01:09:06so that the division is done in decimal
- 01:09:08numbers instead of integers
- 01:09:10i'll put this in brackets and also i
- 01:09:12need to multiply this by 100
- 01:09:16i'll execute the code so you can see
- 01:09:18here a zero is still coming
- 01:09:21no worries but you can see here i'm
- 01:09:23getting the strike rate for the batsman
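The zeros come from integer division: in SQL Server, dividing one `INT` by another truncates the result, so any batsman with fewer runs than balls faced gets 0. A minimal illustration with made-up numbers (20 runs off 30 balls):

```sql
SELECT 20 / 30             AS integer_division,  -- INT / INT truncates to 0
       20 * 1.0 / 30 * 100 AS strike_rate;       -- 1.0 forces decimal arithmetic, about 66.67
```

Multiplying by `1.0` before dividing is the trick used in the session; a `CAST` to a decimal type on either operand would be an alternative way to get the same effect.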
- 01:09:26now after this i need to
- 01:09:31find the
- 01:09:32player who has scored more than 3000
- 01:09:36runs with the highest strike rate
- 01:09:38as you can see here 3000 runs club with
- 01:09:41the highest strike rate we need to solve
- 01:09:43a lot of different problems
- 01:09:46but yeah uh so the player who has scored
- 01:09:51more than 3000 runs
- 01:09:54okay let me get this column also
- 01:09:58ctrl c
- 01:10:02ctrl v
- 01:10:06execute the code
- 01:10:17okay great
- 01:10:19so i'll do here select
- 01:10:29batsman
- 01:10:33runs
- 01:10:36comma strike
- 01:10:40rate from
- 01:10:45where
- 01:10:48batsman runs should be greater than or
- 01:10:51equal to 3000
- 01:10:54okay
- 01:10:58where it should be where
- 01:11:05so you can see here we are getting the
- 01:11:07information for all those batsmen who have scored
- 01:11:103000 runs or more now who is the
- 01:11:12player who is having the highest strike
- 01:11:14rate along with 3000 runs we need to get
- 01:11:17this information
- 01:11:19so after this we need to order by order
- 01:11:23by on the basis of what order by
- 01:11:25on the basis of the strike rate
- 01:11:29and this should be in descending
- 01:11:31order
- 01:11:33and from this i need to get the top
- 01:11:36one batsman
- 01:11:38i hope the logic is pretty much clear
- 01:11:41so you can see ab de villiers is the batsman
- 01:11:44with the highest strike rate in the
- 01:11:46history of the ipl among the players who
- 01:11:48have scored 3000 runs or more so
- 01:11:51this is the information which we wanted
- 01:11:53and we have got here
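Put together, the strike rate steps above might look like the following T-SQL sketch. It is an illustration under stated assumptions rather than the presenter's exact script: `#ipls_demo` and its rows are hypothetical, the column names are taken from the narration, and the `@min_runs` variable is introduced here only so the tiny sample can pass a threshold (the session filters on 3000). Like the session, it counts every row as a ball faced; deliveries such as wides are not normally counted against the batsman, so figures on real data may differ slightly from official strike rates.

```sql
-- Hypothetical sample rows standing in for the ball-by-ball temp table.
CREATE TABLE #ipls_demo (batsman VARCHAR(50), batsman_runs INT);
INSERT INTO #ipls_demo (batsman, batsman_runs) VALUES
    ('Player A', 6), ('Player A', 4), ('Player A', 1), ('Player A', 0),
    ('Player B', 4), ('Player B', 4), ('Player B', 4),
    ('Player C', 6);

DECLARE @min_runs INT = 10;  -- the session uses 3000; lowered for the sample

-- Inner query: runs, balls faced and strike rate per batsman.
-- Outer query: keep batsmen above the runs threshold, highest strike rate first.
SELECT TOP 1 batsman, batsman_runs, strike_rate
FROM (
    SELECT batsman,
           SUM(batsman_runs)                              AS batsman_runs,
           COUNT(batsman)                                 AS total_balls_faced,
           SUM(batsman_runs) * 1.0 / COUNT(batsman) * 100 AS strike_rate
    FROM #ipls_demo
    GROUP BY batsman
) AS per_batsman
WHERE batsman_runs >= @min_runs
ORDER BY strike_rate DESC;

DROP TABLE #ipls_demo;
```

On the sample rows Player C has the highest strike rate but only 6 runs, so the threshold removes that player and Player B is returned.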
- 01:11:55so this was a pretty much interesting
- 01:11:56problem i guess the number 12 problem
- 01:12:00was pretty much interesting to solve
- 01:12:02similarly earlier we have done this for
- 01:12:05uh batsmen who scored more than 3000
- 01:12:08runs now
- 01:12:09we are
- 01:12:12concerned with the bowlers with the
- 01:12:14lowest
- 01:12:15economy rate for
- 01:12:17the bowlers who bowled at least 50 overs
- 01:12:21so we need to solve this so
- 01:12:23terms like the strike rate and
- 01:12:25economy rate these are pretty much
- 01:12:27domain knowledge obviously you must
- 01:12:30pretty much know what these terms
- 01:12:32mean but
- 01:12:33you could find the definitions onto
- 01:12:35google also
- 01:12:36let's say if you don't know anything xyz
- 01:12:39of cricket so you could simply google
- 01:12:42these terms and you can find the
- 01:12:43definition the mathematical definitions
- 01:12:45of these particular terms
- 01:12:48so economy rate here simply means
- 01:12:50the total number of runs which were
- 01:12:52conceded by the bowler divided by the total
- 01:12:55number of
- 01:12:56balls that have been bowled by the
- 01:12:58bowler so we could just divide both of
- 01:13:00these numbers and we'll be getting the
- 01:13:02economy rate
- 01:13:04so let us see let us see how to solve
- 01:13:06this so
- 01:13:08we'll be getting the lowest economy rate
- 01:13:10i think this
- 01:13:12would
- 01:13:13okay let's see uh lowest economy rate
- 01:13:16for the bowlers who bowled at least 50
- 01:13:18overs
- 01:13:22how could we know uh the information of
- 01:13:24the
- 01:13:25bowlers who have bowled at least 50
- 01:13:28overs
- 01:13:29so we can know this information
- 01:13:34by counting the number of times the
- 01:13:37bowler name appears in this
- 01:13:38particular column and we know in an over
- 01:13:42uh
- 01:13:43six
- 01:13:44balls are there and
- 01:13:47we are only concerned with the bowlers
- 01:13:48who have bowled at least 50 overs so 6
- 01:13:51into 50 that means 300 balls a minimum
- 01:13:54of 300 balls should have been bowled by
- 01:13:57the bowlers which
- 01:13:59we are concerned with so
- 01:14:01first of all let us get this information
- 01:14:03so i'll just
- 01:14:05select
- 01:14:06i think first of all i would do a select
- 01:14:09star from
- 01:14:11simply to get the columns which we have
- 01:14:14so i'll just do a simple select star
- 01:14:16from
- 01:14:17#ipls
- 01:14:19now this particular
- 01:14:21#ipls
- 01:14:23you can see i've just created a
- 01:14:24temporary table to accommodate all of
- 01:14:27these data sets oh okay so you can see
- 01:14:30here i have the
- 01:14:32bowler names so i'll just write here bowler
- 01:14:34comma count of bowler that is the number of
- 01:14:36times the bowler name appears here so
- 01:14:39this means that
- 01:14:41that many balls have been bowled by
- 01:14:44these bowlers
- 01:14:45and at the last i'll do a select from
- 01:14:49#ipls
- 01:14:52and at the last
- 01:14:54i'll be obviously grouping
- 01:14:57on the basis of
- 01:14:59the bowler column so that i could just
- 01:15:01get the count of the total number of
- 01:15:02balls that have been bowled and the
- 01:15:05number of runs which have been
- 01:15:06conceded so
- 01:15:08this bowler for example sr watson the
- 01:15:11total number of runs which were
- 01:15:13conceded on the fourth ball of the
- 01:15:1414th over
- 01:15:16so that simply means that uh total runs
- 01:15:18you can see here is zero so there were
- 01:15:20no runs
- 01:15:22conceded by the bowler so i would just
- 01:15:24simply take into account the total runs
- 01:15:27that is pretty much simple so that
- 01:15:29i would be doing right now
- 01:15:31so i'll be using this total runs column
- 01:15:33so i would do a sum of
- 01:15:37total runs that's it
- 01:15:40let us execute the code
- 01:15:43so you can see here the bowler name
- 01:15:46and this is the total
- 01:15:50balls and this is total
- 01:15:54runs conceded
- 01:16:00if i would just simply run the code
- 01:16:02again
- 01:16:04total balls bowled and the total
- 01:16:07runs which have been conceded so you
- 01:16:09can see we can get this
- 01:16:12particular information
- 01:16:14now we're only concerned with the
- 01:16:15bowlers who have bowled at least 50 overs
- 01:16:17and we just discussed right now that in
- 01:16:19an over there are six balls so 50
- 01:16:22overs means that 300 balls a minimum of
- 01:16:24300 balls
- 01:16:25uh should have been bowled by
- 01:16:28the bowlers so we'll be filtering
- 01:16:30the data for this so i'll just put
- 01:16:32this into a sub query and
- 01:16:35i'll do
- 01:16:36a simple select
- 01:16:40and i'll do here
- 01:16:42like bowler comma
- 01:16:47we need to get the economy rate so
- 01:16:50i pretty much assume that the formula is
- 01:16:52the total runs
- 01:16:54which have been conceded
- 01:16:59and
- 01:17:01i guess this should be
- 01:17:04divided by the
- 01:17:06total
- 01:17:10balls that have been bowled and you
- 01:17:13could pretty much convert the total
- 01:17:15number of balls which have been bowled
- 01:17:17uh
- 01:17:19into overs but i guess the
- 01:17:22answer that would remain pretty much the
- 01:17:24same
- 01:17:29economy rate
- 01:17:32from
- 01:17:35this
- 01:17:36let's execute the code
- 01:17:39so you can see here i'm getting the
- 01:17:40economy rate
- 01:17:42but here i need to mention here where
- 01:17:47total balls
- 01:17:51this should be greater than
- 01:17:54what so this should be greater than
- 01:17:58299 because
- 01:18:00okay 300 balls
- 01:18:02at least 50 overs
- 01:18:05should have been bowled as you can see it
- 01:18:06was mentioned there
- 01:18:08so you can see here we are getting the
- 01:18:10economy rates
- 01:18:12and here i'll be ordering by so order by
- 01:18:16the
- 01:18:17what
- 01:18:18order by the economy rate
- 01:18:24definitely in ascending order
- 01:18:25because for bowlers we want
- 01:18:28that they concede as few runs as
- 01:18:32possible
- 01:18:33so
- 01:18:34the best bowler is the bowler who has
- 01:18:35conceded the fewest runs possible
- 01:18:38for the number of balls that the
- 01:18:41bowler has bowled
- 01:18:43so this particular economy rate
- 01:18:45column this should be as low
- 01:18:47as possible
- 01:18:48so i would also put here top one
- 01:18:51now why did i say this because
- 01:18:54uh i would not write descending
- 01:18:57order here so i'll just
- 01:18:59simply execute the code so you can see
- 01:19:00rashid khan's economy rate is the
- 01:19:03best among the bowlers who have bowled
- 01:19:05at least 50 overs now this economy
- 01:19:08rate i'm getting is per ball because i've just
- 01:19:10divided this number by the total balls
- 01:19:12you could just convert the balls into
- 01:19:14overs and you can get the per over
- 01:19:17figure but the answer will remain the
- 01:19:19same i think i guess yeah i think it
- 01:19:22will remain exactly the same
- 01:19:24so this is how we solve this particular
- 01:19:26problem
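The economy rate walkthrough can be sketched in T-SQL as below. As before this is a hedged illustration: `#ipls_demo` and its rows are hypothetical, the column names `bowler` and `total_runs` are assumed from the narration, and `@min_balls` is introduced here so the small sample can pass a threshold (the session uses 300 balls, that is 50 overs of 6 balls). The sketch shows both runs per ball, as computed in the session, and runs per over, which is simply 6 times that value, so sorting by either column gives the same order of bowlers.

```sql
-- Hypothetical sample rows: one row per ball with the runs conceded on it.
CREATE TABLE #ipls_demo (bowler VARCHAR(50), total_runs INT);
INSERT INTO #ipls_demo (bowler, total_runs) VALUES
    ('Bowler X', 0), ('Bowler X', 1), ('Bowler X', 0), ('Bowler X', 1),
    ('Bowler Y', 4), ('Bowler Y', 6), ('Bowler Y', 1),
    ('Bowler Z', 0);

DECLARE @min_balls INT = 3;  -- the session uses 300 balls (50 overs)

-- Inner query: balls bowled and runs conceded per bowler.
-- Outer query: drop bowlers below the threshold, lowest economy first.
SELECT TOP 1 bowler,
       total_runs_conceded * 1.0 / total_balls AS runs_per_ball,
       total_runs_conceded * 6.0 / total_balls AS economy_rate  -- runs per over
FROM (
    SELECT bowler,
           COUNT(bowler)   AS total_balls,
           SUM(total_runs) AS total_runs_conceded
    FROM #ipls_demo
    GROUP BY bowler
) AS per_bowler
WHERE total_balls >= @min_balls
ORDER BY economy_rate ASC;

DROP TABLE #ipls_demo;
```

On the sample rows Bowler Z has conceded nothing but has bowled only one ball, so the threshold excludes that bowler and Bowler X is returned.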
- 01:19:30okay let me
- 01:19:33move forward to the
- 01:19:35problem ahead so total number of matches
- 01:19:37played till 2020
- 01:19:41this is pretty much straightforward we
- 01:19:43just simply need to count the distinct
- 01:19:45ids which are present in either of the
- 01:19:47sheets
- 01:19:48or not either of the sheets i would
- 01:19:51take into account select
- 01:19:53count of
- 01:19:56distinct
- 01:19:58id
- 01:20:00from
- 01:20:03and i will take into account this
- 01:20:05particular sheet
- 01:20:09tutorial
- 01:20:10dot
- 01:20:20let's execute the code
- 01:20:22so you can see here we are getting
- 01:20:24816 matches that have been played
- 01:20:27till 2020.
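A minimal T-SQL sketch of the match count follows. The real table name is cut off in the recording, so `#matches_demo` and its rows are purely hypothetical; only the `id` column and the `COUNT(DISTINCT ...)` idea come from the narration.

```sql
-- Hypothetical matches table with one row per match.
CREATE TABLE #matches_demo (id INT, winner VARCHAR(50));
INSERT INTO #matches_demo (id, winner) VALUES
    (1, 'Team A'), (2, 'Team B'), (3, 'Team A');

-- Each match has its own id, so counting distinct ids counts matches.
SELECT COUNT(DISTINCT id) AS total_matches
FROM #matches_demo;

DROP TABLE #matches_demo;
```

On the sample rows this returns 3. `DISTINCT` matters most if the count is taken from the ball-by-ball table, where the same match id repeats on every row.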
- 01:20:30the next question let us solve so this
- 01:20:33question will be the last question of
- 01:20:35this particular live session i will be
- 01:20:37conducting multiple live sessions after
- 01:20:39this
- 01:20:40uh
- 01:20:41and i'll be announcing them
- 01:20:43beforehand
- 01:20:44total number of matches won by each team
- 01:20:48so this is also pretty much simple
- 01:20:50so which of the columns will we be using for
- 01:20:52this i guess we have solved
- 01:20:55this particular problem okay no worries
- 01:20:58i'll be using this particular table
- 01:21:01itself to get the information of the
- 01:21:07team who has won the
- 01:21:10total number of matches won by each
- 01:21:12team okay so here we
- 01:21:15want to get the information of the
- 01:21:18matches which have been won by each of
- 01:21:22the teams
- 01:21:24we need not get information of the
- 01:21:25total number of matches which have been
- 01:21:27played but the matches which have been
- 01:21:28won
- 01:21:30so
- 01:21:32we'll be using this winner column itself
- 01:21:35and we'll be doing
- 01:21:37so i'll do a select
- 01:21:41winner
- 01:21:42comma count of
- 01:21:48winner
- 01:21:52group by
- 01:21:55winner
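The wins-per-team query dictated here can be sketched in T-SQL as follows. The table name is not stated in the recording, so `#matches_demo` and its rows are hypothetical; the `winner` column and the group by come from the narration.

```sql
-- Hypothetical matches table with one row per match.
CREATE TABLE #matches_demo (id INT, winner VARCHAR(50));
INSERT INTO #matches_demo (id, winner) VALUES
    (1, 'Team A'), (2, 'Team B'), (3, 'Team A');

-- One output row per team, with the number of matches it won.
SELECT winner, COUNT(winner) AS matches_won
FROM #matches_demo
GROUP BY winner;

DROP TABLE #matches_demo;
```

On the sample rows this gives Team A 2 wins and Team B 1 win. Adding `ORDER BY matches_won DESC` would list the most successful team first.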
- 01:21:57so i'll execute this and you can see we
- 01:21:59are getting the information of the total
- 01:22:01number of matches which have been won by
- 01:22:04each of the teams so friends these are
- 01:22:06pretty much the 15 questions which we
- 01:22:08have solved uh in the live session for
- 01:22:11this particular live session
- 01:22:13so
- 01:22:14the next problems are kind of very much
- 01:22:16interesting some of the problems are
- 01:22:18very much interesting and we'll be
- 01:22:20solving these using some of the same
- 01:22:22advanced level sql functions which we
- 01:22:25have learned so
- 01:22:26uh some of the major questions which
- 01:22:28i've been getting is what is the best
- 01:22:29way to learn sql or from where to learn
- 01:22:31sql so you could just simply watch my
- 01:22:34zero to one add one sql course and also
- 01:22:37the advanced sql playlist and also
- 01:22:39the sql interview questions which i
- 01:22:42keep posting every now and then so
- 01:22:44pretty much you can go
- 01:22:45ahead and practice from all of these data
- 01:22:48sets all of these problems which i post
- 01:22:50every now and then and i think that
- 01:22:52would be pretty much enough
- 01:22:54also practice problems on hackerrank
- 01:22:56leetcode
- 01:22:58and similar platforms i
- 01:23:00think it would help you a lot and you
- 01:23:02can ask your problems you can just comment
- 01:23:05down your doubts whichever you are
- 01:23:06having or you have any alternate
- 01:23:08solutions for any problems you can just
- 01:23:11mention that into the comment box that
- 01:23:13can help me as well as all the other
- 01:23:15participants who are seeing this
- 01:23:16particular live session or will be
- 01:23:18watching
- 01:23:19this video ahead so that would pretty
- 01:23:21much help all of them so we'll be
- 01:23:23solving all these different problems so
- 01:23:25there are a lot of different problems we
- 01:23:26could only solve 15
- 01:23:29problems in this particular live
- 01:23:31session
- 01:23:34i'll be trying to solve many more
- 01:23:36different problems ahead in the future
- 01:23:38live sessions uh so that we just don't
- 01:23:40waste time
- 01:23:42and i could accommodate
- 01:23:44so my plan was to solve a particular
- 01:23:47column of problems but i guess
- 01:23:50we could only solve 15 problems
- 01:23:53not even half but no worries i'll be
- 01:23:55holding uh these live sessions every now
- 01:23:57and then and i'll be informing you
- 01:23:59guys
- 01:24:00you can just pretty much participate
- 01:24:02just follow along with me or you can take
- 01:24:04a screenshot of this particular screen
- 01:24:06you can just practice from your end also
- 01:24:07and that would pretty much work for
- 01:24:09everybody so friends this was all
- 01:24:13about this particular live session
- IPL
- data analysis
- SQL
- matches
- statistics
- player performance
- data sets
- ball by ball
- rank function
- community tab