IPL Data Exploration SQL Portfolio Project - Part 1 | Analytics | Ashutosh Kumar

01:24:16
https://www.youtube.com/watch?v=r30CuXlPV28

Summary

TLDRThe video focuses on analyzing an IPL (Indian Premier League) data set spanning matches from 2008 to 2020 using SQL. The presenter demonstrates how to solve various data analysis problems such as counting matches per season, identifying top-performing players, and understanding broader statistical trends. Challenges in data import due to size were resolved by breaking datasets into smaller parts and aggregating results using SQL functions like union, GROUP BY, and rank. The presenter answers viewer questions about improving SQL skills, practicing data analysis, and preparing for job interviews in data roles. Viewers are encouraged to practice SQL on platforms like HackerRank and engage with practical exercises to solidify their skills.

Takeaways

  • πŸ‘‰ Understanding the structure and purpose of IPL data sets from 2008-2020.
  • 🎯 Techniques to solve SQL data analysis problems.
  • πŸ’‘ Utilizing SQL functions: SELECT, COUNT, GROUP BY, UNION.
  • πŸ”₯ Analyzing player performance and match statistics.
  • ⏸️ Handling large datasets by breaking and reassembling them.
  • 🧩 Addressing data import challenges and solutions.
  • πŸ“Š Importance of SQL skills for data analytics roles.
  • πŸš€ Practical SQL exercises on platforms like HackerRank.
  • πŸ› οΈ Solutions to common SQL interview questions.
  • πŸ” Data cleaning insights using real-world datasets.

Timeline

  • 00:00:00 - 00:05:00

    The speaker introduces the topic of IPL datasets available for download, where they have shared links in the community tab for access. The main focus will be on solving various data points using these datasets, specifically two main datasets containing IPL matches data from 2008 to 2020. The speaker explains the dataset's structure, including details such as match ID, city, date, player of the match, venue, and match results.

  • 00:05:00 - 00:10:00

    The speaker discusses two datasets: one with general match information and the second one detailing ball-by-ball data, which contains about two lakh entries. They highlight the challenge of handling large datasets in SQL Server Management Studio (SSMS) due to row limits and describe their method of splitting the data into smaller segments to import successfully. They encourage the audience to try importing directly but offers a solution if issues arise.

  • 00:10:00 - 00:15:00

    The speaker confirms their audio is working and shares links to the dataset for the audience to download, emphasizing they are still available in the community tab. They address a question about SQL competencies needed for interviews, suggesting that the content in their beginner to advanced SQL playlists is sufficient for analytics roles.

  • 00:15:00 - 00:20:00

    The speaker outlines how they divided the ball-by-ball dataset into smaller datasets for easier import into SQL, noting issues with row limits. They explain using a union operation to combine the smaller datasets back into one table within the SQL environment, successfully preparing the data for analysis.

  • 00:20:00 - 00:25:00

    The speaker prepares to start querying the datasets to solve specific problems, like counting matches per season, by utilizing SQL functions such as SELECT and GROUP BY. They reflect on the necessity of understanding dataset details before conducting any analysis and adjusting methodologies as required.

  • 00:25:00 - 00:30:00

    The speaker begins solving a data exploration task: counting the number of IPL matches played per season. They describe extracting the year from match dates to achieve this. This process involves SQL operations that filter and group data, then use SELECT commands to display results in a structured format.

  • 00:30:00 - 00:35:00

    The speaker solves for the player with the most 'Player of the Match' awards using SQL commands to count occurrences in the dataset, filtering and sorting results to highlight the player with the most awards. They demonstrate using SELECT and ORDER BY in SQL queries to find these statistics efficiently.

  • 00:35:00 - 00:40:00

    The speaker attempts to find the player with the most 'Player of the Match' awards per season using SQL. They run into issues with query syntax, particularly with ORDER BY in subqueries, but resolve these to achieve sorted data demonstrating the use of ranking functions in getting desired outputs.

  • 00:40:00 - 00:45:00

    The speaker continues solving dataset queries, moving to finding the team with the most wins in IPL history, using simple group-by logic over the 'winner' column to list teams and their win counts. They refine sorting results by descending order to show top team performers.

  • 00:45:00 - 00:50:00

    Attention is given to finding the locations where the most matches were played, illustrating the query process by grouping data based on venues and counting occurrences to identify top venues. They navigate SQL limitations on sorting and query structure to obtain a ranked list of venues.

  • 00:50:00 - 00:55:00

    The speaker extracts batsman performance data from ball-by-ball details, counting runs and sixes to identify players with top scoring. They instruct on joining tables and aggregating necessary performance data, applying SQL functions to extract, sort, and present summarized cricket statistics.

  • 00:55:00 - 01:00:00

    The task involves calculating the percentage of total runs scored by each batsman. The speaker describes using SQL to join summary predicates and compare individual performance against cumulative data, emphasizing understanding SQL's aggregation capabilities for statistical summaries.

  • 01:00:00 - 01:05:00

    The query process extends to counting sixes scored by players, using a similar logic to previous scoring counts, filtering based on run values associated with each play action. The speaker builds complex counting and sorting operations to identify top six-hitters throughout recorded matches.

  • 01:05:00 - 01:10:00

    The speaker demonstrates calculating the highest strike rate among batsmen with over 3000 runs, highlighting rankings in SQL processes. They emphasize handling of diverse data operations including filtering with sub-query conditionals and using calculated metrics for performance evaluation.

  • 01:10:00 - 01:24:16

    In closing, the speaker addresses questions on dataset sources and the preparation for competition in data roles, while giving advice on improving SQL skills through practice and problem-solving on platforms like HackerRank and LeetCode. They discuss future live streams focusing on remaining queries and encourage sharing of alternative solutions.

Show more

Mind Map

Video Q&A

  • What is the main content of the IPL data set discussed in the video?

    India's Premier League (IPL) data analysis, focusing on matches and player statistics from 2008 to 2020.

  • What skills does the video teach?

    Using SQL queries to solve various data analysis problems related to an IPL data set.

  • How does the video approach solving data analysis problems?

    By SQL queries such as SELECT, GROUP BY, COUNT, and by analyzing player performance and match stats from the data.

  • How does the presenter suggest improving SQL skills?

    Participants should practice SQL queries on platforms like HackerRank or LeetCode and understand the concepts from the video tutorials.

  • What other exercises guide viewers in building their SQL proficiency?

    Working through common SQL interview problems and analytical exercises such as those discussed in the video.

  • How can the learned skills be utilized in real-life projects?

    As practice projects for SQL and data analysis, participants can create dashboards using the insights obtained from data sets.

  • Where can one practice data cleaning skills?

    Data cleaning exercises can be practiced using Kaggle or by addressing inconsistencies like NA values in a dataset.

  • What challenges might viewers face while learning SQL?

    These include practicing on coding platforms, understanding advanced functions and fixing syntax errors in SQL scripts.

  • What time frame and data does the video explore?

    2008 to 2020 IPL matches, focused on details such as player stats, venues, and match outcomes.

  • What kind of data sets are utilized in the video?

    The data sets are substantial, consisting of match information and ball-by-ball details for analysis exercises.

View more video summaries

Get instant access to free YouTube video summaries powered by AI!
Subtitles
en
Auto Scroll:
  • 00:00:01
    so friends today we'll be discussing on
  • 00:00:03
    to the ipl data set
  • 00:00:06
    so
  • 00:00:06
    i've got few data sets
  • 00:00:09
    onto the ipl
  • 00:00:11
    uh from google
  • 00:00:13
    i've attached few links
  • 00:00:16
    into the
  • 00:00:17
    community tab also
  • 00:00:20
    i have given the link so you can
  • 00:00:22
    download the data sets from the link
  • 00:00:25
    which i have mentioned into the
  • 00:00:27
    community tab and
  • 00:00:29
    here we'll be solving
  • 00:00:31
    multiple data points
  • 00:00:34
    which i have gathered for this purpose
  • 00:00:38
    so you'll be seeing
  • 00:00:40
    multiple sheets multiple links which are
  • 00:00:42
    provided into the
  • 00:00:44
    community tab post for this particular
  • 00:00:47
    live stream
  • 00:00:48
    i will simply explain what all these
  • 00:00:51
    data sets are all about
  • 00:00:54
    and then we'll be proceeding ahead from
  • 00:00:56
    there so mainly i have two
  • 00:00:59
    data sets as you can see here
  • 00:01:02
    uh
  • 00:01:03
    so from the name you can just pick up
  • 00:01:06
    so these two are main data sets ipl
  • 00:01:09
    matches
  • 00:01:10
    uh 2008 to 2020. so basically both of
  • 00:01:14
    these sheets or all of these sheets they
  • 00:01:17
    contain the data from 2008 to 2020
  • 00:01:22
    so
  • 00:01:22
    only
  • 00:01:23
    these particular date ranges data are
  • 00:01:26
    present into all of these sheets so into
  • 00:01:29
    this sheet you can see here i'll just
  • 00:01:31
    quickly explain what is this data all
  • 00:01:33
    about
  • 00:01:34
    so into the column a you can see i have
  • 00:01:37
    the id so this can be considered as the
  • 00:01:40
    id of the different matches which are
  • 00:01:42
    which would have been played
  • 00:01:44
    into the column b i have the
  • 00:01:47
    city so into which city this particular
  • 00:01:49
    match was being played into the column c
  • 00:01:52
    i have the date date on which this match
  • 00:01:55
    was being played the player of match of
  • 00:01:58
    this particular match
  • 00:02:00
    the venue in which the match was
  • 00:02:02
    conducted uh whether this venue was a
  • 00:02:05
    neutral venue or not
  • 00:02:06
    the team won uh similarly the team to
  • 00:02:10
    who won the toss
  • 00:02:13
    and what was the decision which was
  • 00:02:15
    taken by the team who had won the toss
  • 00:02:19
    similarly the winner who was the winner
  • 00:02:21
    of this particular match
  • 00:02:23
    so into a row you can see
  • 00:02:25
    uh the data is present for just a match
  • 00:02:29
    also we have the result like what was
  • 00:02:31
    the result of this particular match so
  • 00:02:34
    as you can see the results it was based
  • 00:02:36
    on to the wickets or it was based on to
  • 00:02:38
    the run so and also the result margin so
  • 00:02:41
    you can see 140 so this particular match
  • 00:02:44
    was won by kolkata night riders by 140
  • 00:02:47
    runs so this particular data is present
  • 00:02:49
    or whether this was an eliminator or not
  • 00:02:51
    so eliminator it simply means that
  • 00:02:54
    either it would have been a semi-final
  • 00:02:56
    or a final match
  • 00:02:58
    uh in method you can see na is present
  • 00:03:00
    but will be coming to this later what is
  • 00:03:03
    present into this method into the column
  • 00:03:06
    p you can see the umpire one and the
  • 00:03:07
    empire two information is present here
  • 00:03:10
    so i hope the data set is all clear so
  • 00:03:13
    you can
  • 00:03:14
    just mention it
  • 00:03:16
    whether the data set is clear or not
  • 00:03:20
    and then we'll be moving ahead from here
  • 00:03:26
    so
  • 00:03:27
    so this was the very first data set
  • 00:03:29
    coming to the second data set you can
  • 00:03:32
    see
  • 00:03:33
    so this data set is ipl ball by ball
  • 00:03:36
    2008 to 2020 now this particular data
  • 00:03:40
    set is massive
  • 00:03:41
    massive in the sense like
  • 00:03:43
    this particular sheet it contains a lot
  • 00:03:45
    of data so you can just see the count
  • 00:03:47
    so the count is close to two lakhs so
  • 00:03:50
    two lakh rows is contained into this
  • 00:03:52
    particular sheet similarly you can see
  • 00:03:54
    into the column i have the id of the
  • 00:03:55
    match into the column b i have the
  • 00:03:57
    inning so uh there are two innings so
  • 00:04:00
    whether this innings whether this
  • 00:04:02
    information is from the innings one or
  • 00:04:04
    whether this information is from the
  • 00:04:06
    innings to
  • 00:04:07
    and also the over is mentioned here so
  • 00:04:10
    which ball like
  • 00:04:12
    so this is into the information so the
  • 00:04:15
    information which is presented to the
  • 00:04:16
    column d you can see this is the ball so
  • 00:04:19
    this is the fifth ball of the sixth over
  • 00:04:21
    so this information is present here
  • 00:04:23
    similarly the batsman name
  • 00:04:26
    uh the non striker the baller name
  • 00:04:30
    like for this particular ball who was
  • 00:04:33
    the
  • 00:04:34
    non-striker who was the batsman who was
  • 00:04:36
    the baller so what was the batsman runs
  • 00:04:39
    what was the extra runs which were
  • 00:04:41
    considered
  • 00:04:42
    what was the total runs
  • 00:04:44
    uh non-boundary runs is wicked so was
  • 00:04:47
    this uh did they get a wicket or not
  • 00:04:51
    into this particular ball
  • 00:04:53
    so we have all such information which is
  • 00:04:55
    present here similarly we have the
  • 00:04:56
    batting team bowling team and all these
  • 00:04:59
    other informations into this particular
  • 00:05:01
    data set now tell me ashutosh why do we
  • 00:05:04
    have so many different data sets which
  • 00:05:06
    we need to download
  • 00:05:08
    so the thing was as you can see like
  • 00:05:11
    there is a lot of
  • 00:05:12
    count of rows which is present here
  • 00:05:15
    so i'll just see
  • 00:05:18
    so if anybody of you can please let me
  • 00:05:20
    know if my odd i'm audible or not
  • 00:05:31
    okay i'll just check this
  • 00:05:35
    i'll copy the video link
  • 00:05:41
    i'll paste this here
  • 00:05:49
    i'll paste this here
  • 00:05:51
    yes i am audible
  • 00:05:54
    great
  • 00:05:58
    place this year
  • 00:06:00
    yes i am audible
  • 00:06:01
    [Music]
  • 00:06:04
    great
  • 00:06:07
    oh i think i'm not able to see the chat
  • 00:06:10
    here so
  • 00:06:12
    no worries so can you tell me how much
  • 00:06:15
    sql is enough for interviews so
  • 00:06:18
    the number of content which i have
  • 00:06:20
    covered till now into my uh zero to one
  • 00:06:24
    advanced sequel playlist and also on to
  • 00:06:26
    the advanced sequel playlist so both of
  • 00:06:29
    these playlists they are kind of enough
  • 00:06:32
    for cracking the interviews for any
  • 00:06:35
    analytics purpose so you could check out
  • 00:06:38
    both of these
  • 00:06:39
    playlists and all of these questions
  • 00:06:42
    which i take into account into all of
  • 00:06:44
    different playlists they are
  • 00:06:47
    kind of you can say
  • 00:06:49
    pretty much whatever questions they have
  • 00:06:50
    been asked into the interviews
  • 00:06:53
    okay link to the data set you can find
  • 00:06:56
    into the community tab so you can go to
  • 00:06:59
    community tab of my channel and you can
  • 00:07:01
    download it from there
  • 00:07:03
    i have pasted all the links of the data
  • 00:07:06
    sets
  • 00:07:07
    which i am using okay let me just
  • 00:07:10
    provide once more so i'll just open my
  • 00:07:12
    github
  • 00:07:16
    so
  • 00:07:17
    i would give this link here
  • 00:07:21
    you can just
  • 00:07:22
    i would tell you what all want to
  • 00:07:24
    download and
  • 00:07:26
    you can download it from here
  • 00:07:29
    please open this particular link and the
  • 00:07:31
    name of the files
  • 00:07:34
    or else wait i'll just give it
  • 00:07:42
    one by one so there are
  • 00:07:44
    six files which we need to download to
  • 00:07:46
    understand the problem statement
  • 00:07:48
    basically
  • 00:07:49
    so i'm just providing it here
  • 00:07:54
    i'll just provide into the
  • 00:07:57
    chart
  • 00:08:14
    please download the datasets from here
  • 00:08:18
    let me know if you are facing any issues
  • 00:08:20
    while you're downloading these datasets
  • 00:08:34
    and i'll be providing the links of all
  • 00:08:36
    of these data sets into the description
  • 00:08:39
    box of this video also so
  • 00:08:41
    no need to worry
  • 00:08:43
    about that
  • 00:08:48
    yeah i think it's
  • 00:08:51
    fine
  • 00:09:00
    okay so
  • 00:09:02
    i would go back to the video
  • 00:09:04
    do you have any questions
  • 00:09:10
    i hope the data
  • 00:09:12
    side is quite clear now
  • 00:09:14
    yeah hello turning point
  • 00:09:20
    so i've just explained the data sets
  • 00:09:22
    i would uh now explain like why i
  • 00:09:24
    provided so many data sets so
  • 00:09:26
    why data analysts are not getting job
  • 00:09:30
    uh can you please elaborate on this
  • 00:09:35
    i don't think so they are not getting
  • 00:09:37
    job
  • 00:09:40
    i think practice is the key so we need
  • 00:09:42
    to practice a lot on to hacker rank
  • 00:09:45
    lead code
  • 00:09:49
    we need to practice the different
  • 00:09:50
    questions
  • 00:09:51
    i think we can get the job
  • 00:09:59
    okay i'll just go back to the sheet and
  • 00:10:02
    i'll come back to the questions after
  • 00:10:04
    this so
  • 00:10:05
    uh i hope the data set is quite clear
  • 00:10:07
    now so the data set is quite simple
  • 00:10:11
    if you would see now why i had
  • 00:10:14
    to break or why do i need to have so
  • 00:10:17
    many sheets so the thing was as you
  • 00:10:19
    could see like there are close to two
  • 00:10:21
    lakh rows which are present into this
  • 00:10:24
    particular data set
  • 00:10:25
    now when i was trying to import this
  • 00:10:27
    particular data set uh into my ssms so
  • 00:10:31
    it was throwing an error so basically it
  • 00:10:32
    was not throwing an error so the number
  • 00:10:35
    of rows which you can see here is 2 lakh
  • 00:10:36
    so
  • 00:10:37
    when i was doing select star or select
  • 00:10:40
    count star from this particular data set
  • 00:10:42
    i was not getting the exact count like
  • 00:10:45
    you can see here one nine three four six
  • 00:10:47
    nine is the total number of data entries
  • 00:10:49
    which are present into this particular
  • 00:10:50
    sheet i was not getting this particular
  • 00:10:52
    enter rather the entry was quite less
  • 00:10:55
    i think it was close to around six
  • 00:10:57
    thousand so i thought of or i think it
  • 00:11:00
    was close to sixty thousand so i thought
  • 00:11:01
    of breaking this particular sheet
  • 00:11:05
    into sixty thousand data points or sixty
  • 00:11:07
    thousand rows and then i did break this
  • 00:11:10
    particular sheet into four different
  • 00:11:13
    sheets as you can see here so 16 to 3 is
  • 00:11:16
    one lakh 80 000 and the rest of the
  • 00:11:20
    rows i think i've just
  • 00:11:22
    occupied into one of the sheets here so
  • 00:11:24
    you can see here i have
  • 00:11:26
    1
  • 00:11:27
    four hundred and seventy two rows here
  • 00:11:29
    which is present here so this is how i
  • 00:11:31
    have uh accommodated all of the
  • 00:11:34
    different uh sheets here or you can see
  • 00:11:37
    now or you can see the entire sheet here
  • 00:11:40
    by breaking the entire data set into
  • 00:11:42
    multiple small small
  • 00:11:44
    data sets which could be accommodated
  • 00:11:46
    into my sql server management studio and
  • 00:11:50
    then uh i just imported into into my
  • 00:11:52
    ssms so
  • 00:11:54
    now you would just ask me like
  • 00:11:56
    did i follow any procedure to break this
  • 00:11:58
    particular data set it was no i i just
  • 00:12:01
    simply took the very first 60 000 rows i
  • 00:12:04
    copy pasted the data into a separate
  • 00:12:06
    sheet then i moved to the next 60 000
  • 00:12:08
    rows i just copy pasted the data into a
  • 00:12:10
    different sheet so this is how i have
  • 00:12:13
    done simply and then i've just made
  • 00:12:15
    these particular different sheets so
  • 00:12:18
    uh you can just see from your end if you
  • 00:12:20
    are able to just import it
  • 00:12:23
    just at one go if you're not able to do
  • 00:12:25
    so break this data set into sheets uh
  • 00:12:29
    two sheets or three sheet space basis on
  • 00:12:31
    your
  • 00:12:32
    problem which you are getting and then
  • 00:12:34
    you can import it into the sql server
  • 00:12:36
    management studio
  • 00:12:38
    i have imported from before i have all
  • 00:12:41
    the
  • 00:12:42
    different you can see tables
  • 00:12:46
    i'll just go to tables here
  • 00:12:48
    so you can see here i have imported all
  • 00:12:50
    of
  • 00:12:51
    these different data sets ipl 1 ipl 2
  • 00:12:54
    ipl 3 apple 4 and ipl so into this
  • 00:12:57
    particular ipl it contains the
  • 00:13:01
    information about the matches from 2008
  • 00:13:04
    to 2020
  • 00:13:06
    and into ipl one i will do apple three
  • 00:13:09
    ipl for it basically i broke uh this
  • 00:13:13
    particular sheet which contained uh the
  • 00:13:15
    ball by boiling information and it
  • 00:13:18
    contained around 2 lakh rows i've just
  • 00:13:20
    broke down this particular sheet into
  • 00:13:21
    multiple sheets into multiple data sets
  • 00:13:24
    and then i have
  • 00:13:26
    copied the data into different sheets
  • 00:13:28
    and i've presented it here imported it
  • 00:13:30
    here sequentially
  • 00:13:32
    so okay let me know if you have any
  • 00:13:34
    questions
  • 00:13:36
    till this moment
  • 00:13:39
    hi uh everybody whoever is joining where
  • 00:13:42
    can we practice data cleaning
  • 00:13:46
    okay
  • 00:13:47
    so right now i don't have any such video
  • 00:13:50
    onto data cleaning i can think of so
  • 00:13:55
    i
  • 00:13:55
    am planning to prepare our dedicated two
  • 00:13:58
    or three videos on to data cleaning so i
  • 00:14:01
    guess that would be enough and from
  • 00:14:03
    there we you can practice
  • 00:14:07
    and
  • 00:14:08
    you can you can also practice from you
  • 00:14:10
    can just simply download the data sets
  • 00:14:12
    from kaggle
  • 00:14:13
    and try to identify how i can clean this
  • 00:14:17
    particular data set or how i can make
  • 00:14:20
    use of this particular data set what are
  • 00:14:22
    the problems which i could encounter
  • 00:14:25
    when i'm doing certain data analysis
  • 00:14:27
    tasks so
  • 00:14:28
    how i can get rid of all of these uh
  • 00:14:31
    kind of different data cleaning process
  • 00:14:33
    or dirty data which i am having right
  • 00:14:35
    now
  • 00:14:36
    so from where did you get this data set
  • 00:14:39
    i got this data set from google to be
  • 00:14:42
    exact i got this data set from kaggle
  • 00:14:45
    by the way i would just provide i've
  • 00:14:47
    provided all the links of
  • 00:14:50
    all these data sets here so you can just
  • 00:14:51
    download it from here also
  • 00:14:55
    my question was what our company really
  • 00:14:58
    expecting from data analyst is cutthroat
  • 00:15:01
    competition on yep that is true like
  • 00:15:03
    competition is
  • 00:15:05
    uh has increased a lot into
  • 00:15:08
    the past one year or the two years so
  • 00:15:12
    you are very much to
  • 00:15:15
    what our company really expecting
  • 00:15:19
    i guess the company they
  • 00:15:22
    are expecting you to have expertise uh
  • 00:15:25
    into the skills they are wanting so
  • 00:15:28
    if they are asking sql python so
  • 00:15:31
    you should be pretty much
  • 00:15:33
    good at it
  • 00:15:35
    and also they look into the aptitude
  • 00:15:37
    part your analytical skills so
  • 00:15:40
    uh
  • 00:15:42
    they are pretty much focusing onto the
  • 00:15:45
    aptitude part as well
  • 00:15:46
    also your problem solving part so
  • 00:15:50
    i think practice is the key again you
  • 00:15:52
    should practice from
  • 00:15:54
    a hacker ranked lead code
  • 00:15:56
    all of these different websites so that
  • 00:15:58
    might help
  • 00:16:02
    we should uh move forward now so
  • 00:16:09
    so you can see here
  • 00:16:11
    i have
  • 00:16:13
    so
  • 00:16:15
    i just need to get two data sets
  • 00:16:18
    as you can see here
  • 00:16:21
    this is one of the data set and this is
  • 00:16:23
    pretty much clear this is into this
  • 00:16:26
    dbo dot ipl
  • 00:16:27
    table but
  • 00:16:29
    all of these four
  • 00:16:31
    data sets you can see it contained
  • 00:16:33
    the same data but the data has been
  • 00:16:35
    split into multiple sheets so
  • 00:16:37
    let us do one thing we can use the union
  • 00:16:40
    operator
  • 00:16:41
    to join all of these different data sets
  • 00:16:44
    into a single data set
  • 00:16:46
    and we can just have
  • 00:16:48
    a single i mean a single table which
  • 00:16:52
    could contain all of these data sets so
  • 00:16:54
    i've just used the union operator you
  • 00:16:56
    can see here
  • 00:16:57
    first of all i have created a
  • 00:17:00
    temporary table so the temporal name is
  • 00:17:03
    ipls app so i'll just renamed it
  • 00:17:05
    for now ipls
  • 00:17:07
    and i've just mentioned your create
  • 00:17:09
    table ipls and i've mentioned all the
  • 00:17:11
    different columns which i require and
  • 00:17:14
    also the data type of all of these
  • 00:17:16
    different columns and then i am
  • 00:17:19
    after this i have created a table i am
  • 00:17:21
    inserting into this particular
  • 00:17:24
    table the different data sets which i
  • 00:17:26
    have split so you can see here i am
  • 00:17:30
    telling select star from tutorial dot
  • 00:17:32
    dbo dot ipl one
  • 00:17:34
    so here i have just imported and i'm
  • 00:17:37
    doing a union so first of all the ipl
  • 00:17:40
    one so the data which was contained into
  • 00:17:43
    the ipl one sheet it has been imported
  • 00:17:47
    after this i'm doing a union so it will
  • 00:17:49
    join vertically so basically we could
  • 00:17:51
    also join uh multiple tables or data
  • 00:17:54
    sets vertically so union is used for
  • 00:17:56
    that case so
  • 00:17:57
    after i have got the ipl1 dataset i am
  • 00:18:00
    just joining it vertically with ipl 2
  • 00:18:03
    similarly for ipl3 and ipl4
  • 00:18:06
    now if you would see if i would do a
  • 00:18:08
    simple select count star from ipls so
  • 00:18:11
    this is the exact number of rows which
  • 00:18:13
    are present into my
  • 00:18:15
    uh
  • 00:18:16
    the second data set so i would just
  • 00:18:18
    count the number of rows i think it must
  • 00:18:20
    be exact one nine three four six nine
  • 00:18:24
    so i think it's close
  • 00:18:26
    yeah so it is exactly the same
  • 00:18:29
    the number of sheets or the number of
  • 00:18:32
    rows which are presented to our second
  • 00:18:34
    sheet
  • 00:18:35
    so this is how we have uh solved a
  • 00:18:39
    particular problem so why i did this
  • 00:18:41
    because the entire data said it was not
  • 00:18:44
    i was not able to import it completely
  • 00:18:46
    into my ssms only 60 000 rows at once or
  • 00:18:50
    close to around 80 000 rows was being
  • 00:18:52
    imported so i just simply break down the
  • 00:18:55
    entire data set which had two lakh rows
  • 00:18:58
    into multiple sheets and each of the
  • 00:19:00
    sheet i just kept around 60 000 rows so
  • 00:19:04
    all of these sheets you can see i peel
  • 00:19:06
    one apple to appeal three i built four
  • 00:19:07
    so if i would copy this and i would just
  • 00:19:12
    simply
  • 00:19:13
    do a select count star for
  • 00:19:15
    each of these sheets here
  • 00:19:21
    you will
  • 00:19:22
    get to see it is
  • 00:19:26
    exactly the same ipl one
  • 00:19:32
    till two
  • 00:19:35
    with three
  • 00:19:38
    pl4 so
  • 00:19:40
    just select this and execute this
  • 00:19:43
    invalid
  • 00:19:44
    okay
  • 00:19:57
    i forgot to paste this also
  • 00:20:06
    just run this again
  • 00:20:11
    so you can see here close to 59
  • 00:20:14
    999 rows are present here and if i would
  • 00:20:16
    simply
  • 00:20:17
    sum up this
  • 00:20:19
    we will be getting the count uh the
  • 00:20:22
    number of rows which are present into
  • 00:20:24
    our
  • 00:20:25
    the master data set so i could just
  • 00:20:27
    simply call this as a master data set
  • 00:20:29
    and i've just broken this master data
  • 00:20:31
    set into multiple data sets so
  • 00:20:34
    you if you could just see from your end
  • 00:20:36
    if you are able to get the entire data
  • 00:20:38
    set at a single go
  • 00:20:41
    if not you can just simply do the same
  • 00:20:43
    process which i am doing right now
  • 00:20:46
    and then
  • 00:20:47
    let me
  • 00:20:49
    see if
  • 00:20:50
    you have any doubts
  • 00:21:00
    oh from where do you practice domain
  • 00:21:02
    knowledge and develop analytical skills
  • 00:21:06
    uh
  • 00:21:07
    so basically the
  • 00:21:09
    analytical skills
  • 00:21:11
    it is close to the problem solving
  • 00:21:13
    skills so the more number of problems
  • 00:21:15
    you are solving the better you get at
  • 00:21:18
    your analytical skills so
  • 00:21:20
    you'll be getting different viewpoint
  • 00:21:22
    like okay this particular problem can be
  • 00:21:24
    solved by this process also or this
  • 00:21:27
    particular problem could have been
  • 00:21:28
    solved by the second process so you'll
  • 00:21:30
    be getting getting multiple points
  • 00:21:32
    multiple views when you are solving a
  • 00:21:34
    lot of different problems
  • 00:21:37
    so that is what i think you can develop
  • 00:21:39
    your analytical skills
  • 00:21:41
    and regarding the domain knowledge uh i
  • 00:21:44
    think that is not pretty much important
  • 00:21:47
    but you are having some kind of domain
  • 00:21:49
    knowledge like for example
  • 00:21:51
    if we are
  • 00:21:53
    dealing with the stock market data set
  • 00:21:55
    so we know okay
  • 00:21:57
    there are certain certain sharp points
  • 00:21:59
    so we can smooth in the data set by
  • 00:22:01
    using the moving average thing so these
  • 00:22:03
    are kind of uh different domain
  • 00:22:05
    knowledge
  • 00:22:06
    is required
  • 00:22:09
    but
  • 00:22:10
    practicing domain knowledge
  • 00:22:12
    i don't think it is pretty much that
  • 00:22:14
    important
  • 00:22:15
    important part is you are able to solve
  • 00:22:17
    the problem into the interviews that is
  • 00:22:20
    pretty much fine if not you need to
  • 00:22:22
    practice a lot
  • 00:22:24
    so you should
  • 00:22:26
    practice a lot of questions i should say
  • 00:22:29
    let us move to the ssms again
  • 00:22:33
    okay so what are the different problems
  • 00:22:35
    which we are going to solve now before
  • 00:22:37
    that let us see
  • 00:22:38
    once i'll do a select star from
  • 00:22:44
    ipls and
  • 00:22:46
    the second
  • 00:22:47
    table is select star from
  • 00:22:53
    ipl okay
  • 00:22:56
    i would execute this statement
  • 00:23:03
    okay
  • 00:23:12
    execute so
  • 00:23:14
    okay
  • 00:23:21
    okay you can see here both of these data
  • 00:23:23
    sets they have been imported finally
  • 00:23:25
    into our ssms uh both of these excel
  • 00:23:29
    sheets i have imported finally so
  • 00:23:33
    one of the sheet it contains the details
  • 00:23:35
    of the matches which have been
  • 00:23:37
    played
  • 00:23:38
    and the other sheet it contains the
  • 00:23:40
    information about the ball by ball
  • 00:23:42
    information for each of the matches
  • 00:23:44
    which have been
  • 00:23:46
    been played okay
  • 00:23:47
    so let us move to the problem set so i
  • 00:23:50
    have a lot of different problems which i
  • 00:23:52
    could have think of or
  • 00:23:54
    i would find onto the google so i would
  • 00:23:57
    just simply note down here into this
  • 00:23:58
    particular sheet so i've just prepared
  • 00:24:00
    this particular sheet
  • 00:24:01
    let us see how many problems which we
  • 00:24:04
    are able to solve today
  • 00:24:06
    and then
  • 00:24:08
    the rest of the problems we can solve on
  • 00:24:10
    a different day so
  • 00:24:12
    you could just simply take a screenshot
  • 00:24:14
    of this
  • 00:24:15
    for now
  • 00:24:16
    you can practice from your end also
  • 00:24:19
    and then or you if you want to follow
  • 00:24:22
    along this video with me you can do that
  • 00:24:24
    also right now
  • 00:24:26
    all the data sets which i have provided
  • 00:24:29
    into the
  • 00:24:30
    chat box so you can just download the
  • 00:24:32
    data set from there
  • 00:24:34
    let us start
  • 00:24:35
    so the very first
  • 00:24:38
    question you can see here which we need
  • 00:24:40
    to find is the number of matches which
  • 00:24:43
    have been played per season so we need
  • 00:24:46
    to find this
  • 00:24:48
    now how to go about this like
  • 00:24:51
    we need to count the number of matches
  • 00:24:53
    which have been played
  • 00:24:55
    into per season
  • 00:24:56
    now
  • 00:24:57
    uh i have not solved all of these
  • 00:25:00
    questions i've just prepared the data
  • 00:25:01
    set i've just prepared
  • 00:25:05
    the
  • 00:25:06
    problems from before i haven't solved it
  • 00:25:09
    yet
  • 00:25:10
    i wonder if i have if we have the season
  • 00:25:14
    information or not
  • 00:25:17
    um okay
  • 00:25:21
    okay we have the dates
  • 00:25:23
    and
  • 00:25:24
    in a single a single season would be
  • 00:25:27
    played in a single day
  • 00:25:29
    year so we could just simply count the
  • 00:25:32
    number of times a year is coming so for
  • 00:25:34
    example 2008 so i would just simply grab
  • 00:25:38
    the year from all of these
  • 00:25:40
    dates and then i would simply count the
  • 00:25:44
    number of times
  • 00:25:46
    this particular year is coming
  • 00:25:48
    and that would simply give the number of
  • 00:25:50
    matches which have been played
  • 00:25:52
    into this season
  • 00:25:55
    okay let us simply use this particular
  • 00:25:58
    logic to get the total number of matches
  • 00:26:00
    which have been played into the
  • 00:26:02
    each of the season
  • 00:26:04
    because i could not find a column where
  • 00:26:07
    the season information is present here
  • 00:26:10
    it is just the date on which the match
  • 00:26:13
    was played so i'm just using this
  • 00:26:15
    particular logic that
  • 00:26:17
    uh the entire matches in a particular
  • 00:26:19
    season would be played into
  • 00:26:22
    the same year so i would just simply
  • 00:26:25
    grab the year information from all of
  • 00:26:27
    these dates
  • 00:26:28
    and i would simply count the number of
  • 00:26:30
    distinct let's say the matches id which
  • 00:26:33
    is coming
  • 00:26:34
    and that would simply give the number of
  • 00:26:36
    matches which have been played into the
  • 00:26:38
    each of the season now
  • 00:26:40
    uh i don't know how much accurate this
  • 00:26:42
    particular data set is
  • 00:26:44
    i haven't verified from my end but i
  • 00:26:47
    think i've downloaded it from kaggle so
  • 00:26:48
    it must be pretty much
  • 00:26:51
    very much correct data
  • 00:26:52
    at least 98 of the data might be pretty
  • 00:26:55
    much correct so let us start writing the
  • 00:26:58
    query for this
  • 00:27:00
    you could also write from your end and
  • 00:27:03
    you can paste
  • 00:27:04
    okay
  • 00:27:07
    sorry
  • 00:27:08
    is enough for getting job
  • 00:27:10
    yes
  • 00:27:11
    it is pretty much enough for getting job
  • 00:27:14
    uh the advanced sequel pianist i would
  • 00:27:16
    request uh you know it so please
  • 00:27:21
    go through all the videos which i have
  • 00:27:23
    been posting around so
  • 00:27:25
    you should
  • 00:27:27
    practice a lot of different questions on
  • 00:27:29
    to lead code hacker rank hacker earth
  • 00:27:32
    to strengthen your concepts
  • 00:27:34
    and that is pretty much enough for the
  • 00:27:36
    concepts which i'm covering so i'll be
  • 00:27:38
    covering a lot of different concepts uh
  • 00:27:40
    from now like afterwards also but
  • 00:27:43
    whatever is present into my playlist
  • 00:27:46
    right now it is pretty much enough what
  • 00:27:48
    do you think we must select domain at
  • 00:27:50
    the start of the career just jump into
  • 00:27:52
    any domain and then
  • 00:27:58
    i think you should pretty much jump just
  • 00:28:01
    right now just learn these skills
  • 00:28:04
    and then you can just think of
  • 00:28:06
    changing domains if you're not liking
  • 00:28:09
    and some
  • 00:28:10
    projects to practice and build my
  • 00:28:12
    portfolio
  • 00:28:13
    so i have prepared a portfolio project
  • 00:28:16
    onto the indian census data so it is uh
  • 00:28:20
    i've broken down the video or into two
  • 00:28:23
    parts part one and part two so you can
  • 00:28:25
    watch that
  • 00:28:26
    and pretty much have covered a lot of
  • 00:28:28
    different concepts so you
  • 00:28:30
    can get to learn a lot of different
  • 00:28:32
    concepts
  • 00:28:33
    different concepts onto the sequel the
  • 00:28:35
    advanced window function the rank
  • 00:28:37
    function so i've tried to covered a lot
  • 00:28:39
    of different concepts from the indian
  • 00:28:41
    sensors data so you can watch that video
  • 00:28:44
    uh i don't know
  • 00:28:47
    i think you will get to learn a lot from
  • 00:28:49
    that particular video so i also have
  • 00:28:51
    prepared a video on the shark tank india
  • 00:28:53
    data set you can watch that video also
  • 00:28:58
    okay i think
  • 00:29:00
    we could just move
  • 00:29:02
    to our screen ssms so the thing which we
  • 00:29:06
    need is the total number of matches
  • 00:29:07
    which have been played into each of the
  • 00:29:09
    season and the logic which i have
  • 00:29:11
    created here is i would just simply
  • 00:29:14
    get like all the matches would be played
  • 00:29:16
    into a single year like a single season
  • 00:29:18
    would take place into a year so from
  • 00:29:21
    this particular date column you can see
  • 00:29:22
    i would just simply grab the
  • 00:29:24
    the year and i would just simply count
  • 00:29:27
    the distinct match ids which are coming
  • 00:29:30
    so that would serve my purpose so i
  • 00:29:31
    would just write here select
  • 00:29:36
    and here from
  • 00:29:40
    date
  • 00:29:44
    and id
  • 00:29:45
    from
  • 00:29:49
    tutorial.dbo.ipl so i will just copy
  • 00:29:50
    paste here
  • 00:29:57
    so you can see i'm getting the year as
  • 00:29:59
    well as i'm getting the
  • 00:30:01
    id of the matches so i would simply
  • 00:30:02
    count the number of
  • 00:30:06
    now i think uh number of
  • 00:30:09
    different ids which are coming up for
  • 00:30:12
    each of the year so
  • 00:30:14
    i just put this into a sub query so this
  • 00:30:16
    was pretty much simple
  • 00:30:18
    i just tried to gathered a lot of
  • 00:30:20
    different metrics which i could solve
  • 00:30:22
    into this live session onto this
  • 00:30:24
    particular data set
  • 00:30:26
    even if the title you can see this is a
  • 00:30:28
    simpler data exploration project so
  • 00:30:32
    just count select here comma
  • 00:30:36
    of
  • 00:30:37
    distinct so
  • 00:30:40
    discounts number of distinct id which is
  • 00:30:42
    coming here
  • 00:30:45
    from
  • 00:30:46
    this
  • 00:30:49
    and at the last we need to
  • 00:30:52
    group by on to the basis of the year
  • 00:30:57
    that's it
  • 00:30:58
    let us execute the code
  • 00:31:00
    so friends you can see here for each of
  • 00:31:03
    the year i am just getting the number of
  • 00:31:05
    matches which have been played so into
  • 00:31:07
    2010 60 matches were played 2011 73
  • 00:31:11
    matches have been played so you can just
  • 00:31:14
    see the uh the logic which i have
  • 00:31:16
    applied here to solve this particular
  • 00:31:19
    problem so number of matches
  • 00:31:22
    i would just rename this column
  • 00:31:26
    just execute the code again
  • 00:31:29
    so yeah this is pretty much
  • 00:31:32
    the thing which we had
  • 00:31:35
    been asked into the problem so i just
  • 00:31:38
    needed to get the number of matches
  • 00:31:40
    which have been played into each of the
  • 00:31:42
    season
  • 00:31:44
    so this is the result which we are
  • 00:31:46
    getting so in the year 2008
  • 00:31:49
    uh 58 matches were played 2957 matches
  • 00:31:52
    were played so this is the information
  • 00:31:55
    which we are getting let us move to the
  • 00:31:57
    second problem
  • 00:32:01
    this second problem says most player of
  • 00:32:04
    match
  • 00:32:05
    so these are all very much simple so
  • 00:32:07
    here also
  • 00:32:09
    which is the data set which we require
  • 00:32:13
    number of with the most player of
  • 00:32:15
    matches
  • 00:32:16
    latest
  • 00:32:17
    most player of match we want okay
  • 00:32:21
    so
  • 00:32:22
    batsman uh
  • 00:32:26
    all by matches here also we'll be using
  • 00:32:29
    this sheet
  • 00:32:33
    and which column is it
  • 00:32:36
    player of match you can see into the
  • 00:32:37
    column d i have player of the match so
  • 00:32:40
    we could just simply count the number of
  • 00:32:42
    times the player is coming into this
  • 00:32:44
    particular column because for each row a
  • 00:32:46
    single match information is present here
  • 00:32:49
    no duplicate information is present here
  • 00:32:52
    so we could just simply count the number
  • 00:32:54
    of times this particular player is
  • 00:32:56
    coming into this particular row and the
  • 00:32:59
    player who is coming the most number of
  • 00:33:01
    times that simply means that
  • 00:33:03
    the highest
  • 00:33:06
    man of the match has been won by that
  • 00:33:08
    player so i would just simply run this
  • 00:33:11
    code again once
  • 00:33:14
    and we'll be executing onto the column d
  • 00:33:16
    so i will just write here select
  • 00:33:18
    layer
  • 00:33:20
    of
  • 00:33:22
    match comma count of
  • 00:33:26
    the same column
  • 00:33:29
    layer
  • 00:33:30
    of
  • 00:33:32
    match
  • 00:33:34
    dom
  • 00:33:38
    i'll just copy this
  • 00:33:40
    paste this
  • 00:33:42
    and at the last i'll
  • 00:33:44
    do the same thing i'll group by on to
  • 00:33:46
    the basis of the
  • 00:33:48
    layer of match
  • 00:33:50
    so i hope you're getting this point
  • 00:33:55
    i would simply execute this statement
  • 00:33:58
    so you can see here for each of the
  • 00:34:00
    player i'm getting the number of times
  • 00:34:02
    they have got the player of match so
  • 00:34:04
    i would now do this
  • 00:34:08
    rename this column so this would be
  • 00:34:11
    uh just give any name number
  • 00:34:16
    man of match or just right here man of
  • 00:34:18
    match and at the last i need to get
  • 00:34:24
    the
  • 00:34:25
    player who has scored the most
  • 00:34:27
    or who have won the most player of the
  • 00:34:29
    match so i would order by on the basis
  • 00:34:31
    of this particular column into the
  • 00:34:33
    descending order let us see if this
  • 00:34:35
    produces the result or not
  • 00:34:38
    so you can see here a b is he has won
  • 00:34:40
    the man of the match award 23 times the
  • 00:34:44
    highest number of times and then
  • 00:34:46
    the rest of the player you can see here
  • 00:34:49
    the information is present here so this
  • 00:34:51
    is how we can solve this particular
  • 00:34:53
    problem like getting the player who has
  • 00:34:56
    won the most player of the match in
  • 00:34:59
    just
  • 00:35:02
    hey thanks an image
  • 00:35:06
    okay you're proud okay that's
  • 00:35:09
    so much nice of you
  • 00:35:14
    oh hey
  • 00:35:19
    let us
  • 00:35:21
    go back to the
  • 00:35:23
    ssms
  • 00:35:27
    the third problem which we need to do
  • 00:35:30
    here is
  • 00:35:31
    let us move to the third problem so the
  • 00:35:33
    third problem says most player of the
  • 00:35:35
    match per season
  • 00:35:38
    so
  • 00:35:38
    into the season one which player has got
  • 00:35:43
    the highest player of the match into
  • 00:35:45
    this season to which player has got the
  • 00:35:47
    highest player of the match we want to
  • 00:35:50
    get this particular information let's
  • 00:35:51
    see how to do this
  • 00:35:53
    so
  • 00:35:54
    um okay
  • 00:35:58
    so
  • 00:35:59
    now this time i'll be uh using
  • 00:36:02
    another column to group by
  • 00:36:05
    and i would simply get the
  • 00:36:10
    number of player of match won by each of
  • 00:36:13
    the player in each year
  • 00:36:15
    so
  • 00:36:16
    i'll just do this here year
  • 00:36:19
    from date i would extract the year from
  • 00:36:21
    date
  • 00:36:22
    and i will present this
  • 00:36:24
    into the
  • 00:36:29
    this particular column itself i'll
  • 00:36:31
    execute the code
  • 00:36:34
    because it's not been painted
  • 00:36:40
    okay
  • 00:36:41
    my bad so i'll just
  • 00:36:44
    ctrl x
  • 00:36:59
    i'll execute the code again so you can
  • 00:37:01
    see here for example chris gayle into
  • 00:37:04
    the year 2011
  • 00:37:05
    he has won the player of the match six
  • 00:37:09
    times
  • 00:37:10
    similarly chris gayle into the year 2012
  • 00:37:13
    he has won the player of the match five
  • 00:37:15
    times so this is the information which
  • 00:37:17
    we wanted now for each of the season i
  • 00:37:20
    want to know like which player has won
  • 00:37:22
    the highest player of the match so uh
  • 00:37:25
    from here you must have understood what
  • 00:37:26
    is the thing which you need to do here
  • 00:37:28
    so we'll be applying the rank function
  • 00:37:29
    here to do
  • 00:37:32
    provide the ranking so i'll do your
  • 00:37:33
    select
  • 00:37:35
    and i'll be partitioning on to the basis
  • 00:37:37
    of the year column so
  • 00:37:39
    i would rename this column as here
  • 00:37:43
    so do here
  • 00:37:46
    select
  • 00:37:50
    layer of match
  • 00:37:54
    comma
  • 00:37:57
    year comma count
  • 00:38:00
    okay this is man of match
  • 00:38:03
    and i'll be using the rank functions
  • 00:38:05
    rank over
  • 00:38:06
    i'll be partitioning my data set or the
  • 00:38:09
    result which i've got
  • 00:38:11
    onto the basis of the year column and
  • 00:38:15
    for each year
  • 00:38:17
    i'll be
  • 00:38:22
    ranking on the basis of the man of the
  • 00:38:24
    match column into the descending order
  • 00:38:26
    so that the player who has won the
  • 00:38:28
    highest number of man of the match into
  • 00:38:30
    a particular year would get the rank as
  • 00:38:32
    one so i'll be giving this
  • 00:38:35
    and this is the rank column from
  • 00:38:42
    let us execute the code
  • 00:38:53
    we rank or partition by year ordered by
  • 00:38:56
    man of the match
  • 00:38:58
    the order clause clauses invalid in
  • 00:39:00
    views
  • 00:39:06
    there's something wrong we have done
  • 00:39:08
    here i'll just execute this particular
  • 00:39:10
    code
  • 00:39:16
    so player of match and into the year the
  • 00:39:20
    number of man of the match they've got
  • 00:39:29
    partition by year and ordered by
  • 00:39:34
    man of the match
  • 00:39:38
    i think it's pretty much correct
  • 00:39:41
    why is it showing error
  • 00:39:43
    use
  • 00:39:44
    [Music]
  • 00:39:55
    okay uh
  • 00:39:58
    rank our partition by okay let me just
  • 00:40:01
    simply remove this
  • 00:40:03
    let's see if we are getting any output
  • 00:40:05
    here or not
  • 00:40:14
    okay
  • 00:40:16
    so i think the pretty much the concept
  • 00:40:18
    which we are applying here is correct
  • 00:40:21
    but
  • 00:40:21
    there might be certain syntax error
  • 00:40:23
    which we are getting right now
  • 00:40:25
    we need to correct that
  • 00:40:30
    okay
  • 00:40:32
    i'll do a control x
  • 00:40:43
    let us
  • 00:40:47
    do it again so
  • 00:40:48
    just provide this into a sub query
  • 00:40:56
    right here select
  • 00:40:58
    player
  • 00:41:01
    of match
  • 00:41:09
    comma year of
  • 00:41:13
    man of match
  • 00:41:18
    our parties
  • 00:41:24
    so i'll be partitioning my data set onto
  • 00:41:26
    the basis of the year
  • 00:41:29
    and
  • 00:41:33
    order by onto the basis of
  • 00:41:51
    okay let me try
  • 00:41:52
    this
  • 00:41:54
    for the final time
  • 00:41:55
    or
  • 00:41:56
    unless we'll be moving to the next
  • 00:41:58
    question
  • 00:41:59
    just throwing error order by clause is
  • 00:42:01
    invalid in views inline function derived
  • 00:42:04
    tables
  • 00:42:06
    okay oh
  • 00:42:11
    i think we could
  • 00:42:15
    oh
  • 00:42:17
    we cubed use a temporary table to do
  • 00:42:20
    this because
  • 00:42:23
    we cannot
  • 00:42:25
    use the order by clause
  • 00:42:28
    as it is throwing an error like the
  • 00:42:29
    order by clause is invalid in views
  • 00:42:31
    inline functions
  • 00:42:33
    so
  • 00:42:33
    [Music]
  • 00:42:39
    i think i would remove this
  • 00:42:42
    because
  • 00:42:43
    of the
  • 00:42:45
    and it works pretty much fine it was all
  • 00:42:48
    because into the sub query you cannot
  • 00:42:50
    use the order by clause and that is what
  • 00:42:53
    it was throwing the error so
  • 00:42:55
    if so i just read the error which i was
  • 00:42:58
    getting here into a lot of details and i
  • 00:43:02
    could get the answer so you can see here
  • 00:43:04
    order by clause is invalid into the sub
  • 00:43:06
    query so into the sub query you can see
  • 00:43:08
    i was using the order by clause and this
  • 00:43:10
    is pretty much invalid to you so i
  • 00:43:13
    removed this
  • 00:43:15
    and i got the rank now after i did get
  • 00:43:19
    the rank
  • 00:43:22
    okay one more error
  • 00:43:30
    so i'll just execute the statement so
  • 00:43:32
    you can see here i'm getting the rank
  • 00:43:33
    here
  • 00:43:34
    now i'm only connect so you can if you
  • 00:43:36
    would see the result closely
  • 00:43:39
    the rank is being provided on to the
  • 00:43:42
    basic number of
  • 00:43:44
    highest number of man of the matches
  • 00:43:45
    which have been won now we are only
  • 00:43:47
    concerned with the highest man of the
  • 00:43:49
    match
  • 00:43:51
    award winner so
  • 00:43:53
    i would simply filter out the data
  • 00:43:55
    select star from this
  • 00:43:58
    and i would simply filter out the data
  • 00:44:00
    where
  • 00:44:03
    rank is equal to 1
  • 00:44:06
    and that would pretty much give the data
  • 00:44:09
    set of all the
  • 00:44:11
    players who have won the highest number
  • 00:44:13
    of man of the match award in each of the
  • 00:44:15
    year so you can see
  • 00:44:18
    i'm getting the result here
  • 00:44:21
    shawn marsh into the year 2008 won the
  • 00:44:24
    highest number of man of the match award
  • 00:44:25
    that was the five chris gayle similarly
  • 00:44:28
    for all the players you can see i'm
  • 00:44:29
    getting the result
  • 00:44:31
    let me see
  • 00:44:32
    for
  • 00:44:34
    yeah this can be considered as a
  • 00:44:36
    portfolio project so
  • 00:44:38
    this is simple data exploration which
  • 00:44:40
    i'm doing right now
  • 00:44:42
    and you can use this particular result
  • 00:44:45
    whichever i'm getting to build simply
  • 00:44:48
    beautiful dashboards and include into
  • 00:44:50
    your portfolio
  • 00:44:52
    so you can do it definitely
  • 00:44:56
    moving on to the third question
  • 00:45:02
    moving on to the
  • 00:45:04
    four fourth question so most wins by any
  • 00:45:08
    team
  • 00:45:09
    so
  • 00:45:10
    here we need to solve or find the
  • 00:45:14
    team which has won the highest number of
  • 00:45:17
    matches till date
  • 00:45:20
    so which of the sheet will be using it
  • 00:45:23
    here
  • 00:45:26
    so here you can see here i have a winner
  • 00:45:27
    column and simply i'll be using the same
  • 00:45:29
    concept i'll be grouping by on to the
  • 00:45:31
    basis of the winner column
  • 00:45:34
    and that would do the thing
  • 00:45:38
    so i would just copy this particular
  • 00:45:42
    table name
  • 00:45:44
    for we
  • 00:45:52
    select star from
  • 00:45:58
    and i'll be using this particular winner
  • 00:45:59
    column as you can see here winner
  • 00:46:02
    so i would just simply count winner
  • 00:46:05
    comma count of
  • 00:46:07
    winner
  • 00:46:11
    and at the last i'll be grouping by on
  • 00:46:15
    to the basis of the winner
  • 00:46:17
    that's it let us
  • 00:46:21
    execute the command
  • 00:46:23
    so you can see here i am getting
  • 00:46:25
    the number of times
  • 00:46:28
    number of matches which have been worn
  • 00:46:30
    by each of the team i am getting the
  • 00:46:32
    information here
  • 00:46:34
    these are kind of the dirty data so you
  • 00:46:35
    can see here na is present four times so
  • 00:46:38
    these are certain data's which we need
  • 00:46:41
    to study beforehand before we are doing
  • 00:46:43
    the analysis and we should definitely
  • 00:46:46
    remove all of these dirty data because
  • 00:46:48
    na does not any is no team only so
  • 00:46:52
    definitely that is a dirty data or it
  • 00:46:54
    might be a certain data point
  • 00:46:57
    which is kind of exception so might be
  • 00:46:59
    the result was not decided for four
  • 00:47:01
    matches
  • 00:47:02
    so that is why we are getting any and we
  • 00:47:04
    are reading four so
  • 00:47:06
    we these are kind of the data cleaning
  • 00:47:08
    points which you need to take into
  • 00:47:10
    account while we're doing the data
  • 00:47:11
    analysis but for now
  • 00:47:13
    this is a simple data exploration
  • 00:47:14
    project with the help of this simple sql
  • 00:47:16
    query so
  • 00:47:17
    that is the motive of this particular
  • 00:47:19
    live session
  • 00:47:20
    let us move to the
  • 00:47:21
    uh next question so top five venues
  • 00:47:25
    where match is played
  • 00:47:27
    okay so top five venues
  • 00:47:31
    here you can see when you is present
  • 00:47:34
    here so the same concept i will be using
  • 00:47:37
    i would
  • 00:47:39
    be
  • 00:47:39
    removing this winner i'll be
  • 00:47:42
    pasting this venue column
  • 00:47:48
    when new
  • 00:47:50
    and execute
  • 00:47:52
    so you can see here for each of the
  • 00:47:54
    stadiums i am getting the number of
  • 00:47:57
    uh matches which have been played into
  • 00:47:59
    each of the venue i am getting the
  • 00:48:00
    information
  • 00:48:02
    so friends uh this is pretty much clear
  • 00:48:05
    but here they wanted top five venues so
  • 00:48:08
    i into the ssms limit function
  • 00:48:11
    does not support so let me see if using
  • 00:48:14
    the top
  • 00:48:16
    this particular thing we are getting the
  • 00:48:18
    result or not okay we are getting the
  • 00:48:21
    result but yeah
  • 00:48:24
    group by
  • 00:48:25
    and order y
  • 00:48:27
    we need to do
  • 00:48:29
    because you can see i'm not getting the
  • 00:48:32
    top five result or the top five venues
  • 00:48:34
    were matched highest number of matches
  • 00:48:36
    were played
  • 00:48:38
    here we need to do into the descending
  • 00:48:41
    order here they're pointing into the
  • 00:48:43
    ascending order so friends into the eden
  • 00:48:46
    gardens you can see 77 matches highest
  • 00:48:49
    number of matches are being played into
  • 00:48:51
    the history of the ipl
  • 00:48:53
    so this is the data set just till 2020
  • 00:48:55
    so uh till 2020 data is present here so
  • 00:48:59
    you can see here indian gardens
  • 00:49:00
    photoshop kotlin all these different
  • 00:49:02
    stadiums the highest number of matches
  • 00:49:04
    they have been played till now
  • 00:49:06
    let us move to the next data point here
  • 00:49:10
    most runs by any batsman
  • 00:49:12
    this is also i think most run so this
  • 00:49:16
    particular sheet it just contains the
  • 00:49:18
    information of the you could say
  • 00:49:21
    uh the matches but for getting the runs
  • 00:49:23
    and
  • 00:49:24
    the batsman or the boiler information we
  • 00:49:26
    need to take into account the second
  • 00:49:28
    sheet the sheet which contains ball by
  • 00:49:30
    ball information so from this particular
  • 00:49:32
    sheet we'll be
  • 00:49:34
    getting the data of the most number of
  • 00:49:36
    fronts
  • 00:49:37
    which has been scored by
  • 00:49:39
    the batsman so let me just repeat
  • 00:49:42
    or look at the question once again so
  • 00:49:44
    most runs
  • 00:49:45
    by any batsman i need to see this
  • 00:49:49
    most runs by any batsman so
  • 00:49:52
    oh
  • 00:49:53
    batsman and onto the total runs column
  • 00:49:56
    i'll be doing a group by
  • 00:49:58
    now so
  • 00:49:59
    i'll do a select
  • 00:50:01
    or rather
  • 00:50:03
    oh first of all let's see select star
  • 00:50:05
    from
  • 00:50:07
    hashtag ipls
  • 00:50:10
    okay great uh from here i'll do
  • 00:50:14
    select bat
  • 00:50:17
    men comma count of
  • 00:50:22
    total
  • 00:50:25
    runs
  • 00:50:31
    from uh
  • 00:50:32
    i'll just provide this hashtag ibls
  • 00:50:38
    and group by
  • 00:50:41
    that's when i'll do
  • 00:50:45
    i'll execute the code
  • 00:50:48
    so you can see here i'm getting the
  • 00:50:49
    details of the total number of runs
  • 00:50:51
    which have been scored by each of the
  • 00:50:53
    batsman what is the result which we
  • 00:50:55
    wanted so we wanted most runs by any
  • 00:50:58
    batsman
  • 00:50:59
    so we need to
  • 00:51:01
    do the same thing which we have done
  • 00:51:03
    into the previous problem so i'll do a
  • 00:51:06
    simple order by
  • 00:51:08
    count of
  • 00:51:14
    total
  • 00:51:17
    runs into the descending order
  • 00:51:20
    and here i'll just mention top
  • 00:51:23
    one
  • 00:51:25
    i'll execute the code
  • 00:51:27
    and you can see vr kohli is the highest
  • 00:51:30
    run scorer into the the history of the
  • 00:51:32
    ipl
  • 00:51:33
    till 2020 we have the data set so
  • 00:51:36
    this is the answer
  • 00:51:38
    let's move to the next problem
  • 00:51:41
    okay let me see if any more questions
  • 00:51:43
    are there
  • 00:51:45
    okay
  • 00:51:47
    so percentage of total runs
  • 00:51:50
    scored by each batsman so
  • 00:51:54
    what does this problem it is saying so
  • 00:51:56
    we need to get the percentage of the
  • 00:51:58
    total runs scored by each of the
  • 00:52:00
    batteries so let's say
  • 00:52:02
    if there are five players
  • 00:52:05
    let's say the player ids are one two
  • 00:52:08
    three four and five
  • 00:52:10
    and the total number of runs which have
  • 00:52:12
    been scored into these three of the ipl
  • 00:52:14
    is hundred
  • 00:52:16
    so
  • 00:52:17
    these hundred runs definitely would have
  • 00:52:18
    been scored by these five players only
  • 00:52:20
    let's see if there are five players into
  • 00:52:22
    the ipl so i would just redistribute
  • 00:52:25
    this 100 runs here 20 30
  • 00:52:28
    ready 20 okay 10 and then this would be
  • 00:52:31
    25
  • 00:52:32
    25 so i guess from this particular
  • 00:52:35
    problem which they mentioned a
  • 00:52:37
    percentage of total runs scored by each
  • 00:52:39
    batsman it simply means they want a
  • 00:52:41
    percentage value of the runs scored by
  • 00:52:43
    the batsman divided by the total runs
  • 00:52:46
    which have been scored into the history
  • 00:52:47
    of the ipl so they basically want this
  • 00:52:50
    particular kind of number
  • 00:52:53
    and into the percentage terms
  • 00:52:56
    so this is the kind of
  • 00:52:59
    value they want
  • 00:53:01
    from this particular problem let us see
  • 00:53:03
    how to solve this
  • 00:53:06
    so basically we would require to get the
  • 00:53:08
    total runs which have been scored
  • 00:53:11
    into the
  • 00:53:13
    ipl and then we
  • 00:53:15
    need to divide this number by the runs
  • 00:53:17
    which have been scored by each of the
  • 00:53:18
    batsmen
  • 00:53:19
    to get the percentage value
  • 00:53:23
    let us see how to do this
  • 00:53:27
    so we had had a column of the total
  • 00:53:31
    number of runs scored by
  • 00:53:33
    each of the batsmen oh okay
  • 00:53:37
    i would just remove this top one batsman
  • 00:53:39
    count of runs
  • 00:53:41
    group by batsman and
  • 00:53:43
    pretty much fine
  • 00:53:46
    i'll just execute this statement
  • 00:53:49
    so you can see here
  • 00:53:52
    this is the information of the batsman
  • 00:53:54
    and the runs which have been scored by
  • 00:53:56
    the batsman
  • 00:53:59
    okay so
  • 00:54:00
    into
  • 00:54:02
    another column we need to somehow get
  • 00:54:04
    the total number of runs which have been
  • 00:54:05
    scored till date in ipl
  • 00:54:09
    there are many ways to do this but a
  • 00:54:11
    simple way would be using the window
  • 00:54:12
    function so i would do a sum over runs
  • 00:54:17
    and over
  • 00:54:24
    i would do this order by
  • 00:54:28
    runs
  • 00:54:29
    rows
  • 00:54:31
    between
  • 00:54:34
    unbounded
  • 00:54:37
    reseeding and
  • 00:54:39
    unbounded following now why i'm doing
  • 00:54:41
    this
  • 00:54:42
    you could check my
  • 00:54:44
    running some video to understand this
  • 00:54:46
    particular window function into lot of
  • 00:54:48
    more detail
  • 00:54:50
    i hope this would work so somehow runs
  • 00:54:52
    over order by runs rosebud unbounded
  • 00:54:55
    preceding and unbounded following
  • 00:55:01
    um okay let me simply run this code
  • 00:55:12
    count of total runs
  • 00:55:15
    hold v
  • 00:55:17
    we
  • 00:55:23
    just execute the code here
  • 00:55:32
    oh it is saying hashtag total runs is
  • 00:55:36
    invalid into the
  • 00:55:38
    a
  • 00:55:40
    i would need to remove this particular
  • 00:55:43
    column from here it cannot be used here
  • 00:55:45
    obviously
  • 00:55:47
    it should be made into a sub query and
  • 00:55:50
    then i can use this
  • 00:55:52
    so i'll just put this into a sub query
  • 00:55:58
    i'll do a select
  • 00:55:59
    star comma
  • 00:56:03
    paste this here
  • 00:56:30
    let us execute the code
  • 00:56:32
    so yes we are getting the count of the
  • 00:56:34
    total run so
  • 00:56:36
    is this the total number of runs which
  • 00:56:38
    have been scored
  • 00:56:39
    into the history of ipl
  • 00:56:41
    let us verify this
  • 00:56:45
    so i would just paste this code
  • 00:56:48
    execute the code
  • 00:56:50
    we must know like the answer which we
  • 00:56:52
    are getting is correct or not like the
  • 00:56:54
    total number of runs into the history of
  • 00:56:56
    ipl which have been scored is
  • 00:56:58
    which is coming right now it is correct
  • 00:57:00
    on or not so we should definitely check
  • 00:57:03
    this so i'll do a select
  • 00:57:06
    sum over
  • 00:57:10
    total runs
  • 00:57:13
    from
  • 00:57:15
    let's see if we are getting the correct
  • 00:57:17
    answer yes
  • 00:57:19
    the answer is pretty much correct
  • 00:57:20
    whichever we are getting one nine three
  • 00:57:22
    four six seven
  • 00:57:30
    okay
  • 00:57:38
    okay
  • 00:57:39
    this should be sum of total runs
  • 00:57:44
    i was just doing wrong
  • 00:57:47
    from before i'm so sorry for that so
  • 00:57:52
    now this is pretty much correct i'll
  • 00:57:54
    just execute the code
  • 00:57:58
    so
  • 00:58:00
    i've done wrong in from the previous
  • 00:58:02
    into the previous problem also so that
  • 00:58:04
    should be count
  • 00:58:06
    that should not be counted should be sum
  • 00:58:08
    of total runs
  • 00:58:11
    this is the correct answer
  • 00:58:13
    what we are getting right now
  • 00:58:16
    okay so
  • 00:58:20
    batsman and total runs we have got
  • 00:58:23
    and and now simply it is very much
  • 00:58:25
    simple we
  • 00:58:28
    just need to
  • 00:58:30
    divide the total runs
  • 00:58:32
    column with the runs
  • 00:58:34
    which are getting so i would just total
  • 00:58:38
    runs
  • 00:58:44
    would do this
  • 00:58:49
    let us execute the code
  • 00:58:56
    so you can see here i am getting
  • 00:58:59
    into decimal points
  • 00:59:14
    and you can shorten this number which
  • 00:59:16
    you are getting to certain number of
  • 00:59:18
    decimal points obviously
  • 00:59:20
    but the pretty much the concept uh
  • 00:59:23
    remains the same so
  • 00:59:25
    this is how we can solve or get the
  • 00:59:27
    percentage runs which have been scored
  • 00:59:29
    by each of the player next one is most
  • 00:59:33
    success by any batsman so if you would
  • 00:59:35
    look closely into this particular data
  • 00:59:37
    set
  • 00:59:38
    uh
  • 00:59:39
    we have a column here batsman runs so
  • 00:59:42
    the number of runs which was scored into
  • 00:59:44
    this particular ball by this particular
  • 00:59:45
    batsman it is mentioned here so wherever
  • 00:59:47
    six is coming that simply means that a
  • 00:59:49
    six was scored into that particular ball
  • 00:59:52
    so we just need to count the number of
  • 00:59:54
    times and this particular
  • 00:59:56
    uh
  • 00:59:57
    six is coming corresponding to a batsman
  • 00:59:59
    and that would
  • 01:00:01
    give us the total number of sixes which
  • 01:00:04
    uh
  • 01:00:05
    or the most success hits by hit by any
  • 01:00:08
    batsman so this is how we need to do
  • 01:00:11
    this so i would simply
  • 01:00:14
    okay before that let me check the chat
  • 01:00:20
    uh so hi
  • 01:00:21
    please attach the link of the data sets
  • 01:00:23
    and the questions which
  • 01:00:25
    you solved so
  • 01:00:27
    i have attached the link into the chat
  • 01:00:30
    box here you can see here if you scroll
  • 01:00:32
    up into the chat box so i've attached
  • 01:00:34
    the link also you know i would just
  • 01:00:37
    mention all the links into my into the
  • 01:00:40
    description box of this video
  • 01:00:42
    after ending this particular live stream
  • 01:00:44
    so i'll just mention all the links of
  • 01:00:45
    the
  • 01:00:46
    uh data sets which i'm using currently
  • 01:00:50
    and also the questions so these are the
  • 01:00:52
    questions which i'll be solving so i'll
  • 01:00:54
    be taking multiple live streams so
  • 01:00:57
    really solving all of these different
  • 01:00:59
    problems with the live stream only so
  • 01:01:01
    you can just take a screenshot of this
  • 01:01:03
    particular screen and then
  • 01:01:05
    solve it beforehand or
  • 01:01:08
    how you would like so you could just do
  • 01:01:11
    it
  • 01:01:11
    as per your comfort level so
  • 01:01:14
    the
  • 01:01:15
    question which we are solving right now
  • 01:01:17
    is the most successful batsman so i've
  • 01:01:19
    explained the concept which i'll be
  • 01:01:21
    using here so
  • 01:01:23
    let us run the
  • 01:01:26
    statement so i'll just write a select
  • 01:01:28
    star from hashtag
  • 01:01:32
    pls
  • 01:01:34
    let's see the output which is coming out
  • 01:01:39
    and we need to count the
  • 01:01:41
    number of times
  • 01:01:43
    sixes so
  • 01:01:45
    each row contains information for each
  • 01:01:48
    ball
  • 01:01:49
    which was played so whenever you are
  • 01:01:52
    seeing the batsman runs is equal to six
  • 01:01:55
    that means a six was code
  • 01:01:57
    so first of all let us just filter out
  • 01:01:59
    the data for
  • 01:02:01
    the balls on which the six was good so i
  • 01:02:03
    just write a select star from
  • 01:02:07
    ipls where
  • 01:02:11
    batsman
  • 01:02:15
    runs is equal to 6
  • 01:02:18
    so
  • 01:02:18
    let us execute the code here
  • 01:02:21
    so you can see i'm just getting
  • 01:02:22
    information of the balls where a 6 was
  • 01:02:25
    called now it's pretty much simple from
  • 01:02:26
    here so
  • 01:02:27
    i would present this into a sub query
  • 01:02:30
    and after this i would write a select
  • 01:02:34
    bats
  • 01:02:35
    men comma count of
  • 01:02:39
    let's say batsman only combats
  • 01:02:43
    men
  • 01:02:44
    from
  • 01:02:46
    this and at the last i'll be grouping by
  • 01:02:50
    on to the basis of
  • 01:02:52
    the batsman so how many times this
  • 01:02:54
    batsman is coming into the batsman
  • 01:02:56
    column so those many number of sixes
  • 01:02:58
    must have been scored by the batsman so
  • 01:03:00
    you can see here uh matthew 8 and 44
  • 01:03:03
    sixes
  • 01:03:05
    uh similarly for all the batsman which
  • 01:03:07
    we are getting here right now so what is
  • 01:03:08
    the problem which they wanted us to
  • 01:03:10
    solve so they wanted us to solve the
  • 01:03:12
    most success so after this i'll be
  • 01:03:15
    ordering by on to the basis of
  • 01:03:18
    the
  • 01:03:19
    count of batsmen
  • 01:03:21
    and this should be
  • 01:03:23
    into the descending order and at the
  • 01:03:26
    above i'll be using the top functions of
  • 01:03:28
    top one batsman
  • 01:03:31
    i'll again execute the code here
  • 01:03:34
    so you can see chris gayle has scored
  • 01:03:36
    349 sixes
  • 01:03:38
    similarly the next question is very much
  • 01:03:41
    similar
  • 01:03:44
    most force which have been scored now
  • 01:03:47
    you can just argue like
  • 01:03:49
    six was okay but
  • 01:03:51
    there might be a case where four runs
  • 01:03:53
    have
  • 01:03:54
    come from a particular ball and the
  • 01:03:57
    let's say
  • 01:03:58
    that was not a boundary that was this
  • 01:04:01
    externs which were accommodated into the
  • 01:04:04
    particular
  • 01:04:07
    ball so you can see here
  • 01:04:09
    i have a column which is the non
  • 01:04:12
    boundary and here
  • 01:04:13
    the runs are present here
  • 01:04:16
    but we are not getting a pretty much
  • 01:04:17
    deep information
  • 01:04:19
    on to whether a 4 was like 4 which was
  • 01:04:23
    mentioned here is a 4 which has been
  • 01:04:25
    scored
  • 01:04:26
    into that particular ball or not so
  • 01:04:29
    we should take into account the batsman
  • 01:04:32
    runs
  • 01:04:34
    and whenever the batsman runs is equal
  • 01:04:36
    to 4 that means that a 4 was scored from
  • 01:04:39
    the batsman the extra runs column or the
  • 01:04:41
    total runs columns
  • 01:04:43
    we need not to take into account we
  • 01:04:45
    should only be concerned for for the
  • 01:04:47
    batsman runs columns
  • 01:04:49
    i hope the difference is pretty much
  • 01:04:50
    clear
  • 01:04:51
    into all of these three columns like
  • 01:04:53
    what is the main difference
  • 01:04:55
    i hope that is pretty much clear so we
  • 01:04:56
    are only concerned with the batsman run
  • 01:04:59
    so i would take this into account
  • 01:05:02
    and instead of
  • 01:05:04
    six i would mention here four
  • 01:05:07
    and similar information will be getting
  • 01:05:09
    for the highest number of fours which
  • 01:05:10
    have been scored by any player so you
  • 01:05:12
    can see
  • 01:05:13
    cigarette the one has code 591 force so
  • 01:05:16
    this is pretty much information which we
  • 01:05:18
    wanted
  • 01:05:20
    moving ahead onto the next problem and
  • 01:05:23
    it's pretty much interesting the problem
  • 01:05:24
    says 3 000 runs
  • 01:05:26
    club
  • 01:05:28
    the highest strike rate so this
  • 01:05:30
    particular problem it mainly says the
  • 01:05:32
    players who scored more than 3 3000 runs
  • 01:05:36
    who is having the highest strike rate
  • 01:05:39
    among all of these players
  • 01:05:42
    so we'll be solving this particular
  • 01:05:43
    problem let us see how to solve this but
  • 01:05:45
    before that
  • 01:05:46
    let me see if the chart is there okay no
  • 01:05:50
    so three thousand runs so first of all
  • 01:05:52
    we are only concerned with the players
  • 01:05:53
    who have scored
  • 01:05:55
    three thousand runs along with that if
  • 01:05:57
    we somehow get the number of balls which
  • 01:05:59
    they have faced
  • 01:06:01
    to score the three thousand runs
  • 01:06:05
    i think our problem is pretty much clear
  • 01:06:07
    then
  • 01:06:08
    so first of all let us just
  • 01:06:11
    filter out or get the
  • 01:06:14
    total number of runs which have been
  • 01:06:15
    scored by the players so i would just do
  • 01:06:18
    this
  • 01:06:19
    my pls and group by batsman
  • 01:06:26
    instead of total runs i guess we should
  • 01:06:29
    use
  • 01:06:30
    the batsman runs
  • 01:06:38
    yes
  • 01:06:39
    after executing you can see
  • 01:06:42
    i'm getting the
  • 01:06:44
    batsman run so
  • 01:06:47
    also if i could get the total number of
  • 01:06:50
    balls which they have faced
  • 01:06:52
    so if you would look closely into
  • 01:06:55
    this particular sheet
  • 01:06:57
    each of the row that contains
  • 01:07:00
    information for a particular ball
  • 01:07:02
    so the number of times the batsman name
  • 01:07:04
    is coming into this particular column
  • 01:07:06
    that is column e that means those number
  • 01:07:08
    of balls they have face so we could just
  • 01:07:10
    simply count the number of times the
  • 01:07:12
    batsman name is coming into the batsman
  • 01:07:13
    column
  • 01:07:14
    so that is simply means that fee will be
  • 01:07:17
    getting the total number of balls which
  • 01:07:18
    have been faced by the batsman
  • 01:07:21
    i would mention this into the group by
  • 01:07:24
    clause also
  • 01:07:26
    that okay no need why we need to mention
  • 01:07:29
    this
  • 01:07:30
    i just execute this statement
  • 01:07:39
    okay count of batsman this is the total
  • 01:07:42
    balls
  • 01:07:45
    ctrl x ctrl v
  • 01:07:50
    and this is the total balls
  • 01:07:53
    faced
  • 01:07:57
    [Music]
  • 01:07:59
    mention this
  • 01:08:01
    execute the command once again and
  • 01:08:05
    you can see we are getting the result
  • 01:08:07
    now i'll be using this into a sub query
  • 01:08:10
    and from this particular sub query i'll
  • 01:08:13
    be getting
  • 01:08:14
    the strike rate so i'll just do a select
  • 01:08:16
    bad spin so strike rate is basically
  • 01:08:18
    total number of runs divided by the
  • 01:08:20
    total number of balls which was faced so
  • 01:08:24
    batsman
  • 01:08:26
    runs divided by the total
  • 01:08:28
    [Music]
  • 01:08:32
    balls which have been faced and i would
  • 01:08:34
    rename this column as the
  • 01:08:36
    strike rate column prompt
  • 01:08:41
    okay let us execute the command
  • 01:08:46
    i should
  • 01:08:50
    oh okay so you can see you have zeros we
  • 01:08:53
    are getting zeros also
  • 01:08:55
    i think to remove these zeros i need to
  • 01:08:58
    multiply either the denominator or the
  • 01:09:00
    numerator
  • 01:09:02
    with one point oh
  • 01:09:06
    so that we can get into the decimal
  • 01:09:08
    numbers
  • 01:09:10
    i'll put this into bracket and also i
  • 01:09:12
    need to multiply this with 100
  • 01:09:16
    i'll execute the code so you can see
  • 01:09:18
    here zero is still coming
  • 01:09:21
    no worries but you can see here i'm
  • 01:09:23
    getting the strike rate for the batsman
  • 01:09:26
    now after this i need to
  • 01:09:31
    find the
  • 01:09:32
    player who has scored greater than 3 000
  • 01:09:36
    runs with the highest strike rate
  • 01:09:38
    as you can see here 3 000 runs club with
  • 01:09:41
    the highest strike rate we need to solve
  • 01:09:43
    a lot of many different problems
  • 01:09:46
    but yeah uh so player who has scored the
  • 01:09:51
    greater than 3 000 runs
  • 01:09:54
    okay let me get this column also
  • 01:09:58
    we'll control c
  • 01:10:02
    all we
  • 01:10:06
    execute the code
  • 01:10:17
    okay great
  • 01:10:19
    so i'll do here select
  • 01:10:29
    batsman
  • 01:10:33
    runs
  • 01:10:36
    comma strike
  • 01:10:40
    rate from
  • 01:10:45
    where
  • 01:10:48
    batsman runs should be greater than
  • 01:10:51
    equal to 3 000
  • 01:10:54
    okay
  • 01:10:58
    where it should be where
  • 01:11:05
    so you can see here we are getting all
  • 01:11:07
    those batsman information who has scored
  • 01:11:10
    greater than 3 000 runs now who is the
  • 01:11:12
    player who is having the highest strike
  • 01:11:14
    rate along with 3000 runs we need to get
  • 01:11:17
    this information
  • 01:11:19
    so after this we need to order by order
  • 01:11:23
    by under the basis of what order right
  • 01:11:25
    onto the basis of the strike rate
  • 01:11:29
    and this should be into the descending
  • 01:11:31
    order
  • 01:11:33
    and from this i need to get the top
  • 01:11:36
    one batsman
  • 01:11:38
    i hope the logic is pretty much clear
  • 01:11:41
    so you can see abwarz is the batsman
  • 01:11:44
    with the highest strike rate into the
  • 01:11:46
    history of the ipl with the players who
  • 01:11:48
    have scored greater than 3 000 runs so
  • 01:11:51
    this is the information which we wanted
  • 01:11:53
    and we have got here
  • 01:11:55
    so this was a pretty much interesting
  • 01:11:56
    problem i guess the number 12 problem
  • 01:12:00
    was pretty much interesting to solve
  • 01:12:02
    similarly earlier we have done for
  • 01:12:05
    uh batsman who scored greater than 3 000
  • 01:12:08
    runs now
  • 01:12:09
    we are
  • 01:12:12
    concerned with the ballers with the
  • 01:12:14
    lowest
  • 01:12:15
    economy rate for
  • 01:12:17
    the bollowers world at least 50 overs
  • 01:12:21
    so we need to solve this so
  • 01:12:23
    this is pretty much like the strike rate
  • 01:12:25
    economy rate this are pretty much the
  • 01:12:27
    domain knowledge obviously like you must
  • 01:12:30
    pretty much know what does these terms
  • 01:12:32
    mean but
  • 01:12:33
    you could find the definitions onto
  • 01:12:35
    google also
  • 01:12:36
    let's say if you don't know anything xyz
  • 01:12:39
    of cricket so you could simply google
  • 01:12:42
    these terms and you can find the
  • 01:12:43
    definition the mathematical definitions
  • 01:12:45
    of these particular terms
  • 01:12:48
    so economic it could be simply means
  • 01:12:50
    like the total number of runs which was
  • 01:12:52
    considered by the bowlers by the total
  • 01:12:55
    number of
  • 01:12:56
    balls that has been bowled by the
  • 01:12:58
    ballers so we could just divide both of
  • 01:13:00
    these numbers and we'll be getting the
  • 01:13:02
    economy rate
  • 01:13:04
    so let us see let us see how to solve
  • 01:13:06
    this so
  • 01:13:08
    we'll be getting the lowest economy rate
  • 01:13:10
    i think this
  • 01:13:12
    would
  • 01:13:13
    okay let's see uh lowest economy rate
  • 01:13:16
    for the boulder was bold at least 50
  • 01:13:18
    hours
  • 01:13:22
    how could we know uh the information of
  • 01:13:24
    the
  • 01:13:25
    ballers who have bowled at least 50
  • 01:13:28
    hours
  • 01:13:29
    so we can know this information
  • 01:13:34
    by counting the number of times the
  • 01:13:37
    bowler name is coming into this
  • 01:13:38
    particular column and we know into over
  • 01:13:42
    uh
  • 01:13:43
    six
  • 01:13:44
    balls are there and
  • 01:13:47
    we are only concerned with the bowlers
  • 01:13:48
    who have bowled at least 50 over so 6
  • 01:13:51
    into 50 that means 300 balls a minimum
  • 01:13:54
    of 6 300 balls should have been bowed by
  • 01:13:57
    the bowlers here which
  • 01:13:59
    we are only concerned with so
  • 01:14:01
    first of all let us get this information
  • 01:14:03
    so i'll just
  • 01:14:05
    select
  • 01:14:06
    i think first of all i would do a select
  • 01:14:09
    star from
  • 01:14:11
    simply forget the columns which we have
  • 01:14:14
    so i'll just do a simple select star
  • 01:14:16
    from
  • 01:14:17
    stag ipls
  • 01:14:19
    now this particular
  • 01:14:21
    hashtag ipls you
  • 01:14:23
    you can see i've just created a
  • 01:14:24
    temporary table to accommodate all of
  • 01:14:27
    these data sets oh okay so you can see
  • 01:14:30
    here i have the
  • 01:14:32
    bowler names are just right here bowler
  • 01:14:34
    comma count of well like the number of
  • 01:14:36
    times the boiler name is coming here so
  • 01:14:39
    this means that
  • 01:14:41
    those many balls have been bowled by
  • 01:14:44
    these bullets
  • 01:14:45
    and at the last i'll do a select from
  • 01:14:49
    hashtag ipls
  • 01:14:52
    and at the last
  • 01:14:54
    i'll be obviously grouping by
  • 01:14:57
    on to the basis of
  • 01:14:59
    the bowler column so that i could just
  • 01:15:01
    get the count of the total number of
  • 01:15:02
    bowls that have been bowled and the
  • 01:15:05
    number of fronts which have been
  • 01:15:06
    considered so
  • 01:15:08
    this bowler for example sr watson the
  • 01:15:11
    total number of fronts which was
  • 01:15:13
    considered into the fourth ball of the
  • 01:15:14
    14th over
  • 01:15:16
    so that simply means that uh total run
  • 01:15:18
    so you can see here is zero so there is
  • 01:15:20
    nothing like the runs which was
  • 01:15:22
    considered by the bowler so i would just
  • 01:15:24
    simply take into account the total runs
  • 01:15:27
    that that is pretty much simple so that
  • 01:15:29
    i would be doing right now
  • 01:15:31
    so i'll be using this total runs column
  • 01:15:33
    so i would do a sum of
  • 01:15:37
    total runs that's it
  • 01:15:40
    let us execute the code
  • 01:15:43
    so you can see here the baller name
  • 01:15:46
    and this is the total
  • 01:15:50
    balls and this is total
  • 01:15:54
    runs conceded
  • 01:16:00
    if i would just simply run the code
  • 01:16:02
    again
  • 01:16:04
    total balls bold and the total
  • 01:16:07
    runs which have been considered so you
  • 01:16:09
    can find you can see we can get this
  • 01:16:12
    particular information
  • 01:16:14
    now we're only concerned with the
  • 01:16:15
    bowlers who are bold at least 50 hours
  • 01:16:17
    and we just discussed right now that in
  • 01:16:19
    over there are six balls present so 50
  • 01:16:22
    over means that 300 balls a minimum of
  • 01:16:24
    300 balls should be
  • 01:16:25
    bowed by uh should have been balled by
  • 01:16:28
    the ballers so we'll be filtering out
  • 01:16:30
    the data for this so i'll just present
  • 01:16:32
    this into a sub query and
  • 01:16:35
    i'll do
  • 01:16:36
    a simple select
  • 01:16:40
    and i'll do here
  • 01:16:42
    like bowler comma
  • 01:16:47
    we need to get the economy date so
  • 01:16:48
    [Music]
  • 01:16:50
    i pretty much assume that the formula is
  • 01:16:52
    the total runs
  • 01:16:54
    which have been conceded
  • 01:16:59
    and
  • 01:17:01
    my guess this should be
  • 01:17:04
    divided by the
  • 01:17:06
    total
  • 01:17:10
    balls that have been bowled and you
  • 01:17:13
    could pretty much convert the total
  • 01:17:15
    number of balls which have been bowled
  • 01:17:17
    into uh
  • 01:17:19
    into in terms of over but i guess the
  • 01:17:22
    answer that would remain pretty much the
  • 01:17:24
    same
  • 01:17:29
    economy rate
  • 01:17:32
    from
  • 01:17:35
    this
  • 01:17:36
    this execute the code
  • 01:17:39
    so you can see here i'm getting the
  • 01:17:40
    economy date
  • 01:17:42
    but here i need to mention here where
  • 01:17:47
    total balls
  • 01:17:51
    this should be greater than
  • 01:17:54
    what so this should be greater than
  • 01:17:58
    299 because
  • 01:18:00
    okay 300 balls
  • 01:18:02
    greater than 50 hours at least 50 hours
  • 01:18:05
    should have been bald as you can see it
  • 01:18:06
    was mentioned there
  • 01:18:08
    so you can see here we are getting the
  • 01:18:10
    economy rates
  • 01:18:12
    and here i'll be ordering by so order by
  • 01:18:16
    the
  • 01:18:17
    what
  • 01:18:18
    order by the economy rate
  • 01:18:24
    definitely into the ascending order
  • 01:18:25
    because for bowlers we want
  • 01:18:28
    that they concede as many as less runs
  • 01:18:32
    possible
  • 01:18:33
    so
  • 01:18:34
    the best baller is the boiler who has
  • 01:18:35
    considered the less lesser and possible
  • 01:18:38
    due to the more number of balls that the
  • 01:18:41
    person has boiled or the bowler has bolt
  • 01:18:43
    so this particular the economy uh
  • 01:18:45
    particular column this should be as less
  • 01:18:47
    as possible
  • 01:18:48
    so i would also put this here top one
  • 01:18:51
    now because why did i tell this because
  • 01:18:54
    uh i would not write here descending
  • 01:18:57
    order that simply means so i'll just
  • 01:18:59
    simply execute the code so you can see
  • 01:19:00
    rashid khan is the economy rate is the
  • 01:19:03
    best among the bowlers who have bowled
  • 01:19:05
    at least 50 hours now why is the economy
  • 01:19:08
    rate i'm getting because i've just
  • 01:19:10
    divided this number with the total balls
  • 01:19:12
    you could just convert the balls into
  • 01:19:14
    the overs and you can get the particular
  • 01:19:17
    answer but the answer it will remain the
  • 01:19:19
    same i think i guess yeah i think it
  • 01:19:22
    will remain exactly the same
  • 01:19:24
    so this is how we solve this particular
  • 01:19:26
    problem
  • 01:19:30
    okay let me
  • 01:19:33
    move forward to the
  • 01:19:35
    problem ahead so total number of matches
  • 01:19:37
    played till 2020
  • 01:19:41
    this is pretty much straightforward we
  • 01:19:43
    just simply need to count the distinct
  • 01:19:45
    ids which are present into any of the
  • 01:19:47
    sheet
  • 01:19:48
    or not any of the sheet i would
  • 01:19:51
    take into account select
  • 01:19:53
    count of
  • 01:19:56
    distinct
  • 01:19:58
    id
  • 01:20:00
    from
  • 01:20:03
    and i will take into account this
  • 01:20:05
    particular sheet
  • 01:20:09
    tutorial
  • 01:20:10
    dot
  • 01:20:20
    let's execute the code
  • 01:20:22
    so you can see here we are getting the
  • 01:20:24
    816 matches that have been played into
  • 01:20:27
    the
  • 01:20:27
    till 2020.
  • 01:20:30
    the next question let us solve so this
  • 01:20:33
    question will be the last question of
  • 01:20:35
    this particular live session i will be
  • 01:20:37
    conducting a multiple live session after
  • 01:20:39
    this
  • 01:20:40
    uh
  • 01:20:41
    and i'll be announcing it primarily
  • 01:20:43
    from before
  • 01:20:44
    total number of matches win by each team
  • 01:20:48
    so this is also pretty much simple
  • 01:20:50
    so which of the column will be using
  • 01:20:52
    this i guess we have solved
  • 01:20:55
    this particular problem okay no worries
  • 01:20:58
    i'll be using this particular table
  • 01:21:01
    itself and to get the information of the
  • 01:21:07
    team who has won the
  • 01:21:10
    total number of matches wins by each
  • 01:21:12
    team okay so here we
  • 01:21:15
    want to get the information of the
  • 01:21:18
    matches which have been won by each of
  • 01:21:22
    the team
  • 01:21:24
    we need not to get information of the
  • 01:21:25
    total number of matches which have been
  • 01:21:27
    played but the matches which have been
  • 01:21:28
    won
  • 01:21:30
    so
  • 01:21:32
    we'll be using this winner column itself
  • 01:21:35
    and we'll be doing
  • 01:21:37
    so i'll do a select
  • 01:21:41
    winner
  • 01:21:42
    comma count of
  • 01:21:48
    winner
  • 01:21:52
    group by
  • 01:21:55
    winner
  • 01:21:57
    so i'll execute this and you can see we
  • 01:21:59
    are getting the information of the total
  • 01:22:01
    number of matches which have been won by
  • 01:22:04
    each of the team so friends this is
  • 01:22:06
    pretty much the 15 questions which we
  • 01:22:08
    have solved uh into the live session for
  • 01:22:11
    this particular live sessions i'll be
  • 01:22:13
    solving so
  • 01:22:14
    the next problems are kind of very much
  • 01:22:16
    interesting some of the problems are
  • 01:22:18
    very much interesting and we'll be
  • 01:22:20
    solving this using some of the same
  • 01:22:22
    advanced level sql functions which we
  • 01:22:25
    have learned so
  • 01:22:26
    uh some of the major questions which
  • 01:22:28
    i've been getting is what is the best
  • 01:22:29
    way to learn sql or from where to learn
  • 01:22:31
    sql so you could just simply watch my
  • 01:22:34
    zero to one add one sql course and also
  • 01:22:37
    the advanced sequel playlist and also
  • 01:22:39
    the sequel interview questions which i
  • 01:22:42
    keep posting every now and then so
  • 01:22:44
    pretty much you can go
  • 01:22:45
    ahead practice from all of these data
  • 01:22:48
    sets all of these problems which i post
  • 01:22:50
    every now and then and i think that
  • 01:22:52
    would be pretty much enough
  • 01:22:54
    also practice problems under the hacker
  • 01:22:56
    rank lead code
  • 01:22:58
    and such kind of different sessions i
  • 01:23:00
    think it would help you a lot and you
  • 01:23:02
    can ask of problems you can just comment
  • 01:23:05
    down your doubts whichever you are
  • 01:23:06
    having or you have any alternate
  • 01:23:08
    solutions for any problems you can just
  • 01:23:11
    mention that into the comment box that
  • 01:23:13
    can help me as well as all the other
  • 01:23:15
    participants who are seeing this
  • 01:23:16
    particular live session or will be
  • 01:23:18
    watching
  • 01:23:19
    this video ahead so that would pretty
  • 01:23:21
    much help all of them so we'll be
  • 01:23:23
    solving all these different problems so
  • 01:23:25
    there are a lot of different problems we
  • 01:23:26
    could just simply solve 15
  • 01:23:29
    problems into this particular live
  • 01:23:31
    session i'll be solving try to solve
  • 01:23:34
    i'll be trying to solve much more
  • 01:23:36
    different problems i had into the future
  • 01:23:38
    live sessions uh so that we just don't
  • 01:23:40
    waste time
  • 01:23:42
    and i could accommodate
  • 01:23:44
    so my plan was to solve a particular
  • 01:23:47
    column of problems but i guess
  • 01:23:50
    we could just simply solve 15 problems
  • 01:23:53
    not even half but no worries i'll be
  • 01:23:55
    keeping uh this live session every now
  • 01:23:57
    and then and then i'll be informing you
  • 01:23:59
    guys
  • 01:24:00
    you can just pretty much participate
  • 01:24:02
    just flow along with me or you can take
  • 01:24:04
    a screenshot of this particular screen
  • 01:24:06
    you can just practice from your end also
  • 01:24:07
    and that would pretty much work for
  • 01:24:09
    everybody so friends this was all
  • 01:24:13
    about this particular live session
Tags
  • IPL
  • data analysis
  • SQL
  • matches
  • statistics
  • player performance
  • data sets
  • ball by ball
  • rank function
  • community tab