PlayerUnknown’s Battlegrounds (PUBG) is an online multiplayer battle royale game developed by the PUBG Corporation. In PUBG, 100 players are dropped onto an island without any equipment and must explore, scavenge, and kill other players until only one remains, all while the play area gradually shrinks and artillery rains down.
PUBG has sold over 50 million copies and averages approximately 500,000 to 800,000 players a month.
We’ll be utilising data collected by Kaggle via the PUBG Developer API. The dataset comprises 65,000 games’ worth of anonymised player data, split into training and test sets. For the purposes of this exploratory analysis, we’re only going to look at the training set.
The training set comes in the form of a CSV file. This file contains 113,290,736 individual data points: 4,357,336 rows across 26 columns.
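The data-point figure is simply rows × columns (4,357,336 × 26 = 113,290,736). A minimal pandas sketch for loading the file and checking those dimensions follows. The filename "train.csv" mentioned in the docstring is a hypothetical placeholder, and the three-row sample is made up purely so the snippet runs without the real file.

```python
import io

import pandas as pd


def load_pubg_csv(source):
    """Read a PUBG CSV and report its dimensions.

    `source` is anything pandas.read_csv accepts: a path such as the
    hypothetical "train.csv", or an open file-like object.
    """
    df = pd.read_csv(source)
    rows, cols = df.shape
    # DataFrame.size is rows * columns, i.e. the count of individual values.
    print(f"{rows:,} rows x {cols} columns = {df.size:,} data points")
    return df


# A made-up three-row sample standing in for the real file.
sample = io.StringIO(
    "matchId,groupId,kills,winPlacePerc\n"
    "1,10,0,0.25\n"
    "1,11,3,1.0\n"
    "2,20,1,0.5\n"
)
df = load_pubg_csv(sample)
```

With the full training file, the same call should report 4,357,336 rows and 26 columns. Loading all of it at once takes a fair amount of memory; passing `usecols=[...]` to `pd.read_csv` restricts the load to the columns an analysis actually needs.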
We will be using a variety of Python-based libraries to help us process, visualise, and manipulate our data.
- NumPy – support for multi-dimensional arrays and matrices
- Pandas – data structures and data analysis tools
- Matplotlib – plotting library
- Seaborn – statistical data visualisation
The dataset is made up of 26 columns. These columns are:
- groupId – Integer ID to identify a group within a match. If the same group of players plays in different matches, they will have a different groupId each time.
- matchId – Integer ID to identify the match. No match appears in both the training and test sets.
- assists – Number of enemy players this player damaged that were killed by teammates.
- boosts – Number of boost items used.
- damageDealt – Total damage dealt. Note: Self-inflicted damage is subtracted.
- DBNOs – Number of enemy players knocked.
- headshotKills – Number of enemy players killed with headshots.
- heals – Number of healing items used.
- killPlace – Ranking in the match by number of enemy players killed.
- killPoints – Kills-based external ranking of player. (Think of this as an Elo ranking where only kills matter.)
- kills – Number of enemy players killed.
- killStreaks – Max number of enemy players killed in a short amount of time.
- longestKill – Longest distance between player and player killed at time of death. This may be misleading, as downing a player and driving away may lead to a large longestKill stat.
- maxPlace – Worst placement we have data for in the match. This may not match numGroups, as sometimes the data skips over placements.
- numGroups – Number of groups we have data for in the match.
- revives – Number of times this player revived teammates.
- rideDistance – Total distance traveled in vehicles measured in meters.
- roadKills – Number of kills while in a vehicle.
- swimDistance – Total distance traveled by swimming measured in meters.
- teamKills – Number of times this player killed a teammate.
- vehicleDestroys – Number of vehicles destroyed.
- walkDistance – Total distance traveled on foot measured in meters.
- weaponsAcquired – Number of weapons picked up.
- winPoints – Win-based external ranking of player. (Think of this as an Elo ranking where only winning matters.)
- winPlacePerc – The target of prediction. This is a percentile winning placement, where 1 corresponds to 1st place, and 0 corresponds to last place in the match. It is calculated from maxPlace, not numGroups, so it is possible to have missing chunks in a match.
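The column notes say winPlacePerc runs from 1 (first place) to 0 (last place) and is based on maxPlace, but they do not spell out the formula. One linear mapping consistent with that description is winPlacePerc = (maxPlace − place) / (maxPlace − 1). The sketch below implements that mapping. It is an assumption, not a confirmed definition from the dataset, and `win_place_perc` is a hypothetical helper.

```python
def win_place_perc(place: int, max_place: int) -> float:
    """Map a finishing place to a 0-1 percentile on an assumed linear scale.

    Assumption: 1st place -> 1.0 and place == max_place -> 0.0, with
    max_place (not numGroups) as the reference, as the column notes describe.
    """
    if max_place < 2:
        raise ValueError("need at least two placements to form a percentile")
    if not 1 <= place <= max_place:
        raise ValueError("place must lie between 1 and max_place")
    return (max_place - place) / (max_place - 1)


# The scale is fixed by max_place, so skipped placements leave gaps in the
# observed values rather than stretching the scale.
print(win_place_perc(1, 28))
print(win_place_perc(28, 28))
print(win_place_perc(10, 28))
```

Because the denominator depends on maxPlace rather than numGroups, a match whose data skips some placements still spans the full 0 to 1 range, only with some percentiles missing.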
Since the main component of PUBG is killing other players, we’ll start there. Using Matplotlib, we can plot the kill counts on a simple bar chart. For the sake of simplicity, we’ll group every count of 8 or more kills into a single ‘8+’ bar.
According to the data, most players don’t get any kills.
The average player kills 0.9345 other players per match.
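A minimal sketch of the bar chart described above, with kill totals of 8 and above folded into one ‘8+’ bucket. The kill counts are made up so the snippet runs on its own; with the real data you would pass the `kills` column instead. `kill_distribution` is a hypothetical helper, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def kill_distribution(kills: pd.Series, cap: int = 8) -> pd.Series:
    """Players per kill total, with every total >= cap folded into 'cap+'."""
    counts = kills.clip(upper=cap).value_counts().sort_index()
    labels = [f"{k}+" if k == cap else str(k) for k in counts.index]
    return pd.Series(counts.to_numpy(), index=labels, name="players")


# Made-up kill counts; with the real data pass df["kills"] instead.
kills = pd.Series([0, 0, 0, 1, 1, 2, 3, 9, 12, 0])
dist = kill_distribution(kills)

# The mean uses the raw counts, so capping the chart does not bias it.
print(f"Mean kills: {kills.mean():.4f}")

fig, ax = plt.subplots()
ax.bar(list(dist.index), dist.to_numpy())
ax.set_xlabel("Kills")
ax.set_ylabel("Players")
fig.savefig("kill_counts.png")
```

Capping only affects the chart. Averaging the capped values would understate the mean, which is why the average is taken from the raw column.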
The scatter plot below indicates there is a correlation between killing and winning (who would’ve guessed). It also hints at some foul play or miracles taking place: some players have achieved 40+ kills, which is surprising considering there are at most 100 players on an 8×8 km map. That said, it isn’t impossible, since locations such as the school and the military base attract large numbers of players.
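The two observations above, a positive kills-to-winning relationship and a handful of 40+ kill players, can be checked numerically. This sketch computes the Pearson correlation between `kills` and `winPlacePerc` and pulls out rows at or above a kill threshold. The threshold of 40 comes from the text. It is a heuristic for flagging rows to inspect, not evidence of cheating. The rows are made up for illustration, and `kills_vs_win` is a hypothetical helper.

```python
import pandas as pd


def kills_vs_win(df: pd.DataFrame, suspicious_kills: int = 40):
    """Return the Pearson correlation of kills with winPlacePerc, plus the
    rows whose kill count meets the (arbitrary) suspicion threshold."""
    r = df["kills"].corr(df["winPlacePerc"])  # Series.corr is Pearson by default
    outliers = df[df["kills"] >= suspicious_kills]
    return r, outliers


# Made-up rows for illustration only.
df = pd.DataFrame({
    "kills": [0, 1, 2, 5, 42],
    "winPlacePerc": [0.1, 0.3, 0.5, 0.8, 1.0],
})
r, outliers = kills_vs_win(df)
print(f"Pearson r = {r:.3f}")
print(outliers)
```

With millions of rows, a raw scatter plot overplots heavily. Plotting a random sample (for example `df.sample(100_000)`) is usually enough to see the trend.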
Since vehicles are sparse, most players find themselves running to escape the ‘blue zone’ and other players.
The average player walks or runs 1,055.1 m per match.
The scatter plot below indicates a high correlation between running and winning percentage. This isn’t a revelation, though, since the longer you’re alive, the more you run.
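As an alternative to a scatter plot, one way to see the walking-to-winning trend is to bin `walkDistance` into bands and average `winPlacePerc` within each band. The sketch below does this with `pd.cut`. The band edges and the rows are arbitrary, made-up choices, and `mean_win_by_band` is a hypothetical helper. As noted above, distance partly reflects survival time, so the curve describes an association, not a cause.

```python
import pandas as pd


def mean_win_by_band(df: pd.DataFrame, column: str, edges: list) -> pd.Series:
    """Average winPlacePerc inside each [lower, upper) band of `column`."""
    bands = pd.cut(df[column], bins=edges, right=False)
    return df.groupby(bands, observed=False)["winPlacePerc"].mean()


# Made-up rows; the band edges (in metres) are arbitrary choices.
df = pd.DataFrame({
    "walkDistance": [0.0, 200.0, 800.0, 1500.0, 3000.0, 6000.0],
    "winPlacePerc": [0.0, 0.1, 0.3, 0.6, 0.9, 1.0],
})
curve = mean_win_by_band(df, "walkDistance", [0, 500, 1000, 2000, float("inf")])
print(curve)
```

The same helper works for `rideDistance` or `swimDistance` by changing the column name and edges.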
Although vehicles are sparse, they can be a very useful tool for outrunning the ‘blue zone’ and getting around.
The average player drives 423.9 m per match.
22.7940% of players didn’t drive at all.
The scatter plot below indicates there is a small correlation between driving and winning.
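The driving figures above (mean distance, share of non-drivers, and a weak correlation) can be computed together. The sketch below additionally compares the average placement of players who drove with those who didn't, which is one way to look past the weak linear correlation. The rows are made up, and `driving_summary` is a hypothetical helper.

```python
import pandas as pd


def driving_summary(df: pd.DataFrame) -> dict:
    """Summarise rideDistance: mean, share of non-drivers, average placement
    of drivers vs non-drivers, and the Pearson correlation with winPlacePerc."""
    drove = df["rideDistance"] > 0
    return {
        "mean_ride_m": df["rideDistance"].mean(),
        "pct_no_drive": 100 * (~drove).mean(),  # boolean mean = share of True
        "win_if_drove": df.loc[drove, "winPlacePerc"].mean(),
        "win_if_not": df.loc[~drove, "winPlacePerc"].mean(),
        "pearson_r": df["rideDistance"].corr(df["winPlacePerc"]),
    }


# Made-up rows for illustration only.
df = pd.DataFrame({
    "rideDistance": [0.0, 0.0, 1200.0, 300.0],
    "winPlacePerc": [0.2, 0.4, 0.9, 0.5],
})
summary = driving_summary(df)
print(summary)
```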
Healing and boosts
Players can heal using bandages and first aid kits. They also have the option of boosts, such as painkillers and energy drinks, which provide different benefits, including health regeneration and increased movement speed.
The average player uses 1.2 heal items.
The average player uses 1.0 boost items.
Which has a bigger impact on your chances of winning, though? According to the line graph below, boosts matter more than heals.
Both healing and boosts correlate strongly with winning, but boosts matter more. The scatter plots below support this.
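The heals-versus-boosts comparison above comes down to two computations: each column's correlation with `winPlacePerc`, and a line of mean `winPlacePerc` per item count. The sketch below computes both. The cap of 10 items is an arbitrary choice to keep sparse high counts from dominating the line. `compare_items` is a hypothetical helper. The rows are made up only to exercise the code and say nothing about the real data.

```python
import pandas as pd


def compare_items(df: pd.DataFrame, columns=("heals", "boosts"), cap: int = 10):
    """For each item column, return its Pearson correlation with
    winPlacePerc and the mean winPlacePerc per (capped) item count."""
    correlations = {c: df[c].corr(df["winPlacePerc"]) for c in columns}
    curves = {
        c: df.groupby(df[c].clip(upper=cap))["winPlacePerc"].mean()
        for c in columns
    }
    return correlations, curves


# Made-up rows for illustration only.
df = pd.DataFrame({
    "heals": [0, 1, 2, 3, 0, 5],
    "boosts": [0, 0, 1, 2, 3, 4],
    "winPlacePerc": [0.1, 0.2, 0.4, 0.6, 0.7, 0.9],
})
correlations, curves = compare_items(df)
print(correlations)
print(curves["boosts"])
```

Each entry in `curves` is a Series indexed by item count, so `curves["heals"].plot()` and `curves["boosts"].plot()` on the same axes would give a line graph of the kind described above.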
PUBG offers 3 game modes: solo, duo (2 players), and squad (up to 4 players).
12.93% of games were solo
70.46% of games were duo
16.61% of games were in squads
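The column list has no explicit game-mode field, so the mode has to be inferred somehow. One plausible heuristic, assumed here rather than taken from the original analysis, reads it from numGroups: more than 50 groups suggests solo, 26 to 50 suggests duo, and 25 or fewer suggests squad. The sketch counts each match once via `matchId`. Computing the shares over player rows instead would describe player records rather than distinct games. `classify_mode` and the thresholds are hypothetical, and the rows are made up.

```python
import pandas as pd


def classify_mode(num_groups: pd.Series) -> pd.Series:
    """Guess the game mode from numGroups (a heuristic, not a dataset field).

    Assumed thresholds: (0, 25] -> squad, (25, 50] -> duo, > 50 -> solo.
    """
    return pd.cut(
        num_groups,
        bins=[0, 25, 50, float("inf")],
        labels=["squad", "duo", "solo"],
    )


# Made-up rows: several players share each matchId.
df = pd.DataFrame({
    "matchId": [1, 1, 2, 2, 3],
    "numGroups": [95, 95, 48, 48, 12],
})
per_match = df.drop_duplicates("matchId")  # one row per game
modes = classify_mode(per_match["numGroups"])
share = modes.value_counts(normalize=True) * 100
print(share.round(2))
```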
The line graph below indicates that squads get more kills, but those kills don’t always translate into victories. In solos and duos, kills are more closely tied to winning.
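To put numbers on the claim above, one can compare, per mode, how many kills players get and how strongly kills correlate with `winPlacePerc`. This sketch assumes each row already carries a hypothetical `mode` label, however it was derived. `kill_win_by_mode` is a hypothetical helper, and the rows are made up for illustration.

```python
import pandas as pd


def kill_win_by_mode(df: pd.DataFrame) -> pd.DataFrame:
    """Per game mode: mean kills and the Pearson correlation of kills
    with winPlacePerc. Expects a (hypothetical) 'mode' column."""
    rows = {}
    for mode, group in df.groupby("mode"):
        rows[mode] = {
            "mean_kills": group["kills"].mean(),
            "kill_win_r": group["kills"].corr(group["winPlacePerc"]),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Made-up rows for illustration only.
df = pd.DataFrame({
    "mode": ["solo", "solo", "solo", "squad", "squad", "squad"],
    "kills": [0, 2, 4, 1, 5, 9],
    "winPlacePerc": [0.1, 0.5, 0.9, 0.6, 0.2, 0.7],
})
result = kill_win_by_mode(df)
print(result)
```

A higher `mean_kills` paired with a lower `kill_win_r` is the pattern the text describes for squads: more kills, but a looser link between kills and placement.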
Below is a heatmap showing the positive and negative correlations between each variable and the target (winPlacePerc). It shows that a number of factors correlate positively with victory. These are: