Telemetry – mission completion time example

Telemetry data are units of data (gameplay metrics) captured by hooks coded into the game and transmitted to a collection server. This data can help answer questions about player behavior and how players are experiencing a game at a level of detail that can be difficult or impossible to obtain through direct observation. In an action game, for example, telemetry can capture events such as player deaths, number of enemies killed, abilities used, weapon/ammo pickups, or mission completion time (as in the following example). One primary use of telemetry data, which can be collected during a playtest or during another period of play (e.g., closed/open alpha or beta, as well as post-launch), is post-playtest analysis. In the following example, a developer wanted to know how long it took players to finish the missions they played. The dev had an ideal length in mind for each mission, and this data would inform the next steps toward that vision.
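To make this concrete, here is a minimal sketch of what such a hook might look like, assuming a simple HTTP collection endpoint; the URL, field names, and helper function are hypothetical and not tied to any particular engine or analytics SDK:

```python
import time

import requests  # assumes the game client can reach the collection server over HTTP

TELEMETRY_ENDPOINT = "https://telemetry.example.com/v1/events"  # hypothetical URL


def send_mission_complete(player_id: str, mission_id: str, mission_start_ts: float) -> None:
    """Package a mission-completion event and send it to the collection server."""
    event = {
        "event_type": "mission_complete",
        "player_id": player_id,
        "mission_id": mission_id,
        "completion_time_min": round((time.time() - mission_start_ts) / 60.0, 2),
        "client_ts": time.time(),
    }
    try:
        requests.post(TELEMETRY_ENDPOINT, json=event, timeout=2)
    except requests.RequestException:
        # Telemetry must never break gameplay: drop (or queue) the event on failure.
        pass
```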

The figure shows the data split by each player (x axis, n = 8) and their completion time per mission (y axis, in minutes).

Following more iteration and playtesting, additional players can be added to these analyses to increase the sample size and provide an average completion time per mission. Averages are not very informative with small sample sizes (here n = 8), since outliers can easily pull the mean in one direction and skew the results, and you can see from the graph that completion times vary quite a bit within missions.

Furthermore, data can be collected during longer periods of play, such as an alpha or beta period, with averages calculated throughout, as well as how the average mission completion time fluctuated from one play period to the next (e.g., pre-alpha playtests –> closed alpha –> closed beta –> open beta). This gives devs a better representation of how the mission design is playing out in practice and how iteration has helped tune its length at a larger scale.

This is one example of how telemetry data collection during playtests can be useful. Specific queries can be generated to pull the relevant data from the server and then visualized for consumption by the dev team following the playtest. In turn, this data can help inform and drive next steps for designing specific game elements to ensure players’ experience is as close to the intention as possible.
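As a sketch of that workflow, assuming the collected events have been exported to a CSV with hypothetical columns player_id, mission_id, and completion_time_min, pandas can aggregate and chart completion times per mission for the post-playtest report:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of the collected events: player_id, mission_id, completion_time_min
events = pd.read_csv("playtest_mission_events.csv")

# Average completion time (and spread) per mission across the playtest sample
summary = events.groupby("mission_id")["completion_time_min"].agg(["mean", "std", "count"])
print(summary)

# Completion time per player, split by mission (assumes one completion per player per mission)
per_player = events.pivot(index="player_id", columns="mission_id", values="completion_time_min")
ax = per_player.plot(kind="bar", figsize=(10, 5))
ax.set_ylabel("Completion time (min)")
ax.set_title("Mission completion time per playtester")
plt.tight_layout()
plt.show()
```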

Time spent playing different game modes

Company X wants to know how long players are spending playing different modes of their new third-person shooter game. The game has four modes: single-player campaign, online multiplayer, and local and online horde modes. An hour before Company X’s weekly update meeting, a researcher is asked to show data representing the time spent (in hours) playing these game modes during the game’s first week following launch from a small random sample of players. A time-spent analysis is important and of interest because it demonstrates what activities/game modes players are engaging in and for how long, which can help guide game development during production as well as post-launch to ultimately see if the game is matching design intention.

The researcher was not worried about generating such data with little prior notice, because Company X collects an enormous amount of data via telemetry. The figure below was presented along with the statistical analyses of the data, which were paired-samples t-tests because of the within-subjects design (each group consisted of the same players).

Furthermore, the researcher calculated 95% confidence intervals for each group (the I-shaped bars on the graph): ranges constructed so that, across repeated samples, the true population mean would fall within the interval in 95% of them. The true mean would simply be the average time spent in each game mode across every player who played the game during its first week (as opposed to just the small random sample used in this example). Since the researcher doesn't know the true mean of the entire population of players, they don't know whether the sample means here are a good or bad estimate of it. So, rather than fixating on the four sample means, the researcher can use interval estimates instead, with each sample mean as the midpoint and a lower and upper limit around it.

Essentially, the researcher calculated upper and lower limits for each game mode in the sample. Again, since this is only a small sample of the population of players, we do not know the true mean of the population. However, if Company X gathered 99 more samples of players and generated 99 more figures like the one below (each with its own 95% confidence intervals), the true population mean would be expected to fall within the calculated interval in roughly 95 of those 100 samples.
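A minimal sketch of how such an interval can be computed for one game mode, using a t-based confidence interval on made-up hours-played values (not the bootstrap intervals reported below):

```python
import numpy as np
from scipy import stats

# Made-up hours-played values for one game mode from a small sample of players
hours = np.array([6.5, 8.0, 7.2, 9.1, 5.4, 7.8, 6.9, 8.3, 7.0, 6.1])

mean = hours.mean()
sem = stats.sem(hours)  # standard error of the mean

# 95% t-based confidence interval around the sample mean (df = n - 1)
lower, upper = stats.t.interval(0.95, len(hours) - 1, loc=mean, scale=sem)
print(f"M = {mean:.2f} h, 95% CI [{lower:.2f}, {upper:.2f}]")
```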

[Figure: time spent per game mode, with 95% confidence intervals]

The small random sample consisted of 23 players and their average time spent in each of the four game modes: single-player campaign, online multiplayer, and local and online horde modes. While the figure alone can show differences visually, further analyses can validate whether there are statistically significant differences between the modes.
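Those analyses can be sketched as follows, with placeholder data standing in for the real 23-player sample; scipy's paired-samples t-test covers the comparison itself, while the bias-corrected (BCa) bootstrap intervals reported below would require an extra bootstrapping step:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder data: hours played by the same 23 players in two of the modes
campaign = rng.normal(7.4, 4.3, 23).clip(min=0)
local_horde = rng.normal(1.5, 2.2, 23).clip(min=0)

# Paired-samples t-test: each player contributes one score per mode (within-subjects)
t_stat, p_value = stats.ttest_rel(campaign, local_horde)
mean_diff = (campaign - local_horde).mean()
print(f"Mean difference = {mean_diff:.2f} h, t({len(campaign) - 1}) = {t_stat:.3f}, p = {p_value:.4f}")
```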

Based on this sample, players spent more time, on average, playing single-player campaign (M = 7.37, SE = .900) than local horde mode (M = 1.53, SE = .455). This difference, 5.84 hours, BCa 95% CI [3.875, 7.803], was significant, t(22) = 6.166, p < .001. Players also spent more time playing single-player campaign than online horde mode (M = 3.87, SE = .614). This difference of 3.5 hours, BCa 95% CI [1.180, 5.820], was significant, t(22) = 3.128, p = .005.

While there was no significant difference in time spent playing single-player campaign compared to online multiplayer, players spent more time, on average, playing online multiplayer (M = 8.35, SE = .857) than local horde mode. This difference of 6.82 hours, BCa 95% CI [4.727, 8.908], was significant, t(22) = 6.763, p < .001. Players also spent more time, on average, playing online multiplayer than online horde mode; the difference of 4.48 hours, BCa 95% CI [1.844, 7.013], was significant, t(22) = 3.664, p = .001.

Additionally, players spent more time, on average, playing online horde mode than local horde mode; the difference of 2.34 hours, BCa 95% CI [-3.674, -1.004], was significant, t(22) = -3.634, p = .001 (the negative signs reflect the order in which the two modes were entered into the test).

Conclusion

To summarize, players spent more time playing both single-player campaign and online multiplayer than either horde mode. Additionally, they spent more time playing online horde mode than local horde mode. Again, the confidence intervals are an important piece of information here, because they let Company X estimate, with a stated level of confidence, where the true mean of the player population lies for each game mode and, ultimately, generalize findings from this small random sample of players.

For more on paired-samples tests and confidence intervals:

Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics. London, England: SAGE.

The relationship between game sales, marketing and usability budgets

Company X wanted to predict how certain variables affect how many copies of their games sell. They believed one specific variable would be the strongest predictor of game sales, so they compiled the marketing budget for their last 100 games (yeah, they've been busy) and ran a simple linear regression to predict the number of game sales (in millions) from the budget allotted to marketing (in millions).

[Figure: simple linear regression of game sales on marketing budget]

A significant regression equation was found, F(1, 98) = 46.662, p < .001, with an R² of .323, which tells us that marketing budget accounts for about 32% of the variation in game sales; the remaining 68% is explained by other variables not in the model. The following equation can be derived from the analysis: predicted game sales (in millions) = 9.342 + .391 × (marketing budget in millions). In other words, average game sales increased by .391 million copies for each additional million allotted to marketing. Additionally, as seen in the graph above, there was a strong, positive correlation (r = .568) between marketing budget and game sales, indicating that, on average, the more Company X spent on marketing their games, the more copies were sold.
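A sketch of this simple regression in Python with statsmodels, using fabricated stand-in data rather than Company X's actual figures (column names and numbers are purely illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Fabricated stand-in for the 100-game data set (marketing budget and sales, both in millions)
marketing = rng.uniform(1, 30, 100)
sales = 9.3 + 0.39 * marketing + rng.normal(0, 5, 100)
games = pd.DataFrame({"marketing": marketing, "sales": sales})

# Simple linear regression: predict game sales from marketing budget
model = smf.ols("sales ~ marketing", data=games).fit()
print(model.summary())   # F-statistic, R-squared, coefficients, and p-values
print(model.params)      # intercept and slope: sales = b0 + b1 * marketing
```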

While Company X was pleased to see that putting money into marketing their games had a positive relationship with game sales, they were interested in incorporating another variable into their model in an attempt to explain more than 32% of the variability in game sales. Therefore, they added the budget allotted to usability testing for each of the 100 games, reasoning that money spent on usability testing, by producing a better overall user experience, might also affect how many copies were ultimately sold.

[Figure: game sales by usability budget]

A significant multiple regression was found, F(2, 97) = 90.465, p < .001, with an R² of .651, which tells us that, together, marketing and usability budgets account for about 65% of the variance in game sales, meaning that adding the usability budgets to the model explained about 33% (R² change = .328) more of the variability in game sales. The resulting equation is: predicted game sales (in millions) = 4.382 + .194 × (marketing) + .408 × (usability), where marketing is measured in millions and usability is measured in thousands. Game sales increased by .194 million for each million spent on marketing and by .408 million for each thousand spent on usability testing. Additionally, as seen in the graph above, there was a strong, positive correlation (r = .767) between usability budget and game sales, indicating that, on average, the more Company X spent on usability testing for their games, the more copies were sold.
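Sketched the same way, again with fabricated data, the two-predictor model and the R² change from adding usability budget might be computed like this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Fabricated 100-game data set: marketing in millions, usability in thousands, sales in millions
games = pd.DataFrame({
    "marketing": rng.uniform(1, 30, 100),
    "usability": rng.uniform(10, 500, 100),
})
games["sales"] = 4.4 + 0.19 * games["marketing"] + 0.004 * games["usability"] + rng.normal(0, 3, 100)

simple = smf.ols("sales ~ marketing", data=games).fit()
multiple = smf.ols("sales ~ marketing + usability", data=games).fit()

print(f"R-squared (marketing only):        {simple.rsquared:.3f}")
print(f"R-squared (marketing + usability): {multiple.rsquared:.3f}")
print(f"R-squared change:                  {multiple.rsquared - simple.rsquared:.3f}")
print(f"Adjusted R-squared:                {multiple.rsquared_adj:.3f}")
print(multiple.params)    # intercept and both slopes
print(multiple.pvalues)   # significance of each predictor
```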

Both marketing budget (p < .001) and usability budget (p < .001) were significant predictors of game sales. Importantly, the adjusted R² (.644) was very similar to the model's R², indicating that, if the model were derived from the population rather than this sample, it would account for approximately .7% less variance in game sales. For comparison, below, you can see a combination of the previous two graphs with trendlines for each predictor.

[Figure: game sales by marketing and usability budgets, with trendlines]

Conclusion

It is important to keep in mind that these are mock analyses and the data is purposely fabricated to show specific results. Additionally, there are multiple ways of conducting multiple regression analyses, and one important decision is how predictors are entered into the model, which can greatly affect the outcome. Here, Company X began by conducting a simple linear regression to investigate the relationship between game sales and marketing budget. Although marketing budget was a significant predictor of game sales, it only explained about 32% of the variance. So, they added another predictor variable they thought might have a positive relationship with game sales, usability budget, and found that, when combined with marketing budget, the model explained 65% of the variance in game sales. This is a much better explanation of the variance compared to the 32% explained solely by marketing budget. Therefore, while Company X realizes the importance of marketing their games, they now know there is a significant, positive relationship between the budget allotted to usability testing and game sales, and they will continue these practices with future games in hopes of continuing their impressive sales record.


Mock data analysis with CEGE model

The Core Elements of the Gaming Experience (CEGE) is a comprehensive model that consists of different factors that, together, form the experience between a video game and its user. The two main umbrella variables associated with the model are Video-Game and Puppetry. Video-Game is simply the game itself, which is broken into the latent (not directly measurable) variables of Environment and Gameplay. These latent variables are inferred from observable variables: Environment includes the graphics and sounds of the game, whereas Gameplay includes the scenario and rules of the game.

Puppetry is the interaction of the player with the video game and comprises three main latent variables: Control, Ownership, and Facilitators. Control is simply “taking control” of the game by learning how to use and manipulate things within it, and it comprises three observable factors: small actions (basic actions the player can perform in the game), goal (the main objective of the game), and something-to-do (the player needs to feel that there is always something to do in the game).

Ownership is when the player takes his actions in the game as his own and is ultimately rewarded by the game for them. Ownership comprises four observable factors: big actions (strategies used by the player, made up of many small actions), you-but-not-you (the player can take part in actions that he would not necessarily do in real life), personal goals (actions that are not important to winning the game but are completed for personal reasons), and reward (the game needs to provide the player with rewards).

Lastly, Facilitators are external factors that can affect the interaction between a video game and its user, and they comprise three observable factors: aesthetics (how the game looks to the player), time (the time the player is willing to dedicate to the game), and previous experience (the player's previous experiences can affect how long he is willing to play and the actions he will take in the game). Ultimately, the observable variables roll up into the latent variables (e.g., Gameplay and Environment) and then into the umbrella variables; when the requirements of Video-Game and Puppetry are met, the ultimate player experience/enjoyment is achieved.

The CEGE model has been operationalized as a standardized 38-question questionnaire that touches on all of the previously discussed factors. Each question is rated on a 1-7 Likert scale (for this demonstration, 1 = completely false, 7 = completely true). Here are two examples from the questionnaire:

25. I knew how to manipulate the game to move forward (puppetry – control/ownership)

26. The graphics were appropriate for the type of game (video-game – environment)
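For those curious how such responses turn into the scores analyzed below, here is a rough scoring sketch; the CSV export, column names, and item-to-subscale mapping are hypothetical (only items 25 and 26 above, and the 6-38 range used for the overall score, come from this write-up, not the published CEGE scoring key):

```python
import pandas as pd

# Hypothetical export of the questionnaire: one row per participant, columns q1..q38 (1-7 scale)
responses = pd.read_csv("cege_responses.csv")

# Overall CEGE score: mean of items 6-38 (per the write-up); assumes columns are in item order
overall = responses.loc[:, "q6":"q38"].mean(axis=1)

# Illustrative item-to-subscale mapping (NOT the published CEGE key)
subscales = {
    "videogame_environment": ["q26"],       # e.g., "The graphics were appropriate..."
    "puppetry_control_ownership": ["q25"],  # e.g., "I knew how to manipulate the game..."
}
scores = pd.DataFrame({name: responses[items].mean(axis=1) for name, items in subscales.items()})
scores["overall_cege"] = overall
print(scores.describe())
```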

With the CEGE model now briefly explained, we will look at a mock data set utilizing it. For the sake of this analysis, we will pretend that Company X is interested in testing the user experience of one of its games during multiple stages of development. Right from the conceptual stage, they planned to use the CEGE questionnaire during prototyping as well as during both the alpha and beta stages of production. Company X will also run other forms of usability testing that will help them incorporate necessary changes to the game during these development phases. They hypothesize that average CEGE scores will increase with each phase, because players' feedback from the other usability tests can be addressed between stages before testing again. For this study, they recruited a total of 33 participants, 20 males and 13 females (average age of 18.606 years), with 11 different participants tested during each stage. To test whether there was a change in CEGE scores across phases, they used a one-way between-subjects ANOVA with an alpha level of .05. Here is what they found:
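A sketch of that analysis in Python with statsmodels, using randomly generated placeholder scores on the 1-7 scale (11 participants per phase) rather than the actual questionnaire data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)

# Placeholder overall CEGE scores (1-7 scale), 11 different participants per phase
cege = pd.DataFrame({
    "phase": np.repeat(["prototype", "alpha", "beta"], 11),
    "score": np.concatenate([
        rng.normal(4.2, 0.9, 11),
        rng.normal(5.1, 0.9, 11),
        rng.normal(5.9, 0.4, 11),
    ]).clip(1, 7),
})

# One-way between-subjects ANOVA: does development phase affect CEGE score?
model = smf.ols("score ~ C(phase)", data=cege).fit()
print(sm.stats.anova_lm(model, typ=2))  # F and p for the effect of phase

# Tukey HSD post hoc comparisons between the three phases
print(pairwise_tukeyhsd(cege["score"], cege["phase"], alpha=0.05))
```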

Results

Overall CEGE score 

[Figure: overall CEGE scores by development phase]

The one-way between-subjects ANOVA revealed a significant effect of development phase on overall CEGE score, F(2, 30) = 12.303, p < .001, indicating that, on average, CEGE scores differed across the testing phases. Post hoc comparisons using the Tukey HSD test indicated that the mean CEGE score for the prototyping phase (M = 4.203, SD = 8.502) was lower than the mean CEGE score during both the alpha (M = 5.104, SD = .987; p = .030) and beta (M = 5.856, SD = .371; p < .001) phases; however, while CEGE scores during the alpha phase were lower than those during the beta phase, the difference only trended toward significance (p = .078). Taken together, these results indicate that CEGE scores rose over the course of development, with significant gains from prototyping to both later phases, suggesting that Company X addressed issues encountered by players during earlier testing phases, ultimately producing better experiences.

Enjoyment

[Figure: Enjoyment scores by development phase]

The pattern of results for Enjoyment scores is similar to that of the CEGE scores. There was a significant effect of development phase on Enjoyment scores, F(2, 30) = 16.240, p < .001, indicating that, on average, scores on the Enjoyment-related questions differed across the testing phases. Post hoc comparisons indicated that the mean Enjoyment score for the prototyping phase (M = 3.605, SD = 1.073) was lower than the mean Enjoyment score during both the alpha (M = 5.211, SD = 1.118; p = .003) and beta (M = 6.090, SD = .9078; p < .001) phases. However, Enjoyment scores during the alpha phase were not significantly lower than those during the beta phase (p = .133). Together, this suggests that changes made to the game following prototype testing had a positive effect on Enjoyment scores; however, Company X did not see the same improvement between alpha and beta testing.

Frustration

[Figure: Frustration scores by development phase]

The opposite pattern of results was true for Frustration scores. There was a significant effect of development phase on Frustration scores, F(2, 30) = 14.291, p < .001, indicating that, on average, Frustration scores differed across the testing phases. Post hoc comparisons indicated that the mean Frustration score for the prototyping phase (M = 4.636, SD = 1.629) was higher than the mean Frustration score during both the alpha (M = 2.409, SD = 1.319; p = .002) and beta (M = 1.590, SD = 1.157; p < .001) phases. However, Frustration scores during the alpha phase were not significantly higher than those during the beta phase (p = .360). Taken together, this suggests that changes made to the game following prototype testing made the game significantly less frustrating to players; however, the game was not any less frustrating to players following the changes made between the alpha and beta phases.

Puppetry (ownership)

[Figure: Puppetry (ownership) scores by development phase]

While there were no significant differences in Puppetry (control) scores between the three development phases, there was a significant effect of development phase on Puppetry (ownership) scores, F(2, 30) = 16.366, p < .001, indicating that, on average, scores on the Ownership-related questions differed across the testing phases. Post hoc comparisons indicated that the mean Ownership score for the prototyping phase (M = 4.029, SD = 1.100) was lower than the mean Ownership score during both the alpha (M = 5.317, SD = .769; p = .002) and beta (M = 5.893, SD = .186; p < .001) phases. However, Ownership scores during the alpha phase were not significantly lower than those during the beta phase (p = .212). Altogether, this indicates that changes made to the game following prototype testing had a positive effect on players' feeling of ownership of the game; however, players' feelings of ownership did not improve between alpha and beta testing.

Puppetry (control/ownership)

[Figure: Puppetry (control/ownership) scores by development phase]

Additionally, there was a significant effect of development phase on Puppetry (control/ownership) scores, F(2, 30) = 22.209, p < .001, indicating that, on average, scores on the Control/Ownership-related questions differed across the testing phases. Post hoc comparisons indicated that the mean Control/Ownership score for the prototyping phase (M = 3.545, SD = 1.213) was lower than the mean Control/Ownership score during both the alpha (M = 5.545, SD = 1.128; p < .001) and beta (M = 6.272, SD = .467; p < .001) phases. However, Control/Ownership scores during the alpha phase were not significantly lower than those during the beta phase (p = .216). Taken together, these results suggest that changes made to the game following prototype testing improved scores related to players' feelings of control and ownership of the game; however, there was no change from the alpha to the beta phase.

Video-game (gameplay)

[Figure: Video-game (gameplay) scores by development phase]

Although there were no differences between the three development phases on Video-game (environment) scores, there was a significant effect of development phase on Video-game (gameplay) scores, F(2, 30) = 7.165, p = .003, indicating that, on average, scores on the gameplay-related questions differed across the testing phases. Post hoc comparisons indicated that the mean gameplay score for the prototyping phase (M = 4.545, SD = 1.166) did not significantly differ from the mean gameplay score during alpha testing (M = 5.302, SD = 1.027; p = .164); however, gameplay scores during prototyping were significantly lower than those during the beta phase (M = 6.075, SD = .529; p = .002). Furthermore, gameplay scores were not significantly different between the alpha and beta phases (p = .153). Altogether, these results indicate that gameplay scores improved from prototyping to beta testing; however, there were no significant improvements from prototyping to alpha or from alpha to beta, suggesting that Company X had to wait until beta testing to see a significant increase in gameplay scores from the changes they made throughout development.

Discussion

Using the CEGE model and questionnaire, Company X was able to objectively reveal important information on factors related to different aspects of players’ experiences, such as video-game environment and gameplay, enjoyment, frustration, control, and ownership when playing their game. When combined with other forms of usability testing throughout the development process, the CEGE model can offer invaluable insights into how the usability changes implemented by Company X have ultimately affected player experience.

In the current study, some obvious patterns of results emerged from the data. Primarily, overall CEGE scores, which take into account questions 6-38 on the questionnaire, improved with each testing phase. This is an important finding because it encompasses many of the factors that influence player experience and, broadly, shows Company X that the changes they have made to their game, based on other usability tests, have improved overall player experience.

A secondary pattern that emerged was the improvement of scores from the prototyping to the alpha phase without significant further improvement between the alpha and beta phases. Although the graphs show increases in the scores (and decreases for Frustration) from alpha to beta testing, these differences were not significant due to the variability in the data. This was true of all the other factors where there were differences between prototyping and alpha, including Enjoyment, Frustration, Puppetry (ownership, control/ownership), and Video-game (gameplay). Overall, this indicates that the changes implemented by Company X between prototyping and alpha produced improvements in these areas of player experience, whereas the changes made between alpha and beta did not result in measurable improvements in enjoyment, frustration, ownership, control/ownership, or gameplay. This is likely less important to Company X, because smaller-scale changes were probably made between alpha and beta than between prototyping and alpha.

Importantly, players enjoyed the game significantly more at the final testing phase (beta) than at the initial one (prototyping), which was the ultimate goal of Company X. Additionally, players' frustration levels decreased between these phases, indicating that the changes implemented were positive in terms of player experience. Furthermore, players had an increased sense of control and ownership between the prototyping and beta phases, suggesting that the changes made by Company X allowed players to feel more in control of the game: understanding the basic controls and main goals and always feeling that there was something to do. Similarly, players had an improved sense of ownership, which included incorporating more strategy into their play, completing actions based on personal goals, and taking actions they would not perform in real life, with the game rewarding those actions and ultimately allowing players to take ownership of their in-game behavior. Lastly, there were no differences in Video-game (environment) scores, likely because Company X did not make any changes to the graphics or sound of the game during these three development phases. However, gameplay scores did change from the prototyping to the beta phase, indicating that Company X adjusted some aspects of the game's general rules and scenario in response to participants' feedback, resulting in a better overall experience.

This mock data analysis demonstrates how a company can objectively measure variables related to the player experience using the CEGE model. Along with other types of usability/user experience testing, this model can provide insightful information about how a game can be modified to provide a better overall player experience.

For a more thorough review of the CEGE model, see Chapter 3 of Game User Experience Evaluation.