Codecademy

Aggregation refers to using one value to describe multiple data points. Calculating an average is the classic example of aggregation, because we use one value (the average) to describe the “center” of multiple data points.

Aggregations like the average are also called summary statistics because they summarize an entire group of data using a statistic.

Let’s start by looking at aggregating, or summarizing, an entire column.

The .describe() method calculates several common summary statistics, such as the count, mean, standard deviation, minimum, and maximum, all at once:

results['total_goals'].describe()

count    17095.000000
mean         2.642820
std          1.845359
min          0.000000
25%          1.000000
50%          2.000000
75%          4.000000
max         31.000000
Name: total_goals, dtype: float64

This output tells us, for example, that across 17,095 games there was an average of 2.64 goals per game, with a maximum of 31 goals in a single game!
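As a minimal sketch, the snippet below builds a small, made-up `results` DataFrame (the goal counts are invented and are not the real dataset) and checks that rows of the `.describe()` output agree with the corresponding individual summary methods.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `results` DataFrame
results = pd.DataFrame({"total_goals": [0, 1, 2, 2, 3, 4, 7]})

# One call returns count, mean, std, min, quartiles, and max as a Series
summary = results["total_goals"].describe()
print(summary)

# Individual rows can be read by label and match the dedicated methods
print(summary["mean"] == results["total_goals"].mean())   # True
print(summary["50%"] == results["total_goals"].median())  # True
```

The 50% row is the median, and the 25% and 75% rows are the first and third quartiles.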

To compute aggregations individually, we can call a pandas summary method on a single column or on a list of columns:

# Summarize a single column
df['column_name'].summary_method()

# Summarize multiple columns
df[['column_name1', 'column_name2']].summary_method()
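Here is the template filled in with `.mean()` as the summary method, using a small DataFrame of invented scores (the values are hypothetical). It highlights the difference in return types: one column gives a single value, while a list of columns gives a Series indexed by column name.

```python
import pandas as pd

# Hypothetical scores for illustration only
df = pd.DataFrame({
    "home_score": [2, 1, 3, 0],
    "away_score": [1, 1, 0, 2],
})

# Single column (a Series): the result is one scalar value
single = df["home_score"].mean()

# List of columns (a DataFrame): the result is a Series indexed by column name
multiple = df[["home_score", "away_score"]].mean()

print(single)    # 1.5
print(multiple)
```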

Built-in summary methods include:

  • .mean() returns the mean
  • .median() returns the median
  • .std() returns the standard deviation
  • .max() and .min() return the maximum and minimum values respectively
  • .nunique() returns the count of unique values
  • .count() returns the count of non-null values
  • .sum() returns the sum
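A few of these methods differ in how they treat missing values, which matters for real data. The sketch below uses a made-up Series with one missing entry (`np.nan`); by default, pandas skips missing values in these methods, and `.std()` computes the sample standard deviation (`ddof=1`).

```python
import statistics

import numpy as np
import pandas as pd

# Hypothetical goal counts; np.nan marks a missing value
goals = pd.Series([0, 2, 2, 3, np.nan, 5])

print(goals.count())    # 5 -> non-null values only
print(goals.nunique())  # 4 -> distinct non-null values: 0, 2, 3, 5
print(goals.sum())      # 12.0 -> the missing value is skipped
print(goals.median())   # 2.0

# .std() is the sample standard deviation (ddof=1 by default),
# the same quantity statistics.stdev computes on the non-missing values
print(goals.std(), statistics.stdev([0, 2, 2, 3, 5]))
```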

Let’s use .sum() to add up the total number of goals in the total_goals column:

results['total_goals'].sum()
>>> 45179

One of our initial questions was whether teams do better at their home stadium. Let’s use .sum() to see whether, overall, more goals were scored by home teams or by away teams:

results[['home_score', 'away_score']].sum()
>>> home_score    27293
away_score    17886
dtype: int64
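These totals are consistent with the earlier sum of total_goals: 27,293 + 17,886 = 45,179. That suggests each game’s total_goals is its home_score plus its away_score. Assuming that relationship, the sketch below (with a few invented rows) shows that summing the two score columns and then adding the results reproduces the total_goals sum.

```python
import pandas as pd

# Hypothetical rows; the real dataset has thousands of games
results = pd.DataFrame({
    "home_score": [2, 1, 3, 0],
    "away_score": [1, 1, 0, 2],
})
# Assumption: total_goals is derived as home_score + away_score
results["total_goals"] = results["home_score"] + results["away_score"]

# Per-column sums, then the grand total across both columns
side_totals = results[["home_score", "away_score"]].sum()
print(side_totals)
print(side_totals.sum(), results["total_goals"].sum())  # 10 10
```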

Home teams scored considerably more goals overall (27,293 versus 17,886 for away teams), but this might be misleading, since some of these games took place at neutral venues where neither team was truly “at home”. We’ll dig into these intricacies more in later exercises!
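One way to account for neutral venues is to repeat the comparison using only games played at a real home ground. This is a minimal sketch with invented data, and it assumes a hypothetical boolean column named `neutral` that flags neutral-venue games; the column name is an assumption for illustration.

```python
import pandas as pd

# Hypothetical rows; `neutral` is an assumed boolean column marking
# games played at a neutral venue
results = pd.DataFrame({
    "home_score": [3, 1, 2, 0, 4],
    "away_score": [0, 1, 2, 1, 1],
    "neutral": [False, True, True, False, False],
})

# Keep only games where the home team actually played at home
true_home = results[~results["neutral"]]

print(results[["home_score", "away_score"]].sum())    # all games
print(true_home[["home_score", "away_score"]].sum())  # non-neutral games only
```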

How to Use Your Jupyter Notebook:

  • You can run a cell in the Notebook to the right by placing your cursor in the cell and clicking the Run button or pressing Shift+Enter/Return.
  • When you are ready to evaluate the code in your Notebook, click the Save button at the top of the Notebook (or press Control/Command+S) before clicking the Test Work button at the bottom. Be sure to save your solution code in the cell marked ## YOUR SOLUTION HERE ## or it will not be evaluated.
  • When you are ready to move on, click Next.

[Screenshot: the buttons at the top of a Jupyter Notebook, with the Run and Save buttons highlighted]
