pyspark.pandas.DataFrame.aggregate
- DataFrame.aggregate(func)
Aggregate using one or more operations over the specified axis.
- Parameters
- func : dict or a list
a dict mapping from column name (string) to aggregate functions (list of strings). If a list is given, the aggregation is performed against all columns.
- Returns
- DataFrame
See also
DataFrame.apply
Invoke function on DataFrame.
DataFrame.transform
Only perform transforming type operations.
DataFrame.groupby
Perform operations over groups.
Series.aggregate
The equivalent function for Series.
Notes
agg is an alias for aggregate. Use the alias.
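For instance, the two spellings are interchangeable; a minimal sketch (`tiny` is an illustrative frame, and `ps` is assumed to be imported as in the examples below):

>>> tiny = ps.DataFrame({'A': [1, 2, 3]})
>>> tiny.agg(['sum']).to_pandas().equals(tiny.aggregate(['sum']).to_pandas())
True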
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])
>>> df
     A    B    C
0  1.0  2.0  3.0
1  4.0  5.0  6.0
2  7.0  8.0  9.0
3  NaN  NaN  NaN
Aggregate these functions over the rows.
>>> df.agg(['sum', 'min'])[['A', 'B', 'C']].sort_index()
        A     B     C
min   1.0   2.0   3.0
sum  12.0  15.0  18.0
Different aggregations per column.
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})[['A', 'B']].sort_index()
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN
For multi-index columns:
>>> df.columns = pd.MultiIndex.from_tuples([("X", "A"), ("X", "B"), ("Y", "C")])
>>> df.agg(['sum', 'min'])[[("X", "A"), ("X", "B"), ("Y", "C")]].sort_index()
        X           Y
        A     B     C
min   1.0   2.0   3.0
sum  12.0  15.0  18.0
>>> aggregated = df.agg({("X", "A") : ['sum', 'min'], ("X", "B") : ['min', 'max']})
>>> aggregated[[("X", "A"), ("X", "B")]].sort_index()
        X
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN
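Series.aggregate (see above) behaves analogously for a single column; a minimal sketch, where the Series values mirror column A above and the exact output formatting is illustrative:

>>> s = ps.Series([1.0, 4.0, 7.0, np.nan])
>>> s.agg(['sum', 'min']).sort_index()
min     1.0
sum    12.0
dtype: float64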