pyspark.sql.functions.mode
- pyspark.sql.functions.mode(col, deterministic=False)
- Returns the most frequent value in a group.
- New in version 3.4.0.
- Changed in version 4.0.0: Supports deterministic argument.
- Parameters
- col : Column or str
- The target column to compute on.
- deterministic : bool, optional
- If True and several values tie for the highest frequency, return the lowest of them. If False (the default), any of the tied values may be returned.

- Returns
- Column
- the most frequent value in a group. 
 
- Notes
- Supports Spark Connect.
- Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
...     schema=("course", "year", "earnings"))
>>> df.groupby("course").agg(sf.mode("year")).sort("course").show()
+------+----------+
|course|mode(year)|
+------+----------+
|  Java|      2012|
|dotNET|      2012|
+------+----------+

- When several values share the highest frequency, any one of them may be returned if deterministic is False or not specified, so the 0 in the first result below is only one possible outcome. If deterministic is True, the lowest of the tied values is returned.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(-10,), (0,), (10,)], ["col"])
>>> df.select(sf.mode("col", False)).show()
+---------+
|mode(col)|
+---------+
|        0|
+---------+
>>> df.select(sf.mode("col", True)).show()
+---------------------------------------+
|mode() WITHIN GROUP (ORDER BY col DESC)|
+---------------------------------------+
|                                    -10|
+---------------------------------------+
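
- The tie-breaking rule described above can be modeled in plain Python without a Spark session. The sketch below is only an illustration of the documented semantics, not Spark's implementation: `mode_of` is a hypothetical helper, returning None for an empty group is an assumption, and picking the first tied value in the non-deterministic case is an arbitrary choice made for this sketch (Spark may return any of the tied values).

```python
from collections import Counter


def mode_of(values, deterministic=False):
    """Hypothetical plain-Python model of the documented tie-breaking rule."""
    counts = Counter(values)
    if not counts:
        return None  # assumption: an empty group has no mode
    top = max(counts.values())
    # All values that share the highest frequency, in first-seen order.
    candidates = [v for v, n in counts.items() if n == top]
    # deterministic=True: lowest tied value. Otherwise any tied value is
    # acceptable; this sketch arbitrarily takes the first one encountered.
    return min(candidates) if deterministic else candidates[0]


print(mode_of([2012, 2012, 2013]))                # 2012
print(mode_of([-10, 0, 10], deterministic=True))  # -10
```

Code that relies on a specific value when frequencies tie should pass deterministic=True rather than depend on whichever tied value happens to be returned.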