Data Types
Spark SQL and DataFrames support the following data types:
- Numeric types
    - ByteType: Represents 1-byte signed integer numbers. The range of numbers is from- -128to- 127.
- ShortType: Represents 2-byte signed integer numbers. The range of numbers is from- -32768to- 32767.
- IntegerType: Represents 4-byte signed integer numbers. The range of numbers is from- -2147483648to- 2147483647.
- LongType: Represents 8-byte signed integer numbers. The range of numbers is from- -9223372036854775808to- 9223372036854775807.
- FloatType: Represents 4-byte single-precision floating point numbers.
- DoubleType: Represents 8-byte double-precision floating point numbers.
- DecimalType: Represents arbitrary-precision signed decimal numbers. Backed internally by- java.math.BigDecimal. A- BigDecimalconsists of an arbitrary precision integer unscaled value and a 32-bit integer scale.
 
- String type
    - StringType: Represents character string values.
 
- Binary type
    - BinaryType: Represents byte sequence values.
 
- Boolean type
    - BooleanType: Represents boolean values.
 
- Datetime type
    - TimestampType: Represents values comprising values of fields year, month, day, hour, minute, and second, with the session local time-zone. The timestamp value represents an absolute point in time.
- DateType: Represents values comprising values of fields year, month and day, without a time-zone.
 
- Complex types
    - ArrayType(elementType, containsNull): Represents values comprising a sequence of elements with the type of- elementType.- containsNullis used to indicate if elements in a- ArrayTypevalue can have- nullvalues.
- MapType(keyType, valueType, valueContainsNull): Represents values comprising a set of key-value pairs. The data type of keys is described by- keyTypeand the data type of values is described by- valueType. For a- MapTypevalue, keys are not allowed to have- nullvalues.- valueContainsNullis used to indicate if values of a- MapTypevalue can have- nullvalues.
- StructType(fields): Represents values with the structure described by a sequence of- StructFields (- fields).- StructField(name, dataType, nullable): Represents a field in a- StructType. The name of a field is indicated by- name. The data type of a field is indicated by- dataType.- nullableis used to indicate if values of these fields can have- nullvalues.
 
 
All data types of Spark SQL are located in the package org.apache.spark.sql.types.
You can access them by doing
import org.apache.spark.sql.types._
Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala" in the Spark repo.
    | Data type | Value type in Scala | API to access or create a data type | 
|---|---|---|
| ByteType | Byte | ByteType | 
| ShortType | Short | ShortType | 
| IntegerType | Int | IntegerType | 
| LongType | Long | LongType | 
| FloatType | Float | FloatType | 
| DoubleType | Double | DoubleType | 
| DecimalType | java.math.BigDecimal | DecimalType | 
| StringType | String | StringType | 
| BinaryType | Array[Byte] | BinaryType | 
| BooleanType | Boolean | BooleanType | 
| TimestampType | java.sql.Timestamp | TimestampType | 
| DateType | java.sql.Date | DateType | 
| ArrayType | scala.collection.Seq | ArrayType(elementType, [containsNull]) Note: The default value of containsNull is true. | 
| MapType | scala.collection.Map | MapType(keyType, valueType, [valueContainsNull]) Note: The default value of valueContainsNull is true. | 
| StructType | org.apache.spark.sql.Row | StructType(fields) Note: fields is a Seq of StructFields. Also, two fields with the same name are not allowed. | 
| StructField | The value type in Scala of the data type of this field (For example, Int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]) Note: The default value of nullable is true. | 
All data types of Spark SQL are located in the package of
org.apache.spark.sql.types. To access or create a data type,
please use factory methods provided in
org.apache.spark.sql.types.DataTypes.
| Data type | Value type in Java | API to access or create a data type | 
|---|---|---|
| ByteType | byte or Byte | DataTypes.ByteType | 
| ShortType | short or Short | DataTypes.ShortType | 
| IntegerType | int or Integer | DataTypes.IntegerType | 
| LongType | long or Long | DataTypes.LongType | 
| FloatType | float or Float | DataTypes.FloatType | 
| DoubleType | double or Double | DataTypes.DoubleType | 
| DecimalType | java.math.BigDecimal | DataTypes.createDecimalType() DataTypes.createDecimalType(precision, scale). | 
| StringType | String | DataTypes.StringType | 
| BinaryType | byte[] | DataTypes.BinaryType | 
| BooleanType | boolean or Boolean | DataTypes.BooleanType | 
| TimestampType | java.sql.Timestamp | DataTypes.TimestampType | 
| DateType | java.sql.Date | DataTypes.DateType | 
| ArrayType | java.util.List | DataTypes.createArrayType(elementType) Note: The value of containsNull will be true DataTypes.createArrayType(elementType, containsNull). | 
| MapType | java.util.Map | DataTypes.createMapType(keyType, valueType) Note: The value of valueContainsNull will be true. DataTypes.createMapType(keyType, valueType, valueContainsNull) | 
| StructType | org.apache.spark.sql.Row | DataTypes.createStructType(fields) Note: fields is a List or an array of StructFields. Also, two fields with the same name are not allowed. | 
| StructField | The value type in Java of the data type of this field (For example, int for a StructField with the data type IntegerType) | DataTypes.createStructField(name, dataType, nullable) | 
All data types of Spark SQL are located in the package of pyspark.sql.types.
You can access them by doing
from pyspark.sql.types import *| Data type | Value type in Python | API to access or create a data type | 
|---|---|---|
| ByteType | int or long Note: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127. | ByteType() | 
| ShortType | int or long Note: Numbers will be converted to 2-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -32768 to 32767. | ShortType() | 
| IntegerType | int or long | IntegerType() | 
| LongType | long Note: Numbers will be converted to 8-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -9223372036854775808 to 9223372036854775807. Otherwise, please convert data to decimal.Decimal and use DecimalType. | LongType() | 
| FloatType | float Note: Numbers will be converted to 4-byte single-precision floating point numbers at runtime. | FloatType() | 
| DoubleType | float | DoubleType() | 
| DecimalType | decimal.Decimal | DecimalType() | 
| StringType | string | StringType() | 
| BinaryType | bytearray | BinaryType() | 
| BooleanType | bool | BooleanType() | 
| TimestampType | datetime.datetime | TimestampType() | 
| DateType | datetime.date | DateType() | 
| ArrayType | list, tuple, or array | ArrayType(elementType, [containsNull]) Note: The default value of containsNull is True. | 
| MapType | dict | MapType(keyType, valueType, [valueContainsNull]) Note: The default value of valueContainsNull is True. | 
| StructType | list or tuple | StructType(fields) Note: fields is a Seq of StructFields. Also, two fields with the same name are not allowed. | 
| StructField | The value type in Python of the data type of this field (For example, Int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]) Note: The default value of nullable is True. | 
| Data type | Value type in R | API to access or create a data type | 
|---|---|---|
| ByteType | integer Note: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127. | "byte" | 
| ShortType | integer Note: Numbers will be converted to 2-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -32768 to 32767. | "short" | 
| IntegerType | integer | "integer" | 
| LongType | integer Note: Numbers will be converted to 8-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -9223372036854775808 to 9223372036854775807. Otherwise, please convert data to decimal.Decimal and use DecimalType. | "long" | 
| FloatType | numeric Note: Numbers will be converted to 4-byte single-precision floating point numbers at runtime. | "float" | 
| DoubleType | numeric | "double" | 
| DecimalType | Not supported | Not supported | 
| StringType | character | "string" | 
| BinaryType | raw | "binary" | 
| BooleanType | logical | "bool" | 
| TimestampType | POSIXct | "timestamp" | 
| DateType | Date | "date" | 
| ArrayType | vector or list | list(type="array", elementType=elementType, containsNull=[containsNull]) Note: The default value of containsNull is TRUE. | 
| MapType | environment | list(type="map", keyType=keyType, valueType=valueType, valueContainsNull=[valueContainsNull]) Note: The default value of valueContainsNull is TRUE. | 
| StructType | named list | list(type="struct", fields=fields) Note: fields is a Seq of StructFields. Also, two fields with the same name are not allowed. | 
| StructField | The value type in R of the data type of this field (For example, integer for a StructField with the data type IntegerType) | list(name=name, type=dataType, nullable=[nullable]) Note: The default value of nullable is TRUE. | 
