Before analysing each case, consider the executor. An executor is the Spark application's JVM process launched on a worker node. It runs tasks in threads and is responsible for keeping the relevant partitions of data. Every Spark application has the same fixed heap size and fixed number of cores for each of its executors. The heap size is what is referred to as the Spark executor memory, controlled with the spark.executor.memory property or the --executor-memory flag. From the Spark documentation, spark.executor.memory (default 1g, since 0.7.0) is the amount of memory to use per executor process, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g).

Each process (executor or driver) has an allocated heap with available memory. However, a small amount of overhead memory is also needed to determine the full memory request to YARN for each executor. Overhead memory is the off-heap memory used for JVM overheads, interned strings, and other metadata in the JVM; the formula for that overhead is max(384 MB, 0.07 * spark.executor.memory). As a sizing example: with 63 GB of RAM available on each node and 3 executors per node, the memory for each executor is 63/3 = 21 GB.

By default, Spark uses 60% of the configured executor memory (--executor-memory) to cache RDDs; the remaining 40% is available for objects created during task execution. Memory-intensive operations include caching, shuffling, and aggregating (using reduceByKey, groupBy, and so on). When the Spark executor's physical memory exceeds the memory allocated by YARN, the total of Spark executor instance memory plus memory overhead is not enough to handle those operations, and you need to increase spark.yarn.executor.memoryOverhead. The same arithmetic applies on the driver side: spark.driver.memory + spark.yarn.driver.memoryOverhead is the memory for the JVM that YARN will create, so 11g + max(384 MB, 0.07 * 11g) = 11g + 0.77g = 11.77g. From the formula, such a job requires a MEMORY_TOTAL of roughly 12 GB to run successfully, which explains why more than 10g is needed for the driver memory setting.

PySpark adds a parallel layer on top of the JVM settings. spark.executor.pyspark.memory (not set by default) is the amount of memory to be allocated to PySpark in each executor, in MiB unless otherwise specified. PySpark should probably use spark.executor.pyspark.memory to limit or default the setting of spark.python.worker.memory, because the latter property controls spilling and should stay lower than the total memory limit. The JVM already has executor memory and Spark memory (controlled by spark.memory.fraction), so these two settings create something similar on the Python side: a total Python memory limit and the threshold above which PySpark will spill to disk. That also means the spill setting should have a better name and should be limited by the total memory.

Finally, there are trade-offs between --num-executors and --executor-memory. Large executor memory does not imply better performance, due to JVM garbage collection; sometimes it is better to configure a larger number of small JVMs than a small number of large JVMs.
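To make the sizing arithmetic above concrete, here is a minimal sketch. It assumes the max(384 MB, 0.07 * heap) overhead default quoted above, and uses the node size and executor count from the example; nothing here is a recommendation.

```python
# Minimal sketch of the executor sizing arithmetic above.
# Assumes the classic YARN overhead default: max(384 MiB, 0.07 * heap).

def yarn_request_mib(heap_mib: int) -> int:
    """Full memory YARN must grant for one executor: heap plus overhead."""
    overhead_mib = max(384, int(0.07 * heap_mib))
    return heap_mib + overhead_mib

node_ram_gib = 63
executors_per_node = 3

heap_gib = node_ram_gib // executors_per_node  # 63 / 3 = 21 GiB per executor
heap_mib = heap_gib * 1024

print(f"heap per executor:         {heap_gib} GiB")
print(f"YARN request per executor: {yarn_request_mib(heap_mib)} MiB")
# Note: because the request is heap + overhead, three 21 GiB heaps slightly
# oversubscribe a 63 GiB node; in practice the heap is set a bit lower.
```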
This information will help provide insight into how executor and driver JVM memory is used across the different memory regions, and it can be used to help determine good values for spark.executor.memory, spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction. As a concrete data point: in my Spark UI "Environment" tab, spark.executor.memory was set to 22776m on a "30 GB" worker in a cluster set up via Databricks, the difference presumably being reserved for overhead and the operating system. A Spark application will have one or more executors on each worker node; with these numbers understood, I would now like to set executor memory or driver memory for performance tuning, as the sketches below illustrate.
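A minimal sketch of setting these properties when launching a PySpark application; the 4g/2g/512m values are illustrative placeholders, not recommendations.

```python
# Sketch: setting executor and driver memory for a PySpark application.
# Values are illustrative. Note that spark.driver.memory only takes effect
# if set before the driver JVM starts, so in client mode it is normally
# passed on the command line (spark-submit --driver-memory 2g ...) rather
# than set from inside the application.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("memory-tuning-sketch")
    .config("spark.executor.memory", "4g")    # executor heap, same as --executor-memory
    .config("spark.driver.memory", "2g")      # driver heap, see note above
    .config("spark.yarn.executor.memoryOverhead", "512m")  # off-heap overhead (pre-Spark-2.3 property name)
    .getOrCreate()
)

print(spark.conf.get("spark.executor.memory"))
```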
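Returning to the PySpark pairing described earlier, a sketch that keeps the spill threshold below the overall Python memory cap; both values are illustrative.

```python
# Sketch of the two Python-side limits discussed earlier.
# spark.executor.pyspark.memory caps the Python workers in each executor,
# while spark.python.worker.memory is the threshold at which aggregation
# spills to disk, so it is kept below the cap. Values are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("pyspark-memory-sketch")
    .config("spark.executor.pyspark.memory", "2g")  # total Python memory per executor
    .config("spark.python.worker.memory", "512m")   # spill threshold, below the cap
    .getOrCreate()
)
```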
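The in-heap split mentioned above can also be sketched numerically. The 60/40 split is the legacy model the text quotes; since Spark 1.6 the unified model governed by spark.memory.fraction and spark.memory.storageFraction applies instead, shown here with its documented defaults.

```python
# Back-of-the-envelope split of a 21 GiB executor heap.
heap_gib = 21.0

# Legacy model quoted in the text: 60% for cached RDDs, 40% for task objects.
cache_gib = 0.60 * heap_gib
execution_gib = heap_gib - cache_gib

# Unified model (Spark 1.6+): spark.memory.fraction (default 0.6) of
# (heap - 300 MiB reserved) is shared by execution and storage, and
# spark.memory.storageFraction (default 0.5) of that pool is protected
# for storage.
reserved_gib = 300.0 / 1024.0
unified_pool_gib = 0.6 * (heap_gib - reserved_gib)
protected_storage_gib = 0.5 * unified_pool_gib

print(f"legacy:  cache {cache_gib:.1f} GiB / execution {execution_gib:.1f} GiB")
print(f"unified: pool {unified_pool_gib:.1f} GiB, protected storage {protected_storage_gib:.1f} GiB")
```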
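Finally, the driver-side arithmetic from the 11g example above, as a quick check of the same formula.

```python
# Driver-side version of the overhead formula from the example above:
# spark.driver.memory + spark.yarn.driver.memoryOverhead.
def memory_total_gib(driver_heap_gib: float) -> float:
    overhead_gib = max(384.0 / 1024.0, 0.07 * driver_heap_gib)
    return driver_heap_gib + overhead_gib

# 11g of driver heap needs roughly 11.77g from YARN, which is why a 10g
# setting was not enough in the example above.
print(f"{memory_total_gib(11.0):.2f} GiB")
```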