Enforcing application lifetime SLAs on YARN

This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate.
Introduction
Lifetime indicates the overall time spent by an application in YARN. The lifetime of an application is calculated from its start time to its finish time, including the actual run time as well as the time spent waiting for resource allocation. Both users and administrators of a YARN cluster sometimes need to cap this lifetime, that is, to enforce a lifetime Service Level Agreement (SLA) on specific applications.
Users and administrators have different reasons for restricting lifetimes. For example, a user might schedule a daily cron job that reports statistics for an application over a specific window of N minutes. Assume that the job runs at set times against different datasets and normally takes about an hour to complete. If a run does not finish within that estimate, its output might no longer be useful, and the user would otherwise have to monitor the application and reclaim its resources by hand. Restricting the lifetime of the application removes the need for that monitoring.
An administrator might be required to restrict the application lifetime for a particular leaf queue. This requirement can be very helpful in organizations where queues are shared across many departments. In such scenarios, restricting the lifetime of applications submitted to leaf queues would ensure optimal availability of resources among users from different departments who wish to submit their jobs.
YARN addresses the requirements of both users and administrators by enabling them to configure the lifetime for applications. This feature is available in Apache Hadoop starting with Hadoop 2.9. Hortonworks Data Platform (HDP) includes this feature starting with HDP 3.0.
How can admins enforce application lifetime SLAs?
YARN lets admins set the lifetime of applications at the leaf-queue level in capacity-scheduler.xml. The following queue properties set the application lifetime for a leaf queue.
Property | Description |
yarn.scheduler.capacity. | Maximum lifetime, in seconds, of an application submitted to the queue. Any value less than or equal to zero disables the limit. This is a hard time limit for all applications in the queue: if a positive value is configured, any application submitted to the queue is killed once it exceeds the configured lifetime. Users can also specify a lifetime per application in the application submission context, but a user-specified lifetime that exceeds the queue maximum is overridden by the maximum. This is a point-in-time configuration. Note: Configuring too low a value results in applications being killed too soon. This property applies only to leaf queues. |
yarn.scheduler.capacity.root. | Default lifetime, in seconds, of an application submitted to the queue. Any value less than or equal to zero disables the default. If the user does not submit the application with a lifetime value, this value is used. This is a point-in-time configuration. Note: The default lifetime cannot exceed the maximum lifetime. This property applies only to leaf queues. |
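The way the queue maximum, the queue default, and a user-requested lifetime interact can be sketched in a few lines of plain Java. This is only an illustration of the rules stated in the table, not the scheduler's actual code; the class and method names (QueueLifetimeRules, effectiveLifetime) are hypothetical.

```java
import java.util.OptionalLong;

/** Sketch of the leaf-queue lifetime rules from the table; not YARN's own implementation. */
final class QueueLifetimeRules {
    private QueueLifetimeRules() {}

    /**
     * Returns the lifetime in seconds that would be enforced, or empty if no limit applies.
     *
     * @param requestedSeconds    lifetime requested at submission; values <= 0 mean "not set"
     * @param queueDefaultSeconds queue default lifetime; values <= 0 mean disabled
     * @param queueMaxSeconds     queue maximum lifetime; values <= 0 mean disabled
     */
    static OptionalLong effectiveLifetime(long requestedSeconds,
                                          long queueDefaultSeconds,
                                          long queueMaxSeconds) {
        boolean maxEnabled = queueMaxSeconds > 0;
        boolean defaultEnabled = queueDefaultSeconds > 0;

        // The default lifetime cannot exceed the maximum lifetime.
        if (maxEnabled && defaultEnabled && queueDefaultSeconds > queueMaxSeconds) {
            throw new IllegalArgumentException("default lifetime cannot exceed maximum lifetime");
        }

        // Use the requested value if present, otherwise fall back to the queue default.
        long lifetime = requestedSeconds > 0
                ? requestedSeconds
                : (defaultEnabled ? queueDefaultSeconds : 0);

        // The maximum is a hard limit: it caps larger values and applies when nothing else is set.
        if (maxEnabled && (lifetime <= 0 || lifetime > queueMaxSeconds)) {
            lifetime = queueMaxSeconds;
        }
        return lifetime > 0 ? OptionalLong.of(lifetime) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // No lifetime requested: the queue default (600 s) applies.
        System.out.println(effectiveLifetime(0, 600, 3600));
        // Requested 7200 s exceeds the queue maximum, so it is capped at 3600 s.
        System.out.println(effectiveLifetime(7200, 600, 3600));
    }
}
```

Treating the maximum as the fallback when neither a request nor a default exists follows the "hard time limit for all applications" wording above; that reading is an assumption of this sketch.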
How can users enforce application lifetime SLAs?
Users can set the lifetime of an application either at job submission or by updating it after submission. This ensures that the application does not run longer than its configured lifetime SLA.
Set app lifetime using Java API
During application submission, a user can set the lifetime in the ApplicationSubmissionContext by calling ApplicationSubmissionContext#setApplicationTimeouts with a Map of timeout types to timeout values. The timeout type LIFETIME is a timeout imposed on the overall application lifetime. It includes the actual run time plus non-run time, such as the time the scheduler takes to allocate containers and the time taken to store the application in the RMStateStore.
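As a rough illustration, the map handed to setApplicationTimeouts might be built as follows. The sketch declares a local stand-in for the ApplicationTimeoutType enum so that it compiles without Hadoop on the classpath, and it assumes the value is in seconds, matching the queue properties. The class and helper names (LifetimeAtSubmission, lifetimeTimeout) are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

/** Sketch: building the timeouts map for an application submission. */
final class LifetimeAtSubmission {
    private LifetimeAtSubmission() {}

    /** Local stand-in for YARN's ApplicationTimeoutType enum, used only to keep this sketch self-contained. */
    enum ApplicationTimeoutType { LIFETIME }

    /** Builds a timeouts map holding a single LIFETIME entry (value assumed to be in seconds). */
    static Map<ApplicationTimeoutType, Long> lifetimeTimeout(long lifetimeSeconds) {
        if (lifetimeSeconds <= 0) {
            throw new IllegalArgumentException("lifetime must be positive");
        }
        Map<ApplicationTimeoutType, Long> timeouts = new EnumMap<>(ApplicationTimeoutType.class);
        timeouts.put(ApplicationTimeoutType.LIFETIME, lifetimeSeconds);
        return timeouts;
    }

    public static void main(String[] args) {
        Map<ApplicationTimeoutType, Long> timeouts = lifetimeTimeout(3600L);
        // In a real client, the map (keyed by YARN's own enum) would be passed to
        // ApplicationSubmissionContext#setApplicationTimeouts before submitting.
        System.out.println(timeouts); // prints {LIFETIME=3600}
    }
}
```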
YARN provides a CLI to update the lifetime of an application after submission. A user can either extend or reduce the lifetime value.
A lifetime set through the CLI is counted from NOW, that is, from the moment of the update. For example, suppose the lifetime of application application_1465246237936_0001 is updated to 300 seconds from NOW. If the current time is 10:00 AM, the application times out at 10:05 AM.
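The "from NOW" arithmetic can be checked with a small sketch. The class and method names (LifetimeFromNow, expiryFrom) are hypothetical, and the second call shows an assumed follow-up update that shortens the lifetime.

```java
import java.time.LocalTime;

/** Sketch: an updated lifetime is counted from the time of the update, not from the start time. */
final class LifetimeFromNow {
    private LifetimeFromNow() {}

    /** Returns the time of day at which the application would time out. */
    static LocalTime expiryFrom(LocalTime updateTime, long lifetimeSeconds) {
        return updateTime.plusSeconds(lifetimeSeconds);
    }

    public static void main(String[] args) {
        // Update to 300 seconds at 10:00 AM: the timeout falls at 10:05 AM.
        System.out.println(expiryFrom(LocalTime.of(10, 0), 300)); // prints 10:05
        // A later update at 10:02 AM to 60 seconds reduces the lifetime: timeout at 10:03 AM.
        System.out.println(expiryFrom(LocalTime.of(10, 2), 60)); // prints 10:03
    }
}
```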
Over REST, update the timeout of an application for a given timeout type by sending a PUT request to the following URI:
http://rm-http-address:port/ws/v1/cluster/apps/{appid}/timeout
The request body carries a timeout object with two elements: type and expiryTime.
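A client-side sketch of the update request, using the JDK's java.net.http API (Java 11 or later), might look like this. Several details are assumptions rather than facts from this post: the JSON wrapper key "timeout", the Content-Type header, the ResourceManager address rm-host:8088, and the class and method names. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Sketch: building a PUT request that updates an application's LIFETIME timeout. */
final class UpdateLifetimeRequest {
    private UpdateLifetimeRequest() {}

    /** The expiryTime format quoted in this post: ISO8601 with a numeric zone offset. */
    static final DateTimeFormatter EXPIRY_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ", Locale.ROOT);

    /** Renders the timeout object as JSON; the "timeout" wrapper key is an assumption. */
    static String timeoutJson(OffsetDateTime expiry) {
        return "{\"timeout\":{\"type\":\"LIFETIME\",\"expiryTime\":\""
                + EXPIRY_FORMAT.format(expiry) + "\"}}";
    }

    /** Builds (but does not send) the PUT request against the /timeout endpoint. */
    static HttpRequest buildPut(String rmAddress, String appId, OffsetDateTime expiry) {
        URI uri = URI.create("http://" + rmAddress + "/ws/v1/cluster/apps/" + appId + "/timeout");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(timeoutJson(expiry)))
                .build();
    }

    public static void main(String[] args) {
        // Ask for the application to expire 300 seconds from now.
        OffsetDateTime expiry = OffsetDateTime.now().plusSeconds(300);
        HttpRequest request = buildPut("rm-host:8088", "application_1465246237936_0001", expiry);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(timeoutJson(expiry));
    }
}
```

Note that, unlike the CLI, the REST body takes an absolute expiry time rather than a number of seconds from NOW, so the client does the addition itself.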
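To read back the lifetime of an application, send a GET request to the following URI, where {type} is the timeout type:
http://rm-http-address:port/ws/v1/cluster/apps/{appid}/timeouts/{type}
The response contains a timeout object with three elements: type, expiryTime, and remainingTimeInSeconds.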
Conclusion
YARN enforces application lifetime SLAs through queue configurations and APIs. You can use this feature to clean up applications automatically and release their resources. The feature was implemented as part of YARN-3813.
Acknowledgements
We would like to thank all those who contributed patches to the Application Timeout feature: Akhil PB and Miklos Szegedi (besides the authors of this post). Thanks also to Jian He, Vinod Vavilapalli, Sunil Govindan, Wangda Tan, Varun Vasudev, and Nijel SF for their help with design and reviews!
The post Enforcing application lifetime SLAs on YARN appeared first on Cloudera Blog.
REST
Update Lifetime of an Application
Elements of the timeout object
Item | Data Type | Description |
type | string | Timeout type. Valid values are the members of the ApplicationTimeoutType enum. LIFETIME is currently the only valid value. |
expiryTime | string | Time at which the application will expire, in ISO8601 yyyy-MM-dd'T'HH:mm:ss.SSSZ format. |
Get Lifetime of an Application
Elements of the timeout (Application Timeout) object
Item | Data Type | Description |
type | string | Timeout type. Valid values are the members of the ApplicationTimeoutType enum. LIFETIME is currently the only valid value. |
expiryTime | string | Time at which the application will expire, in ISO8601 yyyy-MM-dd'T'HH:mm:ss.SSSZ format. |
remainingTimeInSeconds | long | Remaining time for the configured application timeout. -1 indicates that the application is not configured with a timeout. Zero (0) indicates that the application has expired with the configured timeout type. |
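A client reading the remainingTimeInSeconds element has to treat -1 and 0 as special values. The following sketch shows one way to do that; the class, enum, and method names (RemainingLifetime, State, interpret) are hypothetical, and rejecting values below -1 is a choice made for this sketch.

```java
/** Sketch: interpreting the remainingTimeInSeconds element of a timeout object. */
final class RemainingLifetime {
    private RemainingLifetime() {}

    enum State { NO_TIMEOUT, EXPIRED, RUNNING }

    /** Maps the raw value to a state: -1 means no timeout configured, 0 means expired. */
    static State interpret(long remainingTimeInSeconds) {
        if (remainingTimeInSeconds == -1) {
            return State.NO_TIMEOUT;
        }
        if (remainingTimeInSeconds == 0) {
            return State.EXPIRED;
        }
        if (remainingTimeInSeconds > 0) {
            return State.RUNNING;
        }
        // Values below -1 are not described in the table, so this sketch rejects them.
        throw new IllegalArgumentException("unexpected value: " + remainingTimeInSeconds);
    }

    public static void main(String[] args) {
        System.out.println(interpret(-1));  // prints NO_TIMEOUT
        System.out.println(interpret(0));   // prints EXPIRED
        System.out.println(interpret(240)); // prints RUNNING
    }
}
```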