Platform LSF Version 6.0 - Running Jobs with Platform LSF - Working with Jobs
- Submitting Jobs (bsub)
- Modifying a Submitted Job (bmod)
- Controlling Jobs
- Submitting a Job to Specific Hosts
- Submitting a Job and Indicating Host Preference
- Using LSF with Non-Shared File Space
- Reserving Resources for Jobs
- Submitting a Job with Start or Termination Times
Submitting Jobs (bsub)
- bsub command
- Submitting a job to a specific queue (bsub -q)
- Submitting a job associated to a project (bsub -P)
- Submitting a job associated to a user group (bsub -G)
- Submitting a job with a job name (bsub -J)
- Submitting a job to a service class (bsub -sla)
- Submitting a job under a job group (bsub -g)
bsub command
You submit a job with the bsub command. If you do not specify any options, the job is submitted to the default queue configured by the LSF administrator (usually queue normal).
For example, if you submit the job my_job without specifying a queue, the job goes to the default queue.
% bsub my_job
Job <1234> is submitted to default queue <normal>
In the above example, 1234 is the job ID assigned to this job, and normal is the name of the default job queue.
Your job remains pending until all conditions for its execution are met. Each queue has execution conditions that apply to all jobs in the queue, and you can specify additional conditions when you submit the job.
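Scripts often need the job ID that bsub reports so they can pass it to later commands such as bmod or bkill. The following is a minimal sketch that extracts the ID from a confirmation line in the format shown above. The function name parse_job_id is hypothetical, and the sample line is hard-coded so the sketch runs without LSF installed.

```shell
# Extract the numeric job ID from a bsub confirmation line such as:
#   Job <1234> is submitted to default queue <normal>
# parse_job_id is a hypothetical helper, not an LSF command.
parse_job_id() {
  printf '%s\n' "$1" | sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Sample line copied from the example above. In a real script it would
# come from the command itself, for example: confirmation=$(bsub my_job)
confirmation='Job <1234> is submitted to default queue <normal>'
job_id=$(parse_job_id "$confirmation")
echo "$job_id"    # prints 1234
```

If the line does not match the expected format, the function prints nothing, so a script can test for an empty result before using the ID.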
You can also specify an execution host or a range of hosts, a queue, and start and termination times, as well as a wide range of other job options. See the bsub command in the Platform LSF Reference for more details on bsub options.
Submitting a job to a specific queue (bsub -q)
Job queues represent different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Each job queue can use a configured subset of server hosts in the cluster; the default is to use all server hosts.
System administrators can configure job queues to control resource access by different users and types of application. Users select the job queue that best fits each job.
The default queue is normally suitable to run most jobs, but the default queue may assign your jobs a very low priority, or restrict execution conditions to minimize interference with other jobs. If automatic queue selection is not satisfactory, choose the most suitable queue for each job.
The factors affecting which queue to choose are user access restrictions, size of job, resource limits of the queue, scheduling priority of the queue, active time windows of the queue, hosts used by the queue, scheduling load conditions, and the queue description displayed by the bqueues -l command.
To see available queues, use the bqueues command.
Use bqueues -u user_name to specify a user or user group so that bqueues displays only the queues that accept jobs from these users.
The bqueues -m host_name option allows users to specify a host name or host group name so that bqueues displays only the queues that use these hosts to run jobs.
You can submit jobs to a queue as long as its STATUS is Open. However, jobs are not dispatched unless the queue is Active.
The following examples are based on the queues defined in the default configuration. Your LSF administrator may have configured different queues.
To run a job during off hours because the job places a very high load on both the file server and the network, you can submit it to the night queue:
% bsub -q night my_job
If you have an urgent job to run, you may want to submit it to the priority queue:
% bsub -q priority my_job
If you want to use hosts owned by others and you do not want to bother the owners, you may want to run your low priority jobs on the idle queue so that as soon as the owner comes back, your jobs get suspended:
% bsub -q idle my_job
If you are running small jobs and do not want to wait too long to get the results, you can submit jobs to the short queue to be dispatched with higher priority:
% bsub -q short my_job
Make sure your jobs are short enough that they are not killed for exceeding the CPU time limit of the queue (check the resource limits of the queue, if any).
If your job requires a specific execution environment, you may need to submit it to a queue that has a particular job starter defined. LSF administrators are able to specify a queue-level job starter as part of the queue definition; ask them for the name of the queue and configuration details.
See Administering Platform LSF for information on queue-level job starters.
Submitting a job associated to a project (bsub -P)
Use the bsub -P project_name option to associate a project name with a job.
On systems running IRIX 6, before the submitted job begins execution, a new array session is created and the project ID corresponding to the project name is assigned to the session.
Submitting a job associated to a user group (bsub -G)
You can use the bsub -G user_group option to submit a job and associate it with a specified user group. This option is only useful with fairshare scheduling.
For more details on fairshare scheduling, see Administering Platform LSF.
You can specify any user group to which you belong as long as it does not contain any subgroups. You must be a direct member of the specified user group.
User groups in non-leaf nodes cannot be specified because doing so causes ambiguity in determining the correct shares given to a user.
For example, to submit the job myjob associated to user group special:
% bsub -G special myjob
Submitting a job with a job name (bsub -J)
Use bsub -J job_name to submit a job and assign a job name to it.
You can later use the job name to identify the job. The job name need not be unique.
For example, to submit a job and assign the name my_job:
% bsub -J my_job
You can also assign a job name to a job array. See Administering Platform LSF for more information about job arrays.
Submitting a job to a service class (bsub -sla)
Use bsub -sla service_class_name to submit a job to a service class for SLA-driven scheduling.
You submit jobs to a service class as you would to a queue, except that a service class is a higher level scheduling policy that makes use of other, lower level LSF policies like queues and host partitions to satisfy the service-level goal that the service class expresses.
For example:
% bsub -W 15 -sla Kyuquot sleep 100
submits the UNIX command sleep together with its argument 100 as a job to the service class named Kyuquot.
The service class name where the job is to run is configured in lsb.serviceclasses. If the SLA does not exist or the user is not a member of the service class, the job is rejected.
Outside of the configured time windows, the SLA is not active, and LSF schedules jobs without enforcing any service-level goals. Jobs will flow through queues following queue priorities even if they are submitted with -sla.
You should submit your jobs with a run time limit (-W option) or the queue should specify a run time limit (RUNLIMIT in the queue definition in lsb.queues). If you do not specify a run time limit, LSF automatically adjusts the optimum number of running jobs according to the observed run time of finished jobs.
See Administering Platform LSF for more information about service classes and goal-oriented SLA driven scheduling.
Submitting a job under a job group (bsub -g)
Use bsub -g to submit a job into a job group. The job group does not have to exist before submitting the job. For example:
% bsub -g /risk_group/portfolio1/current myjob
Job <105> is submitted to default queue.
Submits myjob to the job group /risk_group/portfolio1/current.
If group /risk_group/portfolio1/current exists, job 105 is attached to the job group.
If group /risk_group/portfolio1/current does not exist, LSF checks its parent recursively, and if no groups in the hierarchy exist, all three job groups are created with the specified hierarchy and the job is attached to the group.
See Administering Platform LSF for more information about job groups.
Modifying a Submitted Job (bmod)
Modifying Pending Jobs (bmod)
If your submitted jobs are pending (bjobs shows the job in PEND state), use the bmod command to modify job submission parameters. You can also modify entire job arrays or individual elements of a job array.
See the bmod command in the Platform LSF Reference for more details.
Replacing the job command-line
To replace the job command line, use the bmod -Z "new_command" option. The following example replaces the command line of job 101 with "myjob file":
% bmod -Z "myjob file" 101
Changing a job parameter
To change a specific job parameter, use bmod with the bsub option used to specify the parameter. The specified options replace the submitted options. The following example changes the start time of job 101 to 2:00 a.m.:
% bmod -b 2:00 101
Resetting to default submitted value
To reset an option to its default submitted value (undo a bmod), append the n character to the option name, and do not include an option value. The following example resets the start time for job 101 back to its default value:
% bmod -bn 101
Resource reservation can be modified after a job has been started to ensure proper reservation and optimal resource utilization.
Modifying a job submitted to a service class
Use the -sla option of bmod to modify the service class a job is attached to, or to attach a submitted job to a service class. Use bmod -slan to detach a job from a service class. For example:
% bmod -sla Kyuquot 2307
Attaches job 2307 to the service class Kyuquot.
% bmod -slan 2307
Detaches job 2307 from the service class Kyuquot.
You cannot:
- Use -sla with other bmod options
- Move job array elements from one service class to another, only entire job arrays
- Modify the service class of jobs already attached to a job group
See Administering Platform LSF for more information about submitting jobs to service classes for SLA-driven scheduling.
Modifying a job submitted to a job group
Use the -g option of bmod and specify a job group path to move a job or a job array from one job group to another. For example:
% bmod -g /risk_group/portfolio2/monthly 105
moves job 105 to job group /risk_group/portfolio2/monthly.
Like bsub -g, if the job group does not exist, LSF creates it.
bmod -g cannot be combined with other bmod options. It can operate on finished, running, and pending jobs.
You can modify your own job groups and job groups that other users create under your job groups. The LSF administrator can modify job groups of all users.
You cannot move job array elements from one job group to another, only entire job arrays. A job array can only belong to one job group at a time. You cannot modify the job group of a job attached to a service class.
bhist -l shows job group modification information:
% bhist -l 105
Job <105>, User <user1>, Project <default>, Job Group </risk_group>, Command <myjob>
Wed May 14 15:24:07: Submitted from host <hostA>, to Queue <normal>, CWD <$HOME/lsf51/5.1/sparc-sol7-64/bin>;
Wed May 14 15:24:10: Parameters of Job are changed:
    Job group changes to: /risk_group/portfolio2/monthly;
Wed May 14 15:24:17: Dispatched to <hostA>;
Wed May 14 15:24:17: Starting (Pid 8602);
...
See Administering Platform LSF for more information about job groups.
Modifying Running Jobs
Modifying resource reservation
A job is usually submitted with a resource reservation for the maximum amount required. Use bmod -R to modify the resource reservation for a running job. This command is usually used to decrease the reservation, allowing other jobs access to the resource.
The following example sets the resource reservation for job 101 to 25 MB of memory and 50 MB of swap space:
% bmod -R "rusage[mem=25:swp=50]" 101
By default, you can modify resource reservation for running jobs. Set LSB_MOD_ALL_JOBS in lsf.conf to modify additional job options.
See Reserving Resources for Jobs for more details.
Modifying other job options
If LSB_MOD_ALL_JOBS is specified in lsf.conf, the job owner or the LSF administrator can use the bmod command to modify the following job options for running jobs:
- CPU limit (-c [hour:]minute[/host_name | /host_model] | -cn)
- Memory limit (-M mem_limit | -Mn)
- Run limit (-W run_limit[/host_name | /host_model] | -Wn)
- Standard output file name (-o output_file | -on)
- Standard error file name (-e error_file | -en)
- Rerunnable jobs (-r | -rn)
In addition to resource reservation, these are the only bmod options that are valid for running jobs. You cannot make any other modifications after a job has been dispatched.
An error message is issued and the modification fails if these options are used on running jobs in combination with other bmod options.
Modifying resource limits for running jobs
The new resource limits cannot exceed the resource limits defined in the queue.
To modify the CPU limit of running jobs, LSB_JOB_CPULIMIT=Y must be defined in lsf.conf.
To modify the memory limit of running jobs, LSB_JOB_MEMLIMIT=Y must be defined in lsf.conf.
Limitations
Modifying remote running jobs in a MultiCluster environment is not supported.
To modify the name of the job error file for a running job, you must use bsub -e or bmod -e to specify an error file before the job starts running.
For more information
See Administering Platform LSF for more information about job output files, using job-level resource limits, and submitting rerunnable jobs.
Controlling Jobs
LSF controls jobs dispatched to a host to enforce scheduling policies, or in response to user requests. The LSF system performs the following actions on a job:
- Suspend by sending a SIGSTOP signal
- Resume by sending a SIGCONT signal
- Terminate by sending a SIGKILL signal
On Windows, equivalent functions have been implemented to perform the same tasks.
- Killing Jobs (bkill)
- Suspending and Resuming Jobs (bstop and bresume)
- Changing Job Order Within Queues (bbot and btop)
Killing Jobs (bkill)
The bkill command cancels pending batch jobs and sends signals to running jobs. By default, on UNIX, bkill sends the SIGKILL signal to running jobs.
Before SIGKILL is sent, SIGINT and SIGTERM are sent to give the job a chance to catch the signals and clean up. The signals are forwarded from mbatchd to sbatchd, which waits for the job to exit before reporting the status. Because of these delays, for a short period of time after entering the bkill command, bjobs may still report that the job is running.
On Windows, job control messages replace the SIGINT and SIGTERM signals, and termination is implemented by the TerminateProcess() system call.
To kill job 3421:
% bkill 3421
Job <3421> is being terminated
Forcing removal of a job from LSF
If a job cannot be killed in the operating system, use bkill -r to force the removal of the job from LSF.
The bkill -r command removes a job from the system without waiting for the job to terminate in the operating system. This sends the same series of signals as bkill without -r, except that the job is removed from the system immediately, the job is marked as EXIT, and job resources that LSF monitors are released as soon as LSF receives the first signal.
Suspending and Resuming Jobs (bstop and bresume)
The bstop and bresume commands allow you to suspend or resume a job.
A job can also be suspended by its owner or the LSF administrator with the bstop command. These jobs are considered user-suspended and are displayed by bjobs as USUSP.
When the user restarts the job with the bresume command, the job is not started immediately to prevent overloading. Instead, the job is changed from USUSP to SSUSP (suspended by the system). The SSUSP job is resumed when host load levels are within the scheduling thresholds for that job, similarly to jobs suspended due to high load.
If a user suspends a high priority job from a non-preemptive queue, the load may become low enough for LSF to start a lower priority job in its place. The load created by the low priority job can prevent the high priority job from resuming.
This can be avoided by configuring preemptive queues. See Administering Platform LSF for information about configuring queues.
Suspending a job
To suspend a job, use the bstop command. Suspending a job causes your job to go into USUSP state if the job is already started, or to go into PSUSP state if your job is pending.
By default, jobs that are suspended by the administrator can only be resumed by the administrator or root; users do not have permission to resume a job suspended by another user or the administrator. Administrators can resume jobs suspended by users or administrators. Administrators can also enable users to resume their own jobs that have been stopped by an administrator.
bstop sends the following signals to the job:
- SIGTSTP for parallel or interactive jobs
  SIGTSTP is caught by the master process and passed to all the slave processes running on other hosts.
- SIGSTOP for sequential jobs
  SIGSTOP cannot be caught by user programs. The SIGSTOP signal can be configured with the LSB_SIGSTOP parameter in lsf.conf.
To suspend job 3421, enter:
% bstop 3421
Job <3421> is being stopped
Resuming a job
To resume a job, use the bresume command.
Resuming a user-suspended job does not put your job into RUN state immediately. If your job was running before the suspension, bresume first puts your job into SSUSP state and then waits for sbatchd to schedule it according to the load conditions.
For example, to resume job 3421, enter:
% bresume 3421
Job <3421> is being resumed
You cannot resume jobs suspended by another user; you can only resume your own jobs. If your job was suspended by the administrator, you cannot resume it; the administrator or root must resume the job for you.
ENABLE_USER_RESUME parameter (lsb.params)
If ENABLE_USER_RESUME=Y in lsb.params, you can resume your own jobs that have been suspended by the administrator.
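The state changes described in this section can be summarized as a small lookup. The sketch below models only the transitions stated in the text. The function name next_state and the event name load_ok (standing for "host load is back within the scheduling thresholds") are hypothetical and are not LSF commands.

```shell
# next_state CURRENT_STATE EVENT -> prints the state bjobs is expected to show next.
# next_state and the load_ok event are hypothetical names used for illustration.
next_state() {
  case "$1:$2" in
    RUN:bstop)      echo USUSP ;;  # a started job is suspended by bstop
    PEND:bstop)     echo PSUSP ;;  # a pending job is suspended by bstop
    USUSP:bresume)  echo SSUSP ;;  # resumed, but waiting for load thresholds
    SSUSP:load_ok)  echo RUN ;;    # load is within thresholds, job runs again
    *)              echo "$1" ;;   # transitions not described above: unchanged
  esac
}

next_state RUN bstop        # prints USUSP
next_state USUSP bresume    # prints SSUSP
next_state SSUSP load_ok    # prints RUN
```

The intermediate SSUSP step is why a job does not show as RUN immediately after bresume.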
Changing Job Order Within Queues (bbot and btop)
By default, LSF dispatches jobs in a queue in the order of arrival (that is, first-come-first-served), subject to availability of suitable server hosts.
Use the btop and bbot commands to change the position of pending jobs, or of pending job array elements, to affect the order in which jobs are considered for dispatch. Users can only change the relative position of their own jobs, and LSF administrators can change the position of any users' jobs.
Moving a job to the bottom of a queue
Use bbot to move jobs relative to your last job in the queue.
If invoked by a regular user, bbot moves the selected job after the last job with the same priority submitted by the user to the queue.
If invoked by the LSF administrator, bbot moves the selected job after the last job with the same priority submitted to the queue.
Moving a job to the top of a queue
Use btop to move jobs relative to your first job in the queue.
If invoked by a regular user, btop moves the selected job before the first job with the same priority submitted by the user to the queue.
If invoked by the LSF administrator, btop moves the selected job before the first job with the same priority submitted to the queue.
In the following example, job 5311 is moved to the top of the queue. Since job 5308 is already running, job 5311 is placed in the queue after job 5308.
Note that user1's job is still in the same position on the queue. user2 cannot use btop to get extra jobs at the top of the queue; when one of his jobs moves up the queue, the rest of his jobs move down.
% bjobs -u all
JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
5308  user2 RUN  normal hostA     hostD     /s500    Oct 23 10:16
5309  user2 PEND night  hostA               /s200    Oct 23 11:04
5310  user1 PEND night  hostB               /myjob   Oct 23 13:45
5311  user2 PEND night  hostA               /s700    Oct 23 18:17
% btop 5311
Job <5311> has been moved to position 1 from top.
% bjobs -u all
JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
5308  user2 RUN  normal hostA     hostD     /s500    Oct 23 10:16
5311  user2 PEND night  hostA               /s700    Oct 23 18:17
5310  user1 PEND night  hostB               /myjob   Oct 23 13:45
5309  user2 PEND night  hostA               /s200    Oct 23 11:04
Controlling Jobs in Job Groups
Stopping (bstop)
Use the -g option of bstop and specify a job group path to suspend jobs in a job group:
% bstop -g /risk_group 106
Job <106> is being stopped
Use job ID 0 (zero) to suspend all jobs in a job group:
% bstop -g /risk_group/consolidate 0
Job <107> is being stopped
Job <108> is being stopped
Job <109> is being stopped
Resuming (bresume)
Use the -g option of bresume and specify a job group path to resume suspended jobs in a job group:
% bresume -g /risk_group 106
Job <106> is being resumed
Use job ID 0 (zero) to resume all jobs in a job group:
% bresume -g /risk_group 0
Job <109> is being resumed
Job <110> is being resumed
Job <112> is being resumed
Terminating (bkill)
Use the -g option of bkill and specify a job group path to terminate jobs in a job group. For example:
% bkill -g /risk_group 106
Job <106> is being terminated
Use job ID 0 (zero) to terminate all jobs in a job group:
% bkill -g /risk_group 0
Job <1413> is being terminated
Job <1414> is being terminated
Job <1415> is being terminated
Job <1416> is being terminated
bkill only kills jobs in the job group you specify. It does not kill jobs in lower level job groups in the path. For example, jobs are attached to job groups /risk_group and /risk_group/consolidate:
% bsub -g /risk_group myjob
Job <115> is submitted to default queue <normal>.
% bsub -g /risk_group/consolidate myjob2
Job <116> is submitted to default queue <normal>.
The following bkill command only kills jobs in /risk_group, not the subgroup /risk_group/consolidate:
% bkill -g /risk_group 0
Job <115> is being terminated
To kill jobs in the subgroup, specify its path explicitly:
% bkill -g /risk_group/consolidate 0
Job <116> is being terminated
Deleting (bgdel)
Use the bgdel command to remove a job group. The job group cannot contain any jobs. For example:
% bgdel /risk_group
Job group /risk_group is deleted.
deletes the job group /risk_group and all its subgroups.
For more information
See Administering Platform LSF for more information about using job groups.
Submitting a Job to Specific Hosts
To indicate that a job must run on one of the specified hosts, use the bsub -m "hostA hostB ..." option.
By specifying a single host, you can force your job to wait until that host is available and then run on that host.
For example:
% bsub -q idle -m "hostA hostD hostB" myjob
This command submits myjob to the idle queue and tells LSF to choose one host from hostA, hostD and hostB to run the job. All other batch scheduling conditions still apply, so the selected host must be eligible to run the job.
Resources and bsub -m
If you have applications that need specific resources, it is more flexible to create a new Boolean resource and configure that resource for the appropriate hosts in the cluster.
This must be done by the LSF administrator. If you specify a host list using the -m option of bsub, you must change the host list every time you add a new host that supports the desired resources. By using a Boolean resource, the LSF administrator can add, move or remove resources without forcing users to learn about changes to resource configuration.
Submitting a Job and Indicating Host Preference
When several hosts can satisfy the resource requirements of a job, the hosts are ordered by load. However, in certain situations it may be desirable to override this behavior to give preference to specific hosts, even if they are more heavily loaded.
For example, you may have licensed software which runs on different groups of hosts, but you prefer it to run on a particular host group because the jobs will finish faster, thereby freeing the software license to be used by other jobs.
Another situation arises in clusters consisting of dedicated batch servers and desktop machines which can also run jobs when no user is logged in. You may prefer to run on the batch servers and only use the desktop machines if no server is available.
To see a list of available hosts, use the bhosts command.
- Submitting a job with host preference
- Submitting a job with different levels of host preference
- Submitting a job with resource requirements
Submitting a job with host preference
The bsub -m option allows you to indicate preference by using + with an optional preference level after the host name. The keyword others can be used to refer to all the hosts that are not explicitly listed. You must specify others with at least one host name or host group name.
For example:
% bsub -m "hostD+ others" -R "solaris && mem> 10" myjob
In this example, LSF selects all solaris hosts that have more than 10 MB of memory available. If hostD meets these criteria, it will be picked over any other host which otherwise meets the same criteria. If hostD does not meet the criteria, the least loaded host among the others will be selected. All the other hosts are considered as a group and are ordered by load.
A queue can also define host preferences for jobs. Host preferences specified by bsub -m override the queue specification.
In the queue definition in lsb.queues, use the HOSTS parameter to list the hosts or host groups to which the queue can dispatch jobs.
Use the not operator (~) to exclude hosts or host groups from the list of hosts to which the queue can dispatch jobs. This is useful if you have a large cluster, but only want to exclude a few hosts from the queue definition.
See the Platform LSF Reference for information about the lsb.queues file.
Submitting a job with different levels of host preference
You can indicate different levels of preference by specifying a number after the plus sign (+). The larger the number, the higher the preference for that host or host group. You can also specify the + with the keyword others.
For example:
% bsub -m "groupA+2 groupB+1 groupC" myjob
In this example, LSF gives first preference to hosts in groupA, second preference to hosts in groupB and last preference to those in groupC. Ordering within a group is still determined by load.
You can use the bmgroup command to display configured host groups.
Submitting a job with resource requirements
To submit a job which will run on Solaris 7 or Solaris 8:
% bsub -R "sol7 || sol8" myjob
When you submit a job, you can also exclude a host by specifying a resource requirement using the hname resource:
% bsub -R "hname!=hostb && type==sgi6" myjob
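To see how the preference levels described in this section rank the entries of a bsub -m list, the following sketch sorts a list by level, highest first. It is an illustration only, not how LSF is implemented. The function name order_by_preference is hypothetical, and two details are assumptions rather than documented behavior: a bare + is treated as level 1 and an entry without + as level 0. Load-based ordering within a level is not modeled.

```shell
# order_by_preference "HOST_LIST" prints the names from a bsub -m style list,
# highest preference level first.
# Assumptions (not taken from the LSF documentation): a bare "+" counts as
# level 1 and a name without "+" counts as level 0.
order_by_preference() {
  for token in $1; do
    case "$token" in
      *+*) name=${token%%+*}; level=${token#*+}; level=${level:-1} ;;
      *)   name=$token; level=0 ;;
    esac
    printf '%s %s\n' "$level" "$name"
  done | sort -k1,1nr | awk '{ printf "%s%s", sep, $2; sep = " " } END { print "" }'
}

order_by_preference "groupB+1 groupC groupA+2"   # prints: groupA groupB groupC
```

The same function applied to "hostD+ others" prints hostD before others, matching the first example in this section.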
Using LSF with Non-Shared File Space
LSF is usually used in networks with shared file space. When shared file space is not available, use the bsub -f command to have LSF copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes.
LSF attempts to run the job in the directory where the bsub command was invoked. If the execution directory is under the user's home directory, sbatchd looks for the path relative to the user's home directory. This handles some common configurations, such as cross-mounting user home directories with the /net automount option.
If the directory is not available on the execution host, the job is run in /tmp. Any files created by the batch job, including the standard output and error files created by the -o and -e options to bsub, are left on the execution host.
LSF provides support for moving user data from the submission host to the execution host before executing a batch job, and from the execution host back to the submitting host after the job completes. The file operations are specified with the -f option to bsub.
LSF uses the lsrcp command to transfer files. lsrcp contacts RES on the remote host to perform file transfer. If RES is not available, the UNIX rcp command is used.
See Administering Platform LSF for more information about file transfer in LSF.
bsub -f
bsub -f
The -f "[local_file operator [remote_file]]" option to the bsub command copies a file between the submission host and the execution host. To specify multiple files, repeat the -f option.
- local_file: File name on the submission host
- remote_file: File name on the execution host
The files local_file and remote_file can be absolute or relative file path names. You must specify at least one file name. When the file remote_file is not specified, it is assumed to be the same as local_file. Including local_file without the operator results in a syntax error.
- operator: Operation to perform on the file. The operator must be surrounded by white space.
Valid values for operator are:
- > : local_file on the submission host is copied to remote_file on the execution host before job execution. remote_file is overwritten if it exists.
- < : remote_file on the execution host is copied to local_file on the submission host after the job completes. local_file is overwritten if it exists.
- << : remote_file is appended to local_file after the job completes. local_file is created if it does not exist.
- >< or <> : Equivalent to performing the > and then the < operation. The file local_file is copied to remote_file before the job executes, and remote_file is copied back, overwriting local_file, after the job completes. <> is the same as ><
If the submission and execution hosts have different directory structures, you must ensure that the directory where remote_file and local_file will be placed exists. LSF tries to change the directory to the same path name as the directory where the bsub command was run. If this directory does not exist, the job is run in your home directory on the execution host.
You should specify remote_file as a file name with no path when running in non-shared file systems; this places the file in the job's current working directory on the execution host. This way the job will work correctly even if the directory where the bsub command is run does not exist on the execution host. Be careful not to overwrite an existing file in your home directory.
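As an illustration of the -f syntax, the following sketch builds a bsub command line that copies one input file to the execution host before the job and one result file back afterwards. It prints the command instead of running it, so it works on a machine without LSF. The helper name stage_cmd and the file names are hypothetical. Following the advice above, the remote file is given as a bare file name with no path.

```shell
# Dry run: print the bsub command line instead of executing it.
# stage_cmd INPUT OUTPUT COMMAND...   (hypothetical helper name)
stage_cmd() {
  input=$1
  output=$2
  shift 2
  # ">" copies INPUT to the execution host before the job starts;
  # "<" copies OUTPUT back to the submission host after the job completes.
  # ${var##*/} strips any directory part so remote_file has no path.
  printf 'bsub -f "%s > %s" -f "%s < %s" %s\n' \
    "$input" "${input##*/}" "$output" "${output##*/}" "$*"
}

stage_cmd input/data.in result.out myjob
# prints: bsub -f "input/data.in > data.in" -f "result.out < result.out" myjob
```

Note that each operator is surrounded by white space inside the quoted -f argument, and that -f is repeated once per file.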
Reserving Resources for Jobs
About resource reservation
When a job is dispatched, the system assumes that the resources that the job consumes will be reflected in the load information. However, many jobs do not consume the resources they require when they first start. Instead, they will typically use the resources over a period of time.
For example, a job requiring 100 MB of swap is dispatched to a host having 150 MB of available swap. The job starts off initially allocating 5 MB and gradually increases the amount consumed to 100 MB over a period of 30 minutes. During this period, another job requiring more than 50 MB of swap should not be started on the same host to avoid over-committing the resource.
You can reserve resources to prevent overcommitment by LSF. Resource reservation requirements can be specified as part of the resource requirements when submitting a job, or can be configured into the queue level resource requirements.
Viewing host-level resource information
Use bhosts -l to view the amount of resources reserved on each host. Use bhosts -s to view information about shared resources.
Viewing queue-level resource information
To see the resource usage configured at the queue level, use bqueues -l.
How resource reservation works
When deciding whether to schedule a job on a host, LSF considers the reserved resources of jobs that have previously started on that host. For each load index, the amount reserved by all jobs on that host is summed up and subtracted (or added if the index is increasing) from the current value of the resources as reported by the LIM to get the amount available for scheduling new jobs:
available amount = current value - reserved amount for all jobs
Using the rusage string
To specify resource reservation at the job level, use bsub -R and include the resource usage section in the resource requirement (rusage) string.
For example:
% bsub -R "rusage[tmp=30:duration=30:decay=1]" myjob
will reserve 30 MB of temp space for the job. As the job runs, the amount reserved will decrease at approximately 1 MB/minute such that the reserved amount is 0 after 30 minutes.
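The arithmetic behind reservation can be sketched in a few lines. The first function applies the formula for the available amount of a resource. The second estimates how much of a decaying reservation remains after a given number of minutes, assuming the decrease is linear over the duration, as in the tmp example above. Both function names are hypothetical and the code only illustrates the calculation; it does not query LSF.

```shell
# available_amount CURRENT RESERVED -> amount left for scheduling new jobs,
# for a load index where reservations are subtracted from the current value.
available_amount() {
  echo $(( $1 - $2 ))
}

# reserved_after INITIAL DURATION ELAPSED -> amount still reserved, assuming
# a linear decay from INITIAL down to 0 over DURATION minutes.
reserved_after() {
  if [ "$3" -ge "$2" ]; then
    echo 0
  else
    echo $(( $1 * ($2 - $3) / $2 ))
  fi
}

available_amount 150 100   # swap example: 150 MB current, 100 MB reserved -> 50
reserved_after 30 30 10    # tmp example, 10 minutes into the job -> 20
```

With the swap example from the start of this section, only 50 MB remain available for other jobs while the first job holds its 100 MB reservation.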
Submitting a Job with Start or Termination Times
By default, LSF dispatches jobs as soon as possible, and then allows them to finish, although resource limits might terminate the job before it finishes.
You can specify a time of day at which to start or terminate a job.
Submitting a job with a start time
If you do not want to start your job immediately when you submit it, use bsub -b to specify a start time. LSF will not dispatch the job before this time. For example:
% bsub -b 5:00 myjob
This example submits a job that remains pending until after the local time on the master host reaches 5 a.m.
Submitting a job with a termination time
Use bsub -t to submit a job and specify a time after which the job should be terminated. For example:
% bsub -b 11:12:5:40 -t 11:12:20:30 myjob
The job called myjob is submitted to the default queue and will start after November 12 at 05:40 a.m. If the job is still running on November 12 at 8:30 p.m., it will be killed.
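The two examples above use times written as hour:minute and as month:day:hour:minute. The sketch below splits such a value into named fields, assuming the general form [[month:]day:]hour:minute; the three-field form (day:hour:minute) is inferred from that pattern rather than shown in this section. The function name describe_time is hypothetical, and no range checking is done.

```shell
# describe_time SPEC -> prints the fields of a time written as
# [[month:]day:]hour:minute. Omitted fields are reported as "-".
# describe_time is a hypothetical helper, not an LSF command.
describe_time() {
  old_ifs=$IFS
  IFS=:
  set -- $1          # split the argument on ":" into positional parameters
  IFS=$old_ifs
  case $# in
    2) echo "month=- day=- hour=$1 minute=$2" ;;
    3) echo "month=- day=$1 hour=$2 minute=$3" ;;
    4) echo "month=$1 day=$2 hour=$3 minute=$4" ;;
    *) echo "unrecognized time: expected [[month:]day:]hour:minute" >&2
       return 1 ;;
  esac
}

describe_time 5:00          # prints: month=- day=- hour=5 minute=00
describe_time 11:12:20:30   # prints: month=11 day=12 hour=20 minute=30
```

Reading 11:12:20:30 this way gives November 12 at 20:30, which is the 8:30 p.m. termination time in the example.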
Date Modified: November 21, 2003
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2003 Platform Computing Corporation. All rights reserved.