Scheduling with Torque-Maui – A Tutorial

Contents
  - The problem being addressed
  - Torque – how it helps
  - Maui – how it helps
  - Job Submission – job priorities, job dependencies, job queues
  - Job Monitoring
  - Job Accounting
  - Install

The problem
  - Have jobs/tasks run as soon as possible.
  - Have higher-priority jobs run earlier than others.
  - Run jobs automatically on any free machine across a cluster, not just on one machine.
  - Have jobs run unattended and report when an error occurs.
  - Keep machine utilization high.
  - Monitor and account for all usage.

Torque – how it helps
TORQUE's job as the resource manager:
  - Accepting and starting jobs/tasks across a batch farm (qsub command)
  - Cancelling jobs (qdel command)
  - Monitoring the state of jobs (qstat command)
  - Collecting return codes (qstat)
  - Accounting of jobs: the time they took, memory used, etc. (tracejob command)

Maui – how it helps
What is MAUI's job? MAUI makes the scheduling decisions. It decides whether a job should be started by asking questions such as:
  - Are there enough resources to start the job?
  - Given all the jobs I could start, which one should I start?
MAUI runs a scheduling iteration:
  - When a job is submitted.
  - When a job ends.
  - At regular, configurable intervals.

Job Submission
  - Jobs are submitted to the batch system with the qsub command, as in:
      qsub job.sh
  - You can also add a resource description directly on the command line:
      qsub -l nodes=1:ppn=4,mem=200mb,walltime=120 job.sh
  - qsub returns a <jobid>.
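
Because qsub prints the job id on standard output, a wrapper script can capture it for later use with qstat, qdel or tracejob. The following is a minimal sketch, not a definitive implementation: submit_with_resources is a hypothetical helper name, and the resource values are simply the ones from the slide.

```shell
#!/bin/bash
# Sketch: submit a script with explicit resource requests and keep the job id.
# submit_with_resources is a hypothetical helper name, not part of Torque.
submit_with_resources() {
    local script="$1"
    local jobid
    # qsub prints the new job id on stdout when the submission succeeds.
    jobid=$(qsub -l nodes=1:ppn=4,mem=200mb,walltime=120 "$script") || return 1
    printf '%s\n' "$jobid"
}
```

A caller could then write jobid=$(submit_with_resources job.sh) and pass "$jobid" to qstat or tracejob.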

Job priority
  - You can set a priority with qsub:
      qsub -p 20 job.sh
  - The default priority is 0.
  - Priorities for a job range from -1024 to 1023.

Job dependencies
  - Run a job only after another job ends successfully:
      echo "vflush" | qsub -W depend=afterok:10.penguin7.orchesys.com -p 10 -q flush_queue
  - Here 10.penguin7.orchesys.com is the job id of another job, which must complete successfully before the current job is launched.
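
In practice the job id used in the dependency is rarely typed by hand; it is usually captured from the first qsub call. The sketch below chains two submissions that way. submit_chain and the script arguments are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Sketch: submit two jobs so that the second waits for the first to succeed.
# submit_chain is a hypothetical helper name, not part of Torque.
submit_chain() {
    local first_script="$1" second_script="$2"
    local first_id second_id
    first_id=$(qsub "$first_script") || return 1
    # afterok: start the second job only if the first exits with status 0.
    second_id=$(qsub -W depend=afterok:"$first_id" "$second_script") || return 1
    printf '%s %s\n' "$first_id" "$second_id"
}
```

Calling submit_chain load.sh flush.sh would print both job ids, with the second job held until the first one completes successfully.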

Job Queues
  - Batch systems are usually configured with multiple queues.
  - Each queue can be configured to accept jobs from a certain group of users, or within specified resource limits.
  - Queue selection is performed with -q queuename on the qsub command line.
  - Glassbeam has a default queue (batch) and flush_queue (where only one job can run at a time).

Job Monitoring
  - For a job id, you can see the command that was run for the job in the file /var/spool/torque/server_priv/jobs/<JOBID>.SC, for example:
      sudo cat 90.localhost.localdomain.SC
      /home/gbprod/testscript_aruba/aruba_parallel_loader qa0 1306219430 aruba_test_pod /glassbeam/core/bin
  - Status of all submitted jobs: qstat
  - Status of only one job: qstat <jobid>
  - Only running jobs: qstat -r
  - Email alert for jobs: qsub -m ae -M santosh@glassbeam.com (sends email on a: abort, e: end of job)
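
For scripted monitoring, the full-format listing from qstat -f is easier to parse than the default table, since it prints one "attribute = value" pair per line. The sketch below extracts the job_state attribute; job_state (the function) is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the single-letter state of a job (for example Q = queued,
# R = running, C = completed) by parsing the output of qstat -f.
# job_state is a hypothetical helper name, not part of Torque.
job_state() {
    local jobid="$1"
    # Split each line on " = " and print the value of the job_state attribute.
    qstat -f "$jobid" | awk -F' = ' '/job_state/ {print $2; exit}'
}
```

A polling loop could call job_state "$jobid" every few seconds and stop once the state is no longer Q or R.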

Job accounting
  - tracejob can report a job's return status, how much time it took, and what happened to it.
  - Show what happened to a job today: tracejob <jobid>
  - Search the last d days for the job: tracejob -n d <jobid>
  - Fast version of tracejob, with the listed log categories filtered out:
      tracejob -f error -f system -f admin -f security -f sched -f debug -f debug2 -f job -f job_usage 114.localhost

Job accounting: tracejob output
Job: 114.localhost.localdomain
05/30/2011 05:25:15  A    queue=batch
05/30/2011 05:25:15  A    user=gbprod group=glassbeam jobname=STDIN queue=batch ctime=1306747515 qtime=1306747515 etime=1306747515
                          start=1306747515 owner=gbprod@localhost.localdomain exec_host=localhost/0 Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
05/30/2011 05:25:25  A    user=gbprod group=glassbeam jobname=STDIN queue=batch ctime=1306747515 qtime=1306747515 etime=1306747515
                          start=1306747515 owner=gbprod@localhost.localdomain exec_host=localhost/0 Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
                          session=26992 end=1306747525 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:10
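
The accounting records are plain key=value pairs, so the exit status and run time can be pulled out with standard text tools. The sketch below works on an abbreviated copy of the final record shown above, joined onto one line; field_value is a hypothetical helper name, and the start and end fields are Unix epoch seconds.

```shell
#!/bin/bash
# Abbreviated copy of the job-end accounting record from the tracejob output.
record='05/30/2011 05:25:25  A    user=gbprod group=glassbeam jobname=STDIN queue=batch start=1306747515 session=26992 end=1306747525 Exit_status=0 resources_used.walltime=00:00:10'

# field_value is a hypothetical helper: print the value of one key=value field.
field_value() {
    local key="$1" line="$2"
    # Put each space-separated token on its own line, then match the key.
    printf '%s\n' "$line" | tr ' ' '\n' | awk -F= -v k="$key" '$1 == k {print $2; exit}'
}

start=$(field_value start "$record")
end=$(field_value end "$record")
echo "exit status: $(field_value Exit_status "$record")"
echo "elapsed seconds: $((end - start))"
```

For this record the script reports an exit status of 0 and 10 elapsed seconds, which agrees with resources_used.walltime=00:00:10.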

Install
  - Torque install
      As root user, go to folder install/torque-gb-3.0.1
      Run command: ./torque.setup gbprod localhost
  - Maui install
      As root user, go to folder install/maui-gb-3.3.1
      Run command: sh install.sh
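
After both installs, a quick sanity check is to confirm that the client commands used in this tutorial are on the PATH. The sketch below lists any that are missing; missing_commands is a hypothetical helper name, and showq is Maui's queue-listing command.

```shell
#!/bin/bash
# Sketch: print the name of every expected Torque/Maui command that cannot
# be found. missing_commands is a hypothetical helper name.
missing_commands() {
    local cmd
    for cmd in qsub qstat qdel tracejob showq; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}
```

If missing_commands prints nothing, all five commands were found; this only checks that the binaries are reachable, not that the Torque server and Maui daemons are running.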
