- Challenge – Scheduling machine learning jobs in a high-performance computing cluster, where jobs have varying urgency based on factors like deadlines, computational needs, and impact.
- Scenario – Urgent jobs (e.g., medical diagnosis model fine-tuning) should receive resources before less critical jobs (e.g., movie recommendation model training), regardless of submission time.
- Question – How can the scheduler efficiently prioritize and select the most urgent job as new jobs continuously arrive in the system?
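
One common way to approach this question is a priority queue backed by a binary heap, which keeps both insertion and removal of the most urgent job at O(log n). The sketch below is illustrative only and uses Python's standard `heapq` module. The `Job` and `JobScheduler` names, the integer priority scale (lower value means more urgent), the FIFO tie-breaking rule, and the `ad-ranking-retraining` job are all assumptions made for this example, not part of the scenario above.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Assumed convention: a lower priority value means a more urgent job.
    priority: int
    # Arrival counter; breaks ties so equally urgent jobs run in FIFO order.
    seq: int
    # The name is excluded from comparisons.
    name: str = field(compare=False)


class JobScheduler:
    """Hypothetical scheduler: a min-heap ordered by (priority, seq)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, name, priority):
        # O(log n): the new job is sifted into heap position.
        heapq.heappush(self._heap, Job(priority, next(self._counter), name))

    def next_job(self):
        # O(log n): remove and return the most urgent job, or None if idle.
        if not self._heap:
            return None
        return heapq.heappop(self._heap).name


scheduler = JobScheduler()
scheduler.submit("movie-recommendation-training", priority=5)
scheduler.submit("medical-diagnosis-fine-tuning", priority=1)
scheduler.submit("ad-ranking-retraining", priority=5)  # hypothetical extra job

# The medical job was submitted second but is selected first.
first = scheduler.next_job()
print(first)
```

Because the heap is ordered by priority rather than arrival, submission time only matters when two jobs are equally urgent. A real cluster scheduler would likely derive the priority value from deadlines, resource needs, and impact rather than take it as a fixed integer.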