Turns raw resumes and job descriptions into a ranked shortlist with reasons. Recruiters see ordered candidates, read the rationale, and act.
Ingests resumes (PDF, Word, image scans, or plain text). Parses each into a structured candidate profile: contact, education, experience, skills, certifications, location. For any open role, scores every candidate against the job description across multiple dimensions and produces a ranked list with reasons. The recruiter sees: the score, the top three reasons this candidate fits, the top three concerns, and a one-paragraph summary. They can override, reject, or push to the next stage.
Before: HR teams manually reviewed 200-500 resumes per role. Spreadsheet tracking. Comments lost in email threads. Senior candidates buried under junior ones with louder buzzwords. No structured record of why someone was rejected. Recruiter time spent on first-pass triage instead of interviews. After: a role that took 5 days to shortlist now takes 30 minutes once the resumes are uploaded.
Drag-and-drop or bulk import. File pre-processing: PDF text extraction (PyMuPDF), OCR fallback for scans (Tesseract + LLM cleanup), DOCX converted to plain text.
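The fallback order can be sketched as a small dispatch function. This is a minimal sketch under assumptions: the function name `choose_stage`, the 0.6 threshold, and the stage labels are illustrative, not the production values.

```python
def choose_stage(native_text: str, native_confidence: float,
                 threshold: float = 0.6) -> str:
    """Decide which pipeline stage should produce the final text."""
    if not native_text.strip():
        return "ocr"                  # image-only scan: no text layer at all
    if native_confidence >= threshold:
        return "native"               # digital PDF/DOCX: trust the text layer
    return "ocr+llm_cleanup"          # garbled text layer: OCR, then LLM repair
```

The point of the hard ordering is cost: native extraction is nearly free, OCR is expensive, and the LLM cleanup pass is reserved for the worst inputs.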
One LLM call per resume with a strict JSON schema: name, contacts, education, experience, skills, summary. Schema violations trigger a retry; after three failures the resume goes to manual review.
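The retry-then-escalate loop looks roughly like this. A sketch, not the production code: `call_llm` is a hypothetical callable standing in for the real model client, and the key check is a minimal stand-in for full schema validation.

```python
import json

REQUIRED_KEYS = {"name", "contacts", "education", "experience", "skills", "summary"}

def parse_resume(resume_text, call_llm, max_attempts=3):
    """Retry until the LLM returns valid JSON with every required key;
    after max_attempts failures, route the resume to manual review."""
    for _ in range(max_attempts):
        raw = call_llm(resume_text)
        try:
            profile = json.loads(raw)
        except json.JSONDecodeError:
            continue                                  # not JSON at all: retry
        if isinstance(profile, dict) and REQUIRED_KEYS <= profile.keys():
            return {"status": "parsed", "profile": profile}
        # parsed but missing keys = schema violation: retry
    return {"status": "manual_review", "profile": None}
```

Escalating to a human after a fixed budget keeps one pathological resume from burning unbounded LLM calls.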
Resume-level embedding for the full text, plus section-level embeddings for each role and skill cluster. Stored for fast re-scoring against multiple jobs.
Same pipeline for the JD: extract must-have / nice-to-have requirements, generate JD embedding.
Embedding cosine similarity, plus rule-based filters (years of experience, location, language). LLM produces pros/cons/verdict. Weighted aggregate score 0-100. Recruiter sees a ranked list.
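A minimal sketch of how those signals could combine, assuming equal illustrative weights and a 0-1 LLM verdict; the real weights and rule set are not stated in the source.

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def aggregate_score(resume_vec, jd_vec, passes_rules, llm_verdict,
                    w_embed=0.5, w_llm=0.5):
    """0-100 score: rule filters act as hard gates; the rest is a weighted
    blend of embedding similarity and the LLM's 0-1 verdict."""
    if not passes_rules:              # e.g. fails years-of-experience or location
        return 0.0
    sim = (cosine(resume_vec, jd_vec) + 1) / 2     # map [-1, 1] onto [0, 1]
    return round(100 * (w_embed * sim + w_llm * llm_verdict), 1)
```

Treating the rule filters as gates rather than weighted terms means a candidate who fails a must-have can never buy the failure back with a high similarity score.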
Time to shortlist: 5 days → 30 min
Resumes per role: 200-500
File formats: PDF, DOCX, image OCR
Languages: Arabic + English
Schema retry: 3 attempts, then manual review
These are not mockups. Every screenshot below is from the system running in production.
For anyone evaluating the system from an engineering angle: why these choices, and what was traded off.
Native text extraction first, OCR if image-only or low-confidence, LLM cleanup for OCR garbage. Each stage records confidence so downstream consumers know how much to trust the output.
A senior backend engineer who did one year of frontend is not a frontend candidate. Section embeddings let us match the right experience to the right requirement.
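One way to read this mechanically: score each JD requirement against the single best-matching resume section rather than the whole-resume embedding. The sketch below is illustrative; the function names and vectors are assumptions, not the production matcher.

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def requirement_score(requirement_vec, section_vecs):
    """Score a JD requirement by its best-matching resume section, so the
    rest of the resume neither dilutes nor inflates the match."""
    return max((cosine(requirement_vec, s) for s in section_vecs), default=0.0)
```

With a whole-resume embedding, ten years of backend text swamps the one frontend year; with per-section vectors, the frontend requirement matches only the frontend section, at its true strength.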
Recruiters do not trust a black-box "92% fit." They trust "Top three reasons + Top three concerns + summary." The LLM produces all three.
When a recruiter overrides the ranking, that signal feeds back into the model weights for next time. The system learns from real decisions.
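The source does not specify the learning rule; a perceptron-style pairwise update is one plausible shape for it. Everything below, including the feature names, is an illustrative stand-in.

```python
def pairwise_update(weights, promoted_feats, demoted_feats, lr=0.05):
    """When a recruiter promotes a candidate over one the model ranked
    higher, shift weight toward the dimensions the promoted candidate won
    and away from the ones it lost."""
    return {
        k: weights[k] + lr * (promoted_feats[k] - demoted_feats[k])
        for k in weights
    }
```

A small learning rate keeps any single override from swinging the ranking; the signal accumulates only when recruiters disagree with the model consistently.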
The system ranks and explains. It never rejects a candidate without a human in the loop. Stakes are too high.
Share the workflow and the systems you use today. Within 24 hours we reply with scope, KPIs, timeline, and a SAR estimate.
Start now