DevOps and Site Reliability Engineer

About the role

We’re seeking a DevOps and Site Reliability Engineer with strong expertise in Microsoft Azure to manage our observability platform and AIOps automation. The ideal candidate will have extensive hands-on experience with high-traffic environments and security automation, as well as in-depth platform knowledge.

About Bestway

Bestway Group is a diversified, multinational, family-owned business with annualised turnover in excess of £4.5 billion. Starting off as a chain of retail convenience stores, the Group has grown to hold interests across the wholesale, pharmacy, real estate, cement and banking sectors. The Group is also the largest overseas investor in Pakistan.

Owned by the Pervez, Choudrey and Sheikh families, Bestway Group was founded in 1976 by Sir Anwar Pervez OBE H Pk, who remains Chairman. Serving over 12 million customers and employing over 28,000 individuals, the Group supports and serves communities through its operations across the UK, Pakistan and the Middle East.

Responsibilities:

  • Observability Platform: Implement and own the Bestway Azure Observability Playbook — building comprehensive dashboards, alert rules, and runbooks using Application Insights, Log Analytics, and KQL.
  • AIOps Automation: Develop intelligent alerting systems that leverage AI/ML to detect early-warning signals — including IP reputation degradation, database saturation trends, and anomalous traffic patterns — before they escalate to incidents.
  • Release Assurance: Define and execute Operational Acceptance Testing (OAT) gates for all Production deployments, ensuring releases meet reliability, performance, and security thresholds before go-live.
  • Infrastructure Hygiene: Conduct periodic audits of the Azure tenant to identify and decommission orphaned or unutilised resources ('Zombie Resources') — directly reducing operational burn rate.
  • IaC & CI/CD: Build and maintain reusable Terraform modules; manage pipeline integrity across GitHub Actions workflows to ensure consistent, reproducible infrastructure deployments across a multi-subscription hub-and-spoke network.

The Ideal Candidate:

  • Experience: 6+ years in DevOps or SRE roles, with specific experience managing high-traffic Azure-hosted environments at scale.
  • IaC Expert: Mastery of Terraform — including module authoring, remote state management, and workspace strategies for multi-environment deployments.
  • Monitoring Guru: Expert-level KQL (Kusto Query Language) for Log Analytics; comfortable building custom Azure Monitor Workbooks for operational reporting.
  • Security Automation: Strong experience with passwordless authentication via OIDC, Azure Key Vault integration, and secrets management best practices.
  • Platform Knowledge: In-depth knowledge of Azure Container Apps (ACA), VNet Integration, and Private Endpoint configuration for secure, network-isolated workloads.

Benefits:

  • Competitive salary
  • Pension
  • 22 days annual leave, plus bank holidays
  • Onsite parking
  • Life assurance

We understand that no applicant ever ticks every box, so please do consider applying if some or most of the above apply. Bestway Group is an equal opportunity employer, and we value diversity and inclusion. We welcome people of different nationalities, backgrounds, experiences, abilities, and perspectives. We want strong, diverse teams built from talented individuals with different backgrounds, identities, and experiences.

If this is of interest to you and you would like to learn more, please do get in touch. We look forward to hearing from you.

Park Royal, Brent, Greater London, United Kingdom

NW10 7BW

Permanent - Full-time
Posted 3 days ago
Closing date: 20/05/2026
Job reference: GW1543871ParDASRE