Vacancy: DevOps Engineer (AI and Services)

DevOps Engineer (AI and Services)
Midrand

Applications, in the form of an updated CV, should be addressed to the Recruitment Specialist via e-mail at [email protected]
Closing Date: Vacancy Still Open

Role Description

Location: Midrand, Gauteng, South Africa

Full-time, office-based

Reporting Line: General Manager - AI and Services

Job Purpose

The DevOps Engineer will be responsible for deploying, managing, and optimizing the AI software stack to support our AI-driven applications. They will collaborate closely with data scientists and machine learning engineers to ensure seamless integration of AI models and services within our enterprise environment.

Key Responsibilities

  • Design, implement, and manage on-premises and hybrid infrastructure for AI solutions.
  • Leverage tools such as NVIDIA AI Enterprise to streamline containerized application deployments and manage GPU resources effectively.
  • Automate the deployment and configuration of AI software solutions, ensuring that machine learning models and AI frameworks (such as TensorFlow and PyTorch) are optimized for performance.
  • Develop scripts and tools in Python to facilitate rapid deployment of AI applications across various environments.
  • Implement CI/CD pipelines tailored to AI workloads (including GPU dependencies such as cuDNN), using tools such as Jenkins, GitLab CI, or CircleCI to automate testing and deployment processes.
  • Collaborate with data science teams to ensure efficient model versioning and deployment strategies.
  • Establish monitoring solutions to track the performance and utilization of AI resources and systems, ensuring that applications run efficiently and reliably.
  • Analyze system performance, identify bottlenecks, and implement tuning strategies for optimal GPU and application performance.
  • Work closely with cross-functional teams, including data scientists, ML engineers, and IT, to support the deployment and integration of AI solutions into business applications.
  • Provide guidance and support for best practices in AI model training and deployment, ensuring effective use of AI tools and solutions.
  • Implement security measures and best practices to safeguard data and AI models within the AI Enterprise environment and infrastructure.
  • Ensure compliance with relevant data protection regulations and industry standards.
  • Create and maintain comprehensive documentation related to infrastructure setups, deployment processes, and operational guidelines.
  • Conduct training sessions for team members on AI tools, platforms, DevOps practices, and efficient workflows.

Requirements

Experience and Knowledge:

  • 3+ years of experience in a DevOps role, with a focus on automation, AI, machine learning, or data engineering.
  • Hands-on experience with NVIDIA AI Enterprise software is an advantage
  • Experience in the following technologies is beneficial: Service Fabric, Redis, Rancher, ASP.NET, .NET Core, RabbitMQ, Elastic Stack, Git, APIs, Terraform
  • Knowledge of industry-standard:
    • AI platforms (e.g., NVIDIA, Intel, OpenShift AI, Kubernetes)
    • Models (e.g., Hugging Face, NVIDIA, Llama, GPT)
    • Frameworks (e.g., NVIDIA AI Enterprise, Intel, CUDA, Morpheus, NeMo)
    • AI infrastructure (e.g., Dell AI Factory, Supermicro, Nutanix AI)

Skills and Education:

  • Bachelor’s degree in Computer Science, Engineering, or a related field; or equivalent experience 
  • Proficiency in Python and other scripting languages (e.g., Bash) for automation and tool development.
  • Familiarity with containerization technologies (Docker, Kubernetes, Rancher) as they relate to AI workloads.
  • Understanding of machine learning frameworks (TensorFlow, PyTorch) and their deployment
  • Understanding of coding, scripting and configuration languages and formats, e.g. Python, JavaScript, YAML, JSON, Terraform and Ansible
  • Understanding of messaging protocols, APIs and SDKs
  • Understanding of open-source databases
  • Fundamental understanding of TCP/IP, DNS, TLS and load balancing
  • Strong analytical and problem-solving skills with a keen attention to detail.
  • Excellent communication and teamwork abilities, with a collaborative mindset.
  • Ability to adapt to a fast-paced environment and manage multiple priorities effectively.

Personal Attributes

  • Ability to work flexible hours and respond to deployment emergencies outside of regular business hours as needed.
  • Mature, outgoing, positive individual
  • Self-managed and proactive; takes accountability
  • Enjoys engaging with people of diverse cultures, religions and belief systems, and is able to do so effectively
  • Able to work as part of a team and collaborate well
  • High level of energy
  • Motivated and driven

Compliance Competencies

  • Perform all duties with integrity, to the highest ethical standards, and in compliance with all relevant legal, contractual and other requirements as mandated by the Company Code of Conduct.
  • Keep up to date with changes in applicable compliance obligations, controls and measures relevant to the role.

All applications in the form of a detailed CV must be forwarded to:

Recruitment Specialist
[email protected]

By sending your CV to apply for the position, you consent to your information being processed, and you acknowledge and understand the purpose for which your personal information is required and will be used.

The Company is under no obligation to fill this position. Should you not receive any feedback within 2 weeks of the closing date, you may consider your application unsuccessful.
