About Tothemoon
Tothemoon is a user-centric, multiservice digital assets trading platform.
At Tothemoon, we prioritize what matters most in finance: reliability.
Whether it’s buying, selling, exchanging, or investing in cryptocurrencies, you can trust us to protect your financial interests and propel you towards a prosperous future.
Join a rapidly growing community of users who choose Tothemoon for their digital transactions.
We offer hands-on experience, challenging tasks, and opportunities for professional and career growth within a dynamic fintech project.
We’re looking for a specialist to test our product, including the mobile and web applications, as well as APIs and backend services.

Key Responsibilities
Production infrastructure operations and development (90%)
• Maintain and improve managed Kubernetes clusters (control plane, node pools, autoscaling, PDB, network policies).
• Support API and ML workloads.
• Set up monitoring, alerting, logging, backups, and disaster recovery procedures.
• Investigate and resolve incidents, including on-call participation.
R&D and automation (10%)
• Research, optimize, and automate the current infrastructure setup.

Tech Stack / Core of the Project
Orchestration: Kubernetes (multi-pool, autoscaling, GPU workloads)
GPU / ML: NVIDIA H100, NVIDIA stack (CUDA, drivers, nvidia-device-plugin), LLM inference

Requirements
Deep Kubernetes experience (3+ years):
• Designing and maintaining production clusters (preferably with autoscaling, PDB, network policies).
• Confident use of Deployments, StatefulSets, Ingress, RBAC, StorageClass, Helm/Kustomize.
• Experience integrating Kubernetes with cloud providers (EKS, GKE, AKS, etc.).
Strong Linux background:
• Understanding of kernel operations, networking stack, cgroups, and namespaces.
• Ability to diagnose performance issues (CPU, memory, IO, network).
GPU and high-load ML/LLM experience (a strong advantage):
• Deploying and managing GPU-based applications in Kubernetes.
• Basic knowledge of CUDA, NVIDIA drivers, and nvidia-device-plugin.
• Experience monitoring GPU utilization, memory, thermals, and errors.
Operational and integration experience:
• Integrating external services into Kubernetes (logging, monitoring, security, storage).
• Building monitoring and alerting aligned with SLO/SLA standards; incident analysis end-to-end.
• Writing runbooks and automating routine operations.

Why Join Us
A senior-level team and a friendly, collaborative environment open to innovation and experimentation.
Real technical challenges: high load, performance optimization, GPU infrastructure, and real-time workloads.
A product team, not outsourcing: your contribution directly impacts the company’s core technology.
Opportunities for professional growth and development in AI, ML infrastructure, and blockchain computing.
Supportive culture and a comfortable, modern workspace.

Conditions
Format: On-site work in Almaty, Kulan Business Center.
Compensation: Competitive salary in USDT or fiat, plus paid vacation and sick leave.
Benefits: Comfortable office and free lunches.
Schedule: Full-time, flexible working hours.