Yvaine’s Substack

AI Product Interview Prep 01: What skillset does a software engineer in inference require?

Yvette
Apr 02, 2025

I recently wrapped up a period of personal exploration that began last September, and I am now preparing for my next full-time job. Here are some notes I recorded during my real job search in 2025.

Core Programming Languages

  • Python (mentioned in all positions)

  • C/C++ (Perplexity, Together AI)

  • Go or Rust (Together AI)

  • CUDA for GPU programming (OpenAI, Together AI, Perplexity)

Machine Learning Frameworks

  • PyTorch (mentioned explicitly in Together AI, implied in others)

  • TensorFlow/ONNX (Perplexity)

Inference Optimization Frameworks

  • TensorRT/TensorRT-LLM (Perplexity, Together AI)

  • vLLM (Lambda Labs, Together AI)

  • SGLang (Together AI)

  • TGI (Text Generation Inference) (Together AI)

Distributed Systems Concepts

  • Fault-tolerance design

  • High-performance distributed systems

  • Request routing and load balancing

  • Distributed processing frameworks

  • Multi-threading

  • Memory management

  • Networking optimization
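To make "request routing and load balancing" a bit more concrete, here is a minimal sketch of one common policy: send each request to the replica with the fewest in-flight requests. The class name `LeastLoadedRouter` and the replica names are hypothetical, chosen only for illustration. A real inference router would also weigh things like queue depth, cache locality, and replica health.

```python
class LeastLoadedRouter:
    """Toy router: pick the replica with the fewest in-flight requests."""

    def __init__(self, replicas):
        # Hypothetical replica names mapped to their in-flight request counts.
        self.in_flight = {name: 0 for name in replicas}

    def acquire(self):
        # Ties are broken by insertion order of the replicas.
        name = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[name] += 1
        return name

    def release(self, name):
        # Call this when the request on `name` has completed.
        self.in_flight[name] -= 1


router = LeastLoadedRouter(["gpu-0", "gpu-1"])
first = router.acquire()   # both idle, so the first replica is chosen
second = router.acquire()  # the other replica is now the least loaded
router.release(second)
third = router.acquire()   # the released replica is least loaded again
```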

LLM-Specific Optimization Techniques

  • KV Cache systems (PagedAttention, Mooncake)

  • Continuous batching

  • Model quantization

  • Tensor parallelism

  • Pipeline parallelism

  • Mixture of Experts (MoE) parallelism

  • Speculative decoding

  • CUDA graph optimization

  • Workload scheduling

  • Efficient prompt caching
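Of the techniques above, continuous batching is one that interviewers may ask you to explain step by step, so here is a toy simulation of the idea. It is a sketch under simplifying assumptions, not how vLLM or any other engine implements it: every request needs a known number of decode steps, each step yields one token per running request, and the `Request` class and `continuous_batching` function are hypothetical names. The key point is that a finished request frees its slot immediately, so a waiting request can join on the next step instead of waiting for the whole batch to drain.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record: an id and the tokens it still has to generate.
    req_id: str
    tokens_left: int


def continuous_batching(requests, max_batch_size):
    """Simulate decode steps; return {req_id: step at which it finished}."""
    waiting = deque(requests)
    running = []
    finished_at = {}
    step = 0
    while waiting or running:
        # Admit waiting requests into any free batch slots before each step.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        step += 1
        # One decode step generates one token for every running request.
        for req in running:
            req.tokens_left -= 1
            if req.tokens_left <= 0:
                finished_at[req.req_id] = step
        # Finished requests leave right away, freeing slots for the next step.
        running = [r for r in running if r.tokens_left > 0]
    return finished_at


result = continuous_batching(
    [Request("a", 3), Request("b", 1), Request("c", 2)],
    max_batch_size=2,
)
```

In this run, "b" finishes after step 1, so "c" takes its slot at step 2 and completes at step 3 alongside "a". With static batching, "c" would not start until the first batch had fully drained after step 3.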

Cloud & Infrastructure

  • Kubernetes (Lambda Labs, Anthropic, Perplexity)

  • Cloud platforms (AWS, GCP, Azure)

  • Distributed file systems (3FS, HDFS, Ceph)

  • Autoscaling infrastructure

  • Resource optimization

GPU & Hardware Knowledge

  • NVIDIA GPU architecture

  • CUDA programming

  • HPC technologies (InfiniBand, MPI, NVLink)

  • Memory optimization

  • TPU/custom accelerators

  • RDMA/RoCE networking

Observability & System Health

  • Monitoring and logging systems

  • Performance benchmarking

  • Bottleneck identification

  • System observability

  • Debugging distributed systems

Model Understanding

  • Transformer architecture

  • Modern ML architectures

  • Multimodal generation models (text, vision, diffusion)

  • Model distillation

API & Integration Skills

  • API development for internal/external customers

  • Integration with other systems

Architecture & Design

  • System architecture design

  • Best practices in system design

  • Performance-critical distributed systems

© 2025 Yvaine