Research the company deeply
Read their engineering blog, recent papers, and product launches. Mention specific projects in your answers — it signals real interest.
Battle-tested guidance across ML, deep learning, LLMs, statistics, SQL, and behavioral rounds.
Read their engineering blog, recent papers, and product launches. Mention specific projects in your answers — it signals real interest.
Situation, Task, Action, Result. Structure every behavioral answer this way. Practice 8-10 stories that cover leadership, conflict, failure, impact.
Don't just say 'I built a model.' Say 'I built a churn model that reduced false positives by 32%, saving the team 12 hours/week of manual review.'
Ask about the team's hardest current problem, how success is measured in the first 90 days, and what the interviewer enjoys most. Never ask anything Google would tell you.
Past → Present → Future. 'I'm an X who did Y, currently doing Z, looking for opportunities to do W.' Rehearse it until it feels natural, not robotic.
High bias = underfitting (model too simple). High variance = overfitting (model memorizes noise). Be ready to explain how regularization, more data, or simpler models address each.
Precision, recall, F1, AUC-ROC, AUC-PR. When does each matter? Hint: imbalanced classes → PR-AUC over ROC-AUC. Fraud detection → recall matters more than accuracy.
Time series → use TimeSeriesSplit, never random splits. Grouped data (same user multiple rows) → GroupKFold. Leakage from preprocessing on full dataset → always fit on train only.
A linear model with great features beats XGBoost with bad features. Talk about feature creation, encoding categoricals (target, one-hot, embeddings), handling missing values, and scaling.
Gini impurity, entropy, information gain. For regression: variance reduction (MSE). Random Forest = bagging trees on bootstrap samples + random feature subsets per split.
Use ReLU instead of sigmoid/tanh, batch normalization, residual connections (ResNet), and gradient clipping. Be ready to draw a backprop diagram.
Self-attention: Q, K, V matrices. Attention(Q,K,V) = softmax(QK^T / sqrt(d_k)) V. Multi-head attention runs N attention layers in parallel. Positional encoding adds order info.
Dropout (random zeroing), L2 weight decay, early stopping, data augmentation, label smoothing. Know when each helps.
SGD = stable, slow. Momentum = accelerates in consistent directions. Adam = adaptive learning rate per param, works out of the box. AdamW = Adam with proper weight decay.
Check learning rate (too high → exploding loss, too low → no progress), verify data loading, overfit a single batch first, check loss function correctness, inspect gradients.
RAG = inject context at inference time. Best for changing knowledge. Fine-tuning = teach style/format/domain. Combine both for production. Always mention chunking strategy and embedding model choice.
Out-of-distribution queries, weak retrieval (irrelevant chunks), context exceeding model attention, low-temperature decoding masking the issue. Solutions: better retrieval, citations, confidence scoring, function calling.
Few-shot examples, chain-of-thought ('think step by step'), output format constraints (JSON schema), role assignment, delimiters. Know the difference between prompt and system prompt.
Pinecone (managed, easy), Chroma (open-source, local), Weaviate (hybrid search), pgvector (use Postgres you already have). Compare on cost, latency, hybrid search support.
BLEU/ROUGE are weak for open generation. Use LLM-as-judge, human eval rubrics, faithfulness/groundedness for RAG, retrieval recall@k. Trace + log every call (LangSmith, Helicone).
p-value = probability of observing data this extreme IF the null hypothesis is true. It's NOT the probability the null is true. A p < 0.05 doesn't mean the effect is meaningful — talk about effect size.
Power analysis BEFORE the test (sample size, MDE). Watch for peeking (early stopping inflates false positives). Use CUPED for variance reduction. Two-tailed by default. Bonferroni if testing multiple metrics.
Sampling distribution of the mean approaches normal as n grows, regardless of underlying distribution. Why we can use t-tests on non-normal data with large samples.
Frequentist: 'p-value < 0.05'. Bayesian: 'posterior probability the effect is positive is 96%'. Bayesian is more interpretable for stakeholders and handles peeking better.
A 95% CI means 95% of intervals constructed this way contain the true parameter. It does NOT mean 'the parameter is in this interval with 95% probability' — that's the Bayesian credible interval.
ROW_NUMBER, RANK, DENSE_RANK for ranking. LAG/LEAD for previous/next row. SUM() OVER (PARTITION BY ... ORDER BY ...) for running totals. Practice 'top N per group' until automatic.
Find users who referred others, find rows where col_a = col_b but ids differ. SELECT a.id, b.id FROM t a JOIN t b ON a.email = b.email AND a.id < b.id.
When asked 'is this efficient?', talk about indexes, full scans vs index seeks, hash joins vs nested loops. Adding LIMIT doesn't help if there's a sort first.
Counter, defaultdict, itertools.groupby, bisect, heapq. Interviewers love when you reach for the right tool. Use list comprehensions for clarity, generators for memory.
Narrate your thinking. 'I'll start with brute force, then optimize.' Walk through an example before coding. Ask about constraints (input size, edge cases). Don't go silent.
Pick a real failure with real consequences. Show what you learned and how you applied it. Avoid humble brags ('I worked too hard'). Show humility, growth, and ownership.
Frame it as collaborative problem-solving, not 'I was right'. Show you listened, sought to understand, found a path forward, and maintained the relationship.
Pull toward something new, not push away from something bad. 'I'm looking for more X' beats 'My current manager is Y'. Never trash-talk previous employers.
Show ambition tied to the company's growth, not a different industry. Tech lead, IC depth, founding a team — pick one and back it with what you'd need to learn.
Never give a number first. Say 'I'd like to focus on the role first; I'm sure we can align on compensation when we get there.' When pressed, give a range with the low end above your real floor.