Evaluating LLMs for Inference, or Lessons from Teaching for Machine Learning