Large Language Models

Anthropic & Claude

Claude is my daily driver. I use both the desktop and mobile apps to quickly chat, take notes, organize thoughts, and write. I also use Claude for meta-prompting, where I’ll ramble through a dictation of what I’m looking to do and have it write a clear prompt to feed into other models.

Development Workflows

Claude Desktop

Anthropic recently released a GitHub integration that I’ve been experimenting with. It helps me plan new features, understand complex bugs, and plan refactors.

IDE Integration

In Cursor and Windsurf, I use a combination of 3.5, 3.7, 3.7 Thinking, and 3.7 Max depending on my needs. 3.7 can be challenging - it’s so heavily RL’d to solve problems that it often creates hacky fallbacks if not supervised carefully. I frequently default to Claude 3.5 for more predictable assistance.

Claude Code

I have only experimented with Claude Code, racking up a $30 bill over a few hours and struggling to rein it in as mentioned above. I expect this product to do really well, as the best engineers I know and hear from all use it. I need to upskill here.

OpenAI

o1 Pro

I use o1 Pro for deep research tasks when I want finalized, polished output. I previously experimented with one-shotting features by providing large codebase chunks with detailed requirements, but found this approach worked inconsistently and created an awkward workflow that I’ve largely abandoned.

o3 Mini / High

I use these for deep research when I expect more interactive back-and-forth, whereas o1 Pro is my choice when I want a comprehensive, finalized output with minimal interaction.

4.5

I’ve started using 4.5 on occasion for help with writing. I used to prefer Claude here but 4.5 seems like a better writer and I just haven’t fully adjusted yet. Things move fast.

Google

Gemini Pro 2

I use Gemini Pro 2 for tasks that need a huge amount of context, typically code. With tools like RepoPrompt or Repomix, I dump an entire codebase (or large chunks of it) into a single prompt to evaluate open source projects, understand new codebases, plan features, or fix bugs that potentially span many files. The massive context window and fast inference make it perfect for these kinds of tasks.
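If you want a feel for what that workflow looks like without reaching for those tools, here is a rough Python sketch of the same idea: walk the repo, concatenate the files you care about, and paste the result into a large-context model along with your question. The repo path, extension filter, ignore list, and output filename are placeholder assumptions; RepoPrompt and Repomix do this properly, with ignore rules, file trees, and token counts.

    # Minimal sketch of the "dump a codebase into one prompt" idea.
    # REPO, EXTS, SKIP, and the output filename are illustrative assumptions.
    from pathlib import Path

    REPO = Path(".")                      # assume we run from the project root
    EXTS = {".py", ".ts", ".md"}          # file types to include; adjust per project
    SKIP = {".git", "node_modules"}       # directories to leave out

    chunks = []
    for path in sorted(REPO.rglob("*")):
        if path.is_file() and path.suffix in EXTS and not (set(path.parts) & SKIP):
            text = path.read_text(encoding="utf-8", errors="ignore")
            chunks.append(f"===== {path} =====\n{text}")

    # One big blob to paste into the model alongside your actual question.
    Path("codebase_dump.txt").write_text("\n\n".join(chunks), encoding="utf-8")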

xAI & Grok

Code Analysis and Planning

I’m starting to use Grok more for code analysis and planning, though I haven’t fully integrated it into my regular workflow yet. Initial results have been promising, and I intend to use it more frequently.

Deep Research

Grok seems better at following instructions than o1 Pro, so I often use it for broad survey research or when I want structured output like tables of information.

DeepSeek

Reasoning on Codebases

I use r1 in Cursor’s “Ask” mode to plan new features or debug broken ones. It’s not my primary model for these tasks, but I’ll switch over if I’ve hit a wall with Claude Thinking.

Perplexity

Perplexity has replaced DuckDuckGo for my basic searches. If you haven’t already, play around with their filters, like social search.

General Advice

While I’ve laid out my most common uses and where I tend to prefer one model over another, the truth is I hop around a lot and often use multiple models in parallel on the same task, then run with whichever gives the best first response.