March 5, 2024

Reflecting on Apple

Occasionally, I get questions from friends about what it was like to work at Apple, especially the Special Projects Group (SPG). I wanted to write an article detailing my experiences and what I learned, while being mindful of NDAs. With recent news, I feel more comfortable letting this out into the wild, though I still won’t delve into any product details.

In the process of reflecting, I felt tremendous gratitude for the lucky breaks I have had. A former manager once told me, "careers are meticulously planned but opportunistically executed." This was certainly true for me. A lot of new grads today seem to be struggling to find jobs because of the macro environment. I hope my story serves as a reminder that careers are often winding and that there are lots of chances to get where you need to be.

Apple Park


Round and round

I joined Apple as part of a full-time program called the Siri Rotational Program (now the AI/ML Rotational Program, at the time of writing) several months after graduating college. It was an 18-month stint consisting of six 3-month rotations, designed for new grads to explore their interests and gain familiarity with different teams and technologies within the AI/ML org. Engineering and project managers submitted project proposals each quarter to a tag in Radar (Apple's internal issue tracking software), from which the "rotatoes" ranked their preferences. Conflicts in project choice were sorted out by hand behind the scenes by Mark, the bearded father figure burdened with managing the tubers of his brainchild. At the end of 18 months, engineers generally had wide latitude to join one of their previous host teams as a non-rotating engineer.

The program offered an unusual amount of flexibility to new grads, as the projects and their respective teams were chosen by the rotation engineer, not the other way around. Managers submitted proposals to a stringently filtered pool, which selected for nicely scoped projects with heavy lifting done behind the scenes to set rotatoes up for success, e.g. having preprocessed data and evals ready so the rotatoes could focus on the fun & glamorous parts like training models. Despite the temptation to treat rotatoes as glorified interns (3-month rotations and the self-moniker didn't help), senior engineers and managers were incredibly respectful and helpful, treating us as full-time team members and trusting us to complete substantial projects. Off the top of my head, well-received features like Back Tap on the iPhone and bilingual Siri originated as rotation projects.

The rotation program was a lucky break for me. Coming from a background in consulting and business analytics, I switched to Computer Science in my fourth year of undergrad. I graduated with no research experience. I didn't get into my preferred graduate CS programs. I knew little to nothing about deep learning as a field. Though they encouraged and supported me, none of my family worked in STEM or knew anything about navigating the software industry. All I knew was that I wanted—needed—to work on machine learning, because my imagination was lit on fire by AlphaStar and OpenAI Five, which had crushed human pros at two of my favorite video games of all time. I was not great at technical interviews and almost didn't pass the first technical screen. Grateful for the mulligan, I crammed and aced the final round.

Once inside, I proceeded to voraciously tackle applied ML projects that were as diverse as possible—speech recognition, text to speech, natural language understanding, search, multimodal user interfaces. I was lucky to work with so many talented engineers and researchers on cutting edge stuff, including a tiny slice of what released years later as the Vision Pro. This gradually shored up my confidence and gave me a breadth of experience across data modalities, codebases, and team cultures that few junior engineers have the opportunity to see. Looking back, a rotation program was an especially ergonomic format for unfolding my career: noticing the work that inspired me, trying it, exploring adjacent areas, and iteratively investing more in the things that made me feel alive. Nothing about the rotational stints felt overly committal or forced. I was just intuitively following a gradient towards the problems and kinds of work that spoke to me—which isn’t to say it was effortless or easy!—just that the hard work of learning machine learning felt engaging and natural.

Fun tidbits

There were some fun, ridiculous things about working at Apple Park. It's an architectural marvel—massive curved glass walls, slatted white oak, white terrazzo. It has views of the Santa Cruz Mountains and gives the feeling of being inside an alien mothership that has landed in a pristinely manicured state park. They filled berms using excavated earth to raise the property out of view from surrounding Cupertino, and sculpted hills and valleys to hide unseemly parking structures from view. Everything looked like a page torn out of an issue of The Local Project. Meandering footpaths and an indoor-outdoor design made it effortless to get lost in thought, and lent a sense of mental expansiveness that I now miss while living in the city.1

Chance

Still, at the end of 18 months, I wasn't sure what I wanted to commit my near-term career to. The program had been shortened while I was still moving through my rotations. I was itching to choose because new rotatoes who joined after me were due to "graduate" at the same time as I was!

By a stroke of bureaucratic luck, because I had joined the rotation program between cohorts, I was lumped in with other rotation engineers who started later than I did, giving me a seventh rotation. At the same time, political struggles at a level of existence I was not privy to moved the Special Projects Group into the AI/ML org under John Giannandrea. Thus opened a brief window of time where SPG managers could submit rotation project proposals! This caused quite a stir among rotatoes. For balance, the number of rotation engineers allocated to SPG that quarter was limited to three, and due to seniority, I had first choice under the same rationale as with other tiebreaks for contested rotation projects—that graduating rotatoes would benefit from maximum flexibility and information before having to choose their forever home. This window of opportunity closed shortly thereafter: SPG stopped submitting projects to the rotation program, and was reorg'd out of AI/ML a few months later.

Special Projects Group

SPG's reputation was shrouded in mystery: brilliant engineers, all siloed away working on a top secret project, like a fractal of Apple itself.

As part of the rotation project placement process, engineers were encouraged to chat with their prospective future manager to get a sense of what their work would consist of. This was at the height of COVID-19 in early 2021, when everyone was working from home, so naturally, I hopped on a video call with Ian Goodfellow. I had no inkling who I was talking to at the time, only that he was the director of the team I would be working under. I remember asking pretty good questions about the culture and which of the project proposals he considered to be highest priority—it was a complicated computer vision modeling project using GANs. I had never written a convolutional neural network in my life, so naturally I thought, "how hard could it be?"

Towards the end of the call, Ian said something to the effect of, "well, I trust that you're a strong programmer, so I'm not going to test you with a coding problem." A very different trajectory lay behind that door.

Learning fast

At SPG, I worked on autonomous systems. My project focused on training models to predict the sense stack's noisy estimates of the environment, conditioned on ground truth (or, at inference time, simulator state). This would open up a fast way to run realistic simulation for training RL policies and evaluating imitation learning & RL models (compared to photorealistically rendering scenes and running the full sense stack). To generalize, your model has to internalize some causal physics of visibility. In a real-world setting, it's hard to collect counterfactual ground truth data—what would our perception of the environment have been had object X not been there? Only with a sufficiently large dataset can you begin to marginalize over these scenarios. I was also worried about architecture—would U-Nets struggle with non-local causality (i.e. an occlusion on the left side of an image might affect visibility elsewhere)? With much help from coworkers, we tried many approaches to answer these questions and progressively worked our way towards decent performance with a Laplacian CNN.
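To make the shape of the problem concrete, here is a minimal, purely illustrative PyTorch sketch of that kind of setup: a small conv model that takes a ground-truth environment raster and learns to reproduce the (noisy) estimate the perception stack would have produced. The architecture, tensor shapes, loss, and variable names are all assumptions made up for this example—not the model we actually built.

```python
# Illustrative only: learn ground truth -> sense-stack estimate so the learned
# model can stand in for the full perception pipeline during fast simulation.
import torch
import torch.nn as nn

class SenseEmulator(nn.Module):
    """Toy conv encoder-decoder mapping a ground-truth grid to a predicted perceived grid."""
    def __init__(self, in_ch: int = 4, out_ch: int = 4, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),  # downsample
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(width, out_ch, 3, padding=1),                      # per-cell logits
        )

    def forward(self, ground_truth: torch.Tensor) -> torch.Tensor:
        return self.net(ground_truth)

model = SenseEmulator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()  # e.g. per-cell "did perception report this?" targets

# One training step on a fake logged pair (ground truth, what perception actually reported).
gt = torch.rand(8, 4, 64, 64)                           # placeholder ground-truth rasters
perceived = (torch.rand(8, 4, 64, 64) > 0.5).float()    # placeholder sensed estimates
loss = loss_fn(model(gt), perceived)
opt.zero_grad(); loss.backward(); opt.step()
```

At inference time, the idea is that the simulator's state would be fed in place of logged ground truth, giving a cheap approximation of what the car would have "seen" without rendering anything.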

I struggled with the technical problem a lot, but struggled even more with asking for the help I needed from my manager and teammates, because I was afraid they would be annoyed or look down on me. I was just starting to get my bearings as a software engineer, and here I was self-teaching a crash course in computer vision while conducting research work in a production monorepo. There was a lot of pressure to ship, and so much uncertainty around exact requirements or what was even technically feasible. I didn't do a good job of negotiating a clear definition of success with the customer (the RL team), giving myself more room on latency constraints, or de-risking deadlines and technical hurdles by getting a baseline integrated in the simulator (another team!) first and anchoring expectations relative to that. Dev environment problems would crop up, and I made the mistake of ignoring them because other tasks seemed more urgent. The modeling problem itself was hard enough, but dealing with dependencies from teams in different timezones shifting under me—the sense stack would change or something in the simulator would break—left me feeling helpless and out of the loop.

SPG was full of amazing engineers and researchers—I attribute a good chunk of my foundational knowledge as a machine learning practitioner to my time there listening and asking questions, reading papers, writing PyTorch, and running experiments. It felt like a graduate education in miniature!

But I would have learned even more if I had more frequent pairing sessions with teammates and mentors. So much opportunity was lost by working remotely for years without the personal experience or thoughtfully designed team culture to offset the downsides, and I didn’t have the agency at the time to find and act on alternative ways of working and being. Code review is great for teaching idiomatic patterns and better practices to junior engineers, but many valuable learnings happen at a tactical level from osmosis with senior engineers: whiteboarding together; picking up their terminal and IDE tricks; observing them navigate a codebase, build their own tools & libraries, sequence coding tasks, and land PRs.

There were also higher level lessons around navigating team politics, carving out your own territory, and optimizing for performance review that I wasn’t prepared to receive at the time. I regret that so many of my structured 1:1s revolved around work deliverables instead of mentorship on how to ask the right research questions, design experiments, communicate findings, and meta-learning. It was my responsibility to create space for myself, reflect on what I was most struggling with, and communicate that—but all I remember was being in a haze of stress.

Local minima

Over the course of two years, I noticed that the org struggled with management churn and changing product directions. The project was firmly in development hell, with a codebase nearly a decade old. There was this cultural miasma of uncertainty and lack of agency that permeated everything—few people spoke up to ask elephant-in-the-room type questions in meetings, but coworkers would regularly admit in private that they were confused, or that they had doubts their work would be useful or successful in the long run.

Engineering management was in a tough spot—I think they felt they had to keep a brave face and show confident leadership, but the fact that they never expressed doubt gave me the feeling that I was being gaslit.

So why did I stay?

One reason was that switching costs are high in tech. Preparing for interviews is nearly a part-time job, and I wasn't confident I could pass them without a lot of practice.

The other reason was that I felt I was in a local minimum re: learning. Even at my lowest, I had a fire inside me to continuously get better at ML, but it was hard to imagine leaving and finding a considerably better learning environment at, say, grad school. Opportunity costs aside, my limited publishable work made my grad school application prospects murky.

Epilogue

Eventually, I got burned out and took some time off. I think a healthy amount of stress can be beneficial, but too much can set back your morale and even your career trajectory in a way that’s hard to recover from—so I pivoted.

Happily, I joined Imbue, which is culturally the opposite in many ways—psychologically safe, candid, collaborative, playful, and emotionally spacious. I’ve felt more self-actualized, and counterintuitively, my learning rate has gone up in an environment with objectively fewer resources than Apple. The reasons are interesting and may well become its own blog post. (We’re hiring, by the way!)

I feel grateful I left before Titan was shuttered, and I feel sympathy for all the amazing folks whose work will never see the light of day. Some of this post was spent griping about working there because that sort of thing is fun, but I truly cannot express what a privilege it was.

I highly recommend the AI/ML rotational program as a way of learning whether you'd enjoy working as an ML engineer. And almost without exception, I would endorse working with the fine folks in Apple AI/ML and former SPG if you have the chance.


  1. Other tidbits: Apple served pizza out of patented round boxes. Dev devices like iPhones were naturally abundant and strictly tracked, and were shredded when they were barely old or no longer needed, which made me sad. (Fused hardware made them unfit for refurbishment or sale.)↩︎
