Reflecting on 3.5 Years at Apple

Occasionally, I get questions from friends about what it was like to work at Apple, in particular the Special Projects Group (SPG). I wanted to write an article detailing my experiences and what I learned, but I had to be mindful of NDAs and found myself dancing a lot with wording. With recent news, I feel more comfortable letting this out into the wild, though I still won’t delve into product details.

In the process of reflecting, I felt tremendous gratitude for the lucky breaks I have had. A former manager once told me, “careers are meticulously planned but opportunistically executed.” This was certainly true for me. A lot of new grads today seem to be struggling to find jobs because of the macro environment. I hope my story serves as a reminder that even short careers can be winding and there are lots of chances to get where you need to be.

[Image: Apple Park]


Round and round

I joined Apple as part of a full-time program called the Siri Rotational Program (now the AI/ML Rotational Program, at time of writing) several months after graduating college. It was an 18-month stint composed of six 3-month rotations, designed for new grads to explore their interests and gain familiarity with different teams and technologies within the AI/ML org. Engineering and project managers submitted project proposals each quarter to a tag in Radar (Apple’s internal issue tracking software), from which the “rotatoes” ranked their preferences. Conflicts in project choice were sorted out by hand behind the scenes by Mark, the bearded father figure who was burdened with managing the multiplying tubers of his brainchild. At the end of 18 months, engineers generally had wide latitude to join one of their previous host teams as a non-rotating engineer.

The program offered an unusual amount of flexibility to new grads because the projects and their respective teams were chosen by the rotation engineer, not the other way around. Managers submitted proposals to a stringently filtered pool—projects were nicely scoped and a lot of teams did heavy lifting to set up rotatoes for success, e.g. having preprocessed data and evals ready so the rotatoes could do just the fun & glamorous part (training experiments). Despite the temptation to treat rotatoes as glorified interns (3 month rotations and the self-moniker didn’t help), senior engineers and managers were incredibly helpful and trusted rotatoes to complete substantial projects. Off the top of my head, features like Back Tap on the iPhone and bilingual Siri came from rotation projects.

For those reasons, the rotation program was a godsend to me. I knew I wanted to work on machine learning as a career, but I hardly knew anything about the subfields within. I couldn’t get into the most competitive graduate programs. My previous background was in consulting and business analytics, but I switched to Computer Science in my fourth year of undergrad, so I had no research experience and my programming skills were not great. I eked through the interview process with the help of a referral, and voraciously worked on applied ML projects that were as diverse as possible—speech recognition, text to speech, natural language understanding, search, and multimodal user interfaces. I was lucky to work with many talented engineers and researchers on cutting edge stuff, including a tiny slice of what released years later as the Vision Pro. It gave me a breadth of experience across data modalities, codebases, and team cultures that few junior engineers have the opportunity to see, which really built up my confidence.

There were also ridiculous things about working at Apple Park. It’s an architectural marvel—massive curved glass walls, slatted white oak, white terrazzo. It has views of the Santa Cruz Mountains and gives the feeling of being inside an alien mothership that has landed in a manicured state park. They filled berms using excavated earth to raise the property out of view from surrounding Cupertino, and sculpted hills and valleys to hide unseemly parking structures from view. Everything looked like a page torn out of an issue of The Local Project. The meandering footpaths and indoor-outdoor design made it effortless to get lost in thought, and lent a sense of mental expansiveness that I now miss while living in the city.[1]

Still, at the end of 18 months, I wasn’t sure what I wanted to commit my near-term career to. The rotation program had been shortened during my rotations; I was itching to choose because new rotatoes who joined were due to “graduate” at the same time as I was!

By a stroke of bureaucratic luck, because I had joined the rotation program between cohorts, I was lumped in with other rotation engineers who started later than I did, giving me a seventh rotation. At the same time, political struggles at a level of existence I was not privy to moved Special Projects Group into the AI/ML org under John Giannandrea. Thus opened a brief window of time where SPG managers could submit rotation project proposals! This caused quite a stir among rotatoes. For balance, the number of rotation engineers allocated to SPG that quarter was limited to 3, and due to seniority, I had first choice under the same rationale as with other tiebreaks for contested rotation projects—that graduating rotatoes would benefit from maximum flexibility and information before having to choose their forever home. This window of opportunity would close shortly thereafter when SPG stopped submitting projects to the rotation program, and was reorg’d out of AI/ML a few months later.

Special Projects Group

SPG’s reputation was shrouded in mystery: brilliant engineers, all siloed away working on a top secret project, like a fractal of Apple itself.

As part of the rotation project placement process, engineers are encouraged to chat with their prospective future manager to get a sense of what their work would consist of. This was in early 2021 at the height of covid, when everyone was working at home, so naturally, I hopped on a video call with Ian Goodfellow. I had no inkling who I was talking to at the time, only that he was director of the team I would be working under. I remember asking pretty good questions about the culture and which of the project proposals he considered to be highest priority—it was a complicated computer vision modeling project using GANs. I had never written a convolutional neural network in my life, so naturally I thought, “how hard could it be?”

Towards the end of the call, Ian said something to the effect of, “well, I trust that you’re a strong programmer, so I’m not going to test you with a coding problem.” Later, seeing the rigor of interview questions asked of job applicants, I doubt I would have ever made it into SPG through the front door.

Learning fast

Jensen Huang recently said that given the chance to do it all again, he wouldn’t—“building Nvidia turned out to have been a million times harder than I expected it to be… if we realized the pain and suffering, just how vulnerable you’re going to feel, and the challenges that you’re going to endure… I don’t think anybody would start a company.” This was basically my experience, except instead of building a trillion dollar company, I had to study a lot of ML.

At SPG, I worked on autonomous systems. My project focused on training models to realistically match the final output of the sense stack’s estimates of the environment, conditioned on simulator state. This would open up a fast way to run noisy simulation for training agents with sim-to-real transfer (compared to photorealistic rendering and running the full sense stack). This was a tough problem. To generalize, your model has to internalize some causal physics on visibility. In a real-world setting, it’s hard to collect counterfactual ground truth data—what would our perception of the environment have been had X object not been there? Over a large dataset, this problem is alleviated but not fully solved. A clever use of a Laplacian pyramid with CNNs was useful for dealing with global and detailed information separately. There was also the matter of temporal consistency, which can be crudely handled by conditioning on more past frames.
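For readers unfamiliar with the term, here is a minimal sketch of a Laplacian pyramid in plain Python. It is an illustration only, not the project's code: the helper names (`downsample`, `upsample`, `build_pyramid`, `reconstruct`), the 2x2 average pooling, and the nearest-neighbour upsampling are all assumptions chosen for brevity. The idea is that the last level holds coarse, global structure while the earlier levels hold the detail lost at each downsampling step, so a model could in principle handle each scale separately.

```python
# Laplacian pyramid sketch on a 2D grid stored as a list of lists.
# Assumptions (not from the original post): 2x2 average pooling,
# nearest-neighbour upsampling, and side lengths divisible by 2**(levels - 1).

def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def upsample(img):
    """Double the resolution by repeating each value over a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def subtract(a, b):
    """Element-wise a - b for two grids of the same shape."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    """Element-wise a + b for two grids of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def build_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-2}, coarse].

    Each detail level is what is lost when the image at that scale is
    downsampled and upsampled again; the last entry is the coarsest image.
    """
    pyramid = []
    current = img
    for _ in range(levels - 1):
        coarse = downsample(current)
        pyramid.append(subtract(current, upsample(coarse)))
        current = coarse
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert build_pyramid by upsampling and adding details back in."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = add(upsample(current), detail)
    return current
```

Calling `build_pyramid(img, 3)` on a 4x4 grid yields levels of size 4x4, 2x2, and 1x1, and `reconstruct` recovers the original grid from them. Production implementations typically use Gaussian blurring and smoother interpolation instead of the block averaging used here.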

It was a blessing that I dived in fully determined to earn the confidence placed in me, with no fear of the mountain I needed to climb. SPG was full of amazing engineers and researchers, and I learned an unbelievable amount working alongside them. I attribute much of my knowledge as a machine learning practitioner to my time there listening and asking questions, reading papers, writing PyTorch, and running experiments. It felt like a graduate education in miniature: I could ask about a technique from a textbook and hear from industry veterans about the caveats, when to use it, and suggestions on related papers to read.

It was also a curse for several reasons. Working there was mentally hard on me. My self-confidence took a hit, not because everyone else was so smart—they were—but because I lacked prerequisite knowledge and skills. The learning curve was steep[2][3], to the point where I considered giving up at times.

There was a lot of pressure to ship, yet so much uncertainty around what the product requirements were or what was even technically feasible. I was just getting my bearings as a software engineer, but now I had to self-teach a crash course in computer vision while conducting research work in a production monorepo with deadlines. Dev environment problems hobbled me and I made the mistake of ignoring them because other tasks seemed more urgent. The modeling problem itself was hard enough, but I really struggled with hard dependencies from teams in different timezones shifting under me—the sense stack would change or something in the simulator would break, and I would feel helpless.

The org also struggled with management churn and changing product directions. I had no full-time collaborators for a year. I eventually had to take over managing Jira tickets & milestones and run meetings with stakeholders because project managers slowly left. Ian was my direct manager on paper for about 6 months, but he was mostly busy managing others because the org was shoving entire preexisting teams under his purview once higher ups realized he was good at decomposing complex problems into reasonable goals. I think when I joined, he was managing 2 two-pizza teams, and by the time he left one year later, there were some 200 engineers in his org. Publicly, Ian transmuted his resignation into ammunition for other employees to petition Apple for remote work flexibility, which was a homie move. Privately, I suspect Ian was also relieved to take some time off and transition to an IC role elsewhere performing research again.

Local minima

Apple is famously the master of second-mover advantage. They have a top-down management culture organized by domain expertise, which often works well because Apple is able to attract and hire experienced experts in many domains. In this case, SPG was struggling with an ambitious project stuck in development hell because no experts had solved all these problems before, and SPG was organizationally incapable of righting itself.

When Ian left, I could feel the power vacuum. Direct reports were grafted directly under Ian’s VP-level manager, and a reorg didn’t happen until 6 months later. Product plans continued flip-flopping every few months. The codebase was laden with nearly a decade of tech debt. There was this cultural miasma of uncertainty and lack of agency that permeated everything—few people spoke up to ask elephant-in-the-room type questions in meetings, but coworkers would regularly admit confusion in private, or that they had doubts their work would be useful or successful in the long run. Engineering management was in a tough spot—I think they felt they had to keep a brave face and show confident leadership, but their never expressing doubt gave me the feeling I was being gaslit. Others plodded on with pet projects, seemingly impervious to the chaos, which added fuel to my imposter syndrome. The company was burning a fortune on R&D, yet middle management was choosing to clamp down spending on office snacks. It was maddening!

So why did I choose to stay on for nearly 2 years after my rotation program ended?

The other reason my stint at SPG felt cursed was that I felt I was in a professional local minimum. Even at my lowest, I had a fire inside me to learn ML, but it was hard to imagine leaving and finding a considerably better learning environment. Going to grad school had an insurmountable opportunity cost compared to getting paid to work with some of the best in the industry. Even if I was willing to leave, limited opportunities to publish or talk about my work due to NDA made grad school application prospects murky. It was unclear whether I could pass interviews at other companies for similar positions.

I had so much to learn, yet at SPG, I felt unable to fully leverage the resources around me. Looking back, I lacked some agency—I eagerly asked questions, but I could have communicated better expectations surrounding work output and blocked out work hours for structured learning (e.g. following graduate course syllabi). There were too many meetings irrelevant to my work that I could have declined.

It was a struggle grokking a massive codebase and building on other teams’ code I neither owned nor fully understood. It helped once I carved out a pocket of code that I had ownership and control over. Going back in time, I would have more assertively inserted myself into that team and demanded to be a first class citizen or customer—my project was entirely dependent on it either way. (I don’t think complaining loudly to engineering management would have helped though. They seemed to be getting squeezed from above by Tim & Co. and below by attrition in the ranks.) I also found I learned the most the few times I was pair programming and talking to folks in person, but this happened very little because of remote work and covid. Looking back, this was a considerable slowdown on my growth as a junior engineer. I attended all the paper reading meetings. I never figured out how to fully draw out the huge reservoirs of knowledge inside other researchers’ heads during those readings, but watching them as they coded always yielded many insights.

Epilogue

Eventually, I got burned out and took some time off. I think a healthy amount of stress can be beneficial, but too much can set back your morale and even your career trajectory in a way that’s hard to recover from—so I pivoted.

Happily, I joined Imbue, which is culturally the diametric opposite—psychologically safe, radically candid, collaborative, playful, thoughtful, and emotionally spacious. I’ve felt much more self-actualized, and counterintuitively, my learning rate has gone up in an environment with objectively fewer resources than Apple. The reasons are interesting and may well become its own blog post. (We’re hiring, by the way!)

I feel grateful I left before Titan was shuttered, and I feel sympathy for all the amazing folks whose work will never see the light of day. A lot of this post was spent griping about working there because that sort of thing is fun, but I truly cannot express what a privilege it was. My career has been defined by the experience, and almost without exception I would endorse working with the fine folks in Apple AI/ML and former-SPG if you have the chance.


  1. Other tidbits: Apple served pizza out of patented round boxes. Dev devices like iPhones were naturally abundant and strictly tracked, but were nonchalantly shredded when old or no longer needed, which made me sad. There was fused hardware that made them unfit for refurbishment or sale.

  2. I think it was The Rise of Superman that said the ideal difficulty-to-skill ratio to achieve flow is 96%–104%. At times, I felt I was beyond 120%.

  3. I’m also reminded of various versions of the learning curve graph, replaced with games like Dota 2 or Dwarf Fortress. [Image: learning curve]


Tags
career

Date
March 5, 2024