Production Engineer

Facebook

Product
Santa Clara, CA, USA
Posted 6+ months ago

⌖We are hiring⌖ Are you interested in the following:

• Work on one of the largest networks in the world. ✅

• Build software that enables operations across that network. ✅

• Be intimately involved in the network that powers the future of AI. ✅

If so, we have multiple openings for senior Production Engineers in our team, with roles focused on Meta’s AI network, Meta’s data center (DC) network fault management software stack, and the reliability of Meta’s latest NIC. If you have a strong background in coding and systems engineering, these opportunities may be a perfect fit for you!

Please apply here: https://lnkd.in/gYew_y3X

Network.AI

In this role, you will work with a group of engineers to build and evolve the network infrastructure that connects a myriad of GPUs. We also need to ensure that the network runs smoothly and meets the stringent performance and availability requirements of RDMA workloads, which expect a lossless fabric interconnect. To enhance the performance of these systems, we continuously seek opportunities for improvement across our infrastructure stack, including the network fabric, host networking, communication libraries, and scheduling infrastructure.

Network Fault Management

In this role, you will work on enhancing a sophisticated pipeline designed to detect, mitigate, triage, and repair network faults across all network switches in Meta’s data centers. You'll play a key role in improving the reliability and performance of our global network infrastructure, making Meta’s network dependable for AI workloads.

NIC Reliability

As a Production Engineer in the PE NIC team, you will be exposed to cutting-edge technology developed internally at Meta. This role focuses on optimizing our server network communication stack to improve system efficiency and reliability. You’ll have the opportunity to bridge the gap between our network and the servers and GPUs that use it.

If you’re passionate about working with large-scale systems and want to be part of a team that shapes the future of Meta’s infrastructure, we encourage you to apply here: https://lnkd.in/gYew_y3X