How Facebook Might Find Nervana For Machine Learning Training
There is a rumor going around that a certain hyperscaler is going to be augmenting its GPU-based machine learning training and will be adopting Intel’s Nervana Neural Network Processor (NNP) for at least some of its workloads. Some of the chattering lends itself to hyperbole, claiming that Facebook, the operator of the world’s largest social network, would do all of its machine learning training on the Nervana NNPs once they are widely deployed and, by implication, would be using Intel accelerators to do inference as well.
We are not so sure we believe that, and it would be very hard to verify that. What we know for sure is that Facebook has rearchitected its machine learning infrastructure for both training and inference so it could easily slip different kinds of compute into its infrastructure, as we discussed while attending the Open Compute Project’s Global Summit 2019 last week. One of the innovations that Facebook has created in conjunction with Microsoft and in support of other hyperscalers such as Google, Alibaba, Baidu, and Tencent, is the OCP Acceleration Module, which is a common accelerator form factor that is basically a portable socket (handle included) that plugs into a PCI-Express switch fabric. Chip makers AMD, Graphcore, Habana, Intel, Nvidia, and Xilinx have all agreed to adopt this OAM form factor, and it would not be surprising to see Google do the same thing with its TPU 3.0 machine learning accelerators. READ MORE ON: THE NEXT PLAT FORM