We all know Google. Free browser, free email with 15 GB of storage. The only way to make people switch from a paid subscription service to a free one is ease of use and integration with other services. Hence, Google Drive and Docs are well integrated; they just work. For most people, a simple editor, spreadsheet and presentation tool is more than enough. So, Google is a pure software company. That changed when AI hardware emerged. Does Alphabet or Google have ASIC design teams? Let's find out.
Google's acquisition of Motorola
Motorola was a handset maker. They owned a lot of patents. They had hardware teams. But Google bought them and kept the patents. They sold off the handset group later. Most people acknowledge the acquisition was not about hardware. Some argue it was to get Android handset makers like Samsung back in line. That is plausible. And they wanted the patents to defend against patent claims from others. Make no mistake, patents are not intended to protect inventions. Almost all big companies use them as weapons of mass destruction. A bit like the Cold War: leverage for negotiations between the big guns. Small companies suffer from patent litigation because the big ones have the power and the patent portfolio, even if their claim has no merit. It takes a long battle in court even if the claim is bogus. The money you need to go to court is money not going into R&D.
Back to the issue at hand. Google wasn't really looking for hardware. It is not their core business. They uphold good software practices. Time proves they still hold the standard high. This is unusual: the scale at which they deploy their software demands it. They need software fanatics to keep the standard high. This is their greatest strength. It is also their biggest weakness.
HW accelerators for academics
AI is the domain of academics. For more than fifty years, academics have developed algorithms that try to produce an outcome similar to what a human can do. For example, a human can recognize and classify objects. To do this with a machine, they use math. Statistics. Input data (video or pictures), an algorithm and a desired outcome or goal. Google uses our photos (Google Photos is free storage for images!) as input data for their algorithms. Let's say the goal of their algorithm is to detect cats in pictures. They need close to 100% statistical certainty. A machine must be able to detect whether there is a cat in the picture or not. On the one hand, false positives (flagging a cat where there is none) are bad. On the other hand, a cat slipping through the algorithm undetected, a false negative, is bad as well.
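The tension between false positives and false negatives can be sketched with a toy threshold classifier. The scores below are made-up numbers for illustration, not anything from Google's actual pipeline:

```python
# Toy illustration: a classifier outputs a probability that a picture contains a cat.
# Where we set the decision threshold trades false positives against false negatives.

# (probability_of_cat, actually_a_cat) -- hypothetical scores for six pictures
predictions = [(0.95, True), (0.80, True), (0.60, False),
               (0.40, True), (0.20, False), (0.05, False)]

def confusion(threshold):
    """Count false positives and false negatives at a given threshold."""
    fp = sum(1 for p, cat in predictions if p >= threshold and not cat)
    fn = sum(1 for p, cat in predictions if p < threshold and cat)
    return fp, fn

# A low threshold lets non-cats slip in (false positives);
# a high threshold misses real cats (false negatives).
low_fp, low_fn = confusion(0.30)    # permissive: 1 false positive, 0 false negatives
high_fp, high_fn = confusion(0.90)  # strict: 0 false positives, 2 false negatives
```

Getting both error counts close to zero at the same time is exactly the "close to 100% statistical certainty" problem.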
Two to Tango
In 2015, and probably before that, Google was active with Project Tango. They realized that hardware is the deciding factor in detecting objects. Movidius supplied the Myriad chip to Google. The chip contains a hardware CNN (convolutional neural network) engine that can run a trained model. Google trains their models in their cloud computing data centers. Back to the cat example: the model can classify an image or video (which is just a stream of images) with a cat in it. If it detects an object, it can put a percentage on the probability that the object is a cat. The trained model is small enough to run in real time on the Myriad chip. The chip is a so-called edge hardware accelerator for AI. In our example, an object classification inference.
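That "percentage" on a detection is typically a softmax over the network's raw class scores. A minimal sketch, with invented scores and class names rather than the Myriad's actual output:

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a CNN's final layer for three classes.
classes = ["cat", "dog", "background"]
logits = [4.1, 1.3, 0.2]

probs = softmax(logits)
best = classes[probs.index(max(probs))]
# The model reports its best guess ("cat") with a confidence percentage.
```

The heavy lifting, the convolutions, happens in the hardware CNN engine; this last normalization step is cheap.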
CPU then GPU
General purpose processors (Intel Xeon back then, now Intel Xeon or AMD Epyc) are bad at training. Intel has cloud computing offerings and a lot of general purpose processing power. But massively parallel computations are a much better match for a GPU, a graphics processing unit, than for a CPU. Academics switched from supercomputers (days or weeks of waiting for results) to GPUs.
The GPU today processes HD and UHD frames that contain millions of pixels. That takes massive parallel computation power. A CPU mainly executes instructions serially. CPUs do have multiple cores and threads, but that is puny compared to the massively parallel computations a GPU does. To cut a long story short, application specific chips (ASIC or FPGA) are faster and more efficient than general purpose processors. The caveat is that application specific hardware is, like the term says, not very good at general purpose processing. The more specific the hardware is, the more performant and efficient the processing. The trade-off is a reduction in the range of applications that benefit from that hardware accelerator.
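To get a feel for the scale: a per-pixel operation on a single UHD frame is millions of independent computations, exactly the shape of work that suits thousands of simple GPU cores. Back-of-the-envelope arithmetic, not a benchmark (the core counts are illustrative):

```python
# Back-of-the-envelope: why frame processing favours massive parallelism.
hd_pixels = 1920 * 1080      # ~2.1 million pixels per HD frame
uhd_pixels = 3840 * 2160     # ~8.3 million pixels per UHD frame
fps = 60                     # a common frame rate

# Each pixel can be processed independently -- "embarrassingly parallel".
ops_per_second_uhd = uhd_pixels * fps  # per-pixel operations per second

# Hypothetical lane counts: a 16-core CPU vs a GPU with 4096 simple cores.
cpu_lanes, gpu_lanes = 16, 4096
pixels_per_lane_cpu = uhd_pixels // cpu_lanes  # ~518k pixels per CPU core
pixels_per_lane_gpu = uhd_pixels // gpu_lanes  # ~2k pixels per GPU core
```

The same reasoning carries over to training: matrix multiplications over millions of weights are just as embarrassingly parallel as pixels.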
Methodology meets psychology
The only way a software methodology can survive is to have software fanatics in charge. That is not a bad thing. A general decides the strategy. Successful scaling requires strict rules. If you allow soldiers to violate the rules, soon the whole company turns into Microsoft. Bill Gates is not a great engineer (*); he always used "good enough" as the rule. In business, that makes him one of the best businessmen in recent history. I credit him for the business side, the wealth he amassed. But on the software side, he is the one who trained humanity to accept software flaws and regular updates. Something far away from the original core belief of excelling. Improving and becoming more efficient requires strict rules. By enforcing the rules, the quality level is maintained (simplified helicopter view). Google is exceptional: even though they grew so fast and are so big now, the quality has not imploded like a bad plum pudding.
Still, Google makes 80% of its revenue from advertising (give or take). If something is free, you are the product. Like Microsoft or any other big company, they are not a charity. They use your data and turn it into cold, hard cash. Google engineers especially are in denial about that. Every engineer is (in theory) able to adhere to facts and proof in work situations. They were excited to join company X or Y. Their parents, family and friends envy them, because he or she got into X or Y. But when real facts turn up about their company X or Y, they suddenly face a huge disconnect between their opinion and reality. Cognitive dissonance is well known in psychology. A brutal confrontation like this, "Google uses the data of its users and sells it to anyone with money", creates a sudden shock. The path of reason is abandoned to bend the world back to the original belief.
Experts need to be pigheaded as well. Methodology is always challenged. Day in, day out. Most of the time it is by juniors who haven't experienced the long road littered with problems that the expert walked. The expert carefully constructed a methodology to avoid the pitfalls and problems he knows will come. Compare it to a superhero who avoids a catastrophe. Who will believe him when he says he avoided an epic disaster? The disaster had to happen before people would appreciate the fact that the hero avoided it. A contradictio in terminis, my dear Watson. Please don't put "fixed bug" in the version control system when you commit code. Sounds familiar? The version control system is not just for keeping versions. The comment is essential for others to find where things could have gone wrong. Or to find specific changes related to functionality. The comment must describe the change in the code. Then people started to check in with empty comments. The solution was to not allow a commit if the comment was empty. So, people use "bug", "updated code" or whatever meaningless comment in the repository. You see, you can't fix stupid.
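You can at least push the arms race one step further with a commit-msg hook that rejects the usual meaningless one-liners. A sketch; the blocklist is mine, tune it to your team:

```python
#!/usr/bin/env python3
# Git commit-msg hook: reject empty or meaningless commit messages.
# Install as .git/hooks/commit-msg (make it executable).
# Git invokes it with the path to the message file as its argument.
import sys

MEANINGLESS = {"bug", "fix", "fixed bug", "update", "updated code", "wip"}

def check_message(text):
    """Return an error string if the message is unacceptable, else None."""
    msg = text.strip().lower()
    if not msg:
        return "commit message is empty"
    if msg in MEANINGLESS:
        return f"'{msg}' says nothing about the change"
    if len(msg) < 10:
        return "message too short to describe the change"
    return None  # message passes

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        error = check_message(f.read())
    if error:
        sys.stderr.write(f"rejected: {error}\n")
        sys.exit(1)  # non-zero exit aborts the commit
```

It won't fix stupid either, but it raises the cost of being lazy from zero keystrokes to at least a sentence.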
Google ASIC: eTPU
The general who has a methodology enforces it. Many soldiers will question it, but the army is not a democracy. Nor is a company, for that matter. The point I am making is crucial. A software expert who adheres to the methodology is used to criticism. Discarding it is needed to stay sane. It is easy to go from being a methodologist to plain arrogance. Hubris. Google's software stack, their tensors and TensorFlow, is their core. To have everyone adopt their tensors and TensorFlow instead of the offerings from the competition, they need to own the hardware from training to inference. The eTPU, the USB Coral stick for example, is a disgrace for a core hardware person like me. Companies get sued for incomplete specifications and documentation. Google is above the law; nobody cares that their documentation is crap. That there are only two examples delivered. They send their stick to influencers on LinkedIn, blogs and whatever media they can find, and all those influencers do is run two demo scripts and say: "Amazing!". It is disgusting. Talk about ethics. Talk about "don't be evil".
Google ASIC: TPU
Google dominates the browser market. And it dominates web search as well (I keep trying Bing and DuckDuckGo, but come on). Search for the TPU's creation and you get the blog post of the guy they hired to make the TPU. Amazing this and amazing that. What I don't see is hardware expertise. They explain the high level strategy and the math down to some basic architecture, but nothing about hardware. That is a possible tell. Similarly, the problem with power consumption is another tell. Power consumption is a cost. And it heats the chip, requiring active cooling. I see impressive active cooling on the TPU. The eTPU runs hot at maximum frequency (they warn about that). I have deep respect for the best and most stubborn software experts. They stand by their methodology. But how good are they at hardware? I see software people ask questions about FPGAs as if they were software. An HDL has a software-like syntax but is not software. It describes hardware. It describes a digital circuit.
The art of ASIC design
Today, I see a lot of academics coding an open source 64-bit processor. I hate to be the one to tell you, but hardware design is not software. HDL is not software. Similarly, I am able to write application software. It seems like they are able to write HDL source code too. The difference between an expert and an amateur lies in the quality of the code. Almost certainly, my software code is average at best. Not structured, optimal or reusable. Similarly, hardware design is not just HDL syntax. The design is written in such a way that power, synthesis and DFT (design for test) are taken into account at the design phase. The point is that the software experts at Google aren't able to distinguish between an average, a good and a great ASIC designer. Academics tape out test chips all the time. But a production chip, millions of units, needs scan chains, memory BIST, lifetime tests, power efficiency versus performance trade-offs, and it needs to make a profit (die size!).
Premier League of ASIC design
Above all, FPGA design and ASIC test chips play in the "amateur league". Not because I feel superior. I do, I am human after all. But because there is a whole world out there past FPGAs and test chips. It is "ASIC in mass production". Where yield optimization is critical. Where lifetime tests need to prove the chip doesn't break after two weeks of use. And where failure analysis is possible on returns from the field. Like the superhero who avoided a disaster: how can you know how big and excruciating the disaster would have been if you never experienced it? The perfect catch-22.
Google needs the hardware to make its software stack the default choice. They probably don't want to disclose too much about the kind of AI they are doing to semiconductor companies like AMD, Intel, Nvidia, ... Hence, my point: they need to be arrogant about their core software business, and that prevents them from seeing that hardware has its own experts. Google has chip designers. But it seems they are not at the same level of expertise as their software counterparts.
(*) Allegedly, Bill Gates had a discussion with Jobs where he said something like this:
“Steve, I think it’s more like we both had this rich neighbor named Xerox and I broke into his house to steal the TV set and found out that you had already stolen it.”
Abundance of information: the Coral datasheet.
Also interesting: what is the difference between FPGA, ASIC and PSoC?