As you know, FLOPS (floating-point operations per second) is a unit for measuring a computer's performance on floating-point arithmetic, and it is often used to judge whether one computer is more powerful than another. It matters most in the world of the Top500 supercomputers, where it decides who ranks best. However, a measurement is only worth making if it has some practical application; otherwise, what is the point of measuring and comparing it? That is why, to find out the capabilities of desktops, laptops and supercomputers, there are benchmarks somewhat closer to real computing tasks, for example SPEC's SPECint and SPECfp.
And yet FLOPS is still actively used in performance evaluations and published in reports. It has long been measured with the Linpack test, and the open LAPACK linear algebra library, Linpack's successor, now underpins modern versions of it. What do these measurements tell developers of high-performance and scientific applications? Is it easy to evaluate the performance of your PC in FLOPS? How do you measure CPU and GPU performance in FLOPS? Will the measurements and comparisons be correct, and is there an alternative to this approach? We will talk about all this below.
What is FLOPS anyway?
Let’s start with terms and definitions. FLOPS is the number of computational operations or instructions performed on floating point (FP) operands per second. The word “computational” matters here, since the microprocessor can execute other instructions with such operands, for example loading them from memory. Such operations do not carry any computational load and therefore are not counted.
The FLOPS value published for a specific system is primarily a characteristic of the computer itself, not of a program. It can be obtained in two ways – theoretical and practical:
In theory, we know how many microprocessors are in the system and how many floating-point execution units are in each processor. They can all work at the same time and start on the next instruction in the pipeline every cycle.
Therefore, to calculate the theoretical maximum for a given system, we only need to multiply all these values by the processor frequency (and by the number of operations each unit completes per cycle, where that is more than one), and we get the number of FP operations per second.
It is all very simple, but such estimates are rarely used, except in press announcements about plans to build a future supercomputer.
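The multiplication described above can be sketched in a few lines of Python. The function name, its parameters and the example machine (one 8-core CPU at 3.0 GHz with two FP units per core, each completing 8 operations per cycle) are illustrative assumptions, not the specification of any particular product.

```python
def theoretical_peak_gflops(sockets, cores_per_socket, fp_units_per_core,
                            flops_per_unit_per_cycle, clock_ghz):
    """Upper bound on FP throughput in GFLOPS.

    Assumes every FP unit starts a new instruction on every cycle,
    which real programs almost never achieve.
    """
    ops_per_cycle = (sockets * cores_per_socket * fp_units_per_core
                     * flops_per_unit_per_cycle)
    # (operations per cycle) * (10^9 cycles per second) = GFLOPS
    return ops_per_cycle * clock_ghz


# Hypothetical machine: 1 socket, 8 cores, 2 FP units per core,
# 8 operations per unit per cycle, 3.0 GHz.
peak = theoretical_peak_gflops(1, 8, 2, 8, 3.0)
print(f"Theoretical peak: {peak:.1f} GFLOPS")  # Theoretical peak: 384.0 GFLOPS
```

The result is a ceiling rather than a prediction: it ignores memory bandwidth, clock throttling and everything else that keeps the units idle.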
In practice, we can measure FLOPS by running the Linpack benchmark. The benchmark solves a dense system of linear equations, work that is dominated by matrix-matrix multiplication, repeats the run several dozen times and averages the execution time. Since the number of FP operations the algorithm performs is known in advance, dividing that number by the time gives the desired FLOPS. The Intel MKL (Math Kernel Library) contains LAPACK, a package of routines for solving linear algebra problems, and the benchmark is built on top of this package. Its efficiency is believed to reach about 90% of the theoretical maximum, which is why the benchmark is treated as a “reference measurement”.
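The "known operation count divided by measured time" idea can be sketched as follows. This is a toy, not the Linpack code: it solves a random dense system with a plain Gaussian elimination written in pure Python, so the number it prints reflects interpreter overhead and sits far below what the hardware can do. The operation count of roughly 2/3·n³ + 2·n² is the figure conventionally credited for this kind of solve; the function names are made up for the example.

```python
import random
import time


def linpack_flop_count(n):
    """Operations conventionally credited for solving an n x n dense system."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies, leave the inputs intact
    b = b[:]
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure_gflops(n, seed=0):
    """Time one solve of a random n x n system and convert to GFLOPS."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve(a, b)
    elapsed = time.perf_counter() - start
    return linpack_flop_count(n) / elapsed / 1e9


print(f"{measure_gflops(100):.4f} GFLOPS (pure Python, far below hardware peak)")
```

A real benchmark does the same division but hands the solve to a heavily optimized library, which is what brings the measured figure close to the theoretical one.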
Important warning: a FLOPS figure only means something together with the precision it was measured at. Linpack works in 64-bit double precision, while the figures quoted for video cards are usually 32-bit single precision.
Computations, and therefore FLOPS, can be 64-bit (FP64, double precision), 32-bit (FP32, single precision) or 16-bit (FP16, half precision). A rough rule of thumb: doubling the precision halves the speed, although the actual ratio depends on the hardware, and consumer GPUs often run FP64 far slower than that.
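One way to see where the "twice the precision, half the speed" rule comes from: a vector register has a fixed width, so it holds half as many values when each value is twice as wide. The sketch below counts how many values of each format fit in one register. The 256-bit width is an assumption chosen for illustration, and the helper name is made up.

```python
import struct

# struct format codes for IEEE 754 half, single and double precision.
FORMATS = {"FP16": "e", "FP32": "f", "FP64": "d"}


def lanes(register_bits, fmt):
    """How many values of the given format fit in one vector register."""
    value_bits = struct.calcsize("=" + fmt) * 8  # "=" forces standard sizes
    return register_bits // value_bits


for name, fmt in FORMATS.items():
    print(f"{name}: {lanes(256, fmt)} values per 256-bit register")
# FP16: 16 values per 256-bit register
# FP32: 8 values per 256-bit register
# FP64: 4 values per 256-bit register
```

If one instruction processes a whole register, halving the width of each value doubles the operations per instruction, which is the rule of thumb in its ideal form.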
There are many tools available to measure the FLOPS of a personal computer or laptop, but they all rely on the same operating principle.
As for interfaces, you can run the analysis from the command line, build the benchmark yourself with a Fortran or C++ compiler, and so on. We will take the easier route and use a precompiled Linpack executable, the most popular way to measure computer performance on Windows.
Let’s try it
The program’s interface is very simple and easy to figure out. First, give the program the highest priority. Then close any other resource-hungry programs. You can choose how many times (or for how many minutes) to run the test and how much memory to use. When everything is set, click Test. Once the run completes, you will most likely see the result in GFLOPS (billions of floating-point operations per second).
Is that it? Can we be sure that’s the most accurate measurement of your PC’s power?
“Well yes, but actually no”
All you have is a synthetic benchmark of how fast your CPU can compute in a test environment. There are a few main reasons why that falls short, some weightier than others:
- We haven’t checked your GPU yet (oh but we will, rest assured);
- We don’t know how well your PC can ‘feed’ the data to the CPU, or, put more formally, how much overhead there is in your system;
- You use your PC not only to perform matrix multiplication, but also to look at memes, and we want to look at them at 4K 60fps.
Ok, let’s sort them out.
Reason 1: GPU FLOPS
To find out the performance of a video card in gigaflops, you need to multiply its frequency (GHz) by the number of shader processors (CUDA cores on NVIDIA cards, stream processors on AMD). Because modern processors can perform more than one operation per clock cycle, the result must then be multiplied by the number of those operations. Modern gaming video cards can perform 2 operations per clock cycle (for example, a multiplication and an addition fused into one instruction).
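As a minimal sketch, the formula above looks like this in Python. The function name is made up, and the shader counts and base clocks in the example are commonly listed figures used here for illustration only; check the vendor's specification sheet before relying on them.

```python
def gpu_peak_gflops(shader_count, clock_ghz, ops_per_cycle=2):
    """Theoretical FP32 peak in GFLOPS: shaders * clock (GHz) * ops per cycle."""
    return shader_count * clock_ghz * ops_per_cycle


# Assumed specs (shader processors, base clock in GHz), for illustration.
cards = {
    "GeForce GTX 1080 Ti": (3584, 1.48),
    "Titan V": (5120, 1.20),
}

for name, (shaders, clock) in cards.items():
    print(f"{name}: about {gpu_peak_gflops(shaders, clock):.0f} GFLOPS")
```

Using the boost clock instead of the base clock gives a noticeably higher figure for the same card, so it is worth checking which one a published number is based on.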
Let’s look at theoretical performance of various GPUs in the table below:
Let’s pay attention to a couple of peculiar details:
First, the Radeon RX 580 is comparable to the GeForce GTX 1070 in gigaflops, although it does not perform as well in games. Why is that? It comes down to drivers and optimization. When the software is well optimized, these GPUs perform surprisingly well, as Ethereum mining shows, which is why AMD cards sell like hot cakes.
Second, the performance of the most powerful card, the Titan V (priced at $3,000), is only 16% higher than that of the GeForce GTX 1080 Ti ($700). More interesting still, news spread that the Titan V offers 110 TFLOPS! “Lies, deception” all over again?
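Not really. In addition to the usual CUDA cores, the Titan V contains tensor cores designed for machine learning, and the 110 TFLOPS figure refers to the reduced-precision matrix operations those cores perform, not to general FP32 work.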
Reasons 2 and 3: Performance in day-to-day operations
To better understand how your PC performs under heavy load or after an upgrade, you can run several benchmarks. These are specialized programs that test the performance of a computer or of one of its components. You can always compare your machine’s results with those of millions of other testers. However, each benchmark computes its score differently, so results from one benchmark cannot be compared with results from another. Among the most popular benchmarks are the following:
For the CPU – Cinebench;
For storage drives – CrystalDiskMark.
FLOPS is a unit of computing performance that characterizes the maximum floating-point throughput of the system itself. It is more a theoretical than a practical measure: even if you know a system’s FLOPS, you can’t say whether it will run Minecraft in 4K. That’s where benchmarking comes in: by simulating your target operations, you can find out whether a particular piece of hardware suits your purpose.
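Simulating your own target operation can be as simple as timing a representative piece of work. The sketch below uses Python's standard timeit module; the workload function is a hypothetical stand-in that you would replace with whatever you actually run.

```python
import timeit


def target_workload():
    """Hypothetical stand-in for your real task: a sum of square roots."""
    return sum(i ** 0.5 for i in range(100_000))


def best_time(func, repeats=5):
    """Run func several times and return the fastest wall-clock time in seconds."""
    return min(timeit.repeat(func, number=1, repeat=repeats))


print(f"Best of 5 runs: {best_time(target_workload) * 1000:.2f} ms")
```

Taking the fastest of several runs filters out one-off interference from other programs, which makes results from two machines easier to compare.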
We at Megamind have thought of our own unit of computing measurement — 1 Mind. To keep this article short, we will tackle our unit in the next post, as it requires a more mindful approach (pun intended).