Why does the Apple A12X processor have better benchmark results than the i7-4790T?











In my workstation I have an Intel i7-4790T that I've always thought was a pretty fast CPU, but according to Geekbench 4 the Apple A12X processor in the new iPad Pro comfortably beats it. When I run Geekbench 4 I get a single-core score of around 4,000, but on the new iPad Pro the A12X returns around 5,000, i.e. 25% faster. In fact even the A12 and A11 score more than my i7-4790T. On the multi-core test my CPU scores a shade over 11,000 while the A12X scores 18,000, a whopping 60% faster.



A preliminary question is whether Geekbench is a reliable indicator of real-world speed. For example, the only thing I do that really stresses my CPU is video resampling with Handbrake. Handbrake isn't available for iOS, but assuming it were ported, would Handbrake really process videos 60% faster on the A12X, or is the Geekbench score unrepresentative of real-world performance?



But my main question is this: leaving aside exactly how the A12X and my CPU compare, how have Apple managed to get an ARM-based RISC chip to be that fast? What aspects of its architecture are responsible for the high speed?



My understanding of RISC architectures is that they do less per clock cycle, but their simpler design means they can run at higher clock speeds. Yet the A12X runs at 2.5 GHz while my i7 has a base speed of 2.7 GHz and will boost to 3.9 GHz on single-core loads. So given that my i7 runs at clock speeds roughly 50% higher than the A12X's, how does the Apple chip manage to beat it?
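(As a back-of-the-envelope check: for the A12X at 2.5 GHz just to match a Haswell core boosting to 3.9 GHz, it would need roughly 3.9 / 2.5 ≈ 1.56 times as much work done per clock cycle, and more than that to actually pull ahead.)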



From what I can find on the Internet, the A12X has much more L2 cache: 8MB vs 256KB (per core) for my i7, so that's a big difference. But does this extra L2 cache really make such a big difference to performance?
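To get a feel for the effect cache capacity can have, here is a minimal, hypothetical C sketch (it is not what Geekbench runs, just an illustration of the effect): it chases pointers through working sets of growing size, and the time per access jumps each time the working set spills out of a cache level.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Illustration only: time random pointer-chasing over working sets of
     * growing size. When the working set spills out of a cache level, the
     * nanoseconds per access jump noticeably. */
    int main(void)
    {
        for (size_t kb = 64; kb <= 64 * 1024; kb *= 2) {
            size_t n = kb * 1024 / sizeof(size_t);
            size_t *next = malloc(n * sizeof(size_t));
            if (!next) return 1;

            /* Build a random single-cycle permutation (Sattolo's algorithm)
             * so the hardware prefetcher cannot hide the cache misses. */
            for (size_t i = 0; i < n; i++) next[i] = i;
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = rand() % i, t = next[i];
                next[i] = next[j];
                next[j] = t;
            }

            /* Chase the pointers and time it. */
            size_t steps = 20 * 1000 * 1000, p = 0;
            clock_t t0 = clock();
            for (size_t s = 0; s < steps; s++) p = next[p];
            double ns = (double)(clock() - t0) / CLOCKS_PER_SEC * 1e9 / steps;

            printf("working set %6zu KB: %5.1f ns/access (ignore: %zu)\n", kb, ns, p);
            free(next);
        }
        return 0;
    }

On a Haswell-class core you would typically see steps near the L1, L2 and L3 boundaries; the absolute numbers depend entirely on the machine.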



Appendix: Geekbench



The Geekbench CPU test only stresses the CPU and the CPU-to-memory speeds. The details of exactly how Geekbench does this are described in this PDF (136 KB). The tests appear to be exactly the sort of things we do that use lots of CPU, and it appears they would indeed be representative of the Handbrake workload I suggested as an example.



The detailed breakdown of the Geekbench results for my i7-4790T and the A12X is:



Test              i7-4790T    A12X
Crypto                3870    3727
Integer               4412    5346
Floating Point        4140    4581
Memory Score          3279    5320









Tags: cpu, performance, ipad






asked Nov 20 at 15:51 by John Rennie, edited Nov 26 at 22:06 by Albin






2 Answers






The A12X is an enormous CPU built on the latest technology, leaving the older i7-4790T, dating from 2014, far behind.

The first difference is the manufacturing process: the A12X is a 7 nm chip, while the i7-4790T Haswell-DT is built on an older 22 nm process. Smaller transistors mean less space, lower operating power and faster signals across shorter chip paths.

The A12X has a whopping 10 billion transistors, while the i7-4790T has only 1.4 billion.

This allows the A12X to have six integer execution pipelines (two of them complex units), two load/store units, two branch ports and three FP/vector pipelines, giving an estimated total of 13 execution ports, far more than the eight execution ports of the Haswell-DT architecture.



For cache sizes on the A12: each big core has 128 kB of L1 cache and 8 MB of L2 cache, each little core has 32 kB of L1 and 2 MB of L2, and there is an additional 8 MB of SoC-wide cache (also used for other things).

The Haswell architecture has 64 KB of L1 cache per core, 256 KB of L2 cache per core, and 2–40 MB of shared L3 cache.
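As an aside, if you want to see what your own machine reports for its cache hierarchy, glibc exposes the sizes through sysconf. A minimal sketch (Linux/glibc only; the _SC_LEVEL*_CACHE_SIZE constants are a glibc extension, and 0 means the size could not be determined):

    #include <stdio.h>
    #include <unistd.h>

    /* Print the per-level cache sizes reported by glibc. */
    int main(void)
    {
        printf("L1 data cache: %ld KB\n", sysconf(_SC_LEVEL1_DCACHE_SIZE) / 1024);
        printf("L2 cache:      %ld KB\n", sysconf(_SC_LEVEL2_CACHE_SIZE) / 1024);
        printf("L3 cache:      %ld KB\n", sysconf(_SC_LEVEL3_CACHE_SIZE) / 1024);
        return 0;
    }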



It can be seen that the A12X beats the i7-4790T on all points, and by a large margin.

Regarding RISC vs CISC architecture, this is now a moot point on modern processors. Both architectures have evolved to the point where they now emulate each other's features to a degree in order to mitigate weaknesses.

I quote here the chart comparing the A12 against the Xeon 8192, i7-6700K and AMD EPYC 7601 CPUs, compiled on Reddit (link below), where the A12 compares well even with desktop processors:



[image: comparison chart of the A12 against the Xeon 8192, i7-6700K and EPYC 7601; see the Reddit link below]



Sources:

• Reddit: Apple really undersold the A12 CPU
• Intel® Core™ i7-4790T Processor
• Wikipedia: Haswell (microarchitecture)
• Wikipedia: List of Intel Core i7 microprocessors
• WikiChip: Haswell - Microarchitectures - Intel






answered Nov 23 at 11:25 by harrymc, edited Nov 27 at 7:26




• TSMC's 7nm is more comparable to Intel's 10nm, so this represents "only" a two process-generation lead. RISC/CISC does make a noticeable difference in microarchitecture. A RISC would not have a microcode cache with substantially expanded instructions (~90 bits per op vs. ~38 bits per instruction with significant predecoding in a RISC I-cache) to reduce decode latency/energy. The extra registers can also give a 1-2% performance boost. These RISC advantages are small compared to microarchitecture and tiny compared to process technology, but they are non-zero.
  – Paul A. Clayton, Nov 23 at 13:00




• @PaulA.Clayton the modern RISC/CISC debate is useless. ARM CPUs have used microcode for a long time. RISCs have more registers simply because they can't access memory in normal instructions. CISC CPUs may have hundreds of registers internally due to register renaming. And where do you get the 90 and 38 bits from? The average length of an x86 instruction is just over 2 bytes.
  – phuclv, Nov 24 at 14:47




• @phuclv, Intel processors have been using microcode since before the ARM architecture existed in user devices. Microcode was implemented after the Pentium bug. As for CISC/RISC, the Intel microcode internally translates the CISC instructions into multiple RISC instructions and pipelines them.
  – Strom, Nov 26 at 6:16




• Thanks for adding the sources!
  – Albin, Nov 26 at 22:03






You are comparing very different architectures between the A12X and Haswell (the Intel i7-4790T), and the benchmark numbers are not directly comparable, much as the two processors are not directly comparable.

Understanding what a particular test is testing is helpful in trying to understand what the numbers mean. Going through your Geekbench results, let's start at the last line.

Per your Geekbench tests, the memory performance of the A12X and the Haswell chip is heavily skewed: the A12X has roughly twice the memory score. While memory tests typically conflate two unrelated items, latency and bandwidth, the A12X is the clear winner here.
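For a sense of what a pure bandwidth test looks like, here is a hypothetical STREAM-style triad in C. It is not the Geekbench kernel, just the general shape of a bandwidth-bound loop: the arrays are deliberately far larger than any last-level cache (about 768 MB in total), so the loop is limited by memory rather than by the ALUs.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* STREAM-style triad: a[i] = b[i] + s*c[i]. With arrays far larger than
     * the last-level cache, throughput is bounded by memory bandwidth. */
    int main(void)
    {
        size_t n = 32 * 1000 * 1000;   /* roughly 256 MB per array */
        double *a = malloc(n * sizeof *a);
        double *b = malloc(n * sizeof *b);
        double *c = malloc(n * sizeof *c);
        double s = 3.0;
        if (!a || !b || !c) return 1;

        for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }

        int reps = 10;
        clock_t t0 = clock();
        for (int r = 0; r < reps; r++)
            for (size_t i = 0; i < n; i++)
                a[i] = b[i] + s * c[i];
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        /* Three arrays of doubles are streamed per repetition. */
        double gbytes = (double)reps * 3 * n * sizeof(double) / 1e9;
        printf("%.1f GB in %.2f s = %.1f GB/s (a[0] = %g)\n",
               gbytes, secs, gbytes / secs, a[0]);

        free(a); free(b); free(c);
        return 0;
    }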



The next item is floating-point performance. This test tries to compare hand-optimized code between different architectures. While the numbers may be skewed by the quality of the optimizations, this should be a good ballpark for overall FPU performance and is directly comparable. Here the two processors have similar results.

The least helpful test is the one labeled integer performance. It isn't integer performance in the arithmetic sense; it is more a collection of non-FPU generic workloads. These tests are meaningful in that they show application performance on a platform, but they aren't enough to say that processor A is better than processor B, as they are somewhat sensitive to memory performance.

Last is the crypto workload. This is meaningful in the abstract, although the particular test probably isn't that useful. High-performance crypto should be using AES-GCM, not AES-CTR, and the latter doesn't lend itself to hardware acceleration as well. This is also a domain-specific benchmark.
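If you want to see how the two modes behave on a particular machine, here is a minimal sketch using OpenSSL's EVP interface (the key, IV, buffer size and repeat count are arbitrary dummies, error handling and GCM tag finalization are omitted; link with -lcrypto):

    #include <stdio.h>
    #include <time.h>
    #include <openssl/evp.h>

    /* Encrypt `megabytes` MB of zeros with the given cipher, return MB/s.
     * Dummy key/IV, no error handling, no GCM tag finalization: sketch only. */
    static double mb_per_sec(const EVP_CIPHER *cipher, int megabytes)
    {
        static unsigned char in[1 << 20], out[(1 << 20) + 64];
        unsigned char key[32] = {0}, iv[16] = {0};
        int outl;

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        EVP_EncryptInit_ex(ctx, cipher, NULL, key, iv);

        clock_t t0 = clock();
        for (int i = 0; i < megabytes; i++)
            EVP_EncryptUpdate(ctx, out, &outl, in, sizeof in);
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        EVP_CIPHER_CTX_free(ctx);
        return megabytes / secs;
    }

    int main(void)
    {
        printf("AES-128-CTR: %.0f MB/s\n", mb_per_sec(EVP_aes_128_ctr(), 512));
        printf("AES-128-GCM: %.0f MB/s\n", mb_per_sec(EVP_aes_128_gcm(), 512));
        return 0;
    }

Both calls exercise the AES hardware; GCM additionally exercises the carry-less multiply used for its GHASH authentication step, which is where implementations tend to differ.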



If I were to try to say something smart about those particular numbers, I'd say this:

• The A12X has significantly improved memory bandwidth. This is partly because desktop memory seems to lag behind the then-current memory technologies, but also because memory performance has improved in five years.
• The A12X has slightly better FPU performance per core than the i7-4790T.
• The A12X is going to run generic workloads similarly to or faster than the i7-4790T.
• The A12X is much better at domain-specific workloads, as it provides hardware support for new and different instructions that better reflect the needs of a tablet/phone device.

Drawing larger conclusions based on those numbers, or making architectural claims based on those numbers, is probably unwise.



As to a generic architecture comparison, RISC vs CISC is no longer meaningful, as both instruction sets are decoded into micro-ops which determine how the workload is distributed. Comparing purely on execution ports is probably not particularly meaningful either, as those are high-level building blocks which are not directly comparable.

Cache is an important quantity that directly contributes to processor performance, but it is also very complicated. How cache is shared on the Intel architecture and on the A12X is completely different. In general, having more cache is better, but just as important is cache coherency, which affects how threaded applications are able to share data between cores.

Lastly, the processor needs to work for your workload. While the A12X may be able to support a desktop workload at some point in the future, the 4th-generation i7 supports it now, and that makes it a superior choice for a desktop processor even though it is four to five years older than the A12X.






answered yesterday by Claris