Requirements / System Specifications
Argon2 Password hashing function package:
https://github.com/P-H-C/phc-winner-argon2
Fedora 28 Linux operating system (AArch64)
8-core Cortex-A57 processor
One set of dual-channel DDR3 DIMMs, 8GB each (16GB of RAM in total)
New Plan
The new plan is to change the benchmark program so that it measures the run time (the time required to run the program or a specific piece of code) with the system clock instead of relying on RDTSC, an instruction that reads the processor's time stamp counter and exists only on x86 and x86_64 processors.
Here is the original benchmark program file (bench.c):
/*
 * Argon2 reference source code package - reference C implementations
 *
 * Copyright 2015
 * Daniel Dinu, Dmitry Khovratovich, Jean-Philippe Aumasson, and Samuel Neves
 *
 * You may use this work under the terms of a Creative Commons CC0 1.0
 * License/Waiver or the Apache Public License 2.0, at your option. The terms of
 * these licenses can be found at:
 *
 * - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
 * - Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
 *
 * You should have received a copy of both of these licenses along with this
 * software. If not, they may be obtained at the above URLs.
 */

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#ifdef _MSC_VER
#include <intrin.h>
#endif

#include "argon2.h"

static uint64_t rdtsc(void) {
#ifdef _MSC_VER
    return __rdtsc();
#else
#if defined(__amd64__) || defined(__x86_64__)
    uint64_t rax, rdx;
    __asm__ __volatile__("rdtsc" : "=a"(rax), "=d"(rdx) : :);
    return (rdx << 32) | rax;
#elif defined(__i386__) || defined(__i386) || defined(__X86__)
    uint64_t rax;
    __asm__ __volatile__("rdtsc" : "=A"(rax) : :);
    return rax;
#else
#error "Not implemented!"
#endif
#endif
}

/*
 * Benchmarks Argon2 with salt length 16, password length 16, t_cost 3,
 * and different m_cost and threads
 */
static void benchmark() {
#define BENCH_OUTLEN 16
#define BENCH_INLEN 16
    const uint32_t inlen = BENCH_INLEN;
    const unsigned outlen = BENCH_OUTLEN;
    unsigned char out[BENCH_OUTLEN];
    unsigned char pwd_array[BENCH_INLEN];
    unsigned char salt_array[BENCH_INLEN];
#undef BENCH_INLEN
#undef BENCH_OUTLEN

    uint32_t t_cost = 3;
    uint32_t m_cost;
    uint32_t thread_test[4] = {1, 2, 4, 8};
    argon2_type types[3] = {Argon2_i, Argon2_d, Argon2_id};

    memset(pwd_array, 0, inlen);
    memset(salt_array, 1, inlen);

    for (m_cost = (uint32_t)1 << 10; m_cost <= (uint32_t)1 << 22; m_cost *= 2) {
        unsigned i;
        for (i = 0; i < 4; ++i) {
            double run_time = 0;
            uint32_t thread_n = thread_test[i];

            unsigned j;
            for (j = 0; j < 3; ++j) {
                clock_t start_time, stop_time;
                uint64_t start_cycles, stop_cycles;
                uint64_t delta;
                double mcycles;

                argon2_type type = types[j];
                start_time = clock();
                start_cycles = rdtsc();

                argon2_hash(t_cost, m_cost, thread_n, pwd_array, inlen,
                            salt_array, inlen, out, outlen, NULL, 0, type,
                            ARGON2_VERSION_NUMBER);

                stop_cycles = rdtsc();
                stop_time = clock();

                delta = (stop_cycles - start_cycles) / (m_cost);
                mcycles = (double)(stop_cycles - start_cycles) / (1UL << 20);
                run_time += ((double)stop_time - start_time) / (CLOCKS_PER_SEC);

                printf("%s %d iterations %d MiB %d threads: %2.2f cpb %2.2f "
                       "Mcycles \n",
                       argon2_type2string(type, 1), t_cost, m_cost >> 10,
                       thread_n, (float)delta / 1024, mcycles);
            }

            printf("%2.4f seconds\n\n", run_time);
        }
    }
}

int main() {
    benchmark();
    return ARGON2_OK;
}
I will change the bench.c file by removing the rdtsc function. The rdtsc function only reads the processor's cycle counter; the benchmark calls it before and after the hash to count the cycles the code took to run. I will also remove any code that is affected by the change (marked in red).
The main part/chunk of the program is this section:
argon2_hash(t_cost, m_cost, thread_n, pwd_array, inlen, salt_array, inlen, out, outlen, NULL, 0, type, ARGON2_VERSION_NUMBER);
This call sits inside nested for loops. The outer loop doubles the memory cost from 1 MiB up to 4 GiB, so a full run takes a long time and I stop it early (using CTRL+C or the kill command). For each combination of memory cost and thread count, the code runs each of the three Argon2 variants before printing a calculated run time. The three variants are Argon2d, Argon2i, and Argon2id.
The function that Linux provides for reading the system clock is called clock_gettime. Its documentation (https://users.pja.edu.pl/~jms/qnx/help/watcom/clibref/qnx/clock_gettime.html) also contains an example that I will use to get the run time I need for my test. Here is the example:
/*
 * This program calculates the time required to
 * execute the program specified as its first argument.
 * The time is printed in seconds, on standard out.
 */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>

#define BILLION 1000000000L;

int main( int argc, char **argv )
{
    struct timespec start, stop;
    double accum;

    if( clock_gettime( CLOCK_REALTIME, &start) == -1 ) {
        perror( "clock gettime" );
        exit( EXIT_FAILURE );
    }

    system( argv[1] );

    if( clock_gettime( CLOCK_REALTIME, &stop) == -1 ) {
        perror( "clock gettime" );
        exit( EXIT_FAILURE );
    }

    accum = ( stop.tv_sec - start.tv_sec )
          + ( stop.tv_nsec - start.tv_nsec ) / BILLION;
    printf( "%lf\n", accum );
    return( EXIT_SUCCESS );
}
The parts of the code that I require are marked in red.
The final code will look like this:
/*
 * Argon2 reference source code package - reference C implementations
 *
 * Copyright 2015
 * Daniel Dinu, Dmitry Khovratovich, Jean-Philippe Aumasson, and Samuel Neves
 *
 * You may use this work under the terms of a Creative Commons CC0 1.0
 * License/Waiver or the Apache Public License 2.0, at your option. The terms of
 * these licenses can be found at:
 *
 * - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
 * - Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
 *
 * You should have received a copy of both of these licenses along with this
 * software. If not, they may be obtained at the above URLs.
 */

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#define BILLION 1000000000L;
#ifdef _MSC_VER
#include <intrin.h>
#endif

#include "argon2.h"

/*
static uint64_t rdtsc(void) {
#ifdef _MSC_VER
    return __rdtsc();
#else
#if defined(__amd64__) || defined(__x86_64__)
    uint64_t rax, rdx;
    __asm__ __volatile__("rdtsc" : "=a"(rax), "=d"(rdx) : :);
    return (rdx << 32) | rax;
#elif defined(__i386__) || defined(__i386) || defined(__X86__)
    uint64_t rax;
    __asm__ __volatile__("rdtsc" : "=A"(rax) : :);
    return rax;
#else
#error "Not implemented!"
#endif
#endif
}
*/

/*
 * Benchmarks Argon2 with salt length 16, password length 16, t_cost 3,
 * and different m_cost and threads
 */
static void benchmark() {
#define BENCH_OUTLEN 16
#define BENCH_INLEN 16
    const uint32_t inlen = BENCH_INLEN;
    const unsigned outlen = BENCH_OUTLEN;
    unsigned char out[BENCH_OUTLEN];
    unsigned char pwd_array[BENCH_INLEN];
    unsigned char salt_array[BENCH_INLEN];
#undef BENCH_INLEN
#undef BENCH_OUTLEN

    struct timespec start, stop;
    double accum;
    uint32_t t_cost = 3;
    uint32_t m_cost;
    uint32_t thread_test[4] = {1, 2, 4, 8};
    argon2_type types[3] = {Argon2_i, Argon2_d, Argon2_id};

    memset(pwd_array, 0, inlen);
    memset(salt_array, 1, inlen);

    for (m_cost = (uint32_t)1 << 10; m_cost <= (uint32_t)1 << 22; m_cost *= 2) {
        unsigned i;
        for (i = 0; i < 4; ++i) {
            double run_time = 0;
            uint32_t thread_n = thread_test[i];

            unsigned j;
            for (j = 0; j < 3; ++j) {
                /*clock_t start_time, stop_time;
                uint64_t start_cycles, stop_cycles;
                uint64_t delta;
                double mcycles;*/

                argon2_type type = types[j];
                /*start_time = clock();
                start_cycles = rdtsc();*/
                if( clock_gettime( CLOCK_REALTIME, &start) == -1 ) {
                    perror( "clock gettime" );
                    exit( EXIT_FAILURE );
                }
                else
                {
                    clock_gettime(CLOCK_REALTIME, &start);
                }

                argon2_hash(t_cost, m_cost, thread_n, pwd_array, inlen,
                            salt_array, inlen, out, outlen, NULL, 0, type,
                            ARGON2_VERSION_NUMBER);

                /*stop_cycles = rdtsc();
                stop_time = clock();*/

                /*delta = (stop_cycles - start_cycles) / (m_cost);
                mcycles = (double)(stop_cycles - start_cycles) / (1UL << 20);
                run_time += ((double)stop_time - start_time) / (CLOCKS_PER_SEC);*/

                if( clock_gettime( CLOCK_REALTIME, &stop) == -1 ) {
                    perror( "clock gettime" );
                    exit( EXIT_FAILURE );
                }
                else
                {
                    clock_gettime(CLOCK_REALTIME, &stop);
                }

                accum = ( (double)stop.tv_sec - start.tv_sec )
                      + ( (double)stop.tv_nsec - start.tv_nsec );
                double mcycles = accum / (1UL << 20);
                uint64_t delta = accum / (m_cost);

                printf("%s %d iterations %d MiB %d threads: %2.2f cpb %2.2f "
                       "Mcycles \n",
                       argon2_type2string(type, 1), t_cost, m_cost >> 10,
                       thread_n, (float)delta / 1024, mcycles);

                run_time = 0;
                run_time += accum / BILLION;
            }

            printf("%2.4f seconds\n\n", run_time);
        }
    }
}

int main() {
    benchmark();
    return ARGON2_OK;
}
NOTE: /* */ marks a comment block; the compiler ignores everything between the two markers.
I will now explain what the code does.
#include <time.h>
#include <unistd.h>
#define BILLION 1000000000L;
The #include lines pull in header files that declare library functions, so I do not have to write that code manually. These headers are already installed on the system together with the GNU C compiler (gcc) and the C library.
NOTE: The names of system headers are enclosed in angle brackets (<>).
The #define creates a macro: a name that the preprocessor replaces with the given value everywhere I use it later in the program.
NOTE: The format requires a name, then the value.
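Because #define is a plain text substitution, anything written after the name, including a trailing semicolon, is pasted into every place the macro is used. The sketch below is only an illustration: NSEC_PER_SEC and ns_to_seconds are hypothetical names that do not appear in bench.c. It defines the constant without a semicolon so that it can be used in the middle of an expression.

```c
/* Hypothetical name; the benchmark in this post calls its macro BILLION. */
#define NSEC_PER_SEC 1000000000L

/* Convert a nanosecond count to seconds (whole seconds plus the fraction).
 * The macro appears in the middle of the expression, which only compiles
 * because its definition does not end in a semicolon. */
static double ns_to_seconds(long long ns)
{
    return (double)(ns / NSEC_PER_SEC)
         + (double)(ns % NSEC_PER_SEC) / NSEC_PER_SEC;
}
```

With the semicolon kept in the definition, the macro still works at the very end of a statement (as in the example program above) but would be a syntax error in an expression like this one.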
The next set of code:
struct timespec start, stop;
double accum;
struct timespec is a structure type (a pre-made group of fields with a specific layout) declared in <time.h>. It holds a time as whole seconds (tv_sec) plus nanoseconds (tv_nsec). This line declares two variables of that type, start and stop, which will hold the two clock readings.
NOTE: The format of the declaration is the structure's name followed by the variable names. The line must also be closed with a semicolon (;), like most statements in C/C++.
double is a floating-point type. The variable accum will hold a value that is used later in the program.
NOTE: The format of a declaration is the type followed by the variable's name. Example: double is the variable type, accum is the variable's name.
Here is the next piece of code:
double run_time = 0;
This is another variable, which I assign a value of zero.
NOTE: This line sits inside a for loop so that run_time is reset for every thread count, before the main chunk of code mentioned above is timed.
The next piece of code:
if( clock_gettime( CLOCK_REALTIME, &start) == -1 ) {
perror( "clock gettime" );
exit( EXIT_FAILURE );
}
else
{
clock_gettime(CLOCK_REALTIME, &start);
}
The red highlighted part is from the example found here: (https://users.pja.edu.pl/~jms/qnx/help/watcom/clibref/qnx/clock_gettime.html). The code checks whether the program failed to read the system clock and, if so, reports an error to the user and exits.
I added an else branch that reads the clock again if the system clock is accessible. clock_gettime does not start a timer; it stores the current time in start, so the call inside the if condition has already recorded the start time and the second call simply overwrites it.
NOTE: An else branch always comes directly after an if statement.
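The check-and-read step can be written with a single call. This is a minimal sketch, not the code used in this post: read_clock_or_die is a hypothetical helper, and it uses CLOCK_MONOTONIC instead of CLOCK_REALTIME, which is my assumption (a monotonic clock is not changed when the wall-clock time is adjusted).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime under strict -std= modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Read the clock once; print an error and exit if the call fails.
 * Hypothetical helper, not part of bench.c. */
static struct timespec read_clock_or_die(void)
{
    struct timespec now;

    if (clock_gettime(CLOCK_MONOTONIC, &now) == -1) {
        perror("clock_gettime");
        exit(EXIT_FAILURE);
    }
    /* On success the time is already stored in now; no second call needed. */
    return now;
}
```

Calling it once before and once after the hash gives the two readings, for example `struct timespec start = read_clock_or_die();`.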
The next section of code:
if( clock_gettime( CLOCK_REALTIME, &stop) == -1 ) {
perror( "clock gettime" );
exit( EXIT_FAILURE );
}
else
{
clock_gettime(CLOCK_REALTIME, &stop);
}
I read the clock again after the main chunk of code is done running. This code is similar to the code that records the start time. The if statement checks for an error that occurs if the system clock cannot be read.
The next section of code:
accum = ( (double)stop.tv_sec - start.tv_sec ) + ( (double)stop.tv_nsec - start.tv_nsec );
The calculated run time is stored in the variable accum.
NOTE: This is similar to the code from the example (https://users.pja.edu.pl/~jms/qnx/help/watcom/clibref/qnx/clock_gettime.html) but I have removed the / BILLION at the end because I will need the number in the original form for the next lines of code. Without that division, the first term of the sum is in seconds and the second term is in nanoseconds.
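To keep both terms in the same unit, the seconds difference can be scaled to nanoseconds before the two parts are added. The following is a sketch under that assumption; elapsed_ns is a hypothetical helper and is not part of the code in this post.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* expose struct timespec under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

/* Elapsed time from start to stop, in nanoseconds (hypothetical helper).
 * The seconds difference is converted to nanoseconds first, so a negative
 * tv_nsec difference is absorbed by the seconds term. */
static int64_t elapsed_ns(struct timespec start, struct timespec stop)
{
    int64_t sec  = (int64_t)stop.tv_sec  - (int64_t)start.tv_sec;
    int64_t nsec = (int64_t)stop.tv_nsec - (int64_t)start.tv_nsec;

    return sec * INT64_C(1000000000) + nsec;
}
```

For a start time of 10 s + 900,000,000 ns and a stop time of 11 s + 100,000,000 ns, this gives 200,000,000 ns. Adding the raw differences without scaling would give 1 + (-800,000,000), a negative number.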
double mcycles = accum / (1UL << 20);
uint64_t delta = accum / (m_cost);
The variable mcycles takes the value of the variable accum and divides it by (1UL << 20). The explanation of 1UL is found here (https://stackoverflow.com/questions/14467173/bit-setting-in-ansi-c): it is the value 1 as an unsigned long integer. The << 20 is a left bit shift by twenty positions, so 1UL << 20 equals 2^20 = 1,048,576. As in basic algebra, the expression in brackets is evaluated first. In the original benchmark this variable reports the cycle count in millions (Mcycles).
The variable delta has the type uint64_t, an unsigned 64-bit integer. It is used for the cpb (cycles per byte) column: the timed value is divided by the memory cost m_cost, which is given in KiB, and the printf call then divides the result by 1024.
The m_cost variable is set by the outer for loop:
for (m_cost = (uint32_t)1 << 10; m_cost <= (uint32_t)1 << 22; m_cost *= 2)
NOTE: The variable type uint32_t is an unsigned 32-bit integer variable.
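The unit conversions described above can be checked in isolation. The helpers below are hypothetical names for illustration only. per_byte uses floating-point division throughout, which differs slightly from bench.c, where the first division is an integer division.

```c
#include <stdint.h>

/* m_cost is given in KiB; shifting right by 10 divides by 1024 to get MiB. */
static uint32_t kib_to_mib(uint32_t m_cost_kib)
{
    return m_cost_kib >> 10;
}

/* Scale a raw count the way bench.c does for its Mcycles column:
 * divide by 2^20 (1,048,576). */
static double to_millions(uint64_t count)
{
    return (double)count / (double)(UINT64_C(1) << 20);
}

/* Count per byte of memory: m_cost KiB is m_cost * 1024 bytes. */
static double per_byte(uint64_t count, uint32_t m_cost_kib)
{
    return (double)count / ((double)m_cost_kib * 1024.0);
}
```

The loop bounds (uint32_t)1 << 10 and (uint32_t)1 << 22 therefore correspond to 1 MiB and 4096 MiB.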
The next section of code:
run_time = 0;
run_time += accum / BILLION;
The run_time variable is set to 0 again because the GNU C compiler (gcc) kept complaining about the variable not being used. (This may be an issue later, because the reset happens right before every addition, so the total printed afterwards only contains the last of the three hashes. I will have to check it later.)
I then add the value of (accum divided by BILLION) to run_time. This is where the / BILLION from the example found here (https://users.pja.edu.pl/~jms/qnx/help/watcom/clibref/qnx/clock_gettime.html) is required. clock_gettime reports the sub-second part of the time in nanoseconds (tv_nsec), and there are 1,000,000,000 nanoseconds in one second. This means the equation should divide the timed result by one billion to get a number in seconds.
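A total over several runs can be kept by adding up the per-run times and converting once at the end. This is a sketch with a hypothetical helper, total_seconds, that assumes each run time is already available in nanoseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum per-run times given in nanoseconds and return the total in seconds.
 * Hypothetical helper, not part of bench.c. */
static double total_seconds(const int64_t *run_ns, size_t count)
{
    int64_t total_ns = 0;
    size_t i;

    for (i = 0; i < count; ++i) {
        total_ns += run_ns[i]; /* accumulate; the total is never reset here */
    }
    return (double)total_ns / 1e9; /* 1e9 nanoseconds per second */
}
```

Three runs of 500,000,000 ns, 250,000,000 ns and 250,000,000 ns add up to exactly 1 second.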
The final section of code is:
printf("%2.4f seconds\n\n", run_time);
This line of code outputs the calculated time value to the user. The %2.4f format prints the value with a minimum width of two characters and four digits after the decimal point.
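The effect of the format can be seen by writing the text into a buffer instead of the terminal. format_seconds is a hypothetical helper used only for this illustration.

```c
#include <stdio.h>

/* Format a time in seconds with the same conversion the benchmark uses.
 * Returns the number of characters written, as snprintf does. */
static int format_seconds(char *buf, size_t size, double seconds)
{
    return snprintf(buf, size, "%2.4f seconds", seconds);
}
```

A value of 1.5 becomes "1.5000 seconds". A value of 123.25 becomes "123.2500 seconds", because the width of 2 is only a minimum and wider numbers are not cut off.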
Result:
The result was strange, as some of the calculated times had negative values. The benchmark program also seemed to run really fast compared to the original benchmark program on an x86_64 processor system.
I built the program using the Makefile included in the argon2 package:
Building without optimizations
cc -std=c89 -O2 -Wall -g -Iinclude -Isrc -pthread src/argon2.c src/core.c src/blake2/blake2b.c src/thread.c src/encoding.c src/ref.c src/bench.c -o bench
NOTE: I have changed the optimization flag from -O3 to -O2.
The next test (on an x86_64 architecture; basic test of the original program)
The x86_64 system has this hardware:
Intel(R) Xeon(R) CPU E5-1630 v4 @ 3.70GHz
Four sticks of 8GB DIMM DDR4 RAM at 2.4 GHz (32 GB of RAM in total)
x86_64 Fedora 28 version of Linux Operating System
I will first show the x86_64 system results:
37312731171934821544Argon2i 3 iterations 1 MiB 1 threads: 10574.77 cpb 10574.77 Mcycles
1247866662622202070Argon2d 3 iterations 1 MiB 1 threads: 10573.72 cpb 10573.72 Mcycles
8121691903307694325Argon2id 3 iterations 1 MiB 1 threads: 10571.92 cpb 10571.92 Mcycles
0.0100 seconds

14977167733997818648Argon2i 3 iterations 1 MiB 2 threads: 10576.28 cpb 10576.28 Mcycles
2187773163388595072Argon2d 3 iterations 1 MiB 2 threads: 10572.17 cpb 10572.17 Mcycles
28735233341075268257Argon2id 3 iterations 1 MiB 2 threads: 10573.05 cpb 10573.05 Mcycles
0.0171 seconds

35601863931760751719Argon2i 3 iterations 1 MiB 4 threads: 10571.92 cpb 10571.93 Mcycles
42457421402446018348Argon2d 3 iterations 1 MiB 4 threads: 10571.65 cpb 10571.65 Mcycles
6359495663131182784Argon2id 3 iterations 1 MiB 4 threads: 10571.64 cpb 10571.64 Mcycles
0.0220 seconds

13211386193827312231Argon2i 3 iterations 1 MiB 8 threads: 10582.07 cpb 10582.07 Mcycles
2017380744228771863Argon2d 3 iterations 1 MiB 8 threads: 10582.25 cpb 10582.25 Mcycles
2713660177924949126Argon2id 3 iterations 1 MiB 8 threads: 10582.15 cpb 10582.15 Mcycles
0.0401 seconds

34098789171626307623Argon2i 3 iterations 2 MiB 1 threads: 5293.53 cpb 10587.05 Mcycles
41112087152325301561Argon2d 3 iterations 2 MiB 1 threads: 5292.41 cpb 10584.83 Mcycles
5152598753020665429Argon2id 3 iterations 2 MiB 1 threads: 5290.67 cpb 10581.34 Mcycles
0.0193 seconds

12106284303712976859Argon2i 3 iterations 2 MiB 2 threads: 5289.21 cpb 10578.43 Mcycles
1902890366109077386Argon2d 3 iterations 2 MiB 2 threads: 5288.64 cpb 10577.29 Mcycles
2593963985801686024Argon2id 3 iterations 2 MiB 2 threads: 5289.38 cpb 10578.75 Mcycles
0.0248 seconds

32865988571490376646Argon2i 3 iterations 2 MiB 4 threads: 5287.49 cpb 10574.99 Mcycles
39752915512178772514Argon2d 3 iterations 2 MiB 4 threads: 5287.35 cpb 10574.71 Mcycles
3687074192867256577Argon2id 3 iterations 2 MiB 4 threads: 5287.40 cpb 10574.80 Mcycles
0.0308 seconds

10572264843566742997Argon2i 3 iterations 2 MiB 8 threads: 5292.63 cpb 10585.26 Mcycles
17567059124266120428Argon2d 3 iterations 2 MiB 8 threads: 5292.58 cpb 10585.16 Mcycles
2456075227669645387Argon2id 3 iterations 2 MiB 8 threads: 5292.16 cpb 10584.33 Mcycles
0.0521 seconds

31545606251397912479Argon2i 3 iterations 4 MiB 1 threads: 2653.18 cpb 10612.73 Mcycles
38828398532114725197Argon2d 3 iterations 4 MiB 1 threads: 2650.45 cpb 10601.79 Mcycles
3047237062830488385Argon2id 3 iterations 4 MiB 1 threads: 2650.19 cpb 10600.76 Mcycles
0.0368 seconds

10204770293535003798Argon2i 3 iterations 4 MiB 2 threads: 2647.51 cpb 10590.04 Mcycles
17249317724238457938Argon2d 3 iterations 4 MiB 2 threads: 2647.27 cpb 10589.09 Mcycles
2428403684647788185Argon2id 3 iterations 4 MiB 2 threads: 2647.47 cpb 10589.87 Mcycles
0.0431 seconds

31327130301342861294Argon2i 3 iterations 4 MiB 4 threads: 2645.27 cpb 10581.06 Mcycles
38277567492037602970Argon2d 3 iterations 4 MiB 4 threads: 2645.19 cpb 10580.78 Mcycles
2276116172732562416Argon2id 3 iterations 4 MiB 4 threads: 2645.23 cpb 10580.91 Mcycles
0.0495 seconds

9225480733436003113Argon2i 3 iterations 4 MiB 8 threads: 2647.25 cpb 10589.02 Mcycles
16259668664138962378Argon2d 3 iterations 4 MiB 8 threads: 2647.14 cpb 10588.58 Mcycles
2328880090548145081Argon2id 3 iterations 4 MiB 8 threads: 2647.44 cpb 10589.76 Mcycles
0.0744 seconds

30330952961289279353Argon2i 3 iterations 8 MiB 1 threads: 1328.12 cpb 10624.97 Mcycles
37742017952023717309Argon2d 3 iterations 8 MiB 1 threads: 1327.33 cpb 10618.61 Mcycles
2136636212757009041Argon2id 3 iterations 8 MiB 1 threads: 1327.19 cpb 10617.52 Mcycles
0.0498 seconds

9469966913485517122Argon2i 3 iterations 8 MiB 2 threads: 1326.61 cpb 10612.92 Mcycles
16754634484211165330Argon2d 3 iterations 8 MiB 2 threads: 1326.28 cpb 10610.23 Mcycles
2401273148642647180Argon2id 3 iterations 8 MiB 2 threads: 1326.35 cpb 10610.84 Mcycles
0.0771 seconds

31275726791350457380Argon2i 3 iterations 8 MiB 4 threads: 1324.15 cpb 10593.21 Mcycles
38353773162057823887Argon2d 3 iterations 8 MiB 4 threads: 1324.10 cpb 10592.79 Mcycles
2478461132765912450Argon2id 3 iterations 8 MiB 4 threads: 1324.18 cpb 10593.42 Mcycles
0.0866 seconds

9558952213482208994Argon2i 3 iterations 8 MiB 8 threads: 1325.16 cpb 10601.28 Mcycles
16721510404198952967Argon2d 3 iterations 8 MiB 8 threads: 1325.22 cpb 10601.75 Mcycles
2388942706619242490Argon2id 3 iterations 8 MiB 8 threads: 1325.04 cpb 10600.28 Mcycles
0.1194 seconds

31042168541427536599Argon2i 3 iterations 16 MiB 1 threads: 668.06 cpb 10688.99 Mcycles
39124498882214906246Argon2d 3 iterations 16 MiB 1 threads: 666.82 cpb 10669.10 Mcycles
4048743402995991599Argon2id 3 iterations 16 MiB 1 threads: 666.44 cpb 10663.08 Mcycles
0.0951 seconds

11859953773762067761Argon2i 3 iterations 16 MiB 2 threads: 665.54 cpb 10648.73 Mcycles
1952051050242384679Argon2d 3 iterations 16 MiB 2 threads: 666.10 cpb 10657.54 Mcycles
27273213391008354924Argon2id 3 iterations 16 MiB 2 threads: 665.54 cpb 10648.67 Mcycles
0.1325 seconds

34933301581740323590Argon2i 3 iterations 16 MiB 4 threads: 663.51 cpb 10616.20 Mcycles
42252147552471078403Argon2d 3 iterations 16 MiB 4 threads: 663.45 cpb 10615.13 Mcycles
6610278363201576241Argon2id 3 iterations 16 MiB 4 threads: 663.43 cpb 10614.86 Mcycles
0.1505 seconds

13915654333937508899Argon2i 3 iterations 16 MiB 8 threads: 663.75 cpb 10620.00 Mcycles
2127460927377542831Argon2d 3 iterations 16 MiB 8 threads: 663.70 cpb 10619.15 Mcycles
28623010321111202089Argon2id 3 iterations 16 MiB 8 threads: 663.63 cpb 10618.02 Mcycles
0.1908 seconds

35961333192018513362Argon2i 3 iterations 32 MiB 1 threads: 336.98 cpb 10783.46 Mcycles
2084438022925688382Argon2d 3 iterations 32 MiB 1 threads: 336.98 cpb 10783.37 Mcycles
11156363583834542778Argon2id 3 iterations 32 MiB 1 threads: 337.03 cpb 10784.95 Mcycles
0.1887 seconds

2024511488406553503Argon2i 3 iterations 32 MiB 2 threads: 335.78 cpb 10745.00 Mcycles
28914570101263429122Argon2d 3 iterations 32 MiB 2 threads: 335.48 cpb 10735.39 Mcycles
37483569582110660894Argon2id 3 iterations 32 MiB 2 threads: 335.19 cpb 10726.17 Mcycles
0.2657 seconds

3006611252891444371Argon2i 3 iterations 32 MiB 4 threads: 333.21 cpb 10662.76 Mcycles
10814032083673982863Argon2d 3 iterations 32 MiB 4 threads: 333.26 cpb 10664.48 Mcycles
1863928371162819675Argon2id 3 iterations 32 MiB 4 threads: 333.30 cpb 10665.70 Mcycles
0.2917 seconds

2647894462945096885Argon2i 3 iterations 32 MiB 8 threads: 333.25 cpb 10664.09 Mcycles
34300370941726852693Argon2d 3 iterations 32 MiB 8 threads: 333.24 cpb 10663.72 Mcycles
42118400902507645479Argon2id 3 iterations 32 MiB 8 threads: 333.21 cpb 10662.75 Mcycles
0.3958 seconds

6975984133638080775Argon2i 3 iterations 64 MiB 1 threads: 171.82 cpb 10996.26 Mcycles
1828026929462843471Argon2d 3 iterations 64 MiB 1 threads: 171.66 cpb 10986.06 Mcycles
29477903061591876255Argon2id 3 iterations 64 MiB 1 threads: 171.79 cpb 10994.90 Mcycles
0.3656 seconds

40768340232642930179Argon2i 3 iterations 64 MiB 2 threads: 170.63 cpb 10920.52 Mcycles
8329564143653322544Argon2d 3 iterations 64 MiB 2 threads: 170.03 cpb 10881.71 Mcycles
1843354044359643162Argon2id 3 iterations 64 MiB 2 threads: 169.89 cpb 10873.02 Mcycles
0.4956 seconds

28446343191230071866Argon2i 3 iterations 64 MiB 4 threads: 167.94 cpb 10748.23 Mcycles
37150261572105910736Argon2d 3 iterations 64 MiB 4 threads: 168.02 cpb 10753.43 Mcycles
2958699032978678975Argon2id 3 iterations 64 MiB 4 threads: 167.98 cpb 10750.53 Mcycles
0.5300 seconds

11686546543836486008Argon2i 3 iterations 64 MiB 8 threads: 167.75 cpb 10736.24 Mcycles
2026438434392927139Argon2d 3 iterations 64 MiB 8 threads: 167.66 cpb 10730.16 Mcycles
28778995791260930582Argon2id 3 iterations 64 MiB 8 threads: 167.91 cpb 10745.94 Mcycles
0.7020 seconds
Here are the results of the new code change:
Argon2i 3 iterations 1 MiB 1 threads: 5.53 cpb 5.53 Mcycles
Argon2d 3 iterations 1 MiB 1 threads: 5.14 cpb 5.15 Mcycles
Argon2id 3 iterations 1 MiB 1 threads: 4.63 cpb 4.63 Mcycles
0.0049 seconds

Argon2i 3 iterations 1 MiB 2 threads: 3.57 cpb 3.57 Mcycles
Argon2d 3 iterations 1 MiB 2 threads: 3.23 cpb 3.23 Mcycles
Argon2id 3 iterations 1 MiB 2 threads: 3.29 cpb 3.30 Mcycles
0.0035 seconds

Argon2i 3 iterations 1 MiB 4 threads: 2.62 cpb 2.62 Mcycles
Argon2d 3 iterations 1 MiB 4 threads: 2.53 cpb 2.53 Mcycles
Argon2id 3 iterations 1 MiB 4 threads: 2.59 cpb 2.59 Mcycles
0.0027 seconds

Argon2i 3 iterations 1 MiB 8 threads: 4.20 cpb 4.20 Mcycles
Argon2d 3 iterations 1 MiB 8 threads: 4.14 cpb 4.14 Mcycles
Argon2id 3 iterations 1 MiB 8 threads: 4.41 cpb 4.41 Mcycles
0.0046 seconds

Argon2i 3 iterations 2 MiB 1 threads: 5.43 cpb 10.86 Mcycles
Argon2d 3 iterations 2 MiB 1 threads: 5.20 cpb 10.40 Mcycles
Argon2id 3 iterations 2 MiB 1 threads: 4.67 cpb 9.33 Mcycles
0.0098 seconds

Argon2i 3 iterations 2 MiB 2 threads: 2.93 cpb 5.85 Mcycles
Argon2d 3 iterations 2 MiB 2 threads: 2.84 cpb 5.69 Mcycles
Argon2id 3 iterations 2 MiB 2 threads: 2.86 cpb 5.72 Mcycles
0.0060 seconds

Argon2i 3 iterations 2 MiB 4 threads: 1.96 cpb 3.91 Mcycles
Argon2d 3 iterations 2 MiB 4 threads: 1.94 cpb 3.89 Mcycles
Argon2id 3 iterations 2 MiB 4 threads: 1.95 cpb 3.90 Mcycles
0.0041 seconds

Argon2i 3 iterations 2 MiB 8 threads: 2.56 cpb 5.12 Mcycles
Argon2d 3 iterations 2 MiB 8 threads: 2.51 cpb 5.01 Mcycles
Argon2id 3 iterations 2 MiB 8 threads: 2.53 cpb 5.06 Mcycles
0.0053 seconds

Argon2i 3 iterations 4 MiB 1 threads: 5.52 cpb 22.10 Mcycles
Argon2d 3 iterations 4 MiB 1 threads: 5.00 cpb 19.98 Mcycles
Argon2id 3 iterations 4 MiB 1 threads: 4.70 cpb 18.79 Mcycles
0.0197 seconds

Argon2i 3 iterations 4 MiB 2 threads: 2.78 cpb 11.11 Mcycles
Argon2d 3 iterations 4 MiB 2 threads: 2.68 cpb 10.74 Mcycles
Argon2id 3 iterations 4 MiB 2 threads: 2.70 cpb 10.79 Mcycles
0.0113 seconds

Argon2i 3 iterations 4 MiB 4 threads: 1.66 cpb 6.63 Mcycles
Argon2d 3 iterations 4 MiB 4 threads: 1.64 cpb 6.56 Mcycles
Argon2id 3 iterations 4 MiB 4 threads: 1.65 cpb 6.61 Mcycles
0.0069 seconds

Argon2i 3 iterations 4 MiB 8 threads: 2.37 cpb 9.47 Mcycles
Argon2d 3 iterations 4 MiB 8 threads: 2.24 cpb 8.95 Mcycles
Argon2id 3 iterations 4 MiB 8 threads: 1.89 cpb 7.57 Mcycles
0.0079 seconds

Argon2i 3 iterations 8 MiB 1 threads: 5.78 cpb 46.22 Mcycles
Argon2d 3 iterations 8 MiB 1 threads: 5.29 cpb 42.36 Mcycles
Argon2id 3 iterations 8 MiB 1 threads: 4.89 cpb 39.12 Mcycles
0.0410 seconds

Argon2i 3 iterations 8 MiB 2 threads: 2.70 cpb 21.64 Mcycles
Argon2d 3 iterations 8 MiB 2 threads: 2.67 cpb 21.32 Mcycles
Argon2id 3 iterations 8 MiB 2 threads: 0.00 cpb -932.22 Mcycles
-0.9775 seconds

Argon2i 3 iterations 8 MiB 4 threads: 1.53 cpb 12.27 Mcycles
Argon2d 3 iterations 8 MiB 4 threads: 1.52 cpb 12.14 Mcycles
Argon2id 3 iterations 8 MiB 4 threads: 1.52 cpb 12.14 Mcycles
0.0127 seconds

Argon2i 3 iterations 8 MiB 8 threads: 1.84 cpb 14.72 Mcycles
Argon2d 3 iterations 8 MiB 8 threads: 1.77 cpb 14.19 Mcycles
Argon2id 3 iterations 8 MiB 8 threads: 1.74 cpb 13.91 Mcycles
0.0146 seconds

Argon2i 3 iterations 16 MiB 1 threads: 5.97 cpb 95.55 Mcycles
Argon2d 3 iterations 16 MiB 1 threads: 5.50 cpb 88.01 Mcycles
Argon2id 3 iterations 16 MiB 1 threads: 5.21 cpb 83.43 Mcycles
0.0875 seconds

Argon2i 3 iterations 16 MiB 2 threads: 2.87 cpb 45.87 Mcycles
Argon2d 3 iterations 16 MiB 2 threads: 2.83 cpb 45.24 Mcycles
Argon2id 3 iterations 16 MiB 2 threads: 2.84 cpb 45.39 Mcycles
0.0476 seconds

Argon2i 3 iterations 16 MiB 4 threads: 1.58 cpb 25.29 Mcycles
Argon2d 3 iterations 16 MiB 4 threads: 1.56 cpb 24.91 Mcycles
Argon2id 3 iterations 16 MiB 4 threads: 1.56 cpb 24.98 Mcycles
0.0262 seconds

Argon2i 3 iterations 16 MiB 8 threads: 1.78 cpb 28.54 Mcycles
Argon2d 3 iterations 16 MiB 8 threads: 1.78 cpb 28.55 Mcycles
Argon2id 3 iterations 16 MiB 8 threads: 1.77 cpb 28.28 Mcycles
0.0297 seconds

Argon2i 3 iterations 32 MiB 1 threads: 6.18 cpb 197.69 Mcycles
Argon2d 3 iterations 32 MiB 1 threads: 0.00 cpb -758.56 Mcycles
Argon2id 3 iterations 32 MiB 1 threads: 6.12 cpb 195.79 Mcycles
0.2053 seconds

Argon2i 3 iterations 32 MiB 2 threads: 3.38 cpb 108.24 Mcycles
Argon2d 3 iterations 32 MiB 2 threads: 3.34 cpb 106.87 Mcycles
Argon2id 3 iterations 32 MiB 2 threads: 3.36 cpb 107.44 Mcycles
0.1127 seconds

Argon2i 3 iterations 32 MiB 4 threads: 1.92 cpb 61.53 Mcycles
Argon2d 3 iterations 32 MiB 4 threads: 1.89 cpb 60.38 Mcycles
Argon2id 3 iterations 32 MiB 4 threads: 1.89 cpb 60.60 Mcycles
0.0635 seconds

Argon2i 3 iterations 32 MiB 8 threads: 1.85 cpb 59.29 Mcycles
Argon2d 3 iterations 32 MiB 8 threads: 1.96 cpb 62.65 Mcycles
Argon2id 3 iterations 32 MiB 8 threads: 0.00 cpb -893.19 Mcycles
-0.9366 seconds

Argon2i 3 iterations 64 MiB 1 threads: 6.29 cpb 402.50 Mcycles
Argon2d 3 iterations 64 MiB 1 threads: 6.22 cpb 397.86 Mcycles
Argon2id 3 iterations 64 MiB 1 threads: 0.00 cpb -554.57 Mcycles
-0.5815 seconds

Argon2i 3 iterations 64 MiB 2 threads: 3.45 cpb 220.73 Mcycles
Argon2d 3 iterations 64 MiB 2 threads: 3.41 cpb 218.22 Mcycles
Argon2id 3 iterations 64 MiB 2 threads: 3.42 cpb 218.81 Mcycles
0.2294 seconds

Argon2i 3 iterations 64 MiB 4 threads: 0.00 cpb -830.95 Mcycles
Argon2d 3 iterations 64 MiB 4 threads: 1.90 cpb 121.72 Mcycles
Argon2id 3 iterations 64 MiB 4 threads: 1.90 cpb 121.88 Mcycles
0.1278 seconds

Argon2i 3 iterations 64 MiB 8 threads: 1.93 cpb 123.78 Mcycles
Argon2d 3 iterations 64 MiB 8 threads: 1.97 cpb 126.37 Mcycles
Argon2id 3 iterations 64 MiB 8 threads: 1.81 cpb 115.84 Mcycles
0.1215 seconds
The AArch64 program (with my code changes) appears to run extremely fast, and it also produces the negative time values mentioned above. A negative time value does not make sense, as time is always running and moving forward. The original program (x86_64 only) had a noticeable delay before outputting the results. The difference can also be seen in the Mcycles (millions of cycles) and cpb (cycles per byte) values, although in my version these two columns are calculated from the clock_gettime reading instead of a cycle count, so they are not directly comparable.
(I will continue the testing in Project: Part3, Progress 3)