Project: Part2 – Initial build testing on argon2 package using C and Assembler language(Progress 5)

This post follows the previous blog, “Project: Part2 – Initial build testing on argon2 package using C and Assembler language (Progress 4)”.

I am testing against the following criteria. In the original post, crossed-out points were completed in previous progress blogs and red highlighting marks the current topic(s):

  • Check how much inline assembler code already exists
  • Check how many dedicated/separate assembler files exist in the package (File Extension: s or S)
  • Check how many files use the C programming language (File Extension: c)
  • Use a profiling tool to check how optimized the program is in the current build package (Tools such as: gprof, stap, etc.)
  • Build the package and check the default results by testing the build against a password file. The file will be made using a Microsoft Excel formula.
  • Check that the results are relatively consistent, measured by program compile time, program run time, and program file size. (This blog’s topic)
  • Introduce a minor or major change that will optimize or reduce performance of the original program. This is done by changing the character(s) of the password file generated by the Microsoft Excel formula.
  • Compare the results of the changes and the original program results.

Building/Testing the original argon2 package

These are the system specifications:

The argon2 package is tested on an AArch64 Fedora 28 Linux operating system. The system has 8 CPU cores (Cortex-A57 model).

My current directory is /home/username/projects/phc-winner-argon2/.

I will be testing the argon2 program using the “Makefile” that ships with the package to check the following:

  • Program compile time
  • Program run time
  • Program file size

I will be using this command to build the program:

time make argon2

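Note: The time command reports three values: real (the elapsed wall-clock time), user (the CPU time spent in user mode), and sys (the CPU time spent in the kernel on behalf of the process). The real time is not simply user plus sys: it also includes time spent waiting, and on a multi-core system it can be lower than their sum.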

I will be using this command to check the file size:

ls -l

I will also need to remove the existing argon2 program. make only rebuilds the argon2 program when it is missing or older than its source files, so removing it forces a fresh build. This is the command:

rm argon2

I will now begin testing.

Test 1

Start a new build by removing any existing argon2 program.

rm argon2

Build and time the argon2 program:

time make argon2

This is the result:

Building without optimizations
cc -std=c89 -O3 -Wall -g -Iinclude -Isrc -pthread src/argon2.c src/core.c src/blake2/blake2b.c src/thread.c src/encoding.c src/ref.c src/run.c -o argon2

real 0m2.506s
user 0m2.336s
sys 0m0.163s
  • real 0m2.506s is the elapsed wall-clock time of the build, from start to finish.
  • user 0m2.336s is the CPU time the build spent executing in user mode (mostly the compiler itself).
  • sys 0m0.163s is the CPU time the kernel spent working on behalf of the build (system calls such as file reads and writes).
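
As a quick check of how the three values relate, the sketch below adds user and sys from Test 1 and compares the sum with real. The variable names are only for illustration; the numbers are copied from the output above.

```shell
real=2.506   # wall-clock seconds from Test 1
user=2.336   # CPU seconds spent in user mode
sys=0.163    # CPU seconds spent in the kernel

# Shell arithmetic is integer-only, so awk does the decimal maths.
cpu=$(awk -v u="$user" -v s="$sys" 'BEGIN { printf "%.3f", u + s }')
waiting=$(awk -v r="$real" -v c="$cpu" 'BEGIN { printf "%.3f", r - c }')

echo "CPU time (user + sys): ${cpu}s"
echo "real minus CPU time:   ${waiting}s"
```

The small remainder is time during which the build was not running on a CPU, for example while waiting on disk I/O.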

File size: 252944 bytes (about 253 kB)

Note: Notice the -O3 flag in the compile command. In the GNU gcc compiler this is normally the highest of the standard optimization levels.
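
The size column printed by ls -l is a byte count. A minimal sketch to confirm the unit, using a scratch file of known length (the file and variable names are made up for the demonstration):

```shell
# Scratch file containing exactly 6 bytes.
tmp=$(mktemp)
printf 'argon2' > "$tmp"

ls_size=$(ls -l "$tmp" | awk '{ print $5 }')   # fifth column of ls -l is the size
wc_size=$(( $(wc -c < "$tmp") ))               # byte count reported by wc
rm -f "$tmp"
echo "ls -l: $ls_size, wc -c: $wc_size"

# The same unit applied to the Test 1 binary:
size_bytes=252944
size_bits=$((size_bytes * 8))
echo "$size_bytes bytes = $size_bits bits"
```

Both commands report 6 for the scratch file, so the number shown for argon2 is its length in bytes.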

Test 2

In this test I will change the -O3 used in Test 1 to a lower optimization level, -O2.

I will need to edit the “Makefile” using this command:

vi Makefile

Move the cursor to the -O3 and replace it with -O2 (for example, press the Insert key, type -O2, and delete the old -O3). Press the ESC key, then type :x to save and exit. This changes the “Makefile” so I can test whether a lower level of optimization has an impact on the argon2 program.
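
The same edit can be made without opening an editor. The sketch below runs sed on a scratch file rather than on the real Makefile; the CFLAGS line is an assumption modeled on the flags in the compile command from Test 1, so check the actual line in the Makefile before adapting it.

```shell
# Scratch file standing in for the Makefile (assumed content, see above).
demo=$(mktemp)
printf 'CFLAGS += -std=c89 -O3 -Wall -g -Iinclude -Isrc\n' > "$demo"

# Replace the first -O3 on each line with -O2, then swap the edited copy in.
sed 's/-O3/-O2/' "$demo" > "$demo.new" && mv "$demo.new" "$demo"

result=$(cat "$demo")
echo "$result"
rm -f "$demo"
```

Writing to a second file and moving it into place avoids relying on sed -i, whose syntax differs between GNU and BSD versions.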

Start a new build by removing any existing argon2 program.

rm argon2

Build and time the argon2 program:

time make argon2

This is the result:

Building without optimizations
cc -std=c89 -O2 -Wall -g -Iinclude -Isrc -pthread src/argon2.c src/core.c src/blake2/blake2b.c src/thread.c src/encoding.c src/ref.c src/run.c -o argon2

real 0m1.783s
user 0m1.566s
sys 0m0.214s

File size: 206464 bytes (about 206 kB)

Note: The real and user times are lower than in Test 1, while the sys time is slightly higher (0.214s against 0.163s). The file size is also smaller than in Test 1. This test was performed in the middle of the night, so few users should have been on the system. I can check the number of logged-in users with the command w (Reference: https://www.cyberciti.biz/faq/unix-linux-list-current-logged-in-users/).

Output:

01:38:13 up 74 days, 4:56, 1 user, load average: 0.06, 0.03, 0.01
USER TTY LOGIN@ IDLE JCPU PCPU WHAT
username pts/0 23:01 0.00s 0.14s 0.02s w
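
To put the difference between the Test 1 (-O3) and Test 2 (-O2) builds into numbers, here is a small sketch that computes the percentage drop in build time and file size. pct_drop is a helper name I made up; the inputs are copied from the two test results.

```shell
# pct_drop OLD NEW: percentage by which NEW is lower than OLD, one decimal place
pct_drop() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f", (old - new) / old * 100 }'
}

real_drop=$(pct_drop 2.506 1.783)     # real build time in seconds
size_drop=$(pct_drop 252944 206464)   # file size from ls -l

echo "build time (real): ${real_drop}% lower with -O2"
echo "file size:         ${size_drop}% smaller with -O2"
```

These are single measurements, so the percentages are only a rough indication.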

I will now check the argon2 program with the current setting of -O2 optimization level.

The command to test:

sudo time operf echo -n "Ch(329nE" | ./argon2 somesalt -t 2 -m 16 -p 4 -l 24

Result:

operf: Profiler started

Profiling done.
0.05user 0.12system 0:00.25elapsed 70%CPU (0avgtext+0avgdata 5120maxresident)k
0inputs+40outputs (3major+2052minor)pagefaults 0swaps
Type: Argon2i
Iterations: 2
Memory: 65536 KiB
Parallelism: 4
Hash: 6271154a35ed64acc752368ca97460c0e295a404d0ba0d2a
Encoded: $argon2i$v=19$m=65536,t=2,p=4$c29tZXNhbHQ$YnEVSjXtZKzHUjaMqXRgwOKVpATQug0q
0.320 seconds
Verification ok
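
The output fields can be checked against the command-line options. The sketch below is only arithmetic on values copied from the result: -m 16 is a base-2 exponent in KiB, -l 24 is the hash length in bytes (two hex digits per byte), and the salt appears in the encoded string as base64 without padding. It assumes a base64 utility that accepts -d, as the GNU coreutils version does.

```shell
mem_kib=$((1 << 16))             # -m 16 means 2^16 KiB
mem_mib=$((mem_kib / 1024))

hash=6271154a35ed64acc752368ca97460c0e295a404d0ba0d2a
hash_bytes=$(( ${#hash} / 2 ))   # two hex digits per byte

# Re-add the '=' padding that the encoded string omits, then decode the salt.
salt=$(printf '%s=' 'c29tZXNhbHQ' | base64 -d)

echo "memory: $mem_kib KiB ($mem_mib MiB), hash: $hash_bytes bytes, salt: $salt"
```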

Detailed Report(using opreport -d):

Using /home/username/projects/phc-winner-argon2/oprofile_data/samples/ for samples directory.
CPU: ARM Cortex-A57, speed 750 MHz (estimated)
Counted CPU_CYCLES events (Cycle) with a unit mask of 0x00 (No unit mask) count 100000
vma samples % image name symbol name
00008ce0 2 40.0000 ld-2.27.so do_lookup_x
00008dc0 1 50.0000
00008e14 1 50.0000
000096c0 1 20.0000 ld-2.27.so _dl_lookup_symbol_x
000096f4 1 100.000
00109ea8 1 20.0000 libc-2.27.so _dl_addr
00109f78 1 100.000
000c2d10 1 20.0000 libc-2.27.so write
000c2d3c 1 100.000

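Note: The program runs 4 threads because of the -p 4 option, which is independent of the compiler optimization level. Also, in the test command only echo is placed under sudo time operf; ./argon2 sits on the other side of the pipe. That is why the report contains only a few samples from ld-2.27.so and libc-2.27.so and none from argon2 itself. The baseline tests are in the previous blog (Project: Part2 – Initial build testing on argon2 package using C and Assembler language(Progress 4)).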
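
One thing to keep in mind when reading these numbers is how the shell parses the test command: words placed before a command apply only to that command, not to the rest of the pipeline. The sketch below illustrates this with env standing in for sudo time operf (MARK is a made-up variable name). The reordered operf command in the final comment is an untested assumption on my part.

```shell
unset MARK

# env sets MARK only for the command it wraps (echo), in the same way that
# `sudo time operf` wraps only the echo in the test command.
seen=$(env MARK=left echo -n "Ch(329nE" | sh -c 'cat > /dev/null; echo "MARK=${MARK:-unset}"')
echo "right-hand side of the pipe sees: $seen"

# Possible reordering so that the profiler wraps argon2 instead (not run here):
#   echo -n "Ch(329nE" | sudo operf ./argon2 somesalt -t 2 -m 16 -p 4 -l 24
```

The command after the pipe never sees MARK, just as ./argon2 is never seen by time or operf in the original command.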

Test 3

I will change the optimization level back to -O3.

I will need to edit the “Makefile” using this command:

vi Makefile

Move the cursor to the -O2 and replace it with -O3 (for example, press the Insert key, type -O3, and delete the old -O2). Press the ESC key, then type :x to save and exit. This restores the “Makefile” to the original optimization level so I can check whether the Test 1 results are reproducible.

Start a new build by removing any existing argon2 program.

rm argon2

Build and time the argon2 program:

time make argon2

This is the result:

Building without optimizations
cc -std=c89 -O3 -Wall -g -Iinclude -Isrc -pthread src/argon2.c src/core.c src/blake2/blake2b.c src/thread.c src/encoding.c src/ref.c src/run.c -o argon2

real 0m2.498s
user 0m2.307s
sys 0m0.189s

File size: 252944 bytes (about 253 kB)

Note: The build time and file size match Test 1, and the time is about the same as the results from the previous blog (Project: Part2 – Initial build testing on argon2 package using C and Assembler language(Progress 4)).

Test 4

This test checks what happens if I add another character to the password generated in Microsoft Excel, “Ch(329nE”. I will add a 0 to the end of the generated password.

This is the new password that I will use for testing: “Ch(329nE0”.

The command to test:

sudo time operf echo -n "Ch(329nE0" | ./argon2 somesalt -t 2 -m 16 -p 4 -l 24

Result:

operf: Profiler started

Profiling done.
0.04user 0.12system 0:00.23elapsed 75%CPU (0avgtext+0avgdata 5120maxresident)k
0inputs+40outputs (3major+2046minor)pagefaults 0swaps
Type: Argon2i
Iterations: 2
Memory: 65536 KiB
Parallelism: 4
Hash: 9a7a5b4a3055934595908ba9bad1e959c1f5cc7fbdf37394
Encoded: $argon2i$v=19$m=65536,t=2,p=4$c29tZXNhbHQ$mnpbSjBVk0WVkIuputHpWcH1zH+983OU
0.297 seconds
Verification ok

For comparison, this is the initial build result (with the original password “Ch(329nE”):

operf: Profiler started

Profiling done.
0.03user 0.14system 0:00.21elapsed 82%CPU (0avgtext+0avgdata 5196maxresident)k
0inputs+48outputs (3major+2051minor)pagefaults 0swaps 
Type: Argon2i 
Iterations: 2 
Memory: 65536 KiB 
Parallelism: 4 
Hash: 6271154a35ed64acc752368ca97460c0e295a404d0ba0d2a 
Encoded: $argon2i$v=19$m=65536,t=2,p=4$c29tZXNhbHQ$YnEVSjXtZKzHUjaMqXRgwOKVpATQug0q 
0.296 seconds 
Verification ok

Test 5

I ran one more timing test of the argon2 program with the original password. Here is the result:

Profiling done.
0.06user 0.11system 0:00.21elapsed 84%CPU (0avgtext+0avgdata 5172maxresident)k
0inputs+40outputs (2major+2061minor)pagefaults 0swaps
Type: Argon2i
Iterations: 2
Memory: 65536 KiB
Parallelism: 4
Hash: 6271154a35ed64acc752368ca97460c0e295a404d0ba0d2a
Encoded: $argon2i$v=19$m=65536,t=2,p=4$c29tZXNhbHQ$YnEVSjXtZKzHUjaMqXRgwOKVpATQug0q
0.297 seconds
Verification ok

Test 6

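I noticed that the argon2 package's default testing program, “smoke test”, has no code path for the AArch64 architecture. Its rdtsc helper only handles MSVC, x86_64, and 32-bit x86; every other architecture reaches the #error directive.

It will therefore not compile on an AArch64 system.

Here is the relevant section of code (it is C source, not part of the “Makefile” itself):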

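static uint64_t rdtsc(void) {
#ifdef _MSC_VER
    /* MSVC: use the compiler intrinsic */
    return __rdtsc();
#else
#if defined(__amd64__) || defined(__x86_64__)
    /* x86_64: RDTSC returns the low half in EAX and the high half in EDX */
    uint64_t rax, rdx;
    __asm__ __volatile__("rdtsc" : "=a"(rax), "=d"(rdx) : :);
    return (rdx << 32) | rax;
#elif defined(__i386__) || defined(__i386) || defined(__X86__)
    /* 32-bit x86: the "=A" constraint reads the EDX:EAX pair as one value */
    uint64_t rax;
    __asm__ __volatile__("rdtsc" : "=A"(rax) : :);
    return rax;
#else
    /* Any other architecture, including AArch64, stops the build here */
#error "Not implemented!"
#endif
#endif
}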

This is the command to build the testing program of argon2:

make build

The testing program will run the test with randomly generated input and various settings until the user stops the program using CTRL+C.

Here is a sample output (on an x86_64 architecture system):

Argon2i 3 iterations 64 MiB 2 threads: 4.67 cpb 299.20 Mcycles
Argon2d 3 iterations 64 MiB 2 threads: 4.40 cpb 281.41 Mcycles
Argon2id 3 iterations 64 MiB 2 threads: 4.35 cpb 278.59 Mcycles
0.4259 seconds

Each line of output reports the Argon2 type, the number of iterations, the memory used (in MiB), the number of threads, the cost in cycles per byte (cpb), and the number of CPU cycles (in Mcycles). The final line is the run time in seconds.

I also found that AArch64 (ARMv8) systems do not have the MCR and MRC instructions that 32-bit ARM code uses to read a counter similar to rdtsc (a time stamp counter), so that approach cannot be carried over directly. I found this on this site: https://stackoverflow.com/questions/32374599/mcr-and-mrc-does-not-exist-on-aarch64.

Another method is to enable the performance counters on the AArch64 system for use from assembler code. Here is a link that explains it: https://stackoverflow.com/questions/34590846/enabling-performance-monitoring-register-to-user-access-mode. Enabling the performance counters requires privileged mode (administrative rights).
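
The cpb column in the sample can be reproduced from the other two numbers. The sketch below assumes that an Mcycle here means 2^20 cycles and a MiB means 2^20 bytes, so cycles per byte reduces to Mcycles divided by MiB. That assumption is mine, but it agrees with all three sample lines to within rounding.

```shell
# cpb MCYCLES MIB: cycles per byte under the assumption stated above
cpb() {
    awk -v mc="$1" -v mib="$2" 'BEGIN { printf "%.3f", mc / mib }'
}

cpb_i=$(cpb 299.20 64)    # Argon2i line
cpb_d=$(cpb 281.41 64)    # Argon2d line
cpb_id=$(cpb 278.59 64)   # Argon2id line

echo "Argon2i: $cpb_i  Argon2d: $cpb_d  Argon2id: $cpb_id"
```

The three results line up with the reported 4.67, 4.40, and 4.35 cpb.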

Conclusion

Comparing Test 4 with the initial build result shows high CPU usage, 75-85%, over about 0.3 seconds. That is a lot of CPU for hashing a single password, but the argon2 program finishes quickly and does not hold system resources continuously. The GNU gcc compiler is also already running at the -O3 optimization level, and the “Makefile” already has settings for Linux, Darwin, CYGWIN, MINGW, MSYS, and SunOS. The next step is to check the source for any unnecessary code.

(I will continue the next step in Project Part 3, Progress 1)
