Project: Part2 – Initial build testing on argon2 package using C and Assembler language(Progress 4)

This post follows my previous blog, “Project: Part2 – Initial build testing on argon2 package using C and Assembler language(Progress 3)”.

I will be testing using the following criteria (The crossed out points are completed in previous progress blogs):

  • Check how much inline assembler code already exists
  • Check how many dedicated/separate assembler files exist in the package (File Extension: s or S)
  • Check how many files use the C programming language (File Extension: c)
  • Use a profiling tool to check how optimized the program is in the current build package (Tools such as: gprof, stap, etc.) (This blog’s topic)
  • Build the package and check the default results by testing the build against a password file. The file will be made using a Microsoft Excel formula.
  • Check that the results are relatively consistent. (This will be from program compile time, program run time, and the program file size)
  • Introduce a minor or major change that will optimize or reduce performance of the original program. This is done by changing the character(s) of the password file generated by the Microsoft Excel formula.
  • Compare the results of the changes and the original program results.
  • Building/ Testing the original argon2 package

This post covers using a profiling tool to evaluate the optimization/performance level of the argon2 package on an AArch64 Fedora 28 Linux operating system. The system has 8 CPU cores (Cortex-A57 model).

I will be using the profiling tool named oprofile. This tool is open source and comes with a manual. The oprofile site is at the following link: http://oprofile.sourceforge.net/news/.

I can access the manual using this command in Linux:

man operf

oprofile is a tool that tracks the performance of the system and generates a report.

The basic command template for oprofile is:

operf [ options ] [ --system-wide | --pid <pid> | [ command [ args ] ] ]

Here are some of the options for oprofile:

--system-wide makes operf sample the performance of the entire system. This option requires elevated (root) permissions before the software will run. It is also recommended to run the command from the /root directory or one of its subdirectories, so that the sampled data is not stored in a location that regular users can access.

--pid <pid> is for sampling a process that is already running, identified by its process id (pid). Every program running on the system uses CPU cycles, and each one runs as a process with its own pid. With this option, operf keeps sampling the process until it is stopped by the user, either by pressing CTRL+C or by sending operf an interrupt signal with the following command:

kill -SIGINT <operf-PID>

command [args] is simply any command, with its arguments, that you want to sample the performance of.

I will perform the profiling in my current directory as I do not want to save the profiling sampled data in the root directory.

My current directory is:

/home/username/projects/phc-winner-argon2/

I will be running the command with these requirements:

  • Run as a root user
  • Use the Microsoft Excel generated password (from my “Project: Part2 – Initial build testing on argon2 package using C and Assembler language(Progress 3)” blog): Ch(329nE
  • Base it on the following argon2 command, replacing "password" with the generated password:

echo -n "password" | ./argon2 somesalt -t 2 -m 16 -p 4 -l 24

This will be the command to run oprofile:

 sudo operf echo -n "Ch(329nE" | ./argon2 somesalt -t 2 -m 16 -p 4 -l 24

Here is an explanation of what the command does:

  • sudo runs the command as the root user, with elevated privileges.
  • operf is the oprofile command. It profiles the command that follows it, which here is echo.
  • echo writes the double-quoted string that follows it to standard output.
  • -n is the echo option that suppresses the trailing newline.
  • “Ch(329nE” is the Microsoft Excel generated password from my “Project: Part2 – Initial build testing on argon2 package using C and Assembler language(Progress 3)” blog.
  • | is the pipe operator. It sends the standard output of the command on its left to the standard input of the command on its right. The shell splits the line at the pipe before sudo and operf start, so they apply only to echo on the left side.
  • ./argon2 runs the built argon2 program in my current directory, /home/username/projects/phc-winner-argon2/.
  • somesalt is used as the salt. The definition that I found for salt is from Wikipedia (https://en.wikipedia.org/wiki/Argon2). A salt is extra data that is hashed together with the password, so that the same password produces a different hash when the salt differs.
  • -t 2 sets the number of iterations to 2.
  • -m 16 sets the memory usage to 2^16 KiB (65536 KiB).
  • -p 4 sets the parallelism to 4, which is the number of threads the process will use.
  • -l 24 sets the hash output length to 24 bytes.
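
How the shell splits this command line decides what gets profiled. The sketch below uses a hypothetical stand-in function named wrap (it is not part of oprofile) in place of sudo operf, to show that a prefix placed in front of a pipeline only wraps the command on the left side of the pipe. The operf line in the comment is an assumption about how the prefix could be moved, and was not run for this post.

```shell
# wrap is a hypothetical stand-in for "sudo operf": it reports on stderr
# which command it was asked to run, then runs that command.
wrap() {
    echo "wrapped: $1" >&2
    "$@"
}

# Same shape as: sudo operf echo -n "..." | ./argon2 ...
# Only printf (left side of the pipe) is wrapped; tr is not.
wrap printf '%s' 'Ch(329nE' | tr 'a-z' 'A-Z'
echo

# To wrap the right-hand command, the prefix has to move to that side.
# With operf this would presumably look like (assumption, untested here):
#   echo -n "Ch(329nE" | sudo operf ./argon2 somesalt -t 2 -m 16 -p 4 -l 24
printf '%s' 'Ch(329nE' | wrap tr 'a-z' 'A-Z'
echo
```

In the first pipeline the stderr message names printf, and in the second it names tr, while the data flowing through the pipe is the same in both cases.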

Initial Testing

This was my output:

operf: Profiler started

Profiling done.
Type:           Argon2i
Iterations:     2
Memory:         65536 KiB
Parallelism:    4
Hash:           6271154a35ed64acc752368ca97460c0e295a404d0ba0d2a
Encoded:        $argon2i$v=19$m=65536,t=2,p=4$c29tZXNhbHQ$YnEVSjXtZKzHUjaMqXRgwOKVpATQug0q
0.295 seconds
Verification ok
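
The Memory line reports 65536 KiB because the argon2 command-line tool treats the -m value as a power-of-two exponent: -m N requests 2^N KiB. A quick shell arithmetic check, as a sketch (mem_kib is a hypothetical helper name, not part of argon2):

```shell
# mem_kib EXPONENT: convert the argon2 -m exponent into KiB (2^EXPONENT).
mem_kib() {
    echo $((1 << $1))
}

mem_kib 16                               # KiB requested by -m 16
echo "$(( $(mem_kib 16) / 1024 )) MiB"   # the same amount expressed in MiB
```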

I will now time the run using the Linux time command. The following is the command:

sudo time operf echo -n "Ch(329nE" | ./argon2 somesalt -t 2 -m 16 -p 4 -l 24

Note the positioning of the commands: I have placed the time command after the sudo command and in front of the operf command.

The time command reports the user CPU time, the system CPU time, and the elapsed time of the command that it runs.

This is my output:

sudo time operf echo -n "Ch(329nE" | ./argon2 somesalt -t 2 -m 16 -p 4 -l 24
operf: Profiler started

Profiling done.
0.03user 0.14system 0:00.21elapsed 82%CPU (0avgtext+0avgdata 5196maxresident)k
0inputs+48outputs (3major+2051minor)pagefaults 0swaps
Type: Argon2i
Iterations: 2
Memory: 65536 KiB
Parallelism: 4
Hash: 6271154a35ed64acc752368ca97460c0e295a404d0ba0d2a
Encoded: $argon2i$v=19$m=65536,t=2,p=4$c29tZXNhbHQ$YnEVSjXtZKzHUjaMqXRgwOKVpATQug0q
0.296 seconds
Verification ok

There is a new line in the output, beginning with 0.03user. It shows the times recorded by the time command. Because time is on the left side of the pipe, these figures cover operf and echo rather than argon2, which is why the elapsed time is shorter than the 0.296 seconds that argon2 reports.

  • 0.03user is the CPU time the command spent in user mode (in seconds).
  • 0.14system is the CPU time the kernel spent working on behalf of the command (in seconds).
  • 0:00.21elapsed is the wall-clock time from the start of the command to its end.
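
The 82%CPU field on the same line is derived from the other three: GNU time computes it as (user + system) / elapsed. A minimal sketch with awk, using the rounded values printed above (cpu_pct is a hypothetical helper name). Because the printed times are rounded to two decimals, the result comes out close to, but not exactly, the figure that time printed.

```shell
# cpu_pct USER SYSTEM ELAPSED: share of the elapsed time spent on the CPU,
# as a percentage with one decimal place.
cpu_pct() {
    awk -v u="$1" -v s="$2" -v e="$3" \
        'BEGIN { printf "%.1f\n", (u + s) / e * 100 }'
}

cpu_pct 0.03 0.14 0.21   # roughly 81, versus the 82%CPU that time reported
```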

A new directory called oprofile_data should now exist under the current directory.

This can be seen by running the list directory command:

ls

I will now check the report generated by the previous oprofile command.

The command is:

opreport

This is my result:

Using /home/username/projects/phc-winner-argon2/oprofile_data/samples/ for samples directory.
CPU: ARM Cortex-A57, speed 750 MHz (estimated)
Counted CPU_CYCLES events (Cycle) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CYCLES:100000|
samples| %|
------------------
30 100.000 echo
CPU_CYCLES:100000|
samples| %|
------------------
26 86.6667 kallsyms
3 10.0000 ld-2.27.so
1 3.3333 libc-2.27.so
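
The "count 100000" in the report header means that one sample is recorded roughly every 100000 CPU_CYCLES events, so a sample count can be turned into an approximate cycle count and, using the estimated 750 MHz clock from the report, into an approximate CPU time. A back-of-the-envelope sketch (samples_to_ms is a hypothetical helper name, and the result is only an estimate because sampling is statistical):

```shell
# samples_to_ms SAMPLES COUNT MHZ: estimated CPU time in milliseconds.
# cycles = SAMPLES * COUNT; a clock of MHZ runs MHZ * 1000 cycles per ms.
samples_to_ms() {
    awk -v n="$1" -v c="$2" -v mhz="$3" \
        'BEGIN { printf "%.1f\n", n * c / (mhz * 1000) }'
}

samples_to_ms 30 100000 750   # 30 samples -> about 3,000,000 cycles
```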

Initial Testing (Continued)

I will be running the profiling tool oprofile two more times to get a baseline.

oprofile test 2

Command:

sudo time operf echo -n "Ch(329nE" | ./argon2 somesalt -t 2 -m 16 -p 4 -l 24

Result:

operf: Profiler started

Profiling done.
0.05user 0.12system 0:00.25elapsed 69%CPU (0avgtext+0avgdata 5184maxresident)k
0inputs+32outputs (2major+2050minor)pagefaults 0swaps
Type: Argon2i
Iterations: 2
Memory: 65536 KiB
Parallelism: 4
Hash: 6271154a35ed64acc752368ca97460c0e295a404d0ba0d2a
Encoded: $argon2i$v=19$m=65536,t=2,p=4$c29tZXNhbHQ$YnEVSjXtZKzHUjaMqXRgwOKVpATQug0q
0.297 seconds
Verification ok

Check Report(using command: opreport):

Using /home/username/projects/phc-winner-argon2/oprofile_data/samples/ for samples directory.
CPU: ARM Cortex-A57, speed 750 MHz (estimated)
Counted CPU_CYCLES events (Cycle) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CYCLES:100000|
samples| %|
------------------
25 100.000 echo
CPU_CYCLES:100000|
samples| %|
------------------
21 84.0000 kallsyms
4 16.0000 ld-2.27.so

oprofile test 3

Command:

sudo time operf echo -n "Ch(329nE" | ./argon2 somesalt -t 2 -m 16 -p 4 -l 24

Result:

operf: Profiler started

Profiling done.
0.02user 0.15system 0:00.23elapsed 75%CPU (0avgtext+0avgdata 5132maxresident)k
0inputs+40outputs (3major+2050minor)pagefaults 0swaps
Type: Argon2i
Iterations: 2
Memory: 65536 KiB
Parallelism: 4
Hash: 6271154a35ed64acc752368ca97460c0e295a404d0ba0d2a
Encoded: $argon2i$v=19$m=65536,t=2,p=4$c29tZXNhbHQ$YnEVSjXtZKzHUjaMqXRgwOKVpATQug0q
0.297 seconds
Verification ok

Check Report(using command: opreport):

Using /home/username/projects/phc-winner-argon2/oprofile_data/samples/ for samples directory.
CPU: ARM Cortex-A57, speed 750 MHz (estimated)
Counted CPU_CYCLES events (Cycle) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CYCLES:100000|
samples| %|
------------------
25 100.000 echo
CPU_CYCLES:100000|
samples| %|
------------------
21 84.0000 kallsyms
3 12.0000 ld-2.27.so
1 4.0000 libc-2.27.so
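
With three timed runs, the baseline can be summarized as an average. A minimal sketch using awk on the elapsed times reported by time above (0.21, 0.25, 0.23) and the run times printed by argon2 in the same runs (0.296, 0.297, 0.297). The helper name mean is hypothetical.

```shell
# mean VALUE...: arithmetic mean of its arguments, to three decimals.
mean() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

mean 0.21 0.25 0.23      # elapsed seconds reported by time
mean 0.296 0.297 0.297   # seconds printed by argon2 itself
```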

I realized that the opreport command was not using the option for a detailed output.

This is the command to show a detailed report:

opreport -d

This is my result:

Using /home/username/projects/phc-winner-argon2/oprofile_data/samples/ for samples directory.
CPU: ARM Cortex-A57, speed 750 MHz (estimated)
Counted CPU_CYCLES events (Cycle) with a unit mask of 0x00 (No unit mask) count 100000
vma samples % image name symbol name
000096c0 1 25.0000 ld-2.27.so _dl_lookup_symbol_x
00009710 1 100.000
0000ac88 1 25.0000 ld-2.27.so _dl_relocate_object
0000b170 1 100.000
00008ce0 1 25.0000 ld-2.27.so do_lookup_x
00008ebc 1 100.000
00109ea8 1 25.0000 libc-2.27.so _dl_addr
00109f74 1 100.000

I can see two image names and four symbol names in this report. Because operf wrapped the echo command, they belong to the start-up of echo, not to the argon2 program.

Image Names: ld-2.27.so, libc-2.27.so

Symbol Names: _dl_lookup_symbol_x, _dl_relocate_object, do_lookup_x, _dl_addr

Each symbol name has one sample, which is 25% of the four samples listed. One sample stands for roughly 100000 CPU cycles.

All four symbols are part of dynamic linking: _dl_lookup_symbol_x and do_lookup_x look up symbols in the loaded shared libraries, _dl_relocate_object applies relocations to a loaded object, and _dl_addr maps an address back to the object and symbol that contain it. This report was generated from the last run, in which time reported 75%CPU.
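
The 25% figures follow directly from the sample counts: the detailed report lists four symbols with one sample each. A small awk sketch that recomputes the shares from the symbol lines of the report above, pasted in as a here-document (symbol_shares is a hypothetical helper name):

```shell
# symbol_shares: read "opreport -d" lines on stdin. Symbol lines have five
# fields (vma, samples, %, image, symbol); the per-address detail lines have
# fewer and are skipped. Prints each symbol's share of the listed samples.
symbol_shares() {
    awk 'NF == 5 { n[$5] = $2; total += $2 }
         END { for (s in n) printf "%s %.1f%%\n", s, n[s] / total * 100 }' | sort
}

symbol_shares <<'EOF'
000096c0 1 25.0000 ld-2.27.so _dl_lookup_symbol_x
00009710 1 100.000
0000ac88 1 25.0000 ld-2.27.so _dl_relocate_object
0000b170 1 100.000
00008ce0 1 25.0000 ld-2.27.so do_lookup_x
00008ebc 1 100.000
00109ea8 1 25.0000 libc-2.27.so _dl_addr
00109f74 1 100.000
EOF
```

Each of the four symbols comes out at 25.0%, matching the percentage column of the report.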

(To be continued at Progress 5 blog)