Cybex-P Archive Module

The Cybex-P Archive Module is the next step in the process of threat data after being previously pushed to the cache data lake by the Cybex-P API Module. This module has the prime responsibility to pull data from the cache data lake and parse it into a TAHOE object. Once parsed, it is then pushed and stored to it’s database server in which it will await to be used by the Cybex-P Analytics Module.

The Cybex-P Archive Module is compromised of two modules that work in cohesion with each other on different servers. The Archive Cluster and the Archive Database.

graph LR A(Cache Data Lake) --data gets pulled--> B B[Archive Cluster] -- Converted to TAHOE and Pushed--> C C[Archive Database]
  • Archive Cluster

    • Decryption of data

    • Threat data to TAHOE object conversion

  • Archive Database

    • Storage of the newly parse TAHOE object

Cybex-P Archive Module Repositories

  • archive [Source Code]

    • Data decryption

    • Data Archiving

    • Module Initialization and Execution

  • parsemain [Source Code]

    • Data Parsing

      • Turning a piece of data into a Raw TAHOE object

archive

archive is the core source code of the Cybex-P Archive Module since alot of the main functionality takes place here. Everything from data decryption to archiving takes place in this module.

Key Functions:

  • decrypt_file(file_in, fpriv_name = “priv.pem”)

  • archive_one(event, cache_coll, fs, pkey_fp, parsemain_func)

  • archive(cacheconfig, force_process = False, exec_once = False)

  • decrypt_file():

    • This function is only used by archive_one() and is used to decrypt threat data that has been previously encrypted with the public key of the Cybex-P Archive Module. When called, a file of the set of threat data is passed in along with the private key of the of the module (by default, the private key is under “priv.pem”). decrypt_file() go ahead and validate the file to ensure correct format. Once validated, the private key is pulled and the session key, nonce, tags, and ciphertext are extract from the private key. The session key is then decrypted with the private RSA key and the data gets decrypted with the AES session key. The threat data is then returned in utf-8 formatting.

  • archive_one():

    • archive_one() is responsible for taking a single event provided to it and administrating decryption and TAHOE parsing methods to process the data. Like the decrypt_file() function, this function is only used by one other function: archive() [See below]. This function utilizes decrypt_file() in its definition take the provided byte data and return the raw threat data. A call is then made to the parsemain() function (defined as parsemain_func within the definition of this function) to take the provided threat data and parse it into a raw TAHOE object. When the data proceeds to get parsed, the result of the parsing will come back in 1 of 3 states administrated by parsemain():

      • SUCCESS - The parsing was successful and will be updated to the Archive module database with its reference hash

      • NOT_SUPPORTED - An attempt at parsing the passed data was made but the type of data that was passed is an unsupported sub-type. The data will be sent to the database with the “skip” set to True.

      • ERROR - An attempt at parsing the passed data was made but an error was caught during the the function call. With an error case, a object of Nonetype is returned and updated to the database with the “skip” flag set to true.

  • archive():

    • archive() Is the main controlling function of the archive source code. When executed, the function while go into an infinite while loop that will consistently be checking for a provided cache configuration and proceed to set the path to the Archive database and tunnel to grab data from the cache data lake.

    • The rest of archive() is then an infinitely running while loop that is consistently querying the the cache data lake for available raw threat data provided from the Cybex-P API Module. However, to save resources and optimize performance, archive() has a an exponential backoff method that is ran after every query attempt.

    • The exponential backoff method is ran after every query and is meant to sleep the system for a certain amount of seconds in order to decrease the rate of querying in order to keep a stable rate of querys. If success is achieved after a single query and archive attempt, the amount of accounted failed attempts is reset to 0.

    • n_failed_attempts -> The number of failed attempts so far after every recent failed query.

    • exponential_backoff(n_failed_attempts)

    • E.G: If n_failed_attempts is currently set to 3, meaning 3 querys were made and ended up being failed attempts, the exponential backoff function sleeps the archive module for a certain amount of seconds based on the following function:

    • s = min(3600, (2 ** n) + (random.randint(0, 1000) / 1000)) time.sleep(s) #Where n = n_failed_attempt

    • Otherwise, if a successful query was made, n_failed_attempts is set back to 0 which will lead the archive module to increasing the rate of query and archive attempts again.

parsemain

The parsemain source code is a key sub-component that is utilized by archive to handle the responsibility of parsing threat data to the raw TAHOE objects that will eventually be used by the Cybex-P Analytics Module.

Key functions:

  • parsemain(typtag, org id, timezone, data)

  • parsemain():

    • parsemain is the sole function that is responsible for taking a single event of threat data and parsing it into a raw TAHOE object. This function is only used by archive_one() from the archive source code.

    • When called, the typtag will be used to pull the raw sub type from the list of available sub types within the Cybex-P database. if typtag is valid and we receive a valid raw_sub_type, we can then go ahead and call TAHOEs Raw() and parse the following into a TAHOE instance that will get posted to the Archive Datababase:

    • raw = Raw(raw_sub_type, data, orgid, timezone)

    • Recall to the previous explanation within archive_one() above; based on the outcome of a valid raw sub type, parsemain will administrate 1 of 3 states to the function call:

      • ParseState.SUCCESS - a valid raw sub type was found and the data was successfully parsed into a raw TAHOE object.

      • ParseState.NOT_SUPPORTED - an unknown typtag was supplied, therefore there was now available raw sub type.

      • ParseState.ERROR - An error happened and was caught

Miscellaneous

  • Private Key

    • Private key of the cybex-p archive module

    • TAKE STEPS TO ENSURE THAT CANNOT BE EASILY ACCESSIBLE/READ

  • cybexp-archive.service

    • systemd service file that maintains the cybex-p archive module