Before running a module, make sure that the base and the module images are built.

Python and pip version

This project is built using Python 3, so any python command in this guide refers to python3. If in doubt whether the python command is executing the right version, it is better to explicitly specify the version by using python3 instead of python. The same applies to the Python package manager pip: use pip3 as the command.

Configuration file

Before you can use each module, you must create a configuration file. A sample configuration file is provided with each module, with comments describing what the fields do. Here we will talk about the more important fields.

After creating and configuring the configuration file, one is ready to run the module. When executing the wrapper scripts, pass the configuration file as an argument. See each wrapper script's help output for complete and up-to-date usage information.

Server URLs

The format is as follows:

protocol://host:port

Note

There is no trailing /

The protocol can be either http or https. By default, the protocol will be http; one can configure https by adding a reverse proxy like Nginx in front of each server.

Setting up the encryption policy

Policy matching is done according to the Collector's configuration file. One specifies rules that match fields in each record. Each rule has an associated CPABE policy that is applied to a field when the rule matches. If no rule matches, the default policy is applied. A rule can use an exact match or a regex match on a field, and encryption is done field-wise. Aside from the encryption policy, the policy administrator must also specify which fields will be indexed for encrypted searching. This is done in a similar fashion to the encryption rules.

Please use the -a flag on the collector wrapper script to list the available attributes. Then one can use and, or, and parentheses to create the policy. Example policy: "DEFAULT and RESEARCH". With this example policy, the person querying must hold both the DEFAULT and RESEARCH attributes to decrypt that field.
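
Conceptually, a policy is a boolean expression over attributes, and a querier can decrypt a field only if their attribute set satisfies it. The short Python sketch below is only a conceptual illustration of that evaluation, assuming attribute names are simple identifiers; the real check is enforced cryptographically by the CPABE scheme, not by code like this, and the attribute names are made-up examples.

# Conceptual illustration only: CPABE enforces this check cryptographically.
# The attribute names (DEFAULT, RESEARCH, ADMIN) are hypothetical examples.
def satisfies(policy: str, attributes: set) -> bool:
    """Evaluate a policy such as '(DEFAULT and RESEARCH) or ADMIN' against
    the set of attribute names held by the querier."""
    # Extract the attribute names used in the policy.
    tokens = policy.replace("(", " ").replace(")", " ").split()
    names = {tok for tok in tokens if tok not in ("and", "or", "not")}
    # Map each name to whether the querier holds it, then let Python
    # evaluate the and/or/parenthesis structure of the expression.
    env = {name: (name in attributes) for name in names}
    return bool(eval(policy, {"__builtins__": {}}, env))

print(satisfies("DEFAULT and RESEARCH", {"DEFAULT", "RESEARCH"}))   # True
print(satisfies("DEFAULT and RESEARCH", {"DEFAULT"}))               # False
print(satisfies("(DEFAULT and RESEARCH) or ADMIN", {"ADMIN"}))      # True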

You should provide the configuration file in the YAML language. Please use the provided template to avoid any misconfiguration. Here is a link to a YAML tutorial.

Warning

A syntactically correct configuration with wrong semantics can yield incorrect policy parsing, resulting in incorrectly encrypted data. We advise running the Collector with the debug, process_only_one, and show_enc_policy flags on test data to confirm correct policy matching; these flags can be disabled afterwards.

More on Encryption policy

To avoid confusion, here is a layout with data types for the policy portion of the configuration (not in official YAML format):

policy:
   default:
      # if no other policy below matches, this will be used
      # if an index rule matches but no exact or regex rule matches,
      #  then this will be used as the CPABE policy
      abe_pol: String

   index:
      # these rules create searchable encrypted fields

      # this will match fields by exact string
      exact: # list of "match: string"
         - match: String
         - match: String

      # this will match fields with regex
      # this is a list of "match: string", where the string is the regex
      regex:
         - match: String
         - match: String

   # these rules are for exact matching of each field
   exact:   # list of exact matching strings and their CPABE policy (or remove flag)
      - match: String
        abe_pol: String
      - match: String
        abe_pol: String

   # these rules are for regex matching of each field
   regex: # list of regex strings and their CPABE policy (or remove flag)
      - match: String
        abe_pol: String
      - match: String
        remove: Bool # set to True, otherwise you must specify "abe_pol"
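
For concreteness, here is a hypothetical filled-in policy section, embedded in a small Python snippet that checks it parses as YAML. This assumes the PyYAML package; the field names (src_ip, internal_*) and attribute names (DEFAULT, RESEARCH) are made-up examples, not values required by this project — use your own fields and the attributes reported by the collector's -a flag.

# Hypothetical example of a filled-in policy section (illustrative only).
import yaml  # PyYAML

EXAMPLE_POLICY = """
policy:
   default:
      abe_pol: 'DEFAULT'
   index:
      exact:
         - match: 'src_ip'
      regex:
         - match: '.*_ip$'
   exact:
      - match: 'src_ip'
        abe_pol: 'DEFAULT and RESEARCH'
   regex:
      - match: '^internal_.*'
        remove: True
"""

config = yaml.safe_load(EXAMPLE_POLICY)
print(config["policy"]["default"]["abe_pol"])    # DEFAULT
print(config["policy"]["exact"][0]["abe_pol"])   # DEFAULT and RESEARCH

Under this hypothetical policy, src_ip would be indexed for searching and encrypted under "DEFAULT and RESEARCH", any other field ending in _ip would also be indexed (and fall back to the default policy), fields starting with internal_ would be removed, and every other field would be encrypted under the default DEFAULT policy.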

Run

Execute the respective wrapper script at the root of the module. Each wrapper script supports the -h and the --help flag, which will output their respective help/usage information.

For example, the query help information looks as follows:

$ ./query.sh -h
Create and run a docker container to query the encrypted backend. By default the container is left behind,
vacuum recommended. Build at least once or when code has changed.

Usage:
  ./query.sh [Options] config-file query output-file
  ./query.sh --build --build-only

Positional arguments:
  config-file           configuration file yaml
  query                 string representing any query value
  output-file           file where the unencrypted information is placed

Options arguments:
  -b, --build           build docker image
  --build-only          exit after building
  -s, --shell           run shell, ignores most flags
  -f, --from-time EPOCH integer epoch used as > filter for the query
  -t, --to-time EPOCH   integer epoch used as < filter for the query
  -l, --left-inclusive  modify the --from-time to be inclusive >=
  -r, --right-inclusive modify the --to-time to be inclusive <=
  -c, --vacuum          remove container upon exit. If more than one container
                        of this type exists, it will remove all
  -h, --help            print this help

To run the collector:

./collector.sh config.yaml

When querying, one can run as follows (the --from-time and --to-time options shown in the help output above can be added to restrict the time range):

./query.sh config.yaml '123.123.123.123' output_file
# view output 
cat output_file

It is recommended to run Collector, KMS, and backend in a detached manner, for example, using Tmux.

The KMS and backend can be run as follows:

./kms.sh config.yaml
./backend.sh config.yaml

The Collector and Query use a simple docker run procedure (not in daemon mode), so Ctrl+C or a signal can terminate the process just like any other. This makes them easy to incorporate into scripts and into systemd timers and services.

Permission denied

If the bash scripts are not running because of permissions, make them executable with sudo chmod +x <script-name.sh>. Alternatively, run them using bash or sh, e.g., sh <script-name.sh>.

Key Files Problems

If for some reason the module is stopped (either by a signal or an error) while key files are being fetched, it will produce bad key files. The next time the code runs, it will try to read those key files before fetching them from the KMS server, and the corrupted keys will lead either to the program crashing or to data that is encrypted incorrectly. To fix this, delete the keys from the secrets folder and run again (they will be fetched from the KMS server). To prevent this, do not terminate the module early.

Running KMS and Backend as daemons

The KMS and Backend modules use docker-compose, which handles automatic start on boot (as specified in the docker-compose.yaml file), running the containers as daemons. To achieve this, run each module (KMS, backend), wait for the script to deploy and correctly initialize the containers, and then reboot the system.

Alternatively, if we are not going to reboot the system for the time being, after issuing ./kms.sh config.yaml or ./backend.sh config.yaml one can simply press CTRL+C (after each script has finished initializing the containers) to temporarily stop the services and free the terminal. Since the Docker service runs at boot, our services will start automatically the next time the system boots. Because we stopped our services and are not rebooting for the time being, we can start them back up manually and in the background with sudo docker-compose start (this has to be executed from within each module's root directory).

Retrieving logs of KMS and backend modules

If they are running in the foreground, it is as easy as looking at the terminal.

If they are running in the background, one can use the sudo docker logs <container-id> command to see a container's logs. One can find the container ID of a particular container/module with the sudo docker ps --all command.

Checking the status of the containers

To check the status of a module, one can issue the following command in the terminal:

sudo docker ps --all

This command shows all containers on the system, including those that are not currently running. It is helpful for determining whether a container/module is running and whether it has suddenly stopped/exited (and how long ago that happened).

More on deploying Collector-client & Collector

The Collector is a multi-part module. The Collector itself is the encryption module; the gatherers ship the information to the Collector encryption module to be encrypted and sent over to the backend. To run the Collector correctly, one has to set up the encryption policy.

One has to start the Collector using collector.sh; then one can run multiple collector clients. Another good reason for separating the encryption portion from the data collection process is that it allows multiple sources to be encrypted under the same policy. Collector clients are meant to be easy to write to fit the user's or organization's liking. The only requirement is that a client must connect to the Collector via a TCP socket and send one JSON record per connection. We have provided a sample collector client inside the Collector's root directory; it grabs JSON records from a file and sends them to the Collector/encryption module.
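
As a sketch of how small such a client can be, the following hypothetical Python snippet sends each line of a JSON-lines file as one record per TCP connection. The host, port, and file name are placeholders, and the one-record-per-line input format is an assumption; the provided collector-client.py is the reference implementation.

# Minimal hypothetical collector client: one JSON record per TCP connection.
# Host, port, and input path are placeholders; see the provided
# collector-client.py for the full-featured reference implementation.
import json
import socket

COLLECTOR_HOST = "127.0.0.1"   # placeholder: the Collector's host
COLLECTOR_PORT = 9999          # placeholder: the Collector's listening port

with open("records.json") as fh:            # assumes one JSON record per line
    for line in fh:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)           # make sure the record is valid JSON
        with socket.create_connection((COLLECTOR_HOST, COLLECTOR_PORT)) as conn:
            conn.sendall(json.dumps(record).encode("utf-8"))
        # the connection is closed here, so the Collector sees exactly one record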

The dependencies for this client are python3, python3-venv, and the Python modules specified in requirements.txt (installed as shown below).

Install the collector-client dependencies (for Debian-based distributions, e.g., Ubuntu):

sudo apt install -y python3 python3-venv

To run the collector client:

cd collector-client

# make python environment (do once)
python3 -m venv env 

# then source environment, do when wanting to run client
#  or to install dependencies
source env/bin/activate
# note (env) at the left-hand side of the terminal prompt

# install dependencies, do once
pip install -r requirements.txt 

# then run client
python3 collector-client.py --host <collector-host> --port <collector-port> <json-file>
# when running a collector on the same machine, this script will use the default host and port
python3 collector-client.py <json-file>

Note: collector-client.py has a --json-key flag, which makes it send only the data under that specific key. For example:

{"_id": {"$oid": "5d659c2d34ecea6c769ec9ff"}, "itype": "raw", "data": {"session": "17e8b9e02971", "geoip": {"ip": "165.227.91.185", "latitude": 40.7214, "longitude": -74.0052, "location": {"lat": 40.7214, "lon": -74.0052}, "continent_code": "NA", "timezone": "America/New_York", "country_code3": "US", "country_code2": "US", "postal_code": "10013", "country_name": "United States", "city_name": "New York", "dma_code": 501, "region_name": "New York", "region_code": "NY"}, "source": "/home/cowrie/cowrie/var/log/cowrie/cowrie.json", "@timestamp": "2019-08-27T20:51:18.993Z", "tags": ["beats_input_codec_plain_applied", "geoip", "beats_input_codec_json_applied"], "timestamp": "2019-08-27T20:51:16.595926Z", "host": {"name": "ssh-peavine"}, "duration": 120.02226901054382, "eventid": "cowrie.session.closed", "msg": "Connection lost after 120 seconds", "beat": {"hostname": "ssh-peavine", "version": "6.5.4", "name": "ssh-peavine"}, "offset": 3165054, "@version": "1", "src_ip": "165.227.91.185", "prospector": {"type": "log"}, "sensor": "ssh-peavine"}, "orgid": "identity--f27df111-ca31-4700-99d4-2635b6c37851", "timezone": "US/Pacific", "sub_type": "x-unr-honeypot", "_hash": "ce1125ef568028d65ad444dd9a6dacf6a860b7a3d0a43ebabc0b03fb258c80db", "uuid": "raw--8d30f29e-9285-4790-ac57-bbba5aa56fbb", "filters": ["filter--ad8c8d0c-0b25-4100-855e-06350a59750c"]}

The data that we are after is inside the data key, so we would use the tool as follows:

python3 collector-client.py --json-key data <json-file>

This will result in sending only the following:

{"session": "17e8b9e02971", "geoip": {"ip": "165.227.91.185", "latitude": 40.7214, "longitude": -74.0052, "location": {"lat": 40.7214, "lon": -74.0052}, "continent_code": "NA", "timezone": "America/New_York", "country_code3": "US", "country_code2": "US", "postal_code": "10013", "country_name": "United States", "city_name": "New York", "dma_code": 501, "region_name": "New York", "region_code": "NY"}, "source": "/home/cowrie/cowrie/var/log/cowrie/cowrie.json", "@timestamp": "2019-08-27T20:51:18.993Z", "tags": ["beats_input_codec_plain_applied", "geoip", "beats_input_codec_json_applied"], "timestamp": "2019-08-27T20:51:16.595926Z", "host": {"name": "ssh-peavine"}, "duration": 120.02226901054382, "eventid": "cowrie.session.closed", "msg": "Connection lost after 120 seconds", "beat": {"hostname": "ssh-peavine", "version": "6.5.4", "name": "ssh-peavine"}, "offset": 3165054, "@version": "1", "src_ip": "165.227.91.185", "prospector": {"type": "log"}, "sensor": "ssh-peavine"}

Collector

The Collector's index creation works well on the first level of the JSON structure (make sure that indices are created only on the first level). It can encrypt deeper levels of the tree, but index retrieval/querying might not be deterministic (due to the sorting of the JSON structure) if the index is multilevel, and may therefore result in unsearchable data.
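
To illustrate what "first level" means here, the snippet below prints the top-level keys of a trimmed-down record like the sample above; index rules should only name keys at that level (e.g., src_ip), not nested ones (e.g., the ip inside geoip).

# Illustration of first-level vs. nested keys using a trimmed-down record.
import json

record = json.loads(
    '{"session": "17e8b9e02971", '
    '"src_ip": "165.227.91.185", '
    '"geoip": {"ip": "165.227.91.185", "city_name": "New York"}}'
)

print(list(record.keys()))    # ['session', 'src_ip', 'geoip'] -- first level, safe to index
print(record["geoip"]["ip"])  # nested value: can be encrypted, but indexing it is unreliable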

Force key retrieval

If for some reason the keys are not working, it may be necessary to fetch them again from the KMS server (this applies to the Collector and Query). To do so, one can:

  1. Stop the module

  2. Delete the contents of the secrets folder for that module

  3. Restart the module

Keys folder

Do not delete the folder itself, only its contents; the folder structure is important.

KMS keys

Do not delete keys from the KMS secrets folder. If the system keys are deleted while encrypted data already exists in the backend and new data is submitted under a different set of keys (the KMS regenerates keys when it cannot find them), the result will be incompatible data.

Collector & collector-client status indicators

In the least verbose mode, the collector and collector-client print letters indicating their current state. Note that both programs are multithreaded, so the output is not in any guaranteed order. The indicators are as follows:

collector:

Indicator   Description
---------   -----------
N           A worker thread established a new connection with a collector-client
R           A worker thread read data from the client successfully
E           A worker thread encrypted the data successfully
P           A worker thread posted encrypted data to the backend successfully
F           A worker thread failed to post encrypted data to the backend and will retry
L           A worker thread gave up on posting encrypted data after too many failed attempts

collector-client:

Indicator   Description
---------   -----------
i           A new piece of data has been added to the work queue
F           All the data has been added to the work queue
.           A piece of data was successfully sent to the collector
D           A worker thread has exited (usually after finishing all its work)

Date format

Dates can be given in any of the major date formats and are not restricted to UNIX epoch; this applies both when submitting data for encryption and when querying data using date ranges. Timezones are supported as well. For more information on the supported date formats, see the dateutil parser documentation.
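
For example, dateutil can parse all of the following. This is a quick illustration of its flexibility, assuming the python-dateutil package is installed; see its documentation for the full range of accepted formats.

# Quick illustration of the kind of date strings dateutil accepts.
from dateutil import parser  # from the python-dateutil package

for text in ("2019-08-27T20:51:18.993Z",      # ISO 8601 with Zulu timezone
             "Aug 27, 2019 20:51:18",         # human-readable, no timezone
             "2019-08-27 13:51:18 -0700"):    # with a UTC offset
    print(parser.parse(text))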

Troubleshooting

  • If you see an error regarding unix:///var/run/docker.sock or a permission denied error, make sure the Docker daemon/service is running.

  • If a module cannot connect to another, make sure that the following configurations are correct:

    • the bind interface of the server

    • the address of the server in the client configuration

    • for example, if the server is bound to its IP address, it cannot be accessed via localhost, even on the same machine. It is recommended to bind the servers to a local interface like 127.0.0.1:<port> so that they are only accessible locally, and to use a proxy pass via a reverse proxy (like Nginx) bound to the public interface; this also allows you to enable HTTPS.

      • the address format (for clients) is as follows:

      • protocol://hostname:port

      • no slash at the end