Before running a module, make sure that the base and the module images are built.

```{admonition} Python and pip version
:class: note
This project is built using Python 3, so any `python` command in this guide refers to `python3`. If in doubt whether the `python` command executes the right version, it is better to explicitly specify the version by using `python3` instead of `python`. The same applies to the Python package manager pip: use `pip3` as the command.
```

# Configuration file

Before you can use a module, you must create a configuration file for it. A sample configuration file is provided with each module, with comments describing what each field does. Here we cover the more important fields.

After creating and configuring the configuration file, one is ready to run the module. When executing the wrapper scripts, pass the configuration file as an argument. See each wrapper script's help output for complete and more accurate usage information.

## Server URLs

The format is as follows:

```
protocol://host:port
```

```{admonition} Note
:class: warning
There is no trailing `/`.
```

The protocol can be either `http` or `https`. By default, the protocol is `http`; one can configure `https` by adding a reverse proxy like Nginx in front of each server.

## Setting up the encryption policy

Policy matching is done according to the Collector's configuration file. One specifies rules that match fields in each record. Each rule has an associated CPABE policy that is applied to the field if the rule matches. If no rule matches, the default policy is applied. A rule can match a field exactly or by regex. Encryption is done field-wise.

Aside from the encryption policy, the policy administrator must also specify which fields will be indexed for encrypted searching. This is done in a similar fashion to the encryption rules.

Please use the `-a` flag on the Collector wrapper script to list the available attributes. One can then combine attributes with `and`, `or`, and parentheses to create a policy, for example: `"DEFAULT and RESEARCH"`. With this example policy, the person querying must satisfy `"DEFAULT and RESEARCH"` to decrypt that field.

You should provide the configuration file in YAML. Please use the provided template to avoid any misconfiguration. Here is a link to a [YAML tutorial](https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html).

```{admonition} Warning
:class: warning
A syntactically correct configuration with wrong semantics can yield incorrect policy parsing, resulting in incorrectly encrypted data. We advise running the Collector with the `debug`, `process_only_one`, and `show_enc_policy` flags on test data to confirm correct policy matching, after which you can disable them.
```
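As a concrete illustration, here is a minimal policy sketch (the full layout is described in the next subsection). The field name `src_ip` and the pattern `^internal_.*` are hypothetical examples; the sample configuration shipped with the Collector is authoritative:

```yaml
policy:
  # fallback when no exact or regex rule below matches a field
  default:
    abe_pol: "DEFAULT"
  # create a searchable encrypted index on the src_ip field
  index:
    exact:
      - match: "src_ip"
  # encrypt src_ip so only holders of both attributes can decrypt it
  exact:
    - match: "src_ip"
      abe_pol: "DEFAULT and RESEARCH"
  # drop any field whose name starts with "internal_" (hypothetical pattern)
  regex:
    - match: "^internal_.*"
      remove: True
```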
## More on Encryption policy

To avoid confusion, here is a layout with data types for the policy portion of the configuration (not in official YAML format):

```
policy:
  # if no other policy below matches, this will be used;
  # if an index rule matches but no exact or regex rule matches,
  # then this will be used as the CPABE policy
  default:
    abe_pol: String
  # these rules create searchable encrypted fields
  index:
    # this will exact-match fields;
    # a list of "match: String"
    exact:
      - match: String
      - match: String
    # this will match fields with regex;
    # a list of "match: String", where the string is the regex
    regex:
      - match: String
      - match: String
  # these rules are for exact matching of each field;
  # a list of exact matching strings and their CPABE policy (or remove flag)
  exact:
    - match: String
      abe_pol: String
    - match: String
      abe_pol: String
  # these rules are for regex matching of each field;
  # a list of regex strings and their CPABE policy (or remove flag)
  regex:
    - match: String
      abe_pol: String
    - match: String
      remove: Bool # set to True, else you will need to specify "abe_pol"
```

# Run

Execute the respective wrapper script at the root of the module. Each wrapper script supports the `-h` and `--help` flags, which output its help/usage information. For example, the query help information looks as follows:

```
$ ./query.sh -h
Create and run a docker container to query the encrypted backend.
By default the container is left behind, vacuum recommended.
Build at least once or when code has changed.

Usage:
  ./query.sh [Options] config-file query output-file
  ./query.sh --build --build-only

Positional arguments:
  config-file            configuration file yaml
  query                  string representing any query value
  output-file            file where the unencrypted information is placed

Options arguments:
  -b, --build            build docker image
  --build-only           exit after building
  -s, --shell            run shell, ignores most flags
  -f, --from-time EPOCH  integer epoch used as > filter for the query
  -t, --to-time EPOCH    integer epoch used as < filter for the query
  -l, --left-inclusive   modify the --from-time to be inclusive >=
  -r, --right-inclusive  modify the --to-time to be inclusive <=
  -c, --vacuum           remove container upon exit. If more than one
                         container of this type exists, it will remove all
  -h, --help             print this help
```

To run the collector:

```bash
./collector.sh config.yaml
```

To query, one can run:

```bash
./query.sh config.yaml '123.123.123.123' output_file
# view output
cat output_file
```
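The `-f`/`-t` flags from the help output above restrict a query to a time window. A sketch (the epoch values are arbitrary examples; see the Date format section below for human-readable alternatives):

```bash
# only return records with timestamps between the two epochs,
# left-inclusive thanks to -l
./query.sh -f 1566856800 -t 1566943200 -l config.yaml '123.123.123.123' output_file
```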
It is recommended to run the Collector, KMS, and backend in a detached manner, for example, using tmux. The KMS and backend can be run as follows:

```bash
./kms.sh config.yaml
```

```bash
./backend.sh config.yaml
```

The Collector and Query use a simple Docker run procedure (not in daemon mode), so `ctrl+c` or a signal can terminate the process, just like any other. This makes them easy to incorporate into scripts and systemd timers and services.

```{admonition} Permission denied
:class: note
If the bash scripts are not running because of permissions, please make them executable with `sudo chmod +x <script>`. Alternatively, run them using `bash` or `sh`, like so: `sh <script>`.
```

```{admonition} Key Files Problems
:class: caution
If for some reason the module is stopped (either by a signal or an error) while key files are being fetched, it will produce bad key files. The next time the code runs, it will try to read from those key files before trying to get them from the KMS server. This condition results in corrupted key files, which will lead either to the program crashing or to data that is encrypted incorrectly. To fix this, dispose of the keys in the `secrets` folder and run again (they will be fetched from the KMS server). To prevent this, do not terminate the module early.
```

# Running KMS and Backend as daemons

The KMS and Backend modules use docker-compose, which handles automatic start on boot (as specified in the `docker-compose.yaml` file) as daemons. To achieve this, run each module (KMS, backend), wait for the script to deploy the containers and initialize them correctly, and then reboot the system.

Alternatively, if we are not going to reboot the system for the time being: after issuing `./kms.sh config.yaml` or `./backend.sh config.yaml`, one can simply press `CTRL+c` (after each script is done initializing the containers) to temporarily stop the services and free the terminal. Since the Docker service runs at boot, our services will automatically start the next time the system boots. Since we stopped our services and are not rebooting for the time being, we can start them back up manually and in the background with `sudo docker-compose start` (this has to be executed from within each module's root directory).

## Retrieving logs of KMS and backend modules

If they are running in the foreground, it is as easy as looking at the terminal. If they are running in the background, one can use the `sudo docker logs <container-id>` command to see a module's logs. One can find the container id of a particular container/module with the `sudo docker ps --all` command.

# Checking the status of the containers

To check the status of a module, one can issue the following command in the terminal:

```bash
sudo docker ps --all
```

This command shows all containers on the system, including ones that are currently not running. It is helpful for determining whether a container/module is running and whether it has suddenly stopped/exited (and how long ago that happened).

# More on deploying Collector-client & Collector

The Collector is a multi-part module. The Collector itself is the encryption module; the gatherers ship information to the Collector encryption module to be encrypted and sent over to the backend. To run the Collector correctly, one has to set up the encryption policy. One starts the Collector using `collector.sh`; then one can run multiple collector clients. Another good reason for separating the encryption portion from the data collection process is that it allows multiple sources to be encrypted under the same policy.

Collector-clients are meant to be easy to code to fit the user/organization's liking. The only requirement for collector clients is that they must connect to the Collector via a TCP socket and send one JSON record per connection.
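As an illustration of how small such a client can be, here is a minimal sketch. The host and port values are assumptions; match them to the Collector's configuration:

```python
import json
import socket

# Hypothetical values: use the host/port your Collector listens on.
COLLECTOR_HOST = "127.0.0.1"
COLLECTOR_PORT = 9090

def send_record(record: dict) -> None:
    """Send one JSON record per TCP connection, as the Collector expects."""
    with socket.create_connection((COLLECTOR_HOST, COLLECTOR_PORT)) as sock:
        sock.sendall(json.dumps(record).encode("utf-8"))

if __name__ == "__main__":
    send_record({"src_ip": "165.227.91.185", "msg": "example record"})
```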
We have provided a sample collector client inside the Collector's root directory. This client grabs JSON records from a file and sends them to the Collector/encryption module. The dependencies for this client are `python3`, `python3-venv`, and the Python modules specified in `requirements.txt` (install as specified below).

Install the collector client dependencies (for Debian-based systems, e.g., Ubuntu):

```bash
sudo apt install -y python3 python3-venv
```

To run the collector client:

```bash
cd collector-client
# make python environment (do once)
python3 -m venv env
# source the environment whenever you want to run the client
# or install dependencies
source env/bin/activate
# note (env) at the left-hand side of the terminal prompt
# install dependencies (do once)
pip install -r requirements.txt
# then run the client
python3 collector-client.py --host <host> --port <port>
# when running a collector on the same machine, the script
# defaults to the default hostname and port
python3 collector-client.py
```

Note: `collector-client.py` has a `--json-key` flag, which makes the client send only the data under that specific key. For example, given the following record:

```json
{"_id": {"$oid": "5d659c2d34ecea6c769ec9ff"}, "itype": "raw", "data": {"session": "17e8b9e02971", "geoip": {"ip": "165.227.91.185", "latitude": 40.7214, "longitude": -74.0052, "location": {"lat": 40.7214, "lon": -74.0052}, "continent_code": "NA", "timezone": "America/New_York", "country_code3": "US", "country_code2": "US", "postal_code": "10013", "country_name": "United States", "city_name": "New York", "dma_code": 501, "region_name": "New York", "region_code": "NY"}, "source": "/home/cowrie/cowrie/var/log/cowrie/cowrie.json", "@timestamp": "2019-08-27T20:51:18.993Z", "tags": ["beats_input_codec_plain_applied", "geoip", "beats_input_codec_json_applied"], "timestamp": "2019-08-27T20:51:16.595926Z", "host": {"name": "ssh-peavine"}, "duration": 120.02226901054382, "eventid": "cowrie.session.closed", "msg": "Connection lost after 120 seconds", "beat": {"hostname": "ssh-peavine", "version": "6.5.4", "name": "ssh-peavine"}, "offset": 3165054, "@version": "1", "src_ip": "165.227.91.185", "prospector": {"type": "log"}, "sensor": "ssh-peavine"}, "orgid": "identity--f27df111-ca31-4700-99d4-2635b6c37851", "timezone": "US/Pacific", "sub_type": "x-unr-honeypot", "_hash": "ce1125ef568028d65ad444dd9a6dacf6a860b7a3d0a43ebabc0b03fb258c80db", "uuid": "raw--8d30f29e-9285-4790-ac57-bbba5aa56fbb", "filters": ["filter--ad8c8d0c-0b25-4100-855e-06350a59750c"]}
```

The data that we are after is inside the `data` key. Therefore, we would use the tool as follows:

```bash
python3 collector-client.py --json-key data
```

This will result in only sending the following:

```json
{"session": "17e8b9e02971", "geoip": {"ip": "165.227.91.185", "latitude": 40.7214, "longitude": -74.0052, "location": {"lat": 40.7214, "lon": -74.0052}, "continent_code": "NA", "timezone": "America/New_York", "country_code3": "US", "country_code2": "US", "postal_code": "10013", "country_name": "United States", "city_name": "New York", "dma_code": 501, "region_name": "New York", "region_code": "NY"}, "source": "/home/cowrie/cowrie/var/log/cowrie/cowrie.json", "@timestamp": "2019-08-27T20:51:18.993Z", "tags": ["beats_input_codec_plain_applied", "geoip", "beats_input_codec_json_applied"], "timestamp": "2019-08-27T20:51:16.595926Z", "host": {"name": "ssh-peavine"}, "duration": 120.02226901054382, "eventid": "cowrie.session.closed", "msg": "Connection lost after 120 seconds", "beat": {"hostname": "ssh-peavine", "version": "6.5.4", "name": "ssh-peavine"}, "offset": 3165054, "@version": "1", "src_ip": "165.227.91.185", "prospector": {"type": "log"}, "sensor": "ssh-peavine"}
```

```{admonition} Collector
:class: warning
The Collector's index creation works well on the first level of the JSON structure (make sure that indices are created only on the first level). It can encrypt at deeper levels of the tree, but index retrieval/querying of the data might not be deterministic (due to the sorting of the JSON structure) if the index is multilevel; this may result in unsearchable data.
```

# Force key retrieval

If for some reason keys are not working, it may be necessary to fetch them again from the KMS server (this applies to `collector` and `query`). To do so, one can (see the sketch after this list):

1. Stop the module
2. Delete the contents of the `secrets` folder for that module
3. Restart the module
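A sketch of these steps, assuming the module in question is the Collector and its root directory is the current directory:

```bash
# 1. stop the module first (ctrl+c, or stop its container)
# 2. delete the key files, keeping the secrets folder itself intact
rm -f secrets/*
# 3. restart the module; it will fetch fresh keys from the KMS server
./collector.sh config.yaml
```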
```{admonition} Keys folder
:class: caution
Do not delete the actual folder, as the folder structure is important.
```

```{admonition} KMS keys
:class: caution
Do not delete keys from the KMS `secret` folder. If the system keys are deleted while encrypted data exists in the backend, and new data is then submitted under a different set of keys (the KMS regenerates keys when it cannot find them), the result is incompatible data.
```

# Collector & collector-client status indicators

In the least verbose mode, `collector` and `collector-client` print letters indicating their current state. It is important to note that both of these programs are multithreaded, so output will not be in any guaranteed order. The indicators are as follows:

collector:

| Indicator | Description                                                                  |
|:---------:|------------------------------------------------------------------------------|
| N         | A worker thread established a new connection with a collector-client         |
| R         | A worker thread read data from a client successfully                         |
| E         | A worker thread encrypted the data successfully                              |
| P         | A worker thread posted encrypted data to the backend successfully            |
| F         | A worker thread failed to post encrypted data to the backend, will retry     |
| L         | A worker thread gave up on posting encrypted data, too many failed attempts  |

collector-client:

| Indicator | Description                                                        |
|:---------:|--------------------------------------------------------------------|
| i         | A new piece of data has been added to the work queue               |
| F         | All the data has been added to the work queue                      |
| .         | A piece of data was successfully sent to the collector             |
| D         | A worker thread has exited (usually after finishing all its work)  |

# Date format

The date format can be any of the major date formats; it is not restricted to UNIX epoch. This applies both when submitting data for encryption and when querying data using date ranges, and it includes timezones as well. For more information on supported date formats, visit the [dateutil parser documentation](https://dateutil.readthedocs.io/en/stable/parser.html).

# Troubleshooting

- If you get an error regarding `unix:///var/run/docker.sock` or permission denied, please make sure the Docker daemon/service is running.
- If a module cannot connect to another, make sure that the following configurations are correct:
  - the bind interface of the server
  - the address of the server in the client configuration
    - for example, if the server is bound to its IP, it cannot be accessed via localhost, even on the same machine. It is recommended to bind the servers to a local interface like `127.0.0.1:<port>` so that they are only accessible locally, and to proxy-pass to them using a reverse proxy (like Nginx) bound to the public interface; this also allows you to enable HTTPS.
  - the address format (for clients) is as follows:
    - `protocol://hostname:port`
    - no slash at the end
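To tie the last two points together, here is a sketch of what a local-only bind and a matching client address might look like in a module configuration. The field names are hypothetical; the sample configuration shipped with each module is authoritative:

```yaml
# Hypothetical field names, for illustration only.
# Server side: bind to a local interface; let a reverse proxy
# (e.g., Nginx) on the public interface terminate HTTPS.
server:
  host: 127.0.0.1
  port: 8000

# Client side: protocol://hostname:port, no trailing slash.
kms_url: http://127.0.0.1:8000
```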