Data Portability Overview
Introduction
The data portability feature provides an efficient way to bulk export raw event data from F5 Distributed Cloud App Infrastructure Protection (AIP) into your Amazon Web Services (AWS) Simple Storage Service (S3) bucket(s). From there, you can integrate Distributed Cloud AIP data with your own tools, where you can streamline security operations without adding more screens to your workflows. You can also choose to store Distributed Cloud AIP data in your own long-term storage locations to meet applicable compliance requirements.
Note
Data portability is a near real-time event export from your system to the S3 bucket. This means it is not possible to export historical events to S3. The Distributed Cloud AIP platform saves raw events for 3 days and cannot retroactively retrieve events from beyond that timeframe.
Set Up Data Portability Integration
To set up the data portability integration between AWS and Distributed Cloud AIP, you need to:
- Create an AWS IAM role specifically for the data portability integration
- Use the API endpoint to set up the data portability integration
Prerequisites
- Administrator access to the AWS Console
- Access to the Distributed Cloud AIP console
Set up IAM Role
- Log into the AWS Console as an administrator.
- Go to Services > Security, Identity, & Compliance.
- Select IAM. The Welcome to the Identity and Access Management page displays.
- Create a new policy.
- In the left navigation pane, click Policies. The Policy page displays.
- Click the Create policy button. The Create Policy page displays.
- Click the JSON tab. The JSON field displays.
- Copy and paste the following text into the JSON field:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<Receive-Events-Bucket-Name>",
        "arn:aws:s3:::<Receive-Events-Bucket-Name>/*"
      ]
    }
  ]
}
Notes:
- Replace <Receive-Events-Bucket-Name> with the name of the S3 bucket that will receive the events from Distributed Cloud AIP.
- Distributed Cloud AIP uses EMR to stream the data into the S3 bucket and then renames the files. Renaming a file in S3 means writing it under the new name and deleting the old copy, which is why the policy grants s3:DeleteObject in addition to s3:PutObject.
- Click the Review Policy button. The Review policy page displays.
- In the Name field, type a name for the policy. Distributed Cloud AIP suggests a name such as "TSDataPortabilityPolicy".
- Optionally, in the Description field, type a description of this IAM policy.
- Click the Create policy button. You return to the Policy page. A confirmation message on creation of the new IAM policy displays at the top of the page.
- Create a new IAM role.
- In the left navigation pane, click Roles. The Roles page displays.
- Click the Create role button. The Create role page displays.
- Click the Another AWS Account button. New fields display.
- In the Account ID field, type “896126563706”.
- Select the Require external ID (Best practice when a third party will assume this role) checkbox. An additional field displays.
- In the External ID field, type an identifier that is easy for you to track. You need this ID in later steps.
- Click the Next: Permissions button. The Attach permissions policies page displays.
- In the Filter policies field, type the name of the policy you created earlier in this procedure (for example, "TSDataPortabilityPolicy") and press ENTER.
- Select the checkbox next to the policy name.
- Click the Next: Tags button. The Add tags (optional) page displays.
- Optionally, add key / value tag pairs to the role.
- Click the Next: Review button. The Review page displays.
- In the Role name field, type a name for the IAM role. Distributed Cloud AIP recommends a name that describes the role, such as "TSDataPortabilityRole".
- Optionally, in the Description field, type a description of this IAM role.
- Click the Create Role button. You return to the Roles page. A confirmation message on creation of the new IAM role displays at the top of the page.
- On the Roles page, in the Search field, type the name of the role you just created.
- Click the role name. The Summary page displays.
- Make a note of the value in the Role ARN field. You use this string in the next section.
- Continue to the next section to create the API endpoint and link it to this IAM role.
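For reference, the two JSON documents behind the steps above can be sketched in Python. The permissions policy mirrors the JSON from the policy step. The trust policy is an assumption about what the AWS console generates when you choose Another AWS Account with an external ID. The function names and the sample bucket name and external ID are hypothetical and used only for illustration.

```python
import json

AIP_ACCOUNT_ID = "896126563706"  # account ID entered in the Account ID field above


def build_permissions_policy(bucket_name: str) -> dict:
    """Permissions policy from the JSON step, with the bucket name filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:ListBucket",
                    "s3:DeleteObject",
                ],
                # The bucket ARN covers ListBucket; the /* ARN covers object actions.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


def build_trust_policy(external_id: str) -> dict:
    """Assumed shape of the trust policy the console creates for the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{AIP_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                # Only callers that present the matching external ID may assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical sample values; substitute your own bucket name and external ID.
    print(json.dumps(build_permissions_policy("my-aip-events"), indent=2))
    print(json.dumps(build_trust_policy("my-external-id"), indent=2))
```

Printing both documents makes it easy to compare them against what the console shows on the role's Permissions and Trust relationships tabs.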
Prerequisites
- User ID, organization ID, and REST API key. Find these values in the Distributed Cloud AIP user interface (UI) under Settings > Keys. You use these values for Hawk authentication.
- Role ARN string and the External ID associated with the role (created in the previous section)
- Your AWS S3 bucket name and, if applicable, prefix (folder)
- Your AWS S3 bucket region
- REST Client (Distributed Cloud AIP recommends Insomnia)
Set up API Endpoint
See Call API Endpoints with Insomnia for instructions on how to use Insomnia, or see Update S3 Export Enrollment API endpoint (API documentation) for more information.
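Before calling the endpoint, it can help to sanity-check the values gathered in the prerequisites. The following is a minimal sketch, not part of the Distributed Cloud AIP API: the function and parameter names are hypothetical, the patterns are simplified approximations of the AWS formats, and the request body for the endpoint itself is defined only in the API documentation.

```python
import re

# Simplified patterns; AWS applies a few additional naming rules.
ROLE_ARN_RE = re.compile(r"arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+")
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
REGION_RE = re.compile(r"[a-z]{2}(-[a-z]+)+-\d")


def check_enrollment_values(role_arn: str, external_id: str,
                            bucket: str, region: str) -> list:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    if not ROLE_ARN_RE.fullmatch(role_arn):
        problems.append("role_arn does not look like an IAM role ARN")
    if not external_id:
        problems.append("external_id is empty")
    if not BUCKET_RE.fullmatch(bucket):
        problems.append("bucket is not a valid S3 bucket name")
    if not REGION_RE.fullmatch(region):
        problems.append("region does not look like an AWS region code")
    return problems


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    print(check_enrollment_values(
        "arn:aws:iam::123456789012:role/TSDataPortabilityRole",
        "my-external-id",
        "my-aip-events",
        "us-east-1",
    ))
```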
Data Structure and Display
Distributed Cloud AIP batches raw events before exporting them to your AWS S3 bucket.
Your AWS S3 bucket receives a batched Distributed Cloud AIP event file every three to five minutes.
Distributed Cloud AIP creates a batch of raw events when either of the following conditions is met:
- Raw events were last batched more than three minutes ago.
- 105,000 items of raw event data entered Distributed Cloud AIP since the last batch occurred.
Distributed Cloud AIP waits approximately 60 seconds to collect any batches that are ready for export. It then processes the batches into individual newline-delimited, gzipped JSON files and exports each batch file to your AWS S3 bucket.
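The two batching conditions described above can be expressed as a small check. This is an illustration of the documented thresholds only, not the platform's actual implementation; the function and constant names are hypothetical, and the sketch assumes the item threshold is reached at exactly 105,000 items.

```python
BATCH_MAX_AGE_SECONDS = 3 * 60  # raw events last batched more than three minutes ago
BATCH_MAX_ITEMS = 105_000       # items of raw event data since the last batch


def batch_is_ready(seconds_since_last_batch: float,
                   items_since_last_batch: int) -> bool:
    """True when either documented condition for creating a batch is met."""
    too_old = seconds_since_last_batch > BATCH_MAX_AGE_SECONDS
    too_big = items_since_last_batch >= BATCH_MAX_ITEMS
    return too_old or too_big


if __name__ == "__main__":
    print(batch_is_ready(200, 10))       # age condition met
    print(batch_is_ready(30, 105_000))   # size condition met
    print(batch_is_ready(30, 10))        # neither condition met
```

Because either condition is enough, a busy environment produces batches more often than every three minutes, while a quiet one is bounded by the time limit.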
Distributed Cloud AIP uses a consistent folder structure to deliver the batch JSON file to your AWS S3 bucket. The folder structure is:
s3://<bucket-name>/<optional-prefix>/<orgId>/<YYYY/MM/DD>/<event-batch-type>/events-<random-number>.ndjson.gz
- <bucket-name> – Replaced with the name of the S3 bucket in which to store Distributed Cloud AIP events.
- <optional-prefix> – Optionally, replace with the top-level path within your AWS S3 bucket. If you do not have or do not want to use a top-level path, then Distributed Cloud AIP ignores this part of the folder structure.
- <orgId> – Replaced with either your Distributed Cloud AIP organization ID or the AWS organization ID of a consumer on whose behalf you receive data.
- <YYYY/MM/DD> – Replaced with the year, month, and day on which Distributed Cloud AIP ingested the batched JSON file. The datestamp is in UTC.
- <event-batch-type> – Replaced with the event source type. Currently, the only value is agent-events.
- events-<random-number> – Replaced with a unique, randomly generated filename that begins with the prefix: "events".
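When you process exported files, it is often useful to split an object key back into the parts listed above. The following is a minimal sketch under the assumption that keys follow the documented layout exactly; the ExportKey class and parse_export_key function are hypothetical names. It counts path segments from the right so that a missing or multi-level prefix is handled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExportKey:
    prefix: str       # optional top-level path, "" when not used
    org_id: str
    day: date         # UTC ingestion date
    batch_type: str   # currently only "agent-events"
    filename: str


def parse_export_key(key: str) -> ExportKey:
    """Split an object key (the path after the bucket name) into its parts."""
    parts = key.strip("/").split("/")
    if len(parts) < 6:
        raise ValueError(f"unexpected key layout: {key!r}")
    # The last six segments are fixed; anything before them is the prefix.
    org_id, year, month, day, batch_type, filename = parts[-6:]
    if not (filename.startswith("events-") and filename.endswith(".ndjson.gz")):
        raise ValueError(f"unexpected file name: {filename!r}")
    return ExportKey(
        prefix="/".join(parts[:-6]),
        org_id=org_id,
        day=date(int(year), int(month), int(day)),
        batch_type=batch_type,
        filename=filename,
    )


if __name__ == "__main__":
    # Hypothetical key for illustration only.
    print(parse_export_key("aip/org123/2024/01/31/agent-events/events-42.ndjson.gz"))
```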
Raw Event Format
The raw event data passed from Distributed Cloud AIP is structured in a specific way. See Raw Event Format for more information.
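Independent of the individual event fields, each exported file is newline-delimited, gzipped JSON, so it can be decoded one line at a time. The following is a minimal sketch that assumes UTF-8 text and makes no assumption about the fields inside each event; iter_events is a hypothetical helper name, and the sample data is invented for illustration.

```python
import gzip
import io
import json


def iter_events(gz_bytes: bytes):
    """Yield one decoded JSON object per non-empty line of a .ndjson.gz payload."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing newline
                yield json.loads(line)


if __name__ == "__main__":
    # Invented sample payload; real files come from your S3 bucket.
    sample = gzip.compress(b'{"id": 1}\n{"id": 2}\n')
    for event in iter_events(sample):
        print(event)
```

Reading line by line keeps memory use low, which matters because a single batch can hold many events.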