Adobe Data Feeds: How to use a data lake and Amazon Athena for analytic insights

Thomas Spicer
Published in Openbridge
5 min read · Feb 22, 2019


Getting Started

Let’s get started setting up a data pipeline for Adobe data feeds to your target data lake or warehouse.

Step 1: Logging Into Adobe Data Feeds Configuration

Log in to Adobe. Once you are in Adobe Analytics, select "Data Feeds" under the Admin menu:

Next, you will have the option to "Add" a new data feed:

You will see a screen that allows you to configure your feed. It will look something like this:

Great, you are now ready to set up a data feed!

Step 2: Configure Your Data Feed

The first part you will need to set up is called “Feed Information”:

For the interval, you can set a recurring daily or hourly feed. We suggest daily, as it will make QA simpler in the event of an issue.

Next, you will see options for your “Destination.” This is the location that Adobe will be pushing these feeds:

IMPORTANT: The path contains a unique ID that is provided to you by Openbridge. If the path or the ID is not present, your pipeline will not work.
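If you have read access to the destination bucket, a quick scripted check can confirm the path you entered, including the Openbridge-provided ID, is reachable before you save the feed. The sketch below is a minimal example in Python with boto3; the bucket name and prefix are placeholders, not the actual Openbridge path format.

```python
# Minimal sketch: verify the S3 destination path used in the feed's
# "Destination" settings is reachable. Bucket and prefix are placeholders;
# the prefix must include the unique ID provided by Openbridge.
import boto3

s3 = boto3.client("s3")
bucket = "your-destination-bucket"    # placeholder
prefix = "adobe/<openbridge-id>/"     # placeholder; keep the Openbridge ID in the path

# A simple list call verifies the credentials, bucket, and prefix are valid.
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
status = resp["ResponseMetadata"]["HTTPStatusCode"]
print("Destination reachable:", status == 200)
```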

Next, you will want to set a few options for the feed. These define how the data in the Adobe data feed columns will be formatted and delivered.

The first is to remove escaped characters:

Next, set the delivery packaging options as shown:

Lastly, you will need to select the columns to be included. We suggest using the Adobe Analytics clickstream data column reference called "All Columns Premium 2018".

IMPORTANT: Once you set a column template, ALL report suites being delivered to the configured destination must follow the same column template. If you select a different column template for an already configured location, your data pipeline will fail.

Step 3: Testing Your Adobe Data Feeds

Ideally, this step occurs before Step 2. While you can go straight to the production setup detailed in Step 2, we suggest testing your configuration first: before sending anything into a production data pipeline, check your target feeds.

This should be done by pushing data to a preconfigured "test" S3 bucket. Nothing sent to the test bucket will be processed. The sole purpose of the test bucket is to allow you to work out the logistics and configuration details in advance.
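A simple way to verify a test delivery is to list what Adobe has pushed into the test bucket and confirm the file names, sizes, and cadence match what you configured. This is a minimal sketch in Python with boto3; the bucket and prefix are placeholders for the values used in your test setup.

```python
# Minimal sketch: list what Adobe has delivered to the preconfigured "test"
# S3 bucket so you can confirm file names, sizes, and delivery cadence.
# Bucket and prefix are placeholders for your own test configuration.
import boto3

s3 = boto3.client("s3")
bucket = "your-test-bucket"   # placeholder
prefix = "adobe-test/"        # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        print(f"{obj['LastModified']:%Y-%m-%d %H:%M}  {obj['Size']:>12}  {obj['Key']}")
```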

Once you are comfortable with your test clickstream feeds, you can switch them to the production pipeline described above.

Step 4: Ongoing Testing & Audits

Adobe notes that the system may occasionally transfer a file more than once. There may also be failures where data is not delivered at all.

We suggest that you establish audits of delivered files. Here is a header row for a template worksheet you can use to track your feeds:

FEEDNAME | FEEDID | REPORTSUITE | OWNER | STATUS | DATADATE | RUNDATE

Most columns are self-explanatory and are available via the Adobe UI. DATADATE reflects the dates covered in the delivery. For example, if the feed covers 30 days, record the covered period here. You will want to make sure that all 30 days were delivered and processed.

If there is a delivery failure, or the same data is delivered more than once, you will want an audit trail you can provide to the Adobe Analytics support team. Adobe may have suggestions, but it is usually helpful to run an audit and note any gaps or issues every couple of weeks. If needed, this record can be sent to Adobe Customer Care, who can then determine a course of action to remediate any problems. The longer you wait to highlight an issue, the longer it will take Customer Care to respond and fix it; delays also mean they may have trouble finding the batch job for a given day.
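As a concrete example of the kind of check the worksheet supports, the sketch below reads an audit log with the columns shown above (exported as CSV) and flags missing data dates and duplicate deliveries. The file name, the 30-day window, and the assumption that DATADATE is logged as YYYY-MM-DD are all illustrative.

```python
# Minimal sketch: audit a worksheet of delivered feed files (columns as in the
# template above) for gaps and duplicate deliveries over an expected date range.
# File name, date window, and date format are illustrative placeholders.
from datetime import date, timedelta
import csv

expected_days = [date(2019, 1, 1) + timedelta(days=i) for i in range(30)]  # example window

seen = {}
with open("feed_audit.csv", newline="") as f:   # worksheet exported as CSV
    for row in csv.DictReader(f):
        d = date.fromisoformat(row["DATADATE"])  # assumes YYYY-MM-DD
        seen.setdefault(d, []).append(row["RUNDATE"])

missing = [d for d in expected_days if d not in seen]
duplicates = {d: runs for d, runs in seen.items() if len(runs) > 1}

print("Missing data dates:", [d.isoformat() for d in missing])
print("Duplicate deliveries:", {d.isoformat(): runs for d, runs in duplicates.items()})
```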

Data analytics and reporting notes

Adobe documents a set of rules that help when using tools like Tableau, Looker, Power BI, and others. These reflect specific formulas and logic for calculating certain types of metrics. For example, here is how to calculate the visitor metric (a minimal sketch follows the list below):

  • Exclude all rows where exclude_hit > 0.
  • Exclude all rows with hit_source = 5,7,8,9. Keep in mind that 5, 8, and 9 are summary rows uploaded using data sources. 7 represents transaction ID data source uploads that should not be included in visit and visitor counts.
  • Combine post_visid_high with post_visid_low and count the number of unique combinations.
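To make these rules concrete, here is a minimal sketch in Python with pandas that applies them to a single delivered hit data file. The file names are placeholders, and the sketch assumes the column_headers lookup file delivered alongside the feed describes the column order for your template.

```python
# Minimal sketch (assumptions noted above): compute unique visitors
# from one Adobe Data Feed hit file using the rules listed above.
import pandas as pd

# The lookup package delivered with the feed typically includes a column_headers
# file listing the column order for your template; file name is a placeholder.
columns = pd.read_csv("column_headers.tsv", sep="\t").columns.tolist()

# Hit data files are tab-delimited and delivered without a header row.
hits = pd.read_csv(
    "hit_data.tsv.gz",        # placeholder: one delivered hit data file
    sep="\t",
    header=None,
    names=columns,
    compression="gzip",
    low_memory=False,
)

# Rule 1: exclude rows where exclude_hit > 0.
# Rule 2: exclude rows with hit_source in (5, 7, 8, 9).
exclude_hit = pd.to_numeric(hits["exclude_hit"], errors="coerce").fillna(0)
hit_source = pd.to_numeric(hits["hit_source"], errors="coerce")
valid = hits[(exclude_hit <= 0) & (~hit_source.isin([5, 7, 8, 9]))]

# Rule 3: unique visitors = distinct (post_visid_high, post_visid_low) pairs.
unique_visitors = valid[["post_visid_high", "post_visid_low"]].drop_duplicates().shape[0]
print("Unique visitors:", unique_visitors)
```

The same filters and distinct count translate directly to a SQL query once the feed is registered as a table in a query engine like Athena.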

Openbridge makes data more accessible and more valuable with fully automated, code-free data migration from your Adobe data silos to data lakes or cloud warehouses like Azure Data Lake, AWS Redshift, AWS Redshift Spectrum, AWS Athena, and Google BigQuery.

Want to discuss how to leverage an Adobe Analytics data warehouse or data lake for your organization? Need a platform and team of experts to kickstart your data and analytics efforts? We can help! Getting traction adopting new technologies, especially if it means your team is working in different and unfamiliar ways, can be a roadblock for success. This is especially true in a self-service only world. If you want to discuss a proof-of-concept, pilot, project, or any other effort, the Openbridge platform and team of data experts are ready to help.

Reach out to us at hello@openbridge.com. Prefer to talk to someone? Set up a call with our team of data experts.

Visit us at www.openbridge.com to learn how we are helping other companies with their data efforts.

References

Check out the Adobe Analytics support docs here. To get a primer on the Adobe Data Feed interface, check out the Adobe Data Feed video. Also, we posted an Adobe Data Feed FAQ here and some best practices here.
