Five years ago, our Data Science & Analytics team wrote about our centralized data stack in this post. The setup that previous members of our team built has served the organization’s needs very well over time, and we appreciate the eagerness of our predecessors in adopting the most modern tools and technologies to power the entire org’s data work and allow everyone at DonorsChoose easy access to our data.

Since then, we’ve added a few more tools and services to our data stack, and thought it was a good time to write an update.

Our Updated Data Stack

Integrating more data sources with Fivetran connectors (primarily more Amazon S3 connectors)
Transitioning some data transformations from Looker to dbt
Switching from ExactTarget to Simon Data for our Email Service Provider
Introducing Amazon Comprehend for one of our core machine learning models
Limiting access to our “open data” Looker instance in favor of broader access to a standardized data set updated annually

A Deeper Dive
The data stack is a little more complicated than it was five years ago, so we’ll break it down below in some detail.

Data Ingest
Fivetran still lives at the heart of our data ingestion process. Fivetran connectors are easy to set up and have allowed us to pull data from several different sources into Redshift. Current sources include our site’s PostgreSQL database, Zendesk, Salesforce, several Amazon S3 buckets, and CSV files from our staff users. You can read a case study about our experience with Fivetran.
We use Heap Analytics to capture and aggregate end-user interaction events on our website, and Heap pipes that data directly to Redshift.
We’ve onboarded Simon Data as our new Email Service Provider. They push engagement data (e.g., sends, opens, clicks) into an S3 bucket, which Fivetran then pulls into Redshift. Simon Data also pulls data from Redshift (more on that in the Data Security section below).
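
To make the engagement data a little more concrete, here is a minimal Python sketch that tallies events from a CSV export of the kind that might land in an S3 bucket. The column names (`email_id`, `event_type`) and the sample rows are hypothetical, chosen only for illustration; they are not Simon Data's actual export schema, and this is not our production code.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of an engagement export; the real schema may differ.
SAMPLE_EXPORT = """email_id,event_type
101,send
101,open
101,click
102,send
102,open
103,send
"""

def count_events(csv_text):
    """Count rows per event type in a CSV engagement export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["event_type"] for row in reader)

def open_rate(counts):
    """Opens divided by sends; returns 0.0 when nothing was sent."""
    sends = counts.get("send", 0)
    return counts.get("open", 0) / sends if sends else 0.0

counts = count_events(SAMPLE_EXPORT)
print(dict(counts))
print(round(open_rate(counts), 2))
```

In practice this kind of aggregation would more likely happen in SQL once the data is in the warehouse; the sketch only shows the shape of the data.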
Data Warehousing
Our data lives in a single Redshift cluster. We use three ds2.xlarge nodes, which are both storage-efficient and cost-efficient. We’re currently using about 25% of our allotted storage space. Redshift Advisor analyzes queries and automatically recommends specific sort and distribution keys to optimize table setup over time.
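
For a sense of scale, here is a back-of-the-envelope sketch of what 25% utilization works out to. It assumes 2 TB of HDD storage per ds2.xlarge node, which is the figure in AWS's published specs rather than a number read from our cluster, so treat the result as approximate.

```python
# Assumption: 2 TB of HDD storage per ds2.xlarge node (AWS's published spec).
STORAGE_PER_NODE_TB = 2.0

def cluster_storage(nodes, used_fraction, per_node_tb=STORAGE_PER_NODE_TB):
    """Return (total, used, free) storage in TB for a cluster."""
    total = nodes * per_node_tb
    used = total * used_fraction
    return total, used, total - used

total, used, free = cluster_storage(nodes=3, used_fraction=0.25)
print(f"total={total} TB, used={used} TB, free={free} TB")
```

Under that assumption, three nodes give roughly 6 TB in total, of which about 1.5 TB is in use.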
