Increasingly, enterprise customers want to combine data from disparate sources to answer business-critical questions. 47Lining specializes in helping customers develop big data solutions that leverage AWS products such as DynamoDB, Kinesis, S3, Redshift, and EMR.

47Lining worked with a data provider customer on a big data billing workload to:

  1. Add a publish/subscribe Domain Event architecture to their data platform API.
    • This resulted in all API calls and other business-relevant events within the data platform being aggregated in DynamoDB.
    • The rate of event publishing and capture in DynamoDB can easily scale as API usage increases.
  2. Establish a periodic process to partition account usage data.
    • Each night, AWS Data Pipeline invokes a transient EMR job that partitions usage data by account and by month. EMR reads the data directly from DynamoDB.
    • Per-account data is serialized into self-describing objects for each Account-Month and added to a Data Lake in S3.
  3. Flexibly combine API usage data with other data sources.
    • Additional data sources (performance monitoring information, per-account related product consumption) can easily be added to the Data Lake.
    • Permissions to combinations of data sets can easily be controlled through IAM Roles and S3 resource policies.
    • Data can easily be shared with third parties in other AWS Accounts for Analytics to be performed on subsets of the Data Lake, without requiring that the data be moved. For our customer, the API usage data enabled metering, rating and billing for each of their customers.
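Step 1 above can be sketched in code. This is a minimal, hypothetical illustration of how a domain event might be shaped for capture in DynamoDB; the table name and attribute names (AccountId, EventTimestamp, EventType) are assumptions, not the customer's actual schema.

```python
import json
import uuid
from datetime import datetime, timezone

def build_event_item(account_id, event_type, payload):
    """Build a DynamoDB-ready item for a single domain event.

    Attribute names are illustrative: AccountId as the partition key and
    EventTimestamp as the sort key make per-account, time-ordered reads cheap.
    """
    now = datetime.now(timezone.utc)
    return {
        "AccountId": account_id,            # partition key (assumed)
        "EventTimestamp": now.isoformat(),  # sort key (assumed)
        "EventId": str(uuid.uuid4()),       # de-duplication / idempotency id
        "EventType": event_type,            # e.g. "ApiCall"
        "Payload": json.dumps(payload),     # self-describing event body
    }

# Publishing would then be a single put_item call, for example with boto3:
#   table = boto3.resource("dynamodb").Table("DomainEvents")  # assumed name
#   table.put_item(Item=build_event_item("acct-1", "ApiCall", {"path": "/v1/data"}))
```

Because each event is a single-item write keyed by account, write throughput scales with provisioned capacity as API usage grows, which is what makes the "easily scale" claim in step 1 hold.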
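The nightly partitioning in step 2 can be sketched as a pure function: group events by account and month, then write each group to a self-describing Account-Month object in S3. The key layout and field names below are assumptions for illustration; in practice this logic runs inside the transient EMR job that Data Pipeline launches.

```python
from collections import defaultdict

def partition_events(events):
    """Group raw events into (account, YYYY-MM) partitions.

    Each event is a dict with 'AccountId' and an ISO-8601 'EventTimestamp',
    mirroring (hypothetically) the items captured in DynamoDB.
    """
    partitions = defaultdict(list)
    for event in events:
        month = event["EventTimestamp"][:7]  # "YYYY-MM" prefix of ISO-8601
        partitions[(event["AccountId"], month)].append(event)
    return partitions

def s3_key_for(account_id, month):
    """Illustrative Data Lake key for one Account-Month object.

    Hive-style account=/month= prefixes keep the objects self-describing
    and let downstream analytics prune partitions by key alone.
    """
    return f"datalake/usage/account={account_id}/month={month}/events.json"
```

Within the EMR job, each partition would be serialized and uploaded to its `s3_key_for(...)` location, so the Data Lake accumulates one object per Account-Month per night.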
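The cross-account sharing in step 3 rests on S3 resource policies. The sketch below builds a bucket policy granting a partner AWS account read-only access to one Data Lake prefix; the bucket name, prefix, and account ID are placeholders, and the policy shape follows standard S3 cross-account patterns rather than the customer's actual configuration.

```python
def cross_account_read_policy(bucket, prefix, partner_account_id):
    """Build an S3 bucket policy giving one partner account read-only
    access to a single Data Lake prefix (placeholder names throughout)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PartnerReadOnly",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{partner_account_id}:root"},
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Sid": "PartnerList",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{partner_account_id}:root"},
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                # Restrict listing to the shared prefix only.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }
```

Because the grant is a policy on the data in place, the partner analyzes the shared subset directly from S3; no copy is made and no data moves.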


Big data from several sources can easily be combined every night. Data can be securely shared and analyzed by multiple internal consumers and external partners, without having to move the data.