Datacube Configuration

Datacube Configuration

Datacubes define how DataReporter models your data for interactive Turnilo exploration. As an admin, you manage datacube configurations through the Model Config system, which stores YAML definitions that map database tables to dimensions and measures.

How It Works

  1. An admin creates a Model pointing to a data source table
  2. DataReporter auto-generates a datacube config from the table schema
  3. The admin customizes the YAML (rename dimensions, add computed measures, remove unnecessary columns)
  4. The validator checks the config against the required schema
  5. The Plywood backend loads the config and serves it to Turnilo widgets

Auto-Generation

When you create a new model, DataReporter introspects the database schema and generates a starter config:

Column TypeGenerated As
VARCHAR, TEXT, CHARACTER VARYINGDimension (kind: string)
INTEGER, FLOAT, NUMERIC, BIGINTMeasure (formula: $main.sum($col))
BOOLEANDimension (kind: boolean)
TIMESTAMP, DATE, TIMESTAMPTZDimension (kind: time)

The first TIME column becomes the timeAttribute. The first numeric column becomes defaultSortMeasure.

This gives you a working starting point. You should review and customize it for your use case.

Quick Example

dataCubes:
  - name: orders
    title: Orders
    clusterName: postgres
    timeAttribute: created_at
    defaultSortMeasure: total_revenue
    defaultSelectedMeasures:
      - total_revenue
      - order_count

    attributes:
      - name: created_at
        type: TIME
        nativeType: TIMESTAMP WITH TIME ZONE
      - name: customer_email
        type: STRING
        nativeType: CHARACTER VARYING
      - name: status
        type: STRING
        nativeType: CHARACTER VARYING
      - name: revenue
        type: NUMBER
        nativeType: NUMERIC

    dimensions:
      - name: created_at
        title: Created At
        formula: $created_at
        kind: time
      - name: customer_email
        title: Customer Email
        formula: $customer_email
      - name: status
        title: Order Status
        formula: $status

    measures:
      - name: order_count
        title: Order Count
        formula: $main.count()
      - name: total_revenue
        title: Total Revenue
        formula: $main.sum($revenue)
        units: USD
        format: "$0,0.00"

Datacube Properties

Top-level properties for each datacube entry:

PropertyRequiredTypeDescription
nameYesstring (max 100)Unique identifier, used in URLs. Changing this breaks existing bookmarks.
titleYesstring (max 120)Display name shown in the UI. Safe to change.
descriptionNostring (max 256)Shown on the datacube selection screen.
clusterNameYesstringData source type. See Supported Clusters.
timeAttributeYesstringName of the primary time column. Must exist in attributes.
defaultSortMeasureYesstringMeasure used for default sorting. Must exist in attributes.
defaultSelectedMeasuresYesstring[]Measures shown by default. Each must exist in attributes.
attributesYeslistColumn definitions. See Attributes.
dimensionsYeslistDimension definitions. See Dimensions.
measuresYeslistMeasure definitions. See Measures.

Additional Properties (Advanced)

These properties are supported by the Turnilo frontend but not enforced by DataReporter’s YAML validator:

PropertyTypeDefaultDescription
sourcestring or string[]same as nameTable or data source name. Multiple values create a union.
defaultTimezonestringEtc/UTCTimezone in Olsen format (e.g., Europe/Riga).
defaultDurationstringP1DISO 8601 duration for initial time range (e.g., P3D for 3 days).
defaultPinnedDimensionsstring[]Dimensions pinned in the right panel.
subsetFormulastringPlywood filter applied transparently to all queries.
introspectionstringautofill-allSchema discovery strategy. See Introspection.
maxSplitsnumber3Maximum number of dimension splits.
maxQueriesnumber500Maximum concurrent queries.
rollupbooleanfalseEnable rollup mode (Druid).
refreshRuleobjectData refresh strategy. See Refresh Rules.

Supported Clusters

The clusterName determines which query engine DataReporter uses:

Cluster NameData SourceEngine Mapping
nativeIn-memory / JSONjson
postgresPostgreSQLpg
mysqlMySQLmysql
bigqueryGoogle BigQuerybigquery
athenaAWS Athenaathena
druidApache Druiddruid

Attributes

Attributes define the raw columns available from your data source. Every column referenced in dimensions, measures, timeAttribute, defaultSortMeasure, or defaultSelectedMeasures must have a corresponding attribute entry.

attributes:
  - name: created_at
    type: TIME
    nativeType: TIMESTAMP WITH TIME ZONE

  - name: user_id
    type: NUMBER
    nativeType: INTEGER

  - name: country
    type: STRING
    nativeType: CHARACTER VARYING

  - name: is_active
    type: BOOLEAN
    nativeType: BOOLEAN
PropertyRequiredDescription
nameYesColumn name, must match the database column exactly.
typeYesPlywood type: STRING, NUMBER, TIME, or BOOLEAN.
nativeTypeNoNative database type (e.g., INTEGER, FLOAT, CHARACTER VARYING). Informational.

Druid-specific native types: hyperUnique, thetaSketch, HLLSketch, approximateHistogram, quantilesDoublesSketch. These trigger special aggregation behavior.

Dimensions

Dimensions define the axes you can slice and filter data by. Each dimension references a column via a Plywood formula.

dimensions:
  - name: country
    title: Country
    formula: $country

  - name: signup_date
    title: Signup Date
    formula: $signup_date
    kind: time

  - name: is_premium
    title: Premium User
    formula: $is_premium
    kind: boolean

  - name: age_group
    title: Age Group
    formula: $age
    kind: number
PropertyRequiredTypeDescription
nameYesstringUnique identifier, used in URLs.
titleYesstringDisplay name in the UI.
formulaYesstringPlywood expression. Typically $columnName.
kindNostringData type: string (default), time, number, boolean.
descriptionNostring (max 100)Tooltip or help text.
multiValueNobooleanSet true for multi-value dimensions (Druid).

Dimension Kinds

KindWhen to UseUI Behavior
stringCategorical data (names, IDs, labels)Filter list, search
timeTimestamps and datesTime picker, granularity selector
numberNumeric ranges (age, price)Histogram bucketing
booleanTrue/false flagsToggle filter

Advanced Dimension Formulas

Beyond simple column references ($columnName), dimensions support:

# Lookup (Druid)
- name: country_name
  formula: $country_code.lookup('country_names')

# Regex extraction
- name: version_major
  formula: $version.extract('(\d+\.\d+)')

# Boolean expression
- name: is_usa
  formula: $country == 'United States'

Measures

Measures define the aggregations computed over your data. Each measure uses a Plywood expression against $main (the data segment).

measures:
  - name: count
    title: Row Count
    formula: $main.count()

  - name: total_revenue
    title: Total Revenue
    formula: $main.sum($revenue)
    units: USD
    format: "$0,0.00"

  - name: avg_order_value
    title: Avg Order Value
    formula: $main.sum($revenue) / $main.count()
    format: "$0,0.00"

  - name: unique_customers
    title: Unique Customers
    formula: $main.countDistinct($customer_id)

  - name: max_order
    title: Largest Order
    formula: $main.max($revenue)
    format: "$0,0.00"
PropertyRequiredTypeDescription
nameYesstringUnique identifier. Changing this breaks bookmarked URLs.
titleYesstringDisplay name in the UI.
formulaYesstringPlywood aggregation expression. See Formula Reference.
descriptionNostring (max 100)Tooltip or help text.
unitsNostringUnit label displayed next to the title (e.g., seconds, USD).
formatNostringNumber format string (numbro syntax). Default: 0,0.0 a.
lowerIsBetterNobooleanSet true when lower values are preferable (e.g., error rates).
transformationNostringnone (default), percent-of-parent, or percent-of-total.

Formula Reference

All measure formulas operate on $main, which represents the current data segment. Dimension formulas reference columns directly with $columnName.

Aggregation Functions

FormulaDescriptionExample
$main.count()Count of rowsRow count
$main.sum($col)Sum of values$main.sum($revenue)
$main.avg($col)Average of values$main.avg($price)
$main.min($col)Minimum value$main.min($latency)
$main.max($col)Maximum value$main.max($latency)
$main.countDistinct($col)Distinct count$main.countDistinct($user_id)
$main.quantile($col, p)Percentile (Druid only)$main.quantile($response_time, 0.98)

Arithmetic and Ratios

Combine aggregations with standard arithmetic:

# Revenue per order
formula: $main.sum($revenue) / $main.count()

# Conversion rate as percentage
formula: $main.sum($conversions) / $main.sum($visits) * 100

# Profit margin
formula: ($main.sum($revenue) - $main.sum($cost)) / $main.sum($revenue)

Filtered Aggregations

Apply filters within a measure:

# Revenue from US only
formula: $main.filter($country == 'United States').sum($revenue)

# High-value orders
formula: $main.filter($revenue.greaterThan(100)).count()

Column References

SyntaxMeaning
$columnNameReference to a column
$mainThe current data segment (used in measures)
$col.lookup('name')Named lookup (Druid)
$col.extract('regex')Regex extraction
$col.customTransform('name')Custom transform (Druid)

Introspection

Controls how DataReporter discovers schema information from the data source:

StrategyDescription
autofill-allAuto-create dimensions and measures from database columns (default)
autofill-dimensions-onlyAuto-create dimensions only, measures must be manual
autofill-measures-onlyAuto-create measures only, dimensions must be manual
no-autofillIntrospect columns but don’t auto-create dimensions/measures
noneSkip introspection entirely, use only what’s in the config

Refresh Rules

Control how DataReporter detects new data:

# Batch sources - queries for max time value every minute
refreshRule:
  rule: query

# Real-time sources - assumes current time is latest
refreshRule:
  rule: realtime

# Static/historical data - fixed point in time
refreshRule:
  rule: fixed
  time: "2026-01-15T00:00:00.000Z"

Validation

DataReporter validates configs in three stages:

  1. Length check — config must not exceed the maximum content size
  2. YAML syntax — must be valid YAML (errors report line and column number)
  3. Schema validation — required fields present, correct types, allowed enum values
  4. Cross-reference checktimeAttribute, defaultSortMeasure, and defaultSelectedMeasures must all exist in attributes

If validation fails, the API returns a 400 error with a descriptive message.

Troubleshooting

“Config has the following issues” : Schema validation failed. Check that all required fields are present and correctly typed. Common causes: missing timeAttribute, empty defaultSelectedMeasures, or typos in field names.

“timeAttribute ‘X’ not found in attributes” : The timeAttribute value doesn’t match any entry in the attributes list. Make sure the name matches exactly (case-sensitive).

“defaultSortMeasure ‘X’ not found in attributes” : The measure referenced in defaultSortMeasure isn’t listed in attributes. Add it or change to a measure that exists.

Datacube not showing in Turnilo : Check that the Plywood service is running and reachable. Verify the config loads at /config-turnilo. Check browser console for errors.

Wrong data types : If numeric columns show as dimensions or string columns appear as measures, update the attributes type and move the entry between dimensions and measures as appropriate.

Complete Example

A production-ready datacube for an e-commerce orders table:

dataCubes:
  - name: ecommerce_orders
    title: E-Commerce Orders
    description: Customer orders with revenue and product data
    clusterName: postgres
    timeAttribute: order_date
    defaultTimezone: Europe/Riga
    defaultDuration: P7D
    defaultSortMeasure: revenue
    defaultSelectedMeasures:
      - revenue
      - order_count
      - unique_customers

    attributes:
      - name: order_date
        type: TIME
        nativeType: TIMESTAMP WITH TIME ZONE
      - name: customer_id
        type: NUMBER
        nativeType: INTEGER
      - name: customer_email
        type: STRING
        nativeType: CHARACTER VARYING
      - name: country
        type: STRING
        nativeType: CHARACTER VARYING
      - name: product_category
        type: STRING
        nativeType: CHARACTER VARYING
      - name: revenue
        type: NUMBER
        nativeType: NUMERIC
      - name: quantity
        type: NUMBER
        nativeType: INTEGER
      - name: is_returned
        type: BOOLEAN
        nativeType: BOOLEAN

    dimensions:
      - name: order_date
        title: Order Date
        formula: $order_date
        kind: time
      - name: customer_email
        title: Customer
        formula: $customer_email
      - name: country
        title: Country
        formula: $country
      - name: product_category
        title: Product Category
        formula: $product_category
      - name: is_returned
        title: Returned
        formula: $is_returned
        kind: boolean

    measures:
      - name: order_count
        title: Orders
        formula: $main.count()
      - name: revenue
        title: Revenue
        formula: $main.sum($revenue)
        units: EUR
        format: "0,0.00"
      - name: avg_order_value
        title: Avg Order Value
        formula: $main.sum($revenue) / $main.count()
        format: "0,0.00"
      - name: total_quantity
        title: Items Sold
        formula: $main.sum($quantity)
      - name: unique_customers
        title: Unique Customers
        formula: $main.countDistinct($customer_id)
      - name: return_rate
        title: Return Rate
        formula: $main.filter($is_returned == true).count() / $main.count()
        format: "0.0%"
        lowerIsBetter: true
        transformation: none

Further Reading