ETL Utilities#
The com.cinchapi.concourse.etl package provides a small set of
building blocks for shaping incoming data into a form that
Concourse can store directly. It is designed for ingestion
pipelines that need to enforce Concourse’s data model —
multi-valued fields, flattened sequences, and consistent value
types — on heterogeneous source data.
Strainer#
A Strainer iterates a Map<String, Object> and applies a
caller-supplied action to every leaf key/value pair. The key
difference from a plain forEach is that sequence values (any
Iterable or array) are flattened: each element is presented
individually, which matches how Concourse stores multi-valued
fields.
Strainers are stateless aside from the action you supply, so they can be reused across many records. Use them anywhere an upstream system hands you a JSON-like document and you want each leaf value to become one write.
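The flattening rule described above can be illustrated with a minimal, self-contained sketch. It does not use the Concourse API: `StrainSketch` and its `strain` method are hypothetical stand-ins written only to show how sequence values are presented one element at a time. The sketch assumes that nested sequences are flattened recursively and, as a simplification, treats nested maps as opaque values; the real Strainer may behave differently on both points.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical illustration of the flattening rule; not the Concourse Strainer API.
public class StrainSketch {

    // Invoke the action once per key/value pair, expanding sequence values.
    public static void strain(Map<String, Object> document,
            BiConsumer<String, Object> action) {
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            emit(entry.getKey(), entry.getValue(), action);
        }
    }

    private static void emit(String key, Object value,
            BiConsumer<String, Object> action) {
        if (value instanceof Iterable) {
            // Each element of an Iterable is presented under the same key.
            for (Object element : (Iterable<?>) value) {
                emit(key, element, action);
            }
        }
        else if (value != null && value.getClass().isArray()) {
            // Reflection handles both object arrays and primitive arrays.
            int length = Array.getLength(value);
            for (int i = 0; i < length; ++i) {
                emit(key, Array.get(value, i), action);
            }
        }
        else {
            // Anything else is treated as a single value.
            action.accept(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Jeff");
        doc.put("tags", Arrays.asList("admin", "beta"));
        doc.put("scores", new int[] { 90, 85 });

        // In a real pipeline the action would issue one write per pair.
        strain(doc, (key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running `main` prints five lines, one per flattened value: `name` once, `tags` twice, and `scores` twice. Because the helper holds no state of its own, the same action can be applied to any number of documents.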
Transform#
Transform exposes a few static helpers for reshaping document
data into the collection shapes Concourse uses internally.
These helpers apply the same flattening rule as Strainer:
lists and arrays become multiple entries under the same key.
The difference is that Transform returns a materialized
collection rather than invoking a callback, which is useful
when you want to inspect or mutate the normalized shape before
writing it to Concourse.
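To make the contrast with the callback style concrete, the sketch below materializes a document into a map of key to value set, which can then be inspected or changed before anything is written. It is an illustration only: `TransformSketch` and `toMultiValued` are hypothetical names, not the signatures of the real Transform helpers. The use of a `Set` per key assumes that duplicate values under one key are not meaningful.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a materializing helper; not the Concourse Transform API.
public class TransformSketch {

    // Return a key -> values map in which sequence values are flattened.
    public static Map<String, Set<Object>> toMultiValued(
            Map<String, Object> document) {
        Map<String, Set<Object>> record = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            Set<Object> values = record.computeIfAbsent(entry.getKey(),
                    key -> new LinkedHashSet<>());
            collect(entry.getValue(), values);
        }
        return record;
    }

    private static void collect(Object value, Set<Object> into) {
        if (value instanceof Iterable) {
            for (Object element : (Iterable<?>) value) {
                collect(element, into);
            }
        }
        else if (value != null && value.getClass().isArray()) {
            int length = Array.getLength(value);
            for (int i = 0; i < length; ++i) {
                collect(Array.get(value, i), into);
            }
        }
        else {
            into.add(value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Jeff");
        doc.put("tags", Arrays.asList("admin", "beta", "admin"));

        Map<String, Set<Object>> record = toMultiValued(doc);

        // The result is an ordinary collection, so it can be edited
        // before any write happens.
        record.get("tags").remove("beta");
        System.out.println(record); // prints {name=[Jeff], tags=[admin]}
    }
}
```

The returned map is a plain Java collection, so the inspection or mutation step needs nothing beyond the standard library.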
When to use ETL utilities#
- Normalizing heterogeneous feeds. If one upstream sends
  scalars and another sends arrays, using Strainer or
  Transform ensures both are stored consistently as
  multi-valued fields.
- Testing and validation. Locally materialize a Multimap or
  record map and pass it to local Criteria evaluation to check
  whether a document would match a condition before writing it.
- Custom importers. The built-in import CLI uses these helpers
  internally; if you write a custom importer on top of
  concourse-import, you can delegate your own leaf-level
  handling to a Strainer.