ETL Utilities

The com.cinchapi.concourse.etl package provides a small set of building blocks for shaping incoming data into a form that Concourse can store directly. It is designed for ingestion pipelines that need to enforce Concourse’s data model (multi-valued fields, flattened sequences, and consistent value types) on heterogeneous source data.

Strainer

A Strainer iterates a Map<String, Object> and applies a caller-supplied action to every leaf key/value pair. The key difference from a plain forEach is that sequence values (any Iterable or array) are flattened: the action is invoked once per element, which matches how Concourse stores multi-valued fields.

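import com.cinchapi.concourse.Concourse;
import com.cinchapi.concourse.etl.Strainer;
import java.util.List;
import java.util.Map;

// Assumption: a server reachable with the default connection settings;
// the record id 1 is arbitrary.
Concourse concourse = Concourse.connect();
long record = 1;

// Each leaf key/value pair becomes one add() in the target record
Strainer strainer = new Strainer((key, value) -> {
    concourse.add(key, value, record);
});

Map<String, Object> incoming = Map.of(
        "name", "Jeff",
        "tags", List.of("engineer", "founder"));
strainer.process(incoming);

// Effective writes (order may vary, since Map.of has no defined order):
//   add "name" as "Jeff" in record
//   add "tags" as "engineer" in record
//   add "tags" as "founder" in record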

Strainers are stateless aside from the action you supply, so they can be reused across many records. Use them anywhere an upstream system hands you a JSON-like document and you want each leaf value to become one write.
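The flattening rule described above can be sketched in plain Java, without a Concourse server. This is an illustration of the documented behavior, not Strainer's actual implementation: the class StrainSketch and its strain method are hypothetical names, and the sketch assumes only one level of flattening (nested maps and nested sequences are handed to the action unchanged).

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

class StrainSketch {

    // Hypothetical stand-in for Strainer.process(): invokes the action once for
    // a scalar value and once per element of an Iterable or array value.
    // Assumption: one level of flattening only; nested maps pass through as-is.
    static void strain(Map<String, Object> data, BiConsumer<String, Object> action) {
        for (Map.Entry<String, Object> entry : data.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Iterable) {
                for (Object item : (Iterable<?>) value) {
                    action.accept(key, item);
                }
            }
            else if (value != null && value.getClass().isArray()) {
                // Array.getLength/Array.get work for object and primitive arrays
                int length = Array.getLength(value);
                for (int i = 0; i < length; i++) {
                    action.accept(key, Array.get(value, i));
                }
            }
            else {
                action.accept(key, value);
            }
        }
    }

    public static void main(String[] args) {
        List<String> writes = new ArrayList<>();
        strain(Map.of(
                "name", "Jeff",
                "tags", List.of("engineer", "founder"),
                "scores", new int[] {7, 9}),
                (key, value) -> writes.add(key + "=" + value));
        // Map.of has no defined iteration order, so sort before printing
        Collections.sort(writes);
        System.out.println(writes);
    }
}
```

Running main should print [name=Jeff, scores=7, scores=9, tags=engineer, tags=founder]. Using java.lang.reflect.Array rather than casting to Object[] lets the same branch handle primitive arrays such as int[].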

Transform

Transform exposes a few static helpers for reshaping document data into the collection shapes Concourse uses internally:

import com.cinchapi.concourse.etl.Transform;
import com.google.common.collect.Multimap;
import java.util.Map;
import java.util.Set;

Map<String, Object> incoming = ...;

// Flatten sequences into a Guava Multimap
Multimap<String, Object> multi = Transform.toMultimap(incoming);

// Flatten sequences into a "record" map (each key maps to a Set)
Map<String, Set<Object>> record = Transform.toRecord(incoming);

These helpers apply the same flattening rule as Strainer: lists and arrays become multiple entries under the same key. The difference is that Transform returns a materialized collection rather than invoking a callback, which is useful when you want to inspect or mutate the normalized shape before writing it to Concourse.
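To show what a materialized record map looks like, and why having it in hand before a write is useful, here is a JDK-only sketch. The helper toRecord in RecordShapeSketch is hypothetical: it mirrors the documented shape of Transform.toRecord (each key maps to a Set, with sequences flattened) but is not the library's code. Guava is left out so the example runs on its own.

```java
import java.lang.reflect.Array;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RecordShapeSketch {

    // Hypothetical helper mirroring the documented shape of Transform.toRecord():
    // each key maps to a Set of values, with Iterables and arrays flattened.
    // Duplicate values under a key collapse because the container is a Set.
    static Map<String, Set<Object>> toRecord(Map<String, Object> data) {
        Map<String, Set<Object>> record = new LinkedHashMap<>();
        data.forEach((key, value) -> {
            Set<Object> values = record.computeIfAbsent(key, k -> new LinkedHashSet<>());
            if (value instanceof Iterable) {
                ((Iterable<?>) value).forEach(values::add);
            }
            else if (value != null && value.getClass().isArray()) {
                for (int i = 0; i < Array.getLength(value); i++) {
                    values.add(Array.get(value, i));
                }
            }
            else {
                values.add(value);
            }
        });
        return record;
    }

    public static void main(String[] args) {
        Map<String, Set<Object>> record = toRecord(Map.of(
                "name", "Jeff",
                "tags", List.of("Engineer", "founder", "engineer")));

        // Because the shape is materialized, it can be inspected and edited
        // before anything is written, e.g. lower-casing every tag.
        Set<Object> normalized = new LinkedHashSet<>();
        record.get("tags").forEach(tag -> normalized.add(tag.toString().toLowerCase()));
        record.put("tags", normalized);

        System.out.println(record.get("name")); // [Jeff]
        System.out.println(record.get("tags")); // [engineer, founder]
    }
}
```

The mixed-case duplicate tag collapses once it is lower-cased, because each key's values live in a Set.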

When to use ETL utilities

  • Normalizing heterogeneous feeds. If one upstream sends scalars and another sends arrays, using Strainer or Transform ensures both are stored consistently as multi-valued fields.
  • Testing and validation. Materialize a Multimap or record map locally and run local Criteria evaluation against it to check whether a document would match a condition before you write it.
  • Custom importers. The built-in import CLI uses these helpers internally; if you write a custom importer on top of concourse-import, you can delegate your own leaf-level handling to a Strainer.
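The first two uses can be illustrated together. In this sketch, two feeds send the same field as a scalar and as a list, and both normalize to the same record shape; a simple predicate then checks a condition before anything is written. FeedNormalizationSketch, normalize, and matches are hypothetical names, and matches is only a stand-in for local Criteria evaluation, not Concourse's Criteria API. Arrays are omitted for brevity.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class FeedNormalizationSketch {

    // Hypothetical normalizer: a scalar becomes a one-element Set and an
    // Iterable becomes a Set of its elements (arrays omitted for brevity).
    static Map<String, Set<Object>> normalize(Map<String, Object> doc) {
        Map<String, Set<Object>> out = new LinkedHashMap<>();
        doc.forEach((key, value) -> {
            Set<Object> values = out.computeIfAbsent(key, k -> new LinkedHashSet<>());
            if (value instanceof Iterable) {
                ((Iterable<?>) value).forEach(values::add);
            }
            else {
                values.add(value);
            }
        });
        return out;
    }

    // Hypothetical stand-in for local Criteria evaluation: true if any value
    // stored under the key satisfies the condition. Not Concourse's Criteria API.
    static boolean matches(Map<String, Set<Object>> record, String key, Predicate<Object> condition) {
        return record.getOrDefault(key, Set.of()).stream().anyMatch(condition);
    }

    public static void main(String[] args) {
        // Feed A sends a scalar, feed B sends a list for the same field
        Map<String, Set<Object>> fromA = normalize(Map.of("tags", "founder"));
        Map<String, Set<Object>> fromB = normalize(Map.of("tags", List.of("founder")));
        System.out.println(fromA.equals(fromB)); // true

        // Check a condition before deciding whether to write the document
        Map<String, Set<Object>> doc = normalize(Map.of("tags", List.of("engineer", "founder")));
        System.out.println(matches(doc, "tags", "founder"::equals)); // true
        System.out.println(matches(doc, "role", "founder"::equals)); // false
    }
}
```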