Output parameters

Generated file name

To change the name of the generated file, use output_name.

info:
  output_name: generate_file_name

Format

To choose the format of the generated file, use output_format.

Parquet
info:
 output_format: parquet

CSV
info:
 output_format: csv
 delimiter: ','

The default delimiter is ',', but you can specify any character.
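
When a non-default delimiter is configured, whatever reads the file has to be told about it. The following is a minimal Python sketch of a consumer, assuming a file generated with delimiter: ';'. The sample content and the helper name read_delimited are hypothetical and only illustrate the idea; whether the generated file contains a header row is not assumed here.

```python
import csv
import io


def read_delimited(text: str, delimiter: str = ",") -> list[list[str]]:
    """Parse CSV text into rows of string fields using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical sample of a file generated with `delimiter: ';'`.
sample = "1;42\n2;87\n"
rows = read_delimited(sample, delimiter=";")
print(rows)  # [['1', '42'], ['2', '87']]
```

Reading the same text with the default ',' would return each line as a single field, which is a quick way to notice a delimiter mismatch.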

JSON
info:
 output_format: json
 wrap_up: false

By default, wrap_up is false.
When wrap_up is false, each line of the output file is a JSON object, but the file as a whole is not a valid JSON document.
When wrap_up is true, the rows are wrapped in an array, so the whole file is a valid JSON document.
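
The two layouts need different parsing on the reading side. The sketch below shows one way to load either shape in Python. The sample rows and the helper name load_rows are hypothetical, and the exact whitespace of the wrapped file is an assumption (json.loads accepts any valid JSON array).

```python
import json


def load_rows(text: str, wrap_up: bool) -> list[dict]:
    """Load generated rows from either JSON layout.

    wrap_up=True:  the whole text is one JSON array of objects.
    wrap_up=False: each non-empty line is a separate JSON object.
    """
    if wrap_up:
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical samples of the two output shapes.
json_lines = '{"id": 1, "score": 42}\n{"id": 2, "score": 87}\n'
json_array = '[{"id": 1, "score": 42}, {"id": 2, "score": 87}]'

rows_unwrapped = load_rows(json_lines, wrap_up=False)
rows_wrapped = load_rows(json_array, wrap_up=True)
print(rows_unwrapped == rows_wrapped)  # True
```

The line-per-object layout can be processed one row at a time, which tends to suit large files; the wrapped layout is convenient for tools that expect a single JSON document.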

Rows

To choose the number of rows in the generated file, use rows.

info:
 rows: 1000000

The value can also be written with underscores as digit separators for readability.

info:
 rows: 1_000_000

Seed

To make the generated data deterministic (reproducible), use seed with an integer value.

info:
 seed: 12345

When a seed is specified, the same YAML configuration will always generate identical data across multiple runs. This is useful for testing, debugging, or when you need consistent datasets.

If no seed is provided, the data generation will use random values and produce different results each time.

Example with seed:

info:
 output_name: "my_data"
 output_format: csv
 rows: 1000
 seed: 42

columns:
 - name: id
   provider: Increment.integer
   start: 1
 - name: score
   provider: Random.i32
   min: 0
   max: 100

Running this configuration multiple times will always produce the same 1000 rows with identical scores.
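
One way to check this reproducibility in practice is to generate the file twice, keep both copies, and compare their checksums. The Python sketch below does the comparison. The helper names file_digest and same_content and the paths in the usage comment are hypothetical. A byte-for-byte comparison is stricter than "identical data", so it may report a difference for formats that embed metadata even when the rows match.

```python
import hashlib


def file_digest(path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b) -> bool:
    """True if both files have exactly the same bytes."""
    return file_digest(path_a) == file_digest(path_b)


# Usage (hypothetical paths for two runs of the same seeded configuration):
# same_content("run1/my_data.csv", "run2/my_data.csv")
```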