Usage¶
Basic Usage¶
We will demonstrate basic usage of dirconf using a simple configuration containing a YAML file called config.yml and a CSV file called data.csv.
A concrete instance of this configuration already exists in the ./basic directory.
./basic
├──config.yml
└──data.csv
Defining handlers¶
We first need to define two handlers that implement read and write for the two files that make up a configuration.
YAML handler¶
from os import PathLike
import yaml
class YamlFileHandler:
    def read(self, path: str | PathLike) -> dict:
        with open(path) as file:
            contents = yaml.safe_load(file)
        return contents

    def write(
        self, path: str | PathLike, data: dict, *, overwrite_ok: bool = False
    ) -> None:
        with open(path, mode="w" if overwrite_ok else "x") as file:
            yaml.safe_dump(data, file, sort_keys=False)
Let's quickly test that read works:
YamlFileHandler().read("./basic/config.yml")
{'id': 'Basic config',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': 1.0, 'b': 2.0, 'c': 3.0},
'switch': True}
CSV handler¶
import csv
class CsvFileHandler:
    def read(self, path: str | PathLike) -> list[list[str]]:
        data = []
        with open(path) as file:
            reader = csv.reader(file)
            for row in reader:
                data.append(row)
        return data

    def write(
        self,
        path: str | PathLike,
        data: list[list[float]],
        *,
        overwrite_ok: bool = False,
    ) -> None:
        with open(path, mode="w" if overwrite_ok else "x") as file:
            writer = csv.writer(file)
            for row in data:
                writer.writerow(row)
Again, let's quickly test that read works:
CsvFileHandler().read("./basic/data.csv")
[['0.44436', '0.86243'], ['0.77458', '0.27978'], ['0.38164', '0.91161'], ['0.02331', '0.75244'], ['0.13891', '0.84464']]
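The overwrite_ok flag in both handlers works by switching the file mode between "w" and "x": mode "x" (exclusive creation) raises FileExistsError if the file already exists. The following self-contained sketch, using a compact copy of the CSV handler above (renamed CsvHandlerDemo so it stands alone), demonstrates the behaviour:

```python
import csv
import os
import tempfile


# Compact copy of the CsvFileHandler above, so this snippet runs on its own.
class CsvHandlerDemo:
    def read(self, path):
        with open(path) as file:
            return [row for row in csv.reader(file)]

    def write(self, path, data, *, overwrite_ok=False):
        # "x" (exclusive creation) fails if the file already exists
        with open(path, mode="w" if overwrite_ok else "x") as file:
            csv.writer(file).writerows(data)


handler = CsvHandlerDemo()
refused = False
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    handler.write(path, [["0.1", "0.2"]])
    try:
        handler.write(path, [["0.3", "0.4"]])  # overwrite_ok=False: refused
    except FileExistsError:
        refused = True
    handler.write(path, [["0.3", "0.4"]], overwrite_ok=True)
    final = handler.read(path)
```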
Subclassing DirConfig¶
The next step is to specify a valid configuration in terms of its files and handlers.
To do this we will use the make_dirconfig function, which produces a subclass of DirConfig whose fields correspond to the two required files.
from dirconf import make_dirconfig
BasicConfig = make_dirconfig(
    cls_name="BasicConfig",
    spec={
        "config": {"path": "config.yml", "handler": YamlFileHandler},
        "data": {"path": "data.csv", "handler": CsvFileHandler},
    },
)
Working with instances of DirConfig¶
String representation¶
Instances of DirConfig have a convenient string representation derived from DirConfig.tree.
config = BasicConfig()
print(config)
dirconf.config.BasicConfig
├──config ---- (path='config.yml', handler=YamlFileHandler)
└──data ------ (path='data.csv', handler=CsvFileHandler)
Reading configurations¶
Once a DirConfig is instantiated, configurations are read into a dict by passing a path to a configuration directory into the read method.
config_dict = config.read("./basic")
config_dict
{'config': {'id': 'Basic config',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': 1.0, 'b': 2.0, 'c': 3.0},
'switch': True},
'data': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']]}
Writing configurations¶
In-memory configurations can be written to the filesystem using the write method.
For the purpose of illustration we will modify the configuration, write it to a temporary directory, then read it back in to check that the modified config is as expected.
import tempfile
# Modify the 'a' parameter
config_dict["config"]["params"]["a"] = -1.0
# Write to a temporary directory, and check it's worked
with tempfile.TemporaryDirectory() as _temp_dir:
    config.write(_temp_dir, config_dict)
    print(tree(_temp_dir))
    reread_config_dict = config.read(_temp_dir)

reread_config_dict
/tmp/tmpxgwqg2_t
├──config.yml
└──data.csv
{'config': {'id': 'Basic config',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': -1.0, 'b': 2.0, 'c': 3.0},
'switch': True},
'data': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']]}
Accessing the nodes¶
Keep in mind that classes derived from DirConfig are essentially dataclasses whose fields are instances of Node (itself a dataclass!).
As such, the usual way of accessing dataclass fields applies.
import dataclasses
for field in dataclasses.fields(config):
    node = getattr(config, field.name)
    print(f"{field.name}:\n\t{type(node)}\n\t{node.path}\n\t{node.handler}")
config:
	<class 'dirconf.node.Node'>
	config.yml
	<class '__main__.YamlFileHandler'>
data:
	<class 'dirconf.node.Node'>
	data.csv
	<class '__main__.CsvFileHandler'>
dataclasses.asdict(config)
{'config': {'handler': <class '__main__.YamlFileHandler'>,
            'path': 'config.yml'},
 'data': {'handler': <class '__main__.CsvFileHandler'>,
          'path': 'data.csv'}}
Tip
In the vast majority of situations (that I can think of), directly manipulating the nodes, paths or handlers after instantiation is unnecessary.
Advanced Usage¶
Nested configs¶
Configurations may themselves contain nested configurations. To demonstrate, we will use the ./nested directory, in which the 'basic' configuration sits alongside a metadata file.
./nested
├──basic
│  ├──config.yml
│  └──data.csv
└──metadata.yml
NestedConfig = make_dirconfig(
    cls_name="NestedConfig",
    spec={
        "metadata": {"path": "metadata.yml", "handler": YamlFileHandler},
        "inner_config": {"path": "basic", "handler": BasicConfig},
    },
)
nested_config = NestedConfig()
print(nested_config)
dirconf.config.NestedConfig
├──metadata -------- (path='metadata.yml', handler=YamlFileHandler)
└──inner_config ---- (path='basic', handler=BasicConfig)
   ├──config ---- (path='config.yml', handler=YamlFileHandler)
   └──data ------ (path='data.csv', handler=CsvFileHandler)
nested_config.read("./nested")
{'inner_config': {'config': {'id': 'Nested config',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': 1.0, 'b': 2.0, 'c': 3.0},
'switch': True},
'data': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']]},
'metadata': {'commit': '240dba665dad4335e391380879aa9602034e5c1c',
'timestamp': 20250811102359,
'user': 'joe'}}
Variable paths¶
Often configurations are flexible regarding paths and file names.
Here we do not fix the file name for data - perhaps it differs between configurations (e.g. based on a path set in config.yml).
FlexiblePathConfig = make_dirconfig(
    cls_name="FlexiblePathConfig",
    spec={
        "config": {"path": "config.yml", "handler": YamlFileHandler},
        "data": {"handler": CsvFileHandler},
    },
)
FlexiblePathConfig now requires the relative path corresponding to data to be provided upon instantiation.
Note
Under the hood, the provided path is transformed into a Node in the __post_init__ dataclass method.
config_a = FlexiblePathConfig(data="data.csv")
config_b = FlexiblePathConfig(data="observations.csv")
print(config_a)
print(config_b)
dirconf.config.FlexiblePathConfig
├──config ---- (path='config.yml', handler=YamlFileHandler)
└──data ------ (path='data.csv', handler=CsvFileHandler)
dirconf.config.FlexiblePathConfig
├──config ---- (path='config.yml', handler=YamlFileHandler)
└──data ------ (path='observations.csv', handler=CsvFileHandler)
Variable paths and handlers¶
If we can delay specifying a path until instantiation, we may also want to delay the specification of the handler.
Let's say that the config node can be either a YAML or JSON file.
We first define a JSON handler.
import json
class JsonFileHandler:
    def read(self, path: str | PathLike) -> dict:
        with open(path) as file:
            contents = json.load(file)
        return contents

    def write(
        self, path: str | PathLike, data: dict, *, overwrite_ok: bool = False
    ) -> None:
        with open(path, mode="w" if overwrite_ok else "x") as file:
            json.dump(data, file)
Now we construct a DirConfig subclass that leaves config entirely unspecified.
VariableConfig = make_dirconfig(
    cls_name="VariableConfig",
    spec={
        "config": {},
        "data": {"path": "data.csv", "handler": CsvFileHandler},
    },
)
We can now instantiate the class by passing a dict containing both the path and the handler.
variable_config = VariableConfig(
    config={"path": "config.json", "handler": JsonFileHandler}
)
print(variable_config)
dirconf.config.VariableConfig
├──config ---- (path='config.json', handler=JsonFileHandler)
└──data ------ (path='data.csv', handler=CsvFileHandler)
Registering handlers¶
To save some typing, we can register handlers to a handler registry.
from dirconf import register_handler
register_handler("yaml", YamlFileHandler, extensions=[".yml", ".yaml"])
register_handler("json", JsonFileHandler, extensions=[".json"])
Now we can refer to handlers by their key in the registry.
lazy_variable_config = VariableConfig(
    config={"path": "config.json", "handler": "json"}
)
print(lazy_variable_config)
dirconf.config.VariableConfig
├──config ---- (path='config.json', handler=JsonFileHandler)
└──data ------ (path='data.csv', handler=CsvFileHandler)
Handler inference¶
More usefully, we can leave the handler to be inferred from the file extension.
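The inference step can be pictured as a lookup from file extension to registered handler, roughly as in the following sketch (illustrative only; the names `_registry` and `infer_handler` are not dirconf's):

```python
from pathlib import Path

# Hypothetical registry keyed by extension, as built up by register_handler
_registry = {
    ".yml": "YamlFileHandler",
    ".yaml": "YamlFileHandler",
    ".json": "JsonFileHandler",
}


def infer_handler(path):
    """Pick a handler based on the path's file extension."""
    suffix = Path(path).suffix
    try:
        return _registry[suffix]
    except KeyError:
        raise ValueError(f"No handler registered for extension {suffix!r}")
```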
We first load the original 'basic' configuration.
yaml_config = VariableConfig(config="config.yml")
print(yaml_config)
yaml_config.read("./basic")
dirconf.config.VariableConfig
├──config ---- (path='config.yml', handler=YamlFileHandler)
└──data ------ (path='data.csv', handler=CsvFileHandler)
{'config': {'id': 'Basic config',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': 1.0, 'b': 2.0, 'c': 3.0},
'switch': True},
'data': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']]}
Now we load the same configuration with a .json config file.
json_config = VariableConfig(config="config.json")
print(json_config)
json_config.read("./basic_json")
dirconf.config.VariableConfig
├──config ---- (path='config.json', handler=JsonFileHandler)
└──data ------ (path='data.csv', handler=CsvFileHandler)
{'config': {'id': 'Basic config JSON',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': 1.0, 'b': 2.0, 'c': 3.0},
'switch': True},
'data': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']]}
Handling missing files¶
Up to this point, calling read on a missing file or directory results in an error.
In some situations we might want to handle missing data differently. Sometimes certain files are optional, and one is simply happy to skip over them if they do not exist.
Another use-case would be reading from a 'template' configuration that is incomplete, requiring additional data which we will merge into the Python dict before writing the complete configuration to a new location.
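The template workflow amounts to replacing sentinel entries in the dict before writing it back out. A minimal sketch (with a stand-in MISSING sentinel, not dirconf's):

```python
MISSING = object()  # stand-in for dirconf's sentinel, for illustration

# As if returned by `read` on an incomplete template configuration
template = {"config": {"id": "template"}, "metadata": MISSING}

# Merge in the data the template was missing
template["metadata"] = {"user": "joe", "timestamp": "2025-08-11"}

# The dict is now complete and safe to write
complete = all(value is not MISSING for value in template.values())
```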
from dirconf.filter import filter_missing
@filter_missing(warn=True)
class DummyHandler:
    def read(self, path: str | PathLike) -> None:
        print("`read` has been called.")

    def write(
        self, path: str | PathLike, data: None, *, overwrite_ok: bool = False
    ) -> None:
        print("`write` has been called.")
ConfigWithOptional = make_dirconfig(
    cls_name="ConfigWithOptional",
    spec={
        "config": {"path": "config.yml", "handler": YamlFileHandler},
        "data": {"path": "data.csv", "handler": CsvFileHandler},
        "optional": {"path": "optional.file", "handler": DummyHandler},
    },
)
Let's try to read the usual configuration:
config_with_optional = ConfigWithOptional()
config_dict_with_optional = config_with_optional.read("./basic")
/home/runner/work/dirconf/dirconf/src/dirconf/config.py:60: MissingWarning: read('optional.file') filtered out by test 'read=lambda path: path.exists(),'; returning `MISSING`.
data[field.name] = handler.read(config.path)
We see that DummyHandler.read was never called; instead, a warning tells us that read('optional.file') was filtered out by a test (whose code is included in the message). Note that warn=False is the default for filter_missing, so these warnings appear only because we opted in with warn=True above.
The configuration dict contains an entry corresponding to the optional node, but it has a special sentinel value, MISSING.
config_dict_with_optional
{'config': {'id': 'Basic config',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': 1.0, 'b': 2.0, 'c': 3.0},
'switch': True},
'data': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']],
'optional': MISSING}
Now let's see what happens when we try to write the incomplete configuration.
with tempfile.TemporaryDirectory() as _temp_dir:
    config_with_optional.write(_temp_dir, config_dict_with_optional)
    print(tree(_temp_dir))
/tmp/tmpvxywesor
├──config.yml
└──data.csv
/home/runner/work/dirconf/dirconf/src/dirconf/config.py:90: MissingWarning: write('optional.file') filtered out by test: 'write=lambda path, data, **_: data is not MISSING,'; Skipping...
handler.write(config.path, data[field.name], overwrite_ok=overwrite_ok)
Once again we are shown a warning, which explains both what was filtered out and why, and DummyHandler.write was never called.
Filtering¶
The filter_missing decorator is a special case of a more general filter class decorator, which allows the user to specify tests which trigger the 'missingness' behaviour if they fail.
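As a mental model, such a filter can be sketched as a decorator that runs the test first and returns a sentinel instead of calling read when the test fails. This is purely illustrative and is not dirconf's implementation (MISSING and the signatures are simplified):

```python
from functools import wraps
from pathlib import Path

MISSING = object()  # stand-in sentinel for illustration


def filter_read_sketch(test):
    """Wrap a `read` method so it returns MISSING when `test(path)` fails."""

    def decorator(read):
        @wraps(read)
        def wrapper(self, path):
            path = Path(path)
            if not test(path):
                return MISSING  # test failed: trigger the 'missingness' behaviour
            return read(self, path)

        return wrapper

    return decorator


class TextHandler:
    @filter_read_sketch(test=lambda path: path.exists())
    def read(self, path):
        return path.read_text()


result = TextHandler().read("no/such/file.txt")
```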
We will now briefly run through some illustrative examples where filtering comes in useful.
Skipping large files¶
As an example, we will consider a situation where one or more of the files in a configuration is very large, and we want to avoid loading them into memory.
This is easy to achieve by combining the already-familiar filter_missing with a custom filter applied to the read method, using filter_read.
We will demonstrate this by creating a subclass of CsvFileHandler from earlier.
from dirconf.filter import filter_read
@filter_missing()
class PotentiallyLargeFileHandler(CsvFileHandler):
    @filter_read(
        test=lambda path: str(path) not in ["big.csv", "huge.csv"],
        label="skip large files",
    )
    def read(self, path: str | PathLike) -> list[list[str]]:
        return super().read(path)

    def write(
        self,
        path: str | PathLike,
        data: list[list[float]],
        *,
        overwrite_ok: bool = False,
    ) -> None:
        return super().write(path, data, overwrite_ok=overwrite_ok)
It should not be difficult to see why this works, but the following demonstrates it explicitly.
ConfigWithLarge = make_dirconfig(
    cls_name="ConfigWithLarge",
    spec={
        "a": {"handler": PotentiallyLargeFileHandler},
        "b": {"handler": PotentiallyLargeFileHandler},
        "c": {"handler": PotentiallyLargeFileHandler},
    },
)
print(tree("./sizes"))
config_with_large = ConfigWithLarge(a="small.csv", b="big.csv", c="huge.csv")
config_dict_with_large = config_with_large.read("./sizes")
config_dict_with_large
./sizes
├──big.csv
├──huge.csv
└──small.csv
{'a': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']],
'b': MISSING,
'c': MISSING}
Skipping absolute paths¶
As another example, we can use the more general filter decorator to skip any node whose path is absolute, on both read and write.
from dirconf.filter import filter
@filter(
    read=lambda path: not path.is_absolute(),
    write=lambda path, data, **_: not path.is_absolute(),
)
class SaferCsvFileHandler(CsvFileHandler):
    pass
ConfigWithAbsPath = make_dirconfig(
    cls_name="ConfigWithAbsPath",
    spec={
        "config": {"path": "config.yml", "handler": YamlFileHandler},
        "data": {"path": "data.csv", "handler": SaferCsvFileHandler},
        "big_data": {
            "path": "/path/to/shared/data.csv",
            "handler": SaferCsvFileHandler,
        },
    },
)
config_with_abs_path = ConfigWithAbsPath()
print(config_with_abs_path)
dirconf.config.ConfigWithAbsPath
├──config ------ (path='config.yml', handler=YamlFileHandler)
├──data -------- (path='data.csv', handler=SaferCsvFileHandler)
└──big_data ---- (path='/path/to/shared/data.csv', handler=SaferCsvFileHandler)
config_dict_with_abs_path = config_with_abs_path.read("./basic")
config_dict_with_abs_path
:5: UserWarning: Absolute paths are not recommended and may not be supported in future (https://github.com/jmarshrossney/dirconf/issues/13). Did you mean to do this?
{'big_data': MISSING,
'config': {'id': 'Basic config',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': 1.0, 'b': 2.0, 'c': 3.0},
'switch': True},
'data': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']]}
with tempfile.TemporaryDirectory() as _temp_dir:
    config_with_abs_path.write(_temp_dir, config_dict_with_abs_path)
    print(tree(_temp_dir))
/tmp/tmp8m152ys0
├──config.yml
└──data.csv
Generating metadata¶
In this example, we will read an incomplete configuration from ./basic, and only upon write will we inject metadata into the configuration.
ConfigWithMeta = make_dirconfig(
    cls_name="ConfigWithMeta",
    spec={
        "config": {"path": "config.yml", "handler": YamlFileHandler},
        "data": {"path": "data.csv", "handler": CsvFileHandler},
        "metadata": {
            "path": "metadata.csv",
            "handler": filter_missing()(CsvFileHandler),
        },
    },
)
config_with_meta = ConfigWithMeta()
config_dict_with_meta = config_with_meta.read("./basic")
config_dict_with_meta
{'config': {'id': 'Basic config',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': 1.0, 'b': 2.0, 'c': 3.0},
'switch': True},
'data': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']],
'metadata': MISSING}
config_dict_with_meta["metadata"] = [
    ["created_at", "2024-01-01"],
    ["version", "1.0"],
    ["source", "notebook"],
]
with tempfile.TemporaryDirectory() as _temp_dir:
    config_with_meta.write(_temp_dir, config_dict_with_meta)
    print(tree(_temp_dir))
    reread_config = config_with_meta.read(_temp_dir)

reread_config["metadata"]
/tmp/tmpk4u4uy6v
├──config.yml
├──data.csv
└──metadata.csv
[['created_at', '2024-01-01'], ['version', '1.0'], ['source', 'notebook']]
We replaced the MISSING sentinel with actual metadata before calling write, so the filter_missing test (data is not MISSING) passed and the CSV was written normally.
Config Validation¶
A primary motivation for reading file-based configurations into Python dicts is to enable validation using Python tooling.
Here we demonstrate how to validate the configuration dict returned by read using Pydantic.
from pydantic import BaseModel
class ParamsModel(BaseModel):
    a: float
    b: float
    c: float


class ConfigModel(BaseModel):
    id: str
    params: ParamsModel
    init_state: list[float]
    switch: bool
We can now validate the 'basic' configuration from earlier:
config_dict
{'config': {'id': 'Basic config',
'init_state': [0.0, 0.0, 0.0],
'params': {'a': -1.0, 'b': 2.0, 'c': 3.0},
'switch': True},
'data': [['0.44436', '0.86243'],
['0.77458', '0.27978'],
['0.38164', '0.91161'],
['0.02331', '0.75244'],
['0.13891', '0.84464']]}
ConfigModel(**config_dict["config"])
ConfigModel(id='Basic config', params=ParamsModel(a=-1.0, b=2.0, c=3.0), init_state=[0.0, 0.0, 0.0], switch=True)
If the configuration contains invalid data, Pydantic will raise a clear validation error:
try:
    ConfigModel(
        id=123,
        params={"a": "not a float", "b": 2.0, "c": 3.0},
        init_state=[0, 0, 0],
        switch=True,
    )
except Exception as e:
    print(type(e).__name__)
    print(e)
ValidationError
2 validation errors for ConfigModel
id
  Input should be a valid string [type=string_type, input_value=123, input_type=int]
    For further information visit https://errors.pydantic.dev/2.13/v/string_type
params.a
  Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='not a float', input_type=str]
    For further information visit https://errors.pydantic.dev/2.13/v/float_parsing
Tip
You can integrate validation directly into your workflow by wrapping the read method:
def read_validated(config_instance, path):
    config_dict = config_instance.read(path)
    config_dict["config"] = ConfigModel(**config_dict["config"]).model_dump()
    return config_dict
This ensures that every time you load a configuration, it is automatically validated against your Pydantic model.