Using dictionaries to configure objects

The implementation of PEP 391 (Dictionary-Based Configuration for Logging) provides, under the hood, the basis for a flexible, general-purpose configuration mechanism. The class which performs the logging configuration work is DictConfigurator, and it's based on another class, BaseConfigurator. DictConfigurator knows about logging components such as Logger, Handler, Formatter, and so on, and provides some “syntax sugar” for logging configuration, but the BaseConfigurator class is generic and can be used, with a little extra work, to configure any Python object from a dictionary. The nice thing about using dictionaries is that you side-step the question of whether to use JSON, YAML, Python or some other means of storing your configuration - people can use whatever makes the most sense for them.

Ideally, a configuration approach should allow one to do the following:

Construct arbitrary objects, including sub-components which are themselves arbitrary objects.
Let parts of the configuration refer to external objects which are accessible through normal import mechanisms.
Let parts of the configuration refer to other parts of the configuration, so as to avoid duplication.
Offer the ability to refer to sub-configurations which are held outside a given configuration (e.g. in external files). This is commonly called “include” functionality.

I present an approach for doing these things, which is based on the existing code in the logging.config module.

Of course, constructing completely arbitrary objects from untrusted sources can lead to problems. For example, YAML allows the construction of completely arbitrary objects, and the use of YAML by Ruby on Rails led to some security exploits not that long ago. Likewise, pickle allows the creation of arbitrary objects and thus is also vulnerable to the same sorts of security exploits.

In the case of the configuration mechanism being discussed here, creation of objects is under the control of the configurator object, whose implementation can, in principle, use mechanisms (such as, but not limited to, whitelists and blacklists) to control what can and can't be created.

As mentioned in the logging documentation, access to external objects which are accessible through normal import mechanisms is achieved through describing the objects using string literals in the configuration. For example, the string "ext://sys.stderr" would resolve to the object bound to sys.stderr. The resolution process calls the configurator's ext_convert method with the literal string "sys.stderr", and this method in turn calls the configurator's resolve method with the same string to find the value through the import machinery. Either of these methods can be overridden to put security mechanisms in place to prevent access to certain objects, if desired.
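One way such a restriction might look is sketched below, using the standard library's logging.config.BaseConfigurator. The class name AllowListConfigurator and its ALLOWED_PREFIXES attribute are hypothetical names chosen for illustration, and the code assumes Python 3.

```python
import sys
from logging.config import BaseConfigurator


class AllowListConfigurator(BaseConfigurator):
    """Hypothetical subclass: only resolves ext:// names on an allow-list."""

    # Assumption for this sketch: only names under these prefixes are allowed.
    ALLOWED_PREFIXES = ('sys.', 'logging.')

    def ext_convert(self, value):
        # value is the part after "ext://", e.g. "sys.stderr"
        if not value.startswith(self.ALLOWED_PREFIXES):
            raise ValueError('access to %r is not allowed' % value)
        return super().ext_convert(value)


cfg = AllowListConfigurator({})
stream = cfg.convert('ext://sys.stderr')  # goes through ext_convert, then resolve

try:
    cfg.convert('ext://os.system')
    blocked = False
except ValueError:
    blocked = True
```

Here stream is the object bound to sys.stderr, and the attempt to reach os.system is rejected before any import takes place. A prefix check is only the simplest possible policy; a real deployment would probably want an explicit list of permitted names.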

As also mentioned in the logging documentation, access to objects internal to the configuration (so that you can reference one object in the configuration from another) is also done through literal strings, of the form "cfg://path" where the path portion indicates how to get to the object from the top-level configuration dictionary. In the path specification, you can use attribute access and item access notation to pinpoint the object you want, e.g. you could use "cfg://handlers.email.toaddrs[0]" to get the first recipient’s email address in an SMTPHandler named email in a logging configuration.
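The path lookup can be tried directly with the standard library's BaseConfigurator. In this small sketch the configuration contents and the email addresses are made up for illustration, and the handler entry is left as a plain dictionary rather than a real SMTPHandler configuration.

```python
from logging.config import BaseConfigurator

# Illustrative configuration; the addresses are placeholders.
config = {
    'handlers': {
        'email': {
            'toaddrs': ['support@example.com', 'dev@example.com'],
        },
    },
}

cfg = BaseConfigurator(config)

# Dotted notation for dictionary keys, index notation for the list element.
first = cfg.convert('cfg://handlers.email.toaddrs[0]')

# Item access notation also works for dictionary keys.
same = cfg.convert('cfg://handlers[email].toaddrs[0]')

print(first)  # support@example.com
```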

A general purpose configurator which can build an arbitrary object from a dictionary will need:

A callable which will create and initialise the object. This will usually be a class, but it could be any callable.
Positional arguments for that callable.
Keyword arguments for that callable.
In cases where some attributes need to be set in the instance after initialisation, a dict mapping the attribute names to the values to set them to.

Of course, the arguments and attribute values can themselves be configuration dictionaries which specify objects to be created and initialised.

To handle sub-configurations held in external files, the configurator will also support literal strings of the type "inc://path/to/external/file". (The base implementation assumes these are JSON files, but this can be easily generalised to support other file types.)

The following conventions are used in configuration dictionaries which are used to configure objects:

The "()" key, if present, identifies the dictionary as a configuration dictionary for an object. The corresponding value, if not a callable, must be a string which resolves to a callable through normal import mechanisms (e.g. the literal "logging.StreamHandler"). If this key is absent, the dictionary is treated as an ordinary dictionary.
The "[]" key, if present, identifies the positional arguments for the call to the callable. This should be a list of objects or dictionaries used to configure objects. If not present, the empty tuple is used for the positional arguments.
The "." key, if present, must have a corresponding dict as a value. Each key in this dict is an attribute name (i.e. it should be a valid Python identifier), and the corresponding value is either an object or a dictionary used to configure an object.
All other keys are assumed to be keyword arguments for the callable – they should all be valid Python identifiers, and the corresponding values should be either objects or dictionaries used to configure objects.

Note that the use of special keys for callable, positional arguments and attributes means that they will never clash with keyword arguments for the callable.
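To make the conventions concrete, here is a minimal sketch of how a function might interpret them. It is not the code of the configurator discussed in this post: the names build and resolve_callable are hypothetical, the ext://, cfg:// and inc:// prefixes are not handled, and Python 3 is assumed.

```python
import importlib


def resolve_callable(name):
    """Import a dotted name such as 'fractions.Fraction' (must contain a dot)."""
    module_name, _, attr = name.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)


def build(value):
    """Recursively turn configuration dictionaries into objects."""
    if isinstance(value, (list, tuple)):
        return [build(v) for v in value]
    if not isinstance(value, dict):
        return value                      # plain value: use as-is
    if '()' not in value:
        # No "()" key: an ordinary dictionary, though its values may be configs.
        return {k: build(v) for k, v in value.items()}
    factory = value['()']
    if not callable(factory):
        factory = resolve_callable(factory)
    args = [build(v) for v in value.get('[]', ())]
    kwargs = {k: build(v) for k, v in value.items()
              if k not in ('()', '[]', '.')}
    obj = factory(*args, **kwargs)
    # Attributes listed under "." are set after initialisation.
    for attr_name, attr_value in value.get('.', {}).items():
        setattr(obj, attr_name, build(attr_value))
    return obj


frac = build({'()': 'fractions.Fraction', '[]': [3, 4]})
ns = build({
    '()': 'types.SimpleNamespace',
    'name': 'outer',
    'child': {'()': 'types.SimpleNamespace', 'x': 1},
    '.': {'extra': [1, 2]},
})
```

After this runs, frac is Fraction(3, 4), ns.child is itself a SimpleNamespace with x set to 1, and ns.extra was set as an attribute after construction rather than passed as a keyword argument.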

To show how the scheme works, let's define a dummy class which just holds the objects passed to its initialiser:

class TestContainer(object):
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

We can then define a configuration dictionary:

config_dict = {
    'o': {
        '()': '__main__.TestContainer',
        '[]': [
            1, 2.0, '3', {
                '()': '__main__.TestContainer',
                '[]': [4, 5.0],
                'k11': 'ext://sys.stderr',
                'k12': 'cfg://o[k1]',
            },
        ],
        'k1': 'v1',
        'k2': {
            '()': '__main__.TestContainer',
            'k21': 'v21'
        },
        '.': {
            'p1': 'a',
            'p2': {
                '()': '__main__.TestContainer',
            }
        }
    }
}

We then initialise a Configurator:

>>> cfg = Configurator(config_dict)

If you're going to handle sub-configurations in external files, you'd use the following form of initialiser:

>>> cfg = Configurator(config_dict, '/path/to/external/configs')

The path to external configuration files, if not specified, defaults to the current directory. It's used, if a relative path is specified in an inc://path, to determine the absolute path to the external configuration file.

Once the configurator has been created, we can access the configuration:

>>> cfg['o']
<__main__.TestContainer object at 0x7f6aa90e7d10>
>>> o = cfg['o']
>>> o.args
(1, 2.0, '3', <__main__.TestContainer object at 0x7f8b8eaadd50>)
>>> o.kwargs
{'k2': <__main__.TestContainer object at 0x7f8b8eaadcd0>, 'k1': 'v1'}
>>> o2 = o.args[-1]
>>> o2.args
(4, 5.0)
>>> o2.kwargs
{'k12': 'v1', 'k11': <open file '<stderr>', mode 'w' at 0x7fb245c65270>}

Notice that the cfg:// reference to the configuration and the ext:// reference to the external object have been set correctly.

>>> o3 = o.kwargs['k2']
>>> o3.args
()
>>> o3.kwargs
{'k21': 'v21'}
>>> o.p1
'a'
>>> o.p2
<__main__.TestContainer object at 0x7f8b8c43af50>
>>> o4 = o.p2
>>> o4.args
()
>>> o4.kwargs
{}

The above inspections show that the objects have been constructed as expected. The configuration dictionary can be created from either JSON or YAML files.
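For instance, a dictionary using the same special keys can be obtained from JSON text with the standard library's json module. The JSON below is a cut-down, illustrative version of the earlier configuration, and only the loading step is shown.

```python
import json

# A cut-down version of the earlier configuration, written as JSON text.
text = '''
{
  "o": {
    "()": "__main__.TestContainer",
    "[]": [1, 2.0, "3"],
    "k1": "v1",
    ".": {"p1": "a"}
  }
}
'''

config_dict = json.loads(text)

# The special keys survive as ordinary string keys.
print(sorted(config_dict['o']))
```

JSON has no tuple type, which fits the convention that the "[]" value is a list.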

To look at how “includes” work, we can create a sub-configuration to be included, in a file tests/included.json:

{
  "foo": "bar",
  "bar": "baz"
}

and we can refer to this in a configuration:

>>> config_dict = {'included_value': 'inc://included.json'}

Then, we instantiate a configurator:

>>> cfg = Configurator(config_dict, 'tests')

and examine the included value:

>>> cfg['included_value']
{'foo': 'bar', 'bar': 'baz'}

which is as expected.
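The include behaviour could be approximated on top of the standard library's BaseConfigurator roughly as follows. This is a sketch under assumptions, not the distlib implementation: the class name IncludingConfigurator and its base attribute are hypothetical, only JSON files are supported, and Python 3 is assumed. Because BaseConfigurator does not support indexing, the value is read through the configurator's config attribute.

```python
import json
import os
import tempfile
from logging.config import BaseConfigurator


class IncludingConfigurator(BaseConfigurator):
    """Hypothetical configurator that understands inc:// references."""

    # Copy the inherited mapping so the base class is left untouched.
    value_converters = dict(BaseConfigurator.value_converters,
                            inc='inc_convert')

    def __init__(self, config, base=None):
        super().__init__(config)
        # Relative inc:// paths are resolved against this directory.
        self.base = base or os.getcwd()

    def inc_convert(self, value):
        # value is the part after "inc://"
        if not os.path.isabs(value):
            value = os.path.join(self.base, value)
        with open(value, encoding='utf-8') as f:
            return json.load(f)


# A temporary directory stands in for the "tests" directory used above.
with tempfile.TemporaryDirectory() as base:
    with open(os.path.join(base, 'included.json'), 'w', encoding='utf-8') as f:
        json.dump({'foo': 'bar', 'bar': 'baz'}, f)
    cfg = IncludingConfigurator({'included_value': 'inc://included.json'}, base)
    included = cfg.config['included_value']
```

Supporting another file format would then be a matter of overriding inc_convert to pick a loader based on the file extension.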

This configuration functionality will be included in the next release of distlib. A simple test script exercising the functionality is available here; you’ll need to clone the BitBucket repository for distlib to actually run it, but you should be able to see how the functionality works just by looking at the script.

Your comments about this configuration approach are welcome, particularly regarding any missing functionality or problems you can foresee. Thanks for reading.

Reposted from: http://pymolurus.blogspot.ca/2013/04/using-dictionaries-to-configure-objects.html
