Logging

Talisker configures a specific, carefully designed logging setup. At a high level, it configures the following with the stdlib logging module:

  • logger class that can collect structured data

  • formatter that supports structured data via logfmt

  • root handler that logs everything at INFO level to stderr

Talisker also provides additional debug logging options that can be enabled when more detail is needed.

The stdlib logging module is old, based on a Java API, and makes heavy use of global state. Some of the implementation details in talisker are workarounds for these limitations. But stdlib logging is everywhere, and is actually very flexible, so we can make it work the way we want.

Configuring Logging

If you use Talisker’s entry points, like talisker.gunicorn or talisker.celery, then logging will be configured by default.

You can also use the more generic entry points that talisker provides to run any script with Talisker logging enabled:

talisker.run script.py ...

or

python -m talisker script.py ...

If you are using your own entry point, you can configure Talisker with:

import talisker
talisker.initialise()

Note that this should be done before any loggers are created via logging.getLogger() or similar, as they will not be altered.
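
For example, a minimal sketch of the intended ordering (the 'myapp' logger name is illustrative):

import logging

import talisker

# Initialise talisker first, so that loggers created afterwards use
# its logger class and configuration.
talisker.initialise()

logger = logging.getLogger('myapp')
logger.info('service starting')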

This will set up logging, taking into account environment variables like DEBUGLOG. If you want to configure those parameters explicitly, you can do:

config = {'devel': True, 'debuglog': '/path'}
talisker.logs.configure(config)

For testing, you can use a special test configuration, which sets up talisker-style logging except that it adds only a single NullHandler to the root logger:

talisker.logs.configure_test_logging()

This means logging will work as per talisker's setup, but you will get no log output. You can use the helpers in talisker.testing to help test log messages.
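
For example, here is a minimal sketch of one way to capture records in a test, using only stdlib logging on top of configure_test_logging(); the helpers in talisker.testing offer more convenient ways to assert on log output, so check its API before relying on this pattern:

import logging

import talisker.logs


def test_logs_hello():
    # Talisker-style logging, but with only a NullHandler on the root
    # logger, so nothing is written to stderr.
    talisker.logs.configure_test_logging()

    # Capture records with a plain stdlib handler.
    records = []

    class CaptureHandler(logging.Handler):
        def emit(self, record):
            records.append(record)

    handler = CaptureHandler()
    logging.getLogger().addHandler(handler)
    try:
        logging.getLogger('app').info('hello')
        assert any(r.getMessage() == 'hello' for r in records)
    finally:
        logging.getLogger().removeHandler(handler)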

Logger Class

Talisker sets a custom base logger class via logging.setLoggerClass(). Its only difference from logging.Logger is that it explicitly stores the 'extra' arguments passed to the log call. This allows the StructuredFormatter class to append an arbitrary number of tags to the formatted message. Without this, there is no way to know which fields of a LogRecord are supposed to be added as tags.

It also supports pulling in additional extra data from the current context, which is primarily used for providing request_id data for the log message.

Root Handler

Talisker simply adds a handler to the root logger that logs to stderr at the INFO level; a rough sketch of the equivalent stdlib setup follows the list below. This approach has several benefits:

  • Simple, self-contained application, no log file config

  • No file permissions needed by app

  • System handles buffering, synchronisation, persistence, rotation and shipping

  • Works in development

  • PaaS friendly
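
For illustration, the effect is roughly equivalent to the following stdlib configuration (a simplified sketch; the real setup also handles devel mode, log suppression, and other details):

import logging
import sys

import talisker.logs

# Everything at INFO and above goes to stderr, formatted by the
# structured formatter.
handler = logging.StreamHandler(stream=sys.stderr)
handler.setFormatter(talisker.logs.StructuredFormatter())
handler.setLevel(logging.INFO)

root = logging.getLogger()
root.setLevel(logging.NOTSET)  # filter at the handlers, not the logger
root.addHandler(handler)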

Root Logger Limitations

If you log messages against the root logger object (i.e. logging.getLogger()), then Talisker’s structured logging enhancements will not work. This is because the stdlib logging module instantiates the root logger object at import time, and has no facility for safely switching it to another instance or class. Using the root logger is not recommended practice for libraries anyway: you should be specific about your library’s top-level logger.
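
For example, prefer a named module-level logger over the root logger (the logger name here is illustrative):

import logging

# Avoid the root logger: it bypasses talisker's structured logger class.
# logging.getLogger().info('something happened')

# Prefer a named logger, which does get the enhancements:
logger = logging.getLogger('mylib.worker')
logger.info('something happened', extra={'task': 'cleanup'})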

Debug Logging

Talisker also supports adding an additional root handler that logs to disk at DEBUG level. The stderr logging output is unchanged.

To enable, just set the DEBUGLOG envvar to the path you want the log file to go to:

DEBUGLOG=/path/to/logfile talisker ...

If talisker can open that file, it will add a handler that logs to it at DEBUG level, and log a message at the start of your log output to say it is doing so. If it cannot open that file, it will log a message saying so, but will not fail. The handler is a TimedRotatingFileHandler, set to a 24 hour period with no backup copies, i.e. logs last for 24 hours at most.
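
The extra handler is roughly equivalent to the following sketch (simplified; the real code also handles the failure path and the startup message, and the exact rotation arguments may differ):

import logging
from logging.handlers import TimedRotatingFileHandler

import talisker.logs

# A debug handler alongside the existing stderr handler: rotate every
# 24 hours, keep no backups, log everything at DEBUG and above.
handler = TimedRotatingFileHandler(
    '/path/to/logfile',   # the path from the DEBUGLOG env var
    when='h',
    interval=24,
    backupCount=0,
)
handler.setFormatter(talisker.logs.StructuredFormatter())
handler.setLevel(logging.DEBUG)
logging.getLogger().addHandler(handler)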

This is designed to support development and production use cases.

In development, DEBUG logs are typically consumed by grepping a file rather than watching the console, given their verbosity. So we write to disk at the location the developer has specified, and they can grep or view the file there.

In production, operators sometimes want to turn on more logging for a limited period, to debug a specific problem, but we generally don’t want to ship that extra logging. This is partly a matter of scale: debug logs can be 10x more verbose than INFO, which could mean a 10x traffic spike on your log aggregation service. Additionally, debug logs often include sensitive details that you don’t want stored centrally. Writing to a temporary log file helps in that scenario too, as the INFO logging shipped from stderr is unchanged.

Log Format

Talisker uses a default format that is designed to be human readable in development, but still structured for richer data.

The talisker logging format is as follows:

format = '%(asctime)s.%(msecs)03dZ %(levelname)s %(name)s "%(message)s"'
datefmt = "%Y-%m-%d %H:%M:%S"

which should look like this:

2016-07-14 01:02:03.456Z INFO app "hello"

This provides:

  • the default data python logging usually has

  • a more ISO-ish timestamp (uses . rather than , for msecs, but omits the T separator for readability)

  • explicit UTC timestamps (the logging module uses local time by default)

  • explicitly quoted message (embedded " characters are escaped)

Talisker can also append an arbitrary number of ‘tags’ on the end of the log line, following the logfmt idea. e.g.:

2016-07-14 01:02:03.456Z INFO app "hello" foo=bar baz="some value"

These extra tags can be specified in 2 main ways:

  1. By the developer at the call site:

      logger.info('something happened', extra={'foo': 'bar', 'context': 'I can haz it'})
    
    would output:
    
      2016-01-13 10:24:07.357Z INFO app "something happened" foo=bar context="I can haz it"
    
  2. For a specific context, e.g. for a request. Talisker uses this to add request_id to every log message for a specific request. e.g.:

    logger.info('something happened')
    

    would output:

    2016-01-13 10:24:07.357Z INFO app "something happened" request_id=<request id>
    

    You can add your own temporary context variables with a context manager:

    with talisker.logs.logging_context(foo="bar"):
        logger.info('my important message')
    

    would output:

    2016-01-13 10:24:07.357Z INFO app "my important message" foo=bar
    

Your log shipper would also be expected to add further tags, like hostname or service group, to the logfmt tags when shipping.

If there are any global or context keys, these take precedence when they collide with developer-supplied keys. The developer’s keys are suffixed with a ‘_’ to preserve the information without stomping on the other keys.
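
For example, a context key colliding with a call-site key would render something like this (exact tag ordering is illustrative):

import logging

import talisker.logs

talisker.initialise()
logger = logging.getLogger('app')

with talisker.logs.logging_context(foo='from-context'):
    # The context key wins; the call-site key is suffixed with '_':
    #   2016-07-14 01:02:03.456Z INFO app "collision" foo=from-context foo_=from-extra
    logger.info('collision', extra={'foo': 'from-extra'})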

Log Suppression

By default, talisker suppresses some loggers.

The py.warnings logger is set not to propagate, as these warnings are just noise in production.
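
If you do want those warnings in your logs, you can re-enable propagation yourself:

import logging

# Re-enable propagation of captured warnings to the root handler.
logging.getLogger('py.warnings').propagate = True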

Talisker also configures the ‘requests’ logger to the WARNING level. This is because the INFO level on requests is particularly verbose, and we use requests everywhere.

If you prefer to have full requests logs, you can simply set the level yourself.

e.g.:

logging.getLogger('requests').setLevel(logging.INFO)

Additional logging configuration

Talisker just sets a root handler with a formatter. You are free to add your own additional loggers and handlers via the normal methods, if you need to.

You can still benefit from the structured logging provided by talisker if you set your handler’s formatter to be an instance of talisker.logs.StructuredFormatter. This is a standard formatter, except it uses UTC for the time and adds the logfmt tags on the end. The default format is as specified in Log Format.

For example, suppose you want to enable debug logs for Django’s db logger:

# Log django.db.backends at DEBUG to a separate file, using
# talisker's structured formatter.
handler = logging.FileHandler('db.log')
handler.setFormatter(talisker.logs.StructuredFormatter())
handler.setLevel(logging.DEBUG)
db = logging.getLogger('django.db.backends')
db.setLevel(logging.DEBUG)
db.addHandler(handler)

Gunicorn Logs and Configuration

When run via talisker.gunicorn, Gunicorn’s error logs use talisker’s logging.

Access logs are disabled by default, and are usually not needed when using Talisker, as it logs a full structured entry per request, which is a superset of the information in access logs. You can, however, still enable and configure gunicorn’s access logs if you wish.

Talisker overrides some config options for gunicorn, mainly to do with logging. It issues warnings if the user specifies any of these configs, as they will not be applied. Specifically, the following gunicorn config items are ignored by the talisker.gunicorn runner:

  • --error-logfile, as talisker logs everything to stderr

  • --logger-class, as talisker uses its own custom class

  • --statsd-host and --statsd-port, as talisker uses the STATSD_DSN env var.

If you run talisker.gunicorn in devel mode and specify --log-level=debug, it will output debug logs to the console.
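
For example (assuming a WSGI application importable as app:wsgi, with devel mode enabled in your configuration):

talisker.gunicorn app:wsgi --bind 0.0.0.0:8000 --log-level=debug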

Grok filters

Talisker includes a grok filter and patterns for parsing the log format into logstash. These are in the talisker/logstash/ directory of the source tree. They are also included in the python package as resources.
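
For example, one way to locate the installed copies at runtime (a sketch, assuming the files are packaged under the talisker package’s logstash/ directory as described above, and Python 3.9+ for importlib.resources.files):

from importlib import resources

# List the grok filter/pattern files shipped inside the talisker package.
for entry in (resources.files('talisker') / 'logstash').iterdir():
    print(entry.name)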