penlog
Specification
The PENLog logging format is intended to be used as a generic and reusable data format for measurement data.
The penlog format specifies an abstract data format consisting of various fields with data and metadata.
The abstract penlog format can be mapped to multiple output formats, for instance json, or hr, …
All available output formats are explained below.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The penlog structured logging format consists of the following fields. Unset fields which are considered optional MUST be absent.
component
(string, OPTIONAL) The component, e.g. a software module, that issued the log message. If absent, an implementation SHOULD use the content of the environment variable PENLOG_COMPONENT and MUST set it to root as a fallback.
data
(string, REQUIRED) The log message as a UTF-8 string.
host
(string, OPTIONAL) The hostname of the machine that generated the messages. This field is OPTIONAL, since it is missing in the human readable format. It is RECOMMENDED that implementations include this field, as it increases reproducibility of logging data.
id
(string, OPTIONAL) A unique message identifier.
line
(string, OPTIONAL) Information about the file and line number where this log entry was generated. The information MUST be in the form filename:number. The filename part can be an absolute or relative path, or a bare filename.
priority
(int, OPTIONAL) This field can be used to set the priority. The syslog priorities as defined by RFC5424 are used. Implementations can indicate priorities by e.g. a separate color.
stacktrace
(string, OPTIONAL) Implementations can optionally include a stacktrace. This could be useful for debugging if fatal errors occur. Stacktraces are very specific to the programming language in use, e.g. Python or Go. Thus, this field is just an unstructured string.
tags
(list[string], OPTIONAL) A custom list of tags can be applied to each log entry. For instance: ["autogenerated", "pre-test", "post-test", …]. Tags MAY be key value pairs, separated by =.
timestamp
(string, REQUIRED) ISO8601 string of the current date.
type
(string, REQUIRED) The type field is a free field which can be used to assign a particular message type.
Custom fields can be added freely, in other words, additional custom fields are OPTIONAL. Post-processing and tooling for these custom fields are up to the developer; generic converters MUST ignore custom fields.
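The field rules above can be illustrated with a minimal Python sketch. The function name make_record, its parameters, the use of UTC, and the choice to always include host are assumptions made for illustration; they are not part of the specification.

```python
import os
import socket
from datetime import datetime, timezone


def make_record(data, msg_type="message", component=None, priority=None, tags=None):
    """Build a penlog record as a dict; unset optional fields stay absent."""
    if component is None:
        # Fall back to PENLOG_COMPONENT and, if that is unset or empty, to "root".
        component = os.environ.get("PENLOG_COMPONENT") or "root"
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601 (UTC is an assumption)
        "type": msg_type,
        "data": data,
        "component": component,
        "host": socket.gethostname(),  # RECOMMENDED, not REQUIRED
    }
    # Optional fields are only added when they carry a value.
    if priority is not None:
        record["priority"] = priority
    if tags:
        record["tags"] = list(tags)
    return record
```

Calling make_record("Doing stuff", tags=["pre-test"]) yields a dict without a priority key, which matches the rule that unset optional fields are absent.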
JSON Format (json)
A penlog log file on disk is typically stored in the json output format.
The tool hr(1) is intended to be used, similar to cat(1), for viewing penlog data in the json output format.
If encoding of a log message fails, the component MUST be set to JSON, the type to ERROR, and the error message MUST be included in data.
The json format consists of a verbatim sequence of the described JSON objects.
Each JSON object MUST be on a single line; objects are separated by \n (ASCII 0x0a).
In order to keep decoding simple and line based, no JSON arrays or virtual, endless JSON structures are employed.
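As a rough illustration of the line-based encoding and the encoding-failure rule, the following Python sketch serializes one record per line. The name encode_line is hypothetical, and stamping the fallback record with a fresh UTC timestamp is an assumption, since the text does not say which timestamp the error record carries.

```python
import json
from datetime import datetime, timezone


def encode_line(record):
    """Serialize one record as a single JSON line terminated by a newline."""
    try:
        # Without indent, json.dumps produces no line breaks; newlines inside
        # strings are escaped, so the object stays on one line.
        return json.dumps(record, ensure_ascii=False) + "\n"
    except (TypeError, ValueError) as exc:
        # Encoding failed: report it as component JSON, type ERROR.
        fallback = {
            "timestamp": datetime.now(timezone.utc).isoformat(),  # assumption
            "component": "JSON",
            "type": "ERROR",
            "data": str(exc),
        }
        return json.dumps(fallback) + "\n"
```

A consumer can then process such a file with a plain line loop, without a streaming JSON parser.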
JSON pretty (json-pretty)
The json format forces every JSON object to appear in a single line.
The json-pretty format provides an indented, more readable JSON form for debugging purposes.
The actual content of json and json-pretty is the same.
It is advised to use json for data processing pipelines due to less overhead.
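The difference between the two forms can be sketched in Python with the standard json module. The record values are made up for illustration, and the indentation width of 2 is an assumption; the text does not prescribe one.

```python
import json

# Hypothetical record used only for illustration.
record = {
    "timestamp": "2020-04-02T12:48:08.906+00:00",
    "type": "message",
    "data": "Doing stuff",
}

compact = json.dumps(record)           # json: the whole object on one line
pretty = json.dumps(record, indent=2)  # json-pretty: indented over several lines
```

Both strings decode to the same object; only the whitespace differs.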
Human Readable Format (hr)
The syntax of the human readable format looks like the following.
Curly braces indicate a field from the JSON format.
If a field is empty it expands to a zero-length string; if id, line, tags, or stacktrace are not available, the whole line is omitted.
A verbatim curly brace is expressed by doubling it: {{ means {:
{timestamp} {{{component}}} [{type}]: {prio-prefix} {data}
-> id : {id}
-> line: {line}
-> tags: {tags}
-> stacktrace:
| {stacktrace}
timestamp
The RECOMMENDED timestamp format is Go’s StampMilli format, defined as Jan _2 15:04:05.000.
component, type
The component and type fields MUST be padded or truncated so that the colons (:) in every line are perfectly aligned.
data
The actual log message. It MAY be truncated to fit the current terminal size. When it is truncated, an ellipsis character (…) MUST be appended to indicate the truncation to the user.
id
The optional unique message identifier.
line
The optional filename and line number where this log entry originates.
stacktrace
The optional stacktrace where this log entry originates.
prio-prefix
An optional priority prefix. It is RECOMMENDED to indicate message priorities via colors. If colors are not available it MAY be desirable to indicate the priority via a short prefix. The prefixes are enclosed by brackets [ and ]: E, A, C, e, w, n, i, d. These letters stand for: emergency, alert, critical, error, warning, notice, info, debug.
tags
The optional tags as comma separated values.
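The template and field rules can be sketched in Python as follows. The function names, the fixed column widths, and the use_prefix switch are assumptions for illustration; hr(1) may choose widths and colors differently. Month names come from strftime and therefore assume the default C locale.

```python
from datetime import datetime

# Prefix letters for syslog severities 0..7 (emergency .. debug).
PRIO_PREFIX = ["E", "A", "C", "e", "w", "n", "i", "d"]


def stamp_milli(ts):
    """Render an ISO 8601 timestamp like Go's StampMilli (Jan _2 15:04:05.000)."""
    dt = datetime.fromisoformat(ts)
    return f"{dt:%b} {dt.day:>2} {dt:%H:%M:%S}.{dt.microsecond // 1000:03d}"


def fit(text, width):
    """Pad or truncate text to exactly width characters to keep colons aligned."""
    return text[:width].ljust(width)


def truncate(text, width):
    """Cut text to width characters, marking the cut with an ellipsis."""
    if len(text) <= width:
        return text
    return text[: width - 1] + "…"


def format_hr(record, comp_width=8, type_width=7, use_prefix=False, max_data=None):
    # "{{" and "}}" are literal braces here, mirroring the template above.
    head = "{} {{{}}} [{}]: ".format(
        stamp_milli(record["timestamp"]),
        fit(record.get("component", "root"), comp_width),
        fit(record["type"], type_width),
    )
    prio = record.get("priority")
    if use_prefix and prio is not None and 0 <= prio < len(PRIO_PREFIX):
        head += "[{}] ".format(PRIO_PREFIX[prio])
    data = record["data"]
    if max_data is not None:
        data = truncate(data, max_data)
    lines = [head + data]
    # Lines for absent optional fields are omitted entirely.
    if "id" in record:
        lines.append("-> id : " + record["id"])
    if "line" in record:
        lines.append("-> line: " + record["line"])
    if "tags" in record:
        lines.append("-> tags: " + ",".join(record["tags"]))
    if "stacktrace" in record:
        lines.append("-> stacktrace:")
        lines.extend("| " + row for row in record["stacktrace"].splitlines())
    return "\n".join(lines)
```

A production formatter would typically derive the column widths from the terminal or from the longest component name seen so far; fixed widths keep the sketch short.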
Tiny Human Readable Format (hr-tiny)
The hr-tiny format is the same as hr except that component and type are omitted:
Apr 2 12:48:08.906: Starting tshark with
Apr 2 12:48:09.583: Doing stuff
Example:
Apr 2 12:48:08.906 {scanner } [message]: Starting tshark with
Apr 2 12:48:09.583 {moncay } [message]: Doing stuff
If a JSON line cannot be decoded, the faulty text MUST be included in messages of type ERROR and component JSON:
$ python -c "import foo" 2>&1 | hr
Jun 16 08:19:01.305 {JSON } [ERROR ]: Traceback (most recent call last):
Jun 16 08:19:01.305 {JSON } [ERROR ]: File "<string>", line 1, in <module>
Jun 16 08:19:01.305 {JSON } [ERROR ]: ModuleNotFoundError: No module named 'foo'
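The decoding rule shown above can be sketched in Python. The name decode_line is hypothetical, and stamping the error record with the current time, as well as treating non-object JSON values as faulty, are assumptions.

```python
import json
from datetime import datetime, timezone


def decode_line(line):
    """Decode one input line; undecodable text becomes a JSON/ERROR record."""
    try:
        record = json.loads(line)
        if not isinstance(record, dict):
            # Valid JSON, but not an object (assumption: treat as faulty).
            raise ValueError("not a JSON object")
        return record
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {
            "timestamp": datetime.now(timezone.utc).isoformat(),  # assumption
            "component": "JSON",
            "type": "ERROR",
            "data": line.rstrip("\n"),  # the faulty text, verbatim
        }
```

Feeding the traceback from the example line by line through decode_line produces one ERROR record per line, as in the output above.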
Environment Variables
The following environment variables MAY be understood by penlog implementations.
The supported datatypes are string and bool.
A bool is a special string consisting of either t, T, true, TRUE, 1 or f, F, false, FALSE, 0.
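A possible Python reading of the bool datatype is sketched below. The helper names, the ValueError on unknown strings, and the default used when a variable is unset are assumptions; the text only lists the accepted spellings.

```python
import os

TRUE_VALUES = {"t", "T", "true", "TRUE", "1"}
FALSE_VALUES = {"f", "F", "false", "FALSE", "0"}


def parse_bool(value):
    """Map one of the accepted spellings to a Python bool."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError("not a penlog bool: {!r}".format(value))


def env_bool(name, default=False):
    """Read a bool from the environment; unset variables yield the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return parse_bool(raw)
```

Note that mixed-case spellings such as True are not in the list and are rejected by this sketch.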
PENLOG_COMPONENT
(string) If no component is set, the component field MAY be set via the PENLOG_COMPONENT variable at the scope of an operating system process.
PENLOG_CAPTURE_LINES
(bool) If this environment variable is set, implementations SHOULD emit filenames with line numbers via the line field.
PENLOG_CAPTURE_STACKTRACES
(bool) If this environment variable is set, implementations SHOULD provide stacktraces via the stacktrace field.
PENLOG_OUTPUT
(string) A switch for implementations to choose from several output forms. Available are: hr, hr-tiny, json, json-pretty, systemd.
PENLOG_LOGLEVEL
(string) In order to limit the emitted logging messages, loglevels MAY be supported. If a library supports filtering based on loglevels, it MUST check this environment variable. The supported values are critical, error, warning, notice, info, debug, trace. The default MUST be info. A message MUST be omitted if its priority field contains a value greater than PENLOG_LOGLEVEL. A mapping between these strings and integer values is available in RFC5424.
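The filtering rule for PENLOG_LOGLEVEL can be sketched in Python. The integers for critical through debug follow the syslog severities of RFC5424; the value 8 for trace is an assumption, since RFC5424 defines no such level. Emitting messages that carry no priority and falling back to info for unknown strings are also assumptions.

```python
import os

LOGLEVELS = {
    "critical": 2,
    "error": 3,
    "warning": 4,
    "notice": 5,
    "info": 6,
    "debug": 7,
    "trace": 8,  # assumption: not defined by RFC5424
}


def threshold():
    """Return the numeric threshold from PENLOG_LOGLEVEL, defaulting to info."""
    name = os.environ.get("PENLOG_LOGLEVEL", "info")
    return LOGLEVELS.get(name, LOGLEVELS["info"])  # unknown value: use default


def should_emit(record):
    """A message is omitted if its priority is greater than the threshold."""
    prio = record.get("priority")
    if prio is None:
        return True  # assumption: messages without a priority always pass
    return prio <= threshold()
```

With the default of info, a debug message (priority 7) is dropped while a warning (priority 4) passes.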
See Also
hr(1)