API

This is the API documentation of Debmans. It should be stable across major releases. See the Design document for more details about the design.

Extractor

The extractor processes Debian packages and extracts specific patterns into a target directory. It uses a cache file that is named according to the package name and version to avoid the costly operation of opening the same package file multiple times.

debmans.extractor.SOURCES_COMP_FMTS = ['gz', 'bz2', 'xz']

supported compression formats for Sources files. Order matters: formats appearing earlier in the list are preferred over those appearing later.
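One way to apply such a preference list is to try each format in order and keep the first compressed file that exists. The sketch below does this. The helper name `pick_compressed()` and the `available` set, which stands in for a real filesystem check, are hypothetical and not part of debmans.

```python
SOURCES_COMP_FMTS = ['gz', 'bz2', 'xz']


def pick_compressed(base, available, fmts=SOURCES_COMP_FMTS):
    """Return the first '<base>.<fmt>' path found in available, or None.

    Hypothetical helper: 'available' replaces an os.path.exists() check
    so that the preference order is easy to see.
    """
    for fmt in fmts:  # earlier formats win
        candidate = '%s.%s' % (base, fmt)
        if candidate in available:
            return candidate
    return None


# both .gz and .xz exist: .gz is chosen because it comes first in the list
choice = pick_compressed('dists/sid/main/source/Sources',
                         {'dists/sid/main/source/Sources.xz',
                          'dists/sid/main/source/Sources.gz'})
```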

class debmans.extractor.PackageExtractor(regex=[], root='.', destdir=None, dryrun=False)

extract certain files from Debian packages

apt_cache = None

this takes about one second to load, so the cost is paid once here instead of for every package

files = None

the files written during extraction

root = None

the root of the mirror to look for files in

destdir = None

the directory where to write extracted files

dryrun = None

do not write anything if True

patterns = None

regex patterns of files to extract

regexes

compiled static cache of regex patterns

to regenerate this when patterns is changed, set _regexes to None.
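The lazy compile-once behaviour described above can be sketched with a property that fills a private cache on first access. `RegexCacheSketch` is a hypothetical stand-in for the `patterns`/`regexes` pair, not the actual PackageExtractor code.

```python
import re


class RegexCacheSketch:
    """Hypothetical stand-in for the patterns/regexes pair on PackageExtractor."""

    def __init__(self, patterns):
        self.patterns = patterns  # regex source strings
        self._regexes = None      # compiled cache, built on first access

    @property
    def regexes(self):
        # compile once and reuse; reset _regexes to None to force a rebuild
        if self._regexes is None:
            self._regexes = [re.compile(p) for p in self.patterns]
        return self._regexes


ex = RegexCacheSketch([r'\.1\.gz$'])
first = ex.regexes
ex.patterns = [r'\.8\.gz$']
stale = ex.regexes[0].pattern   # still the old pattern: cache not reset
ex._regexes = None              # invalidate after changing patterns
fresh = ex.regexes[0].pattern
```

Without the reset, changing `patterns` alone has no effect, which is why the note above asks callers to clear `_regexes`.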

write_file(item, data)

callback that actually writes files from the archive

this checks the file against the internal regex list and writes it in destdir, creating missing directories as needed.

only the part of the file path that matches the pattern is used for the extracted file, unless the pattern features a path group, in which case only that group is used.
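The path-selection rule described above can be sketched as a pure function. `target_path()` is a hypothetical helper, and this sketch only computes where a member would be written: creating directories and writing the data are left out.

```python
import os
import re


def target_path(item, regexes, destdir):
    """Hypothetical helper: map an archive member name to its output path.

    Returns None when no regex matches, since only matching files are written.
    """
    for regex in regexes:
        m = regex.search(item)
        if m is None:
            continue
        # prefer the named 'path' group when the pattern defines it and it matched
        if 'path' in regex.groupindex and m.group('path') is not None:
            part = m.group('path')
        else:
            part = m.group(0)
        # strip leading slashes so os.path.join() keeps destdir
        return os.path.join(destdir, part.lstrip('/'))
    return None


with_group = [re.compile(r'/usr/share/(?P<path>man/.*)$')]
without_group = [re.compile(r'usr/share/doc/.*$')]
```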

extract(pkg, destdir=None, cache=True)

extract matching patterns into destdir

Parameters:
  • pkg (debian.deb822.Deb822) – a package dictionary with at least the Filename, Package and Version fields.
  • destdir (str, or None to use the path given to the constructor) – where to store the extracted files
  • cache (bool) – whether to check for and create the cache file (a PackageCacheFile)
Returns:

paths of the extracted files

Return type:

list

Raises:

PackageCorruptedError – if apt fails to extract the file

class debmans.extractor.PackageCacheFile(destdir, pkg)

a cache file to see if we have inspected a package before

this creates an empty file named pkgname-version in the given destdir on create(). there are also facilities to check existence.

it is assumed that if there is no version change, no change is required in the man pages as well.

it is not possible to atomically check existence just yet.

this will leave stray cache files behind.

filename

the full path to the cache file

exists()

if the cache file exists

create()

create the cache file with the given field as content
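A marker file of this kind can be sketched in a few lines. `CacheFileSketch` is hypothetical, and it writes an empty file as the class description states. Like the original, checking with `exists()` and then calling `create()` is not atomic.

```python
import os
import tempfile


class CacheFileSketch:
    """Hypothetical sketch of a pkgname-version marker file."""

    def __init__(self, destdir, pkg):
        # pkg only needs the Package and Version fields here
        self.filename = os.path.join(
            destdir, '%s-%s' % (pkg['Package'], pkg['Version']))

    def exists(self):
        return os.path.exists(self.filename)

    def create(self):
        # an empty file is enough to mark the package as inspected
        with open(self.filename, 'w'):
            pass


with tempfile.TemporaryDirectory() as d:
    cache = CacheFileSketch(d, {'Package': 'coreutils', 'Version': '9.1-1'})
    before = cache.exists()
    cache.create()
    after = cache.exists()
    name = os.path.basename(cache.filename)
```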

class debmans.extractor.PackageMirror(path)

inspect a Debian mirror for binary packages

this is a modified replica of debsources’s SourceMirror class. Ideally, this would be merged back into the original class as a derivative.

packages

return the mirror packages as a set of <package, version> pairs

Note

This is just like calling ls(), except there is a cache to avoid calling it multiple times.

releases

list of releases in this repository

Returns: (codename -> description) mappings. description is in the format X.Y codename (stable), unless no matching Release file was found, in which case it can be just codename, which is taken from the packages() list of suites.
Return type: dict
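A single description could be built as follows, assuming it comes from the Version, Codename and Suite fields of a parsed Release file. `describe_release()` is a hypothetical helper that illustrates the format above, including the bare-codename fallback.

```python
def describe_release(codename, release=None):
    """Hypothetical helper building one releases() description.

    release is a dict of Release file fields (Version, Codename, Suite),
    or None when no Release file was found for this suite.
    """
    if not release:
        # fall back to the bare codename taken from the suite list
        return codename
    version = release.get('Version')
    name = release.get('Codename', codename)
    suite = release.get('Suite')
    desc = '%s %s' % (version, name) if version else name
    if suite:
        desc += ' (%s)' % suite
    return desc
```

Unstable suites usually have no Version field, so the sketch then omits the X.Y part instead of printing an empty one.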

ls()

iterate over packages found in the mirror

this will yield (suite, pkg) pairs. the suite is determined by looking at the name of the 4th directory up from where the Packages file is located, as is standard in complete apt repositories. this may yield weird codenames when working with ad-hoc repositories as the chosen name may be a bit random depending on your directory structure.

Returns: (suite, pkg) tuples for each package found.
Return type: iterator of tuples, where suite is a string and pkg is a deb822 fragment.
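The suite lookup can be sketched as a path operation. This assumes one reading of the rule above: with the standard dists/<suite>/<component>/binary-<arch>/Packages layout, the suite is the fourth path component counting from the end (the file itself included). `suite_from_path()` is hypothetical.

```python
import os


def suite_from_path(packages_path):
    """Hypothetical helper: guess the suite name for a Packages file.

    Assumes the dists/<suite>/<component>/binary-<arch>/Packages layout,
    where the suite is the fourth path component counted from the end.
    """
    parts = os.path.normpath(packages_path).split(os.sep)
    return parts[-4] if len(parts) >= 4 else None


standard = suite_from_path('/srv/mirror/dists/bookworm/main/binary-amd64/Packages.gz')
# an ad-hoc repository yields an arbitrary name, as warned above
ad_hoc = suite_from_path('/home/me/repo/Packages')
```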

exception debmans.extractor.DebmirrorError

runtime error when using a local Debian mirror

exception debmans.extractor.PackageCorruptedError

runtime error raised when apt fails to extract a package file

Note

the documentation for click functions is incomplete. they should actually be turned into usage pages and manpages; see this issue for details.

debmans.extractor.extract()

extract manpages from Debian binary packages in the mirror

iterate over all binary packages found in the mirror, and extract each included manpage to the output directory.

Renderer

The Renderer module takes care of turning extracted documentation into HTML format. It uses Jinja templates and simple timestamp-based caching.

class debmans.renderer.JinjaRenderer(template, cache=True, dryrun=False)

render Jinja templates using given parameters, caching and simulation

this is basically a wrapper around the Template class, extended so we can easily pass paths (instead of strings) in and out.

Todo

we should probably have derived Template directly here.

template = None

template to use to render the data

cache = None

if we should check timestamps before writing

dryrun = None

if True, do not write

source = None

source file currently processed

generated_time()

handy function to add timestamp to footers

render(target, **data)

render template with given data

if pageinfo isn’t provided in data, it is set to the output of generated_time().

Parameters:
  • target (str) – path to the target file
  • data (dict) – set of parameters passed to render()

uptodate(target)

check if the target file is newer than the template

also checks the source attribute if it is set, which allows subclasses to add a file to check.
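Timestamp-based caching of this kind can be sketched with `os.path.getmtime()`. The standalone `uptodate()` function below is a hypothetical re-creation, not the JinjaRenderer method. It treats the target as fresh only when it exists and is newer than the template and, if given, the source.

```python
import os
import tempfile


def uptodate(target, template, source=None):
    """Hypothetical sketch of timestamp-based caching."""
    if not os.path.exists(target):
        return False
    target_mtime = os.path.getmtime(target)
    deps = [template] + ([source] if source else [])
    # every dependency must be strictly older than the rendered target
    return all(os.path.getmtime(dep) < target_mtime for dep in deps)


with tempfile.TemporaryDirectory() as d:
    paths = {}
    for name in ('template.html', 'page.1', 'page.html'):
        paths[name] = os.path.join(d, name)
        open(paths[name], 'w').close()
    os.utime(paths['template.html'], (100, 100))
    os.utime(paths['page.1'], (200, 200))
    os.utime(paths['page.html'], (300, 300))
    fresh = uptodate(paths['page.html'], paths['template.html'], paths['page.1'])
    os.utime(paths['page.1'], (400, 400))  # source edited after rendering
    stale = uptodate(paths['page.html'], paths['template.html'], paths['page.1'])
    missing = uptodate(os.path.join(d, 'nope.html'), paths['template.html'])
```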

class debmans.renderer.MarkdownRenderer(template, cache=True, dryrun=False)

render Markdown source files with a Jinja template

render(source, target, **data)

render the given source file

class debmans.renderer.CommandRenderer(template, command=None, cache=True, dryrun=False)

a simple template-based rendering system

a file is passed as an argument to a command and the output is written into the given template, in the {{content}} Jinja2 element.

this is meant to be subclassed in command-specific renderers.

renderers do not even need to be command-based, as long as they provide the following attributes:

  • pattern: regular expression pattern for this class
  • render(source, target, **data): render the given source file into the target file, with the attached Jinja data. at least content is expected there; description and title are also encouraged and should match the template.

postprocess(data)

modify the data sent to the template after execution

this allows subclasses to intervene between the command call and the render call.

by default does nothing

render(source, target, **data)

render the given source file using external command defined in constructor

does not call command in dryrun mode.

Todo

support %(target)s instead of standard output, if necessary?

Raises:

CommandRendererError – if the command fails to convert the given page
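The command, postprocess and content flow described for CommandRenderer can be sketched with `subprocess.run()`. Everything here is hypothetical: `CommandSketch`, its `convert()` method and the `cat %(source)s` command template stand in for the real class, and the result is returned as a dict instead of being rendered into a template. The sketch assumes a POSIX system with `cat` and paths that need no shell quoting.

```python
import os
import shlex
import subprocess
import tempfile


class CommandRendererError(Exception):
    """Stand-in for debmans.renderer.CommandRendererError in this sketch."""


class CommandSketch:
    # hypothetical command template; %(source)s is replaced by the file path
    command = 'cat %(source)s'

    def postprocess(self, data):
        # hook for subclasses; the default leaves the data unchanged
        return data

    def convert(self, source):
        # naive substitution: assumes the path needs no shell quoting
        args = shlex.split(self.command % {'source': source})
        proc = subprocess.run(args, capture_output=True, text=True)
        if proc.returncode != 0:
            raise CommandRendererError(proc.stderr.strip())
        # the output becomes the {{content}} value passed to the template
        return self.postprocess({'content': proc.stdout})


class UpperSketch(CommandSketch):
    def postprocess(self, data):
        # example subclass hook: runs between the command and the render
        data['content'] = data['content'].upper()
        return data


with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('hello\n')
result = UpperSketch().convert(f.name)
os.remove(f.name)
```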

exception debmans.renderer.CommandRendererError

error raised when the rendering command (for example man2html) fails to render the page

class debmans.renderer.ManpageRenderer

abstract class to store the manpage regex pattern

pattern = '/(?:(?P<suite>\\w+)/)?(?P<path>man/(?:(?P<locale>\\w+)/)?man[1-9]/(?P<name>.+)\\.(?P<section>[1-9]\\w*)(?:\\.gz))?$'

default pattern for manpages
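The named groups of this pattern can be seen by matching it against sample paths. The paths below are made up. Because the `.gz` suffix is required, an uncompressed page does not match at all, which is consistent with the note on render() about compressed manpages.

```python
import re

# the default ManpageRenderer.pattern, written as a raw string
MANPAGE_PATTERN = (r'/(?:(?P<suite>\w+)/)?(?P<path>man/(?:(?P<locale>\w+)/)?'
                   r'man[1-9]/(?P<name>.+)\.(?P<section>[1-9]\w*)(?:\.gz))?$')
regex = re.compile(MANPAGE_PATTERN)

m = regex.search('/srv/out/bookworm/man/fr/man8/apt.8.gz')
print(m.group('suite'), m.group('locale'), m.group('name'), m.group('section'))
# prints: bookworm fr apt 8

# uncompressed pages do not match, since the .gz suffix is required
print(regex.search('/srv/out/bookworm/man/man1/ls.1'))
# prints: None
```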

class debmans.renderer.W3mRenderer(template, command=None, cache=True, dryrun=False)

render manpages with w3m

command = '/usr/lib/w3m/cgi-bin/w3mman2html.cgi "quit=1&local=%(source)s"'

path to w3m converter

postprocess(data)

process w3m parser output

class debmans.renderer.MandocRenderer(template, command=None, cache=True, dryrun=False)

render pages with mandoc

Todo

this assumes cross-references are done with the .Xr macro, which unfortunately was often not the case in my tests, so some manual cross-referencing will be required here.

Todo

croaks on the kodi(1) manpage, a weird redirect which we should handle manually here. according to mandoc(1), the fix is to chdir to the correct relative directory. judging from zshall(1), .so looks like an “include” directive.

class debmans.renderer.Man2htmlRenderer(template, command=None, cache=True, dryrun=False)

render manpages with man2html

postprocess(data)

process man2html output

  • man2html doesn’t return proper exit codes, so look for the Status header instead. Any 40X status is an error.
  • the title is in the NAME level two header (<h2>)
  • keep only the inside of the <body> tag
  • rewrite URLs to point to the right place
  • remove attribution
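Two of the steps listed above, the Status check and keeping only the body, can be sketched with plain string handling and a regex. The sketch assumes man2html prints CGI-style headers separated from the HTML by a blank line. `postprocess_man2html()` and `Man2htmlError` are hypothetical, and title extraction, URL rewriting and attribution removal are left out.

```python
import re


class Man2htmlError(Exception):
    """Stand-in for CommandRendererError in this sketch."""


def postprocess_man2html(output):
    """Hypothetical sketch: check the Status header, keep the <body> contents."""
    headers, _, html = output.partition('\n\n')
    for line in headers.splitlines():
        name, _, value = line.partition(':')
        # exit codes are unreliable, so a 40X Status header signals failure
        if name.strip().lower() == 'status' and value.strip().startswith('40'):
            raise Man2htmlError(value.strip())
    m = re.search(r'<body[^>]*>(.*)</body>', html, re.IGNORECASE | re.DOTALL)
    return m.group(1) if m else html


page = postprocess_man2html(
    'Content-type: text/html\n\n<html><BODY>\n<h2>NAME</h2>\n</BODY></html>')
```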
debmans.renderer.DefaultManpageRenderer

quick switch to toggle default manpage rendering implementation

alias of MandocRenderer

debmans.renderer.find_files(directory, patterns)

look for file patterns in the given directory and return the right command to run

Todo

this may be slow in large directories and may be reimplemented with os.scandir() if we ever depend on Python 3.5 or later.

Returns: module, path tuples
Return type: list
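The os.scandir() reimplementation mentioned in the Todo could start from a recursive walker like the one below. `iter_files()` is hypothetical and only lists files. Matching them against patterns would be a separate step. Using `os.scandir()` as a context manager requires Python 3.6 or later.

```python
import os
import tempfile


def iter_files(directory):
    """Hypothetical sketch: recursively yield file paths with os.scandir()."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # DirEntry caches type information, avoiding extra stat() calls
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            elif entry.is_file():
                yield entry.path


with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, 'man', 'man1'))
    for rel in ('index.md', os.path.join('man', 'man1', 'ls.1.gz')):
        open(os.path.join(d, rel), 'w').close()
    found = sorted(os.path.relpath(p, d) for p in iter_files(d))
```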

debmans.renderer.match_jobs(files, patterns)

dispatch the right command for the matching pattern

Parameters:
  • files (list) – list of file paths to inspect
  • patterns (list) – list of (cls, regex) tuples. regex is a compiled regex pattern to match against the path names, cls is a CommandRenderer subclass to run
Returns:

module, path tuples

Return type:

list
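The dispatch step can be sketched as a loop that pairs each file with the first matching (cls, regex) tuple. `match_jobs_sketch()` and the two renderer classes are hypothetical. Stopping at the first match and silently skipping unmatched files are assumptions of this sketch.

```python
import re


def match_jobs_sketch(files, patterns):
    """Hypothetical sketch: pair each file with the first matching renderer class."""
    jobs = []
    for path in files:
        for cls, regex in patterns:
            if regex.search(path):
                jobs.append((cls, path))
                break  # assumption: the first matching pattern wins
    return jobs


class ManSketch:
    pass


class MarkdownSketch:
    pass


patterns = [(ManSketch, re.compile(r'/man[1-9]/[^/]+\.gz$')),
            (MarkdownSketch, re.compile(r'\.md$'))]
jobs = match_jobs_sketch(['out/man/man1/ls.1.gz', 'out/index.md', 'out/logo.png'],
                         patterns)
```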

debmans.renderer.render()

render documentation to HTML

this looks for files matching certain regex patterns in the given SRCDIR directory

Note

this assumes files have an extension that should be stripped. for manpages, this should generally be .gz. if manpages are not compressed, this will break section support.

Todo

document that compressed manpages are mandatory
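The note above can be shown with `os.path.splitext()`. Stripping one extension from a compressed page keeps the section suffix, but stripping it from an uncompressed page removes the section. `page_stem()` is a hypothetical helper, not the code render uses.

```python
import os


def page_stem(path):
    """Hypothetical helper: strip the single trailing extension from a file name."""
    return os.path.splitext(os.path.basename(path))[0]


print(page_stem('man/man1/ls.1.gz'))  # prints: ls.1 (section 1 is still there)
print(page_stem('man/man1/ls.1'))     # prints: ls (the section is lost)
```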

debmans.renderer.site()

render the whole static site

Main entry point

The main entry point of debmans is in the debmans.__main__ module. This is to make it possible to call debmans directly from the source code through the Python interpreter with:

python -m debmans

All this code is here rather than in __init__.py to avoid requiring too many dependencies in the base module, which contains useful metadata for setup.py.

This uses the click module to define the base command and options, which then get passed to subcommands through the obj parameter, see pass_obj() in the click documentation.

Logger

This is a simple helper module to configure the logging module consistently.

debmans.logger.setup_logging(name='debmans', level='info', syslog=False, stream=None)

setup logging module according to the arguments provided
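The standard logging module can honour the same signature. The sketch below is a hypothetical re-creation, not the debmans implementation. The message format, the fallback to stderr and the `/dev/log` syslog socket (the usual location on Linux) are all assumptions.

```python
import io
import logging
import logging.handlers
import sys


def setup_logging_sketch(name='debmans', level='info', syslog=False, stream=None):
    """Hypothetical re-creation of the setup_logging() signature with stdlib logging."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, level.upper()))  # 'info' -> logging.INFO
    handler = logging.StreamHandler(stream or sys.stderr)
    handler.setFormatter(logging.Formatter('%(name)s: %(levelname)s: %(message)s'))
    logger.addHandler(handler)
    if syslog:
        # /dev/log is the usual syslog socket on Linux; other systems may differ
        logger.addHandler(logging.handlers.SysLogHandler(address='/dev/log'))
    return logger


buf = io.StringIO()
log = setup_logging_sketch('debmans.sketch', level='debug', stream=buf)
log.debug('created directory %s', 'out')
```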

Utilities

These are various utilities reused in multiple modules that did not fit anywhere else.

various utilities for debmans

debmans.utils.find_parent_module()

find the name of the first module calling this module

if we cannot find it, we return the current module’s name (__name__) instead.

debmans.utils.find_static_file(path)

locate a file in the distribution

this will look in the shipped files in the package

this assumes the files are at the root of the package or the source tree (if not packaged)

this does not check if the file actually exists.

Parameters: path (str) – path for the file, relative to the source tree root
Returns: the absolute path to the file

debmans.utils.mkdirp(path)

make directories without error

this is a simple wrapper around os.makedirs() to avoid failing if the directory already exists.

it also logs to the DEBUG logging facility when a directory is created.
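A wrapper of this kind can be sketched by catching the "already exists" error and logging only when a directory was actually created. `mkdirp_sketch()` is hypothetical. `os.makedirs(path, exist_ok=True)` is shorter, but it cannot tell whether anything was created, so it could not log only on creation.

```python
import errno
import logging
import os
import tempfile


def mkdirp_sketch(path):
    """Hypothetical sketch: os.makedirs() that tolerates existing directories."""
    try:
        os.makedirs(path)
    except OSError as e:
        # only ignore "already exists" when the path really is a directory
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    else:
        logging.getLogger(__name__).debug('created directory %s', path)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, 'out', 'man1')
    mkdirp_sketch(target)
    mkdirp_sketch(target)  # second call is a silent no-op
    created = os.path.isdir(target)
```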