Python's Localization Engineering Powerhouse

Lokit is a high-performance, strictly typed, memory-efficient localization toolkit for Python. It is built around a fresh interchange middleware API and ships with fast, memory-safe parsers and powerful backend functionality.


Supporting The Shift

Unlike legacy tools that wrap XML DOM element trees held in memory, Lokit represents a shift away from XML-based localization interchange formats toward native language parsing.

Lokit ingests localization formats (TMX, XLIFF, PO, XLSX, CSV, JSON, HTML, IDML) and compiles them into a strict, unified structural data model. This enables not just parsing, but also robust data manipulation, semantic extraction, interchange, and advanced translation memory features out of the box. Lokit favors streaming and asynchronous processing over loading whole files into memory. More file types, interchange formats, and backend framework APIs are planned.
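To make the streaming idea concrete, here is a minimal sketch that uses only the Python standard library. It is not Lokit's API: the function name `iter_tmx_units` and the sample data are assumptions made for illustration. It reads a TMX file one translation unit at a time and clears each unit after use, instead of building a full DOM tree first.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_tmx_units(source: BinaryIO) -> Iterator[dict[str, str]]:
    """Yield one {language: text} mapping per <tu> element (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        unit: dict[str, str] = {}
        for tuv in elem.iter("tuv"):
            seg = tuv.find("seg")
            unit[tuv.get(XML_LANG, "")] = "".join(seg.itertext()) if seg is not None else ""
        yield unit
        elem.clear()  # drop the unit's children so they can be garbage collected


# Tiny illustrative sample, not a complete or validated TMX document.
SAMPLE = b"""<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Goodbye</seg></tuv><tuv xml:lang="fr"><seg>Au revoir</seg></tuv></tu>
</body></tmx>"""

if __name__ == "__main__":
    for unit in iter_tmx_units(io.BytesIO(SAMPLE)):
        print(unit)
```

Because the function is a generator, a caller can stop early or pipe units straight into a writer without ever holding all of them at once.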

Unified Base Model

The main premise is a common, structured, type-safe dataclass model that is intentionally compatible with any file format, not just localization interchange formats. The model converts easily to JSON for interchange with other systems. High-performance APIs for these conversions are included in Lokit, with more to come.
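As a rough illustration of what such a model could look like, the sketch below defines two small dataclasses and serializes them to JSON with the standard library. The class and field names (`Segment`, `TranslationUnit`, `unit_id`, and so on) are hypothetical and are not taken from Lokit.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Segment:
    """One piece of text in one language (hypothetical model)."""
    lang: str
    text: str


@dataclass
class TranslationUnit:
    """A source segment plus its translations and free-form metadata (hypothetical model)."""
    unit_id: str
    source: Segment
    targets: list[Segment] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


def to_json(unit: TranslationUnit) -> str:
    # asdict() recurses into nested dataclasses, lists, and dicts.
    return json.dumps(asdict(unit), ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    unit = TranslationUnit("1", Segment("en", "Hello"), [Segment("fr", "Bonjour")])
    print(to_json(unit))
```

Nothing in the model refers to a specific file format, which is what lets one structure sit between many parsers and writers.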

Native Structural Modeling

Converts interchange formats into strict, unified data classes with robust mypy type safety. The APIs are type safe, and compiling to C binary extensions throughout improves performance and memory efficiency by ~20%. The extensions are pre-compiled and shipped with the library for all operating systems.

Performance & Efficiency

Stress-test benchmarks were run on large interchange files containing 500,000+ segments. The parsing tests covered conversion of TMX and XLIFF to other file formats. The results show a roughly 1.5x speedup and about 15x lower peak memory than the legacy standard. This memory efficiency allows multiple files to be parsed concurrently in asynchronous pipelines.

Lokit (15x Less Memory)
Duration: 13.57s
Peak Memory: 135.9 MB

Legacy Standard
Duration: 20.30s
Peak Memory: 2,034.5 MB
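
The headline ratios can be checked directly from the reported figures. The short calculation below uses only the four numbers above; the variable names are illustrative.

```python
# Benchmark figures as reported above.
lokit = {"duration_s": 13.57, "peak_mb": 135.9}
legacy = {"duration_s": 20.30, "peak_mb": 2034.5}

memory_ratio = legacy["peak_mb"] / lokit["peak_mb"]
speedup = legacy["duration_s"] / lokit["duration_s"]

print(f"{memory_ratio:.1f}x less memory, {speedup:.2f}x faster")
# prints "15.0x less memory, 1.50x faster"
```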

Technical Features

01

Common Interchange Extraction

Automatically parses and isolates inline tags, metadata, and formatting markers into common properties, so text can be manipulated safely without corrupting code.
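
One common way to isolate inline tags is to swap them for numbered placeholders and keep the original tags in a side list. The sketch below shows that general technique under simplifying assumptions; it is not Lokit's implementation, and `extract_tags`, `restore_tags`, and `ExtractedText` are hypothetical names. It also assumes the text contains no literal `{1}`-style sequences of its own.

```python
import re
from dataclasses import dataclass, field

TAG_RE = re.compile(r"<[^<>]+>")          # naive match for XML/HTML-style inline tags
PLACEHOLDER_RE = re.compile(r"\{(\d+)\}")


@dataclass
class ExtractedText:
    text: str                              # text with tags replaced by {n} placeholders
    tags: list[str] = field(default_factory=list)


def extract_tags(raw: str) -> ExtractedText:
    """Replace each inline tag with a numbered placeholder and remember the tag."""
    tags: list[str] = []

    def stash(match: re.Match) -> str:
        tags.append(match.group(0))
        return f"{{{len(tags)}}}"          # first tag becomes {1}, second {2}, ...

    return ExtractedText(TAG_RE.sub(stash, raw), tags)


def restore_tags(extracted: ExtractedText) -> str:
    """Put the original tags back in place of their placeholders."""
    return PLACEHOLDER_RE.sub(
        lambda m: extracted.tags[int(m.group(1)) - 1], extracted.text
    )


if __name__ == "__main__":
    result = extract_tags("Click <b>Save</b> to continue")
    print(result.text)                     # Click {1}Save{2} to continue
    print(restore_tags(result))            # Click <b>Save</b> to continue
```

The translatable text can then be edited or translated freely, as long as the placeholders survive, and the markup is restored unchanged afterwards.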

02

Parse into Anything

The common interchange dataclasses convert to and from language-native objects, so translation interchange files such as TMX and XLIFF can be parsed to and from any format. The parsing APIs are included in Lokit.

03

Backend Integration

The framework enables fast, asynchronous, and concurrent parsing and processing of localization data. Lokit's memory efficiency makes it safe to run multiple parsers in parallel without spawning more clusters than needed.
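
A bounded-concurrency pipeline of this kind can be sketched with plain `asyncio`. The example is an assumption-laden illustration rather than Lokit code: `process_file` is a hypothetical stand-in that only sleeps and returns the length of the file name, where a real pipeline would call an asynchronous parser.

```python
import asyncio


async def process_file(name: str, limit: asyncio.Semaphore) -> tuple[str, int]:
    """Hypothetical stand-in for parsing one file asynchronously."""
    async with limit:                      # at most `max_concurrent` files in flight
        await asyncio.sleep(0)             # placeholder for real async parsing I/O
        return name, len(name)


async def process_all(names: list[str], max_concurrent: int = 4) -> dict[str, int]:
    limit = asyncio.Semaphore(max_concurrent)
    results = await asyncio.gather(*(process_file(n, limit) for n in names))
    return dict(results)


if __name__ == "__main__":
    print(asyncio.run(process_all(["a.tmx", "b.xliff", "c.po"])))
```

The semaphore caps how many files are processed at once, so peak memory scales with `max_concurrent` rather than with the size of the whole batch.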

04

Async Included

Pipelines in legacy Python localization toolkits are heavy, synchronous, and rely on DOM trees in memory. Lokit offers asynchronous APIs as standard, with streaming built in for speed and memory efficiency.