Python's Localization Engineering Powerhouse
Lokit is a high-performance, strictly typed, and memory-efficient localization toolkit for Python. It is built around a fresh interchange middleware API and ships with fast, memory-safe parsers and powerful backend functionality.
Supporting The Shift
Legacy tools wrap XML DOM element trees held in memory. Lokit represents a shift away from XML-based localization interchange formats towards native language parsing.
Lokit ingests localization formats (TMX, XLIFF, PO, XLSX, CSV, JSON, HTML, IDML) and compiles them into a strict, unified structural data model. This enables not just parsing, but robust data manipulation, semantic extraction, interchange, and advanced translation memory features out of the box. Lokit focuses on streaming and asynchronous processing rather than loading whole files into memory. More file types, interchanges, and backend framework APIs are planned.
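To illustrate what streaming means in contrast to loading a whole file, here is a minimal sketch that uses only the Python standard library. It is not Lokit's parser or API. The function name `stream_tmx_units` and the sample data are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator

# xml:lang lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def stream_tmx_units(source: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one {language: text} dict per <tu> element (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tu":
            unit: Dict[str, str] = {}
            for tuv in elem.iter("tuv"):
                seg = tuv.find("seg")
                lang = tuv.get(XML_LANG, "")
                unit[lang] = "".join(seg.itertext()) if seg is not None else ""
            yield unit
            elem.clear()  # release this unit's children and text once consumed


# Tiny illustrative sample, not a complete or validated TMX file.
SAMPLE_TMX = b"""<?xml version="1.0"?>
<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="de"><seg>Hallo</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Thanks</seg></tuv><tuv xml:lang="de"><seg>Danke</seg></tuv></tu>
</body></tmx>"""

units = list(stream_tmx_units(io.BytesIO(SAMPLE_TMX)))
```

Because the generator yields one unit at a time and clears it afterwards, a consumer can process a very large file without holding every segment in memory at once.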
Unified Base Model
The main premise is a common, structured, and type-safe dataclass model that is intentionally compatible with any file format, not just localization interchange formats. The model can be easily converted to JSON for interchange with other systems. High-performance APIs for these conversions are included in Lokit, with more to come.
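As a rough picture of the idea, a dataclass model with a JSON round trip could look like the sketch below. The class names `Segment` and `Document`, their fields, and the helper functions are hypothetical and chosen for illustration. Lokit's actual model may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    # Hypothetical fields for illustration only.
    id: str
    source: str
    target: str
    source_lang: str
    target_lang: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    origin_format: str
    segments: List[Segment] = field(default_factory=list)


def to_json(doc: Document) -> str:
    """Serialize the whole model; asdict() recurses into nested dataclasses."""
    return json.dumps(asdict(doc), ensure_ascii=False, sort_keys=True)


def from_json(payload: str) -> Document:
    """Rebuild the typed model from its JSON form."""
    raw = json.loads(payload)
    return Document(
        origin_format=raw["origin_format"],
        segments=[Segment(**item) for item in raw["segments"]],
    )


doc = Document(
    origin_format="tmx",
    segments=[Segment("1", "Hello", "Hallo", "en", "de", {"note": "greeting"})],
)
round_tripped = from_json(to_json(doc))
```

Keeping the model as plain dataclasses means a type checker such as mypy can verify field access, while the JSON form stays a direct mirror of the same structure.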
Native Structural Modeling
Lokit converts interchange formats into strict, unified data classes with robust mypy type safety. The APIs are type safe, and compiling to C binary extensions throughout improves performance and memory efficiency by ~20%. The extensions are pre-compiled and ship with the library for all operating systems.
Performance & Efficiency
Stress-test benchmarks ran on large interchange files containing 500,000+ segments. Parsing tests covered conversion from TMX and XLIFF to other file formats. Results indicate a stellar improvement in speed and a new standard for memory use. This memory efficiency allows multiple files to be parsed concurrently in asynchronous pipelines.
Technical Features
Common Interchange Extraction
Automatically parses inline tags and metadata and isolates them into common properties and formatting markers, allowing text to be manipulated safely without corrupting code.
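One common way to isolate inline tags is to swap them for numbered placeholders and restore them after the text has been edited. The sketch below shows that general technique with a simple regular expression. It is an assumption about the approach, not Lokit's implementation, and `isolate_tags` and `restore_tags` are hypothetical names.

```python
import re
from typing import Dict, Tuple

# Simplified pattern: anything between angle brackets counts as an inline tag.
TAG_RE = re.compile(r"<[^<>]+>")


def isolate_tags(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each inline tag with a placeholder such as {1} and record it."""
    tags: Dict[str, str] = {}

    def _swap(match: "re.Match[str]") -> str:
        key = f"{{{len(tags) + 1}}}"
        tags[key] = match.group(0)
        return key

    return TAG_RE.sub(_swap, text), tags


def restore_tags(text: str, tags: Dict[str, str]) -> str:
    """Put the original tags back in place of their placeholders."""
    for key, tag in tags.items():
        text = text.replace(key, tag)
    return text


plain, tag_map = isolate_tags("Click <b>Save</b> now")
translated = restore_tags("Jetzt {1}speichern{2} klicken", tag_map)
```

This sketch would misbehave if the source text already contained a literal such as `{1}`, so a production parser needs a collision-free marker scheme.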
Parse into Anything
The common interchange dataclasses allow conversion to language-native objects. They also allow translation interchange files such as TMX and XLIFF to be parsed to and from any other format. The parsing APIs are included in Lokit.
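The "to and from anything" idea can be pictured as a hub-and-spoke design: each format needs one reader into the common model and one writer out of it. The sketch below uses CSV and JSON from the standard library to keep it short. The `Unit` class, the registries, and `convert` are hypothetical and do not reflect Lokit's real API.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class Unit:
    source: str
    target: str


def read_csv(text: str) -> List[Unit]:
    return [Unit(r["source"], r["target"]) for r in csv.DictReader(io.StringIO(text))]


def write_csv(units: List[Unit]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["source", "target"], lineterminator="\n")
    writer.writeheader()
    for unit in units:
        writer.writerow(asdict(unit))
    return buf.getvalue()


def read_json(text: str) -> List[Unit]:
    return [Unit(**item) for item in json.loads(text)]


def write_json(units: List[Unit]) -> str:
    return json.dumps([asdict(unit) for unit in units])


# Each format registers one reader and one writer against the common model.
READERS: Dict[str, Callable[[str], List[Unit]]] = {"csv": read_csv, "json": read_json}
WRITERS: Dict[str, Callable[[List[Unit]], str]] = {"csv": write_csv, "json": write_json}


def convert(text: str, src: str, dst: str) -> str:
    """Convert between any two registered formats via the common model."""
    return WRITERS[dst](READERS[src](text))


csv_text = "source,target\nHello,Hallo\n"
as_json = convert(csv_text, "csv", "json")
back_to_csv = convert(as_json, "json", "csv")
```

With this layout, adding a new format means writing two functions instead of one converter per pair of formats.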
Backend Integration
This framework allows fast, asynchronous, and concurrent parsing and processing of localization data. Lokit's memory efficiency makes it safe to run multiple parsers in parallel without spawning more clusters than needed.
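Bounded concurrency of this kind can be sketched with `asyncio` alone. In the example below, `count_segments` is a hypothetical stand-in for a blocking parser, and `parse_all` caps how many run at once with a semaphore. None of these names come from Lokit.

```python
import asyncio
from typing import List


def count_segments(payload: str) -> int:
    # Stand-in for a blocking parser: one segment per non-empty line.
    return sum(1 for line in payload.splitlines() if line.strip())


async def parse_all(payloads: List[str], max_concurrency: int = 4) -> List[int]:
    """Parse every payload, running at most max_concurrency parsers at a time."""
    limit = asyncio.Semaphore(max_concurrency)

    async def _one(payload: str) -> int:
        async with limit:
            # Run the blocking call in a worker thread (Python 3.9+).
            return await asyncio.to_thread(count_segments, payload)

    # gather() returns results in the same order as the inputs.
    return list(await asyncio.gather(*(_one(p) for p in payloads)))


results = asyncio.run(parse_all(["a\nb\n", "c\n", "\n"], max_concurrency=2))
```

The semaphore keeps resource use predictable: total memory is bounded by the concurrency limit rather than by the number of queued files.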
Async Included
In legacy Python localization toolkits, pipelines are heavy, synchronous, and rely on DOM trees in memory. Lokit offers asynchronous APIs as standard, with streaming built in for speed and memory efficiency.
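As a generic illustration of asynchronous streaming, the sketch below chains two async generators so that segments flow through the pipeline in small batches rather than as one in-memory list. The function names are hypothetical, and the code shows the pattern rather than Lokit's own API.

```python
import asyncio
from typing import AsyncIterator, Iterable, List


async def stream_segments(lines: Iterable[str]) -> AsyncIterator[str]:
    """Yield non-empty segments one at a time."""
    for line in lines:
        await asyncio.sleep(0)  # give other tasks a chance to run
        text = line.strip()
        if text:
            yield text


async def batched(source: AsyncIterator[str], size: int) -> AsyncIterator[List[str]]:
    """Group a stream into lists of at most `size` items."""
    batch: List[str] = []
    async for item in source:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


async def collect(lines: Iterable[str], size: int) -> List[List[str]]:
    return [b async for b in batched(stream_segments(lines), size)]


batches = asyncio.run(collect(["one\n", "\n", "two\n", "three\n"], 2))
```

Only one batch is held by the pipeline at a time, so peak memory depends on the batch size rather than on the size of the input.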