Secure Automata

Language-theoretic security (LangSec) is a design and programming philosophy that focuses on formally correct and verifiable input handling throughout all phases of the software development lifecycle in order to prevent a variety of abuses. pdf

LangSec offers a practical method of assurance of software free from broad and currently dominant classes of bugs and vulnerabilities related to incorrect parsing and interpretation of messages between software components such as packets, protocol messages, file formats, function parameters, etc. site

Occupy Babel explains LangSec in a few slogans addressing ill conceived aspects of our designs. page

# Interspersed

Apply full recognition to inputs before processing them!

Every piece of software that takes inputs contains a de facto recognizer for accepting valid or expected inputs and rejecting invalid or malicious ones. This recognizer code is often ad hoc, spread throughout the program, and interspersed with processing logic (a "shotgun parser"). This lends the processing logic to exploitation and programmers to false assumptions of data safety.

# Powerful

Reduce Computing Power Greed!

Unneeded computing power in input handling code is a hand-out to attackers. Reduce computing power needed for your protocols, reduce parsing exposure to the necessary minimum! The power that is not there cannot be hijacked.

Once created, the glut of computing power cannot be destroyed, because of backward compatibility; repackaging of insecurity goes on forever. Fight it at the origin: reduce the computing power demands of your protocol!

# Weird

Stop Weird Machines! Context Free or Regular!

Hard-to-parse protocols require complex parsers. Complex, buggy parsers become weird machines for exploits to run on. Help stop weird machines today: Make your protocol context-free or regular!

# Complete

No More Turing-complete Input Languages!

Protocols and file formats that are Turing-complete input languages are the worst offenders, because for them, recognizing valid or expected inputs is UNDECIDABLE: no amount of programming or testing will get it right.

A Turing-complete input language destroys security for generations of users. Avoid Turing-complete input languages!

# Ambiguous

Computational Equivalence for All Protocol Endpoints!

When software components communicate, the messages they send to each other must be interpreted by receivers exactly as they were meant by senders, otherwise ambiguity and therefore insecurity ensues. Protocol endpoints must therefore contain computationally equivalent parsers.

But, for non-deterministic pushdown automata (ambiguous context-free grammars) and above, the task of validating parser equivalence is UNDECIDABLE: no amount of programming or testing will get it right.

Ensure computational equivalence of protocol endpoints: use only regular and context-free protocols!

Curing the Vulnerable Parser Design Patterns for Secure Input Handling.

The argument here is that we must apply considerable reason to be sure that every computer process is not subject to abuse by performing actions not intended by the design.

We hint at their logic without embracing their formalism when we suggest plugins should be Too Stupid to be Wrong.

Alan Kay argues for reason beyond the simple sequence of a "story" when he suggests the Three Ways We Must Think.

Will we ever solve security or prove otherwise? thread

The Underestimated Dangers of CSV Injection. post

Puffs is a domain-specific language and library for parsing untrusted file formats safely. E.g. images, audio, video, fonts and compressed archives. github