Synta

Introduction

Synta[^1] is a command-line tool for performing various operations on the regular definition format used internally at CartaBinaria to constrain file naming conventions. It is available at cartabinaria/synta. With Synta, you can:

- validate a `.synta` file's correctness;
- convert a `.synta` file into a regular expression;
- convert a `.synta` file into a JSON file describing its contents.
Language Definition
The Synta language is defined by the following BNF:
<upperletter> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" |
"K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" |
"U" | "V" | "W" | "X" | "Y" | "Z"
<lowerletter> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" |
"k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" |
"u" | "v" | "w" | "x" | "y" | "z"
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<alphanum> ::= <lowerletter> | <upperletter> | <digit>
<char> ::= <alphanum> | ":" | ";" | "," | "." | "-" | "_"
<word> ::= <char> <word> | <char>
<text> ::= <word> " " <text> | <word>
<comment> ::= "; " <text> "\n" | ";" <text> "\n"
<id> ::= <lowerletter> <id> | <lowerletter>
<def> ::= <id> " = " <regexp> "\n"
<commdef> ::= <comment> <commdef> | <comment> <def>
<commdefs> ::= <commdef> <commdefs> | <commdef>
<segment> ::= "-" <id> | <opt_segment>
<opt_segment> ::= "(-" <id> ")?" | "(-" <id> <opt_segment> ")?"
<join> ::= <segment> <join> | <segment>
<main> ::= "> " <id> <join> "." <id> "\n"
<language> ::= <commdefs> <main>
where `<regexp>` is any valid regular-expression syntax. The regexp syntax used is Go's, but for our purposes it should also be compatible with that of JavaScript and Rust.
Examples
; Test type
type = written|oral
; A date in the format yyyy-mm-dd
date = \d{4}-\d{2}-\d{2}
; The row is a number
row = \d
; Any alphanumeric word
extra = (\w|\d)+
; File extension. Possible values:
; - txt, tex, md, pdf, doc, docx
ext = txt|tex|md|pdf|doc|docx
> type-date(-row)?-extra.ext
Parser Implementation
For parsing the entire file, we use a simple hand-written parser that manipulates the string directly, since parsing definitions and comments is rather straightforward. For parsing the final line, which contains the concatenation of all rules, we use a small FSA with side effects, as follows:
The side effects are:

- `L`: variable to manage the depth level
- `concat`: appends the read symbol to `id`
- `push`: depending on the depth level, adds to the structure the `id` read so far
- `generateOptional`: creates the structure to add optional blocks later through `push`
Development
Here is a list of files in the repository and their purpose:
├── data.go # Data structures for the parser
├── parser.go # Parsing logic and FSA
├── parser_test.go # Unit tests for the parser
---

[^1]: Origin of the name: it is one of the top-10 names suggested by ChatGPT. We used the prompt: "suggest me a short name for a tiny parser command line utility. the name must not be made up of two or more words"