C Extended - Regular Expressions - Boilerplate

Acry, 15 August 2019

Boilerplate Code for POSIX and Perl Syntax RegEx in C

Technical Side

A long-term goal is the C-Implementation of Action Tags (AT) using Emscripten for Web-Browsers. AT, like most upmarkers, relies heavily on Regular Expressions (RegEx).

RegEx aren’t part of Standard C. There are 2 major free and open-source software (FOSS)-Implementations i know of:

The POSIX RegEx library and Perl Compatible RegEx PCRE.

The POSIX syntax is used by grep, sed, vi, etc.

Perl syntax is what most programming languages, like Java, JavaScript, Python, lean on.

I pushed my boilerplate-code to github.

Social side

– The Story behind

This morning I thought about why documents are still simulations of paper and it was funny so see that I got a post: “The Flawed History of Graphical User Interfaces” on the same vein. As Amazon came up with their Kindle, I was hoping that they would release good software that deals with text, but instead they just brought up a yet another ebook-format.

dilbert

Software-Development is time consuming, that makes it expensive. As example data wrangling, sometimes referred to as data munging, can be very tedious and especially when reverse engineering is involved. XML and XLST did’t make it as “one-size fits all” software for text conversion.

I started writing a more complex article on Document-Formats and Data-Munging, you gotta wait for it - if you are interested.

One can either focus on driving a business or munching data, if you get paid for data munching and you are happy with that, that is perfect. But, since big-money dictates what will gonna be developed and what will not (linux as a rule exception, who would invest in something that starts being profitable after 30 years?) we are often stepping on a spot. I assume you read the article above, so I continue the dialog. Big-money is impatient and wants a lot fast-money, nowadays more than ever, that is why we have leverages and options on the shareholder market. While talking of big-money, Stakeholders are involved - those are the people that can drive a direction in a company and sometimes world-wide and often they have a great sense for driving a successful business - successful in terms of profit, unfortunately those persons are either WIMP (windows, icons, menus, pointer) - Users of the past decades, wearing/using tech-devices as status symbol or aren’t IT-affine at all. So how could they know how to make progress in Software-Developing beside taking a look on development-speed?

The chain of trust

They can’t, except they have good technical directors they trust in, those must have good senior developers or team-leaders and those must have a good team, too; good communication and with good I mean an open and honest mind exchange - and against all efforts, that chain of trust is the absolute exception.

One could argue that we have great IT-Technology out there - yes we do, but I haven’t seen big technology moves in the past 30 years. No, I am not living in a cave - no matter what technology that is currently availavle, most patents ideas originate back to before 1980. Touch-pads, Voice-Recognition, Character-Recognition, Image-Recognition, VR-Glasses, Motion-Capture-Devices, Chroma-Keying, Position-Tracking, so called Artificial Intelligence - you name it. None of those ideas is younger than me.

xkcd.com/1425/

And that is a very optimistic wrong assumption, 50 years and still going on in image or character recognition.

And all those tiny, tiny baby steps of progression are real, despite the fact we are using gigantic armies of engineers with the best qualifications and all the “new” and “modern” networked technologies.

Dealing with RegEx is most often no fun at all, especially if one has to deal with all the syntax and implementation differences and details in different tools and programming languages, or if you abuse RegEx instead of using a proper algorithm - but if one wants progress, one has to understand what and why has be done before. Maybe one day we may have easier, more common text-wrangling API’s.

That is the reason why I deal with Regular expressions, to make software development better and way more enjoyable - better software for better living.