Lua grammar shenanigans

I’ve been working on some Visual Studio tooling for Lua over the past week and just ran into my pet peeve of language design. The first time I ran into this problem was back at university while writing my first parser. It took me a long time to understand why this didn’t work.

The Lua 5.2 grammar in question is this

var  ::= Name | pexp '[' exp ']' | pexp '.' Name
pexp ::= var | fcall | '(' exp ')'

This is from the Lua 5.2 Reference Manual. I’ve changed prefixexp to pexp and functioncall to fcall for brevity.

What’s so nasty about this? pexp is defined in terms of var, and var in turn is defined in terms of pexp, all without consuming any input. To a human this may make sense; to a computer it’s an infinite loop (or stack overflow) bug waiting to happen. Moreover, since both fcall and '(' exp ')' can start with a left parenthesis, there’s an ambiguity. This is a bad grammar to implement a parser from.

When you design your grammar, keep in mind that you want to be able to determine the parser’s next move from the symbol at hand and a single token of lookahead, which is what’s commonly referred to as an LL(1) grammar.

Now, var and pexp may be reused throughout the grammar, but rewriting them isn’t necessarily impossible, and I suppose this is what any decent parser generator will figure out in order to ensure that the parser isn’t ambiguous or never-ending (like here).

This sort of recursive grammar is really dubious. What’s more, it’s meant to convey that you can reapply it to allow syntax such as a.b["c"].d() to be accepted. Is that easy to understand? What we need here is a non-left-recursive grammar.

Taking a closer look at the Lua C source, you’ll find this grammar (and not the one in the reference manual):

pexp = pexp2 {pexp3}
pexp2 = '(' exp ')' | Name
pexp3 = '.' Name | '[' exp ']' | ':' Name args | Name args

Notice how each production here begins by consuming an input token. In this form there can be no infinite loops (unless you introduce infinite input). Notice also the order of evaluation. I couldn’t tell how Lua resolved the perceived ambiguity between a function call and a parenthesized expression until I realized that a function call must be preceded by a name, which isn’t at all clear from the BNF grammar.
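
To make the difference concrete, here is a minimal recursive-descent sketch of the rewritten grammar in C++. The token names and helpers are made up for illustration, the exp rule is stubbed out, and the call alternatives are left out for brevity; the point is simply that every branch begins by consuming a token, so the parser always makes progress.

// Illustrative sketch only -- token names and helpers are assumptions,
// not the real Lua implementation.
#include <cstddef>

enum TokenKind { TOK_NAME, TOK_LPAREN, TOK_RPAREN, TOK_LBRACKET, TOK_RBRACKET,
                 TOK_DOT, TOK_EOF };

struct Parser
{
    const TokenKind* m_Tokens; // TOK_EOF terminated
    size_t           m_Pos;

    TokenKind Peek() const   { return m_Tokens[m_Pos]; }
    bool Accept(TokenKind k) { if (Peek() == k) { ++m_Pos; return true; } return false; }
    void Expect(TokenKind k) { if (!Accept(k)) Error(); }
    void Error()             { /* report and recover/bail */ }
    void ParseExp()          { Expect(TOK_NAME); } // stand-in for the full exp rule

    // pexp = pexp2 {pexp3}
    void ParsePexp()
    {
        // pexp2 = '(' exp ')' | Name -- always consumes at least one token
        if (Accept(TOK_LPAREN)) { ParseExp(); Expect(TOK_RPAREN); }
        else                    { Expect(TOK_NAME); }

        // {pexp3} -- every alternative also begins by consuming a token, so
        // the loop can only run for as long as there is input left; the
        // ':' Name args and call alternatives follow the same pattern
        for (;;)
        {
            if      (Accept(TOK_DOT))      { Expect(TOK_NAME); }
            else if (Accept(TOK_LBRACKET)) { ParseExp(); Expect(TOK_RBRACKET); }
            else break;
        }
    }
};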

A clear-cut case of declarative shenanigans (sometimes imperative is better). Even if more declarative constructs are expressive, powerful and productive tools, they can be difficult to understand.


printf and type inference

A fair amount of bugs and security-related issues can be traced to misaligned printf format strings. An #AltDevBlog post by Jaewon Jung caught my eye; he took a type-safe library approach to the problem. I think it’s a sound approach, though I was initially looking for something simpler. After all, it’s a 995-line header file. I’m also forced to target a considerably older C++ toolchain.

What did I do then?

While we could solve for the general case, there’s really only a handful of types that we’re interested in. These are:

int
long
long long
double
const char*
const wchar_t*
const void*

I’ve done away with the smaller integral types (e.g. short) and decided that if you really want to print a single char you do so with a string containing only that char.

So with this limited set of types we create a union type Value. Like this:

union Value 
{
	int                m_Int;
	unsigned int       m_Uint;
	long               m_Long;
	unsigned long      m_Ulong;
	long long          m_LongLong;
	unsigned long long m_UlongLong;
	double             m_Double;
	const char*        m_Str;
	const wchar_t*     m_Wstr;
	const void*        m_Ptr;
};

We then introduce the wrapper class Arg.

class Arg
{
public:
    inline Arg(int value)
        : m_Type(kType_Int)
    {
        m_Value.m_Int = value;
    }
    ...
    Value m_Value;
    char m_Type;
};
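
The elided constructors all follow the same pattern, one per supported type. For example, the string overload would presumably look something like this (the kType_Str tag name is my assumption, mirroring kType_Int above):

    // Sketch of one of the elided constructors; kType_Str is an assumed tag name.
    inline Arg(const char* value)
        : m_Type(kType_Str)
    {
        m_Value.m_Str = value;
    }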

A bunch of varargs equivalents:

void Format(const char* format, const Arg& arg0)
{
    char ts[1] = { arg0.m_Type };
    Value vs[1] = { arg0.m_Value };
    Format(format, ts, vs);
}
void Format(const char*, const Arg&);
void Format(const char*, const Arg&, const Arg&);
void Format(const char*, const Arg&, const Arg&, const Arg&);
...

This is obviously the most verbose step, but we almost never need more than a few varargs per format call. I’ve settled on 5 in my implementation.

And the final format function that does the work:

template <int Length>
void Format(const char* format, const char(&ts)[Length], const Value(&vs)[Length]);

The way we now implement Format is just a tad different. Traditional printf format strings use the percent (%) character followed by a type specifier to tell us where and how to format the result, but with this change the type specifier is redundant.

We still have to scan the string for percent characters, but we simply forward whatever sits between the percent character and the type specifier, and substitute the type specifier with our own.

A Format call like this

Format("%s", 42)

Would internally be resolved as

Format("%i", 42)

Which is safe.

Internally we just call snprintf for each placeholder, with the correct type specifier and value, to render the string. We simply infer the type specifier from the value. While this isn’t considered type safe, what cannot happen (assuming a correct implementation) is that a format string results in undefined behavior (bugs/exploits).
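
As a rough sketch of that inner loop (my own illustration, not the actual implementation: the tag values, the three-type subset and the fixed-size buffer are all assumptions; Value is the union from above):

#include <cstdio>
#include <cstring>
#include <string>

enum { kType_Int = 'i', kType_Double = 'f', kType_Str = 's' }; // assumed tag values

std::string FormatImpl(const char* format, const char* ts, const Value* vs, int count)
{
    std::string out;
    int arg = 0;
    for (const char* p = format; *p; ++p)
    {
        if (*p != '%') { out += *p; continue; }
        if (p[1] == '%') { out += '%'; ++p; continue; } // literal percent

        // Forward whatever sits between '%' and the type specifier...
        std::string spec("%");
        ++p;
        while (*p && !strchr("diouxXeEfgGscp", *p)) { spec += *p++; }
        if (!*p) break; // malformed format string, stop here

        // ...then substitute the type specifier with one inferred from the value.
        char buffer[256] = { 0 };
        if (arg < count)
        {
            switch (ts[arg])
            {
            case kType_Int:    spec += 'i'; snprintf(buffer, sizeof buffer, spec.c_str(), vs[arg].m_Int);    break;
            case kType_Double: spec += 'f'; snprintf(buffer, sizeof buffer, spec.c_str(), vs[arg].m_Double); break;
            case kType_Str:    spec += 's'; snprintf(buffer, sizeof buffer, spec.c_str(), vs[arg].m_Str);    break;
            }
            ++arg;
        }
        out += buffer;
    }
    return out;
}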

Remember, the general idea is to impose reasonable limits and write fewer bugs, not to solve the general case. You can always argue that you should test for these bugs (or use good tools for this), but honestly, how many of you really test the failure condition of every external API call?

if (!CreateFile(...))
{
    Log("%s", GetLastError()); // Oops!
    return FALSE;
}

I’m not saying this is something you shouldn’t be testing, but in my experience this is exactly where the format bug takes down your program or, worse, gets exploited and wreaks havoc.

As to the performance implications, I consider them negligible (there’s going to be some copy overhead), but I’d like to compare the assembly of this approach with other type-safe variants and run some tests to get a better picture. The good bit is that we rely on the standard library for most of the heavy lifting. This results in a smaller code base while offering the exact same functionality.

Format("%.3s", 1 / 2.0)
         ^

So, in this particular instance we’d keep the .3 between the % and %s but change %s to %f.

These are a bit more subtle:

Format("%d", 1L)    // d -> il
Format("%d", 1LL)   // d -> ill
Format("%d", 1U)    // d -> u
Format("%d", 1UL)   // d -> ul
Format("%d", 1ULL)  // d -> ull

In summary, we don’t have to re-implement the formatting itself, just fix up the mishaps (when the type specifier isn’t compatible with the value).


Just run the numbers!

I decided to run some tests. I have two versions of the DJB2 hash function. It’s a crafty little piece of code.

C# version:

public static int Djb2Hash(byte[] bytes)
{
    int hash = 5381;
    for (int i = 0, length = bytes.Length; i < length; i++)
    {
        hash = (hash * 33) + bytes[i];
    }
    return hash;
}
 
public static int Djb2HashFaster(byte[] bytes)
{
    int hash = 5381;
    for (int i = 0, length = bytes.Length; i < length; i++)
    {
        int temp = hash;
        hash = ((temp << 5) + temp) + bytes[i];
    }
    return hash;
}

C++ version:

int Djb2Hash(const char* bytes)
{
    int hash = 5381;
    for (int val = *bytes; val; val = *++bytes)
    {
        hash = (hash * 33) + val;
    }
    return hash;
}
 
int Djb2HashFaster(const char* bytes)
{
    int hash = 5381;
    for (int val = *bytes; val; val = *++bytes)
    {
        const int temp = hash;
        hash = ((temp << 5) + temp) + val;
    }
    return hash;
}

I ran the tests for 10 seconds. The code ran as many times as it could manage, and I used the number of iterations to compute a throughput measurement.
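
The harness itself is nothing fancy; roughly something along these lines (a minimal sketch using std::chrono, with made-up names and output format, not the actual harness):

#include <chrono>
#include <cstdio>

int Djb2Hash(const char* bytes); // the C++ version above

void Measure(const char* input)
{
    using namespace std::chrono;
    const seconds budget(10);
    const steady_clock::time_point start = steady_clock::now();

    long long iterations = 0;
    volatile int hash = 0; // volatile so the optimizer can't drop the calls
    while (steady_clock::now() - start < budget)
    {
        hash = Djb2Hash(input);
        ++iterations;
    }

    const double elapsedUs =
        duration_cast<duration<double, std::micro> >(steady_clock::now() - start).count();
    std::printf("%08x %f ops/us\n", (unsigned)hash, iterations / elapsedUs);
}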

C#                          C++

> test
7c9e6865  5.906 1000ops/µs  7c9e6865 0.015047 1000ops/µs
7c9e6865  4.342 1000ops/µs  7c9e6865 0.014373 1000ops/µs
1.360x                      1.046909x
> The quick brown fox jumps over the lazy dog
34cc38de 45.780 1000ops/µs  34cc38de 0.014252 1000ops/µs
34cc38de 36.901 1000ops/µs  34cc38de 0.014036 1000ops/µs
1.241x                      1.015400x

Some interesting observations: optimized C# code ran faster than unoptimized C++ code, but C++ code compiled with optimizations enabled ran between 300x and 3000x faster than anything C# could muster. This test does favour native code due to the heavy array indexing overhead in managed code. It’s sad though, because you could easily invent abstractions that allow you to express a computation over a range of values in a type-safe manner.

I had a teacher once who told me that what you do with the code doesn’t matter, as long as the algorithm is good. That’s not bad advice, but it completely disregards how big a difference the choice of platform can make. A factor of 1000x isn’t a negligible gain.

I will say this though: micro-benchmarks are freaky, they just don’t converge on anything unless stressed hard enough. So when you measure this stuff, run for a longer period of time and construct a throughput measurement; use that.

LevelDB does this in its benchmark suite. Andrei Alexandrescu also talked about something similar in his GoingNative 2013 talk Writing Quick Code in C++, Quickly; you should watch it.


Game Development

I don’t know how I ended up reading through most of a blog post from early 2011 about game development, but I did, and I think reddit was somehow involved. Anyway, I thought it would be an intriguing read. What followed was a hefty comment about how absurdly out of place that blog post was/is.

It wasn’t until the next day that I discovered the post was two years old. I thought I’d publish some of my own observations on what might be different at a well-managed development studio, as opposed to your average software company, and why that is.

First, there are people who know the industry and there are those who don’t. The people I’m talking about have been around a while and are among the most respected and insightful people I’ve had the opportunity to learn from or know of. These are the people that give great presentations at conferences such as GDC and help the industry move forward as a whole.

With this in mind (and mind you, I’m not one of these people) I’d like to share some of the things that I’ve found to come into stark contrast with my existing beliefs as a more traditional software engineer.

Game code is disposable, a means to an end. You’re not going to maintain that code for that long; the platform will change, the design will change. Having great people around who can write great code is much more valuable to a game studio than having the right process or discipline. This doesn’t mean that process doesn’t exist, it’s a different beast altogether, but what would make sense in your run-of-the-mill company doesn’t necessarily translate so well when you have to ship your game on a specific date.

Console development, specifically PS3 development, is rather absurd from a performance perspective. Code that runs well on a PS3 is very different from your general-purpose software. These people know what the hardware is and they design all their work around it. You have to understand that the more you can get out of a console, the more you can spend on things that will make your game great, and that gives you a competitive edge. This is also the reason why game code will shock some people: it was not written to be perfectly readable, it was written that way because it got the job done and it was fast.

My last point is a bit harder for me to argue objectively about, but game development grew from a culture that wasn’t motivated by greed (simply making a game for the purpose of making money). These people wanted to make a game because it was part of what they believed they should do. A lot of the traditional churn that plagues software development doesn’t exist when your reason for doing what you do is so clearly spoken for. These are self-motivated, talented people, and they get to do what they love while many of us don’t. They make rational decisions about what they need to do next to move forward, and they don’t waste a lot of time in meetings. If they did, they’d go out of business, because they need to ship the next game.

If you want to know more about the process of game development, I urge you to read some of the publications available from EA DICE or Insomniac Games (they talk about this stuff because it makes a lot of sense in the context of what they do!). I can personally recommend the #GDC12 talk Ron Pieket: Developing Imperfect Software: The Movie.

That said, game development is not traditional software development; if you think it is, you will fail at it. As to whether they are ahead or not, I’m pretty sure they’ve been ahead of the rest of us from the start. You just need to know where to look.


I was wrong...

I’d like to think that, more often than not, I’m right. I’d like to think that. It doesn’t mean that I am, but I can’t help thinking that it caters to some primordial need within my human mind. The need to be right, or to be of strong conviction, is within us; the instinct to protect and defend ourselves from whatever is foreign, be it an idea or something more concrete.

This is one thing we are: resistant to change, as if to appear immutable.

I don’t believe, I know, that people can change, and people change all the time, but they don’t do it in the open. Some people spend countless hours in debate, arguing this and that, and to my recollection I’ve never been persuaded right there and then that the beliefs I held were wrong (or of no significant value).

Yet it still happens, and whether we like it or not we are affected by it, and we can only resist change to a point. Beyond that point, resistance is futile (to quote the Borg) and will only serve to make us unhappy.

The value of a good idea will always outweigh the bad, and no amount of legislation or bureaucracy can prevent a good idea from spreading. Nothing may change, but the idea is out there, and that’s a start.

So, I’m a Software Engineer and of course, I write from that point of view.

Actions speak louder than words, talk is cheap, show me the code. Unfortunately, I don’t yet know how to code human beings, but I do know code, and lately I’ve been less worried about being wrong and more focused on what moves us forward. I’d recommend anyone reading this to check out Chet Faliszek’s presentation at Eurogamer Expo 2012 (https://www.youtube.com/watch?v=tdwzvdZFxVM), because that’s when it dawned on me that my worries about being wrong had paralyzed me to the point where they were inhibiting my productivity. I still get things done; it’s just that I’m not that adamant about getting the design right the first time. I’m wrong more often, but we move at a faster pace.

To tie this together with another favourite quote, from Gabe Newell during his talk at The LBJ School of Public Affairs (https://www.youtube.com/watch?v=t8QEOBgLBQU):

By talent – which is a word I hate – I mean the ability to be productive.

I’m rationalizing a lot of my behavior around the idea that it’s better overall if I can be productive, even though there’s a larger risk of making a bad design decision somewhere. What I try to do is make sure that these things happen in isolation, so as not to have too much of an avalanche effect.

And with that I’d like to summarize with a wonderful web comic which I think also resonates very well with what I’ve been trying to say here. But before you click the next link, read the next paragraph.

Scientists (so as not to generalize the entire community) tend to simplify their problem. For good reason, but to the point where it becomes difficult to fit real-world problems to solutions that make sense in theory.

Feel free to follow the link now (http://abstrusegoose.com/406).

It’s refreshing to see someone bash on scientists for all of their conundrums. Hopefully I’m not misrepresenting the comic when I say that theory is nice but not helpful in general, and that generalizations in general aren’t helpful either. When we approach a problem there’s more value in being prepared to do whatever it takes, to be able to twist and turn to put one foot in front of the other.

And in doing so we will have to concede on some points, but what we realize is that what we thought was essential, wasn’t. We can’t know this from the start.


ParC

ParC is a recursive descent parser compiler; it’s not finished, but it’s what I’m currently working on.

I was really excited when Microsoft announced Oslo. I thought: finally! A generic tool I can use to play with text. But it never got any further than a CTP.

M was a nice language, and the whole experience created with IntelliPad was actually nice to work with; you’d play interactively with your DSL and grammar. This was great fun. Now I find myself needing tools like this again, because writing the lexer, parser and syntax tree yourself just blows. It’s a lot of grunt work.

So, as I was writing that lexer, AGAIN, I thought: maybe I can solve this problem (at least to my own satisfaction) once and for all. Thus ParC was born.

What ParC will be able to do, once finished, is take a grammar (similar to how you’d write a grammar in M) and generate the lexer, parser and syntax tree (with a visitor template) in C++. You are then free to do what you will with the syntax tree.

Most tools I know of have taken the path of mixing grammar and code, introducing what is sometimes referred to as semantic actions. ParC doesn’t do this; it simply focuses on building that syntax tree. The benefit of going down this road is that ParC is really language agnostic. While the run-time and compiler will be in C++, it’s possible to target any language as long as you have a generator for that language (and there will be some plug-in architecture for this).

I personally like the idea of thinking of the parser compiler as a function that takes input in the form of text and returns a structured data set. This is what I wish to do with ParC, and every now and then I find myself wishing I had such a tool, to quickly be able to do something meaningful with just text.
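
ParC isn’t finished, so there is no real API to show yet, but to illustrate the idea of a generated syntax tree plus a visitor (with no semantic actions in the grammar), consuming the output might look something along these lines. Every name here is hypothetical:

// Hypothetical sketch only -- none of this is ParC's actual output.
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct Number;
struct Call;

struct Visitor
{
    virtual ~Visitor() {}
    virtual void Visit(const Number& node) = 0;
    virtual void Visit(const Call& node) = 0;
};

struct Node
{
    virtual ~Node() {}
    virtual void Accept(Visitor& v) const = 0;
};

struct Number : Node
{
    double m_Value;
    void Accept(Visitor& v) const { v.Visit(*this); }
};

struct Call : Node
{
    std::string        m_Name;
    std::vector<Node*> m_Args;
    void Accept(Visitor& v) const { v.Visit(*this); }
};

// One possible consumer: pretty-print the tree back out.
struct Printer : Visitor
{
    void Visit(const Number& node) { std::printf("%g", node.m_Value); }
    void Visit(const Call& node)
    {
        std::printf("%s(", node.m_Name.c_str());
        for (std::size_t i = 0; i < node.m_Args.size(); ++i)
        {
            if (i) std::printf(", ");
            node.m_Args[i]->Accept(*this);
        }
        std::printf(")");
    }
};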


SOLID Software Engineering Principles

Back in 2009, on StackOverflow podcast #39, Joel Spolsky started openly bashing the SOLID principles and TDD (Test Driven Development), claiming they were put together by someone who doesn’t really write a lot of code.

This made me a bit upset.

I listened through the Hanselminutes podcast in question, and what I heard was quite different from what I think Joel heard, because Joel went on about the adverse effects of writing unit tests, how the time is probably better spent doing something else, and the absurdity of turning everything into an interface.

I think this story is really interesting: Joel actually invited Robert Martin (Uncle Bob) onto the show after Uncle Bob published an open letter to Joel and Jeff, and that became the topic of podcast #41. Joel starts that podcast by apologizing.

Experienced software engineers realize when they are wrong.

But it’s important to point out that Joel is also right, in that he describes a situation where a programmer is at 99% code coverage and is then presented with three choices:

  1. Get to 100% code coverage
  2. Deliver a key feature to a key customer
  3. Dramatically improve usability

Only one of the above choices is wrong.

At this point I’m 50 minutes into the follow-up podcast and I’m quite confident that they won’t actually be addressing the core issues. Joel has been going on and on about GUI testing and how unfruitful that is, and I’m tired of examples in which he appears to be trying to test layout quirks rather than stuff that matters.

It feels as if the whole discussion was geared towards using SOLID and TDD where they’re not appropriate, which is unfortunate, because they effectively fail to highlight the importance of SOLID software engineering principles.

The Pragmatic Programmer is a great read, but I think the key takeaway for any such book, is that as a programmer you should be pragmatic.

To answer the question of why SOLID and TDD, you only need to take a look at the real world and make a simple observation.

Things change.

What SOLID is all about is understanding the underpinnings of solid software engineering: why modular code responds better to change, and how you avoid friction while adhering to principled object-oriented design practices. It’s a lot of fancy wording to say that you can design code that isn’t tightly coupled and still deliver on schedule, all while quickly adapting to change in the real world.

I did say object-oriented, but actually SOLID has very little to do with object-oriented programming; it fits well into any paradigm, because it’s good advice, and you’re not a SOLID zombie just because you care about system-wide interactions.

Now, I tend to forget that people’s minds all work differently and don’t typically share the same history. I often argue my point without realizing that people might not understand where I’m coming from. So I’d like to drive this home with why I value the SOLID principles.

Why SOLID design matters

This all started with me trying to get into the habit of writing tests. I slowly realized that I couldn’t test my code effectively because the implementation was in most cases tightly coupled. There was simply no way for me to pull out the important parts, isolate them in a controlled manner, and then run meaningful tests. At the time I couldn’t justify why I had to put up with the additional work.

To enable the above scenario I had to learn new principles, and through practice and hardship I eventually came to understand the SOLID principles. With that came a new understanding, and in hindsight I was already using tests to test my code, just not in a very structured way.

The point of doing SOLID is that you’re gearing up to be testable by design. But to accept SOLID you don’t have to accept the entirety of TDD; you simply have to acknowledge that there’s value in being able to do automated testing, something you cannot do if you don’t design for it.
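
A tiny example of what testable by design means in practice (my own illustration, not from any particular codebase): depend on an interface you can substitute, rather than reaching for the real thing directly.

// Illustration only: the clock is injected, so a test can control time.
struct IClock
{
    virtual ~IClock() {}
    virtual long long NowMs() const = 0;
};

class SessionTimer
{
public:
    explicit SessionTimer(const IClock& clock)
        : m_Clock(clock), m_Start(clock.NowMs()) {}

    long long ElapsedMs() const { return m_Clock.NowMs() - m_Start; }

private:
    const IClock& m_Clock;
    long long     m_Start;
};

// In a test, time is driven by hand instead of waiting on a wall clock.
struct FakeClock : IClock
{
    FakeClock() : m_Now(0) {}
    long long NowMs() const { return m_Now; }
    long long m_Now;
};

The production code passes in a clock backed by the OS; the test passes in the fake and asserts on ElapsedMs without ever sleeping.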

Testing is a way to drive SOLID designs, which is one way to tackle the constant change we have to deal with every day.

This is me

John Leidegren

Click here for a copy of my resume. There's also tonnes of information on this blog and on my StackOverflow profile if you wanna find out even more about what I do.

Also, please keep in mind; the opinions expressed here are my own, not necessarily those of my employer.