Notes on the C Preprocessor: Token Pasting

A feature of the preprocessor that is little known outside hardcore C programming circles is token pasting (also called concatenation).  As the name suggests, token pasting lets a macro definition take two tokens and join them into a single, brand new token.  Developers can and do use token pasting in all sorts of clever ways.  The GNU C preprocessor manual gives an example of using token pasting to generate code from boilerplate:

#define COMMAND(NAME) { #NAME, NAME ## _command }
struct command commands[] =
{
  COMMAND (quit),  // expands to { "quit", quit_command }
  COMMAND (help),  // expands to { "help", help_command }
  /* … */
};

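The COMMAND macro uses the token-pasting operator ## to build identifiers with the _command suffix, while the # operator turns the same argument into a string literal, so each command's name only has to be written once.  This eliminates repetitive code, and the same pattern is used to generate all sorts of things, even entire function definitions.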
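To make the pattern concrete, here is a minimal self-contained sketch built around the manual's COMMAND macro.  The layout of struct command, the DEFINE_COMMAND macro, and the find_command helper are hypothetical stand-ins (the manual leaves them unspecified).  DEFINE_COMMAND shows the "entire function definitions" case: a single ## produces the name of a whole generated function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical layout for struct command; the manual does not define it. */
struct command {
    const char *name;        /* filled in by #NAME (stringification) */
    void (*function)(void);  /* filled in by NAME ## _command (pasting) */
};

/* Hypothetical helper: pasting builds the name of a whole function
   definition, so DEFINE_COMMAND(quit) defines quit_command(). */
#define DEFINE_COMMAND(NAME) \
    static void NAME ## _command(void) { puts("running " #NAME); }

DEFINE_COMMAND(quit)
DEFINE_COMMAND(help)

/* The macro from the manual. */
#define COMMAND(NAME) { #NAME, NAME ## _command }

static struct command commands[] = {
    COMMAND(quit),  /* { "quit", quit_command } */
    COMMAND(help),  /* { "help", help_command } */
};

/* Hypothetical lookup: returns the matching entry, or NULL if none. */
struct command *find_command(const char *name)
{
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(commands[i].name, name) == 0)
            return &commands[i];
    return NULL;
}
```

Calling find_command("help")->function() would print "running help".  Note that no identifier spelled help_command appears anywhere in this source; it only exists after preprocessing.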

Token pasting can be a serious pain for program analyses, code browsers, and really any tool that depends on a program's syntax tree, because it changes the stream of tokens that the parser sees.  A tool can opt to resolve token pasting first, but this requires expanding at least some macros, which yields code that may look dramatically different from the source the developer is actually writing.  Even worse, the resulting tokens can depend on configuration options when the macros involved are defined differently under different #ifdef branches.  Let’s look at a few examples to see why this is so hard.
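As a small illustration of the configuration problem, the following sketch pastes a token whose spelling is chosen by an #ifdef.  USE_FAST, SUFFIX, PASTE, SORT, and the sort_* functions are all hypothetical names made up for this example.  It also shows the common two-level PASTE/PASTE_ idiom: operands of ## are not macro-expanded, so the extra level lets SUFFIX expand before it is pasted.

```c
/* USE_FAST is a hypothetical configuration option (e.g. -DUSE_FAST). */
#ifdef USE_FAST
#define SUFFIX fast
#else
#define SUFFIX slow
#endif

/* Writing sort ## _ ## SUFFIX directly would yield sort_SUFFIX, because
   operands of ## are not expanded.  PASTE expands its arguments first,
   then PASTE_ glues them together with an underscore in between. */
#define PASTE_(a, b) a ## _ ## b
#define PASTE(a, b) PASTE_(a, b)

/* Hypothetical implementations; a given build calls only one of them. */
const char *sort_fast(void) { return "fast"; }
const char *sort_slow(void) { return "slow"; }

/* SORT names sort_fast or sort_slow depending on the configuration. */
#define SORT PASTE(sort, SUFFIX)

/* A tool reading this unpreprocessed source cannot tell which function
   is called here without first picking a configuration. */
const char *chosen_sort(void) { return SORT(); }
```

Built without -DUSE_FAST, chosen_sort() calls sort_slow(); built with it, the very same source line calls sort_fast().  An analysis that resolves the pasting up front has to commit to one of these two programs.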

