Parsing PBRT scenes

December 30th, 2010 by Bramz

I started writing this post back in June, but I guess something else came up =) LiAR has always been an on-and-off thing, and this post is no different. But let’s finish it before the end of the year.

For the last few months, I’ve been working on a parser for the PBRT scene description language. This is the file format used by the renderer of the same name from the book Physically Based Rendering: From Theory to Implementation by Matt Pharr and Greg Humphreys.

The goal is to leverage the example scenes from the book as test cases for LiAR: being able to render from the exact same files as PBRT itself, I can directly compare the outputs. Plus, as LuxRender is based on PBRT, I can add support for their extensions to the file format, to take advantage of the different exporters in existence for Blender, Maya, …

And of course, I’m doing all this in Python. The benefit is that I’ve got syntax checking for free by simply implementing the commands as Python functions and calling them directly with a number of positional and keyword arguments. More on that below …

Writing the lexer

The command-oriented structure of the PBRT file format makes it very easy to build a parser for, and doubly so in Python. The file consists of a series of statements, each one starting with a command name followed by a number of arguments, either positional or named. The following are two statements from an example in the book. The first is Rotate with four positional arguments. The second is Shape with "disk" as its positional argument and two named ones, radius and height, valued [20] and [-1].

Rotate 135 1 0 0
Shape "disk" "float radius" [20] "float height" [-1]

Killeroo rendered by LiAR with PBRT parser. Killeroo model courtesy of headus 3D tools; scene from PBRT book.

The lexer, the function that translates the scene file into a series of tokens, is probably the hardest part to write. And even that one is very simple if you make use of the undocumented Scanner class from the re module. We build a generator function _scan that reads the file line by line and feeds each line to the scanner. The scanner returns the tokens as a list of (type, value) pairs, which we yield one by one.

from re import Scanner

def _scan(stream):
  scanner = Scanner([
    (r"[a-zA-Z_]\w*", lambda s, tok: (_IDENTIFIER, tok)),
    (r"[\-+]?(\d+\.\d*|\.\d+)([eE][\-+]?[0-9]+)?",
      lambda s, tok: (_NUMBER, float(tok))),
    # more rules ...
  ])
  # enumerate yields (number, line) pairs, so unpack both
  for line_number, line in enumerate(stream, 1):
    tokens, remainder = scanner.scan(line)
    assert not remainder, "syntax error on line %d" % line_number
    for (type, value) in tokens:
      yield (type, value)

PBRT Commands as Python functions

The major trick of the parser is to implement all PBRT commands as directly callable Python functions. Following is the implementation of Rotate. The first parameter self serves the same purpose as the C++ this pointer, and is independent of the PBRT syntax. It is only necessary because I’ve implemented the commands as methods of the PbrtScene class. The next four parameters correspond one by one to the arguments in the scene description.

def Rotate(self, angle, x, y, z):
  transform = liar.Transformation3D.rotation(
    (x, y, z), math.radians(angle))
  self.__cur_transform = transform.concatenate(
    self.__cur_transform)

DOF dragons rendered by LiAR with PBRT parser. Dragon model courtesy of Stanford University Computer Graphics Laboratory; scene from PBRT book.

The Shape command is a bit harder, as the first parameter is the shape type, which determines what parameters should follow. I tackled this by playing the same trick again. For each shape type, I provide a function _shape_<name> to be called. Shape eats the positional argument name, and the Python interpreter collects all remaining keyword arguments in **kwargs. Shape does not have any positional arguments other than name, so there’s no *args. Eventually Shape uses name to look up the appropriate shape function, and calls it with the keyword arguments received. In the example, _shape_disk will be called and the content of **kwargs will automatically be mapped onto the parameters height and radius.

def Shape(self, name, **kwargs):
  shape = getattr(self, "_shape_" + name)(**kwargs)
  shape.shader = self.__material
  self.__add_shape(shape)

def _shape_disk(self, height=0, radius=1):
  return liar.scenery.Disk(
    (0, 0, height), (0, 0, 1), radius)
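The same dispatch can be tried without LiAR installed. The following is a minimal sketch under a few assumptions: SceneSketch is a hypothetical stand-in for PbrtScene, _shape_sphere is an invented second shape, and shapes are plain tuples instead of liar.scenery objects.

```python
class SceneSketch(object):
    """Hypothetical stand-in for PbrtScene; shapes are tuples, not liar objects."""

    def __init__(self):
        self.shapes = []

    def Shape(self, name, **kwargs):
        # Look up _shape_<name>; an unknown shape type raises AttributeError.
        factory = getattr(self, "_shape_" + name)
        # Python maps the keyword arguments onto the factory's parameters.
        self.shapes.append(factory(**kwargs))

    def _shape_disk(self, height=0, radius=1):
        return ("disk", height, radius)

    def _shape_sphere(self, radius=1):
        return ("sphere", radius)

scene = SceneSketch()
scene.Shape("disk", radius=20, height=-1)  # keyword order does not matter
scene.Shape("sphere")                      # falls back to the default radius
```

Adding support for a new shape type then only takes one more _shape_<name> method; Shape itself never changes.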

Putting it all together

All that is left to do is to parse the tokens generated by the lexer and call the command functions.

Here’s a simplified version of the main loop; I’ve left out the Include statement. Each time we encounter a new identifier, we know we’re at the start of a new statement. We execute the previous one, and we store the current identifier for later use. If we find a parameter name, we know the value following will be a keyword argument, so we store the keyword. In any other case, we have a parameter value. If it’s the start of a list, we first eat tokens to complete the list. If we have a stored keyword, the argument is stored in kwargs and the keyword is reset. Otherwise, we append it to the positional arguments args.

keyword, identifier, args, kwargs = None, None, [], {}
tokens = _scan(stream)
for (type, value) in tokens:
  if type == _IDENTIFIER:
    # start of new statement, execute last one
    if identifier:
      getattr(self, identifier)(*args, **kwargs)
    identifier = value
    args = []
    kwargs = {}
  elif type == _PARAMETER:
    keyword = value
  else:
    if type == _START_LIST:
      # eat tokens from the same generator until the list ends
      arg = []
      for (type, value) in tokens:
        if type == _END_LIST: break
        arg.append(value)
    else:
      arg = value
    if keyword:
      kwargs[keyword] = arg
      keyword = None
    else:
      args.append(arg)
# execute the final statement
if identifier:
  getattr(self, identifier)(*args, **kwargs)
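The list branch relies on a detail that is easy to miss: the inner for loop iterates over the same generator as the outer one, so every token it eats is gone by the time the outer loop resumes. Here is a small sketch of just that mechanism, using bare strings and numbers as tokens; group_lists is a hypothetical helper, not part of the parser.

```python
def group_lists(tokens):
    """Collapse '[' ... ']' runs into Python lists (hypothetical helper)."""
    tokens = iter(tokens)  # one shared iterator for both loops
    for token in tokens:
        if token == "[":
            items = []
            # The inner loop consumes from the same iterator as the outer one.
            for token in tokens:
                if token == "]":
                    break
                items.append(token)
            yield items
        else:
            yield token

grouped = list(group_lists(["Shape", "disk", "[", 20, "]", "[", -1, 0, "]", "end"]))
```

With a plain list instead of an iterator, the outer loop would visit the list items a second time, which is why the sketch calls iter() first. A generator such as _scan already behaves this way.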

To execute a statement (the getattr lines in the loop), we look up the corresponding method using getattr. If it doesn’t exist, an AttributeError is raised. Next, we call the method, passing the positional and keyword arguments as *args and **kwargs. The upshot of all this is that we get syntax checking for free. If we call Rotate with the wrong number of arguments, the Python interpreter will complain with a TypeError:

TypeError: Rotate() takes exactly 4 arguments (5 given)

Default parameter values are handled automatically too. If kwargs doesn’t have an entry for a parameter, the default value is used instead. And if kwargs contains an unknown parameter name, you get a TypeError:

TypeError: _shape_disk() got an unexpected keyword argument 'z'
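The three failure modes can be checked in isolation. This is a sketch with assumed names: SceneSketch stands in for PbrtScene and try_call is a hypothetical wrapper that returns the exception instead of letting it propagate. The exact wording of the error messages differs between Python versions, so the sketch only looks at the exception types.

```python
class SceneSketch(object):
    """Hypothetical stand-in for PbrtScene with two of the post's methods."""

    def Rotate(self, angle, x, y, z):
        return ("Rotate", angle, x, y, z)

    def _shape_disk(self, height=0, radius=1):
        return ("disk", height, radius)

def try_call(target, name, *args, **kwargs):
    """Return the call's result, or the error a malformed statement triggers."""
    try:
        return getattr(target, name)(*args, **kwargs)
    except (AttributeError, TypeError) as error:
        return error

scene = SceneSketch()
ok = try_call(scene, "Rotate", 135, 1, 0, 0)
too_many = try_call(scene, "Rotate", 135, 1, 0, 0, 0)  # one argument too many
unknown_kw = try_call(scene, "_shape_disk", z=3)       # no parameter named z
unknown_cmd = try_call(scene, "Rotat", 135, 1, 0, 0)   # misspelled command
defaults = try_call(scene, "_shape_disk", radius=20)   # height keeps its default
```

One caveat of catching the exceptions this broadly: a TypeError or AttributeError raised inside a command's body would be reported as if the statement itself were malformed.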

Complex ecosystem rendered by LiAR with PBRT parser. Model from Deussen et al. Realistic modeling and rendering of plant ecosystems; scene from PBRT book.

Voilà, that about sums it up. It gets a little more complex than this, but not by much. Large parts of the PBRT scene description (version 1) are already implemented, but there’s lots left to be done. And I haven’t even mentioned version 2 of the scene description or the LuxRender extensions. But as said in the introduction, this is an on-and-off thing, and features are implemented on an as-needed basis.

PS: many thanks to Matt Pharr and Greg Humphreys for writing such a wonderful book!

One Response to “Parsing PBRT scenes”

  1. k Says:

    Hi

    Thanks for the sum up , this is great information for those who want to work on format converters.

    thanks