bramz' diary : Parsing PBRT scenes
December 30th, 2010 by Bramz
I started writing this post back in June, but I guess something else came up =) LiAR has always been an on-and-off thing, and this is no different. But let’s finish this one before the end of the year.
For the last few months, I’ve been working on a parser for the PBRT scene description language. This is the file format used by the renderer of the same name from the book Physically Based Rendering: From Theory to Implementation by Matt Pharr and Greg Humphreys.
The goal is to leverage the example scenes from the book as test cases for LiAR: being able to render from the exact same files as PBRT itself, I can directly compare the outputs. Plus, as LuxRender is based on PBRT, I can add support for their extensions to the file format, to take advantage of the different exporters in existence for Blender, Maya, …
And of course, I’m doing all this in Python. The benefit is that I’ve got syntax checking for free by simply implementing the commands as Python functions and calling them directly with a number of positional and keyword arguments. More on that below …
Writing the lexer
The command-oriented structure of the PBRT file format makes it very easy to build a parser for, and doubly so in Python. The file consists of a series of statements, each one starting with a command name followed by a number of arguments, either positional or named. Following are two statements from an example in the book. The first is Rotate with four positional arguments. The second is Shape with "disk" as positional argument and two named ones, radius and height, valued [20] and [-1].
    Rotate 135 1 0 0
    Shape "disk" "float radius" [20] "float height" [-1]
Killeroo rendered by LiAR with PBRT parser. Killeroo model courtesy of headus 3D tools; scene from PBRT book.
The lexer, the function that translates the scene file into a series of tokens, is probably the hardest part to write. And even that one is very simple if you make use of the undocumented Scanner class from the re module. We build a generator function _scan that reads the file line by line and feeds each line to the scanner. The scanner returns the tokens as a list of (type, value) pairs, which we yield one by one.
    from re import Scanner

    def _scan(stream):
        scanner = Scanner([
            (r"[a-zA-Z_]\w*", lambda s, tok: (_IDENTIFIER, tok)),
            (r"[\-+]?(\d+\.\d*|\.\d+)([eE][\-+]?[0-9]+)?",
                lambda s, tok: (_NUMBER, float(tok))),
            # more rules ...
        ])
        # iterate over the lines themselves; enumerate() would yield
        # (index, line) pairs, which scanner.scan() cannot handle
        for line in stream:
            tokens, remainder = scanner.scan(line)
            assert not remainder, "syntax error"
            for (type, value) in tokens:
                yield (type, value)
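To see the scanner at work on the two statements from above, here is a minimal, self-contained sketch. The token tags, the extra rules for strings, lists, comments and whitespace, and the way a quoted "type name" pair is recognised as a parameter are all assumptions made for this illustration; they are not necessarily the rules LiAR uses. The number pattern is also a variant that accepts plain integers such as 135, and it uses non-capturing groups to stay clear of the group bookkeeping inside Scanner.

```python
from re import Scanner

# Token tags: hypothetical names chosen for this sketch.
IDENTIFIER, PARAMETER, NUMBER, STRING, START_LIST, END_LIST = (
    "identifier", "parameter", "number", "string", "[", "]")

scanner = Scanner([
    (r"[a-zA-Z_]\w*", lambda s, tok: (IDENTIFIER, tok)),
    # accepts 135, 1.5, .5 and 1e-3; non-capturing groups only
    (r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?",
        lambda s, tok: (NUMBER, float(tok))),
    # assumption: a quoted "type name" pair is a parameter; keep the name only
    (r'"\w+\s+\w+"', lambda s, tok: (PARAMETER, tok[1:-1].split()[1])),
    (r'"[^"]*"', lambda s, tok: (STRING, tok[1:-1])),
    (r"\[", lambda s, tok: (START_LIST, tok)),
    (r"\]", lambda s, tok: (END_LIST, tok)),
    (r"#.*", None),  # an action of None makes the scanner skip the match
    (r"\s+", None),
])

def scan(stream):
    """Yield (type, value) pairs for every line of the stream."""
    for line_number, line in enumerate(stream, 1):
        tokens, remainder = scanner.scan(line)
        if remainder:
            raise SyntaxError(
                "line %d: cannot tokenize %r" % (line_number, remainder))
        for token in tokens:
            yield token

scene = [
    'Rotate 135 1 0 0\n',
    'Shape "disk" "float radius" [20] "float height" [-1]\n',
]
tokens = list(scan(scene))
```

Any iterable of lines will do as the stream, so an open file works just as well as the list used here. The rules are tried in order, which is why the parameter rule has to come before the more general string rule.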
PBRT Commands as Python functions
The major trick of the parser is to implement all PBRT commands as directly callable Python functions. Following is the implementation of Rotate. The first parameter self serves the same function as the C++ this pointer, and is independent of the PBRT syntax. It is only necessary because I’ve implemented the commands as methods of the PbrtScene class. The next four parameters correspond one by one to the arguments in the scene description.
    def Rotate(self, angle, x, y, z):
        transform = liar.Transformation3D.rotation(
            (x, y, z), math.radians(angle))
        self.__cur_transform = transform.concatenate(
            self.__cur_transform)
DOF dragons rendered by LiAR with PBRT parser. Dragon model courtesy of Stanford University Computer Graphics Laboratory; scene from PBRT book.
The Shape command is a bit harder, as the first parameter is the shape type, which determines what parameters should follow. I tackled this by playing the same trick again. For each shape type, I provide a function _shape_<name> to be called. Shape eats the positional argument name, and the Python interpreter collects all remaining keyword arguments in **kwargs. Shape does not have any positional arguments other than name, so there’s no *args. Eventually, Shape uses name to look up the appropriate shape function and calls it with the keyword arguments received. In the example, _shape_disk will be called and the content of **kwargs will automatically be mapped onto the parameters height and radius.
    def Shape(self, name, **kwargs):
        shape = getattr(self, "_shape_" + name)(**kwargs)
        shape.shader = self.__material
        self.__add_shape(shape)

    def _shape_disk(self, height=0, radius=1):
        return liar.scenery.Disk(
            (0, 0, height), (0, 0, 1), radius)
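The dispatch can be tried out without LiAR installed. The sketch below swaps liar.scenery.Disk for a small stand-in class and keeps the shapes in a plain list; Disk, MiniScene and the shapes attribute are hypothetical names that exist only for this illustration.

```python
class Disk(object):
    """Stand-in for liar.scenery.Disk, so the sketch runs without LiAR."""
    def __init__(self, center, normal, radius):
        self.center = center
        self.normal = normal
        self.radius = radius

class MiniScene(object):
    def __init__(self):
        self.shapes = []

    def Shape(self, name, **kwargs):
        # "disk" resolves to self._shape_disk; an unknown shape type
        # makes getattr raise AttributeError
        shape = getattr(self, "_shape_" + name)(**kwargs)
        self.shapes.append(shape)

    def _shape_disk(self, height=0, radius=1):
        return Disk((0, 0, height), (0, 0, 1), radius)

scene = MiniScene()
scene.Shape("disk", radius=20, height=-1)
scene.Shape("disk")  # both parameters fall back to their defaults
```

Adding support for another shape type then comes down to adding one more _shape_<name> method; Shape itself does not need to change.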
Putting it all together
All that is left to do is to parse the tokens generated by the lexer and call the command functions.
Here’s a simplified version of the main loop; I’ve left out the Include statement. Each time we encounter a new identifier, we know we’re at the start of a new statement. We execute the previous one and store the current identifier for later use. If we find a parameter name, we know the value that follows will be a keyword argument, so we store the keyword. In any other case, we have a parameter value. If it’s the start of a list, we first eat tokens until the list is complete. If we have a stored keyword, the argument is stored in kwargs and the keyword is reset. Otherwise, we append it to the positional arguments args.
    keyword, identifier, args, kwargs = None, None, [], {}
    tokens = _scan(stream)
    for (type, value) in tokens:
        if type == _IDENTIFIER:
            # start of new statement, execute last one
            if identifier:
                getattr(self, identifier)(*args, **kwargs)
            identifier = value
            args = []
            kwargs = {}
        elif type == _PARAMETER:
            keyword = value
        else:
            if type == _START_LIST:
                # the inner loop pulls from the same generator,
                # so it consumes the list items
                arg = []
                for (type, value) in tokens:
                    if type == _END_LIST:
                        break
                    arg.append(value)
            else:
                arg = value
            if keyword:
                kwargs[keyword] = arg
                keyword = None
            else:
                args.append(arg)
    # execute the last statement of the file
    if identifier:
        getattr(self, identifier)(*args, **kwargs)
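As a quick check of the loop, here is a standalone sketch that runs it over a hand-written token list for the two example statements. The function execute, the Recorder class and the token constants are hypothetical names for this illustration; Recorder simply notes which command was called with which arguments instead of building a scene.

```python
IDENTIFIER, PARAMETER, NUMBER, STRING, START_LIST, END_LIST = range(6)

def execute(tokens, target):
    """Group tokens into statements and call the matching methods on target."""
    tokens = iter(tokens)  # one shared iterator for the outer and inner loop
    identifier, keyword, args, kwargs = None, None, [], {}
    for type_, value in tokens:
        if type_ == IDENTIFIER:
            if identifier:
                getattr(target, identifier)(*args, **kwargs)
            identifier, args, kwargs = value, [], {}
        elif type_ == PARAMETER:
            keyword = value
        else:
            if type_ == START_LIST:
                arg = []
                for type_, value in tokens:  # eats tokens up to the closing ]
                    if type_ == END_LIST:
                        break
                    arg.append(value)
            else:
                arg = value
            if keyword:
                kwargs[keyword] = arg
                keyword = None
            else:
                args.append(arg)
    if identifier:
        getattr(target, identifier)(*args, **kwargs)

class Recorder(object):
    """Records the calls it receives instead of building a scene."""
    def __init__(self):
        self.calls = []

    def Rotate(self, angle, x, y, z):
        self.calls.append(("Rotate", angle, x, y, z))

    def Shape(self, name, **kwargs):
        self.calls.append(("Shape", name, kwargs))

tokens = [
    (IDENTIFIER, "Rotate"),
    (NUMBER, 135.0), (NUMBER, 1.0), (NUMBER, 0.0), (NUMBER, 0.0),
    (IDENTIFIER, "Shape"), (STRING, "disk"),
    (PARAMETER, "radius"), (START_LIST, "["), (NUMBER, 20.0), (END_LIST, "]"),
    (PARAMETER, "height"), (START_LIST, "["), (NUMBER, -1.0), (END_LIST, "]"),
]
recorder = Recorder()
execute(tokens, recorder)
```

Note that in this sketch a bracketed value such as [20] reaches the command as the one-element list [20.0]; unwrapping single values is left out here.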
To execute the statement (the getattr(self, identifier)(*args, **kwargs) calls in the loop above), we look up the corresponding method using getattr. If it doesn’t exist, an AttributeError is raised. Next, we call the method, passing the positional and keyword arguments as *args and **kwargs. The upshot of all this is that we get syntax checking for free. If we call Rotate with the wrong number of arguments, the Python interpreter will complain with a TypeError:
TypeError: Rotate() takes exactly 4 arguments (5 given)
Default parameter values are handled automatically too. If kwargs doesn’t have an entry for a parameter, the default value is used instead. And if kwargs contains an unknown parameter name, you get a TypeError:
TypeError: _shape_disk() got an unexpected keyword argument 'z'
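If you would rather report such mistakes as scene errors than as raw tracebacks, the exceptions can be caught at the call site. The following is only a sketch of one way to do that: check_statement and MiniScene are hypothetical names, and the exact wording of the TypeError text differs between Python versions, so the sketch passes the message along instead of matching on it.

```python
class MiniScene(object):
    """Two commands with the same signatures as in the post, minus LiAR."""
    def Rotate(self, angle, x, y, z):
        return ("rotate", angle, x, y, z)

    def _shape_disk(self, height=0, radius=1):
        return ("disk", height, radius)

def check_statement(scene, identifier, args, kwargs):
    """Call one command; return None on success or an error message."""
    method = getattr(scene, identifier, None)
    if method is None:
        return "unknown command %r" % identifier
    try:
        method(*args, **kwargs)
    except TypeError as error:
        # caveat: this also catches a TypeError raised inside the command
        return "bad arguments for %s: %s" % (identifier, error)
    return None

scene = MiniScene()
ok = check_statement(scene, "Rotate", [135, 1, 0, 0], {})
too_many = check_statement(scene, "Rotate", [135, 1, 0, 0, 0], {})
unknown_name = check_statement(scene, "_shape_disk", [], {"z": 1})
no_such_command = check_statement(scene, "Spin", [90], {})
```

Looking the method up separately from calling it keeps a misspelled command name apart from an AttributeError raised somewhere inside a command.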
Complex ecosystem rendered by LiAR with PBRT parser. Model from Deussen et al. Realistic modeling and rendering of plant ecosystems; scene from PBRT book.
Voilà, that about sums it up. It gets a little more complex than this, but not by much. Large parts of the PBRT scene description (version 1) are already implemented, but there’s lots left to do. And I haven’t even mentioned version 2 of the scene description or the LuxRender extensions. But as said in the introduction, this is an on-and-off thing, and features are implemented on an as-needed basis.
PS: many thanks to Matt Pharr and Greg Humphreys for writing such a wonderful book!