What is procedural generation?
Computer games have a lot of data associated with them; level designs, monster characteristics, items, all sorts of stuff usually referred to as content. Where does this content come from? Usually the game designers create it, they make it up and polish it to ensure that it meets requirements.
Still, if all your content is “hand-made”, that means someone has to make every bit of it. For some games it’s more convenient to have the content automatically generated; generated by a procedure. The process used to create the content is therefore called procedural generation.
Some people will refer to “random” generation, but there are usually strong constraints on the content produced.
The Diablo series of games uses a lot of procedural generation. In Diablo 2, there are a certain number of levels set in the sewers under a desert town. Those levels all use “desert sewer” graphics, but the layout is procedurally generation for each game. The tunnel twists and turns are generated, then populated with monsters. Enemies and treasure chests drop generated treasure hoards featuring magical items with randomly chosen properties. And there are many more examples in the computer game world.
(I keep talking about games, but I’m sure procedural generation has applications in other fields. I just can’t think of any off the top of my head.)
How To
I’m going to cover some basic techniques for procedurally generating content. They won’t solve every problem, but they’ll form a good framework for learning more.
The problem
A while back I wrote a program to generate randomized “demon” descriptions. Inspired by Dungeons & Dragons demons, it produces descriptions of creatures that appear to be assembled from various creatures:
Ramalfal is a female demoness, of a quadreped shape, of generally a gray color. She has a woman’s head mounted on the body of a scorpion.
For this post I’ll go through making a (simpler) version of this program. I’ll provide some code in PHP.
Pseudorandom seeds
It’s nearly impossible to create true randomness in computer programs. Fortunately, nowadays it’s easy to create randomness that’s “good enough”. Most languages provide a facility for creating pseudorandom numbers. Give the number generator a seed, and it will return a string of numbers. More precisely, there’s usually a “seed” function and a “rand” function; after you seed the generator, you call the rand function repeatedly, and you receive a number each time.
Pseudorandom number generators have two useful properties (assuming they’re working correctly, of course):
- Each seed generates a different string of numbers.
- A specific seed will generate the same string of numbers each time it is used.
The point here is that a seed will generate a string of numbers—and thus it will generate the entire passel of content you’re creating. In a sense, the content is “compressed” into one seed (which is itself just a number). The end users can trade seeds that they find interesting. You’ll notice that the original demon generator uses user-input seeds.
PHP provides the mt_rand() pseudorandom number generator. You can seed it with the mt_srand() function, although the generator is automatically seeded when mt_rand() is called for the first time. (One trick programmers use with pseudorandom number generators is to seed them with the current time…if you find yourself in need of a seed.)
Modular arithmetic
Modular arithmetic is a simple concept, but powerful and useful in certain areas of programming (like this one). It all has to do with division. Imagine you’re dividing two integers; the divisor goes into the dividend a certain number of times, and then often a bit is left over. This amount, the remainder, is the whole point of modular arithmetic.
7 divided by 3 is 2 with a remainder of 1; thus, 7 mod 3 is 1.
321 mod 10 is 1.
It may be easiest to think of it as the face of a clock, disregarding AM or PM. If the clock says 12:00, and then we wait for 16 hours, what does the clock say? It says 4, because 16 mod 12 is 4. Of course, on a clock we have 12, and in modular arithmetic we would use 0. For modulus 12, we use the numbers 0 to 11, since those are the only possible remainders.
This is important because a lot of times the numbers returned by the random number generator are rather large. Let’s say we want to generate the roll of a standard six-sided die, and mt_rand() returns 8714182. What now? Well, if we take that number modulus 6, we will get a number from 0 to 5. Then we can add 1 to get a number from 1 to 6. (Of course, mt_rand() will take parameters specifying the minimum and maximum number to return.)
Modular arithmetic can be used in a pinch to let you do without random number generators. If we have an array of five entries, they’re numbered from 0 to 4. If we calculate our seed mod 5, we’ll get an index into the array. Most programming languages use % for modular arithmetic, like so:
$chosen = $array[$seed % 5];
In fact, in PHP we don’t even have to worry about the size of the array, we can just calculate it:
$chosen = $array[$seed % count($array)];
Relative Primality
Relative primality is another simple concept that can be quite useful in this area. However, if you’re really scared off by math, this section is optional.
Two numbers are relatively prime if they have no divisors in common.
- 15 = 3 * 5. 12 = 3 * 2 * 2. 15 and 12 are not relatively prime, because they’re both divisible by 3.
- 15 = 3 * 5. 14 = 2 * 7. 15 and 14 are relatively prime.
Let’s say that instead of using a simple modulus you want to mix things up a bit. One way to do this is to add a number to your seed, but that just adds a bit of an offset. A better way is to multiply your seed by a number. (Combining the methods works well too.) The one thing to remember is that if you multiplay your seed by a number, the multiplier and the modulus must be relatively prime.
If they aren’t relatively prime, you’ll get repeats in your pattern. Let’s say we want to ultimately take a modulus of 6; let’s look at every possible seed in turn; 0, 1, 2 and so on. They’ll make a sequence that looks like this:
0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2…
Now let’s multiply our seeds by 4. 0, 1, 2 becomes 0, 4, 8, 12, 16, 20, 24…and we get this:
0, 4, 2, 0, 4, 2, 0…
Because 6 and 4 are not relatively prime, we get repeats in the pattern. Some of our elements aren’t even being used! Let’s try multiplying by 5 instead. 0, 5, 10, 15, 20, 25, 30 gives us:
0, 5, 4, 3, 2, 1, 0…
That pattern might not seem too odd, but notice that every number is represented. That’s what’s really important.
Rerolling
At this point we can make a little program to illustrate our points. Let’s think of demons in the form of the head of some creature placed on the body of some other creature.
$heads = array("a man", "a woman", "a bird", "a dragon", "an insect");
$bodies = array("a man", "a woman", "a bird", "a dragon", "an insect", "an octopus");
$head = $heads[mt_rand() % count($heads)];
$body = $bodies[mt_rand() % count($bodies)];
$desc = "the head of $head on the body of $body";
If we run this program a few times, we’ll get output like the following:
the head of the man on the body of a dragon
the head of a bird on the body of a woman
the head of an insect on the body of an insect
But wait—that last one seems a bit boring. We’re trying to make really bizarre creatures here, so how do we fix that? One way would be to make sure none of the entries in the heads array match the entries in the bodies array, but that solution has problems.
The best solution is to choose a head, then choose a body; then if the body is unacceptable, choose a different body. (Or, just keep choosing bodies until we find one that works.)
$heads = array("a man", "a woman", "a bird", "a dragon", "an insect");
$bodies = array("a man", "a woman", "a bird", "a dragon", "an insect", "an octopus");
$head = $heads[mt_rand() % count($heads)];
do
{
$body = $bodies[mt_rand() % count($bodies)];
}
while ($body == $head);
$desc = "the head of $head on the body of $body";
(This is one of the few places where I routinely use do/while loops, as opposed to while or for loops.)
I call this technique “rerolling” because it’s like rolling dice to look up entries on a table.
Massaging
This is a slightly more advanced technique that can be useful in some situations. The original demon generator produces names as well. These names are assembled from randomly chosen syllables. However, sometimes the combinations are a little awkward, like “Ioeththul”.
The way I dealt with this was to massage the data. The program replaces awkward letter combinations with better ones.
- “io” at the beginning becomes “yo”
- “thth” becomes “th”
By these rules, our awkward name becomes the more palatable “Yothul”.
The great thing about this technique is that it can produce text that is more intricate than parts assembled together. The drawback is figuring out appropriate text replacement rules; with the original demons program, I pored over output from it to find names that sounded too awkward.
Emergent Properties
In the same vein as massaging, randomly chosen elements could contribute to characteristics of the final product. For example, different body parts could add to the demon’s “combat rating”.
$head_combat = array("a man" => 3, "a woman" => 3, "a bird" => 0, "a dragon" => 2, "an insect" => 1);
$body_combat = array("a man" => 1, "a woman" => 1, "a bird" => 2, "a dragon" => 4, "an insect" => 3, "an octopus" => 1);
(Human heads are smarter, thus they’re more lethal in combat.
)
So, after choosing all our parts, we can calculate stats for our demon.
$combat = $head_combat[$head] + $body_combat[$body];
Culling
Culling is rerolling taken a step further. If we get to the end of an assemblage of parts and we decide we don’t like it, we can throw it out and start again. It’s possible to do this instead of rerolling, but it’s best to start over as soon as you know you want to—and only change what’s necessary.
So why would we have to start over completely? It would have to do with something that cares about the entire demon, like the combat statistic we just created. Perhaps there’s an option on our new demon program that lets users ask for only demons with a certain combat statistic.
The simplest way to implement this is to take everything we’ve done so far and make it a function. The main loop (“generate 100 demons”) calls the function to get a demon; if the combat statistic is appropriate, it adds the generated demon to the list…if not, it gets another one.
This approach may seem inefficient, but it has the advantage of being clear, straightforward and easily maintained. (Just be sure to check that you’re not trying to create invalid content, like a demon with a combat statistic of 10!) It would be difficult to write a program to make intelligent choices based on a target statistic…hardly impossible, of course, just difficult.
In Conclusion
I believe that designing an interesting procedural generation can be a challenge, but implementing it is not that hard. I hope this post has helped you agree with me!