This is going to be a long one. Let’s talk about Pokemon ROM hacking.

First, if you don’t know what a ROM hack is, basically it’s just a modified or “hacked” game. Not necessarily for the sake of cheating, though, which is what usually comes to mind when someone talks about hacked games. In fact, many ROM hacks are significantly more difficult than the original games they’re based on. ROM hacks exist for many reasons, but that’s not really the point of this post so if you really want to learn more, Bulbapedia has a great article on the subject.

Anyway, I recently started playing a few different ROM hacks based on Pokemon Emerald. They’re mostly similar in terms of added features, including things such as the Physical/Special split originally introduced in Generation IV, making all Pokemon obtainable without trading, running indoors, capping EVs at 252, etc. But many ROM hacks are more ambitious than just simple quality-of-life improvements, modifying things like Pokemon base stats, movesets, and abilities to reflect changes made in subsequent Pokemon games.

And many of them change where you can find certain Pokemon, especially in order to make it possible to fill the National Pokedex without trading. For example, in Pokemon Emerald Final, you can catch all of the starter Pokemon from the first three Generations in the Safari Zone.

So how do you know where to look to track down an elusive Pokemon? The official games have strategy guides, fansites, and people who have decompiled the game and discovered every last secret contained within. ROM hacks only have whatever documentation was provided with them. Normally, that is enough. Most ROM hacks provide very detailed information about the changes they made, so this isn’t usually an issue. But some ROM hacks, especially very old or less popular ones, might not have documentation, might have outdated information that no longer applies, or, in the case of very old hacks, might even have been deleted from wherever they were hosted.

But the information is in the game itself, if you know where to find it. So I decided to figure out how to find it.

Resources

There is no single, comprehensive guide to parsing and decoding data within ROMs, so I had to assemble information from several different sources, including:

I also ended up downloading a ROM hacking tool called Advance Map, because its configuration files contain information about where to find data in the ROM.

As for the necessary tools, you’ll need a Hex Editor at the very least. Coding knowledge helps as well, as you can simplify some of the steps with some scripts. I’ll be using Python in any code examples, but any language that can read binary files (so probably all of them) will suffice.

Getting Started

The first thing we have to do is figure out where to look for the data we need. We can find that information by examining the configuration files that come with another ROM hacking tool such as Advance Map, linked above in the Resources section. If you download the tool and browse to the Ini folder, you’ll see a list of configuration files. AdvanceMap.ini contains the information we need. The first few lines look like this:

[Allgemein]
AMVersion=1.92
BekannteRomTypen=BPR,BPG,BPE,AXP,AXV

I don’t know German, so I put “Allgemein” and “Bekannte Rom Typen” into Google Translate and got this:

[General]
AMVersion=1.92
KnownRomTypes=BPR,BPG,BPE,AXP,AXV

Okay, that’s a good start. Now let’s figure out what those ROM codes are. There are five of them, just as there are five main-series Pokemon games on the Game Boy Advance (Ruby, Sapphire, Emerald, FireRed, and LeafGreen). So which three-letter code corresponds to which game? For that, we’ll need to look at the game itself. The ROM code is stored at the address 0xac in all of the Generation III ROMs. If you open up Pokemon Emerald and read three bytes from that address, you’ll get the code for that ROM:

>>> with open('emerald.gba', 'rb') as infile:
...     _ = infile.seek(0xac)
...     infile.read(3)
b'BPE'

So, BPE corresponds to Pokemon Emerald. All ROM hacks based on Emerald should have the same three bytes at this location. Looking at dumps of the other Pokemon games, we can determine that the codes are:

| Code | Game |
|------|------|
| BPR | FireRed |
| BPG | LeafGreen |
| BPE | Emerald |
| AXP | Sapphire |
| AXV | Ruby |
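The lookup above can be wrapped in a small helper. This is only a sketch: the `GAME_CODES` dictionary and the `identify_game` name are my own, and it assumes the whole ROM has already been read into a bytes object.

```python
# Three-letter ROM codes from the table above, mapped to game names.
GAME_CODES = {
    b"BPR": "FireRed",
    b"BPG": "LeafGreen",
    b"BPE": "Emerald",
    b"AXP": "Sapphire",
    b"AXV": "Ruby",
}


def identify_game(rom: bytes) -> str:
    """Return the base game for a ROM image, or 'Unknown' if the code is not listed."""
    code = rom[0xAC:0xAC + 3]  # three bytes starting at address 0xac
    return GAME_CODES.get(code, "Unknown")
```

A ROM hack based on Emerald should still report Emerald here, as long as the hack left those three bytes alone.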

So, with that out of the way, we first need to find out where to start looking. We want to find out what wild Pokemon are available in each location in the game. If you scroll down a bit in the same INI file as before, you’ll find this:

[WildePokemon]
art=pointer
nach=0348048009E00000FFFF0000
spiele=BPR,BPG,BPE,AXP,AXV

The meaning of WildePokemon should be fairly clear, and spiele translates to “games” and has a value containing all five ROM codes, signifying that this value applies to all five games. The value we’re most interested in though is the nach value, which contains a 12-byte hexadecimal string:

03 48 04 80 09 e0 00 00 ff ff 00 00

This value is a header that we need to search for in the file. We can find the address by just searching through the data, since the Emerald ROM is only 16MiB in size.

>>> with open('emerald.gba', 'rb') as infile:
...     data = infile.read()
...
>>> data.find(b'\x03\x48\x04\x80\x09\xe0\x00\x00\xff\xff\x00\x00')
742740
>>> hex(742740)
'0xb5554'

So the header we are searching for begins at the address 0xb5554. After the 12-byte header is a pointer to the start of the encounter data:

>>> data[0xb5554+12:0xb5554+16]
b'H-U\x80'
>>> import binascii
>>> binascii.hexlify(b'H-U\x80')
b'482d5508'

GBA ROMs store data in “little-endian” format, which means the least significant byte is first. In other words, a value of 48 2d 55 08 as seen in the above code block is actually a pointer to 0x08552d48.

The GBA maps the cartridge ROM into its address space starting at 0x08000000, which means any pointers will be relative to that address instead of 0x00. The easiest way to convert a pointer to an offset in the ROM file is to subtract 0x08000000, which for a 16MiB ROM like Emerald amounts to zeroing out the most significant byte (the 08). So the actual address that we should look for is 0x00552d48. At that address, we will find the start of the wild encounter data.
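Since the same "search for a header, then read the pointer after it" step comes up repeatedly, it can be sketched as a pair of helpers. The function names and the `ROM_BASE` constant are hypothetical, and the sketch assumes the nach value only occurs once in the ROM.

```python
ROM_BASE = 0x08000000  # address where the cartridge ROM appears to the GBA


def pointer_to_offset(raw: bytes) -> int:
    """Convert a 4-byte little-endian GBA pointer into an offset in the ROM file."""
    return int.from_bytes(raw, "little") - ROM_BASE


def pointer_after_signature(rom: bytes, signature_hex: str) -> int:
    """Find a header such as a 'nach' value and decode the pointer stored right after it."""
    signature = bytes.fromhex(signature_hex)
    start = rom.find(signature)
    if start == -1:
        raise ValueError("signature not found in ROM")
    end = start + len(signature)
    return pointer_to_offset(rom[end:end + 4])
```

With the Emerald data loaded, calling `pointer_after_signature(data, "0348048009E00000FFFF0000")` should give the same 0x00552d48 worked out by hand above.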

Wild Encounter Data

Encounter Data is stored in a list of 20-byte blocks starting at 0x00552d48. This sequence of blocks is terminated by a block that starts with ff ff with the remaining 18 bytes filled with null (00) bytes.

So let’s look at the first block of Encounter Data in Pokemon Emerald:

00 10 00 00 14 08 55 08 00 00 00 00 00 00 00 00 00 00 00 00

The format of each block is as follows:

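| Byte | Data | Example |
|------|------|---------|
| 0 | Bank Number | 00 |
| 1 | Map Number | 10 |
| 2-3 | Filler/Empty | 00 00 |
| 4-7 | Grass Pointer | 14 08 55 08 |
| 8-11 | Surfing Pointer | 00 00 00 00 |
| 12-15 | Rock Smash Pointer | 00 00 00 00 |
| 16-19 | Fishing Pointer | 00 00 00 00 |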

The Bank and Map numbers are used as a unique identifier for each map in the game, and will be used later on to find more information about a location including its name. The four pointers point to encounter data for tall grass, surfing, rock smash/headbutt, and fishing encounters. Since all of them other than the grass pointer are empty in this example, we can conclude that there is no accessible water or breakable rocks on this map.
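One way to walk the list of 20-byte blocks is with Python’s struct module. This is a rough sketch, with `EncounterHeader` and `read_encounter_headers` being names I made up; the pointers are left as raw GBA addresses, so an empty pointer shows up as 0.

```python
import struct
from typing import Iterator, NamedTuple


class EncounterHeader(NamedTuple):
    bank: int
    map_number: int
    grass: int
    surfing: int
    rock_smash: int
    fishing: int


def read_encounter_headers(rom: bytes, offset: int) -> Iterator[EncounterHeader]:
    """Yield one header per 20-byte block until the ff ff terminator block."""
    while True:
        block = rom[offset:offset + 20]
        if len(block) < 20 or block[:2] == b"\xff\xff":
            return
        # "<BB2x4I": bank, map, two filler bytes, four little-endian pointers
        yield EncounterHeader(*struct.unpack("<BB2x4I", block))
        offset += 20
```

For the example block above, this gives bank 0, map number 16 (hex 10), a grass pointer of 0x08550814, and zeros for the other three.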

The grass pointer is 0x00550814. At that address we will find the following 8-byte block:

14 00 00 00 e4 07 55 08

The first byte, 14, is the encounter rate. It’s a base 16 hexadecimal number, so if we convert it to base 10, we get 20. Every time you step on a tall grass tile (or, for the other encounter types, surf into a water tile, break a rock, or use a fishing rod), the game will use this value to calculate whether an encounter should trigger. First it generates a random number from 0 to 2879 (inclusive), then compares this number to the encounter rate multiplied by 16. If the random number is lower, an encounter is triggered.

In Python, that kind of function could be defined like this:

from random import randint

def trigger_encounter(encounter_rate: int) -> bool:
    return encounter_rate * 16 > randint(0, 2879)

For this map, there is approximately an 11.111% chance of triggering an encounter when you step on a grass tile:

20 * 16 / 2880 = 0.11111...

The next three bytes are empty filler bytes. The second half of this block is another pointer, which points to the actual start of the encounter data: 0x005507e4
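Reading that 8-byte block and working out the odds could look something like this. Both function names are hypothetical, and the pointer conversion assumes the usual 0x08000000 ROM base.

```python
import struct
from typing import Tuple


def parse_encounter_info(block: bytes) -> Tuple[int, int]:
    """Split the 8-byte block into (encounter rate, file offset of the slot data)."""
    # "<B3xI": one rate byte, three filler bytes, one little-endian pointer
    rate, pointer = struct.unpack("<B3xI", block)
    return rate, pointer - 0x08000000


def encounter_probability(rate: int) -> float:
    """Chance per step, given a random roll from 0 to 2879 compared against rate * 16."""
    return rate * 16 / 2880
```

For the block `14 00 00 00 e4 07 55 08`, this gives a rate of 20 and an offset of 0x005507e4.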

At that address is a 48-byte block, with four bytes per Pokemon for a total of up to 12 Pokemon. Each of the 12 slots has a set chance to occur when an encounter is triggered. The full block of data in this example is:

02 02 22 01 02 02 1e 01 02 02 22 01 03 03 22 01
03 03 1e 01 03 03 1e 01 03 03 22 01 03 03 1e 01
02 02 20 01 02 02 20 01 03 03 20 01 03 03 20 01

Split into four-byte chunks, it looks like this:

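| Byte | Chance | Data |
|------|--------|------|
| 0-3 | 20% | 02 02 22 01 |
| 4-7 | 20% | 02 02 1e 01 |
| 8-11 | 10% | 02 02 22 01 |
| 12-15 | 10% | 03 03 22 01 |
| 16-19 | 10% | 03 03 1e 01 |
| 20-23 | 10% | 03 03 1e 01 |
| 24-27 | 5% | 03 03 22 01 |
| 28-31 | 5% | 03 03 1e 01 |
| 32-35 | 4% | 02 02 20 01 |
| 36-39 | 4% | 02 02 20 01 |
| 40-43 | 1% | 03 03 20 01 |
| 44-47 | 1% | 03 03 20 01 |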

The encounter chances are hard-coded in the game. The first two slots of every grass encounter block each have a 20% chance, the next four have a 10% chance, etc.

Each four-byte slot has the minimum level in the first byte, the maximum level in the second byte, and an index number of the Pokemon as the remaining two bytes. The index number does not exactly correspond with the National Pokedex number of a Pokemon, unfortunately, due to the way Pokemon are stored in the Generation III games. There are several empty slots between Celebi and Treecko and some of the Pokemon aren’t in the same order as the National Dex. Because of this, as well as the fact that ROM hacks can add or change Pokemon, we will need to read the list of Pokemon from the ROM as well, rather than relying on external information. But in the meantime, we have this data so far:

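| Data | Chance | Number | Min Lvl | Max Lvl |
|------|--------|--------|---------|---------|
| 02 02 22 01 | 20% | 290 | 2 | 2 |
| 02 02 1e 01 | 20% | 286 | 2 | 2 |
| 02 02 22 01 | 10% | 290 | 2 | 2 |
| 03 03 22 01 | 10% | 290 | 3 | 3 |
| 03 03 1e 01 | 10% | 286 | 3 | 3 |
| 03 03 1e 01 | 10% | 286 | 3 | 3 |
| 03 03 22 01 | 5% | 290 | 3 | 3 |
| 03 03 1e 01 | 5% | 286 | 3 | 3 |
| 02 02 20 01 | 4% | 288 | 2 | 2 |
| 02 02 20 01 | 4% | 288 | 2 | 2 |
| 03 03 20 01 | 1% | 288 | 3 | 3 |
| 03 03 20 01 | 1% | 288 | 3 | 3 |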

Numbers, like pointers, are stored in little-endian format. So 22 01 translates to 0x0122, which is 290 in base 10.
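Splitting the 48-byte block by hand gets tedious, so here is a sketch of how it might be done in code. `GRASS_CHANCES` and `parse_grass_slots` are names I picked for illustration; the chances are the hard-coded per-slot values for grass encounters.

```python
import struct
from typing import List, Tuple

# Hard-coded chance (in percent) for each of the 12 grass slots, in order.
GRASS_CHANCES = (20, 20, 10, 10, 10, 10, 5, 5, 4, 4, 1, 1)


def parse_grass_slots(block: bytes) -> List[Tuple[int, int, int, int]]:
    """Return (chance, index number, min level, max level) for each 4-byte slot."""
    slots = []
    # "<BBH": min level, max level, then a little-endian 16-bit index number
    for chance, (min_lvl, max_lvl, number) in zip(GRASS_CHANCES, struct.iter_unpack("<BBH", block)):
        slots.append((chance, number, min_lvl, max_lvl))
    return slots
```

Because the index number is unpacked as a little-endian 16-bit value, `22 01` comes out as 290 without any manual byte swapping.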

Pokemon Names

Luckily, it’s not hard to find the list of Pokemon names. First let’s look back at AdvanceMap.ini again:

[PokemonNamen]
inkSprache=1
art=pointer
vor=30B50025084CC8F7
spiele=AXPJ,AXVJ,AXPE,AXVE

[PokemonNamen2]
art=pointer
position=$000144
spiele=BPR,BPG,BPE,AXP,AXV

If we look at the spiele values for these two sections, the bottom one (PokemonNamen2) is the one we want as it contains the BPE code for Emerald. The position value is different from the nach value we saw before that contained a header. This one contains a pointer itself, pointing to 0x00000144. If we read the four bytes from that address in the Emerald game, we get another pointer: 0x003185c8. At that address, we will find the start of the Pokemon Names list. Each name is exactly 11 bytes long, terminated by an ff byte. If the name is shorter than 10 characters, the rest of the space will be padded with 00. After the last name in the list is a “name” that contains only a single ae byte: ae ff 00 00 00 00 00 00 00 00 00

So, the first encounter in the table above has an index number of 290. We can find the corresponding name for that Pokemon by seeking forward 290 * 11 bytes and reading the next 11 bytes:

>>> with open('emerald.gba', 'rb') as infile:
...     _ = infile.seek(0x144)
...     pointer = int.from_bytes(infile.read(4), 'little') - 0x08000000
...     _ = infile.seek(pointer + 290 * 11)
...     name_bytes = infile.read(11)
...
>>> name_bytes
b'\xd1\xcf\xcc\xc7\xca\xc6\xbf\xff\x00\x00\x00'

ff signifies the end of the name, and the rest of the field is padded with 00 bytes, so we chop off the last four bytes to get the actual name: d1 cf cc c7 ca c6 bf

Later Pokemon games use Unicode for text encoding, but back in Generation III they used a custom character map to save space. Decoding the above name with that character map gives us WURMPLE as a result.
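A partial decoder is enough for names like this one. The sketch below is not the full character map: it only covers the space, digits, and unaccented letters, and the byte ranges are an assumption based on the commonly documented Western Generation III table (they do match the bytes seen in this post). Anything else is shown as a question mark.

```python
from typing import Dict


def build_charmap() -> Dict[int, str]:
    """Build a partial Gen III character map (assumed ranges, punctuation omitted)."""
    charmap = {0x00: " "}
    for i in range(10):
        charmap[0xA1 + i] = str(i)              # 0xa1-0xaa: digits 0-9
    for i in range(26):
        charmap[0xBB + i] = chr(ord("A") + i)   # 0xbb-0xd4: uppercase A-Z
        charmap[0xD5 + i] = chr(ord("a") + i)   # 0xd5-0xee: lowercase a-z
    return charmap


CHARMAP = build_charmap()


def decode_text(raw: bytes) -> str:
    """Decode bytes up to (not including) the ff terminator."""
    chars = []
    for byte in raw:
        if byte == 0xFF:
            break
        chars.append(CHARMAP.get(byte, "?"))
    return "".join(chars)
```

Passing the 11 bytes read above to `decode_text` stops at the ff byte, so there is no need to trim the padding first.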

Here is the encounter table from earlier, with the decoded Pokemon names included:

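| Data | Chance | Number | Name | Min Lvl | Max Lvl |
|------|--------|--------|------|---------|---------|
| 02 02 22 01 | 20% | 290 | Wurmple | 2 | 2 |
| 02 02 1e 01 | 20% | 286 | Poochyena | 2 | 2 |
| 02 02 22 01 | 10% | 290 | Wurmple | 2 | 2 |
| 03 03 22 01 | 10% | 290 | Wurmple | 3 | 3 |
| 03 03 1e 01 | 10% | 286 | Poochyena | 3 | 3 |
| 03 03 1e 01 | 10% | 286 | Poochyena | 3 | 3 |
| 03 03 22 01 | 5% | 290 | Wurmple | 3 | 3 |
| 03 03 1e 01 | 5% | 286 | Poochyena | 3 | 3 |
| 02 02 20 01 | 4% | 288 | Zigzagoon | 2 | 2 |
| 02 02 20 01 | 4% | 288 | Zigzagoon | 2 | 2 |
| 03 03 20 01 | 1% | 288 | Zigzagoon | 3 | 3 |
| 03 03 20 01 | 1% | 288 | Zigzagoon | 3 | 3 |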

Duplicate entries can be combined to give us this final result:

| Pokemon | Level | Chance |
|---------|-------|--------|
| Wurmple | 2 | 30% |
| Wurmple | 3 | 15% |
| Poochyena | 2 | 20% |
| Poochyena | 3 | 25% |
| Zigzagoon | 2 | 8% |
| Zigzagoon | 3 | 2% |
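Combining the duplicates is just a matter of summing the chances for every slot that shares the same Pokemon and level range. A minimal sketch, assuming each slot is a (chance, index number, min level, max level) tuple; `combine_slots` is a hypothetical name:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def combine_slots(slots: Iterable[Tuple[int, int, int, int]]) -> Dict[Tuple[int, int, int], int]:
    """Sum the chances of slots that share the same (index number, min level, max level)."""
    totals = defaultdict(int)
    for chance, number, min_lvl, max_lvl in slots:
        totals[(number, min_lvl, max_lvl)] += chance
    return dict(totals)
```

The totals should always add up to 100, which makes for a handy sanity check when parsing an unfamiliar ROM hack.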

This location does not have other encounter types for us to look at, but on maps where it’s possible to surf, fish, and break rocks, the different encounter tables work in a similar way, though the number of slots and spread of spawn chances is different.

Surfing and Rock Smash both have five encounter slots, which have the following spawn chances:

| Slot | Chance |
|------|--------|
| 0 | 60% |
| 1 | 30% |
| 2 | 5% |
| 3 | 4% |
| 4 | 1% |

Fishing has 10 encounter slots, ordered by both rod quality and spawn chance:

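| Slot | Rod | Chance |
|------|-----|--------|
| 0 | Old | 70% |
| 1 | Old | 30% |
| 2 | Good | 60% |
| 3 | Good | 20% |
| 4 | Good | 20% |
| 5 | Super | 40% |
| 6 | Super | 30% |
| 7 | Super | 15% |
| 8 | Super | 10% |
| 9 | Super | 5% |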
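If you are scripting this, the chance tables can be kept as plain data. The dictionary below is only a sketch with names of my own choosing; the numbers are the ones listed in this post.

```python
from typing import List, Tuple

# Chance (in percent) for each slot, per encounter type.
SLOT_CHANCES = {
    "grass": (20, 20, 10, 10, 10, 10, 5, 5, 4, 4, 1, 1),
    "surfing": (60, 30, 5, 4, 1),
    "rock_smash": (60, 30, 5, 4, 1),
    "old_rod": (70, 30),
    "good_rod": (60, 20, 20),
    "super_rod": (40, 30, 15, 10, 5),
}

FISHING_RODS = ("old_rod", "good_rod", "super_rod")


def fishing_slot_chances() -> List[Tuple[str, int]]:
    """Flatten the three rod tables into the 10-slot order used by the fishing block."""
    return [(rod, chance) for rod in FISHING_RODS for chance in SLOT_CHANCES[rod]]
```

Each rod’s chances add up to 100% on their own, since only one rod is in use for any given encounter.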

Location Names

So, we have almost everything we need. We have a list of Pokemon with their names, level ranges, and spawn chances.

But we don’t know which map it’s for.

Now, we could figure it out pretty easily on our own by going to Bulbapedia or another website that shows spawn chances, and it wouldn’t take much effort to determine that this is Route 101 based on the available Pokemon and their level ranges. But that doesn’t help us if we’re dissecting an undocumented ROM hack. So how do we get the map name?

First, let’s go back to AdvanceMap.ini again:

113[MapBankHeader]
114art=pointer
115nach=80180068890B091808687047
116spiele=BPR,BPG,BPE,AXP,AXV

Just like the last header value we dealt with, we search for 80 18 00 68 89 0b 09 18 08 68 70 47 and read the next four bytes to get a pointer to 0x00486578. At that address is a list of pointers for the different map banks:

| Pointer | Map Bank |
|---------|----------|
| 0x00485d60 | 0 |
| 0x00485e44 | 1 |
| 0x00485e58 | 2 |
| 0x00485e6c | 3 |
| 0x00485e84 | 4 |

Bank 0 contains most of the outdoor areas, including routes, cities, etc. Interior maps are in separate banks according to how you enter them. For example, bank 1 contains all of the interiors of Littleroot Town such as your house, your rival’s house, and Birch’s lab, while bank 2 contains the interiors for Oldale Town.

Looking back at the data for the map we’re using, we had a bank number of 00 and a map number of 10.

So let’s go to the address of map bank 0: 0x00485d60. It contains another list of pointers. The map number 10 is a hex value, so in base 10 it would be 16. The seventeenth pointer in the list (because the list is zero-indexed) is 0x00482678.
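Following the two levels of pointers (bank list, then map list) could be sketched like this. The helper names are hypothetical, and `bank_table` is the file offset of the list of bank pointers, which is 0x00486578 in the Emerald example.

```python
ROM_BASE = 0x08000000  # address where the cartridge ROM appears to the GBA


def read_pointer(rom: bytes, offset: int) -> int:
    """Read a 4-byte little-endian pointer and convert it to a file offset."""
    return int.from_bytes(rom[offset:offset + 4], "little") - ROM_BASE


def map_header_offset(rom: bytes, bank_table: int, bank: int, map_number: int) -> int:
    """Look up the bank's pointer list, then the map's entry within that list."""
    bank_offset = read_pointer(rom, bank_table + 4 * bank)
    return read_pointer(rom, bank_offset + 4 * map_number)
```

Both lists are zero-indexed arrays of 4-byte pointers, which is why each lookup is just the base offset plus four times the index.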

Each map data block is 28 bytes long. At that address, we find the following data:

64 bc 3e 08 c4 7f 52 08 ba bc 1e 08 24 68 48 08 67 01 11 00 10 00 02 03 00 00 0d 00

There is a lot of information included here:

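| Bytes | Description | Example |
|-------|-------------|---------|
| 0-3 | Map Data Pointer | 64 bc 3e 08 |
| 4-7 | Event Data Pointer | c4 7f 52 08 |
| 8-11 | Map Scripts Pointer | ba bc 1e 08 |
| 12-15 | Connections Pointer | 24 68 48 08 |
| 16-17 | Music Index | 67 01 |
| 18-19 | Map Pointer Index | 11 00 |
| 20 | Label Index | 10 |
| 21 | Visibility (Flash) | 00 |
| 22 | Weather | 02 |
| 23 | Map Type | 03 |
| 24-25 | Unknown/Padding | 00 00 |
| 26 | Show Label on Entry | 0d |
| 27 | In-battle field id | 00 |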

The only value we’re really interested in right now is the Label Index, which is 10 (16 in base 10).
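The whole 28-byte block can be unpacked in one go with the struct module. This is a sketch that follows the layout in the table above; the `MapHeader` field names are my own, and the pointers are left as raw GBA addresses.

```python
import struct
from typing import NamedTuple


class MapHeader(NamedTuple):
    map_data: int
    event_data: int
    map_scripts: int
    connections: int
    music: int
    map_pointer_index: int
    label_index: int
    visibility: int
    weather: int
    map_type: int
    show_label: int
    battle_field: int


def parse_map_header(block: bytes) -> MapHeader:
    """Unpack a 28-byte map data block into named fields."""
    # "<4I2H4B2x2B": four pointers, two 16-bit values, four single bytes,
    # two skipped padding bytes, then two more single bytes
    return MapHeader(*struct.unpack("<4I2H4B2x2B", block))
```

For the example block, `label_index` comes out as 16, which is the value needed for the location name lookup.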

Now let’s find the list of location names. Back to AdvanceMap.ini again:

[NamenHeader]
art=pointer
nach=C078288030BC01BC00470000
spiele=BPE,AXP,AXV

[NamenHeader2]
art=pointer
nach=AC470000AE470000B0470000
spiele=BPR,BPG

We want the location names for Emerald (BPE), so we’ll use the value of NamenHeader, which is:

c0 78 28 80 30 bc 01 bc 00 47 00 00

Search for that header, and you’ll find a pointer immediately after it: 0x005a147c. That points to the list of locations, stored in eight-byte blocks. Seek forward 16 blocks (128 bytes) and read the next 8 bytes:

04 0a 01 01 fb 0b 5a 08

The second half of that data is a pointer to 0x005a0bfb, where the actual location name is. The names in the list are of variable length, so you can’t just read a set number of bytes and will have to read byte-by-byte until you reach the ff terminator:

cc c9 cf ce bf 00 a2 a1 a2 ff

Decode this value with the Character Map and you get the actual name of this location: ROUTE 101.
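The last lookup can be sketched in the same way as the others. The function names here are hypothetical, `table_offset` is the file offset of the location list (0x005a147c in the Emerald example), and the result is the raw name, still encoded with the game’s character map.

```python
ROM_BASE = 0x08000000  # address where the cartridge ROM appears to the GBA


def read_terminated(rom: bytes, offset: int) -> bytes:
    """Read bytes from offset up to (not including) the next ff terminator."""
    end = rom.index(b"\xff", offset)
    return rom[offset:end]


def location_name_bytes(rom: bytes, table_offset: int, label_index: int) -> bytes:
    """Fetch the still-encoded name for a label index from the 8-byte-per-entry list."""
    entry_offset = table_offset + 8 * label_index
    entry = rom[entry_offset:entry_offset + 8]
    # The second half of each entry is the pointer to the name itself.
    name_pointer = int.from_bytes(entry[4:8], "little") - ROM_BASE
    return read_terminated(rom, name_pointer)
```

Note that this layout is the one used by the NamenHeader games (Emerald, Ruby, and Sapphire); I have not checked whether FireRed and LeafGreen store their entries the same way.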

And there you have it, the final piece of information we need. Now we know that the encounter table we found earlier is for Route 101. Hopefully you don’t have a headache after all that, because I certainly do.