The Shanten Parser
In which I fight man and god to get a number to show up on a mahjong table.
Background
Riichi mahjong is, fundamentally, a card game played with a different deck.
A set of tiles for riichi mahjong. Art by FluffyStuff.
Tiles are typically used for playing the game physically, a commonality with many other mahjong games. There’s actually lots of different mahjong games, many named after place of origin (riichi is also called “Japanese mahjong”; off the top of my head, I can name Chinese, Hong Kong, Filipino, British and American mahjongs, but there’s also Sichuan Bloody and probably a whole lot of others). The basic theme connecting the various mahjongs together is converting the random allotment of tiles you get at the start of the game to a hand made entirely out of multitile melds; you do this by drawing one tile per turn and discarding another.
In riichi mahjong, you have thirteen tiles in your hand, so this works out to be four three-tile melds (a run of consecutive tiles or a triplet of same-faced tiles)1 There is also a four-tile meld that you can make by separating the full quartet of same-faced tiles from the rest of your hand; for most purposes, however, it counts the same as a triplet. Technically, you can discard the winning tile, usually to try and get a better win, but it’s a very risky play for a number of reasons and usually not worth it. Riichi is full of these “You can do this unusual thing, it’s only usually a bad idea.” caveats; they’re just kind of everywhere in this game.
So, in riichi, you want to have a complete hand, and in order to get a complete hand, you must get into tenpai. There’s a concept in riichi known as “shanten”—it’s the number of best-case draws between you and tenpai. There’s formulas for it, and for standard hands it’s There’s a really nasty edgecase here that I ran into firsthand. It’s in the bugs section. Now I’m thinking about the possibility of modeling riichi hands in some sort of space that uses shanten as a distance metric and now I have to get out of here before I get nerd-sniped. For shanten is distance to tenpai, after all. 8 - 2*CompletedMeldCount - IncompleteMeldCount - {one or more pairs: 1; no pairs: 0}
. As for nonstandard hands, riichi mahjong has but two of them. These two hands can still win the round despite not consisting of four three-tile melds and a pair: chiitoitsu, seven pairs (an immigrant from American Mahjong; usually abbreviated as chiitoi) and kokushi musou, thirteen orphans (sometimes abbreviated as kokushi), which has twelve different, unmatched tiles and one, single pair, but must be made of an extremely specific combination of tile values. Each of these special hands requires their own formula for shanten; chiitoi’s is 6 - DistinctPairCount
313 - DistinctNonsimpleTileCount - {one or more nonsimple pairs: 1; no nonsimple pairs: 0}
, where a “simple” tile is a tile numbered 2-8, and a nonsimple tile is everything else. As shanten is concerned with the best-case scenario, the final shanten of the hand is the minimum of these three measurements (that is, we measure the distance along the shortest path to completion4
VRChat is a voice chat application with spatialized audio. Spatialized audio allows for voice chat meetings of large groups; the maximum number of active participants in a conversation is four people6 It has to do with theory-of-mind; theory of mind is necessary to have a proper conversation, and it seems that the maximum number of minds a human can reason about is five; themself, the three other people in a conversation, and what they think everybody else thinks they’re thinking.
A Unity project with a default Unity scene, as generated by the VRChat Creator Companion for use in creating VRChat worlds7 feat. endless falling and no ground
The objects contained within this scene are scriptable—for a variety of purposes—but as VRChat worlds are untrusted content, the scripting is not done with Unity’s built-in system of C# scripting. Instead, VRChat uses a technology known as Udon, which by default presents itself as a visual programming language (nodes connected with wires and whatnot) that is compiled to bytecode and executed by VRChat in a VM at runtime. However, there is an alternative to this—an alternative that, as far as I know, everybody uses—which is to use a program known as UdonSharp to compile a subset of C# (I will call this subset U#) into Udon bytecode. The existing mahjong game that we have purchased for modification and use is entirely U#. Thus, in order to calculate shanten, we will be using U#.
Problem
So, at this very moment, the problem seems to be fairly straightforward; the shanten formulas are pretty simple, we’ve got a subset of good ol’ C# to work with, and we’re actually building upon a preexisting mahjong game that’s relatively popular and battle-tested, at least among the niche we’re operating in. Smooth seas ahead, right?
If that was the case, there wouldn’t be a writeup, so you’re probably waiting for the catch.
It’s actually not one catch, it’s a whole stadium of them. Let’s start with the one that got me roped into this whole mess and the reason I refer to this program as a parser: mahjong hands are, by design, highly ambiguous. There’s many ways to make a game hard, and riichi mahjong primarily goes for computational complexity. This makes the game a small nightmare to program for.
Even seasoned, commercial devs, used to the particulars of game development and also building off an existing implementation, can find themselves enmired in bugs—there’s only supposed to be four of the same tile in a mahjong game, not seven! From a programming standpoint, riichi is a wily beast, full of pitfalls and edgecases and ready to completely fall apart if you are not on top of everything. There’s other computational nightmares lurking in the wings, but, today, we’re focused on shanten. What makes shanten gnarly is that that ambiguity mentioned above is geared up and ready to wreck our day; while shanten formulas themselves aren’t hard, actually counting up the pieces they use is. Let’s start with the most basic ambiguity. A run, in riichi, is three consecutive tiles of the same suit. Runs do not wrap around, so there’s seven possible runs (that is, a run cannot start at 8 or 9 of a suit, because you run into the end of the suit before you can complete the run). Let’s take ourselves an example run, say the 2, 3, and 4 tiles of the pinzu (circle) suit, or 234p, for short. For shanten purposes, this is a run, right? We add one to the CompletedMeldCount and move on, right? Well, it’s also three protoruns: 23p, 24p, and 34p. Completed melds give us -2 shanten per meld and partial melds give us -1 shanten, so is this completed run actually three incomplete runs and thus -3 shanten? Nope—a tile can only belong to a single group at a time, so this trio of tiles could be a run, -2, or a protorun with a lone tile attached, -1; to try and parse that trio as more than one protorun would require counting at least one tile as a member of multiple groups: 23p, 34p double-counts the 3p. So far, so good: there’s a pretty clear best parse here, and we only want the best parses, because shanten is about the best-case scenario. Everything else is irrelevant and must be discarded.
But let’s add another tile to the mix. Let’s say we draw a 3p. Now we have 2334p—what is this? Is this a run plus a lone tile; 234p, 3p? Is this two protoruns; 23p, 34p? Is this a protorun and a pair; 24p, 33p; or is it a protorun and a prototriplet; 24p, 33p? If these are all the pinzu-suited tiles we have, it’s all of them, for they all have the same effect on our shanten, -2.8 Where these parses differ in practice is which tiles could improve the hand, and here ambiguity is actually your friend; the more ways to parse a hand, the more ways you have to progress it, increasing how many individual tiles could improve your situation and, thus, your odds of drawing one. There is a fourth suit, honors (abbreviated A state also called “iishanten”, in riichi jargon.56 z
), but they’re difficult to work with—only making pairs and triplets and rendering you ineligible for some common win conditions—rendering them often not worth it, so not having any in your hand is pretty common. 6 - 5 = 1
—but it’s also 1-shanten as a standard hand; we could parse it as a whole bunch of runs with leftovers like 345m345m234p3p6s67s, or like 345m345m23p34p66s7s, or like—actually, let’s ask the parser what it thinks of this hand.
Nine. Nine distinct parses, all yielding the same shanten.11 A quirk of this parser is that, as solo tiles do not decrease shanten but instead negatively impact the ability of the rest of the hand to decrease it, the parser will—depending on hand structure—sometimes produce parses that don’t use the entire hand. These parses still have accurate shanten. The parser did, in fact, produce a handful of parses that did not consume every tile in this hand—leaving out, say, that 7s at the end when it was a solo tile—but I have filtered them out here for clarity. Exactly why the parser stops parsing in these situations will be explained later.
Ach! Now there’s fourteen of them!
It can get worse than that. It can get much worse than that. See, the farther away you are from tenpai, from being ready, the more tiles you have that aren’t hard locked into particular shapes. That’s more tiles that are easily able to swap around and make up different protomelds without affecting the shanten of the hand. I ran some numbers on the dataset I used to test the parser with (comprising 900900 hands), and the most ambiguous hand in there—by count of the number of parses that consumed the whole hand—was 13568m3467p35689s, with shanten 4 and 163 different parses!12 This hints at a general behavior within riichi: hands with bad shanten usually have many routes to becoming better—with all those ways to parse a hand, there’s probably a few that will improve if you accept the tile you just drew and pitch something else. Thus, hands with poor shanten will often improve relatively quickly; poor shanten within your hand isn’t necessarily a game-killer, at least at the beginning of the round.
But hold on. This is a card game, right? Do we really need to be clever? Couldn’t we just brute-force it in some way? I’ve got 4GHz in my processor, surely four billion clocks per second will get us pretty far; who cares if it takes 1ms or 10ms or, hell, 100ms per hand? All of those should be plenty fast enough for a number to update in the corner of a screen. The time we spend being clever, the time we spend maintaining it—could those be more effective elsewhere? Is the effort we’re about to put in actually going to actually improve anything?
If this were any other realtime programming environment, 100ms for updating a number onscreen would be unfortunate, but ultimately acceptable. But we’re working in Udon, and Udon is by far the worst programming environment I have ever worked in, realtime or not. I can say exactly one good thing about it: to my knowledge, nothing has escaped the Udon runtime. This is an incredibly high bar and should be high praise, but unfortunately, it’s kind of achieved through brute force; Udon’s security design is approximately equivalent to building an armored car using meter-thick steel walls—sure, no bullets are getting through, but your tires are about to burst, you can’t go over bridges, you gotta put your foot to floor just to get the thing to budge—and god forbid you so much as catch sight of a hill; your engine may just explode out of fright.13 Also you’re wondering if your lips might be turning cherry red the entire time.
It is difficult to express just how catastrophic these decisions are, because they all compound each other—and are all made quite literally ten times worse by VRChat being a realtime program that is strapped to your head: if Udon didn’t run on the main thread, its poor performance could be mitigated at the cost of extra latency by letting VRChat’s main thread continue to update and display the world while the scripts compute away in the background. The same mitigation—in a stronger form, but with more legwork—could be performed by the programmer themself if Udon allowed for the ability to launch background threads. But VRChat stalls all rendering to run Udon scripts, which themselves run through molasses14 Actually, what’s thicker than molasses? Pitch? Yeah, pitch. Cars are probably the most-common instance of this property, or at least the one you and I have most recently interacted with in some fashion. Next up is probably industrial process-monitoring software, the kind that shuts down the centrifuges if they’re doing something wacky.
So we’ve got a performance budget imposed by externalities, and it’s likely to be tight. What kind of budget are we talking, here? Unfortunately, I am not a human vision expert (I just know some slightly uncommon things), but I don’t have to be.16 yay! If you’ve never experienced this, by the way, it’s surreal; if it’s prolonged, the world itself suddenly drags along with you, instead of being in a fixed place while you move within it, like a world should. This is such a jarring experience that software responsible for feeding frames to HMDs will actually make up whole new frames if the application doesn’t give it a new frame in time for the next display cycle. Oculus’s middleware has a couple of variants of this feature, called “Spacewarp” and “Asynchronous Spacewarp”; other middleware may call it something different. This process is not free of artifacts and only has information pertaining to the difference in head position between the last frame and this one; actions taken within the program (like moving your arm around within the world) will not have their effects shown until the next frame rendered by the application itself makes its way to the user’s eyeballs. So, although spacewarp ameliorates minor performance hiccups, the motion depicted in the resulting video is less smooth than desired, as a frame depicting in-application motion or action has been delayed or has disappeared entirely. In addition, spacewarp and friends typically only work for so many frames after the last-rendered frame, whereupon the video will either freeze on the last frame delivered, or the software will give up and start displaying a different environment for the user, rendered by an application that can be trusted to keep up with the user’s head motion (eg: SteamVR default world, which will show up after ten consecutive missed frames). I haven’t seen any headsets out there with variable refresh rate, but it seems like a pretty obvious feature to me. I wonder what’s holding it back… Maybe it doesn’t work as well when strapped to a face? If you know, please write in!
Yes, “still”: if, during the course of a round, your draw brings your hand into tenpai and you haven’t taken any tiles from anybody else’s discards, you can cry “riichi”, throw a thousand points on the table, and sit back and relax with your hand locked in place while everybody else plays the game around you, every single one of them knowing you are now one lucky tile away from winning the round. Whether or not a riichi implementation knows or cares about shanten, it must be able to tell whether or not a hand is in tenpai in order to show the riichi button, unless it’s just aiming to be a space to move tiles around and let players themselves operate the rules. The mahjong game that we’re modifying is aiming to provide a pretty automatic experience, so, by necessity, it has a tenpai check in it. Unfortunately, it’s a rather slow one; it starts by iterating through each of the fourteen 13-tile subhands within a hand that has just drawn a tile and goes downhill from there. Stock, the resulting stall upon drawing a tile can be long enough to make VR middleware think VRChat has locked up, resulting in the world around you freezing into a floating rectangle inside the starry void of the SteamVR home while Udon sorts itself out.
To make this less bad, my SO managed to divide the tenpai check from one giant stall into having each of its iterations take place on consecutive frames (this is an old realtime performance technique, by the way; if you’ve ever swung a camera around in a videogame and watched stuff pop into existence as you turn, you’ve seen this in action; that’s an occlusion check with a latency of one frame unhiding stuff just after it becomes visible rather than stalling the render to find out if it should show up or not beforehand). This change was highly invasive; the existing riichi implementation was not designed to operate like this and the result was some rather ugly internals as a consequence—but it worked. Spreading out the existing tenpai check turned the game locking up on every draw into…14 frames of 20Hz video strapped to your head.19 owie Source: Personal conversation with SO, 2022-Oct.
However, there’s another problem that causes issues all throughout AozoraMahjong and is likely the progenitor of the aforementioned tenpai-checking solution that locked up the main thread: U# itself. U# has significant restrictions imposed upon it by the Udon VM it ultimately runs within—and at least one behavior (likely a bug) that cost me some hair in figuring out what the hell was going on. My initial proof-of-concept for The Shanten Parser was actually written ignorant of these restrictions, which necessitated a bit of soul-searching (do I really want to do this cursèd thing?) and then a conversion of what I had written into a form (a cursèd form, for U# corrupts all that it touches21 I think the readability of the code is ok for what it is, but what it is makes it, among other things, more difficult than usual to debug.
Oh, and about that probable bug? UdonSharp will happily compile a U# program with a 2²⁰ new[]
outside of the Start()
method without warning or error, but there is apparently a hidden cap on array sizes doled out outside of Start()
(including C#’s syntax-sugar static allocations) that will change on you without warning, rhyme, or reason. I don’t remember the exact numbers, but it was something on the order of a 6,000-element cap suddenly transforming into a 400-element cap between builds on the same machine with the same settings—when the array was supposed to have 65,535 elements. (No, that’s not overflow or typo; the shipping version of The Shanten Parser allocates a handful of one-million22Start()
).
So I think that’s all the pre-existing problems,23 Well, that affect this parser, anyway.
“I don’t think shanten can be calculated without backtracking.”
Solution
Hey, so, uh, does depth-first search count as backtracking?24
Let’s lay the jargon on the table: The Shanten Parser is a depth-first search through the space of all possible parses of a mahjong hand for the parse or parses with the best shanten. It is written in a recursive fashion, somewhat similar to a computer language parser. However, unlike most parsers25 I hope? The parses can be used for other riichi-related things, such as finding which tiles would bring the hand closer to completion, but this functionality is currently unused. It is occasionally helpful for debugging, however.
Before we proceed, I do want to make a note: nothing I say here is meant as a disparagement of the original author of the mahjong game in question, AozoraMahjong, that I’m modifying. Aozora, as a program, is struggling against an adverse programming environment and is a network game forced to work with a uniquely terrible network27 Udon networking is a level of atrocious that makes me astonished anything at all works atop it. I am unaware of anything standing between Unity and a reliable network connection; the C# used by Unity for running CPU code is perfectly capable of opening up TCP connections. Even though TCP—which does ensure message delivery—is unpopular in realtime settings due to having the potential for drastic latency spikes,57 these spikes in latency are not large enough to interfere with things like turn-based games. I am unaware of why Udon does not offer the ability to open a reliable TCP connection between players to transmit data in applications that care more about getting the data transferred than worst-case latency. Compounding the lack of objects for memory and code organization, U# also doesn’t allow partial classes, denying you even that amount of code organization. The .cs file that implements the mahjong game itself is long enough to make my VSCode installation stall so frequently while editing the file that it is unusable, and VS2022 takes several seconds just to update syntax squigglies. 8 - 2*CompletedMeldCount - IncompleteMeldCount - {one or more pairs: 1; no pairs: 0}
and goes “What we need is a parse tree!”? That’s probably not a large pool of people. I’d believe you if you told me that’s a pool of one.
So, about that parser decision—that one I have the least to write about because it was a decision made on instinct. I had been working on a little calculator bot30 It was originally envisioned as a dice-rolling bot but the calculator feature became the bulk of the code—and most of what I use it for.
Walkthrough of a Parse
So, how do we walk a hand? First, we sort it, primarily by suit (manzu, pinzu, souzu, honors) and secondarily by value (1-9; East, South, West, North, White, Green, Red). The internal representation of the tiles compresses things like suit and value into bitfields in such a way that sorting the tiles as if they were integers naturally performs this two-tiered sorting (this is a native Aozora feature—and Aozora also brought a tile-sorting function with it, which is very handy). This makes searching for things like consecutive tiles and same-faced tiles a matter of looking a tile or two over, not a search through the whole hand (is that the siren song of O(n²) off in the distance? Best not approach…).31 Alas, while writing this writeup, I found a place where I had succumbed to O(n²) and was looping through the hand on every node construction58. After fixing it, the average performance increased by 22% for a full parse in C# and 39% per parse step in U#.59
With our hand sorted, we start our walk at the first tile of the hand, index 0. For each of the six possible types of groups that this tile could start—solo, pair, adjacent protorun, split protorun, run, triplet—we first check if there’s material in the hand that could fit the bill: is there another tile with the same value for a pair? If it is not an honor tile32 Although, for logistical reasons, the honor tiles do have numbers associated with their faces and a defined sorting—an East tile is notated as a 1z—this is purely a notational and sorting aid. By the game rules, the individual honors values are unordered with respect to each other, so there cannot be a run of honors tiles, as there is no order to have three of them in to begin with.
In addition to pointers to including which tiles of the hand this node consumes, as well as the lowest available tile index, which is what we will use if we continue parsing from here (some node types, like split protoruns, may skip adjacent tiles within the sorted hand). If there are not enough tiles in the hand to make the node in question, no allocation is performed and null is returned.33 Allocating a node is performed using a stack allocator which simply increments up for every allocation and decrements by one if the freed node is at the top of the stack, counting off the available indices for the struct of arrays.60
After it has attempted to add all six types of nodes, the parser then checks to see if is over its time budget. If it is (and, as implemented, it always is, as the time budget is set to 0ms), it then skips over recursing into its child nodes to parse what children they may themselves have. Regardless of whether child nodes were further processed, a pass over the children is performed to see if this parse branch has further space to be explored or not—every child must either be null (denoting a branch as impossible or nonviable) or itself be marked as done (which will happen during construction if the node consumes all the remaining tiles in the hand) for the current parse branch headed by this node to be marked done. After that check is performed, the node is marked completed and is returned by-reference. The entry point into the parser always starts at the root node, and when it returns, it returns the address of the root node, which is what is used to check if the whole parse tree is done, using the result of the aforementioned check.
As mentioned, the parser always starts at the root node, even upon resumption. Upon resumption, when it encounters a node marked as completed, it skips over the timeout check and starts recursing into its children. Once it finds a child node that hasn’t yet been completed (that is, a child node that has not yet had its own possible children parsed out of the hand) it starts the parsing process mentioned above, using that child node as the starting point. The timer is not reset, thus including the time needed to find uncompleted nodes in the time budget for that step of the parse. Walking the tree to search for unfinished nodes adds approximately 22% to the total time consumption of a parse 34 Unless something wacky is going on, I believe the overhead is concentrated towards later parse steps, as those have more tree to walk through, but the exact curve is unknown to me, so I have listed here the number I do have, which is for the overall parse.
Once the status of a parse branch being done has proliferated back up the tree to the root node—everything beneath it has either reached the end of the hand or had all its child nodes deemed nonviable35 The existence of a node that just takes one, single tile means that there will always be a way to progress forward in a hand, unless the early-out burns down that path. This is why parses aren’t guaranteed to use all the tiles of a hand—these parses are still correct, however.
The Early-Out
The early-out works as follows: when attempting to add a new node to the parse tree, check to see if this choice of parse direction will send us down a path of no return; a path from where we can never get a better shanten than the best shanten seen so far. This check is performed using a metric called Optimistic Shanten, which looks at the existing parse and the number of remaining tiles and assumes the best case scenario for the remaining tiles. For example, for regular shanten, it assumes all remaining tiles are in three-tile melds and, if there’s two tiles left over after that, it treats them as a pair. For optimistic chiitoi, it adds in the count of how many distinct pairs of tiles are in the hand, unconsumed by the parse (on top of the existing pair count). Optimistic kokushi musou shanten, however, is the same as regular kokushi shanten; they both come from a prepass over the hand.36 Kokushi shanten is, blessedly, not ambiguous; “how many different kinds of nonsimple tiles are in this hand” is relatively easy to answer for a computer, and adding in a check for a pair is pretty simple as well.
This works due to an element of shanten’s behavior: when you add a tile, shanten can either decrease by one or experience no change; there is a speed cap, as it were, in regards to shanten getting better as you add more tiles. This maximum speed means that it is possible to irrevocably fall behind; that is, we have consumed so many tiles that don’t improve our shanten that we can’t catch up to the best-seen shanten with the tiles remaining, even if those tiles are as good as they can be. So, if the best parse we could possibly make with what we have in progress is worse than the best full parse for this hand yet, it is because we have a suboptimal parse; therefore, we mark this node as failed and try to continue the parse another way, saving us the time of walking down the rest of the hand. The early-out cuts average time usage by ~53% and worst-case space usage by ~84% in the sample data, which is not bad.
Struct-of-Arrays
This is what required some soul-searching.
This is the basic idea of struct-of-arrays; take apart a regular class definition and put each member in its own array. This changes the memory layout of the collection—in the above example, instead of [RGBARGBARGBARGBA] it’s [RRRR] [GGGG] [BBBB] [AAAA]. Depending on the access patterns in play—say you wanted to invert the red channel—the latter layout can be friendlier to caching and SIMD operations, CPU- or GPU-side. But I didn’t care about that; I figured that the cache lines and prefetching would just be thrashed by the fact that almost everything in U# gets implemented as a foreign-function call to the .NET runtime37 From within a VM running on top of the .NET runtime!
The problem—and the reason I had to dredge around in my soul—is that nothing and no one has support for struct-of-arrays. There are no debug tools for it. There are no library tools for it. The Daily WTF has an entire article against it.38 After reading the comments, it turns out that “parallel arrays” is another name for SoA, and that I am not the first to use it to get around a lack of custom object types; I didn’t think I was, not by a long shot, but I find it interesting that this is also considered an old technique for working around language limitations regarding memory organization—I turned the Wheel of Samsara on purpose and, when I wasn’t looking, it turned on me. This is really useful, honestly. In general, I try to engineer my changes like this to make the compiler do as much work for me as I can. Seriously, though, Udon probably counts as a crime against nature.61
You also don’t have language support when doing struct-of-arrays; you don’t have default initialization, you don’t have nice initializers or support for abbreviated constructor calls, you don’t have any memory management of any sort once you’ve made your arrays. Unless you’re lucky enough to be doing SoA atop autogrowing arrays, you’re also in a rather rarified situation for modern computers with having a limited amount of RAM.41 Modern machines do their absolute damnedest to pretend to have unlimited RAM. This is actually the fundamental premise of garbage-collected languages. Non-garbage-collected OSes62 will do things like slap RAM contents onto disk and kill random processes(!) to maintain the illusion of an infinite pool of RAM. Windows, for example, will actually let you allocate arbitrarily large amounts of RAM, boil-the-oceans amounts of RAM—and only start doing things to your computer if you ever try to actually touch any of it.63 64
But the traditional memory management itself—allocating and freeing—wasn’t actually the hard part; the parser isn’t long-running. It can, in essence, leak memory because it finishes long before it starts pushing at the bounds of the machine abstractions and—importantly—because it restarts cleanly. If I was, in fact, running on a 1980s microprocessor I might actually run into trouble scraping against maximum RAM with imperfect deallocation, but RAM capacity is, thankfully, not something I have to worry about—so long as I make sure to allocate enough of it on boot.42 Because any single error brings down the whole table until a whole new instance of the world is made, RAM for the parser is actually wildly overprovisioned. With freeing turned on, the maximum number of nodes used at one time in any parse in my test dataset is 3,067, but the parser allocates space for 2²⁰ nodes instead (1,024*1,024, ~a million). This number was picked to be absolutely sure that no hand in the wide, wide world of mahjong could run the parser out of RAM—and because I didn’t have the instrumentation over my test dataset then that I do now. Anyway—in native code at least, but my understanding of U# and .NET suggests this should be the same in Udon—this is 94MB/mahjong table for the nodes themselves, with another couple MB for per-parse information. What might change this number is a CPU architecture change, but I don’t have a ready .NET ARM environment handy and there is no Quest version65 of this world anyway66 You probably know where this is going. You called it!67 C# outright refuses to compile a program if you try to read from a variable it can’t prove has been initialized yet. Unfortunately for the parser, methods like struct-of-arrays mean that the variables that C# can see have been initialized—that’s the on-boot memory allocation—and there’s ones it can’t see that can become dirty, uninitialized memory. This is a hazard of writing things in array-of-structs style first; the automatic memory-initting done by the language itself means that there’s no explicit memory-initting code for you to transform. Ow.
To my surprise, when I finally did finished the transformation, struct-of-arrays actually did improve my performance, but this is an algorithm that is very allocation/deallocation heavy that was initially implemented without an object pool, so the transition to a stack allocator being a performance bump is not, in retrospect, that surprising.
Performance
Speaking of performance, let’s break out those numbers! When used, the native C# is running on .NET version 6.0.306 (x64). The U# is running in VRChat—uh, I actually couldn’t find a version number, so I’m reporting this with the Steam Build ID, 10111183; the UdonSharp version within the world project is v0.20.3, and the Unity version46 At the time of this writing, the Unity version is the same for all updated VRChat Worlds
Machine Name | CPU Make | CPU Model | CPU Base Clockspeed | CPU Clockspeed During VRChat | RAM Amount | RAM Type | RAM Speed |
---|---|---|---|---|---|---|---|
Rehoboam | AMD | 3950X | 3.50GHz | 4.20GHz | 2x8GB | DDR4 DIMM | 4400MHz |
Red | Intel | i7-6700HQ | 2.60GHz | 3.15GHz | 2x8GB | DDR4 SODIMM | 2133MHz |
September | Intel | I5-2400S | 2.50GHz | 2.59GHz | 2x2GB | DDR3 SODIMM | 1333MHz |
I did have a machine that was even more potato than an Early-2011 iMac, but VRChat would not run on it.47 EAC
Let’s start with the end result: the amount of frametime each step of the shanten parsing takes out of each frame it’s active. The following chart is from Rehoboam. It shows the amount of CPU frametime devoted to parsing shanten per-frame, sorted from shortest amount of time to longest. In this and the following discussion around numbers and performance, the individual numbers will be color-coded to make clear which machine they came from, using the colors shown in the table above.
Enlarged to show texture
A short test game of riichi mahjong generated 749 parse steps, which used an average of 0.407ms/frame while the parse was ongoing. The worst-case parse time was 0.8448ms, with a runner-up of 0.778ms. So—we did it! We came in under budget!48 …After I fixed an instance of O(n²) while writing this thing, anyway.
Well, so much for having a nice, clean graph. I wish I knew what was up with those crazy outliers screwing up my axes. Maybe a wretched task switch or something, but that kind of nasty trail-off is exactly what you don’t want in a realtime application, where maximum time guarantees are usually more important than average times. That said, we do have at least three quartiles under budget on all three machines, so that’s something. However, while September is not the kind of machine that’ll serve as a rendering server for an HMD—it was hitting 32fps inside the VRChat world in question while rendering at approximately 800x600—hook an HMD up to that thing to experience falling over instead of VRChat49 Poor thing’s running a laptop GPU from 2010, give it a break.68 My benchmark for it actually does peg a core to 100% for its duration, so it’s at least using the CPU, although perhaps it could use it better. I think it’s interesting that the HMD port of Rez is the best version of the game, as it has the best cursor control of any version—you paint targets by just pointing your controller at them. It feels direct. Prior releases are stuck with a thumbstick as an input mechanism for a 2D cursor,69 and the flatscreen version of Rez maps the ingame cursor to the movement of the mouse in a rather bizarre way70 (there’s also significant latency between mouse movement and it being reflected in cursor movement). Given the high price of an HMD, it’s rather a shame that Rez Infinite’s mousefeel is so poor, as the game otherwise runs well on what is now rather low-cost hardware. Often, enforced by wallet.
While I’m here, I want to also mention how effective the early-out is; it was even more effective than I was expecting when I wrote it.
It has also only gotten more effective over time as I have improved performance outside of resumption; the early-out cuts out late steps, and more steps taken means more expensive resumptions; the average latency goes from 10 to 14 without the early-out, a 40% increase, but the total parse time goes up by 145%—those later resumptions are really beating the crap out of the performance—oh, hello O(n²); every step deeper into the parse is more and more state that must be rebuilt. Nasty stuff, that. Still, thanks to there only being 14 tiles, the badness is at least bounded. Unfortunately, for now, it’s staying there; even though performance improvements—some spurred by this writeup itself—have made reentry a larger and larger portion of the total frametime, it seems that Udon hates my guts.53 This technology is slandering the soup.
Did you know that C# natively prints double.PositiveInfinity as ∞? I didn’t!
Reentry, in native C# on Rehoboam, adds an average 38% overhead (24.5μs/hand with no reentry vs 33.8μs/hand with forced reentry) to the total parse time in the current version of The Parser. Ow. This is almost certainly not evenly-distributed; later nodes in a parse have more tree to reenter through, after all, but this suggests that an average of 38% could be shaved off of the frametime used for node parsing. The Udon overhead is not evenly applied to the code; one performance fix improved perf in C# by 28%, but Udon performance by 38%,54 weird coincidence
This development was done on Red, so here’s where the current version of The Parser stands on that machine:
So after implementing the queue and removing another hallmark of the clarion call of O(n²), on Red, I finally got
…um. Oh, wait, it’s only working on single nodes at a time, now, with a 0ms timeout. Let’s take that up just a little bit.
Ok, now we’re cooking! Let’s crank that timeout back down a bit and see how we’re doing.
That’s a 24% speedup or so, compared to current-version with 0ms, and its frametime consumption should be a lot more controllable—check how close the 99th-percentile step time is to its budget. That’s pretty respectable! It’s time to test this in-world and—well, there’s no pretty pictures this time, I think I must have burned the logs.
Back on Rehoboam, I set the timeout to 0.5ms on a guess, and it was reporting that the shanten parsing was consuming >1ms/frame in Udon—which, at that point, had been sitting at a cool average of 0.407ms. And that was only half of it. Check out that 0ms-timeout report again. See that latency, there, of 3066 frames? That was foreshadowing. Even with a timeout of 0.5ms—more frametime than the current version of the parser used on average, a budget which it exceeded, on average, by more than twice what it was set to—The Parser took enormous numbers of frames to actually finish. What once was unnoticeable latency now became a waiting game. Often I’d be in the middle of picking a tile to discard when the shanten parse would actually get around to updating. Sometimes, I’d be sitting there, remoted into a debug session strapped to my head from almost a thousand miles away, waiting for the number to hurry up and update to be sure it was working—and as the results of the shanten parse trigger the riichi button (a crucial gameplay feature), the increased latency actually did force waiting on the parse result when a hand was in tenpai.55 Riichi button, where are you? ;~;71
I thought about trying to tweak the timeout, but making the timeout larger would only make per-frame performance even worse when it was already eating up more frametime than the current version. More frametime consumption and more latency was an unfixable combination; the supposed speed-up did not work and made everything worse. So I scrapped it and the logs are probably in the desert somewhere, just slightly to the left of all those E.T. cartridges.
There’s a good chance there’s still room for performance improvements. After all, inspecting my program logic for this writeup, after some time away from it, found performance improvements that I had missed. There might be some easy avenues towards more performance in the future—there’s some bitpacked acceleration structures that might benefit from being unpacked in some fashion, but many of them are bitpacked simply to bypass Udon’s restrictions around allocating arrays; spreading them out requires something they could be spread out into. If and when VRChat transitions to a WASM-based VM for world scripting, I will investigate. Using a queue instead of tree-walking for resumption might be back on the table, but so might any number of other things; there’s many vagaries of C# performance I am currently unfamiliar with—if the manual availability bitmasks are replaced with BitVector32s, how does that affect performance? Given that the algorithm is very allocation-happy in its current formulation, would even an object-pooled version beat out the allocation performance of the existing stack allocator? Would the overhead of simply being an object recognizable to the runtime make my performance worse? I don’t know. That’s kind of the nasty thing about languages like C#; there’s enough machinery added onto the machinery you yourself write that reasoning about behavior, especially performance-related, can require a lot of knowledge of implementation details of a variety of fields, from syntax to garbage collection.
Bugs
There are no known conformance bugs in The Shanten Parser; to my knowledge, it produces the correct result for all possible riichi mahjong hands and, under all possible inputs within the game, it cannot crash. This is at least partially thanks to crashes and incorrect results bringing down the entire mahjong table. In the case of the latter, an incorrect result will generally either softlock the table or turn a normal, desirable game function into a softlock; though the mahjong game is technically still running, it is waiting for some condition to occur that never will, such as selecting which tile to discard to make a 13-tile tenpai hand—when the task is impossible and there is no such tile. Even worse is a crash; although a softlock is resolvable by abandoning the currently-running game and hitting the table’s reset button, a crash kills all Udon execution for that object. If The Parser ever crashes, the table is dead and the whole world instance must be thrown away and a new one made to have a functioning version of that table.
Though it meets the performance target as I defined it, <1ms of CPU frametime on Rehoboam, I’m staring at that resumption overhead and feeling antsy; that overhead probably counts as a performance bug.
This is not how The Parser started out; there were lots and lots of bugs along the way, but most of them were just typos—forgetting to update a variable, using the wrong variable name. One byproduct of struct-of-arrays style was the easy ability to use uninitialized, dirty RAM; a few of the acceleration structures that had existed in the array-of-structs-style prototype used C#’s implicit initialization of ints to 0 and I had forgotten about that when I made my transposition. The AoS→SoA transition, while largely mechanical, does require special care in the constructor for reasons such as these; initialization is something that is very vulnerable to the kind of typos mentioned above—it’s largely rote, but mistakes are disastrous. Bizarrely, it was well into testing before this error was found, so there’s a lesson, I suppose—uninitialized memory will bite you when and where it wants to; temporal proximity between writing and seeing the bug is absolutely not to be relied upon.
Most of the other mistakes laid in integration with Aozora Mahjong. Aozora is state-based rather than event-based; I default to thinking in event-based terms and as a result, the internal logic flow of Aozora ran against my gut instinct for where things should be; I often found myself looking for states that simply were not there. It took three or four tries to actually have an integration that was actually wired up properly; fitting the concept of a hand update to a system without the concept of an event required tracking down all the states that result from an event that could change a hand. A bit of a lesson in how much the internal logic and flow of a thing can differ from how I’d approach constructing one; I haven’t modified much software from others before, not this deeply.
The one real intent error in The Parser itself—as far as conformance issues go—was an edge case in shanten calculation that was mentioned absolutely nowhere among any of the resources I found on the topic; it had to do with chiitoi. Chiitoi is, very specifically, seven distinct pairs. While playing the game, if you draw a new tile such that you now have a hand consisting of four distinct pairs and two triplets, then—despite having six pairs (the last two being two-thirds of the triplets)—you are actually not ready for chiitoi, despite having the otherwise-requisite six pairs in your hand (chiitoi shanten: 6 - 6pairs = 0
). This is because there are two tiles that are in your hand right now that cannot be used for chiitoi: the third members of those two triplets you have in your hand. If you were to draw another match for one of those tiles, you would not be in chiitoi, as you have four of the same tile, two of the same pair—which makes for six distinct pairs and not the seven you actually need. It took me quite a bit of staring at the hands that caused the softlocks to finally understand what was going on. This spawned a prepass that counts up how many times trios appear within the hand, specifically to catch that scenario and set chiitoi shanten back to 1.
Takeaways
You will see advice against writing stuff in struct-of-arrays format, and it’s a decent general rule; the tooling just is not there to make it work cleanly; debuggers and visualizers have no idea what you’re doing, and you kinda can’t tell them, either. The foundations of this algorithm were sketched out in array-of-structs format to begin with. If I didn’t have a foundation—built with proper tooling—to make that mechanical transition to SoA from, The Parser would have assuredly taken much longer, and perhaps not come to fruition at all. If you find yourself backed into a corner and SoA is your only way out, maybe sketch out your program in AoS form first, and then go in and make that conversion after you’ve gotten it working and debugged. Honestly, much of the code looks so bizarre that, had I not written it, I’d probably reject it on sight if somebody offered it to me. SoA is incredibly useful for certain niches beyond working in bad programming environments—if you’re often iterating over just one member of the object, having them all in one array is better for cache performance and vectorization, and sometimes necessary for things like uploading to a GPU—so it’s a shame there isn’t better support for it; a collection class would be a good first step, but it probably requires some metaprogramming to make work and/or compiler support.
That said, without struct-of-arrays, this parser could not have been built in Udon, which would have been a shame; it’s a genuinely massive improvement over the prior state of the table. The Shanten Parser means people wanting to play riichi mahjong in VRChat no longer have to brace themselves when they hear the chime informing them that it is their turn. By and large, they can just sit there and play the game with their friends and not have to worry about the performance characteristics of the mahjong table or anything else under the hood. People often give advice that is good in general without that caveat;
When you’re operating in strange programming environments, performance characteristics can be absolutely wild; I had no reason to expect using a queue to resume parsing nodes would be anywhere near as awful for Udon performance as it turned out to be.
Other people’s code—especially the inner guts of it—can be really, really different than how you would natively model the problem at hand, and it’s easy to go on a wild goose chase trying to track down an implementation of a thought that they didn’t have and didn’t need. Perhaps this is obvious to people who are used to such things, but I am not (yet).
Having a huge test suite to throw The Parser against was a lifesaver; bugfixes and regression checks are as easy as hitting F5. That said, the awfulness of U# development means that most development is actually done in separate files in this native-C# test environ, then manually copied into the U# code. Manual copying is tedious and error prone, try to make your workflow not rely on it, if at all possible.
Unity, as an engine, does not play nice with Git.
Sometimes, when you’re backed into a corner, you just gotta throw weird solutions at a problem. Sometimes you can unweird them later, sometimes not. Good luck, and may your problems not be cursed.
-
There is also a four-tile meld that you can make by separating the full quartet of same-faced tiles from the rest of your hand; for most purposes, however, it counts the same as a triplet. ↩
-
Technically, you can discard the winning tile, usually to try and get a better win, but it’s a very risky play for a number of reasons and usually not worth it. Riichi is full of these “You can do this unusual thing, it’s only usually a bad idea.” caveats; they’re just kind of everywhere in this game. ↩
-
There’s a really nasty edgecase here that I ran into firsthand. It’s in the bugs section. ↩
-
Now I’m thinking about the possibility of modeling riichi hands in some sort of space that uses shanten as a distance metric and now I have to get out of here before I get nerd-sniped. ↩
-
For shanten is distance to tenpai, after all. ↩
-
It has to do with theory-of-mind; theory of mind is necessary to have a proper conversation, and it seems that the maximum number of minds a human can reason about is five; themself, the three other people in a conversation, and what they think everybody else thinks they’re thinking. ↩
-
feat. endless falling and no ground ↩
-
Where these parses differ in practice is which tiles could improve the hand, and here ambiguity is actually your friend; the more ways to parse a hand, the more ways you have to progress it, increasing how many individual tiles could improve your situation and, thus, your odds of drawing one. ↩
-
There is a fourth suit, honors (abbreviated
z
), but they’re difficult to work with—only making pairs and triplets and rendering you ineligible for some common win conditions—rendering them often not worth it, so not having any in your hand is pretty common. ↩ -
A state also called “iishanten”, in riichi jargon.56
There’s jargon for just about everything.72
-
A quirk of this parser is that, as solo tiles do not decrease shanten but instead negatively impact the ability of the rest of the hand to decrease it, the parser will—depending on hand structure—sometimes produce parses that don’t use the entire hand. These parses still have accurate shanten. The parser did, in fact, produce a handful of parses that did not consume every tile in this hand—leaving out, say, that 7s at the end when it was a solo tile—but I have filtered them out here for clarity. Exactly why the parser stops parsing in these situations will be explained later. ↩
-
This hints at a general behavior within riichi: hands with bad shanten usually have many routes to becoming better—with all those ways to parse a hand, there’s probably a few that will improve if you accept the tile you just drew and pitch something else. Thus, hands with poor shanten will often improve relatively quickly; poor shanten within your hand isn’t necessarily a game-killer, at least at the beginning of the round. ↩
-
Also you’re wondering if your lips might be turning cherry red the entire time. ↩
-
Actually, what’s thicker than molasses? Pitch? Yeah, pitch. ↩
-
Cars are probably the most-common instance of this property, or at least the one you and I have most recently interacted with in some fashion. Next up is probably industrial process-monitoring software, the kind that shuts down the centrifuges if they’re doing something wacky. ↩
-
yay! ↩
-
If you’ve never experienced this, by the way, it’s surreal; if it’s prolonged, the world itself suddenly drags along with you, instead of being in a fixed place while you move within it, like a world should. This is such a jarring experience that software responsible for feeding frames to HMDs will actually make up whole new frames if the application doesn’t give it a new frame in time for the next display cycle. Oculus’s middleware has a couple of variants of this feature, called “Spacewarp” and “Asynchronous Spacewarp”; other middleware may call it something different. This process is not free of artifacts and only has information pertaining to the difference in head position between the last frame and this one; actions taken within the program (like moving your arm around within the world) will not have their effects shown until the next frame rendered by the application itself makes its way to the user’s eyeballs. So, although spacewarp ameliorates minor performance hiccups, the motion depicted in the resulting video is less smooth than desired, as a frame depicting in-application motion or action has been delayed or has disappeared entirely. In addition, spacewarp and friends typically only work for so many frames after the last-rendered frame, whereupon the video will either freeze on the last frame delivered, or the software will give up and start displaying a different environment for the user, rendered by an application that can be trusted to keep up with the user’s head motion (eg: SteamVR default world, which will show up after ten consecutive missed frames). ↩
-
I haven’t seen any headsets out there with variable refresh rate, but it seems like a pretty obvious feature to me. I wonder what’s holding it back… Maybe it doesn’t work as well when strapped to a face? If you know, please write in! ↩
-
owie ↩
-
Source: Personal conversation with SO, 2022-Oct. ↩
-
I think the readability of the code is ok for what it is, but what it is makes it, among other things, more difficult than usual to debug. ↩
-
2²⁰ ↩
-
Well, that affect this parser, anyway. ↩
-
I hope? ↩
-
The parses can be used for other riichi-related things, such as finding which tiles would bring the hand closer to completion, but this functionality is currently unused. It is occasionally helpful for debugging, however. ↩
-
Udon networking is a level of atrocious that makes me astonished anything at all works atop it. ↩
-
I am unaware of anything standing between Unity and a reliable network connection; the C# used by Unity for running CPU code is perfectly capable of opening up TCP connections. Even though TCP—which does ensure message delivery—is unpopular in realtime settings due to having the potential for drastic latency spikes,57
TCP does ensure that messages arrive at their destination, but it also has the sometimes-desirable property of always ensuring they arrive in order. This may not be desirable in certain circumstances where low latency is king, as if messages arrive out of order (whether due to retransmission or other network shenanigans) TCP will hold later messages in queue while it waits for the earlier one to arrive. This potential for unpredictable, possibly drastic latency spikes makes TCP unpopular for applications where latency is the enemy and later packets correct or override the missing data.
-
Compounding the lack of objects for memory and code organization, U# also doesn’t allow partial classes, denying you even that amount of code organization. The .cs file that implements the mahjong game itself is long enough to make my VSCode installation stall so frequently while editing the file that it is unusable, and VS2022 takes several seconds just to update syntax squigglies. ↩
-
It was originally envisioned as a dice-rolling bot but the calculator feature became the bulk of the code—and most of what I use it for. ↩
-
Alas, while writing this writeup, I found a place where I had succumbed to O(n²) and was looping through the hand on every node construction58
ffs
!!
-
Although, for logistical reasons, the honor tiles do have numbers associated with their faces and a defined sorting—an East tile is notated as a 1z—this is purely a notational and sorting aid. By the game rules, the individual honors values are unordered with respect to each other, so there cannot be a run of honors tiles, as there is no order to have three of them in to begin with. ↩
-
Allocating a node is performed using a stack allocator which simply increments up for every allocation and decrements by one if the freed node is at the top of the stack, counting off the available indices for the struct of arrays.60
As basic as this is, even this simple sometimes-freeing cuts peak node consumption roughly in half.
-
Unless something wacky is going on, I believe the overhead is concentrated towards later parse steps, as those have more tree to walk through, but the exact curve is unknown to me, so I have listed here the number I do have, which is for the overall parse. ↩
-
The existence of a node that just takes one, single tile means that there will always be a way to progress forward in a hand, unless the early-out burns down that path. This is why parses aren’t guaranteed to use all the tiles of a hand—these parses are still correct, however. ↩
-
Kokushi shanten is, blessedly, not ambiguous; “how many different kinds of nonsimple tiles are in this hand” is relatively easy to answer for a computer, and adding in a check for a pair is pretty simple as well. ↩
-
From within a VM running on top of the .NET runtime! ↩
-
After reading the comments, it turns out that “parallel arrays” is another name for SoA, and that I am not the first to use it to get around a lack of custom object types; I didn’t think I was, not by a long shot, but I find it interesting that this is also considered an old technique for working around language limitations regarding memory organization—I turned the Wheel of Samsara on purpose and, when I wasn’t looking, it turned on me. ↩
-
This is really useful, honestly. In general, I try to engineer my changes like this to make the compiler do as much work for me as I can. ↩
-
Seriously, though, Udon probably counts as a crime against nature.61
This is not an appeal to some imagined natural order, but an appeal to not causing CPUs to consume 1000x as much energy as they should to do any scripting at all in VRChat.
-
Modern machines do their absolute damnedest to pretend to have unlimited RAM. This is actually the fundamental premise of garbage-collected languages. Non-garbage-collected OSes62
will do things like slap RAM contents onto disk and kill random processes(!) to maintain the illusion of an infinite pool of RAM. Windows, for example, will actually let you allocate arbitrarily large amounts of RAM, boil-the-oceans amounts of RAM—and only start doing things to your computer if you ever try to actually touch any of it.63 The relevant Win32 call here is VirtualAlloc, which reserves some virtual addresses for you in your process’s virtual memory map which will then soft page fault if touched, trapping into the OS to actually grab you a piece of silicon to store your data in (and, if needed, kick some other process’s memory out onto, uh, the slower silicon74).
Also the referenced zfs ‘article’ is here and the ocean-boiling quote is at the bottom; it’s the only slice of humanity in what is otherwise a simpering, fellatious ad.
-
Because any single error brings down the whole table until a whole new instance of the world is made, RAM for the parser is actually wildly overprovisioned. With freeing turned on, the maximum number of nodes used at one time in any parse in my test dataset is 3,067, but the parser allocates space for 2²⁰ nodes instead (1,024*1,024, ~a million). This number was picked to be absolutely sure that no hand in the wide, wide world of mahjong could run the parser out of RAM—and because I didn’t have the instrumentation over my test dataset then that I do now. Anyway—in native code at least, but my understanding of U# and .NET suggests this should be the same in Udon—this is 94MB/mahjong table for the nodes themselves, with another couple MB for per-parse information. What might change this number is a CPU architecture change, but I don’t have a ready .NET ARM environment handy and there is no Quest version65
VRChat has a severely downgraded and crippled75 version of its client that runs natively on the Quest 2, an HMD that is, essentially, an Android tablet attached to a headband—just like every other standalone HMD. The Qualcomm XR2 SoC76 is ARM-powered.
We tried making one and even before trying to turn a table on, with none of the expensive effects turned on and my clock77 turned into an error shader showcase, performance was so dreadful—and the world looked so genuinely terrible, especially with its textures so horrifically crunched—that we decided our attention was best placed elsewhere for the time being.
-
You probably know where this is going. ↩
-
You called it!67
If you didn’t call it, totally pretend you did.
-
C# outright refuses to compile a program if you try to read from a variable it can’t prove has been initialized yet. Unfortunately for the parser, methods like struct-of-arrays mean that the variables that C# can see have been initialized—that’s the on-boot memory allocation—and there’s ones it can’t see that can become dirty, uninitialized memory. This is a hazard of writing things in array-of-structs style first; the automatic memory-initting done by the language itself means that there’s no explicit memory-initting code for you to transform. Ow. ↩
-
At the time of this writing, the Unity version is the same for all updated VRChat Worlds ↩
-
EAC ↩
-
…After I fixed an instance of O(n²) while writing this thing, anyway. ↩
-
Poor thing’s running a laptop GPU from 2010, give it a break.68
Also, it can’t actually boot SteamVR.78
-
My benchmark for it actually does peg a core to 100% for its duration, so it’s at least using the CPU, although perhaps it could use it better. ↩
-
I think it’s interesting that the HMD port of Rez is the best version of the game, as it has the best cursor control of any version—you paint targets by just pointing your controller at them. It feels direct. Prior releases are stuck with a thumbstick as an input mechanism for a 2D cursor,69
Though physically a 2D input mechanism, thumbsticks work most comfortably when parsed as 1D input devices into either a toroidal79 or—less comfortably—Cartesian space.80 Their small size makes input imprecise—mapping small amounts of 3D space to large amounts of 2D space—forcing, in the case of both cursors angular81 and rectangular82, low maximum speeds to produce an output space matching the small size of the physical input space. These speed caps leave the user staring at the cursor, waiting for it to align with the thing they want to point at83; compare this to a SHMUP, where one of the main objectives is to move a cursor to avoid targets; in that scenario, the game is actively making you plot out a path for your cursor with the limitations of its speed in mind—on top of the task of keeping as much of the path of the cursor within a particular subset of the total space as possible (so your own shots can hit stuff). In a thumbstick FPS, you plot a straight line to where you think your target is going to go and then you wait for the cursor to get over there. It’s not an interesting waste of time.
The game is functionally played in projectionspace, so you should be able to just use the regular OS cursor’s position instead of transforming it; I’m at a loss to explain this from the outside.
-
Often, enforced by wallet. ↩
-
This technology is slandering the soup. ↩
-
weird coincidence ↩
-
Riichi button, where are you? ;~;71
Riichi Button:84
-
There’s jargon for just about everything.72
↩ -
TCP does ensure that messages arrive at their destination, but it also has the sometimes-desirable property of always ensuring they arrive in order. This may not be desirable in certain circumstances where low latency is king, as if messages arrive out of order (whether due to retransmission or other network shenanigans) TCP will hold later messages in queue while it waits for the earlier one to arrive. This potential for unpredictable, possibly drastic latency spikes makes TCP unpopular for applications where latency is the enemy and later packets correct or override the missing data. ↩
-
ffs ↩
-
!! ↩
-
As basic as this is, even this simple sometimes-freeing cuts peak node consumption roughly in half. ↩
-
This is not an appeal to some imagined natural order, but an appeal to not causing CPUs to consume 1000x as much energy as they should to do any scripting at all in VRChat. ↩
-
Everything but Midori and anything that ran on the i43273
God I wish the i432 hadn’t sucked.85
-
The relevant Win32 call here is VirtualAlloc, which reserves some virtual addresses for you in your process’s virtual memory map which will then soft page fault if touched, trapping into the OS to actually grab you a piece of silicon to store your data in (and, if needed, kick some other process’s memory out onto, uh, the slower silicon74
Or, god forbid, spinning rust.86
-
Also the referenced zfs ‘article’ is here and the ocean-boiling quote is at the bottom; it’s the only slice of humanity in what is otherwise a simpering, fellatious ad. ↩
-
VRChat has a severely downgraded and crippled75
The removed features of the Quest port severely hamper the self-expression half of VRChat. For some, the self-expression is the whole point.
Also omnipresent.
-
We tried making one and even before trying to turn a table on, with none of the expensive effects turned on and my clock77
It’s an analog clock with an emissive shader and the reason bloom is enabled in that world. It’s also implemented as a day-long euler animation because I thought that was funny (and also because it eschews as much Udon as possible—it resyncs itself once a minute though because I have no idea what amount of accumulated error to expect).
-
If you didn’t call it, totally pretend you did. ↩
-
Also, it can’t actually boot SteamVR.78
-
Though physically a 2D input mechanism, thumbsticks work most comfortably when parsed as 1D input devices into either a toroidal79
gateriding
steering in a racing game
first- and third-person shooters
mouse cursor
-
The game is functionally played in projectionspace, so you should be able to just use the regular OS cursor’s position instead of transforming it; I’m at a loss to explain this from the outside. ↩
-
Riichi Button:84
87
-
God I wish the i432 hadn’t sucked.85
Remember that the i432 started development in 1975. In 1975 the user-based security model of computers was already known to be fundamentally broken and unusable and fifty years later every single computer is worse off for it, from the excuses it provides for destroying human agency to the horrifying performance of security solutions that actually work (Udon).
-
Or, god forbid, spinning rust.86
please don’t boot off HDDs any more I’m begging you
-
The removed features of the Quest port severely hamper the self-expression half of VRChat. For some, the self-expression is the whole point. ↩
-
Also omnipresent. ↩
-
It’s an analog clock with an emissive shader and the reason bloom is enabled in that world. It’s also implemented as a day-long euler animation because I thought that was funny (and also because it eschews as much Udon as possible—it resyncs itself once a minute though because I have no idea what amount of accumulated error to expect). ↩
-
gateriding ↩
-
steering in a racing game ↩
-
first- and third-person shooters ↩
-
mouse cursor ↩
-
87
88
-
Remember that the i432 started development in 1975. In 1975 the user-based security model of computers was already known to be fundamentally broken and unusable and fifty years later every single computer is worse off for it, from the excuses it provides for destroying human agency to the horrifying performance of security solutions that actually work (Udon). ↩
-
please don’t boot off HDDs any more I’m begging you ↩
-
88
89
-
89
90
-
90
91
-
91
92
-
92
93
-
93
94
-
94
95
-
95
96
-
96
97
-
97
98
-
98
99
-
99
リーチ!
-
リーチ! ↩