Latest Posts (8 found)
Owen Lacey 2 months ago

"Are you the one?" is free money

OK, so this is niche. One of my wife's guilty pleasures is reality TV, usually ones centred around dating - the more American, the better. By extension, I absorb some of this noise and I'm happy to admit I can sometimes get invested. At one point, she was (let's face it, we were) watching a show called "Are you the one?" on MTV. I'm going to show you how this game is pretty much free money.

Consider a group of equal numbers of men and women. Each contestant has exactly one perfect match of the opposite sex that is pre-determined for them, as represented by the colours. Click the "Match" button to pair up the contestants correctly. Crucially, they don't initially know who their perfect match is. If the group can correctly guess all the perfect matches, they win a cash prize of $1M.

You probably have the follow-up question of how the perfect matches are calculated, which is a great question. In short: dunno, it's black-boxed, but let's just say "science"? How this is calculated isn't really the point - I could even argue that it doesn't matter so long as you get your strategy right. For what it's worth, the plot of the TV show mentions employing "the most extensive match-making process ever seen". Let's get into it.

There are two ways in which contestants can learn new pieces of information throughout the game: truth booths and match ups.

A truth booth is where a male and a female are chosen by the contestants, and it is revealed definitively whether they're a perfect match or not. So there are two potential outcomes: either they are a perfect match, or they aren't. If you've found a way to stream this and want to skip straight to the good stuff, I'd fast-forward to the fallout from these. In S1E6 it took Shanley an entire episode to come to terms with Chris T & Paige being a perfect match, even though in E1 she learned she was no match with him anyway (sigh).

At the end of each episode, all contestants match up and they are informed (via dramatic lighting) how many correct matches they've got. If they've got all matches, the game is over and they win. Crucially, they don't know which matches are correct, just how many they got in total. The only way they can definitively rule out a pairing is if they scored zero: the dreaded blackout. Though it might seem like a bad thing, a blackout can in fact be helpful in the long term, as it gives you a definitive answer for every pair that was matched up - it's like getting a free truth booth for each pair.

Much like a high school disco, let's put all the boys on one side and the girls on the other, and re-use the pairs from the match up example above. Here we have two correct pairs, red and pink, at positions 1 and 5 respectively. The orange man at position 2 was paired with the purple woman from position 6, and so on.

How good is a score of two? Is it any better than if you were to randomly pair people up? Let's experiment by doing just that: click the 'shuffle' button to re-pick. You'll notice that the average score comes out at around 1 after a while, which the line chart keeps track of. Below that is a chart capturing the frequency of each score; you'll notice it eventually converges to a specific shape. The height of each outlined bar is the probability of scoring that number in a random pairing in a game of 6 couples. Interestingly, both these probabilities and the average score stay the same no matter how many couples we use: whatever the selected number of couples, the probabilities stay the same.
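If you'd rather not take the shuffle button's word for it, here's a quick Monte Carlo sketch in Python (purely illustrative - the interactive charts above are doing the same job):

```python
import random
from collections import Counter

def random_match_score(n: int) -> int:
    """Shuffle who pairs with whom and count the pairs that land on their perfect
    match, taking the perfect matching to be man i with woman i."""
    matching = list(range(n))
    random.shuffle(matching)
    return sum(1 for man, woman in enumerate(matching) if man == woman)

n_couples, trials = 6, 100_000
scores = Counter(random_match_score(n_couples) for _ in range(trials))

print(sum(score * count for score, count in scores.items()) / trials)  # hovers around 1
for score in sorted(scores):
    print(score, scores[score] / trials)  # roughly the outlined bars in the chart
```

Footnote [1] explains why the shape settles where it does.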
There are tonnes of tangents we could explore that you might find interesting here [1], but for our purposes we just wanted to put some data behind "how good is a score of X?".

I created a model that computes the remaining viable matchings of all couples. By 'viable', I mean that there's still a chance that it's the perfect matching. Initially, as you can imagine, this is a big number. The aim of the game then becomes getting that number down to 1 as quickly as possible. Each time new information is learned, we recalculate the remaining matchings. For example, if we have a positive truth booth result, the remaining matchings are filtered down to only those that contain these two people as a pair. Conversely, if the truth booth result was negative, then the remaining matchings cannot contain any where these two are paired. Imagine a huge game of "Guess Who?" where each image is a viable matching and you flip down the options that become invalid each time you learn new information. Match ups also massively help you reduce this number, however their impact is a bit more indirect, and it's very difficult for a human brain to figure out the implications of one.
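To make that concrete, here's a stripped-down sketch of the filtering idea in Python - a brute-force version rather than my actual OR Tools model, with people represented as indices and a matching as a permutation:

```python
from itertools import permutations

n = 10  # 10 men and 10 women, as in Season 1

# A matching assigns woman `matching[man]` to each man.
# 10! = 3,628,800 candidates - big, but small enough to enumerate outright.
viable = list(permutations(range(n)))

def apply_truth_booth(viable, man, woman, is_match):
    """Keep only the matchings consistent with a truth booth result."""
    return [m for m in viable if (m[man] == woman) == is_match]

def apply_match_up(viable, pairs, beams):
    """Keep only the matchings that would score exactly `beams` correct pairs
    for the line-up `pairs` (a list of (man, woman) tuples)."""
    return [m for m in viable
            if sum(1 for man, woman in pairs if m[man] == woman) == beams]
```

Every truth booth or match up just shrinks `viable`, exactly like flipping down faces in "Guess Who?".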
Here is a graph of the remaining viable matchings in Season 1 as the season progresses. It may surprise you that in this game of 10 men and 10 women, the initial number of viable matchings is almost 4 million. Hovering over the dots will tell you what's responsible for each change in the remaining matchings. As you can see, they gain enough information to win the game by episode 8, so why does it take them so long to get it right? As mentioned earlier, it's almost impossible for humans to keep tabs on all these potential matchings, so it's very likely they just didn't know.

That being said, the graph itself isn't particularly useful, is it? After a couple of events, the line hugs the x-axis, and it's hard to see the difference between the 1 and 5,773 remaining matchings seen in episodes 8 and 2 respectively. Let's try a log base 2 graph instead. That's hopefully a lot clearer. You can see how they learn information as they go, and at which point the model 'cracks it' with the match up in episode 8. You can also clearly see that the most valuable piece of information they gained was the match up in episode 2, with a decent early score of 4. This might be intuitive to you, but as we found earlier, you've got a less than 2% chance of scoring 4 when randomly selecting.

Let's plot this again along with a few more seasons [2]. Other than S3 and S7, the contestants mathematically learn enough information to win the game with time to spare. Could they have got there sooner, though? Could they have chosen better truth booths and match ups to spare us all the extra episodes of trashy TV?

Before I get into this, I need to cover some basics of information theory. We're going to revisit the "Guess Who?" game now, which you can think of as a simplified version of "Are you the one?". Stick with me; the idea is that we can use the more straightforward game mechanics to establish an information-theory-based strategy that we can then apply to "Are you the one?". These two games are similar in that:
- There is a correct answer unknown to the player(s).
- The player(s) are able to learn information by offering up hypotheses, and getting definitive answers to them.

Consider an 8x8 grid of potential answers. Now, I'm a terrible artist, so I thought I'd articulate this more clearly with shapes instead of faces. There are 4 shapes, 2 different types (opaque or outlined), and 8 colours - that makes 64 unique combinations. The aim of the game is to guess the correct answer before your opponent guesses yours.

To give yourself the best chance of winning, you need to rule out as many answers as you can, as quickly as you can. Should you employ a strategy that splits the potential answers in half (e.g. "is it opaque?"), or something a bit more specific (e.g. "is it an orange star?")? The latter is high-risk, high-reward, whereas the former will almost always rule out half of the remaining answers.

Consider a bit of information as reducing the problem space by half - that is, ruling out half the remaining answers. I want to stress that "bit" here is the standard information theory term, rather than the vaguer everyday sense of the word. The "opaque?" question is a sure-fire way of gaining 1 bit of information. On the other hand, let's say you find out which shape the answer is, which allows you to flip down three quarters of the answers; that's the same as halving the problem space twice, and therefore gaining two bits of information.

In this example, you can see that some answers are more useful than others: "Opaque?" rules out half of the remaining answers (1 bit), whereas "Blue?" rules out 7/8ths of them (3 bits). Getting from 64 potential answers to 1 involves halving the problem space 6 times - 64 becomes 32, then 16, 8, 4, 2 and 1. In other words, if you're able to gain 6 bits of information, you'll know for sure what the answer is. This is supported by the fact that the sum of the information gained by asking all three questions above is 6. Let's simulate an actual game now, keeping tabs on the information gained throughout. Once everything but the answer remains, you'll have gained 6 bits of information and can be 100% confident in it.

Now that we know we need to get to 6 bits of information as quickly as possible, our strategy becomes picking the question that we expect to give us the most information. That is, for each outcome (true or false), take the information we would gain multiplied by the probability of that outcome, and sum the two. Let's work through our three questions to give the expected information for each. This table shows the expected information for each of our 3 questions. As you can see, the more "Hail Mary" the question, the lower the expected information. "Blue?" comes out at 0.54, which is only just over half the expected information of "Opaque?". Therefore, we can speculate that a decent strategy for this game would be to ask questions that split the remaining problem space in half. To support this, we can plot a graph [3] for all possible probabilities between 0 and 1. This shows that splitting the problem space in half (where the probability is 0.5) gives the highest expected information, and that asking a very specific question like "Blue?" is statistically the worst thing you can do.

Let's play one final game; this time I'll show you the questions ordered from most to least expected information. How did you do? You'll notice that picking the questions at the top of the list gets you to the answer quicker, whereas the opposite is true when picking from the bottom. You'll also notice that you're never presented with a question that gives you more than 1 bit of expected information, which is backed up by the graph above never going higher than 1.
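That curve is the binary entropy function from footnote [3]; a few lines of Python (again, just a sketch) reproduce the table values and show why the graph never exceeds 1:

```python
from math import log2

def expected_information(p: float) -> float:
    """Binary entropy: expected bits gained from a yes/no question
    that turns out to be true with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

print(expected_information(32 / 64))  # "Opaque?"                     -> 1.00 bits
print(expected_information(16 / 64))  # a 1-in-4 "is it a <shape>?"   -> 0.81 bits
print(expected_information(8 / 64))   # "Blue?"                       -> 0.54 bits
```

The function peaks at p = 0.5 and never goes above 1, which is why no single yes/no question can be worth more than 1 expected bit.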
Now that we've got a strategy that works well for "Guess Who?", we can get back to the proper game. Earlier on, I posed a (until now) rhetorical question about the performance of the contestants on the show. In order to answer it, we need two things:
- A way to measure performance: for this, we'll use the average bits gained per event. That is, each time there is a match up or truth booth, how many bits of information did they gain?
- A sensible benchmark: how do the contestants stack up against something that employed a strategy of randomly selecting match ups and truth booths?

For the benchmark, I simulated over 100 fake seasons of "Are you the one?" to see how much information was gained when the match ups and truth booths were selected (almost [4]) arbitrarily, which gave me an average bits-per-event figure for random play. Let's plot all the simulations on a graph, with trendlines for random and actual performance. The actual performance hits the x-axis sooner, meaning it's able to zero in on the perfect matching earlier. That's reassuring, right? Maybe love is real after all. That, or they're just performing better than someone shooting fish in a barrel.

Here are the numbers behind this comparison. The success rate is calculated as the number of seasons in which they're able to mathematically determine the perfect matching before the game finishes. As you can see, the success rate for the random simulation is higher than in real life, though the sample size of only 7 seasons of "Are you the one?" is undoubtedly too small for this to be a useful comparison.

Now that we know the contestants make better decisions than randomly selecting pairings, the remaining question is exactly how much better. To show this, we'll apply the information theory strategy we used for "Guess Who?" to this game. This simulation works in the same way as the random simulation; only the mechanism for selecting pairings is different. That is, the pairings selected for either a truth booth or a match up are the ones that are statistically likeliest to give the most information. Suppose we have calculated the expected information gained by each potential truth booth, like below: the model would then pick the pairing likeliest to give it the most information. Match ups work similarly, except that they aren't a simple true-or-false question. Instead, we've got to calculate the information we would gain for every score between 0 and 10 (where 10 is the number of couples), across every viable matching.
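Using the same brute-force representation as the earlier sketch (and, again, illustrative Python rather than my actual model), the selection rule looks something like this:

```python
from math import log2
from collections import Counter

def entropy(counts, total):
    """Expected bits gained when the possible outcomes occur with the given counts."""
    return -sum((c / total) * log2(c / total) for c in counts if c)

def truth_booth_information(viable, man, woman):
    """Expected bits from sending (man, woman) to the truth booth,
    given the remaining viable matchings from the earlier sketch."""
    hits = sum(1 for m in viable if m[man] == woman)
    return entropy([hits, len(viable) - hits], len(viable))

def match_up_information(viable, pairs):
    """Expected bits from a full match up: the outcome is the beam count,
    so take the entropy of the score distribution across viable matchings."""
    scores = Counter(sum(1 for man, woman in pairs if m[man] == woman) for m in viable)
    return entropy(scores.values(), len(viable))

# Strategy: evaluate every candidate truth booth and send in the best one.
best_pair = max(((man, woman) for man in range(10) for woman in range(10)),
                key=lambda p: truth_booth_information(viable, *p))
```

Match ups are scored the same way, just with eleven possible outcomes instead of two, which is exactly the "every score between 0 and 10" calculation described above.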
I ran this information theory simulation 41 times (for no other reason than I got bored waiting), and saw it perform significantly better than either the random simulation or the real-life data. Now we can compare all three scenarios. This means that all you need is a bit of code and a can-do attitude to perform better than the "vibes" approach of the contestants on the show.

Before you pop the champagne, we still haven't shown whether this is good enough to get to the perfect matching before we run out of time (or episodes). In a game of 10 couples, the problem space is 10! = 3,628,800 possible matchings (for brevity, you can take my word for this), which is roughly 21.8 bits of information. Spread over the truth booths and match ups available in a season, that sets the minimum average number of bits you'd need to gain per event to go into the final match up knowing for certain what the perfect matching is. Wait, isn't that a lower number than the random simulation achieved? Doesn't that mean that someone shooting fish in a barrel could win this game? I should stress that these are averages, and in 26% of random simulations they didn't get there in time. Hopefully you now agree with me that "Are you the one?" is free money, albeit with a near-perfect rather than guaranteed success rate.

I showed that even picking pairings at random will more often than not give you enough information to win the game, as well as showing how to use classic information theory techniques to get you there with episodes to spare. Maybe this haemorrhaging of money is what got the show cancelled in the first place, or maybe love is real - whichever you prefer.

This post is my first foray into content like this. I wanted to scratch the itch of an interesting maths problem with a light-hearted spin, and I hope you enjoyed it as much as I enjoyed making it. The techniques shown in this post are very common information theory approaches, though I was inspired to apply them by this video on Wordle by 3Blue1Brown. I very rarely watch YouTube videos over 10 minutes long (maybe that's my loss), but I wholly recommend this one if you found this interesting. Other than that, in my research I came across a board game called Mastermind, which has been around since the 70s. It has a very similar premise - think of it as "Guess Who?" on hard mode.

I also pitched this idea to The Pudding, and had a great experience with them nerding out about this subject. Though they didn't take me up on my idea, I left with really great and actionable feedback, and I'm looking forward to my next rejection. Next steps for me would be to see if I can make a web-based game (don't hold me to this) on this theme. I'm interested in how people intuitively make decisions based on the information gained so far, so the plan would be to find a way to capture that, and ideally make it fun. Finally, the code for my OR Tools model can also be found here.

[1] As the number of couples increases, the probabilities trend towards a Poisson distribution with λ=1. The probability of both 0 and 1 is also given by 1/e, which is a classic result in derangements, specifically the "hat-check problem".
[2] I omitted Seasons 2, 8 and 9. Each omitted season introduced different game mechanics, which would have been hard to take into account in my model. Maybe my model was too rigid and I'm a bad developer, or maybe it's just inappropriate to find commonality there. Season 2: one female contestant had two perfect matches, meaning there were two perfect matchings. Season 8: this season introduced gender fluidity; whilst an interesting problem in its own right, it would have wreaked havoc on my model. Season 9: one of the contestants left the show at an early stage, so the decisions made by the contestants would have been biased.
[3] This is known as the binary entropy function.
[4] I say "almost" here because I wanted this simulation to have some common sense. Specifically, if a pair had an unsuccessful truth booth, they wouldn't be paired up in any subsequent events. My reasoning is that no right-minded person would ever pair up people who can't be a match, as you would learn nothing new, and crucially it wasn't too arduous to code this into my random simulation model.

Owen Lacey 4 months ago

When vibe-coding went wrong

Recently, I've been coding more in my spare time. It's becoming dangerously close to what you might describe as a hobby, and it's one of those things where the more stupid side-projects I pick up, the more ideas for stupid side-projects I have. GitHub Copilot has been both a blessing and a curse for this; it's enabled me to mess around with things absent-mindedly, chucking things together with barely any forethought. Without it, I'm confident I'd barely do any coding in my spare time.

Do you just not enjoy coding? I love coding, but it's very "big brain", and after a day of braining I want to explore. Burnout is real, so if I wanted to keep doing this I needed to set myself some ground rules:
- Am I enjoying myself?
- Am I making progress?
If the answer to either of those ever became "no", play-time's over. Laptop shut. We go again another time.

I'd been wanting to build on my FPL tool for ages (the first iteration of which I wrote about in my maiden blog post here). To give a brief bit of background, I use Google OR Tools to pick a team that maximizes the points the players are predicted to score. I love this side-project because it combines my love of maths and football, and I enjoy doing the odd tech talk on the subject and meeting fellow FPL nerds before/after. The problem is that the FPL API's expected points (XP) model performed poorly for me last season - stupid robots! - so I thought I'd give making my own models a go.

Now, I'm not a data scientist by any stretch. In my tech talks I joke that I still think Python is a species of snake, which gets one or two polite chuckles at best. There was absolutely no way I'd be able to learn how to develop my own machine learning models in my spare time, so I thought I'd try a bit of vibe-coding to see what I could come up with.

Initial progress was amazing. I very quickly "vibed" (hate myself) with Claude models; maybe the excessive emojis gave me the serotonin hit that kept me going. Its hyper-positivity made me think I'd stumbled upon an absolute gold mine. In a single evening, I developed a high-performing suite of ML models that could relatively accurately calculate a player's XP for a given gameweek. I used almost my entire Copilot premium request quota for the month in a single weekend. Ground rules 1 & 2 were very much being respected so far.

From the title of the post, you know that 💩's gonna hit the 🪭, and that's exactly what happened. Fast-forward to the present: I've not touched this side project in months, and I can't see myself picking it back up - though I'm tempted to start from scratch ("don't do it, Owen!"). Now, let it be known that I'm definitely pro-AI, so keep that in mind as I rip it to shreds for the remainder of the post. The aim of this post is to be a cautionary tale, and hopefully to articulate some lessons learned.

If I were to describe vibe-coding to someone who's never used it before, I'd say it's like pair-coding with a super-eager junior developer with an astronomical velocity. The only difference is that its skillset far surpassed mine (especially in Python), so it was like a senior developer's know-how with perhaps a less-experienced developer's, well, experience. I could tell it "the sky is pink" and it would respond with "You're absolutely right!"; its tendency to challenge me was very small, which is dangerous when you haven't got a clue what you're doing.

Over-correcting quickly became a massive issue for me; I would tell it I liked README files so I could use that documentation later.
Before long, it felt like I had more README files than code files in the repo - not exactly the time-saver I was looking for. It coded extremely defensively, and the code was littered with feature switches and fallbacks as if I were supporting thousands of users that required backwards compatibility. As I fixed the issues, I found more and more hidden false positives that I'd need to take note of and come back to later. As the code became more and more branched, it became impossible to debug - and if it was hard for me, you can assume it also became hard for the model.

The worse things got, the more trigger-happy I got with the enter key. This quickly snowballed and my velocity plummeted. I'd heard Copilot performs better if you're mean to it; this sadly wasn't the case for me. I'd gone from building multiple ML models in a single evening to spending an entire weekend (well, the parts when the two-year-old was in bed) trying to retrain the models based on what I thought would be a small change. Technical debt is real; Copilot had made it so the entire lifespan of a project, life and death, could be condensed into a much smaller time period.

At the very least this has been therapeutic for me, but how can I avoid stuff like this in future? I want to keep coding in my spare time, but ideally I'd have something to show for it. If I were to summarize my lessons learned, I'd probably say:
- Start small: get an MVP working end-to-end as fast as you possibly can. Maybe don't start with 6 pristine ML models all at once. Copilot tends to recommend waterfall-style approaches, so be wary of the plans it generates and ask yourself what's going to bring value earliest.
- Micro-manage: the less your hands are on the wheel, the harder it's going to get. It's very easy to just go "yeah yeah whatever" as you get less interested. Next time, I'd sooner stop than start spamming 'Continue'.

I hope this hasn't dissuaded you. I've vibe-coded since with more success, and I'm sure I will do in future. The side-project in question is here - take a look for yourself, warts and all; it's an absolute state. Let me know your vibe-coding pitfalls, I'd love to hear them!

Owen Lacey 6 months ago

Micro-managing my time

I've been at my new place for 3 months or so now, and I'd say I'm pretty settled in. Over the last few months I've prioritised building relationships with my new coworkers whilst being helpful where I can. Whilst feedback has been good, I've struggled to switch off at the end of the day. The amount of unknowns you have when starting somewhere new is difficult to ignore, and I found myself losing sleep over all the loose threads that weren't tied up at the end of the day. Additionally, my morning standups were a chaotic stream of consciousness as I attempted to remember what I did the day before. It got to the point where I wasn't happy with how I was getting on.

One thing I hadn't even remotely nailed was a decent system for organising my notes, my TODO list or tracking my time, the last of which I'm going to go into detail on here. I'm not advocating for or dissuading anyone from using a particular tool; it's more the process (or lack of it) that was causing me these problems. There is a happy ending here: I'm a few weeks into micro-managing my time, and the effect has been huge both in and outside of work.

Before I go into what I did, I'll add a bit more context. I work for a consultancy, so I'm already comfortable with the idea of tracking my time and why that's important to the company I work for. I'm in a fairly simple position where I'm 100% allocated to a client, and my rate is the same no matter what I'm working on. So long as I'm doing "stuff" and being helpful, we're all good. Here's what I wanted to record:
- ⏱️ Timesheets: everything over ~15 minutes gets tracked.
- 🔨 Activities: timesheets are assigned to things I do regularly, e.g. standups, coding, 1-1s. I'm already up to over 50 of these.
- 🚩 Responsibilities: each activity belongs to a bucket that I believe is one of my role responsibilities: "Project day-to-day", "Company big-picture", "Line Management" to name a few. Each responsibility has a description to help me decide what's the most appropriate fit for an activity.
- ⚡️ Topics: timesheets are also related to optional topics, for example a specific feature my team is trying to deliver.

Tracking everything helped me focus on a single task at any given time. If I felt like I didn't want to record something, that was usually a good sign I shouldn't be doing it. I decided to use Notion for this, and like a true engineer I put together an ERD to model it.

Recording my time in this way allows me to report on how much time I'm spending on each responsibility (via activities) or topic. It also shows me how many different plates I'm spinning on a given day; I learned I'm doing at least 10 different things every day, and very rarely spending more than an hour on a given task. Additionally, I'm spending over 50% of my time on "Project day-to-day" activities, and only 10% on "Project big-picture" sort of stuff. I reckon I can kill two birds with one stone here and start carving out larger chunks of time in my calendar for focus time, with the aim of getting the "Project big-picture" time closer to 25%. Now that I've got a robust way of tracking my time that I actually enjoy using, I'm confident I can make that happen.
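For the terminally curious, the data model above boils down to something like this - a rough sketch in Python dataclasses, with names that are illustrative rather than lifted from my actual Notion setup:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Responsibility:
    name: str           # e.g. "Project day-to-day", "Line Management"
    description: str    # helps decide which bucket a new activity belongs in

@dataclass
class Activity:
    name: str                        # e.g. "standup", "coding", "1-1s"
    responsibility: Responsibility   # each activity rolls up into one bucket

@dataclass
class Timesheet:
    activity: Activity
    start: datetime
    end: datetime
    topic: str | None = None         # optional, e.g. a specific feature

    @property
    def hours(self) -> float:
        return (self.end - self.start).total_seconds() / 3600
```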

Owen Lacey 8 months ago

I made my website show what I'm listening to on Spotify

I've recently found myself as a lead developer on a Next.js project. There's a .NET API that acts as the backend, which is much more within my comfort zone, but I'm definitely finding myself on the less-prepared side of things on the frontend. Server-side rendering has always been a bit mysterious to me: I struggle to properly grasp the benefit of something until I've got a use-case that's meaningful to me. With quite a lot of my experience being in business-facing enterprise web apps, the benefits Next.js adds are usually fairly low down on the priority list, so it was easier for me to dismiss it as just the latest fad in web development. Until recently, when something clicked.

Fairly often, I'd have an idea for a personal project, and due to my experience as a full-stack dev it's not long before I start thinking about things like databases & integrations. This is what usually stops me: when an idea becomes multiple executable applications, not to mention their own CI/CD requirements, it feels like too much hassle to do in my spare time. It was only when I started thinking of a Next.js application as two separate applications - backend and frontend - rolled into one that I immediately found a use case I was excited about.

I thought it would be cool for my website to show what I'm currently listening to on Spotify. From their API docs it seemed like a relatively straightforward integration. When playing around with their sandbox I was horrified to learn that Five Little Ducks had made it into my top 5 songs of all time (don't have kids). All I needed to do was generate a client id / secret and start messing about with it; the only problem is that I don't want to expose any secrets in my website. In this case, it's quite easy to find the secrets by checking the network tab for requests to the Spotify API. Now, I can't imagine you're going to do much more damage to my music library than my two-year-old currently is, but it still isn't good practice.

I decided I'd migrate my React application to Next.js so that this integration could take place entirely on the server and I could keep my secrets just that: secret. This guide for migrating my app to Next.js took me about 30 minutes; the only obstacles were fixing the ESLint errors introduced when following the guide. Because my CI/CD pipeline is a single build-and-deploy step, I had my website migrated over to Next.js in no time. I've been developing long enough to know how rare it is for a migration to go this smoothly.

Job done, right? No. Just because I've migrated my app to Next.js doesn't mean the necessary parts are server-side rendered. Indeed, those pesky token requests continued to happen, because I'd bundled my whole app on the client using the 'use client' directive. I reached a fork in the road here where I felt I needed to choose between the following:
- Move the boundary between server and client to a higher level, so I can be pickier about which components are rendered where. (At the point of writing, the only responsive part of my application is the floating objects - see here - and the 'fun fact' shuffle thing; the rest is just static text.)
- Use server actions to make an authenticated request to Spotify under the hood, without exposing the configured secrets.

I decided on the latter because it pretty much solves my problem entirely without rearchitecting my (albeit small) application. Most of my website is static text, and rendering all of it on the server side sounds like a problem for future Owen.

The next step was to make an HTTP request to the Next.js backend so it could make an onward authenticated request to Spotify. I created an API route for this. Not so fast - computer says no:

⨯ API Routes cannot be used with "output: export". See more info here: https://nextjs.org/docs/advanced-features/static-html-export

Indeed, removing the "output: export" setting from my Next.js config sorted the issue.
As much as I'd love to tell you I had some well-founded reason for doing this other than just to get it to work, I'm just trying to prove the concept at this stage. Now, if my years of software engineering have taught me anything, it's to be wary of making changes to anything with "config" in the name, so I pushed the commit and checked I could still deploy the site. I had already checked I could deploy since the Next.js migration, so any new issues would have to be down to this change to the config file. My hunch, sadly, paid off. Serves me right for blindly changing things without understanding the implications of doing so.

So what did I actually do, and how did this break the site? Let's compare the output directory with and without this setting in the config: the LHS is with "output: export", the RHS is without. Crucially, only the left-hand side contains the statically exported HTML, which explains why the site previously loaded fine as plain static files. Adding the following file resolved the issue for me, and hey presto, the API route is working in production. Now I can replace this hello-world endpoint with an implementation against the Spotify API, and my client side can just call my new API route without exposing any secrets.

Edit: I noticed that the API route's response would never change when I skipped the song on Spotify, which wasn't ideal. Looking into it, I found that it was because I wasn't using Dynamic APIs, and adding `cache: 'no-store'` fixed the problem!

Now I've got a working API route, with secrets nicely hidden. All that's left is to throw together some questionable CSS and get something pretty showing on my homepage. I won't go into the weeds of that, but you can take a look for yourself.

Owen Lacey 8 months ago

I made an AI Sudoku Solver

I regularly use Sudokus as a "Hello World" use case for what you can do with Google OR Tools. I decided to take it a few steps further to see if I could create an end-to-end solution for solving a Sudoku in the quickest way possible. I wanted to expand my OR Tools Sudoku solver into an AI pipeline, where the OR Tools implementation is one part of a string of events that solves a Sudoku pretty much instantly.

I'm by no means a trailblazer here. If you google "Sudoku solver" or similar, you'll see loads of results that are all pretty much doing the same thing. In fact, the code for parts of the problem is based upon (shamelessly plagiarised from) this github repo. I would also recommend watching this youtube video, where the author goes into detail about how he approached the problem, which I found really interesting. You can find the code for my implementation here, and feel free to try out the app for yourself here! Here's a quick demo in its current state.

I decided to break my problem down into three stages:
- Sudoku detection: given an image, return at most 1 Sudoku present in the image.
- Digit OCR: sub-divide the Sudoku into the 81 cells that make up the 9x9 grid, and run OCR to find which digits are already there.
- Solving: send the unsolved Sudoku to Google OR Tools and show the solution to the user (ideally with a sexy CSS animation).

Breaking it down like this allowed me to solve each part in isolation, using the most appropriate solution for each stage. Here's a visual breakdown of what I'm doing in each step.

Starting with Sudoku detection: at the end of this stage, the aim is to have a region representing a Sudoku. I could have used many different approaches here, and it might surprise you to learn that I didn't even go with an AI-based solution for this stage. I tried to train my own object detection model, but it performed really poorly. As obvious as it seems in retrospect, the only thing any two Sudoku puzzles have in common is the 9x9 grid containing the digits for that specific puzzle. I wouldn't describe myself as an expert in training object detection models, so perhaps I could have persevered here, but the amount of false positives I got from my model made me crawl back to the drawing board. This was a nice refresher that AI isn't always just magic, and sometimes rolling your sleeves up and doing it manually is the best (and possibly most rewarding) approach.

That being said, a manual approach has its drawbacks as well. For example, my approach doesn't handle Sudokus in dark theme, nor would it be able to find a Sudoku whose border happened to be dotted or dashed. It works brilliantly on the £1.99 puzzle book I bought from Asda, and that's good enough for me.

Now, into the approach. Put simply, we need to find the biggest square in a given image. This doesn't need to be a perfect square, as it would be practically impossible for a human to take a photo of a Sudoku such that the area it takes up on the image is perfectly square. We'll therefore add tolerances to sanity-check what the application thinks might be a square, rather than checking that the region is exactly square.

To find our square, we'll iterate through the pixels in the image. For each pixel, we'll return all the other pixels which are "connected" to it, i.e. there exists a path between the pixels such that all pixels along the path are the same colour (the same hex code). Our square is the largest connected region that lies within the tolerances mentioned above. This presents a problem in that, to the naked eye, two pixels that look the same colour are very likely to have different hex codes. We therefore need to apply a process called adaptive thresholding, where we reduce the image to one in which each pixel is one of two colours: white or black.
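My actual implementation runs in the browser, but the idea is easier to show in a few lines of Python with OpenCV - purely illustrative, with a made-up filename and hand-picked parameters:

```python
import cv2
import numpy as np

# Load the photo as greyscale and threshold it so every pixel is pure black or
# pure white, judged against its local neighbourhood rather than one global
# cut-off (that's the "adaptive" part).
image = cv2.imread("sudoku.jpg", cv2.IMREAD_GRAYSCALE)
thresholded = cv2.adaptiveThreshold(
    image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)

# Label the connected regions of white pixels and keep the biggest one
# (label 0 is the background, so skip it). In a typical photo this is the
# sudoku's outer grid.
num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(thresholded)
largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
grid_mask = (labels == largest).astype(np.uint8) * 255
```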
This polarisation of pixels allows us to find a collection of points, all of the same colour on the thresholded image, that is most likely to contain a Sudoku. Here's what an image of a Sudoku looks like when adaptive thresholding is applied. Looking at this image, it's easy to see what the largest region of connected white pixels would be, and that's exactly what our program was able to find.

At this point, we have a region containing a Sudoku. However, this region needs to be square so we can easily divide it into smaller squares where we can expect to find a digit. If we were to subdivide the image now, it would look for digits in the wrong place; here's an image of me sub-dividing the region manually to illustrate that point.

We need to transform our image into a square. Specifically, this needs to be a homographic transform. Once we have our transform, we can construct a new image that's perfectly square and ready to sub-divide. The first step is to find what we think are the corners of our Sudoku. To do this, for each corner, we find the closest point in the region to it using Manhattan distance (I added a bit of padding to the region to illustrate how Manhattan distance works). We then need to map these four points onto the corners of a square. This is where we can get pretty deep into the maths of it all, but for brevity you can determine the homographic transform using this guide (note: I used mathjs for the matrix operations). Once we've got the transform, we can show our Sudoku as a square, ready for further processing.

Now that we've got a Sudoku, we need to be able to say which digits are already there, so we can solve the remainder of the puzzle in stage 3. Firstly, we need to find which cells contain a digit, so that we can process each sub-image and determine what digit is in it. For this stage, I went with a pre-trained TensorFlow model. Once sub-divided, we can re-use the code from stage 1 to return the largest connected region of each cell. We can again use tolerances to flag a cell as containing a digit if the found region is of a large enough size (again, no AI yet).

At this point, all the AI can take place in the browser thanks to the magic of tensorflowjs. Now, this model isn't mine; it was made by the same person I linked in the introduction of the post. One thing that surprised me was how lightweight an AI model can be - so lightweight, in fact, that you can bundle it in your application and use it directly in the browser, and that's exactly what I did. If you didn't know already, you can find a heap of pre-trained models online in TensorFlow Hub. I tried a few models that matched the search term "digit recogniser", but this one worked the best.

Right, I've used the AI buzzword - now I can get onto my favourite bit: OR Tools. To save me re-introducing it every post, I give a summary of what it is here. Because there is always exactly one solution to a Sudoku, we can't brand this as an optimisation problem, but it's a great example of finding the single feasible solution among a huge number of infeasible ones. Now that we have a digital representation of the Sudoku, I can create a variable for each blank cell with possible values of the integers 1 to 9, and with just a few more lines of code, let OR Tools do the rest. Circling back to what I said at the top of this post: this thing needs to be fast.
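Those "few more lines" look roughly like this with OR Tools' Python CP-SAT API (a sketch of the idea rather than my actual solver code):

```python
from ortools.sat.python import cp_model

def solve_sudoku(grid):
    """grid: 9x9 list of lists, with 0 for blank cells."""
    model = cp_model.CpModel()
    cells = [[model.NewIntVar(1, 9, f"cell_{r}_{c}") for c in range(9)] for r in range(9)]

    # Pin the digits we already OCR'd.
    for r in range(9):
        for c in range(9):
            if grid[r][c]:
                model.Add(cells[r][c] == grid[r][c])

    # Rows, columns and 3x3 boxes must each contain distinct digits.
    for i in range(9):
        model.AddAllDifferent(cells[i])
        model.AddAllDifferent([cells[r][i] for r in range(9)])
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            model.AddAllDifferent([cells[br + r][bc + c] for r in range(3) for c in range(3)])

    solver = cp_model.CpSolver()
    assert solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE)
    return [[solver.Value(cells[r][c]) for c in range(9)] for r in range(9)]
```

CP-SAT makes short work of a puzzle like this, which is what makes the "pretty much instantly" goal realistic.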
Here's how quick it is for an expert-level Sudoku. What started as a simple "Hello World" demo with OR Tools turned into a really rewarding journey into a completely new area for me: image processing. My main takeaway is that just because you can use AI to solve a problem doesn't mean you should - manual methods still have their place, especially when they just work. It's definitely not perfect, but it was great fun to build - and I learned loads doing it. If you want to try it out yourself or poke around the code, you can find all the links you need at the top. Thanks for reading!

Owen Lacey 10 months ago

Switching off my consultancy brain

Last week I did something scary. I left a job that I loved after 8 years for something different - a fresh challenge. From what I hear, it's pretty common for people to take some time off work between jobs by using up remaining annual leave or taking some time unpaid. I'm no exception here; I'm currently half-way through my self-proclaimed "micro-sabbatical" of 10 whole days, and it's brutal.

During my notice period, I built up a list of things I wanted to get done during this week-and-a-bit. Like all good software projects, this list is a backlog of musts/shoulds/coulds. I "must" make a plan for Mothers' Day; I "could" fix the under-counter lights in the kitchen. My wife and I are blessed with a bouncy two year old (Reuben), and we blessed ourselves almost 7 years ago by getting a dog (Toby). Though very much self-inflicted, these dependents meant my free time was suddenly at a premium. I'm sure you don't have much sympathy for this situation (I don't either - stop whining, Owen); I've loved taking Reuben and Toby on adventures this week, but it took me less than half a day into my micro-sabbatical to realise this list isn't going to get done.

After 8 years of deadlines where time is king, I felt myself wanting to update someone on my progress to manage their expectations. I was half an hour late picking up my wife from work on Monday for the simple reason that I just wanted to have a cup of tea that I hadn't given myself the chance to have yet. Yesterday, after a morning of manual labour in the garden, my hairdresser said I looked shattered and prescribed me a nap when I got home. Did I take a nap? Don't be stupid. I hope no-one from my new work is reading this, but I'm arguably working harder than I was last week when I was getting paid for it.

There's a difference between "free time" and "alone time", and this has become a chasm since having kids. My wife said herself that her productivity in her free time has massively increased in the last two years; she's like a whirlwind the second Reuben goes down for a nap. It's almost like you need to have time taken away from you to realise how you want to spend it.

Today is Wednesday and I'm going to try to take things a bit slower. Writing this post is a good example of a "could" that I'm proud of myself for doing. In fact, this wasn't even in the backlog - talk about scope creep! This weekend I'm going to London to visit some good friends ahead of starting at my new place on Monday. Between now and then, I hope to indulge myself further.

Owen Lacey 10 months ago

I over-engineered my website's personalisation

A recent milestone for me is finally putting together my personal website. Though I'm not expecting this website to be much more than a pointer to wherever else you might find me, I wanted it to show my personality and give some examples of things I find inspiring. If you've checked out the website, you'll have noticed the floating icons distracting you from the content I've put on the site. I don't care; they're far cooler than anything I would have to say.

These, if you didn't know, are called Penrose triangles. These triangles are impossible in Euclidean geometry, and are the sort of optical illusion that might remind you of the artist Escher's work. For me, they are the simplest way to make your head spin using just a pen & paper; if you've spotted me doodling in meetings (me? never!), I'm probably drawing some of these - I think they're brilliant.

Enough nerding out. After all, the point of this post is to go through how I added them to my site - more specifically, how I made them do that stuff with pure CSS. Here's a quick overview of my approach:
- I made the triangles myself on Procreate. I enjoy messing around on the family iPad, and figured it might come in handy having these assets in future.
- I used SASS for styling, which compiles into CSS, so that I can utilise various at-rules to make this CSS-only.
- Penrose triangles aren't the only thing I find cool, so I wanted the app to iterate through all the floating icons that might be bundled in my app and randomly decide on an amount of them to use. This way, I might have time to create some Möbius strips, Rubik's cubes, Poké Balls, whatever.

I wanted the icons to idly float around the background of my website in random directions. At the same time, I didn't want my website to be full of loads of hacky javascript code, so I thought it would be fun to try and do this in pure CSS. Without the benefit of hindsight, I decided on a name for these icons that I now regret. Regardless, it reminds me of one of my favourite quotes:

There are only two hard things in Computer Science: cache invalidation and naming things - Phil Karlton

With my consultancy hat on, here was my acceptance criteria:
- Icons are randomly positioned on the page.
- Icons float around in random directions.
- Icons have a random degree of rotation.

First, I wanted to prove the concept: that I could apply custom behaviour to each icon without writing massive amounts of duplicated CSS. I know that I have a unique identifier to hand when iterating through the icons, but how do I pass that through to the styles? I'm not much of a front-end developer, but I like using the BEM methodology for styles, so my DOM looks like the below. Now that each icon has its own unique class name, I can style them differently. I can also use a SASS at-rule so that I don't have to duplicate the styles for each one, which means each consecutive icon gets a little bigger than the last, like so.

Now that we know we can style them differently, each icon can have its own animation. Let's start small: simply moving up and down, so that each consecutive icon moves down 10 more pixels than the last. This was the final thing I needed to do to prove the concept. Now to actually make it look good...

I decided to use mixins to keep the styles within the class declaration nice and succinct. Here's a mixin for random positioning: it uses SASS's random function to give each icon a random starting position, which dotted the icons around nicely.

Here's the code to make the icons spin. I went with this because:
- Each icon has its own starting degree of rotation, so they aren't all rotating uniformly.
- It makes them appear to move around more aimlessly.

Here's how it looks - we're getting there. Finally, I wanted them to float around like little lost asteroids, so I decided to make them take a little journey and return to their original place whilst rotating. All the magic numbers in the CSS are based on what felt right when playing around with it locally. Since this was a bit of fun, and I'd sunk plenty of time into this already, I decided it was good enough to chuck into production.

As you saw above, my SASS code is pretty questionable, but here's how my function component shaped up. Told you! No hacky javascript 😎. I needed two nested elements so that each could carry one of the two animations. Here's how my site looks without those styles imported: pretty boring, right?
Lastly, and most importantly, I need to consider people accessing my site with accessibility requirements, so I wrapped the CSS properties accordingly. As a treat for getting to the end, why don't you try a triple-click on my homepage and see what happens...

Owen Lacey 10 months ago

I let Google OR Tools pick my FPL team

First of all, I'm glad you're here. If we had a Venn diagram of "software developers" and "people interested in FPL", it'd likely not be much more than us two. Rest assured, if you're only in the former camp, I hope there will still be some things of interest. Also, disclaimer: I am by no means an FPL expert. I'm sitting rock bottom of my league with friends, so think less Liverpool, more Southampton. Perhaps if it were possible to predict what would happen in a game of football (assuming I'd even be smart enough to do so), it wouldn't be the most popular sport in the world. The code for this post can be found in my Github repo here.

For those that just want FPL tips, here is (probably) the best FPL team, not that it'll get you very far. This team is picked based on how many points each player is predicted to score (which I'll go into later).

FPL means Fantasy Premier League. It's a game where you pick a team of players, and your team scores points based on real-life events in Premier League games. Here are the rules:
- Pick 15 players: 2 goalkeepers (GK), 5 defenders (DEF), 5 midfielders (MID), 3 forwards (FWD).
- You can't have more than 3 players from a specific team.
- The cost of your team can't exceed the budget of 100M.

There are around 700 players across 20 different clubs - that's a lot of possible teams. To save this becoming a post on discrete maths, let's just say it's a lot. But what's the "best" team to pick? What criteria do we use to make this decision? This is what's known as a "combinatorial optimisation" problem, and that's where Google OR Tools comes in.

Google OR Tools is an open source library that allows you to solve problems like this. You can use it in Java, C++, Python and, for our purposes, in C#. The documentation linked here is decent, so I won't go into too much detail as to how it all works. For our use case, we're essentially going to ask OR Tools to solve for a set of integer variables - variables which must take an integer value between the provided upper and lower bounds. We'll give ours bounds of 0 and 1, so that if a player is picked, their corresponding variable has a value of 1. The complexity comes in telling the model what decisions to make based on a player being picked. I found it useful to create a child class of the OR Tools model so I can easily keep track of the variables I want to add constraints for: each entry contains the player's details as well as their selection variable.

Now, let's add our constraints. We need to make sure we have the right number of players in each position, which we can do using the properties on our child class. The sums here are OR Tools expressions that add up arrays of variables - think of them as "computed" fields, dependent on other variables. Adding each of these to the model gives us a hard constraint, meaning that if a solution were found that picked only defenders, it would be infeasible.

Our team lookup is a dictionary, keyed by team name, so we can tell the model not to pick too many players from the same team in the same way.

The budget constraint is a bit more involved, as we're not simply summing up 0s and 1s; we're summing up player costs based on those values. If you've done any work with vectors before, you might have come across dot product operations. We're going to use one to sum up the cost of all selected players: the dot product of their selection variables and their costs. The player type keeps these pieces of information together (we'll get to the predicted points later), so we can easily sum up the costs of selected players. Note: player cost is represented in units of 100,000 - e.g. if a player costs 7.9M, the value used is 79 - and the weighted sum gives us the dot product described earlier.
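My actual implementation is C#, but OR Tools exposes the same modelling API in Python, so here's a rough sketch of the constraints above - the player data is made up, and the real data set would come from the FPL API:

```python
from ortools.sat.python import cp_model

# Illustrative player data: (name, position, club, cost in £100k, predicted points).
players = [
    ("Salah", "MID", "LIV", 134, 9.1),
    ("Wood", "FWD", "NFO", 70, 5.2),
    # ... ~700 of these in the real data set
]

model = cp_model.CpModel()

# One 0/1 selection variable per player: 1 means "in the squad".
selected = [model.NewBoolVar(p[0]) for p in players]

# Squad shape: 2 GK, 5 DEF, 5 MID, 3 FWD.
for position, count in {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}.items():
    model.Add(sum(s for s, p in zip(selected, players) if p[1] == position) == count)

# No more than 3 players from any one club.
for club in {p[2] for p in players}:
    model.Add(sum(s for s, p in zip(selected, players) if p[2] == club) <= 3)

# Total cost within the 100.0M budget (costs are in units of £100k, so 1000 = 100M).
model.Add(sum(s * p[3] for s, p in zip(selected, players)) <= 1000)
```

Each `model.Add(...)` is one of the hard constraints described above - any solution that violates one is simply infeasible.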
At this point, we have a feasible solution to the problem: I could go into the web app and click "Save" without any validation preventing me from doing so. If this were purely a constraint-solving problem (doing a Sudoku is a good example of this), where no two solutions are better than one another, we'd be done. However, we want the best team: I want to wipe the floor with my friends and be the envy of FPL merchants everywhere. In order to say an FPL team is the best possible team, the model needs to be able to compare two solutions and say one is better than the other. OR Tools then searches the feasible solutions, labelling the one with the best score as the optimal solution.

In order to change this into an optimisation problem, we need to give the model an objective. This is where it gets quite squishy. If we knew for certain how many points a player would score in a given game - other than that being a very boring prospect - I'd be doing a lot better in my FPL leagues than I currently am. So what do we go with?
- Selection-y-est: we could pick the players most selected by other players. Using this hive-mind approach would mean we're more likely to pick the no-brainers, but the model wouldn't give us any hipster choices that would make other players go "how did I not think of that?!".
- Point-y-est: we could pick the ones who've scored the most points so far. This would make a lot of sense, but would not select players who've recently transferred, or have been out injured for a while.
- Form-y-est: FPL has a concept of "form". However, it's a bit black-boxed, and it's not very clear how it is calculated, and therefore how it corresponds to the amount of points a player might score.

How about some combination of the above? I landed on putting together a very ropey linear regression model built with ML.NET. I won't go into how this model was built, but essentially it uses past performance to predict the points scored by a player in the upcoming gameweek. Inexplicably, this model was obsessed with Georginio Rutter, a very average player by FPL standards. Reassuringly, it consistently predicted that Mo Salah would score highly, so I decided it was "good enough". If there were reliable, publicly available information about a player's expected points (XP), I would swap out my linear regression model without hesitation.

Here's how I told the model to pick me a team with the highest predicted points. The important bit is the objective: it means the model's solution will be the one with the highest total predicted points.

Let's put all this together. Here are the top players in terms of predicted points for the upcoming gameweek (this is GW26 on 22nd Feb 2025). An immediate observation is that we won't be able to pick all the players in this list, due to the fact that we can't have more than five midfielders. Chris Wood is the first FWD to appear, in at #12, with Erling Haaland in at #17 (I hope not - he's playing Liverpool this weekend). Let's compare that with the team my model picked, shown here in table form. Let's check our rules:
- ✅ 2 GKs, 5 DEF, 5 MID, 3 FWD
- ✅ Max 3 players per team
- ✅ Total team cost: 90.8M

The model predicted a healthy points total for this team. Narrator: they did not score that many. They scored 70 points - still comfortably more than I've been averaging per gameweek, so this would be a big change in fortunes. I'll need to choose four players to go on the bench here, so I'd knock a chunk off that and still be happy. As you can see, the model ended up picking the five best midfielders. It may look like it picked any old goalkeepers, however the two listed here are in fact the highest ranked, followed by Martin Dúbravka at #98.
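Before getting into the caveats, here's the missing piece of the earlier Python sketch: the objective and the solve call (again illustrative, not my actual C# code):

```python
# Objective: maximise the total predicted points of the selected players -
# the dot product of the selection variables and the XP column. CP-SAT wants
# integer coefficients, so scale the (made-up) predictions by 10.
model.Maximize(sum(s * round(p[4] * 10) for s, p in zip(selected, players)))

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    squad = [p[0] for s, p in zip(selected, players) if solver.Value(s) == 1]
    print(squad)
```

That's all the objective changes: the solver now prefers the feasible squad with the highest total prediction.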
As mentioned a few times - and it isn't just self-deprecation - this application is far from perfect. I'd love to be able to tell you how I've soared up the FPL rankings since I started doing this, but that's not the case. These are the changes I'd like to make to the model in future.

Bench players: the FPL squad has a size of 15, but you only score points for the 11 players you pick for the starting lineup. Therefore, why waste budget on benchwarmers that don't score me any points?

Transfers: you get a free transfer every gameweek, and these accumulate if you don't use them. Any further transfers cost you 4 points. You get a wildcard twice a season which gives you unlimited transfers, but day-to-day gameweeks need to support the scenario of "I have these players and X free transfers - tell me what transfers to do".

Better XP data: as mentioned above, this solution is only as good as the XP data it's fed. I need to either work on my linear regression model, or find some XP data that's less fixated on Georginio Rutter.

Looking further ahead: at the moment, we predict the points based on the next game only, but sometimes we have double-gameweeks, i.e. a player plays twice. It also might be that a specific player has a particularly good run of easier opponents coming up, so getting them in the team now might make sense, even if they wouldn't do as well as someone else in the next game.

There are a lot of avenues (or rabbit holes?) we could go down for this, and there are also plenty of third-party websites out there that facilitate FPL admin tasks. All things considered, I found this to be a really fun application of OR Tools. You can find the code for this post on my Github repo here.
