Posts in Analytics (20 found)

How You Read My Content

A week ago, after chatting with Kev about his own findings, I created a similar survey (which is still open if you want to answer it) to collect a second set of data, because why the heck not. Kev's data showed that 84.5% of responses picked RSS, Fediverse was second at 7.6%, direct visits to the site were third at 5.4%, and email was last at 2.4%. My survey has a slightly different set of options and allows for multiple choices—which is why the percentages don't add up to 100—but the results are very similar:

- 80.1% read the content inside their RSS apps
- 23.8% use RSS to get notified, but then read in the browser
- 10.7% visit the site directly
- 4.9% read in their inbox

This is the bulk of the data, but then there's a bunch of custom, random answers, some of which were very entertaining to read:

- 1 person said they follow on Mastodon, and I am not on Mastodon, so 🤷‍♂️
- 1 person left a very useful message in German, a language I don't speak, which was quite amusing
- 1 person lives in my house and looks over my shoulder when I write
- A couple of people mentioned that they read via RSS but check the site every now and again because they like the website

So the takeaway is: people still love and use RSS. Which makes sense: RSS is fucking awesome, and more people should use it.

Since we're talking data, I'm gonna share some more information about the numbers I have available, related to this blog and how people follow it. I don't have analytics, and these numbers are very rough, so my advice is not to give them too much weight. 31 people in the survey said they read content in their inbox, but there are currently 103 people subscribed to my blog-to-inbox automated newsletter.

RSS is a black box for the most part, and finding out how many people are subscribed to a feed is basically impossible. That said, some services do expose the number of people subscribed, so there are ways to get at least an estimate of how big that number is. I just grabbed the latest log from my server and cleaned the data as best as I could to eliminate duplicates, and also entries that feel like duplicates. In some cases it's obvious that two entries are the same service and at some point one more person signed up for the RSS feed. In other cases all the IDs are different, and what should I do? Keep them all? Who knows. Anyway, after cleaning up everything and keeping only requests for the main RSS feed, I'm left with 1975 subscribers, whatever that means.

Are these actual people? Who knows. Running the exact same log file (it's the NGINX access log from Jan 10th to Jan 13th at ~10AM) through GoAccess, with all the RSS entries removed, tells me the server received ~50k requests from ~8000 unique IPs. 33% of those hits are from tools whose user agent is marked as "Unknown" by GoAccess. Same story when it comes to reported OS: 35% is marked "Unknown". Another 15% on both of those tables is "Crawlers", which to me suggests that at least half of the traffic hitting the website directly is bots.

In conclusion: is it still worth serving content via RSS? Yes. Is the web overrun by bots? Also yes. Is somebody watching me type these words? Maybe. If you have a site and are going to run a similar experiment, let me know about it, and I'll be happy to link it here. Also, if you want some more data from my logs, let me know. Thank you for keeping RSS alive. You're awesome.
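A rough sketch of the kind of estimate described above: many hosted feed readers advertise their subscriber count in the User-Agent (e.g. "Feedbin feed-id:42 - 156 subscribers"), so one crude approach is to extract those counts from the access log and keep the maximum per service. The log lines, service names, and the regex here are illustrative assumptions, not the author's actual cleaning script; real deduplication (different feed IDs from the same service, and so on) is much messier.

```python
import re
from collections import defaultdict

# Many feed services advertise subscriber counts in their User-Agent.
# Example UA strings below are made up for illustration.
SUBS_RE = re.compile(r"(?P<agent>[\w.-]+).*?(?P<subs>\d+)\s+subscribers", re.I)

def estimate_subscribers(user_agents):
    """Sum the highest reported count per service name.

    Deduping by service name alone collapses distinct feed IDs from the
    same service, which over-merges; it's only a rough lower-bound sketch.
    """
    best = defaultdict(int)
    for ua in user_agents:
        m = SUBS_RE.search(ua)
        if m:
            best[m.group("agent")] = max(best[m.group("agent")], int(m.group("subs")))
    return sum(best.values())

uas = [
    "Feedbin feed-id:42 - 156 subscribers",
    "Feedbin feed-id:42 - 157 subscribers",   # near-duplicate: keep the max
    "NetNewsWire (RSS Reader; 12 subscribers)",
]
print(estimate_subscribers(uas))  # 169
```

Readers that poll individually (and bots) report nothing, which is one reason any such number is, as the post says, "whatever that means".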

0 views
Kev Quirk 1 week ago

How Do You Read My Content?

I'm trying to get an idea of how people consume the waffle I put out; it should only take a few seconds to respond, and I'd be very grateful. It's well publicised that I don't run any kind of analytics on this site. For me, engagement is far more important. But I'm trying to better understand how you fine people consume the waffle I spit out into the world. The only reason I want to do this is that I think it will be interesting to know. I could temporarily add tracking to the site, but that feels icky to me; I'd rather have something that's opt-in. So I've created a really simple form that you can fill in. It only has 1 question, so it should take no more than a few seconds to complete. If you're a regular reader, I'd be very grateful if you could take a few seconds out of your day to cast a vote. The form is embedded below, but it may not embed properly in some places (like in the RSS feed), so just in case, here's a direct link to the form too.

0 views
Karboosx 1 month ago

Homemade tracking system without use of third-party libs like Google Analytics

Tired of sending your analytics data to Google? Build a simple, self-hosted tracking system from scratch with me: one that respects user privacy, detects bots, and keeps everything on your own servers.

0 views
Alex White's Blog 1 month ago

Babbling About Solutions

I've been in so many meetings where people will discuss a tiny point to death over the course of hours or days. For example, at numerous companies there have been debates around time-axis graphs and the current date. There's always someone who thinks users will be confused that the visual for the current day is less than the previous days (because the day is in progress). Additionally, the topic of timezones will come up: "Our data is in X timezone, but the user is in Y; the graph will be confusing." I was reminded of this example as I built out a "daily visitors" graph for my analytics page. I chuckled as I just implemented a solution, without hours of debate and Y salaries * Z hours of money wasted. My solution to the two problems was this:

- Show visitor count as "X visitors in 24 hours". You're not committing to a day in a timezone; instead it's relative by hours.
- For the graph, use the wording "Until Now" to represent the partial nature of the value.

Nothing is ever perfect and every UI will confuse someone, somewhere, somehow. The key is to see if there's a significant amount of data indicating the UI is confusing, not to debate the tiny details to death before even releasing something. Give users some credit and know your audience.
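The rolling-window idea above sidesteps timezones entirely: count visits in the 24 hours ending at "now", rather than in any calendar day. A minimal sketch, with made-up timestamps and everything kept in UTC:

```python
from datetime import datetime, timedelta, timezone

def visitors_last_24h(visit_times, now=None):
    """Count visits in the rolling 24-hour window ending at `now`.

    All timestamps are timezone-aware UTC, so no per-user timezone logic
    is needed; the window is always relative to the current moment.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    return sum(1 for t in visit_times if cutoff < t <= now)

now = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)
visits = [
    now - timedelta(hours=2),
    now - timedelta(hours=23),
    now - timedelta(hours=30),  # outside the rolling window
]
print(f"{visitors_last_24h(visits, now)} visitors in 24 hours")  # 2 visitors in 24 hours
```

The graph's "Until Now" bucket is the same trick applied per bar: the last bucket simply ends at the query time instead of at a day boundary.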

0 views
Alex White's Blog 1 month ago

Privacy Focused Analytics in Under 200 Lines of Code

When I launched this blog, I told myself I wouldn't succumb to monitoring analytics. But curiosity killed the cat, and here we are! I've built and deployed a privacy-focused analytics "platform" for this blog. Best of all, it's under 200 lines of code and requires a single PHP file! My analytics script (dubbed 1Script Analytics) works by recording a hash of the visitor's IP and date (inspired by Herman's analytics on Bear Blog). This allows me to count unique visitors in a privacy-friendly way. The script itself is a single PHP file that does two jobs. When called directly (/analytics.php) it displays a dashboard with traffic data. When called with the query parameter from a simple JS function, it records the visit to a SQLite database. That's it, super simple analytics. No cookies, JavaScript frameworks, or dependencies. Throw it on your server, migrate the database, and put an image tag in your template file. Wanna see my live analytics? Click here for the analytics dashboard.

Okay, I fixed a few things; guess I'm a bit sleep deprived! To properly get the referrer, I switched to JavaScript to call the analytics PHP script rather than the image method. I'm using a POST request to pass the current page and referrer to PHP. Also updated the styling slightly on the dashboard to use a grid layout. Finally, I moved my SQLite file into a non-web directory on the server, updated the config, and bundled the analytics script with my 11ty deployment process. Planning to layer in some simple graphs in the future, but so far I'm pretty happy with how things are working!
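The hash-of-IP-and-date trick at the heart of the post translates to any language (the real script is PHP). A Python sketch of the idea: hashing IP + date yields an identifier that is stable for one visitor for one day, so you can count daily uniques without ever storing raw IPs. The salt parameter is my addition for illustration; the post doesn't describe one.

```python
import hashlib
from datetime import date

def visitor_id(ip: str, day: date, salt: str = "change-me") -> str:
    """Privacy-friendly daily visitor ID: same IP + same day -> same hash,
    and the raw IP is never persisted. The salt is an assumption of this
    sketch, not part of the post's script."""
    return hashlib.sha256(f"{salt}{ip}{day.isoformat()}".encode()).hexdigest()

today = date(2024, 1, 15)
hits = ["1.2.3.4", "1.2.3.4", "5.6.7.8"]  # raw IPs seen today (illustrative)
uniques = {visitor_id(ip, today) for ip in hits}
print(len(uniques))  # 2
```

Because the date is part of the hash, the same visitor gets a fresh ID the next day, so the scheme can't track anyone across days, which is the privacy property being aimed for.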

0 views
Jim Nielsen 1 month ago

My Number One “Resource Not Found”

The data is in. The number one requested resource on my blog which doesn't exist is: According to Netlify's analytics, that resource was requested 15,553 times over the last thirty days. Same story for other personal projects I manage:

- iOS Icon Gallery: 18,531 requests.
- macOS Icon Gallery: 10,565 requests.

"That many requests and it serves a 404? Damn Jim, you better fix that quick!" Nah, I'm good. Why fix it? I have very little faith that the people who I want most to respect what's in that file are going to do so. So for now, I'm good serving a 404. Change my mind.

1 view
Evan Schwartz 2 months ago

Scour - October Update

Hi friends, In October, Scour ingested 1,042,894 new posts from 14,140 sources. I was also training for the NYC Marathon (which is why this email comes a few days into November)! Last month was all about Interests:

- Your weekly email digest now includes a couple of topic recommendations at the end. And if you use an RSS reader to consume your Scour feed, you'll find interest recommendations in that feed as well.
- When you add a new interest on the Interests page, you'll now see a menu of similar topics that you can click to quickly add.
- You can browse the new Popular Interests page to find other topics you might want to add.
- Infinite scrolling is now optional. You can disable it and switch back to explicit pages on your Settings page. Thanks Tomáš Burkert for this suggestion!

Earlier, Scour's topic recommendations were a little too broad. I tried to fix that and now, as you might have noticed, they're often too specific. I'm still working on solving this "Goldilocks problem", so more on this to come! Finally, here were a couple of my favorite posts that I found on Scour in October:

- Introducing RTEB: A New Standard for Retrieval Evaluation
- Everything About Transformers
- Turn off Cursor, turn on your mind

Happy Scouring! - Evan

1 view
Jack Vanlightly 2 months ago

How Would You Like Your Iceberg Sir? Stream or Batch Ordered?

Today I want to talk about stream analytics, batch analytics, and Apache Iceberg. Stream and batch analytics work differently, but both can be built on top of Iceberg, and due to their differences there can be a tug-of-war over the Iceberg table itself. In this post I am going to use two real-world systems, Apache Fluss (streaming tabular storage) and Confluent Tableflow (Kafka-to-Iceberg), as a case study for these tensions between stream and batch analytics.

Apache Fluss uses zero-copy tiering to Iceberg. Recent data is stored on Fluss servers (using the Kafka replication protocol for high availability and durability) but is then moved to Iceberg for long-term storage. This results in one copy of the data. Confluent Kora and Tableflow use internal topic tiering and Iceberg materialization, copying Kafka topic data to Iceberg, such that we have two copies (one in Kora, one in Iceberg). This post will explain why both have chosen different approaches and why both are totally sane, defensible decisions.

First we should understand the concepts of stream-order and batch-order. A streaming Flink job typically assumes its sources come with stream-order. For example, a simple SELECT * Flink query assumes the source is (loosely) temporally ordered, as if it were a live stream. It might be historical data, such as starting at the earliest offset of a Kafka topic, but it is still loaded in temporal order. Windows and temporal joins also depend on the source being stream-ordered to some degree, to avoid needing large/infinite window sizes which blow up the state. A Spark batch job typically hopes that the data layout of the Iceberg table is batch-ordered (say, partitioned and sorted by business values like region, customer, etc.), thus allowing it to efficiently prune data files that are not relevant, and to minimize costly shuffles.

If Flink is just reading a Kafka topic from start to end, it's nothing special.
But we can also get fancy by reading from two data sources: one historical and one real-time. The idea is that we can unify historical data from Iceberg (or another table format) and real-time data from some kind of event stream. We call reading from the historical source bootstrapping. Streaming bootstrap refers to running a continuous query that reads historical data first and then seamlessly switches to live streaming input. To switch from the historical to the real-time source, we need to make that switch at a given offset. The notion of a "last tiered offset" is a correctness boundary that ensures that the bootstrap and the live stream blend seamlessly without duplication or gaps. This offset can be mapped to an Iceberg snapshot.

Fig 1. Bootstrap a streaming Flink job from historical then switch to real-time.

However, if the historical Iceberg data is laid out in batch-order (partitioned and sorted by business values like region, customer, etc.) then the bootstrap portion of a SELECT * will appear completely out-of-order relative to stream-order. This breaks the expectations of the user, who wants to see data in the order it arrived (i.e., stream-order), not a seemingly random one. We could sort the data from batch-order back to stream-order in the Flink source before it reaches the Flink operator level, but this can get really inefficient.

Fig 2. Sort batch-ordered historical data in the Flink source task.

If the table has been partitioned by region and sorted by customer, but we want to sort it by the time it arrived (such as by timestamp or Kafka offset), this will require a huge amount of work and data shuffling (in a large table). The result is not only a very expensive bootstrap, but also a very slow one (after all, we expect fast results with a streaming query). So we hit a wall: Flink wants data ordered temporally for efficient streaming bootstrap.
Batch workloads want data ordered by value (e.g., by business columns) for effective pruning and scan efficiency. These two data layouts are orthogonal. Temporal order preserves ingest locality; value order preserves query locality. You can't have both in a single physical layout.

Fluss is a streaming tabular storage layer built for real-time analytics which can serve as the real-time data layer for lakehouse architectures. I did a comprehensive deep dive into Apache Fluss recently, diving right into the internals, if you are interested. Apache Fluss takes a clear stance. It's designed as a streaming storage layer for data lakehouses, so it optimizes Iceberg for streaming bootstrap efficiency. It does this by maintaining stream-order in the Iceberg table.

Fig 3. Fluss stores real-time and historical data in stream-order.

Internally, Fluss uses its own offset (akin to the Kafka offset) as the Iceberg sort order. This ensures that when Flink reads from Iceberg, it sees a temporally ordered sequence. The Flink source can literally stream data from Iceberg without a costly data shuffle.

Let's take a look at a Fluss log table. A log table can define:

- Optional partitioning keys (based on one or more columns). Without them, a table is one large partition.
- The number of buckets per partition. The bucket is the smallest logical subdivision of a Fluss partition.
- An optional bucketing key for hash-bucketing. Without one, rows are added to random buckets, or round-robin.

The partitioning and buckets are both converted to an Iceberg partition spec.

Fig 4. An example of the Iceberg partition spec and sort order.

Within each of these Iceberg partitions, the sort order is the Fluss offset. For example, we could partition by a date field, then spread the data randomly across the buckets within each partition.

Fig 5. The partitions of an Iceberg table visualized.

Inside Flink, the source will generate one "split" per table bucket, routing them by bucket id to split readers.
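Hash-bucketing is the same idea as Kafka's key-based partitioning: hash the bucketing key, take it modulo the bucket count, and rows with the same key always land in the same bucket, so one reader per bucket sees a consistent subset. A toy sketch of the general idea (this is not Fluss's actual hash function, which I haven't checked):

```python
import zlib

# Toy hash-bucketing: stable mapping from a key to one of NUM_BUCKETS buckets.
NUM_BUCKETS = 6

def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # CRC32 stands in for whatever hash the real system uses.
    return zlib.crc32(key.encode()) % num_buckets

rows = [{"device_id": f"dev-{i}"} for i in range(10)]
for row in rows:
    row["bucket"] = bucket_for(row["device_id"])

# The same key always maps to the same bucket:
assert bucket_for("dev-3") == bucket_for("dev-3")
print(sorted({r["bucket"] for r in rows}))  # bucket ids within range 0..5
```

Without a bucketing key there is no such stable mapping, which is why Fluss falls back to random or round-robin placement in that case.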
Due to the offset sort order, each Parquet file should contain contiguous blocks of offsets after compaction. Therefore each split reader naturally reads Iceberg data in offset order until it switches to the Fluss servers for real-time data (also in offset order).

Fig 6. Flink source bootstraps from Iceberg, visualized.

Once the lake splits have been read, the readers start reading from the Fluss servers for real-time data. This is great for Flink streaming bootstrap (it is just scanning the data files as a cheap sequential scan). Primary key tables are similar but have additional limitations on the partitioning and bucketing keys (as they must be subsets of the primary key). A primary key, such as device_id, is not a good partition column as it's too fine-grained, leading us to use an unpartitioned table.

Fig 7. Unpartitioned primary key table with 6 buckets.

If we want Iceberg partitioning, we'll need to add another column (such as a date) to the primary key and then use the date column for the partitioning key (and device_id as a bucket key for hash-bucketing). This makes the device_id non-unique though.

In short, Fluss is a streaming storage abstraction for tabular data in lakehouses and stores both real-time and historical data in stream-order. This layout is designed for streaming Flink jobs. But if you have a Spark job trying to query that same Iceberg table, pruning is almost useless as it does not use a batch-optimized layout. Fluss may well decide to support Iceberg custom partitioning and sorting (batch-order) in the future, but it will then face the same challenges of supporting streaming bootstrap from batch-ordered Iceberg.

Confluent's Tableflow (the Kafka-to-Iceberg materialization layer) took the opposite approach. It stores two copies of the data: one stream-ordered and one optionally batch-ordered. Kafka/Kora internally tiers log segments to object storage, which is a historical data source in stream-order (good for streaming bootstrap).
Iceberg is a copy, which allows for stream-order or batch-order; it's up to the customer. Custom partitioning and sort order is not yet available at the time of writing, but it's coming.

Fig 8. Tableflow continuously materializes a copy of a Kafka topic as an Iceberg table.

I already wrote about why I think zero-copy Iceberg tiering is a bad fit for Kafka specifically. Much of that also applies to Kora, which is why Tableflow is a separate distributed component from Kora brokers. So if we're going to materialize a copy of the data for analytics, we have the freedom to allow customers to optimize their tables for their use case, which is often batch-based analytics.

Fig 9. Copy 1 (original): Kora maintains stream-ordered live and historical Kafka data. Copy 2 (derived): Tableflow continuously materializes Kafka topics as Iceberg tables.

If the Iceberg table is also stored in stream-order then Flink could do an Iceberg streaming bootstrap and then switch to Kafka. This is not available right now in Confluent, but it could be built. There are also improvements that could be made to historical data stored by Kora/Kafka, such as using a columnar format for log segments (something that Fluss does today). Either way, the materialization design provides the flexibility to execute a streaming bootstrap using a stream-ordered historical data source, while allowing the customer to optimize the Iceberg table according to their needs.

Batch jobs want value locality (data clustered by common predicates), aka batch-order. Streaming jobs want temporal locality (data ordered by ingestion), aka stream-order. With a single Iceberg table, once you commit to one, the other becomes inefficient. Given this constraint, we can understand the two different approaches: Fluss chose stream-order in its Iceberg tables to support stream analytics constraints and avoid a second copy of the data.
That's a valid design decision: after all, Fluss is a streaming tabular storage layer for real-time analytics that fronts the lakehouse. But it does mean giving up the ability to use Iceberg's layout levers of partitioning and sorting to tune batch query performance. Confluent chose stream-order in Kora and one optionally batch-ordered Iceberg copy (via Tableflow materialization), letting the customer decide the optimum Iceberg layout. That's also a valid design decision, as Confluent wants to connect systems of all kinds, be they real-time or not. Flexibility to handle diverse systems and diverse customer requirements wins out. But it does require a second copy of the data (causing higher storage costs).

As the saying goes, the opposite of a good idea can be a good idea. It all depends on what you are building and what you want to prioritize. The only losing move is pretending you can have both (stream-optimized and batch-optimized workloads) in one Iceberg table without a cost. Once you factor in the compute cost of using one format for both workloads, the storage savings disappear. If you really need both, build two physical views and keep them in sync.

Some related blog posts that are relevant to this one:

- Beyond Indexes: How Open Table Formats Optimize Query Performance
- Why I'm not a fan of zero-copy Apache Kafka-Apache Iceberg
- Understanding Apache Fluss
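The streaming-bootstrap handoff the article describes boils down to: read the historical source up to the last tiered offset, then continue from the live stream strictly after that offset, so there are no gaps and no duplicates. A toy model with in-memory lists standing in for the Iceberg snapshot and the live log (all names are illustrative; this is not a real Fluss or Flink API):

```python
# Toy model of streaming bootstrap: historical records are read first (in
# offset order), then the reader switches to the live stream at the
# "last tiered offset" correctness boundary.

def bootstrap_then_stream(historical, live, last_tiered_offset):
    # Phase 1: bootstrap from the historical source (e.g. an Iceberg snapshot).
    for offset, record in historical:
        if offset <= last_tiered_offset:
            yield ("historical", offset, record)
    # Phase 2: switch to the live stream, starting strictly after the boundary.
    for offset, record in live:
        if offset > last_tiered_offset:
            yield ("live", offset, record)

historical = [(i, f"event-{i}") for i in range(0, 5)]  # offsets 0..4 tiered
live = [(i, f"event-{i}") for i in range(3, 8)]        # offsets 3..7 still retained live
out = list(bootstrap_then_stream(historical, live, last_tiered_offset=4))

offsets = [o for _, o, _ in out]
print(offsets)  # [0, 1, 2, 3, 4, 5, 6, 7] - contiguous, no duplicates
```

The overlap between the two sources (offsets 3 and 4 exist in both) is exactly why the boundary matters: without agreeing on a single switch offset, the consumer would either duplicate or skip those records.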

0 views
Jeff Geerling 4 months ago

Digging deeper into YouTube's view count discrepancy

For a great many tech YouTube channels, views have been markedly down from desktop ("computer") users since August 10th (or so). This month-long event has kicked up some dust—enough that two British YouTubers, Spiffing Brit and Josh Strife Hayes, are having a very British argument over who's right about the root cause. Spiffing Brit argued it's a mix of YouTube's seasonality (it's back-to-school season) and channels falling off, or as TechLinked puts it, "git gud", while Josh Strife Hayes points out the massive number of channels which identified a historic shift down in desktop views (compared to mobile, tablet, and TV) starting after August 10. This data was corroborated by this Moist Critical video as well.

0 views
Martin Fowler 5 months ago

Actions to improve impact intelligence

Sriram Narayan continues his article on impact intelligence by outlining five actions that can be taken to improve it: introduce robust demand management, pay down measurement debt, introduce impact validation, offer your CFO/COO an alternative to ROI, and equip your teams.

0 views
Martin Fowler 5 months ago

The Reformist CTO’s Guide to Impact Intelligence

The productivity of knowledge workers is hard to quantify and often decoupled from direct business outcomes. This lack of understanding leads to many initiatives, bloated tech spend, and ill-chosen efforts to improve that productivity. Sriram Narayan begins an article that looks at how to avoid this by developing intelligence about the business impact of their work, across a network connecting output to proximate and downstream impact.

0 views
A Smart Bear 5 months ago

Max MRR: Your growth ceiling

Your company will stop growing sooner than you think. The "Max MRR" metric predicts revenue plateaus based on churn and new revenue.
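The "Max MRR" idea follows from simple arithmetic: if you add a roughly constant amount of new MRR each month while losing a fixed fraction to churn, MRR converges to new MRR divided by the churn rate. A back-of-the-envelope sketch with made-up numbers (the post's exact formulation may differ):

```python
# MRR recurrence under constant new revenue N and monthly churn rate c:
#   M(t+1) = M(t) * (1 - c) + N
# The fixed point M = M * (1 - c) + N gives the growth ceiling M = N / c.

def max_mrr(new_mrr: float, churn: float) -> float:
    return new_mrr / churn

def simulate(new_mrr: float, churn: float, months: int, start: float = 0.0) -> float:
    mrr = start
    for _ in range(months):
        mrr = mrr * (1 - churn) + new_mrr
    return mrr

ceiling = max_mrr(new_mrr=1_000, churn=0.05)  # $1k new MRR/mo, 5% monthly churn
print(ceiling)                                # 20000.0
print(round(simulate(1_000, 0.05, months=240)))  # 20000 - the plateau
```

The sobering part is that the ceiling doesn't depend on where you start: once churned revenue equals new revenue, growth stops, so the only levers are adding more new MRR or cutting churn.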

0 views
Grumpy Gamer 6 months ago

Death By Scrolling Part 2

If you haven't read my previous post about Death By Scrolling way back in February, I suggest you do. Of course this is my lazy way of doing the 2nd promised blog post for Death By Scrolling. In all fairness, I started to write it and it seemed awfully familiar, so I went back and checked, and sure enough I had already written about it. But I'll do another real post…

I asked for beta testers on Mastodon and got close to 300 sign-ups. I didn't want to invite everyone all at once. There is an old saying that you can only make a first impression once. Every time I make a new beta version I invite 25 more people. A couple of stats:

- About 25% of the people never redeem the Steam key, or they redeem it weeks later. This is a little surprising, but maybe it shouldn't be. People are busy.
- Of the people who did redeem the key, a third play the game once or twice and never again. This is not surprising. Death By Scrolling is a rogue-like and you die a lot. I do mean a lot; it's right in the title. Some people do not like this type of game, and I'm OK with that.
- Maybe half the people who play the game never visit the Discord. We can get only so much info from analytics. Having a conversation about what you like and don't like is very helpful. Again, this isn't too unexpected.
- The players that do play more than a few times play a lot, and that is good to see. It's nice to see strategies emerge that we, as the designers, didn't think of. That is always a good sign.

This is the first time I've done a large-ish beta test for one of my games and it's been fascinating and very insightful. I'm about to invite the next group of 25 testers. If you're among this group, please visit the Discord. – Ron

0 views
Peter Steinberger 7 months ago

stats.store: Privacy-First Sparkle Analytics

How curiosity about VibeTunnel users led me to build stats.store - a free, open source analytics backend for Sparkle using AI tools, all while cooking dinner.

0 views
James O'Claire 7 months ago

The Trackers and SDKs in ChatGPT, Claude, Grok and Perplexity

Well, for a quick weekend recap I'm going to look at which 3rd-party SDKs and API calls I can find in the big 4 Android chat apps. We'll be using free data from AppGoblin, which you can feel free to browse at any of the links below or in the tables. Data is collected via de-compiled SDKs and MITM API traffic.

Let's look first at the development tools. These were interesting to me because I had assumed I'd see more of the dynamic JavaScript libraries like React. Instead, we see these are all classic Kotlin apps. If you click through the chat app names you'll see more detailed breakdowns of which specific parts of the libraries they're using (in-app animations, Kotlin Coil Compose, Square's libraries, and so on).

Wow, way more than I expected, and with quite the variety! I guess it's enough that we can further break these down. As is common now, most apps have more than one analytics tracker. First up, let's recognize Google: it's across every app in multiple ways. The main one used in most apps is GMS, which is required for both Firebase and Google Play Services.

Next was statsig.com, and wow! I was blown away to find this one in 3 of the 4 apps. This looks like a super popular company, and I was surprised as I hadn't heard of them before. Looking around, they look a bit more developer/product focused, but have tons of features and seem quite popular. Finally, in the analytics section we'll add the classics segment.com (marketing analytics) and sentry.io (deployment analytics), which get to count OpenAI and Anthropic as clients. It's always interesting how every company, from games to AI, ends up needing multiple analytics platforms, and probably still depends most on their home-grown BI/backend.

Here's where the money is at. Now, SUPER cool is that RevenueCat is now in both OpenAI and Perplexity.
RevenueCat provides updatable web payment/subscription walls so that marketers can change those sections of the app without needing to do an entire app update. I believe Perplexity is using Stripe, but that could also be part of their bigger app ecosystem.

livekit.io (AppGoblin: livekit.io) is an AI voice platform which is used by OpenAI and Grok. I'm surprised that OpenAI uses this, as they were quite early to the voice game, but perhaps they use it for some deeper custom voice tools.

Perplexity has the most interesting third-party tools, with MapBox and Shopify. I believe MapBox, which delivers mapping tiles, is used for some of Perplexity's image generation tools, like adding circles/lines etc. to maps. After seeing Shopify in Perplexity, I realized there wasn't a Shopify SDK found for OpenAI (despite checking recently). They have been rolling out shopping features as a way to monetize their app, so I am curious if these are just implemented via API or if they were obfuscated well enough to not be found.

If you're still interested, you can also check out the API calls recorded by each app while open. The data is scrubbed, and I'm not sharing the associated clear-text JSONs, but you can see some of the endpoints related to the SDKs. If you have further questions about these, or have a specific piece of data (say GPS, or email) that you'd like to check is sent along to any of these, just let me know and we can do further research:

https://appgoblin.info/apps/com.openai.chatgpt/data-flows
https://appgoblin.info/apps/com.anthropic.claude/data-flows

If you have feedback, please join the https://appgoblin.info Discord; you can find the link on the home page.

0 views
Jefferson Heard 8 months ago

Examples metrics and fitness functions for Evolutionary Architecture

In my previous article, Your SaaS's Most Important Trait is Evolvability, I talk about the need to define fitness functions that ladder up to core company metrics like NPS, CSAT, GRR, and COGS. Just today I had a great follow-up where a connection on LinkedIn asked me for specifics for an early-stage SaaS. I think it'd be valuable to follow up that post with some examples from that conversation.

Pick a metric first that's important to the company at large. For early-stage SaaS, I'd say that's NPS. It's easy to collect, low touch, and Promoters are the people who will help you clinch down renewals and propagate your SaaS to their colleagues at other organizations. The more promotable your software is, the less work your sales and renewals folks will have to do to move their pipeline. Promoters are people who think your software is a joy to use, and that everyone should be using it over whatever they're using today.

At an early stage, whatever your software is, you have one or two killer features that really drive engagement and dominate a user's experience of your product. You're asking yourself, "What metrics do I have control over that make the experience Promotion Worthy?" The point is to make it concrete and measurable. Once you can measure it: build it now, measure continuously, and find the trend. Build that into your Site Reliability practice. Push your engineering team to understand what levers they have to control that function and know how quickly they can adapt if it starts trending negative.

As your software and company grow, you'll accumulate functions like this for measuring the fitness of your software for common use-cases. It won't be "one key metric" but one or two metrics for each persona. Pivots happen. M&As happen. Product requirements shift as the horizon gets closer.
For the kinds of changes you learn to expect as an executive, how well does your tech team adapt to change? As a top software architect or VP of Engineering, these are the kinds of things you measure to see if the team is healthy and if the software is healthy under it. Change is life. Change is necessary for growth. In a healthy, growing company, change is constant. But change introduces stress. Your software architecture's ability to absorb this stress and adapt to new circumstances faster than your competition, without creating longer-term problems, is the ultimate measure of its quality.

If your killer feature is messaging, how long does it take for messages and read receipts to arrive? How long until someone notices lag? How fast is fast enough that further improvements aren't noticed? If your killer feature is delivering support through AI, how many times does a user redirect the AI agent for a single question? How complex an inquiry can your AI handle before that's too great? How long does it take for a response to come back? If your killer feature is a calendar, how long does it take for someone to build an appointment, how long does it take to sync to their other calendars, and how close to "on-time" are reminders being delivered? If your killer feature is your financial charting, how up to date are the charts, and how long does it take for a dashboard to load and update? What's the minimum acceptable bound? What's the point of diminishing returns?

Does the team get thrown into crunch time in the last 30 days of every project? Does software ship with loose ends and fast-follows that impinge on the next project's start time? Does technical debt accumulate and affect customer experience, support burden, or COGS?
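The "make it measurable, then watch the trend" loop described above can be sketched as code. This is a minimal illustration under assumed names, not something from the article: p95 message latency stands in for the killer-feature metric, and the 200ms budget is an arbitrary example threshold.

```typescript
// Hypothetical fitness-function sketch. The metric (p95 message latency),
// the budget, and all identifiers are assumptions for illustration.

interface Sample {
  t: number;           // sample time (any monotonically increasing unit)
  p95LatencyMs: number; // the measured killer-feature metric
}

// Ordinary least-squares slope of the metric over time: positive means the
// experience is degrading. Requires at least two samples with distinct t.
export function trend(samples: Sample[]): number {
  const n = samples.length;
  const meanT = samples.reduce((acc, s) => acc + s.t, 0) / n;
  const meanY = samples.reduce((acc, s) => acc + s.p95LatencyMs, 0) / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.t - meanT) * (s.p95LatencyMs - meanY);
    den += (s.t - meanT) ** 2;
  }
  return num / den;
}

// The fitness function itself: healthy while the latest sample is inside
// the budget and the metric is not trending worse.
export function isFit(samples: Sample[], budgetMs = 200): boolean {
  const latest = samples[samples.length - 1];
  return latest.p95LatencyMs <= budgetMs && trend(samples) <= 0;
}
```

Wired into a scheduled SRE check, a falling `isFit` is the early-warning signal the article describes: the team sees the negative trend before customers do.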

0 views
Nicky Reinert 9 months ago

Adobe Launch DTM Naming Conventions

I’ve worked with the Adobe Tracking Suite (which is Adobe Launch and all its siblings) for quite a while, and I’ve seen many tracking implementations and tag managers, some quite chaotic. At some point I felt the need to write down some basic rules for navigating those messy libraries. Hope that helps you, …

0 views
A Smart Bear 9 months ago

All pretty models are wrong, but some ugly models are useful

Identifying useful frameworks for companies, strategy, markets, and organizations, instead of those that just look pretty in PowerPoint.

0 views
Dizzy Zone 11 months ago

On Umami

I’ve been using Umami analytics on this blog for quite some time now. I self-host an instance on my homelab. I spent a bit of time researching self-hosted analytics, and generally they had a few issues.

First, I’d like the analytics platform to be privacy focused. No cookies, GDPR compliant, no PII. Umami ticks this box. Second, many of the alternatives had quite a bit of hardware resource overhead once hosted. They would either consume a ton of memory, the CPU usage would be high, or they would require me to host something like ClickHouse to run them. Since this blog is not The New York Times, I feel like hosting a dedicated database for it would be overkill. My homelab is rather small, so keeping things minimal is how I manage to stretch it. Umami consumes around 1% of a CPU core and ~240MiB of RAM on my homelab. Third, Postgres as the datastore. Postgres is my go-to database, and I host tons of small tools that use it as the backend. I like having a single instance that can then be easily backed up and restored, without having to resort to a ton of different databases. Therefore, any analytics tool would have to use it. Fourth, the tracking script should be minimal in size, so as not to affect load times too badly. The blog itself is pretty lightweight, so any tracking-script bloat would defeat that. The Umami script has a content length of 1482 bytes once gzipped, not too shabby.

Generally, I’ve been happy with the choice. However, there is one thing in Umami that annoys me more than it probably should: the visit timer. Apparently, the visit time is only updated once a user navigates to another page on the blog. If they simply leave, there’s no visit duration stored whatsoever. This makes the visit time tracker completely useless. I’m not the first one to notice this, but the issue has since been moved to a discussion which has seen no progress. The good news is there are a few things one could do — perhaps add a custom event to track this?
Or fork Umami, since it’s open source, and fix it. Both of these fall strictly into my “can’t be arsed to do” category, so I guess it’s not that important. Thanks for reading! Perhaps there are other analytics tools that tick the boxes above? Let me know in the comments below.
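For the curious, the custom-event workaround could be sketched roughly like this. It assumes the `umami.track(eventName, data)` call exposed by the v2 tracker script; the event name and duration buckets are made up for the example.

```typescript
// Hypothetical sketch: measure time-on-page ourselves and report it to Umami
// as a custom event when the visitor leaves. Assumes the standard
// umami.track(eventName, data) API from the tracker script; "visit-duration"
// and the bucket labels are invented for this example.

declare const umami: {
  track: (event: string, data?: Record<string, unknown>) => void;
};

const pageEnteredAt = Date.now();

// Coarse buckets keep the event data low-cardinality in the Umami dashboard.
export function durationBucket(ms: number): string {
  const seconds = Math.round(ms / 1000);
  if (seconds < 10) return "<10s";
  if (seconds < 60) return "10-60s";
  if (seconds < 300) return "1-5m";
  return ">5m";
}

// visibilitychange fires when the tab is closed or navigated away from,
// which is more reliable than the unload event in modern browsers.
if (typeof document !== "undefined") {
  document.addEventListener("visibilitychange", () => {
    if (document.visibilityState === "hidden") {
      umami.track("visit-duration", {
        bucket: durationBucket(Date.now() - pageEnteredAt),
      });
    }
  });
}
```

That wouldn’t fix the built-in visit timer, but it would at least put a usable duration distribution in the events view without forking anything.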

0 views