JMP

TwitterReddit

Full Text Search with IndexedDB

singpolyma@singpolyma.net

While working on Borogove there has been a desire to have full-text search of locally-stored chat message history. For web-based apps the main storage engine in use is IndexedDB, which is a rather low-level system and certainly doesn’t have easy facilities for this kind of operation. So what’s the simplest, performant way to implement a full-text search on IndexedDB?

In case you are wondering “what is full-text search?” for our purposes we are going to be looking at a search where we want to find any message that contains all the words the user has typed, in any order

Table Scan

While this won’t be our final solution, it is almost always the right place to start. If your data is small (say, under 10k documents) then a table scan is pretty quick, and definitely the simplest option

// Helper so we can use promises with IndexedDB
function promisifyRequest(request) {
	return new Promise((resolve, reject) => {
	  request.oncomplete = request.onsuccess = () => resolve(request.result);
	  request.onabort = request.onerror = () => reject(request.error);
	});
}

const stopwords = ["and", "if", "but"];
function tokenize(s) {
	return s.toLowerCase().split(/\s*\b/)
		.filter(w => w.length > 1 && w.match(/\w/) && !stopwords.includes(w));
}

function stemmer(s) {
	// Optional, https://www.npmjs.com/package/porter2 or similar
}

async function search(q) {
	const qTerms = new Set(tokenize(q).map(stemmer));
	const tx = db.transaction(["messages"], "readonly");
	const store = tx.objectStore("messages");
	const cursor = store.openCursor();

	const result = [];
	while (true) {
        const cresult = await promisifyRequest(cursor);
        if (!cresult?.value) break;

		if (new Set(tokenize(cresult.value.text).map(stemmer)).isSupersetOf(qTerms) {
			result.push(cresult.value);
		}
		cresult.continue();
	}

	return result;
}

Even though we aren’t doing anything fancy with the database yet, there are still a lot of important building blocks here

First we tokenize the query. This means to chunk it up into “words”. Words don’t have to be exactly real words, they can be consistent parts of words or anything else, so long as you tokenize your query and document text in the same way. Here we use a simple strategy where we trust the \b word-break regex pattern, strip any extra whitespace, ignore any empty words or words made of only non-word characters (again, as determined by \w, nothing fancy), and also ignore any stopwords. A “stopword” is a very common word that is not useful to include in the search query such as “and”. Mostly, stopwords are useful to avoid blowing up the index size later, but we include it here for now for consistency

Next we stem the tokens. This is optional and depends on the kind of search you’re trying to build. The purpose of a stemmer is to make searches for eg “flying” also match messages containing the word “fly”

Then we iterate over all the items in the object store. If you wanted the results in a particular order, you could instead iterate over an index on the store in a particular order. We tokenize and stem the text from each item and check if it contains all the “words” from qTerms, if so it is part of the results.

Use an Index

Now what if you have many messages (one million or more, perhaps) and the simple table scan is just too slow. How can we speed this up? With IndexedDB we only get one kind of index: an ordered index, usually build on a B-Tree. This is not exactly what many full-text indexes are built on, but still we can get a very big performance boost without too much more complexity.

First in the onupgradeneeded migrator we need to create the index:

tx.objectStore("messages").createIndex("terms", "terms", { multiEntry: true });

This is a “multiEntry” index which means that the index will get one entry for each item in the array stored at this key, rather than indexing on the array as a whole piece. So when we store a new message we need to include the terms as an array:

tx.objectStore("messages").put({ text, terms: [...new Set(tokenize(text).map(stemmer))] });

Now, this does not let us search by our full query, but rather only by one word. How does this help us? Well, we can iterate over only documents which match at least one of the terms in our query. Which one should we pick? Counting items is pretty fast, so let’s pick whichever one has the smallest number of results:

async function search(q) {
	const qTerms = new Set(tokenize(q).map(stemmer));
	const tx = db.transaction(["messages"], "readonly");
	const store = tx.objectStore("messages");
	const index = store.index("terms");
	// Figure out which search term matches the fewest messages
	let probeTerm = null;
	let probeScore = null;
	for (const term of qTerms) {
		const score = await promisifyRequest(index.count(IDBKeyRange.only(term)));
		if (!probeTerm || score < probeScore) {
			probeTerm = term;
			probeScore = score;
		}
	}
	// Using the smallest list of messages that match one term
	// Find the ones that match every term
	const result = [];
	const cursor = index.openCursor(IDBKeyRange.only(probeTerm));
	while (true) {
		const cresult = await promisifyRequest(cursor);
		if (!cresult?.value) break;
		if (new Set(cresult.value.terms).isSupersetOf(qTerms)) {
			result.push(cresult.value);
		}
		cresult.continue();
	}

	// Sort results
	return result.sort((a, b) => a.timestamp < b.timestamp ? -1 : (a.timestamp > b.timestamp ? 1 : 0));
}

The operation to count the index for each term is pretty fast, but if you found this prohibitive, you could also store these counts in their own keys and update them as you insert new messages. Once we know which is the smallest, we then do the same scan as before, but only over that much smaller subset. The tokenized and stemmed terms are stored so we can compare against those directly here rather than doing it again.

Now if we want it sorted we have to do it ourselves, just like a DB engine would with this kind of query where the order we want does not match the index order.

On a test set of one million messages, this simple index was enough to take the performance from unusable grinding to almost instant responses, since the number of messages in the smallest term is usually still well under 10k.

Google Wants to Control Your Device

singpolyma@singpolyma.net

Today we join with other organizations in signing the open letter to Google about their plans to require all Android app developers to register centrally with Google in order to distribute applications outside the Google Play Store.  You should go read the letter, it is quite well done. We want to talk a little bit additionally about why sideloading (aka installing apps on your own device, or “directly installing” as the letter puts it) is important to us

In early fall of 2024 Google Play decided to remove and ban the Cheogram app from their store.  Worse, since it had been listed by Play as “malware” Google’s Play Protect system began warning existing users that they should uninstall the existing app from their devices.  This was, as you might imagine, very bad for us.  No new customers could get the app and existing customers were contacting support unsure how to get back into their account after being tricked into removing it.

After a single submission to Google Play appealing this decision, they came back very quickly affirming “yes, this app seems to be malware.” No indication of why they thought that, just a decision. At this point the box we could use to submit new appeals also went away.  With no appeals process available and requests to what little support Google has going totally ignored, it was not clear if we were ever going to be able to distribute our app in the Play Store again.  After months of being delisted we finally got a lucky break.  In talking with some of our contacts at the Software Freedom Conservancy, they offered to write to some of their contacts deep inside Google and ask if there was anything that could be done. When their contact pushed internally at Google to get more information, suddenly the app was re-activated in the Play Store

I want to be clear here. We did not change the app. We did not upload a new build. This app Google had been so very, very sure was “malware” was fully reinstated and available to customers again the moment a human being bothered to actually look at it. It was of course obvious to any human looking at the app that it is not malware and so they restored it to the store immediately. They never replied to that final request, and no details about what happened were ever made available. From that point on Google has essentially pretended that this never happened and the app was always great and in the Play Store. If we had not been able to get in contact with a human and really push them to take a look, however, we would never have been reinstated.

Despite our good fortune, we still lost months of potential business over this.  Of course you’ve heard stories like this before. Stories of Play Store abuse are a dime a dozen and most of them don’t have the “happy” ending ours does. What does this have to do with “sideloading” and the open letter? Well, despite all the months of lost business, and despite all the existing customers being told to uninstall their app if they had got it from Play Store, we lost no more than 10% of our customers and continued to onboard new ones during the entire time.  How is this possible?  The main reason is direct installs (“sideloading”).  The majority of our customers get the app from our preferred sources on F-Droid and Itch. These customers were not told by Play Protect to remove their app. During the time we were delisted from Play Store we removed the link to Play Store from our website and new customers were instructed to use F-Droid. Of course we still lost some business here, some people were unable or unwilling to use F-Droid or other direct install options, but the damage was far, far less than it might have otherwise been.

What Google is proposing would allow them to ban anyone from creating apps which may be directly installed without their approval. One of the reasons they say they need to do this is to protect people from malware! Yet even if this was the narrow purpose of a ban it would still routinely catch apps which are not nefarious in any way, just as ours wasn’t. Furthermore, with all apps and developers registered in their system, a ban under these new rules could result in everyone being told to uninstall the app by Play Protect, and not just those who got it from Play Store to begin with. This would leave app developers who are erroneously marked by Google as malware with no options, no recourse, no way to appeal, and praying there is a friend of a friend who knows someone deep in Google who can poke the right button. This is just not an acceptable future for the world’s largest mobile platform.

Featured Image

Newsletter: Referral Event!

singpolyma@singpolyma.net

Hi everyone

Welcome to the latest edition of your pseudo-monthly JMP update!

In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client.  Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

This month begins JMP’s 2026 referral event! From now until June 1st, everyone you refer to JMP rewards you triple! That’s three free months of service for every person who signs up and pays.

For those who haven’t explored the referral system much, a short refresher on how it works. In your account settings there is a command Refer a friend for free credit. You will be presented with a message that reads like so:

This code will provide credit equivalent to one month of service to anyone after they sign up and pay: CODE These remaining codes are single use and give the person using them a free month of JMP service. You will receive credit equivalent to one month of service after someone uses any of your codes and their payment clears.

When someone uses this code during their signup process with JMP, they will sign up and pay as usual. But after they pay, they will get a free month deposited to their initial balance as well. Once their payment clears (about 90 days for a credit card transaction) your account will also get a month’s worth of credit deposited. Except for referrals which reward during this event, it will be three months! This code is appropriate for publishing publicly and can be used over and over again.

Under this message in the command result is a list of invite codes. If you are out of invite codes, you can always ask JMP support for more as well. These codes are each single-use and give the one person who uses them a free month of JMP service right away. They don’t pay anything. Then, if they pay, and once their payment clears (about 90 days for a credit card transaction) your account will also get a month’s worth of credit deposited. Except for referrals which reward during this event, it will be three months! These codes are appropriate for giving to trusted friends or family members, and each code can only be used once.

To learn what’s happening with JMP between newsletters, here are some ways you can find out:

Thanks for reading and have a wonderful rest of your week!

Creative Commons Attribution ShareAlike