JMP

Full Text Search with IndexedDB

singpolyma@singpolyma.net

While working on Borogove there has been a desire to have full-text search of locally-stored chat message history. For web-based apps the main storage engine in use is IndexedDB, which is a rather low-level system and certainly doesn’t have easy facilities for this kind of operation. So what’s the simplest, performant way to implement a full-text search on IndexedDB?

In case you are wondering what “full-text search” means: for our purposes, it is a search that finds any message containing all the words the user has typed, in any order.

Table Scan

While this won’t be our final solution, it is almost always the right place to start. If your data is small (say, under 10k documents) then a table scan is pretty quick, and definitely the simplest option.

// Helper so we can use promises with IndexedDB
function promisifyRequest(request) {
	return new Promise((resolve, reject) => {
	  request.oncomplete = request.onsuccess = () => resolve(request.result);
	  request.onabort = request.onerror = () => reject(request.error);
	});
}

const stopwords = ["and", "if", "but"];
function tokenize(s) {
	return s.toLowerCase().split(/\s*\b/)
		.filter(w => w.length > 1 && w.match(/\w/) && !stopwords.includes(w));
}

function stemmer(s) {
	// Optional, https://www.npmjs.com/package/porter2 or similar
	// Identity fallback so the rest of the code works without a stemmer
	return s;
}

async function search(q) {
	const qTerms = new Set(tokenize(q).map(stemmer));
	const tx = db.transaction(["messages"], "readonly");
	const store = tx.objectStore("messages");
	const cursor = store.openCursor();

	const result = [];
	while (true) {
		const cresult = await promisifyRequest(cursor);
		if (!cresult?.value) break;

		if (new Set(tokenize(cresult.value.text).map(stemmer)).isSupersetOf(qTerms)) {
			result.push(cresult.value);
		}
		cresult.continue();
	}

	return result;
}

Even though we aren’t doing anything fancy with the database yet, there are still a lot of important building blocks here.

First we tokenize the query. This means to chunk it up into “words”. Words don’t have to be exactly real words, they can be consistent parts of words or anything else, so long as you tokenize your query and document text in the same way. Here we use a simple strategy where we trust the \b word-break regex pattern, strip any extra whitespace, ignore any empty or single-character words or words made of only non-word characters (again, as determined by \w, nothing fancy), and also ignore any stopwords. A “stopword” is a very common word that is not useful to include in the search query, such as “and”. Mostly, stopwords are useful to avoid blowing up the index size later, but we include them here for now for consistency.
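To see what this tokenizer actually produces, here is a small worked example. The tokenizer is repeated from above so the snippet runs on its own; the sample sentence is made up for illustration.

```javascript
// Same tokenizer as above, repeated so this snippet is self-contained
const stopwords = ["and", "if", "but"];
function tokenize(s) {
	return s.toLowerCase().split(/\s*\b/)
		.filter(w => w.length > 1 && w.match(/\w/) && !stopwords.includes(w));
}

// Case is folded; punctuation, single characters, and stopwords are dropped
const tokens = tokenize("Hello, world! And if flying");
console.log(tokens); // ["hello", "world", "flying"]
```

Note that the raw split also yields pieces like "," and "!", which is why the filter checks for at least one \w character.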

Next we stem the tokens. This is optional and depends on the kind of search you’re trying to build. The purpose of a stemmer is to make searches for e.g. “flying” also match messages containing the word “fly”.
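If you just want to see the idea without pulling in a library, a toy stemmer might look like the sketch below. This is not Porter2 and naiveStem is a hypothetical helper for illustration only; it strips a few English suffixes and gets many words wrong (“flies” will not become “fly”, for example).

```javascript
// Toy stand-in for a real stemmer: strips a few common English suffixes.
// Hypothetical helper for illustration; a real stemmer handles far more cases.
function naiveStem(word) {
	for (const suffix of ["ing", "ed", "s"]) {
		// Keep at least three characters so short words are left alone
		if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
			return word.slice(0, -suffix.length);
		}
	}
	return word;
}

console.log(naiveStem("flying")); // "fly"
console.log(naiveStem("fly"));    // "fly"
```

What matters for search is less that the stems are linguistically correct and more that the query and the stored text go through the same function.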

Then we iterate over all the items in the object store. If you wanted the results in a particular order, you could instead iterate over an index on the store in a particular order. We tokenize and stem the text from each item and check if it contains all the “words” from qTerms. If so, it is part of the results.
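The isSupersetOf method on Set is a relatively recent addition to JavaScript, so depending on the browsers you target you may want a fallback. A minimal sketch of the same check, with containsAll as a hypothetical helper name:

```javascript
// True when every query term appears among the document's terms.
// Same result as docTerms.isSupersetOf(qTerms) where that method exists.
function containsAll(docTerms, qTerms) {
	for (const term of qTerms) {
		if (!docTerms.has(term)) return false;
	}
	return true;
}

const q = new Set(["chat", "history"]);
console.log(containsAll(new Set(["local", "chat", "history"]), q)); // true
console.log(containsAll(new Set(["chat", "client"]), q));           // false
```

Word order never matters here, since both sides are sets.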

Use an Index

Now what if you have many messages (one million or more, perhaps) and the simple table scan is just too slow? How can we speed this up? With IndexedDB we only get one kind of index: an ordered index, usually built on a B-Tree. This is not exactly what many full-text indexes are built on, but still we can get a very big performance boost without too much more complexity.

First in the onupgradeneeded migrator we need to create the index:

tx.objectStore("messages").createIndex("terms", "terms", { multiEntry: true });

This is a “multiEntry” index which means that the index will get one entry for each item in the array stored at this key, rather than indexing on the array as a whole piece. So when we store a new message we need to include the terms as an array:

tx.objectStore("messages").put({ text, terms: [...new Set(tokenize(text).map(stemmer))] });
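To make the effect of multiEntry concrete, here is a rough in-memory model of what the index ends up holding. This is only a sketch: buildTermIndex is a hypothetical helper, the sample messages are made up, and the real index is an ordered structure managed by the browser rather than a Map.

```javascript
// In-memory model of a multiEntry index: one entry per array element.
// buildTermIndex is a hypothetical helper, not part of IndexedDB.
function buildTermIndex(messages) {
	const index = new Map(); // term -> ids of messages containing it
	for (const msg of messages) {
		for (const term of msg.terms) {
			if (!index.has(term)) index.set(term, []);
			index.get(term).push(msg.id);
		}
	}
	return index;
}

const index = buildTermIndex([
	{ id: 1, terms: ["lunch", "tomorrow"] },
	{ id: 2, terms: ["lunch", "today"] },
]);
console.log(index.get("lunch"));    // [1, 2]
console.log(index.get("tomorrow")); // [1]
```

Each message shows up once under every one of its terms, which is what lets us look up messages by a single word.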

Now, this does not let us search by our full query, but rather only by one word. How does this help us? Well, we can iterate over only documents which match at least one of the terms in our query. Which one should we pick? Counting items is pretty fast, so let’s pick whichever one has the smallest number of results:

async function search(q) {
	const qTerms = new Set(tokenize(q).map(stemmer));
	const tx = db.transaction(["messages"], "readonly");
	const store = tx.objectStore("messages");
	const index = store.index("terms");
	// An empty query has no term to probe with
	if (qTerms.size === 0) return [];
	// Figure out which search term matches the fewest messages
	let probeTerm = null;
	let probeScore = null;
	for (const term of qTerms) {
		const score = await promisifyRequest(index.count(IDBKeyRange.only(term)));
		if (!probeTerm || score < probeScore) {
			probeTerm = term;
			probeScore = score;
		}
	}
	// Using the smallest list of messages that match one term
	// Find the ones that match every term
	const result = [];
	const cursor = index.openCursor(IDBKeyRange.only(probeTerm));
	while (true) {
		const cresult = await promisifyRequest(cursor);
		if (!cresult?.value) break;
		if (new Set(cresult.value.terms).isSupersetOf(qTerms)) {
			result.push(cresult.value);
		}
		cresult.continue();
	}

	// Sort results
	return result.sort((a, b) => a.timestamp < b.timestamp ? -1 : (a.timestamp > b.timestamp ? 1 : 0));
}

The operation to count the index for each term is pretty fast, but if you found this prohibitive, you could also store these counts in their own keys and update them as you insert new messages. Once we know which is the smallest, we then do the same scan as before, but only over that much smaller subset. The tokenized and stemmed terms are stored so we can compare against those directly here rather than doing it again.
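As a rough sketch of the stored-counts idea, here is an in-memory model that keeps a count per term and uses it to pick the probe term. In IndexedDB the counts could live in their own object store and be updated in the same transaction as the message insert; that layout, and the names recordTerms and pickProbeTerm, are assumptions for illustration.

```javascript
// In-memory model of per-term counts kept up to date on insert.
// Hypothetical helpers; in IndexedDB the Map would be a separate object store.
function recordTerms(counts, terms) {
	for (const term of new Set(terms)) {
		counts.set(term, (counts.get(term) ?? 0) + 1);
	}
}

// Pick the query term that matches the fewest messages
function pickProbeTerm(counts, qTerms) {
	let probeTerm = null;
	let probeScore = Infinity;
	for (const term of qTerms) {
		const score = counts.get(term) ?? 0;
		if (score < probeScore) {
			probeTerm = term;
			probeScore = score;
		}
	}
	return probeTerm;
}

const counts = new Map();
recordTerms(counts, ["lunch", "tomorrow"]);
recordTerms(counts, ["lunch", "today"]);
recordTerms(counts, ["lunch", "today", "maybe"]);
console.log(pickProbeTerm(counts, ["lunch", "today"])); // "today"
```

If the smallest count is zero, no message can match every term and the search can return early. Deleting messages would also need to decrement the counts, which this sketch leaves out.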

Now if we want it sorted we have to do it ourselves, just like a DB engine would with this kind of query where the order we want does not match the index order.

On a test set of one million messages, this simple index was enough to take the performance from unusable grinding to almost instant responses, since the number of messages in the smallest term is usually still well under 10k.

Newsletter: Cheogram App Options

singpolyma@singpolyma.net

Hi everyone!

Welcome to the latest edition of your pseudo-monthly JMP update!

In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client. Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

End of a Year

Wow, another year has come and gone. Not many newsletters this year, but rest assured we’ve just been too busy building to write! We know one of the biggest questions we get every year is about a Cheogram-like app for other platforms, and this year we’ve come much closer to having two more of those ready for release, besides also maintaining everything you’ve already come to know and love. Below are a few highlights.

Port-out PIN Self-Service

Users can now set their own port-out PIN through the account settings bot. The option is only shown to users whose number is eligible for automatic setup (#fe75c33, #a953751, #a18b326).

Data-Only Registration

Sign-up now supports registering for a data plan without a phone number (#5cdf6c0).

Cheogram WWW

We’ve been working for years on a browser-based Progressive Web App client for the Cheogram family, based on Borogove. Community members have been testing various versions of this under many names, but this fall we finally began alpha testing under the Cheogram WWW name at app.cheogram.com. Expect more to come for this app, but it is already very usable for many of the needs JMP customers have, including a Command UI / “app” view for account settings. It is also one of the only browser-based apps with native multi-account support.

You’ll note above I said “browser-based” and not quite “web app”. There is no server side component required for this app, as it connects directly to your Jabber service. This requires a special browser protocol to be supported by the service, and a few other things are needed for it to work very well. Of course we’ve worked with Snikket to make sure their offering supports everything needed for a best in class experience and more.

This app works well in a desktop/laptop/tablet form factor, but also has a mobile-optimized view. Along with support for Web Push notifications (if supported by your Jabber service; of course latest Snikket has support) this can make it a viable option on mobile platforms without a good native solution yet (to try this on iOS you’ll need to use the “add to home screen” option for notifications to work). The biggest limitation for Web Push is it cannot make a device “ring”, so if you get a call while the app is not open you’ll only get a simple notification, like for a message, which is not always ideal.

Cheogram iOS

Also in alpha testing starting earlier this fall is Cheogram iOS. Also based on Borogove but with a native Swift UI and deeper OS integration than the PWA can muster, this app is still in a bit of an earlier stage than Cheogram WWW, but some very adventurous people are daily driving it already. Come by the community chat if you want a TestFlight link.

Distribution for Cheogram Apps

Cheogram apps are also making some changes to official distribution mechanisms. For Android the most recommended and official distribution will of course remain F-Droid. And for people who need it the app will remain available on Play Store as well. Pre-release debug builds will still be distributed in the community and custom repo. So what is changing? There will no longer be official distribution of debug APK builds tied to releases. This practice has, quite honestly, been confusing to many people who expect release-tied builds to be release builds. Releases will now come with official distribution of first-party release builds for sale at itch.io (free to JMP customers or with a free JMP month for non-customers along with the purchase). Builds of future Cheogram app releases (including Cheogram WWW, desktop, etc. releases) will also be available as part of the itch.io package. Android apps from itch.io can be kept up to date using Mitch.

Other alternative app stores and distributions we have supported in the past (such as Aptoide) will no longer be official.

Unfortunately this does mean that anyone running the release-tied debug builds will either need to move to pre-release or to the new release builds in order to get updates.

Selected Recent Cheogram Android Changes

UI Improvements

  • Default options in command grids look less like headers (#b156e93)
  • Fixed account colors on item lists including start conversation (#957df4f)
  • Reorganized contact details layout for better narrow device support (#8780945)
  • “Manage Phone Accounts” button now scrollable (in list footer) (#b3bcfbb)

Chat & Messaging

  • Users can now see themselves in group chat participants list (#f240e52)
  • Can view own hats/roles in conference details (#3984516)
  • Improved button labels in group chat context menus (#37c504b)
  • Enhanced call failure UI with more informative displays (#4985523)

Connectivity

  • Fixed backup import functionality (#575cdff)
  • Improved password change flow for unlocked accounts (#8863066)

Stability

Group Chat

  • Added workaround for Snikket’s unavailable presence handling (#7e157f5)
  • Fixed menu handling on tablets (duplicate actions issue) (#c2c37f2)
  • Better JID escaping for improved compatibility (#b11972b)

QR/Barcode

  • Enhanced barcode compatibility (use ASCII where possible) (#e68d564)

To learn what’s happening with JMP between newsletters, here are some ways you can find out:

Thanks for reading and have a wonderful rest of your week!

Newsletter: (e)SIM nicknames, Cheogram Android updates, and Cheogram iOS alpha

amolith@secluded.site

Hi everyone!

Welcome to the latest edition of your pseudo-monthly JMP update! (it’s been 7 months since the last one 😨)

In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client. Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

Alerts for incoming messages blocked by Original route

The partner that serves our Original route has for some time been censoring some incoming messages, meaning messages from friends and family to you might occasionally be blocked. We have finally managed to get them to tell us when this happens and so we now relay an alert to you, so you can know this has happened and ask your contact to try rewording their message. Reminder that we do offer other routes for those having issues with this. Contact support if this interests you.

(e)SIM nicknames

If you have multiple (e)SIMs through JMP, keeping track of which is which by its ICCID can be a pain. Now you can give each a nickname by opening commands with the bot, tapping 📶 (e)SIM Details or sending the sims command, then selecting Edit (e)SIM nicknames.

Some updates to Cheogram Android this year

  • Scanning a Snikket invite works for new accounts
  • Search UI for emoji reactions (including custom emoji)
  • Display notifications for calls missed while offline
  • Don’t clear message field after uploading something
  • Allow selecting text in command UI
  • Initial support for community spaces
  • Show dot on the drawer for unseen, unread messages like chat requests
  • Second message edits no longer treated as separate messages

Inherited from upstream Conversations

  • Conversations 2.18.0
    • Select backup location
    • Make more URIs, like mailto:, clickable

Cheogram iOS

We’ve been working on an EXPERIMENTAL native client for iOS using Borogove (previously called Snikket SDK). It’s available through TestFlight for the adventurous, and push notifications require a Snikket server running the dev version for now. Contact support if you’re both interested in testing it and willing to provide feedback.

JMP at FOSSY

We sponsored FOSSY 2025 and had a great time meeting community members! After giving a few talks, having fun at the social, and selling some subscriptions, (e)SIMs, and (e)SIM adapters, we’re looking forward to seeing everyone again next year in Vancouver, Canada!


To learn what’s happening with JMP between newsletters, here are some ways you can find out:

Thanks for reading and have a wonderful rest of your week!


Amolith
https://secluded.site

Creative Commons Attribution ShareAlike