JMP

XMPPTwitterReddit
Featured Image

Newsletter: Cheogram Android Release, Matrix Alpha

singpolyma@singpolyma.net

Hi everyone!

Welcome to the latest edition of your pseudo-monthly JMP update!

In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client.  Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

This month marks the first release of the Cheogram Android app we have been sponsoring.  This app is a fork of the excellent Conversations, and will stay close to upstream going forward.  Some of the improvements relevant to JMP users include:

  • Add contacts without typing @cheogram.com
  • Integrate with the native Android Phone app (optional)
  • Address book integration
  • Messages with both media and text, including animated media
  • Unobtrusive display of subject lines, where present (such as on voicemails)
  • Links to known contacts are shown with their name (improves group text UX)
  • Show timestamps for calls
  • Missed call notifications

All of these features have been built in a standards-compliant way and do not rely on anything particular to Cheogram or JMP at all, so they could be reused with other gateways as well.  You can also get the app from F-Droid.

In other news, we’ve heard for some time that some users want to try using JMP from Matrix.  Since we are so big on bidirectional gateways, we have decided to add support for signing up with JMP using a Matrix ID.  This should be considered an alpha test at this time, and most notably voice does not work with the gateway yet so you will need to use SIP (or forwarding) for voice if you use a Matrix ID.  SMS, MMS, and Voicemail will all be delivered to Matrix just as they are to Jabber.  For this we are using the excellent gateway instance at aria-net.org.  To get started, just head to JMP.chat, choose a phone number, and select “I am a Matrix user” on the next page.

To learn what’s happening with JMP between newsletters, here are some ways you can find out:

Thanks for reading and have a wonderful rest of your week!

Art in screenshots is from Pepper & Carrot by David Revoy, CC-BY. Artwork has been modified to crop out sections for avatars and photos, and in some cases add transparency. Use of this artwork does not imply endorsement of this project by the artist.

Featured Image

Why Bidirectional Gateways Matter

singpolyma@singpolyma.net

A big part of the vision of Sopranica, and Cheogram in particular, is bidirectional gateways.  A bidirectional gateway is one that allows (at a minimum) any user of either protocol to contact any user of the other protocol without creating an account.  This is not possible with all protocols, but works well when both sides are federated.

Simple Example

Take for instance sip.cheogram.com, which is a bidirectional gateway between XMPP and SIP.  Any federated Jabber ID can communicate with any federated SIP URI with no configuration at the gateway.  This is possible because every valid SIP URI is assigned a Jabber ID of the format xmpp:user\40domain.tld@sip.cheogram.com and every Jabber ID is assigned a SIP URI of the format sip:user%40domain.tld@sip.cheogram.com.

Contrast this with irc.cheogram.com, which is not a bidirectional gateway even though IRC is an open protocol, due to the non-federated nature of that protocol.  While every IRC channel and nick is given a Jabber ID, not every Jabber ID can be given a channel or nick on every IRC network out there, and even to do it on a single network would require creating many connections or a special peering arrangement.  Using the Jabber ID assigned to an IRC channel may require registering a nick with that IRC network and configuring the associated password at the gateway.  It works well enough, and is quite useful, but it’s not bidirectional.

User Experience

One of the big advantages of a bidirectional gateway is the seamless user experience for those who know the gateway exists.  Instead of asking “is this room bridged to protocol X” or “do you also have an address on protocol Y” the existance of the bridge is sufficient to know that, yes, with no extra setup by either party, communication will be possible.  One does not need to convince users to switch to the favored protocol, or bend by creating an identity with the other’s favored protocol, but simply to add the other party directly.  Users with Jabber IDs can advertise how they may be contacted via SIP, SMTP, Matrix, SMS, and more without the other party thinking anything more than “this address looks a bit long”.

Raising the Whole Network with Chaining

Because a high-quality bidirectional gateway effectively makes one network out of two networks, any service or gateway added to either network can be used from both sides.  Thus, Matrix, SMTP, or even SMS users can get phone numbers from JMP.  Even further than that, a Matrix user could advertise an SMTP or SIP contact address using the Cheogram gateways, all without any SMTP or SIP gateway needing to exist for Matrix at all.

Stable Addresses

If someone is going to give out an address that goes via a gateway, they need confidence that this address will not need to be changed.  So long as their main address on their preferred protocol remains the same, so should their address on other protocols.  This requires a stable DNS name with gateways that are open to anyone, free of charge.  That is the vision behind Cheogram, an infrastructure project inside of Sopranica to maintain stable addressing for bidirectional gateways.

Conclusion

Obviously there is still lots of work to do. Most of the gateways mentioned in this post are missing important features they could have in order to facilitate more seamless communication.  Clients of every protocol can gain features to make using a bidirectional gateway a more obvious choice for users.  Unique use cases need more testing to find where the rough edges are.  Cheogram infrastructure is supported in part by JMP, but can always use financial support.  Together we can help people connect to all their contacts.  Come join us.

Featured Image

Newsletter: JMP is 5 years old today, and now with international calls!

denver@ozg.ca

Hi everyone!

Welcome to the latest edition of your pseudo-monthly JMP update!

In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client.  Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

Today JMP turns 5 years old!  We launched on this day in 2017 with a fairly basic set of features: bidirectional Canada/USA SMS and MMS, with incoming calls directed to a pre-existing phone number.  Since then we’ve kept pushing further and further toward feature-parity with the most popular mobile phone number providers, adding outgoing calls, international texting, and group texting.

We have been actively testing the final piece of this work, international calling, and are pleased to announce today that we made it!  With the addition of international calling, which is available to all JMP users as of right now, we are now phone-feature-complete.

What does this mean?  In short, we now support all the features that other providers of mobile phone numbers support: bidirectional MMS (including group texting), domestic and international SMS, and domestic and international calling.  So if you or your friends have been waiting to try JMP until we finished all these features, now is the time to get the word out and try it!

On top of international calling, we now support using as many minutes per month as you like (with 120 Northern America minutes worth of credit included by default) - see our new comprehensive pricing page for details.

You may be wondering: Where do we go from here?  Well, we still have much work to do, and many more newsletters like this one coming up to let you know about it!  There are two main areas of work left:

First, we want to keep our finger on the pulse of telephony, specifically making sure that we add any features that mobile carriers are adding to their phone numbers, like RCS.  If we find such a feature supported on at least Android and iOS, we’ll get it into JMP as soon as possible (for RCS, the rollout is limited, and only on Android so far - if you know of other phone number features we’re missing, reply to let us know!).  Also, we have not forgotten about the many kind requests we’ve received for JMP numbers outside of the USA and Canada.  Those are on our list!

Second, we want to make JMP as easy to use as possible.  There is still much work to do here, particularly on simplifying signups, adding and managing contacts, and group texting.  We want to make sure that group text threads are easier to follow, which requires a little bit of development work yet.  And we are actively working on improving the Cheogram app that many of you have started using on Android, to include these fixes and an easy way to signup for JMP.  In the not too distant future you’ll be able to install the Cheogram app, pick a JMP number from the app, and start using it right away.

In other news, Dino 0.3 has been released which brings with it voice calling features compatible with those used by Converations, Snikket, Movim and JMP! It’s still early days and so there may be some compatibility issues, but reports say the features are working for people.

Lastly, as part of JMP’s 5 year anniversary, we are unveiling our new logo today too!  You can check it out at the usual places, such as https://jmp.chat/ and elsewhere.  Our old logo (which is still live at its old location for the nostalgic) has served us well for the past half decade, and we appreciate all it’s done for us.  The new logo reflects how increasingly close we’re getting to JMP being easy to use for the average person, and hopefully our approachableness.

As always, we’re very open to feedback and would love to hear from you if you have any comments, questions, or otherwise.  Feel free to reply (if you got this by email), comment, or find us on any of the following:

Here’s to many more years of JMP!

Newsletter: Snikket Hosting, Billing Overage Limits

singpolyma@singpolyma.net

Hi everyone!

Welcome to the latest edition of your pseudo-monthly JMP update!

In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client.  Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

We have made a deal with Snikket CIC to include their Jabber hosting as an option for all JMP customers.  Anyone with a paid-up JMP account may get an instance hosted by Snikket CIC at no extra charge.  Your Snikket instance will be covered by your usual JMP monthly subscription.  This will eventually be the default option for new users without a Jabber ID as part of our signup flow.

Many customers have asked us how they can use more than the included 120 voice minutes each month.  Starting soon, JMP will let you pay for as many minutes as you want, and along with this will be coming outbound calls to international destinations as well.  Note that users still paying with the old PayPal system will not be able to access more than the included 120 minutes or international calls, so now is a great time to make the switch.

We have already published what the initial calling rates will be, so go check that out to get an idea of costs.  Once the new pricing goes live, we will convert the 120-minute limit into a dollar equivalent that will give you 120 minutes to the USA and Canada, but will also allow you to use that included limit to cover international calls if you prefer (or a combination of both).

Speaking of costs, we know a concern for many people is getting billed for more overages than they can afford.  Of course, JMP being pre-pay helps with this a little, but if you want to pre-pay for many months and know you won’t accidentally burn up the credit, what can you do?  We have implemented a per-calendar-month overage limit system where you decide the limit.  The default for all customers is $0/month for now (so you would continue to pay the same amount every month, with no overages), but you can raise your limit using the plan settings command in your account settings so that you can take advantage of these upcoming features.

To learn what’s happening with JMP between newsletters, here are some ways you can find out:

Thanks for reading and have a wonderful rest of your week!

Featured Image

How to use Jabber from SMS

singpolyma@singpolyma.net

The Soprani.ca project, and Cheogram in particular, is pretty big on bidirectional gateways.  The most popular Cheogram-hosted instance, so popular that it gets to own Jabber IDs on cheogram.com, is a bidirectional gateway to the telephone network.  How is it bidirectional?  Don’t you need a Jabber ID to use it?  Of course not!

Sending a Message

From any SMS-enabled device, add +12266669977, which is the gateway’s phone number.  Send the following SMS:

/msg someone@server.tld Hello!

The user with Jabber ID someone@server.tld should shortly receive your message.  If they reply, what you see will depend on their relationship to the gateway.  If they have a backend route set (such as JMP, Vonage, or Twilio) then you will get an SMS from their associated phone number.  If not, you will get a message from the gateway’s number like this:

<someone@server.tld says> Oh, fun!

Joining a Chatroom

An SMS user can also join exactly one chatroom at a time.  Send this to the gateway’s number:

/join someroom@conference.server.tld

You should receive a message with the current list of participants, after which you will start seeing messages sent to the room.  After this point, any SMS send to the gateway’s number that is not a valid command (such as /msg) will be sent to your joined room as a message.  You can send /help at any time to get a list of other commands for leaving, setting your nickname, etc.

Making a Voice Call

To call a Jabber ID, first enter it into this form then dial one of the access numbers and follow it up with the extension generated by the form.

The extensions are often very long, so the easiest way to dial them on Android is to create a contact with a phone number of the form:

+access_number,*10816etc

If you have trouble with one access number, try another one.  If the Jabber ID you wish to call is very long some access numbers may time-out waiting for you to dial all the digits.

Newsletter: New website, new forums, new app feature

singpolyma@singpolyma.net

Hi everyone!

Welcome to the latest edition of your pseudo-monthly JMP update!

In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client.  Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

This month we launched a full rewrite of our website.  While this rewrite was mostly precipitated by changes at our primary carrier partner, we managed to get a few improvements in there as well.  First of all, the search box on the homepage now accepts more than just area codes: a city then comma then two-letter state or province code is accepted, as well as zip codes or vanity patterns (like ~woof or ~1234).  This search is also now powered by XMPP commands in the backend, so you can now get a JMP number entirely without ever visiting the website, just talk to cheogram.com and send register jmp.chat to get started.

Next, our community has been testing more features in the pre-release Cheogram app.  This app is available to anyone who wants to test it by coming by the chatroom and asking for access.  When the app is a bit more ready, it will be released on F-Droid, hopefully in Q1 2022.  The big new feature right now is dialer integration.  This allows anyone using the app with cheogram.com added to their contacts to head to their native Android dialer and visit Settings > Calls > Calling accounts in the menu.  From there it should be possible to enable the calling account associated with your Jabber ID and then dial out directly from your native Android dialer app over JMP!  Any questions about this feature or the Cheogram app in general should be directed to the chatroom.

As our community grows it makes sense to reach more people where they are at, and not just hang out where we are most comfortable.  This month the freedomware project we sponsor and rely on, Sopranica, is opening two more venues where you will be able to get news from JMP or discuss the project generally: Lemmy and Reddit.  Here is the new complete list of official ways to communicate with our community:

Thanks for reading and have a wonderful rest of your week!

How to Subscribe to This Blog Using Movim

singpolyma@singpolyma.net

This blog is powered by XMPP. That means it is federated over the Jabber network, it has a Jabber ID, and you can subscribe to it using a supporting Jabber client. One such client with support for subscribing, liking, and commenting is Movim. There are several public Movim instances including one hosted by upstream and one by chatterboxtown. Once you are logged in with a Movim instance, you can follow these steps to discover this blog and subscribe.

  1. Click Explore

    in the left navigation menu
    in the left navigation menu
  2. Click

    Communitites Servers
    Communitites Servers
  3. If blog.jmp.chat is not yet known to this instance, you can use the search box to add it

    Search for a new server
    Search for a new server
  4. After entering blog.jmp.chat in the search box and hitting enter, you may need to click away to any other Movim page and then come back to the Communities Servers area as before.

  5. Click

    blog.jmp.chat
    blog.jmp.chat
  6. Click

    JMP Blog
    JMP Blog
  7. Click

    subscribe
    subscribe
  8. New posts from the blog will now show up under News

    in the left navigation menu 

Newsletter: Action required for SIP accounts, new inbound call features, and more!

singpolyma@singpolyma.net

Hi everyone!

Welcome to the latest edition of your pseudo-monthly https://jmp.chat update!

In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client. Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

The biggest announcement this month is the launch of our new inbound voice and SIP account system! Due to changes at our major carrier partner, all inbound call handling had to be rewritten and the SIP accounts some people use are moving to a new server with a new server name. As part of this rewrite you can now use the configure calls command to set call forwarding to any XMPP, SIP, or tel URI without involving support. If you haven’t used the JMP bot before, you do so by sending a message to your cheogram.com contact with the text of the command you want to run. You can send help for a list.

If you have not tried it yet, now would be a great time to try our features allowing calling from your Jabber account. All your regular SMS contacts can be called as well with no changes on your part, from any client that supports voice calls. Inbound calls can be routed to your Jabber ID using the configure calls command.

If you still need a SIP account for some reason (such as to use with a device that does not support Jabber calls) you will need to use the reset sip account command to get a username and password on the new server, as the old server will be going away soon. Be sure to use UDP as the transport!

In other news, our founder Denver Gingerich (ossguy) has returned from his leave and is rejoining us in day-to-day operations. You will see him more often in the chatroom and sometimes answering support.

There has also been a bit of movement on the mobile app front. We have been partially sponsoring development work on the now-released Snikket iOS which is now our recommended client for all iOS users. When paired with a Snikket server this client should receive calls and messages reliably, and also supports DTMF (entering digits for phone menus) during calls.

We’ve also had a volunteer working with us to clean up some of the features in our prototype app for Android. Not many visible changes yet (except for a much better icon to open the DTMF pad) but watch this space for updates.

As always, if you have any questions, feel free to reply to this email or find us in the group chat per below. We’re happy to chat whenever we’re available!

To learn what’s happening with JMP between emails like this, here are some ways you can find out:

Thanks for reading and have a wonderful rest of your week!

Newsletter: Blog, New Registration, New Billing, New App!

singpolyma@singpolyma.net

Hi everyone!

Welcome to the latest edition of your pseudo-monthly JMP update!

In case it's been a while since you checked out JMP, here's a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client. Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

In case you haven't seen it yet, we now have an XMPP-powered blog! All newsletter updates, as well as other content like technical deep-dives will be published there. If you just want these updates, don't worry, the mailing list isn't going away. You can check out the blog at https://blog.jmp.chat and follow in your RSS reader or compatible Jabber client such as Movim or Libervia.

JMP also has a new registration flow. This flow properly integrates with our new billing system and represents a lot of behind-the-scenes work to our architecture. The most important part of the new billing system is the referral system. That's right, JMP users can now get single-use invite codes to refer users. The new user gets one free month, and if they decide to upgrade to a paid account the original user will get a free month of credit too! XMPP server operators for closed or vetted groups can also contact support to ask that their server be added to an approved list where all Jabber IDs coming from that server will be given a free month, with the resulting credit if they upgrade going to the server operator.

Speaking of our new billing system, many users have been fully migrated to the new architecture which says goodbye to PayPal and hello to automated credit card and Bitcoin deposits, as well as official support for payment by mail or (in Canada) Interac e-Transfer. Payments can also be made in Bitcoin Cash by contacting support. Users on the new system now have a prepaid balance they can top up any time they like, with the option to automatically top-up a low balance with any amount $15 or more from credit card. Deposits over $30 get a 3% bonus added, and deposits over $140 get a 5% bonus. This paves the way for calling minutes over 120 / month (which will soon be available at the rate of $0.0087 / minute) and also international calling at per-minute rates to be announced later this year. Those who prefer to pay the same amount every month or year, as is done with our legacy PayPal system, will need to wait a bit until we integrate that option into the new system.

We've also had a volunteer working with us to prepare some new features for Android users, most notably DTMF (punching in numbers during a call) so that all phone calls can be done from inside Conversations. The code isn't quite ready for upstream yet, but drop by the chatroom if you want to try out a prototype.

As always, if you have any questions, feel free to reply to this email or find us in the group chat per below. We're happy to chat whenever we're available!

To learn what's happening with JMP between emails like this, here are some ways you can find out:

Thanks for reading and have a wonderful rest of your week!

Featured Image

Adventures in WebRTC: Making Phone Calls from XMPP

0@psycoti.ca

Normally when I'm writing an article, I prefer to not front-load the article with a bunch of technical stuff and instead bring it up organically. This article is different, though, and if anyone is going to get anything out of this I've got to set up a bit of background, so bear with me. I'll go into more detail on these things when it makes sense, but I have to at least introduce the players in our little play.

First, we have calls. You probably know about these, phones aren't exactly new, but I want to clarify that these are voice calls, as you may expect from a phone. One person calls the other, it rings, they pick up, and then audio conversation can be had.

Second, we have XMPP which is a chat protocol that's been popular in the FOSS (Free and Open Source Software) community for years. It has also made appearances in various commercial offerings, from Google's GTalk, Facebook's Messenger (at one point), and others. The big feature XMPP has compared to other chat protocols is that it's a standard, which means there are multiple implementations of both clients (the program the user uses) and servers (which host the user's account). They also have a large collection of extensions which extend the standard to provide more features which clients and servers can choose to implement. Also importantly it's "federated", which means users of one server can talk with users of another server seamlessly.

Third, we have Jingle, which is one of the previously mentioned extensions that allows two XMPP users to setup a peer-to-peer data connection, in this case for exchanging voice data.

Fourth, we have Asterisk. Asterisk is an open source telephone system. You can use it to receive or send phone calls, setup phone trees and extensions, etc. Because it supports many different protocols for sending voice data it can be used to connect a call from one protocol, like Jingle, to another protocol.

And finally, JMP, which is the company I'm working for. JMP integrates XMPP with the mobile phone network, giving XMPP clients a user they can contact representing any phone number, and we'll turn the chats and Jingle calls into SMS messages and phone calls and vice-versa. This allows JMP's customers to use a client of their choice, across mobile devices and desktops, to communicate with people who are still using SMS and traditional telephones and haven't moved to XMPP yet.

Ok, now we can establish the starting point of our story. We have SMS working in both directions already, and thanks to the work of my co-worker singpolyma and a user named eta's patches, we already had phone calls coming in properly. But allowing our users to call-out to phone numbers, that is, from XMPP to cell phones, wasn't working yet. We figured it was just a small tweak to the existing setup, so I set out to find the simple change that was required.

And now, over a month later, here's the path I went through, with gratuitous technical details along the way.

My Initial Testing Setup

It started out pretty simply. I have an app on my phone, Conversations, which is one of the XMPP clients we recommend to our users. I tested that I could receive calls, and everything worked out well. But, when I tried calling a special user created just to test outbound calls my phone would ring, and I would answer it, but the app would just see "connecting" forever.

I tested a normal Jingle call to another user of the same app, and it worked in both directions, so I knew the app worked fine. I didn't see anything in the logs from my phone to describe what might be the problem, so I had to look somewhere else. There were some logs in Asterisk, which is acting as a bridge between Jingle and the phone network, but nothing that stuck out right away. In order to get a more information about what was actually happening, I wanted to get into the code and put my own debugging logic. Luckily, Conversations is FOSS, and so that's very possible. But, it was also inconvenient to install a development build of Conversations onto my phone, because that would replace the app I used every day. I also wanted to use more tools I had on my computer, so I decided to install Movim instead. Movim is another XMPP client that supports the same voice calling features, but it runs on the computer. In a web-browser, in fact. Normally it would run as a hosted setup, where someone runs a Movim server for you out on the web, and you just connect to it as a web-page, but given what I wanted to accomplish I had to run the server myself on my computer so I can get to both sides of this conversation. A quick test confirmed that Movim could send and receive XMPP-to-XMPP calls just fine, but had the same issue with XMPP-to-Phone calls, so it wasn't just a Conversations bug.

I was now ready to start actually digging into this problem.

Initial Jingle Debugging

XMPP, as a protocol, works by sending XML stanzas over a connection. This means it's very easy to just look in some logs and see what's actually going on. For example, if I were to send a normal message it may look something like this:

<message type='chat' id='dd75a234-8f44-46ff-bd37-c878d04aef92' to='user@example.com'>
    <body>
        Hello!
    </body>
</message>

That makes it pretty easy to debug, which is why that's where I started. My initial theory was that there was something wrong with the messages being forwarded back and forth from my users to Asterisk that was making it impossible for them to talk to each other. To investigate that, I changed my local version of Movim to log every XMPP blob it sent and received, so I could inspect them with my human eyes and brain to make sure they look legitimate. To my disappointment they did appear to be relatively sensible on both the Asterisk and Movim side; the information they exchanged seemed to contain accurate addresses and formats. It would have been nice to have been done here.

Oh well! Undeterred I traced through the Movim code to figure out where the information eventually ended up. Somewhere in the code Movim must use these exchanged addresses to establish the voice connection. Eventually I got down to a line in the front-end JavaScript that just took the result, changed it from the format of the data and then called setRemoteDescription on some peer connection object. That's part of the WebRTC APIs. Crap.

Detour 1: WebRTC

WebRTC is a set of APIs supported by modern web browsers that allow users to establish peer-to-peer connections. Normally when a user is visiting a webpage, even a page that gives the appearance of interacting with other users, that interaction is actually done through the web server. To illustrate the difference, let's assume that two people, named Romeo and Juliet, are using a webpage to chat. Romeo visits the webpage; contacting the server and requesting the chat page, which is sent to his browser to show him a box to type things in and the messages he's already received. He sees a new message from Juliet, so he types his response into the box and hits send. What actually happens is that Romeo's browser sends the message he typed to the server along with information about who should receive the message. The server will take the new message and store it along with the other messages in the list of messages Juliet has received. At some point in the future Juliet will load this page and in doing so the server will send her all the messages that have been stored for her, including this new one. If the site is fancy, she won't even have the refresh the page to get them! Maybe it's periodically asking the server if there are any new messages so long as she's on the page, or maybe there's even a WebSocket waiting to be pushed messages, but in either case Romeo's message goes from his computer, through the server to get stored, and then is retrieved from that server when Juliet's computer requests the messages.

That's fine for occasional, short, bits of text or even the occasional picture. But if we imagine that Romeo instead wanted to start an audio or video chat, that's a whole other thing. First of all, it's a lot more data and it's constantly generating more and more data as the call goes on. And that information is actually doubled, because Romeo has to send it to the server, and then the server has to put it somewhere, and then Juliet has to also pull it back down. So it would be nice for the server operator if they could send the messages directly to each other, rather than involving the server.

The second reason is latency, or how long it takes for a sent bit of data to be received. Even if the server could handle all the data Romeo was sending it, it will usually take longer for the data to go from Romeo, up to the server, then from the server down to Juliet, versus just going directly from Romeo to Juliet.

So the way WebRTC works is that the user's browser has support for a bunch of standards for forming peer-to-peer connections (person-to-person, user-to-user, browser-to-browser, whatever you want to call them). If Romeo wanted to start a video call with Juliet, he would send her a special message through the server as normal, but this message would contain information on how Juliet could contact him directly. If she wants to talk with him, she would respond with a special message of her own (also through the server) containing information on how she could be contacted directly. Code on each of their webpages would take the special direct-contact information the other party sent them and give it to the browser through the setRemoteDescription method I mentioned earlier to signal to the browser that it would like a direct connection to be established. The page doesn't have to know or care how that happens, it will just be told when it works or doesn't. And life will be good.

Back to Jingle

Ok, so if life should be good, why was I unhappy that the information I was tracing was going directly into setRemoteDescription? Well, because life wasn't good and things weren't working. And more importantly, it wasn't as easy to figure out what's going on in WebRTC. I had the code for Movim, I'd already changed it to get extra diagnostic information and I could easily add more code to tell me more things, but the browser's implementation of WebRTC is a bunch of C++ that's built directly into the browser. Even though the browser is FOSS, so the code is available, it's still much harder to make a change to it, build a new version of just that part, integrate that into the entire browser, and then build all of that just to test one small change.

So I wanted to avoid that if I could. For now, things were still ok. WebRTC is something the browser knows about, so it should have some kind of tools for WebRTC app developers to help debug things. I was using Firefox so I searched for "Firefox WebRTC debug" and got this. I installed that tool and was instantly disappointed. It told me essentially nothing that I couldn't have found out already. It doesn't give any insight into the inner workings of WebRTC, it just gathers up the events and properties any app could have subscribed to, and subscribes on my behalf, giving me a list of updates. I guess that's better than having to write that code myself, but in this case I just saw a list of updates that seemed reasonable, and still nothing worked. Not very helpful.

screenshot of Firefox's WebRTC debugging UI

The situation looked a little better on Chromium, so I switched over to that. There's a built-in special URI chrome://webrtc-internals/ that provides on the left of the page the same log of events that Firefox's plugin did, but on the right allowed me to peek inside all the various data structures in their current state and see what was going on. Finally I was getting somewhere!

screenshot of Chromium's WebRTC debugging UI

I started by comparing working calls (inbound calls) to broken calls (outbound calls), to get a sense of what was different about them. There were a few false starts here that ended up coming down to randomness instead. It seemed like inbound calls always had some events in one order, and outbound had them in another, so maybe it was a race condition there! But then 1 in 5 inbound calls would actually have them in reverse order and it would still work. It looked like maybe it would work only if srflx candidates were chosen (don't worry about what that means), but then it wouldn't again. I was grasping at straws trying to find a pattern. While I was looking through properties, I did notice one strange thing: one of the pieces of information that is exposed is the number of bits flowing through each candidate pair and each interface. I noticed as I was looking around that the peer connection's inbound data rate was 0, but one of the interfaces had data streaming into it. It just wasn't the interface that had been chosen for the connection... That was weird. It meant Asterisk was trying to send data to the wrong place. To understand what that meant we have to go on another detour!

Detour 2: NATs and ICE

So what are these "candidates", and "addresses", and why are we exchanging them, pairing them up, and choosing them? This is part of the standard called ICE (which has a newer version of the standard also called ICE). This standard also relies on another standard called STUN and optionally an extension to STUN called TURN. That all sounds pretty complicated, but what is it all for? The problem is NATs. First I'll give the basic, oversimplified, version of the problem. Then the basic, oversimplified, version of the solution.

The Problem: NATs

The way practically the entire internet works is that each device is given an address. Then, when one computer wants to talk to another, it wraps the data up into a packet, addresses the packet to the other computer, and then pushes it out the wire (or through the air) onto "the network". If the computer on the other end of that wire (AKA, the router) is the one the packet was addresses to then the packet is received! If not, but the router knows where to find that computer, the packet is sent along that wire and bounces along from computer to computer until it reaches its destination. There's a problem with this, though. In the original design for the internet the addresses were given enough digits to allow for roughly 4 billion different addresses in the best-case scenario. In reality it was much fewer than that. In the 80s that may have seemed like an enormous amount of computers, but these days a single human may have a laptop, tablet, desktop, and a phone. They also may have a thermostat, a fridge, a speaker, a television, and various light bulbs and sensors that are also on the internet. Even in the 90s a company may have had a building full of hundreds or thousands of computers, all for people that had their own computers at home. There was just more demand than expected. So, for this reason (among others) NATs became common.

The idea of a NAT is relatively simple. We have one side, the Local Area Network (LAN), and another side, the Wide Area Network (WAN). Think "inside my network" and "the internet". Inside my network I can give computers whatever addresses I want, so long as it's unique to only the computers that are inside my network. There are some strong recommendations on what kinds of addresses to give out, but technically I can do whatever. So if I have a LAN, and you have a LAN, we can both have computers on our LAN addressed "10.0.0.10", and that's fine because the address is "local" to our own networks. So if one computer on my local network wants to talk to another, it works just like I described before: "10.0.0.10" produces a packet for "10.0.0.20" and sends it along its cable. The router sees the packet for "10.0.0.20" and thinks "Ah, that computer is down this wire" and sends it along, and "10.0.0.20" receives it. All is as it was.

But what if "10.0.0.10" wants to talk to a computer on the internet? It addresses the packet the same as before, and sends it to the router the same as before. The router will look at the destination address and see that it's on the WAN side, which means the data has to go out onto the net. But, the internet is really only useful if packets can be responded to. I want to ask the internet for something, and get an answer back! But the router can't just tell the other side that this traffic is from "10.0.0.10", because that's a private "local-only" address. Multiple networks could have a computer with that address and there'd be no way to figure out which was which! So what the NAT does is to rewrite the packet to put the router's own WAN address as the source of the message before forwarding it on. That way, if the other side does respond, then the response will come back to the router, and the router can figure out which request that response was for, rewrite the destination to be the original sender of the packet, and then forward the response into the LAN, rewritten to look like it's for the proper computer. The final piece is that packets aren't just addressed to a computer, but a combination of computer and "port" so the computer can handle multiple independent connections and know which data is associated with which connection. So if a packet goes out from computer "10.0.0.10" and port "537", then the NAT might give it port "537", or "700", or "12432" on the WAN side. It can pick whatever it wants, what's important is that it remembers its choice. Then when a response packet comes back to port 12432 on the public side it can look up in it's table to see that really means 10.0.0.10 at port 537 on the LAN side, so it knows how to rewrite it properly.

So, that's lovely. Now network operators and consumer network equipment can assign whatever addresses they want on the LAN, and none of it reduces the amount of addresses there are out on the net. People can still talk to the internet, and just works the way the computers expect. You can even put one NAT on the LAN side of another, and then put that on the LAN side of another NAT, and have a tree of NATs! Each packet that comes through gets translated by each layer, and then forwarded on to be translated by the next, and when a response packet comes in each layer performs the transformation and forwards "down" the tree until eventually it gets back to the original computer. So what's the problem? Why bring any of this up? It's because this system only works if everyone inside a NAT only ever wants to reach out to things "on the internet", like servers. But if you remember why you're reading this in the first-place, WebRTC is trying to setup direct peer to peer connections between two different internet users without having traffic go out to a server at all! NATs can't work in that way, because the whole point of their translation is that the computer's address is not something routable on the general internet; it's a local address that only exists within its own LAN. There's nothing you can do to tell a NAT, or the top NAT in a tree of NATs, how to send a packet it's received that it didn't already have a translation saved for

The Solution: STUN, TURN, and ICE

That last sentence really is the key to how we're going to overcome this limitation. If I want people to be able to talk to my browser from anywhere on the internet, what I can do is first send some packet out somewhere so my NAT and any other NATs between me and the internet will all make a translation that will allow responses to that packet to find their way back to me. Then, if somehow the other browser I wanted to communicate with could know the last layer of that translation, the "public" address and port that eventually made it onto the real internet, then they could send traffic there and all the NATs would perform their translation and I would eventually get the packet they sent! Magic! The problem, though, is that I don't know what that final "public" address and port are. Most routers don't expose that kind of information to the LAN, and even if they did there's no guarantee that their WAN address isn't just some other NAT's LAN address. Really we don't care about my router, or how many NATs there are, we just want to know what the internet sees.

So what STUN does is define a format for packets that one can use to find out this information. Then people can run STUN servers, either for public use, or maybe specific to the particular app that's looking to communicate. The STUN server runs on the public internet, and when I send it a packet asking it who I am, it will respond with a packet back to me. But inside that packet it will include the address and port it saw my request come from, which allows me to know my own "public" identity. It would be like calling someone to find my own phone number by asking them what number they see. There's more to STUN than that, but that's all we need for now.

This will work on many NATs, but not all, because of course it has to be difficult. Some NATs go further than what I've said so far. Rather than just remembering an association between a source address and port and whichever public port they pick, they remember all of the source address and port, and destination address and port, when picking an output port. That means if I send a packet to 8.8.8.8 the NAT might pick port "3425" to use, but if I send from the same port on my end to 4.4.4.4 instead rather than re-using 3425 it will pick a totally different port. Because of this, if I send a STUN packet out to address 8.8.8.8 and it tells me my public address and port, and I give that to 4.4.4.4, then when it uses that address to try and talk to me it won't match my NAT's table, because there are no entries for 4.4.4.4 at all. No packets were ever sent to that address, so no entry was made. These are called "symmetric NATs" by STUN, and completely ruins our plan.

To get around this TURN was created as an extension to STUN. TURN allows another kind of request to a STUN server which asks the STUN server to allocate a port on the STUN server itself, and forward any packets it receives there back to me. Then the STUN server tells me what its own address is, and what port it picked for me (very much like an opt-in NAT, actually). So now, even if I have a symmetric NAT, I can still give the other person the STUN server's address and my port there, and I know those packets will make their way back to me down the same connection I used to make this request. This isn't ideal. It's not really peer to peer anymore. Packets are still going through another server, which means the TURN server (really the STUN server that supports TURN) will need to be able to handle the volume of traffic, and there's still an extra jump adding latency.

There are a few silver linings, though. The first is that if Romeo needs to use TURN, that doesn't mean Juliet does. So any packets Juliet sends to Romeo may go through another server, but packets from Romeo to Juliet can go direct, which is still half as much traffic to the TURN server. Even if they both need to use a TURN server, there's no need for them to use the same one. That can mean that Romeo is talking to 8.8.8.8 and Juliet is talking to 4.4.4.4, so each of those servers is only seeing half the traffic. Also, TURN servers are able to be more simple than app servers. They don't have to put anything into a database or anything like that, they just turn each packet they receive into one they send, without even knowing what's in it. This allows them to process more packets per second. And finally, it means the app developer doesn't have to implement both a peer-to-peer mode and a "that didn't work" mode. If they run TURN servers, then at the very least it will use that and run the same as if it was truly peer-to-peer.

So we have our actual addresses, which might work if we aren't behind a NAT. We have STUN addresses which will work for many people that are behind one or more NATs. And we have TURN addresses which should work for everyone. But TURN is more effort, so we'd rather use STUN if that works. And even STUN is more work than it needs to be if direct messages work, like if both of our two users are behind the same NAT. So what we need is a way to figure out which of these work and then pick the best one. This is what the standard ICE adds to STUN and TURN, and where we get to the "candidates" I mentioned earlier.

With ICE Romeo would find all the network addresses his device has, and maybe use STUN to find his public addresses and ports, and maybe even TURN to get a new public address and port. The idea is that any of these might work; they are "candidates" Juliet may be able to use to send packets to. Juliet will do the same thing on her end and get her own list of candidates. Then they send these lists to each other, then they would attach each of their own candidates with each of the other side's candidates and get a list of candidate pairs. So if Romeo had "A, B, C" and Juliet had "X, Y, Z", then both sides would make a list of candidate pairs "(A, X), (A, Y), (A, Z), (B, X), (B, Y), (B, Z), (C, X), (C, Y), (C, Z)". Now they go through each pair sending packets from their candidate to the other candidate. If one of them receives a packet then that means this direction works, at least. They respond with a success message, and then they will immediately send their own check for that pair back, if they hadn't already, to see if it works the other direction too. At the end of this process we will have tested all candidate pairs and we'll have a list of the ones that worked in both directions.

Along with the candidates ICE also has us keep track of what kind of candidate each one is, "host" for network addresses the device has directly, "server reflexive" ("srflx") for the ones we got from STUN, "relayed" ("relay") candidates from TURN, and "peer reflexive" ("prflx") which only comes up in cases where one side successfully receives a packet from an address that isn't otherwise a candidate, meaning there's some other network quirk between the two users. Each of these candidate types is given a priority based on our preferences, for example we'd prefer to use host over srflx, and we'd prefer to use srflx over relay. Then we can combine the priorities of each side of a pair to get a priority for each working pair. At this point we can simply pick the best working pair, given our priorities.

There is one last wrinkle. Networks are complicated, and sometimes things go missing. As a last protection against this, one side of ICE (conventionally the person making the call) is nominated as the "controlling" peer. The controlling peer gets final say in which pair is actually nominated to be used. So once we have a pair we like, the controlling peer sends a request again with a "use-candidate" value to tell the other side that this is it, the other side responds back "ok" so everyone knows we're all clear and that communication on this pair still works. At this point we're finally done with ICE and we have a pair of ports that can be used to talk from peer to peer.

Back to WebRTC

Ok, that was pretty in-depth. Let me remind you where we are here: I'm looking in my WebRTC debugging tools and I'm seeing that the connection object isn't getting any data, and the candidate pair chosen to represent that connection isn't getting any either. But there is a candidate pair that is getting data, it's just not the right one!

So my first thought on how this could happen is that maybe there's a disagreement between Movim/Chrome and Asterisk on which of them is the controlling peer! That would also make sense why inbound calls work, because the caller is always the controlling peer. So if Asterisk thought it was the controlling peer in both cases, then Chrome would agree on inbound calls, but disagree on outbound calls. It felt pretty right. There is a section in the ICE standard on how to solve a situation like this, but maybe it wasn't being followed properly. Here's the problem... this is a pretty internal detail of the implementation of ICE in these two pieces of software. I didn't want to take our production Asterisk server down and add a bunch of logging here, maybe even breaking it in the process. And like I mentioned before I wasn't excited to rebuild Chrome just to test this. I spent a little bit of time looking to see if there was some implementation of ICE that was simpler that Chrome that I could use as a stand-in and be more free to make quick changes to, but nothing jumped out that wasn't going to be hard to adapt to my actual use-case. I lamented: I don't even need logs, what I really wanted was some way to see the data they're sending back and forth without modifying anything. Oh... wait a second...

Wireshark

At this point it became clear that I had been working in web and other special areas for too long. I had been searching for so long for a way to inject logging statements into this flow somehow so I could see what was on the network, when I should have immediately reached for Wireshark. It had previously been a tool in my toolbox, but I hadn't touched it since everything became all-web-all-the-time. Wireshark is a program that just records all of the packets your computer sends and receives, and shows them in an interface that makes it easy to filter, search, and inspect. I didn't need the programs to log what they thought they were doing, I could inspect what they actually did and follow along that way! What's even better is that Wireshark already knows about STUN and TURN, so it can show me the different fields without me having to know how to unpack the bits from the packet myself!

screenshot of Wireshark UI

See here how I can search for "stun" and it'll only show me the packets for STUN? Also notice that I can expand the "STUN" attributes, because it knows about them, and see in plain terms "this is a binding response" and my IP, and also the tiny diamond on the left shows the corresponding request that this response is to. Very handy stuff. Much better than logging.

If you remember what I mentioned before, ICE has bits in the standard that try to correct for the situation where both sides think they're the controller. In order to do that, they declare on each request whether they're making it as the "controlling" or "controlled" party, which means it was easy to figure out if each side thought they were the controlling peer. Sure that this was it, I looked; everything looked like it was to spec. Fuck.

Ok, if Asterisk knew it wasn't the controlling peer, why wasn't it using the candidate pair the controlling peer was nominating? Now that I had real data that was being exchanged, I could start going from packets, to standard, to code, and back, to try and trace out how everything was actually working. After tracing around for a while on how I expected the flow to progress I found a real problem!

 {
    char address[PJ_INET6_ADDRSTRLEN];

-   if (component < 1 || !ice->comp[component - 1].valid_check) {
+   if (component < 1 || !ice->comp[component - 1].nominated_check) {
        return;
    }

    ast_sockaddr_parse(cand_address,
-       pj_sockaddr_print(&ice->comp[component - 1].valid_check->rcand->addr, address,
+       pj_sockaddr_print(&ice->comp[component - 1].nominated_check->rcand->addr, address,
            sizeof(address), 0), 0);
    ast_sockaddr_set_port(cand_address,
-       pj_sockaddr_get_port(&ice->comp[component - 1].valid_check->rcand->addr));
+       pj_sockaddr_get_port(&ice->comp[component - 1].nominated_check->rcand->addr));
 }

 /*! \brief Destructor for locally created ICE candidates */

This code was toward the end of the ICE session where we're taking the result of ICE and preparing to return it to the main code. Every time we get a response to one of our checks it's marked as "valid", and that valid_check field is updated when a new check is found to be valid and has a better priority than what's stored there. That way by the end we will have the best priority valid check easily in reach. Also, any time the controller nominates a candidate we do a similar thing and store the result in the nominated_check field. But here, at the end, we're not using the best nominated check, only the best valid check. That will just work so long as the best valid check is nominated, which is expected by the standard, but not technically required and also not what I was seeing. This is great! Finally something that explains what I was seeing. Asterisk sending to a candidate that wasn't nominated.

So I deployed that and confidently ran it. Still didn't work. That was disappointing.

Ok, back to Wireshark looking for other weird things. Paying closer attention to the actual flow I was seeing, I noticed that a request would go out for one pair and I'd get the response properly. But that's it. If we remember the standard, there's supposed to be an immediate request in the opposite direction to test that direction, but now that was looking I only saw responses to my requests, and never the requests originating from Asterisk. This is important because the ICE negotiation isn't fulfilled until the other side makes these requests. From our perspective only one direction works, and it looks like we can't actually talk to Asterisk using the same channel Asterisk can talk to us on. So we keep trying to nominate a pair, but never receive the expected opposite request to confirm to us that this pair is good, so we try again, and again, etc.

This gave me an area in the code to look at, at least. What's worse, this code actually lives outside of Asterisk's codebase, in an external library made just for Asterisk called pjproject. After some investigation and comparing to the standard I noticed this section:

/* 
 * 7.2.1.4.  Triggered Checks
 *
 * Now that we have local and remote candidate, check if we already
 * have this pair in our checklist.
 */
for (i=0; i<ice->clist.count; ++i) {
    pj_ice_sess_check *c = &ice->clist.checks[i];
    if (c->lcand == lcand && c->rcand == rcand)
        break;
}

/* If the pair is already on the check list:
 * - If the state of that pair is Waiting or Frozen, its state is
 *   changed to In-Progress and a check for that pair is performed
 *   immediately.  This is called a triggered check.
 *
 * - If the state of that pair is In-Progress, the agent SHOULD
 *   generate an immediate retransmit of the Binding Request for the
 *   check in progress.  This is to facilitate rapid completion of
 *   ICE when both agents are behind NAT.
 * 
 * - If the state of that pair is Failed or Succeeded, no triggered
 *   check is sent.
 */

link

So this comment specifically references part of the ICE standard, section 7.2.1.4, in deciding when to send these triggered checks back with the same candidates after receiving a check from the other side. The problem is that it's actually wrong, the standard actually says:

If the state of the pair is Failed, it is changed to Waiting and the agent MUST create a new connectivity check for that pair (representing a new STUN Binding request transaction), by enqueueing the pair in the triggered check queue.

So in their implementation if I've already tried and failed to contact you with a candidate pair, and then later I get a request from you with that pair, I should ignore it. The standard, though, says I should instead try again on that pair, since it may have just started working. That's kinda weird though. If there was an occasional failure and it mysteriously didn't work one in every hundred calls, maybe this would be to blame, but this failed consistently. Every time. What's up? Well, a good place to start is how things end up in the failed state. The code had a timeout hard-coded where it'll retry each request 7 times, with one second between attempts, before deciding the pair doesn't work. That makes some sense, but looking at my Wireshark session I noticed something important.

The way Jingle works, when Romeo clicks the "call" button it sends a request to Juliet's device saying "I'd like a call please", and then it starts gathering candidates and sending them over. Juliet's device shows an incoming call screen or something to ask if she'd like to pickup, and when she answers yes she sends her candidates back to Romeo so they can start negotiating. But in this case we're not calling Juliet's device, we're calling Asterisk. The way Asterisk handles this is that it starts ringing on the phone network, and to speed things up in the meantime it starts gathering candidates and builds an ICE session with Romeo's candidates. This means the ICE session has already started before the other person's phone has even started ringing! Then, if the other person accepts the call, Asterisk will send its candidates down to Romeo with the session acceptance, so Romeo can start his ICE session with Asterisk's candidates.

That means that while the phone has been ringing, Asterisk has already been trying Romeo and finding no one's responding (because Romeo hasn't started his ICE session since he hasn't seen any candidates yet). So after 7 seconds Asterisk decides the candidates don't work. Later, when the call has been answered, Romeo starts ICE and starts sending out ICE messages and gets responses, but doesn't get any triggered checks, so he assumes there's something wrong with his responses and that the channel isn't actually working. So he keeps trying to get through and nominate things, but it never works. Ok!

So, assuming that's our problem, there are a few ways to fix it. The first thing we could do is change the ICE triggered checks to be in line with the standard, so it would retry the failed checks when things actually start on the other side. Another way we could fix it would be to change the way ICE works for Jingle in Asterisk and only start the ICE session once we've sent the call acceptance back to the caller. That way both sides will be starting their ICE around the same time, and so they'll likely line up and actually agree on something. The problems were that the first solution involved changing code in pjproject, which was annoying, and the second one was a somewhat involved change to how Asterisk's Jingle integration worked. Instead I opted for a worse, but far easier, solution which was to simply increase the timeout to 45 seconds. This dodged around the problem by assuming that Asterisk wouldn't have considered that candidate failed by the time the person actually answers the call. That way it'll still send the triggered check, and all will be good with the world.

So I rolled that out, and it actually worked! We plan to return to this issue and build a real solution, but for now it allowed us to keep testing.

Mission Accomplished

Mission Accomplished banner on US Navy Ship

So, I now had Movim successfully and consistently making calls out to Asterisk, and thus to real humans' phones. I told my coworkers I had done it, I had found the problem and fixed it, and all was now well. So we tested it with Conversations, the Android client that we expect many of our users to use. Nope. Just as broken as ever. Ok, maybe I was a little hasty... What about Gajim, another desktop client? Busted. Ok, what about Movim on Firefox, where I started before switching to Chromium just for its dev tools? Totally broken.

Ok, so... maybe there was still a ways to go... Don't worry, though, dear reader. The next fixes won't take as long to explain.

Gajim

I started with Gajim. It's not more important than the others, but when I was testing I noticed Gajim actually printing a useful looking error message in its error log. That's a very nice place to start! It also wasn't mad about the ICE stuff at all, but before that. When the clients are back at the beginning of the setup they negotiate what kind of data is going to go over this call. Is it video, or audio, or file transfer? If it's audio, what format of audio is it, is it stereo, what levels of quality does each side support? These kinds of things, where the two clients are trying to come to a consensus on how we're actually going to go about transmitting audio data, once ICE figures out the connection itself. In Jingle the way this works is that during session initiation you can specify the kind of content you want in this session, or if you want to add content to an existing session you can add new content to a session. From then on we can talk about that content by its name and the person who created, "initiator" for the person making the call, and "responder" for the person who has been called. That's mostly there to prevent a case where Romeo starts a session with Juliet and then they both propose a new audio stream at the same time, and then each side thinks they're negotiating about their own proposal, when in reality there are two proposals. With the creator it becomes clear, they both are talking about their own audio content, and further negotiation is required.

That being said, the code in Asterisk seemed to feel it was always the creator of the audio content. The code was written to send a creator of "initiator" for an inbound call, and "responder" for an outbound call. For inbound calls, Gajim agreed; Asterisk would propose a new session with audio content, and so they were the creator. For an outbound call, though, Gajim would do the same and propose a new session along with audio content to go along, but Asterisk would respond back about an audio stream that Asterisk itself had created. But it hadn't created one, Gajim did. So I made it so Asterisk just always used the "initiator", which seems to match how clients actually established sessions. Incidentally, the reason I didn't notice this before is because Movim doesn't care about the creator field and also just assumes "initiator".

So after that change it now worked in Gajim!

Firefox and Logging

Movim on Firefox was harder to debug. When I was looking at Wireshark, I just saw... nothing. There was occasional single things that would go out, but basically it looked like ICE wasn't doing anything. That made it hard to debug...

By now, though, I'd found where ICE actually lives in the code, specifically over in pjproject and not the main Asterisk code. I'd gained some experience reading that code and working over there, and while I was reading that code I noticed some parts of the code did actually already have logging code. If I could just figure out how to turn it on, it may tell me more about what the code was thinking. I'm embarrassed to say it took me a good while to figure out how to get those logs turned on.

I eventually found some forum post somewhere outlining the simple steps I needed to use to enable logging of the data I wanted:

# First we get to the asterisk command shell
$ sudo asterisk -r
# Then we add a new "logging channel" that will include debug logs
> logger add channel some_filename notice,warning,error,debug
# Then we set core (that is, Asterisk) to log up to debug logs
> core set debug 5
# Then we set pjproject to also log debug logs
> pjproject set log level 5
# And then tell pjsip (not sure why it's not pjproject) to actually log
> pjsip set logger on

Not sure why I didn't just guess all of that...

But anyway, now I could run my tests and it would log out to /var/log/asterisk/some_filename! I will admit, it is nice that I wasn't filling the normal log files with junk and could actually see only my test, rather than wandering through days of logs looking for my portion.

When I was done, I could do the reverse (also from the asterisk command shell):

> pjsip set logger off
> pjproject set log level default
> core set debug off
> logger remove channel some_filename

This would stop putting new logs in my some_filename file, but wouldn't delete it. This is also convenient because I could now search through this file without it getting infinitely longer, or filling up with logs I wasn't interested in.

That being said, even the short file for a test that takes a minute can have thousands of log lines, so it still takes some sifting to find the actual information I'm looking for.

I noticed a few important things looking through the logs. The first, and most obvious, thing is that it builds an ICE session many many times. It'll build one, tear it down, build another one, then tear that down, within a second. This made it hard to follow the history of a single session, but was also very obviously something that might be a problem. The second issue I noticed is that all of the sessions got to a point where they said "Resetting ICE for RTP instance", but some of them said "Nevermind. ICE isn't ready for a reset" afterwards, and then things didn't seem to work after that. All the broken ones had "comp_id=2", which meant they were for the second component. Comparing the same logs with Movim on Chrome, there was no second component. Huh.

So what is a component? ICE has a section in the spec for negotiating multiple independent ports in a way where either they both work or the whole negotiation fails, which could be used by applications which need multiple ports to work in coordination for anything to work. The protocol that WebRTC offers for audio is called RTP (Realtime Transport Protocol), which has two modes of operation. Originally RTP had two connections, one where it would send the audio data, and another called RTCP (RTP Control Protocol) where it would send information about how well the audio was sending so the participants could adjust their quality or something. A later version of RTP added an optional feature called rtcp-mux, which allowed the sending of the RTCP information along the same connection as the audio so we only need one connection, and so only one ICE component. Well, when WebRTC was standardized it was decided that WebRTC required the RTP implementation to support rtcp-mux in order for it to be allowed as part of WebRTC. So in Chromium they take advantage of that and just assume it supports rtcp-mux and only start ICE for one component. Firefox, though, felt it was important to be more backwards-compatible and tries to support both rtcp-mux and traditional RTP+RTCP modes. There's a way to tell if the other server supports rtcp-mux, but that information is sent when the other side answers the call, and by then Firefox has already sent all of the candidates for both components.

Ok, so that's why Firefox acts differently than Chrome, but why is it a problem? Surely it should be fine to negotiate two components and just ignore one, and ICE should work fine in either case. Well, that comes down to the constant building and rebuilding. The way original ICE works, the full set of candidates is gathered by both sides and then exchanged, they're all processed, and then a winner is picked. Jingle, though, made a change where it would send each candidate as it discovered them. That way the ICE session can start sooner and can be looking for candidate pairs while the STUN and TURN stuff is going on, and also if a candidate is found right away it might be decided it's not even worth it to get a TURN candidate since it'll be lower priority than the valid pair we've already found. That method of operating eventually got its own draft standard under the name Trickle ICE, which is similar but slightly different, which means there's a new version of Jingle that's meant to bring Jingle up to date on new versions of ICE including Trickle ICE. It's a draft, though. All that is to say, things are kind of a mess, and the version of ICE that Asterisk supports right now is not the Trickle kind. That's a feature that's being worked on for the future, but it's not released or supported by the XMPP integration at the time of this writing.

So, if Jingle uses Trickle ICE, but Asterisk doesn't support that, how does the XMPP integration in Asterisk work? Well, every time it sees a new candidate trickle in, it just restarts the ICE session as though that candidate is the only one. It's not ideal, but it appears to work a surprising amount of the time! That's likely owing to ICE's flexibility when it gets a request from a pair it doesn't know about, how it just adds it to the list. As long as one side knows all the candidates, the other side should respond properly and come to a consensus. Weird, but fine. Maybe in a newer version of Asterisk that will get better. But in this case Firefox has two different components, and it sends the "component 2" candidates after the "component 1" candidates. So Asterisk will get one candidate and will setup an ICE session, and then immediately gets the component 2 candidate so it tears the first one down and sets up the new one. But this new one only has candidates for component 2. That's something ICE doesn't allow, so Asterisk's ICE session isn't in a "nominating" mood, so it just doesn't do anything. That's why Firefox doesn't appear to be able to negotiate a connection.

So now that we know the problem, how to fix it? Well, we could implement a sketchy version of Trickle inside Asterisk, or even just inside the XMPP integration code for Asterisk. That looked like it was likely to introduce new bugs, and I knew that someone was already working on a real version of Trickle that may just work later. And even though it's nearly broken, it also works in Chromium every time I tested it. I really just wanted Firefox to work the same way Chromium does, so that's what I built. I just put some code in the XMPP integration part of Asterisk that ignores candidates for component 2. I know they're not going to end up using it either way, because all of our clients support rtcp-mux, and this is basically one line of code that is highly unlikely to introduce any new bugs!

Now Movim on Firefox works too! Three down, and one to go!

Conversations

This was it, our most supported client, and if it didn't work here we couldn't really call it a feature. And it still isn't working after all those other fixes. It's hard to use Wireshark because the app is running on my phone, but I now have the power of Asterisk logs at my disposal! Using that I took a look and... everything looked pretty good. Candidates were being exchanged, and negotiated, and being chosen as valid and then... just sitting there. None of the valid options were ever being nominated, so both sides were just waiting until someone gets bored of waiting and ends the call. I decided that this was really interesting, but I might need Wireshark to see more. Great.

There's probably some way I could have networked my phone through my computer or something to use Wireshark, but if I found that I needed more logging or something it would probably be useful if I could build the code for the Android app. And if I can build the code, then I can just run the Android Emulator which lets me run the app on my computer in a fake phone environment. So I pulled down the code for Conversations and got it setup and working in Android Studio. Now it was as easy as clicking the "play" button and running Wireshark to watch the packets the virtual-phone was sending. Since I already knew what to look for from the Asterisk logs, it was pretty easy to look at the stream of packets and see that neither side was including the "use-candidate" attribute which, if you remember, is how the controlling side tells the controlled side which candidates we're going to go with. That would explain why inbound calls work with Conversations, because for inbound calls Asterisk is the controlling side, and includes that "use-candidate" value so we're all on the same page, but when the roles are reversed and Conversations is the controlling side it never nominated anything. That's weird. Unlike with the Firefox side, it's not like it was doing nothing; it was definitely making requests to find a list of valid candidates. And unlike the original Chromium problem triggered checks were coming back just fine, and both sides appeared to know what the valid list is. After combing through packets, I did manage to find one weird thing. Wireshark knew about the attributes in STUN and could pull those values out of the packet and show them to me, but some of the packets around the time I might expect nomination to start had some extra attributes. Values Wireshark didn't recognize and listed only by their type, "0xC001". It's possible it was nothing, but it was something the working implementations didn't do, so it was a thread I had to pull on.

I pulled down the code for libwebrtc, the WebRTC implementation Conversations used, and searched through the code for that value and saw it was associated with STUN_ATTR_NOMINATION. Continuing to trace through the code for where this value was used, and which sets of conditions lead to that code, and so on, I eventually found that there is an option libwebrtc supports called "renomination". I manage to find what could be called a standard only by the most generous definitions, ICE Renomination: Dynamically selecting ICE candidate pairs. This document doesn't actually mention which STUN attributes to use to do the nomination, saying only "we define a new STUN attribute", but the code I was looking at seemed to line up with the intent of the document, at least.

The intention of this standard is to make it easier to control which pairs get nominated by ICE. In base ICE we test to find valid pairs, then some are nominated, and then the nominated pair with the best priority is chosen. Renomination is trying to make it easier for the controlling side to change its mind, for example if a WiFi device goes out of range during negotiation, making the cell network candidates better choices, or something. I'll be honest, I'm not really sure how often this would really apply, because ICE doesn't run for very long. Either way, though, the way this proposed extension handles this "mind changing" is by replacing the "use-candidate" attribute, which is either present or absent, with a different "nomination" attribute that contains a number. This number is more important than the priority, so the controlling side can nominate one pair with value 2, and then later nominate another pair with value 3 and that will now be the best choice no matter what priorities say. So, that's what the "unknown" attribute I saw in Wireshark, and also why no candidates were ever chosen. Conversations was sending one attribute to nominate, and Asterisk was waiting for a different one.

But why? There's a part of the spec that says that renomination is only turned on if both sides include the renomination option in their initial candidate exchange. Well, Asterisk definitely doesn't support it, so what's up? Looking at the Conversations code it looks like the issue is that there's no room in the Jingle standard for exchanging whether or not someone supports an option like this. There just isn't a value for that. So in order to allow it, Conversations just assumed all call partners supported it and included that value in both sides of every session it setup. It also seems like libwebrtc isn't very picky about this. This means if a user running Movim on Chromium is being called by Conversations, Conversations will think both sides supports renomination and Movim will think neither side does. But despite this, when Conversations sends the special attribute instead of the "use-candidate" attribute, Chromium still understands it and doesn't even check if it should expect that kind of value from this session. It just knows what was meant. So this is why Conversations works with Chromium, but any standard implementation of WebRTC that didn't support the draft extension of Renomination would not work. Like Asterisk.

To fix this I did two things. The first thing I did was talk to the Conversations developer about how we can be more careful about when we include this option, and was told we could just never include it. They just didn't have strong opinions on it. So I made a patch to remove that option. That was the easiest fix I've had to make yet. The problem is that unlike the changes I'd been making to our Asterisk server, this code ran on our users' phones. We couldn't control when it would be released, and we couldn't control when our users would install the new version even after release. Since then, that change went out in version 2.9.8, but we didn't know that at the time, and wanted this to work as soon as possible. So, like the Firefox fix, I considered building an implementation of renomination in Asterisk, but after looking at my Wireshark it didn't seem like it was necessary.

Since it was just a stopgap measure until our users migrated to newer versions of Conversations I did the simplest thing that would work, which is to just treat the "renomination" attribute (0xC001) exactly the same as the "use-candidate" attribute; ignoring the number, and the purpose of the extension, entirely. This would break if Conversations ever actually tried to renominate a candidate to a new pair with lower priority, but how likely is that? Probably not very likely. And if they only ever nominated once, or tried to renominate to a higher-priority candidate, then it would just work! The only quirk here is the one I mentioned earlier, which is that ICE is actually implemented in a library to Asterisk called pjproject. I really didn't want to fork that library and cut a new release just for myself with my garbage change in it, solely so I could put it in our Asterisk implementation. Luckily this apparently isn't the first time Asterisk has needed to make some tweaks to pjproject, so there was already a system for that! If I made a git commit in the pjproject repository, and then put the diff from that commit into a folder in a magic location in the Asterisk repo then during the Asterisk build it will make that change to the library for me before building! That meant I could still use an official release of pjproject, and could keep all of my ugly changes together in the Asterisk codebase.

I tested both changes, and they both worked. If the client didn't specify "renominate" as an option, then it spoke standard ICE and would work without my change. And if I left the client as-is it would work with my altered server that pretended to know what renomination was. The phone calls went through, and Conversations worked! All the clients worked!

Somehow Wrapping Up

Well, if you're at the end of this, thanks for following my on my odyssey. We began at the banks of "this is 90% working, there's probably just one weird typo to fix to get it the rest of the way". From there it was down the river of standards and the rapids of interoperability of standards between browsers and old C projects, aided all along by my compass Wireshark. I was confident when the rapids calmed down and the one problem was fixed, but there was a world of trouble up ahead as I didn't see the waterfall coming. My raft came out the other side, a little battered but with things holding together. Calls were being made, and metaphors were being thoroughly stretched. Seriously, why would I need a compass to raft down a river? IT ONLY GOES ONE WAY.

All in all, none of these fixes were "the proper fix", but with them I got the system working. I plan to engage the Asterisk community to open discussions about the actual issues, rather than just producing a patch no one wanted, but in the meantime calls are being made and received. And I got to stretch my legs and learn a bunch of inner workings, which is my interest. I even got a blog post out of it...

Sources

  • Mission Accomplished banner by U.S. Navy photo by Photographer's Mate 3rd Class Juan E. Diaz. (RELEASED) - Source, Public Domain
Creative Commons Attribution ShareAlike