<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[jambonz news and blog posts]]></title><description><![CDATA[I build open-source software for the telecommunication/real-time comms space; most notably drachtio (SIP server) and jambonz (CPaaS).]]></description><link>https://blog.jambonz.org</link><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 21:54:12 GMT</lastBuildDate><atom:link href="https://blog.jambonz.org/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Why we're introducing a commercial license]]></title><description><![CDATA[Today we're introducing a commercial license option for jambonz. The open source version isn't going away—we'll continue to maintain it with security fixes and bug fixes—but over time new features will land first in the commercially licensed version....]]></description><link>https://blog.jambonz.org/why-were-introducing-a-commercial-license</link><guid isPermaLink="true">https://blog.jambonz.org/why-were-introducing-a-commercial-license</guid><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Tue, 20 Jan 2026 15:40:22 GMT</pubDate><content:encoded><![CDATA[<p>Today we're introducing a commercial license option for jambonz. The open source version isn't going away—we'll continue to maintain it with security fixes and bug fixes—but over time new features will land first in the commercially licensed version.</p>
<p><strong>If you're an existing support customer, nothing changes for you</strong>. You can continue on your current plan indefinitely with the same level of service. Most customers who choose to migrate to the commercial version at some point will be able to do so at no additional cost. We're also offering an OEM license for those who need source code access and redistribution rights.</p>
<p>Now let me explain why we're doing this.</p>
<h2 id="heading-why-this-change-now">Why this change now</h2>
<p>We're making this change for two reasons:</p>
<ul>
<li>To increase the pace at which we deliver meaningful innovation to our customers, and</li>
<li>To make the product more widely available to prospective customers.</li>
</ul>
<p>When companies adjust their open source strategy, they often cite the same reason: sustainability of the project. That's true for us too, but I want to be more specific about what that means.</p>
<p>Our open source strategy served us well in the early days. When jambonz was new and I was the only developer, "free" was the right price to attract customers willing to take a chance on unproven software. It got us off zero. It worked.</p>
<p>But over time, the challenges facing us have changed, what it means to compete in the VoiceAI space has changed—and the cracks in our business model have become harder to ignore.</p>
<h3 id="heading-the-paradoxes-weve-been-living-with">The Paradoxes We've Been Living With</h3>
<p>Here's something that's bothered me for years: to build a services-only business, we've had to make jambonz harder to deploy without our help. Our deployment scripts and know-how? We've deliberately withheld them from non-paying users, because support revenue is how we fund development.</p>
<p>This has never felt right to me. Many of our users have been frustrated by it too (maybe you were one of them). But when services are your only revenue, you have to create reasons for people to pay for services. The result is a worse product for everyone who isn't a customer yet.</p>
<p>Here's another paradox: the better jambonz works, the worse our business model works. Support contracts look less necessary when things run smoothly. We frequently have customers sign up to get running, then cancel after a month or two because everything is working fine. We're penalized for building reliable software.</p>
<h3 id="heading-the-math-doesnt-work-anymore">The Math Doesn't Work Anymore</h3>
<p>A services-only business is labor-intensive with thin margins. That limits what we can invest in the product. Meanwhile, VC-funded companies that we compete with have significantly more resources than our small team.</p>
<p>There's so much happening in this space right now. The product needs to evolve in multiple dimensions at once just to keep pace with the rate of innovation. If we stay on our current path, jambonz eventually becomes a project that can only afford security patches—no meaningful new development. That's not good for anyone.</p>
<h3 id="heading-being-honest-about-what-i-want">Being Honest About What I Want</h3>
<p>I apologize for making this a bit personal, but perhaps you'll grant me this brief aside since jambonz was, at the beginning, a personal passion project. I started jambonz because I wanted to challenge myself to build something large and ambitious. I'm in it for the innovation—for doing hard things that are genuinely fun to attempt. I want to build great software and I want to build a great company.</p>
<p>To do that, we need the resources to compete. Services revenue alone won't get us there. Software licensing revenue will help.</p>
<h2 id="heading-making-jambonz-easier-to-deploy">Making jambonz easier to deploy</h2>
<p>There's a silver lining here. Once we're not dependent on making self-hosting difficult... well, we can stop doing that! We'll be able to provide our devops tools and deployment scripts to everyone. We'll make it much easier to spin up jambonz on your own infrastructure.</p>
<p>To start with, for instance, you can <a target="_blank" href="https://github.com/jambonz-selfhosting/cloudformation">find CloudFormation scripts here</a> to deploy jambonz in your AWS account, along with <a target="_blank" href="https://docs.jambonz.org/self-hosting/aws">step-by-step instructions</a>. We'll be adding additional devops scripts and docs shortly in our public repos to deploy on all of the major hosting providers (Azure, GCP, Exoscale, OVHcloud, etc.) as well as Kubernetes and Docker Compose.</p>
<p><strong>We're also introducing free usage for pre-revenue companies and non-commercial use</strong>. If you're a student, a hobbyist, or a startup that hasn't found its footing yet, jambonz will be available to you at no cost.</p>
<h2 id="heading-for-current-customers">For Current Customers</h2>
<p>If you're on a support contract today, you can stay on it. We're not forcing anyone to migrate, and we're not raising prices on existing customers. The open source version will continue to receive security and bug fixes through 2026 and beyond.</p>
<p>For those who want access to new features as they land, we'll offer commercial licensing terms that respect your existing relationship with us.</p>
<h2 id="heading-what-does-the-license-look-like-and-how-does-it-work">What does the license look like and how does it work</h2>
<p>There are two different types of licenses:</p>
<ol>
<li>The standard license, which you can review <a target="_blank" href="https://www.jambonz.org/legal-terms-selfhosted">here</a></li>
<li>The OEM license, which you can review <a target="_blank" href="https://www.jambonz.org/legal-terms-oem">here</a></li>
</ol>
<p>Most of our existing customers will be well-served by the standard license; only those who want to redistribute our code as an embedded part of their product would need an OEM license.</p>
<p>Running the commercial version of jambonz under the standard license requires a software license key.  The license key is tied to a specific jambonz system or cluster, identified by the unique DNS domain name that you assign when you deploy jambonz.  Everyone can get a trial license to start with and, as mentioned above, non-production users can access extended trial licenses.</p>
<p>Licenses can be purchased for a specific capacity, measured by the maximum number of concurrent sessions that jambonz will handle; or, for larger capacities, you can opt for a flat monthly fee that enables unlimited capacity on a single cluster or system. Details on pricing can be found <a target="_blank" href="https://www.jambonz.org/pricing">here</a>.</p>
<p>Standard licenses are billed as a monthly subscription fee. They can be canceled or modified at any time and, unless modified, they auto-renew on a monthly billing cycle. Customers with special needs, such as running air-gapped systems, can contact us for accommodations.</p>
<h2 id="heading-some-housekeeping-items">Some housekeeping items</h2>
<ul>
<li>The commercial software versions will be named version 10.x and above.</li>
<li>The open source software versions (of which the current release is 0.9.5-10) will continue to be named 0.9.x.</li>
<li>Many of our GitHub repos used with version 10 will remain public, MIT-licensed, and open for PRs. These include our client SDKs, our Node-RED plugin, and various npm libraries.</li>
</ul>
<h2 id="heading-looking-forward">Looking Forward</h2>
<p>My goal has never changed: to build great software that delivers real value. Open source was the right tactic to get us started. Commercial licensing is the right tactic to let us keep growing—to fund the innovation that keeps jambonz competitive and to build the kind of company I've always wanted to build.</p>
<p>Thank you for being part of this journey. As always, I'm happy to hear your thoughts.  You can reach me at daveh@jambonz.org, or ping me on our community Slack channel if you are with us there.</p>
<p>—Dave</p>
]]></content:encoded></item><item><title><![CDATA[How to stream text from LLMs using jambonz]]></title><description><![CDATA[What is LLM Streaming?
In the world of Voice AI, providing a responsive, low-latency conversational experience is the holy grail. One technique that is used to minimize latency is end-to-end streaming of text and/or audio to the user. With jambonz yo...]]></description><link>https://blog.jambonz.org/how-to-stream-text-from-llms-using-jambonz</link><guid isPermaLink="true">https://blog.jambonz.org/how-to-stream-text-from-llms-using-jambonz</guid><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Mon, 06 Jan 2025 14:10:37 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-what-is-llm-streaming">What is LLM Streaming?</h1>
<p>In the world of Voice AI, providing a responsive, low-latency conversational experience is the holy grail. One technique that is used to minimize latency is end-to-end streaming of text and/or audio to the user. With jambonz you can build Voice AI applications that leverage LLM streaming and the basic architecture looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734798573072/f1cb6698-d4dc-4cd5-ae7b-1f63a99b6dfc.png" alt class="image--center mx-auto" /></p>
<p>In this article we will do a deep dive on the jambonz streaming architecture and show you how to build an example Voice AI streaming application.</p>
<h1 id="heading-jambonz-streaming-architecture">jambonz streaming architecture</h1>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Support for LLM and TTS streaming has been greatly improved and simplified in jambonz release 0.9.3, so this article assumes that you are running that release or later.</p>
<p>In order to take advantage of the streaming feature you also need to build your jambonz application using the websocket API. (The example code we will show in this article uses our Node.js SDK, which provides an easy way to implement streaming in your application.)</p>
<p>Finally, as of release 0.9.3 we support the following TTS vendors for streaming:</p>
<ul>
<li>Deepgram</li>
<li>ElevenLabs</li>
<li>Cartesia</li>
<li>(Rimelabs coming shortly)</li>
</ul>
<p>We are adding additional vendors all the time, so check back with us if you are looking for support from a different vendor.</p>
<h2 id="heading-jambonz-websocket-protocol-for-streaming">jambonz websocket protocol for streaming</h2>
<p>The full websocket api is described here, but in this article we will focus on the streaming commands.</p>
<p>A jambonz application initiates a streaming request during a call session by sending a <a target="_blank" href>tts:tokens</a> message to jambonz.  Typically, as your application receives streaming text tokens from an LLM it simply sends them immediately on to jambonz using a message like this:</p>
<pre><code class="lang-json">{
      <span class="hljs-attr">"type"</span>: <span class="hljs-string">"command"</span>,
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"tts:tokens"</span>,
      <span class="hljs-attr">"queueCommand"</span>: <span class="hljs-literal">false</span>,
      <span class="hljs-attr">"data"</span>: {
            <span class="hljs-attr">"id"</span>: <span class="hljs-number">101</span>,
            <span class="hljs-attr">"tokens"</span>: <span class="hljs-string">"It was the best of times, it was the "</span>
      }
}
</code></pre>
<p>Let's go through this in detail.</p>
<p>The websocket API has always used the "command" type to send instructions to jambonz, so that part is not new. The specific command "tts:tokens" is new, and this is how your application sends a chunk of text to jambonz as part of a larger stream of text it is receiving.</p>
<p>There are two parameters, both required:</p>
<ul>
<li><strong>id</strong>: this can be either a number or a string, as you like.  It must be unique within this session and, as we will see shortly, it is used by jambonz to send you a confirmation that your tokens have been accepted.</li>
<li><strong>tokens</strong>: this is a string of text that appears within a larger text stream you are receiving from the LLM</li>
</ul>
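<p>To make that concrete, here is a minimal sketch of sending a tts:tokens command over the low-level websocket interface. This is not the jambonz SDK; it is just a hypothetical helper around an already-open websocket for an active jambonz session:</p>
<pre><code class="lang-js">// minimal sketch: `socket` is assumed to be an open websocket for an
// active jambonz session (e.g. created with the Node.js `ws` package)
const sendTtsTokens = (socket, id, tokens) =&gt; {
  socket.send(JSON.stringify({
    type: 'command',
    command: 'tts:tokens',
    queueCommand: false,
    data: {id, tokens}  // id must be unique within the session
  }));
};

sendTtsTokens(socket, 101, 'It was the best of times, it was the ');
</code></pre>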
<p>This is simple enough, but several questions may instantly come to mind:</p>
<ol>
<li>Are text tokens always accepted?  What if I send them before the jambonz application is ready to play them out?</li>
<li>Is there any rate limiting?  What if I overload the server?  In fact, what if a malicious client deliberately tries to overload the server?</li>
<li>Do I need to explicitly tell jambonz when to generate audio from the tokens while streaming?</li>
</ol>
<p>Let's answer those questions.</p>
<h4 id="heading-how-do-i-know-if-my-ttstokens-command-was-accepted">How do I know if my tts:tokens command was accepted?</h4>
<p>For each tts:tokens request you send to jambonz you will get an acknowledgement in the form of a tts:tokens-result:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"tts:tokens-result"</span>,
     <span class="hljs-attr">"data"</span>: {
            <span class="hljs-attr">"id"</span>: <span class="hljs-number">101</span>,
            <span class="hljs-attr">"status"</span>: <span class="hljs-string">"ok"</span>
     }
}
</code></pre>
<p>The message will have an "id" and a "status" property. The status property will be either "ok" or "failed"; if "failed", an additional "reason" property will be included to indicate why the request failed. The following reasons can occur:</p>
<ul>
<li>"missing tokens": no tokens were provided in the request</li>
<li>"full": the application has sent too many tokens and must temporarily buffer and resend later</li>
<li>"connection to  failed": an error was encountered attempting to connect to the TTS vendor</li>
</ul>
<p>Below, we will spend more time on how to handle failures where the reason is "full".</p>
<h4 id="heading-what-if-i-send-tokens-before-jambonz-is-ready-to-play-them">What if I send tokens before jambonz is ready to play them?</h4>
<p>You may wonder what we mean by "when jambonz is ready to play them".</p>
<p>This will be easier to explain when we look at a sample application later, but in general we can say that when an incoming call arrives on a SIP trunk, say, and jambonz begins handling it, it is possible for your application to respond with streaming text even before the audio connection has been fully established.</p>
<p>For this and other cases, jambonz will simply queue the text tokens and then process them as soon as the stream is open from jambonz to the TTS vendor.  So you can send tokens at any time and not worry particularly about whether the TTS connection is currently open or not.</p>
<h4 id="heading-do-i-need-to-tell-jambonz-when-to-generate-audio">Do I need to tell jambonz when to generate audio?</h4>
<p>Yes, you should send a tts:flush command periodically to jambonz to cause it to tell the TTS engine to generate and flush audio.  </p>
<pre><code class="lang-json">{
      <span class="hljs-attr">"type"</span>: <span class="hljs-string">"command"</span>,
      <span class="hljs-attr">"command"</span>: <span class="hljs-string">"tts:flush"</span>,
      <span class="hljs-attr">"queueCommand"</span>: <span class="hljs-literal">false</span>
}
</code></pre>
<p>Most LLMs have a streaming protocol that makes it pretty simple to know when to flush the audio; for instance, using the Anthropic API you would do so whenever you get a "message_stop" event:</p>
<pre><code class="lang-js">    <span class="hljs-keyword">for</span> <span class="hljs-keyword">await</span> (<span class="hljs-keyword">const</span> messageStreamEvent <span class="hljs-keyword">of</span> stream) {
      <span class="hljs-keyword">if</span> (messageStreamEvent.delta?.text) {
        <span class="hljs-keyword">const</span> tokens = messageStreamEvent.delta.text;
        session.sendTtsTokens(tokens);
      }
      <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (messageStreamEvent.type === <span class="hljs-string">'message_stop'</span>) {
        session.flushTtsTokens();
      }
    }
</code></pre>
<p>Note: we are using the @jambonz/node-client-ws sdk in the above example code to send the <code>tts:tokens</code> and <code>tts:flush</code> messages.</p>
<h4 id="heading-is-there-any-rate-limiting">Is there any rate limiting?</h4>
<p>Yes, jambonz will only buffer about 5,000 characters for you.  If you send more than that jambonz will return a tts:tokens-result message with a status of "failed" and a reason of "full".  It is your responsibility to stop sending until jambonz sends you an event indicating that the stream is open for sending again.</p>
<p>Note: if you use the @jambonz/node-client-ws sdk, this buffering is handled for you. You will not need to implement any buffering code in your application. If you write your code directly to the low-level websocket interface, though, you need to take this into consideration.</p>
<p>Now, to understand the event that jambonz will send you once it is ready to accept tokens again after a "full" condition, let's look at all of the tts-related events that jambonz sends your application.</p>
<h3 id="heading-tts-related-events-sent-by-jambonz">tts-related events sent by jambonz</h3>
<p>jambonz will send your application the following events over the websocket:</p>
<ul>
<li><strong>stream_open</strong>: this event is sent any time jambonz begins actively streaming text to the TTS vendor.  As we shall see when we look at the example application code, this has to do with how and where you use the <code>say</code> verb with the <code>streaming</code> property in your application.  As noted above, your application can send text tokens in advance of this event and, in the general case, there is no specific action required for your application to take when this event is generated.</li>
<li><strong>stream_closed</strong>: this event is sent when jambonz is not actively streaming text to the TTS vendor.  You can continue sending tokens to jambonz however, and they will be buffered and then sent when the next "stream_open" event occurs.</li>
<li><strong>stream_paused</strong>: this event is sent when jambonz has buffered the max amount of text that it will accept from your application.  You should refrain from sending any more text until you get a "stream_resumed" event.</li>
<li><strong>stream_resumed</strong>: this event is sent when the text buffer maintained by jambonz has drained to an extent that you can again begin sending text.</li>
<li><strong>user_interruption</strong>: this event is sent when the user has "barged in", i.e. started speaking during audio playout from the TTS engine.</li>
</ul>
<p>If you use the Node.js sdk you do not necessarily need to respond to any of these events. If you are writing to the low-level websocket api then minimally you should have a handler for the "stream_resumed" event, because you will need to handle throttling in your application.</p>
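<p>If you are in that situation, the throttling logic might look something like the sketch below. The buffering strategy and the exact shape of the incoming event payloads are assumptions here; treat this as an illustration of the flow control, not a drop-in implementation:</p>
<pre><code class="lang-js">// sketch of client-side flow control against the low-level websocket interface;
// `socket` is an open websocket for an active jambonz session
let paused = false;
const pending = [];  // tokens held back while jambonz reports "full"

const sendOrBuffer = (socket, tokens) =&gt; {
  if (paused) {
    pending.push(tokens);
    return;
  }
  socket.send(JSON.stringify({
    type: 'command',
    command: 'tts:tokens',
    queueCommand: false,
    data: {id: Date.now(), tokens}
  }));
};

socket.on('message', (raw) =&gt; {
  const msg = JSON.parse(raw);
  if (msg.type === 'tts:tokens-result' &amp;&amp; msg.data?.reason === 'full') {
    paused = true;  // stop sending until jambonz tells us the buffer has drained
  }
  // event name per the list above; the payload shape is an assumption
  else if (msg.type === 'tts:streaming-event' &amp;&amp; msg.data?.event_type === 'stream_resumed') {
    paused = false;
    while (pending.length &amp;&amp; !paused) sendOrBuffer(socket, pending.shift());
  }
});
</code></pre>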
<h2 id="heading-jambonz-verb-support-for-streaming">jambonz verb support for streaming</h2>
<p>If you have previously used jambonz, you are aware that text to speech is usually triggered via the <a target="_blank" href>say</a> verb. </p>
<pre><code class="lang-js">session
    .say({<span class="hljs-attr">text</span>: <span class="hljs-string">'Hi there, how can I help you today.'</span>})
</code></pre>
<p>There are two simple changes in release 0.9.3 of jambonz to support streaming.  The first is that the <code>say</code> verb can be used with a <code>streaming</code> property instead of <code>text</code>.</p>
<h3 id="heading-say-verb-with-streaming">say verb with streaming</h3>
<pre><code class="lang-js">session
    .say({<span class="hljs-attr">streaming</span>: <span class="hljs-literal">true</span>})
</code></pre>
<p>The "streaming" property is mutually exclusive with the "text" property; you must specify one or the other.</p>
<p>When a <code>say</code> verb with <code>streaming</code> is executed, jambonz connects to the streaming TTS engine, sends a <code>stream_open</code> event and begins streaming text to the TTS engine.  Any buffered text that was received from your application prior to this will be sent at this time.</p>
<p>If the <code>say</code> verb is nested within a gather, e.g.</p>
<pre><code class="lang-js">session
   .gather({
       say({<span class="hljs-attr">streaming</span>: <span class="hljs-literal">true</span>})
       <span class="hljs-attr">input</span>: [<span class="hljs-string">'speech'</span>]
</code></pre>
<p>Then, when the <code>gather</code> ends, the <code>say</code> verb is killed. At that point jambonz will send a <code>stream_closed</code> event to your application. Any text received from this point on will be buffered by jambonz and sent when the next say with streaming is executed.</p>
<p>This is useful, but another alternative is to use a "background say with streaming".</p>
<h3 id="heading-background-say-with-streaming">background say with streaming</h3>
<p>If you have used jambonz you might be familiar with the concept of a "background gather" where your application can essentially always be listening and returning you transcripts while a user speaks.  This can be useful when creating voice bot platforms.</p>
<p>Similarly, in 0.9.3 you can create a "background say" with streaming by using the <a target="_blank" href>config</a> verb.</p>
<pre><code class="lang-js">      session
        .config({
          <span class="hljs-attr">ttsStream</span>: {
            <span class="hljs-attr">enable</span>: <span class="hljs-literal">true</span>,
          },
          <span class="hljs-attr">bargeIn</span>: {
            <span class="hljs-attr">enable</span>: <span class="hljs-literal">true</span>,
            <span class="hljs-attr">sticky</span>: <span class="hljs-literal">true</span>,
            <span class="hljs-attr">minBargeinWordCount</span>: <span class="hljs-number">1</span>,
            <span class="hljs-attr">actionHook</span>: <span class="hljs-string">'/speech-detected'</span>,
            <span class="hljs-attr">input</span>: [<span class="hljs-string">'speech'</span>]
          }
        })
        .say({<span class="hljs-attr">text</span>: <span class="hljs-string">'Hi there, how can I help you today?'</span>})
        .send();
</code></pre>
<p>In the example above the <code>ttsStream</code> property is used to enable a <code>say</code> verb with streaming that operates in the background. Any text that your application streams to jambonz will be immediately played out.</p>
<p>If you want to disable streaming at some point in your application you simply issue another <code>config</code> verb:</p>
<pre><code class="lang-js">session
   .config({
      <span class="hljs-attr">ttsStream</span>: {
         <span class="hljs-attr">enable</span>: <span class="hljs-literal">false</span>
      }
  })
</code></pre>
<h1 id="heading-jambonz-sdk-support-for-streaming">jambonz sdk support for streaming</h1>
<p>If you've used the <a target="_blank" href="https://www.npmjs.com/package/@jambonz/node-client-ws">jambonz Node.js sdk for websocket api</a> you are used to using the <code>Session</code> class, as in this code snippet:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> service = <span class="hljs-function">(<span class="hljs-params">{logger, makeService}</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> svc = makeService({<span class="hljs-attr">path</span>: <span class="hljs-string">'/tts-streaming'</span>});

  svc.on(<span class="hljs-string">'session:new'</span>, <span class="hljs-function">(<span class="hljs-params">session</span>) =&gt;</span> {
    logger.debug({session}, <span class="hljs-string">`new incoming call: <span class="hljs-subst">${session.call_sid}</span>`</span>);
</code></pre>
<p>Several new methods and events have been added to the Session class to support streaming.</p>
<h2 id="heading-streaming-related-methods">Streaming-related methods</h2>
<ul>
<li><strong>sendTtsTokens(text)</strong>: sends text tokens to jambonz.  This method will handle flow control with jambonz if necessary.</li>
<li><strong>flushTtsTokens()</strong>: notifies jambonz that the audio provided should be flushed through the TTS engine.</li>
<li><strong>clearTtsTokens()</strong>: clears all buffered and pending audio; typically called to handle a user interruption.</li>
</ul>
<h2 id="heading-streaming-related-events">Streaming-related events</h2>
<ul>
<li><strong>tts:streaming-event</strong>: sent by jambonz to notify the application of a streaming-related event. As above, the event types that can be sent are <code>stream_open</code>, <code>stream_closed</code>, <code>stream_paused</code>, and <code>stream_resumed</code>.</li>
<li><strong>tts:user_interrupt</strong>: sent by jambonz to notify the application that the user has barged into the audio playback. The application should stop sending text tokens at this point and wait for the next utterance from the user.</li>
</ul>
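<p>Wiring these together with the methods above, a minimal interrupt handler might look like this (a sketch only; the full example application below shows the complete flow):</p>
<pre><code class="lang-js">session
  .on('tts:streaming-event', (evt) =&gt; {
    // one of: stream_open, stream_closed, stream_paused, stream_resumed
    console.log({evt}, 'received streaming event');
  })
  .on('tts:user_interrupt', () =&gt; {
    // the user barged in: discard anything queued for the TTS engine
    // and wait for the next utterance before sending more tokens
    session.clearTtsTokens();
  });
</code></pre>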
<h1 id="heading-example-application-using-streaming">Example application using streaming</h1>
<p>Let's look at a sample Voice AI application that uses Anthropic to provide a conversational interface with a user.</p>
<p>This sample application can be generated using <code>npx create-jambonz-ws-app</code>. If you have not used this before, it is a simple command-line utility that can scaffold up different types of jambonz applications using the websocket API:</p>
<pre><code class="lang-bash"> npx create-jambonz-ws-app
Usage: create-jambonz-ws-app [options] project-name

Options:
  -v, --version              display the current version
  -s, --scenario &lt;scenario&gt;  generates a sample websocket app <span class="hljs-keyword">for</span> jambonz (default: <span class="hljs-string">"hello-world"</span>)
  -h, --<span class="hljs-built_in">help</span>                 display <span class="hljs-built_in">help</span> <span class="hljs-keyword">for</span> <span class="hljs-built_in">command</span>


Scenarios available:
- hello-world: a simple app that responds to an incoming call using text-to-speech
- <span class="hljs-built_in">echo</span>: an collect-and-response app that echos <span class="hljs-built_in">caller</span> voice input
- openai-realtime: a conversational voice interface to the OpenAI Realtime API
- deepgram-voice-agent: a conversational voice interface to the Deepgram Voice Agent API
- llm-streaming: example of streaming text tokens from Anthropic LLM
- all: generate all of the above scenarios

Example:
  $ npx create-jambonz-ws-app --scenario <span class="hljs-string">"hello-world, echo"</span> my-app
</code></pre>
<p>Generating an LLM streaming application using Anthropic as the LLM is as simple as this:</p>
<pre><code class="lang-bash">$ npx create-jambonz-ws-app --scenario llm-streaming my-app

Creating a new jambonz websocket app <span class="hljs-keyword">in</span> /Users/dhorton/Downloads/my-app

Installing packages...
$
</code></pre>
<p>The main part of the application is just over 100 lines of code.</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> Anthropic = <span class="hljs-built_in">require</span>(<span class="hljs-string">'@anthropic-ai/sdk'</span>);
<span class="hljs-keyword">const</span> assert = <span class="hljs-built_in">require</span>(<span class="hljs-string">'assert'</span>);
<span class="hljs-keyword">const</span> ANTHROPIC_MODEL = <span class="hljs-string">'claude-3-5-haiku-latest'</span>;
<span class="hljs-keyword">const</span> systemPrompt = <span class="hljs-string">`You are a helpful conversational AI voice bot.
Please keep your answers short and to the point; the user will follow up with more questions if needed.
Please reply with unadorned text that can be read aloud to the user using a TTS engine`</span>;

assert(process.env.ANTHROPIC_API_KEY, <span class="hljs-string">'ANTHROPIC_API_KEY is required'</span>);

<span class="hljs-keyword">const</span> service = <span class="hljs-function">(<span class="hljs-params">{logger, makeService}</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> svc = makeService({<span class="hljs-attr">path</span>: <span class="hljs-string">'/llm-streaming'</span>});

  svc.on(<span class="hljs-string">'session:new'</span>, <span class="hljs-function">(<span class="hljs-params">session</span>) =&gt;</span> {

    session.locals = {
      <span class="hljs-attr">logger</span>: logger.child({<span class="hljs-attr">call_sid</span>: session.call_sid}),
      <span class="hljs-attr">client</span>: <span class="hljs-keyword">new</span> Anthropic({ <span class="hljs-attr">system</span>: systemPrompt }),
      <span class="hljs-attr">messages</span>: [],
      <span class="hljs-attr">assistantResponse</span>: <span class="hljs-string">''</span>
    };
    logger.debug({session}, <span class="hljs-string">`new incoming call: <span class="hljs-subst">${session.call_sid}</span>`</span>);


    session
      .on(<span class="hljs-string">'/speech-detected'</span>, onSpeechDetected.bind(<span class="hljs-literal">null</span>, session))
      .on(<span class="hljs-string">'tts:streaming-event'</span>, onStreamingEvent.bind(<span class="hljs-literal">null</span>, session))
      .on(<span class="hljs-string">'tts:user_interrupt'</span>, onUserInterrupt.bind(<span class="hljs-literal">null</span>, session))
      .on(<span class="hljs-string">'close'</span>, onClose.bind(<span class="hljs-literal">null</span>, session))
      .on(<span class="hljs-string">'error'</span>, onError.bind(<span class="hljs-literal">null</span>, session));

    <span class="hljs-keyword">try</span> {
      session
        .config({
          <span class="hljs-attr">ttsStream</span>: {
            <span class="hljs-attr">enable</span>: <span class="hljs-literal">true</span>,
          },
          <span class="hljs-attr">bargeIn</span>: {
            <span class="hljs-attr">enable</span>: <span class="hljs-literal">true</span>,
            <span class="hljs-attr">sticky</span>: <span class="hljs-literal">true</span>,
            <span class="hljs-attr">minBargeinWordCount</span>: <span class="hljs-number">1</span>,
            <span class="hljs-attr">actionHook</span>: <span class="hljs-string">'/speech-detected'</span>,
            <span class="hljs-attr">input</span>: [<span class="hljs-string">'speech'</span>]
          }
        })
        .say({<span class="hljs-attr">text</span>: <span class="hljs-string">'Hi there, how can I help you today?'</span>})
        .send();
    } <span class="hljs-keyword">catch</span> (err) {
      session.locals.logger.info({err}, <span class="hljs-string">`Error responding to incoming call: <span class="hljs-subst">${session.call_sid}</span>`</span>);
      session.close();
    }
  });
};

<span class="hljs-keyword">const</span> onSpeechDetected = <span class="hljs-keyword">async</span>(session, event) =&gt; {
  <span class="hljs-keyword">const</span> {logger, client} = session.locals;
  <span class="hljs-keyword">const</span> {speech} = event;

  session.reply();

  <span class="hljs-keyword">if</span> (speech?.is_final) {
    <span class="hljs-keyword">const</span> {transcript} = speech.alternatives[<span class="hljs-number">0</span>];
    session.locals.messages.push({
      <span class="hljs-attr">role</span>: <span class="hljs-string">'user'</span>,
      <span class="hljs-attr">content</span>: transcript
    });
    session.locals.user_interrupt = <span class="hljs-literal">false</span>;

    logger.info({<span class="hljs-attr">messages</span>:session.locals.messages}, <span class="hljs-string">`session <span class="hljs-subst">${session.call_sid}</span> making request to Anthropic`</span>);

    <span class="hljs-keyword">const</span> stream = <span class="hljs-keyword">await</span> client.messages.create({
      <span class="hljs-attr">model</span>: ANTHROPIC_MODEL,
      <span class="hljs-attr">max_tokens</span>: <span class="hljs-number">1024</span>,
      <span class="hljs-attr">messages</span>: session.locals.messages,
      <span class="hljs-attr">stream</span>: <span class="hljs-literal">true</span>
    });

    <span class="hljs-keyword">for</span> <span class="hljs-keyword">await</span> (<span class="hljs-keyword">const</span> messageStreamEvent <span class="hljs-keyword">of</span> stream) {
      <span class="hljs-keyword">if</span> (session.locals.user_interrupt) {
        logger.info(<span class="hljs-string">`session <span class="hljs-subst">${session.call_sid}</span> user interrupted, closing stream`</span>);
        session.locals.messages.push({
          <span class="hljs-attr">role</span>: <span class="hljs-string">'assistant'</span>,
          <span class="hljs-attr">content</span>: <span class="hljs-string">`<span class="hljs-subst">${session.locals.assistantResponse}</span>...`</span>
        });
        session.locals.assistantResponse = <span class="hljs-string">''</span>;
        <span class="hljs-keyword">break</span>;
      }

      logger.info({messageStreamEvent}, <span class="hljs-string">`session <span class="hljs-subst">${session.call_sid}</span> received message stream event`</span>);

      <span class="hljs-keyword">if</span> (messageStreamEvent.delta?.text) {
        <span class="hljs-keyword">const</span> tokens = messageStreamEvent.delta.text;
        session.locals.assistantResponse += tokens;
        session.sendTtsTokens(tokens)
          .catch(<span class="hljs-function">(<span class="hljs-params">err</span>) =&gt;</span> logger.error({err}, <span class="hljs-string">'error sending TTS tokens'</span>));
      }
      <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (messageStreamEvent.type === <span class="hljs-string">'message_stop'</span>) {
        logger.info(<span class="hljs-string">`session <span class="hljs-subst">${session.call_sid}</span> flushing TTS tokens`</span>);
        session.flushTtsTokens();
        session.locals.messages.push({
          <span class="hljs-attr">role</span>: <span class="hljs-string">'assistant'</span>,
          <span class="hljs-attr">content</span>: session.locals.assistantResponse
        });
        session.locals.assistantResponse = <span class="hljs-string">''</span>;
      }
    }
    logger.info(<span class="hljs-string">`session <span class="hljs-subst">${session.call_sid}</span> completed processing stream`</span>);
  }
};

<span class="hljs-keyword">const</span> onUserInterrupt = <span class="hljs-function">(<span class="hljs-params">session</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> {logger} = session.locals;
  logger.info(<span class="hljs-string">`session <span class="hljs-subst">${session.call_sid}</span> received user interrupt, cancel any requests in progress to Anthropic`</span>);
  session.locals.user_interrupt = <span class="hljs-literal">true</span>;
};

<span class="hljs-keyword">const</span> onStreamingEvent = <span class="hljs-function">(<span class="hljs-params">session, event</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> {logger} = session.locals;
  logger.info({event}, <span class="hljs-string">`session <span class="hljs-subst">${session.call_sid}</span> received streaming event`</span>);
};

<span class="hljs-keyword">const</span> onClose = <span class="hljs-function">(<span class="hljs-params">session, code, reason</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> {logger} = session.locals;
  logger.debug({session, code, reason}, <span class="hljs-string">`session <span class="hljs-subst">${session.call_sid}</span> closed`</span>);
};

<span class="hljs-keyword">const</span> onError = <span class="hljs-function">(<span class="hljs-params">session, err</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> {logger} = session.locals;
  logger.info({err}, <span class="hljs-string">`session <span class="hljs-subst">${session.call_sid}</span> received error`</span>);
};

<span class="hljs-built_in">module</span>.exports = service;
</code></pre>
<p>You can see it's pretty simple: we use a "background gather" (config.bargeIn) to always be collecting the user's utterance, and a "background say" (config.ttsStream) to stream text from the LLM vendor via jambonz to the TTS engine.</p>
<p>We have event handlers to receive streaming events from jambonz, and we handle user interruptions by canceling the outstanding request to Anthropic.  </p>
<p>That's it! Feel free to modify it to your needs.</p>
]]></content:encoded></item><item><title><![CDATA[Using jambonz for Retell custom telephony]]></title><description><![CDATA[Overview
jambonz is an open source voice gateway platform that can integrate any telephony provider with Retell's VoiceAI platform. It has several advantages over Twilio, including:

more cost-effective: Twilio's per-minute rounding and surcharges fo...]]></description><link>https://blog.jambonz.org/using-jambonz-for-retell-custom-telephony</link><guid isPermaLink="true">https://blog.jambonz.org/using-jambonz-for-retell-custom-telephony</guid><category><![CDATA[Retell]]></category><category><![CDATA[#cpass #voicegateway #opensource #jambonz]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Wed, 30 Oct 2024 22:40:08 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-overview">Overview</h1>
<p><a target="_blank" href="https://jambonz.org">jambonz</a> is an open source voice gateway platform that can integrate any telephony provider with Retell's VoiceAI platform. It has several advantages over Twilio, including:</p>
<ul>
<li><p>more cost-effective: Twilio's per-minute rounding and surcharges for using features like their voice sdk and bidirectional streaming can be eliminated, and jambonz provides all the same features (and more).</p>
</li>
<li><p>you can bring your own carrier (jambonz has integrated with hundreds of SIP providers and PBXs).</p>
</li>
<li><p>run anywhere: jambonz can run in your cloud, on prem, or you can use our hosted service.</p>
</li>
</ul>
<p>In this blog post we will walk you through deploying jambonz as a custom telephony solution for Retell AI. <strong>You will need</strong>:</p>
<ol>
<li><p>A SIP trunking provider or PBX that you are currently using and want to connect to Retell (presumably, you already have this if you are interested in custom telephony integration).</p>
</li>
<li><p>An account on <a target="_blank" href="https://jambonz.cloud/register">jambonz.cloud</a> (you can sign up for a free 3-week trial) <strong>or</strong>, alternatively, your own hosted jambonz platform (you can deploy in your own AWS account using this <a target="_blank" href="https://aws.amazon.com/marketplace/pp/prodview-55wp45fowbovo">AWS marketplace offering</a>).</p>
</li>
<li><p>A server on which to run a jambonz application (you can use <a target="_blank" href="https://ngrok.com/">ngrok</a> for early testing from your laptop).</p>
</li>
</ol>
<p>In this example, we will assume you are deploying using jambonz.cloud but the instructions are the same for those running a self-hosted jambonz system.</p>
<h1 id="heading-high-level-overview">High-level overview</h1>
<h2 id="heading-how-retell-works">How Retell works</h2>
<p>Retell provides two methods of custom telephony integration, and jambonz supports both:</p>
<ul>
<li><p><a target="_blank" href="https://docs.retellai.com/make-calls/custom-telephony#method-1-elastic-sip-trunking-recommended">Method 1: Elastic SIP Trunking</a>. This is the method recommended by Retell, and we will focus mostly on this approach.</p>
</li>
<li><p><a target="_blank" href="https://docs.retellai.com/make-calls/custom-telephony#method-2-dial-to-sip-endpoint">Method 2: Dial to SIP Endpoint</a>. This method can alternatively be used, but one of the drawbacks of this method is that the built-in call transfer function that Retell provides will not work with it because Retell will not send a REFER when this method is used.</p>
</li>
</ul>
<blockquote>
<p>Note: you can still implement call transfer when using method 2 by creating your own user function that instructs jambonz to transfer the call.</p>
</blockquote>
<h2 id="heading-how-jambonz-works">How jambonz works</h2>
<p>jambonz lets you create sip trunks that can send or receive calls from any number of providers. When calls arrive at a jambonz platform, the first thing jambonz does is to figure out which account is responsible for handling that call (jambonz is a multi-tenant platform). One of the main ways it does that is by looking at the DID/called number and determining which account owns that DID and has a SIP trunk built to the source/sending SIP trunking provider or gateway.</p>
<p>Once the account is identified, jambonz connects to a webhook or websocket application that the account has configured in order to retrieve a set of instructions for the call. It is up to the account holder to provide such an application. In our case, I am going to give you a sample application that will connect the incoming call to your Retell agent, using either method 1 or method 2. You should feel free to modify this sample application to your needs and desires, but it handles quite a few things out of the box:</p>
<ul>
<li><p>inbound and outbound calling</p>
</li>
<li><p>Retell webhooks for inbound call and agent events</p>
</li>
<li><p>call transfer via Retell built-in call transfer function.</p>
</li>
</ul>
<p>OK, with that brief overview of how Retell and jambonz work individually, let's put peanut butter and chocolate together now and make something fabulous here!</p>
<h1 id="heading-step-by-step-instructions">Step by step instructions</h1>
<p>First, let's integrate using method 1: Elastic SIP Trunking</p>
<h2 id="heading-get-the-jambonz-application">Get the jambonz application</h2>
<p>You can find the example jambonz application <a target="_blank" href="https://github.com/jambonz/retell-sip-integration-example">here</a>. Clone it to a directory on your laptop and install it:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/jambonz/retell-sip-integration-example.git
<span class="hljs-built_in">cd</span> retell-sip-integration-example
npm ci
</code></pre>
<h2 id="heading-on-jambonz-create-a-carriersip-trunk-to-send-calls-to-retell">On jambonz: create a Carrier/SIP trunk to send calls to Retell</h2>
<p>jambonz is a BYOC (Bring your own Carrier) platform. The term "Carrier" is used interchangeably with "SIP trunk" and we first want to build a sip trunk from jambonz to Retell.</p>
<p>To do so, log into your jambonz account and select Carrier from the lefthand menu, then click the plus sign to add a carrier.</p>
<p>Give the carrier the name 'Retell'. Check the box for E.164 syntax, uncheck outbound authentication, and then add one SIP gateway with the network address <code>5t4n6j0wnrl.sip.livekit.cloud</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730317517911/45dda0dd-a9c9-4197-91c0-75b459cd1c2f.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-on-jambonz-create-a-sip-credential-to-use-for-authentication-with-retell">On jambonz: create a SIP credential to use for authentication with Retell</h2>
<p>Click on "Clients" and add a sip client with a name and password.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730317655883/3853c3c3-26cf-4253-a48c-e0d01986399e.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-on-jambonz-add-a-carriersip-trunk-for-your-pstn-provider">On jambonz: add a Carrier/SIP trunk for your PSTN provider</h2>
<p>Create a second Carrier entry; this time you are creating a SIP trunk for your PSTN/DID provider. Add both inbound and outbound gateways for this SIP trunking provider. jambonz provides a lot of options for integrating SIP trunking providers, so choose the ones that are relevant to your provider.</p>
<p>You can even use Twilio as a trunking provider if you want, and when you create the Carrier notice there is a dropdown of predefined settings for some carriers, including Twilio.</p>
<h2 id="heading-on-jambonz-add-an-application">On jambonz: add an application</h2>
<p>In a bit you will be running the jambonz application that you cloned. Before you do that, we need to add an Application in jambonz. Click to add an application, give it a name, and set both the call webhook and call status webhook to <code>wss://&lt;yourdomain&gt;/retell</code>. (If you are using ngrok to test from your laptop then you will use the ngrok domain.)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730326415498/58003c95-65be-4ae4-8e2a-123cb1dcf96b.png" alt class="image--center mx-auto" /></p>
<p>For speech vendors you can leave the default setting, as your application initially will not need to use any TTS or STT services on jambonz. Save the application.</p>
<h2 id="heading-on-jambonz-add-your-phone-numbers">On jambonz: add your phone number(s)</h2>
<p>Now that you have added your sip trunking provider and an application, add the phone numbers that you have routed from that SIP trunking provider to jambonz. For each, route the calls arriving on that number to the application that you created. (If you have many phone numbers you can route all calls from that provider to the application on the Carrier settings page).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730326488791/60bc841c-5ea7-47f3-b7fd-ff978080e6fe.png" alt class="image--center mx-auto" /></p>
<p>Things are now set up on jambonz. We have:</p>
<ul>
<li><p>created a sip trunk so we can receive and send calls to your trunking provider,</p>
</li>
<li><p>created a sip credential (username, password) we can use to authenticate with Retell,</p>
</li>
<li><p>created another sip trunk so we can send calls to Retell,</p>
</li>
<li><p>added a jambonz application that provides the "glue" to forward these calls to Retell, and</p>
</li>
<li><p>created routing of calls arriving on phone numbers to this application</p>
</li>
</ul>
<p>Now let's head over to Retell and finish things off.</p>
<h2 id="heading-on-retell-add-phone-numbers">On Retell: add phone number(s)</h2>
<p>In the Retell Dashboard, select "Phone Numbers" and click the plus sign. In the dropdown select "Connect to your number via SIP trunking".</p>
<ul>
<li><p>Add the phone number in E.164 format (i.e. a leading + followed by the country code)</p>
</li>
<li><p>For termination URI enter a URI with the DNS of your sip realm in jambonz (you can find that under the Account tab in the jambonz portal), e.g. 'mydomain.sip.jambonz.cloud'</p>
</li>
<li><p>For sip trunk username and password enter the username and password for the SIP credential you created above on jambonz.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738096218572/8a33ab99-2f61-45ba-920c-767a1cd7829b.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
<h2 id="heading-on-retell-associate-the-number-to-an-agent">On Retell: associate the number to an agent</h2>
<p>Select an agent and then associate the phone number to the agent.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730384572321/4dccdbc5-ba89-4a72-9e5a-96e78b531bd1.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-on-retell-set-the-inbound-call-webhook">On Retell: set the inbound call webhook</h2>
<p>Select your Agent and then "webhook settings". Add both an "Inbound call webhook URL" and "Agent Level Webhook URL".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730318587428/627f35ea-7be3-40ce-b2be-f2ba9b14ea69.png" alt class="image--center mx-auto" /></p>
<p>For both, the host will be the DNS where your jambonz application is running, and the paths will be "/inbound-webhook" and "/agent-events" respectively.</p>
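<p>The example application implements both of those paths for you. Conceptually they are just small HTTP handlers, something like the simplified sketch below (the real logic lives in the repo you cloned, which is the reference):</p>
<pre><code class="lang-js">// simplified sketch of the two webhook endpoints Retell will call;
// see jambonz/retell-sip-integration-example for the actual implementation
const express = require('express');
const app = express();
app.use(express.json());

app.post('/inbound-webhook', (req, res) =&gt; {
  // Retell invokes this when an inbound call reaches the agent;
  // a 200 response lets the call proceed
  console.log('inbound call webhook from Retell:', req.body);
  res.sendStatus(200);
});

app.post('/agent-events', (req, res) =&gt; {
  // agent-level events (call lifecycle, analysis, etc.) arrive here
  console.log('agent event from Retell:', req.body);
  res.sendStatus(200);
});

app.listen(3000);
</code></pre>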
<h1 id="heading-running-the-application-method-1">Running the application - method 1</h1>
<p>Now you are ready to test the application. You must provide your Retell api key and the name of the Retell Carrier that you created on jambonz:</p>
<pre><code class="lang-bash">RETELL_API_KEY=xxxxxxxxxx RETELL_TRUNK_NAME=Retell node app.js
</code></pre>
<p>Place a call to one of your phone numbers and it should be connected to your Retell agent. You will see lots of logging of agent events in your console or log where the jambonz application is running.</p>
<h1 id="heading-running-the-application-method-2">Running the application - method 2</h1>
<p>If instead you wish to use method 2 (Dial to SIP Endpoint), you do not need to create a Carrier for Retell on jambonz. You still need to create the Carrier entry for your SIP provider, create the application, and add the phone numbers as before (you do not need to create a SIP credential).</p>
<p>No setup is needed on the Retell side either; you do not need to add a phone number since we won't be dialing a phone number. Instead, we will be calling the <a target="_blank" href="https://docs.retellai.com/api-references/register-call">Register Call</a> API.</p>
<p>To run using method 2, you need to specify your Retell api key and agent id.</p>
<pre><code class="lang-bash">RETELL_API_KEY=xxxxxxxxxxxxxx RETELL_AGENT_ID=agent_yyyyyyyyy node app.js
</code></pre>
<h1 id="heading-outbound-calls">Outbound calls</h1>
<p>To use this application for outbound calls, use the <a target="_blank" href="https://api.jambonz.org/#243a2edd-7999-41db-bd0d-08082bbab401">jambonz REST API</a> to create a new call. To do this you will need to know:</p>
<ul>
<li><p>your jambonz account_sid</p>
</li>
<li><p>your jambonz api key</p>
</li>
<li><p>the application_sid of this application (available once you Add Application in jambonz)</p>
</li>
<li><p>the base URL of the jambonz system you are using (https://api.jambonz.cloud for example on jambonz.cloud)</p>
</li>
<li><p>the name of the Carrier you created for your PSTN provider</p>
</li>
</ul>
<p>You can then format and send an HTTP POST to jambonz like this:</p>
<pre><code class="lang-bash">curl --location -g <span class="hljs-string">'https:/{{baseUrl}}/v1/Accounts/{{account_sid}}/Calls'</span> \
--header <span class="hljs-string">'Authorization: Bearer {{api_key}}'</span> \
--header <span class="hljs-string">'Content-Type: application/json'</span> \
--data <span class="hljs-string">'{
    "application_sid": "{{application_sid}}",
    "from": "15083728299",
    "to": {
        "type": "phone",
        "number": "15082084809",
        "trunk": "{{PSTN carrier name}}"
    }
}'</span>
</code></pre>
<p>Of course, substitute in your own from and to phone numbers. The example above assumes that you have created a BYOC trunk on jambonz that you will use to outdial the user.</p>
<p>The result of running the above curl command, then, will be to dial the user first and, when they answer, connect them to your Retell agent.</p>
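<p>If you prefer to place the outbound call from Node.js rather than curl, the equivalent request looks like this (Node 18+ provides a global fetch; fill in the placeholder values from your own account):</p>
<pre><code class="lang-js">// same request as the curl example above, from Node.js;
// all of these values come from your own jambonz account and setup
const baseUrl = 'https://api.jambonz.cloud';
const accountSid = 'your-account-sid';
const apiKey = 'your-api-key';
const applicationSid = 'your-application-sid';
const pstnCarrierName = 'your PSTN carrier name';

const createCall = async() =&gt; {
  const res = await fetch(`${baseUrl}/v1/Accounts/${accountSid}/Calls`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      application_sid: applicationSid,
      from: '15083728299',
      to: {type: 'phone', number: '15082084809', trunk: pstnCarrierName}
    })
  });
  console.log('create call result:', await res.json());
};

createCall();
</code></pre>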
<h1 id="heading-conclusion">Conclusion</h1>
<p>This blog post showed how you can easily deploy the jambonz voice gateway to provide custom telephony integration to Retell AI, saving costs and enabling interoperability with any SIP trunking/DID provider.</p>
<p>If you have any questions about jambonz, or would like to inquire about our support plans please email us at support@jambonz.org, or join our Slack channel by going to <a target="_blank" href="https://joinslack.jambonz.org">joinslack.jambonz.org</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Some initial thoughts on OpenAI's Realtime API]]></title><description><![CDATA[I spent the past weekend adding support for OpenAI's Realtime API to jambonz. My first impression was very positive: as others have reported,  the conversational flow was very much at the tempo of a normal conversation. 
I'd implemented support for i...]]></description><link>https://blog.jambonz.org/some-initial-thoughts-on-openais-realtime-api</link><guid isPermaLink="true">https://blog.jambonz.org/some-initial-thoughts-on-openais-realtime-api</guid><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Mon, 14 Oct 2024 11:40:51 GMT</pubDate><content:encoded><![CDATA[<p>I spent the past weekend adding support for <a target="_blank" href="https://openai.com/index/introducing-the-realtime-api/">OpenAI's Realtime API</a> to jambonz. My first impression was very positive: as others have reported,  the conversational flow was very much at the tempo of a normal conversation. </p>
<p>I'd implemented support for interrupting the agent, so that worked nicely and I was able to have amazingly lifelike conversations with the agent. I'd also implemented support for function calls, so I was able to supply tools to the agent and see them used properly to increase the value content delivered by the conversation.</p>
<p>Then I decided to do a little experiment. I decided to use jambonz to create an application that would use the OpenAI API for a Voice/AI conversation, <strong>while at the same time</strong> sending the user's audio input to Deepgram for simultaneous speech recognition processing.</p>
<p>The goal was to compare the transcripts returned by OpenAI (Whisper) with those returned by Deepgram for the same audio stream, delivered simultaneously to each service. And furthermore, to measure the relative latency of each in processing the same realtime audio stream.</p>
<p>I chose Deepgram as the comparison because in my opinion they are the premier ASR vendor in terms of accuracy and latency for English language recognition.</p>
<h1 id="heading-the-setup">The setup</h1>
<p>The setup consisted of a jambonz server running on an AWS EC2 instance, configured to receive calls from a Twilio Elastic SIP Trunk. The calls were routed to a jambonz application that simultaneously started a Voice/AI conversation with the OpenAI API while streaming the caller's audio to Deepgram for recognition. To make it a fair comparison, I did not provide any phrase hints to Deepgram or other special configuration. The OpenAI and Deepgram services were both accessed via their hosted endpoints, and my EC2 instance was in the us-east-2 region of AWS.</p>
<h1 id="heading-the-measurements">The measurements</h1>
<p>The OpenAI API provides a wealth of event data during the conversation by means of <a target="_blank" href="https://platform.openai.com/docs/guides/realtime/server-events">server events</a> that it sends back over the websocket connection. These include <code>conversation.item.input_audio_transcription.completed</code>, which is emitted when a final transcript is made from a user utterance. So I used this event to determine when OpenAI, and the underlying Whisper ASR, had completed speech recognition of the user's speech and what it determined the user had said.</p>
<p>Similarly, Deepgram also provides events over a websocket interface that include both partial and final transcripts. I used the arrival of final transcripts for the same user utterance to compare with OpenAI.</p>
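<p>For those curious about the mechanics, the comparison boils down to timestamping the final-transcript event from each service for the same utterance and taking the difference. Here is a simplified sketch of the idea (not the exact harness I used):</p>
<pre><code class="lang-js">// sketch: record when each vendor returns its final transcript for the
// current turn, then report the difference once both have arrived
let turn = 1;
const arrivals = {};  // turn -&gt; {openai?: number, deepgram?: number}

const markFinal = (vendor, transcript) =&gt; {
  const t = arrivals[turn] = arrivals[turn] || {};
  t[vendor] = Date.now();
  console.log(`turn ${turn} ${vendor} final transcript: "${transcript}"`);
  if (t.openai &amp;&amp; t.deepgram) {
    const diffSecs = (t.openai - t.deepgram) / 1000;
    console.log(`turn ${turn}: Deepgram faster by ${diffSecs} secs`);
    turn++;
  }
};

// called from the OpenAI websocket handler on a
// conversation.item.input_audio_transcription.completed server event:
//   markFinal('openai', evt.transcript);
// and from the Deepgram websocket handler when a final transcript arrives:
//   markFinal('deepgram', transcript);
</code></pre>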
<h1 id="heading-the-conversation">The conversation</h1>
<p>For this test, I had a conversation where I asked OpenAI a series of questions about Alexander the Great. You can listen to the conversation below; it went pretty well, although you'll hear the OpenAI assistant misunderstood one of my questions at the end and had to be nudged back on track. All in all though, a very positive conversation experience, especially since I was calling in from my mobile phone and not making any effort to speak more clearly or slower than usual.</p>
<p>You can listen to the conversation <a target="_blank" href="https://recordings.jambonz.cloud/alex-the-great-conversation-with-openai.mp3">here</a>.</p>
<p>However, when we look at the details of the transcripts that OpenAI produced and its latency compared to Deepgram, we find that OpenAI is both less accurate and slower. Let's go through the conversation turn by turn and look at what happens:</p>
<h2 id="heading-turn-by-turn-analysis">Turn by turn analysis</h2>
<h3 id="heading-turn-1">Turn 1</h3>
<ul>
<li>Assistant: <em>"How can I help you today?"</em></li>
<li>Me: <em>"Well, I'd like to ask some questions about Alexander the Great."</em></li>
<li>OpenAI transcript: <em>"Well, I'd like to ask some questions about Alexander the Great."</em></li>
<li>Deepgram transcript: <em>"well i'd like to ask some questions about alexander the great"</em></li>
<li>latency: Deepgram faster by <strong>0.965 secs</strong></li>
</ul>
<blockquote>
<p>So far, so good. Both OpenAI Whisper and Deepgram return accurate transcripts, with Deepgram being faster.</p>
</blockquote>
<h3 id="heading-turn-2">Turn 2</h3>
<ul>
<li>Assistant: <em>"Sure, I'm happy to help! What do you want to know about Alexander the Great?"</em></li>
<li>Me: <em>"Well, I understand he had a pretty brief reign, can you tell me about when that was?"</em></li>
<li>OpenAI: <em>"understand he had a pretty brief reign. Can you tell me a little bit about when that was?"</em></li>
<li>Deepgram: <em>"well i understand he had a pretty brief rein can you tell me a little bit about when that was with"</em></li>
<li>Latency: Deepgram faster by <strong>3.12  secs</strong></li>
</ul>
<blockquote>
<p>Again, both basically captured what I said in a form that was understandable to the LLM. However, OpenAI missed the first two words I spoke. This seems to be an intermittent problem with OpenAI, because I have observed it on multiple occasions. Deepgram was basically accurate, though it rendered "reign" as the homophone "rein" and appended a stray "with".<br />In this turn, Deepgram was much faster than OpenAI. However, if you listen to the audio you can hear that the slower latency did not impact the conversational flow in a noticeable way. More on that later.</p>
</blockquote>
<h3 id="heading-turn-3">Turn 3</h3>
<ul>
<li>Assistant: <em>"Certainly! Alexander the Great reigned from 336 BC until his death in 323 BC. Despite the brevity of his reign, he created one of the largest empires in history, stretching from Greece to Egypt and into northwest India. Quite the accomplishment for about 13 years of work, wouldn't you say?"</em></li>
<li>Me: <em>"How far into Asia did his empire extend and where did he stop?"</em></li>
<li>OpenAI: <em>"How far into Asia did his empire extend and where did he stop?"</em></li>
<li>Deepgram: <em>"how far into asia did his empire extend and and where did he stop"</em></li>
<li>Latency: Deepgram faster by <strong>1.64  secs</strong></li>
</ul>
<blockquote>
<p>Everything still going well and both recognizers achieved accurate transcripts.</p>
</blockquote>
<h3 id="heading-turn-4">Turn 4</h3>
<ul>
<li>Assistant: <em>"Alexander's empire extended as far as the Hyphasis River, known today as the Beas River, in what is now the Punjab region of modern-day India and Pakistan. He reached this point in 326 BC, but his troops, exhausted and longing to return home, mutinied, forcing him to turn back. So, he never actually lost a battle, but his campaign ended due to the morale and fatigue of his army."</em></li>
<li>Me: <em>"I thought he actually stopped at the Inus river. Am I wrong?"</em></li>
<li>OpenAI: <em>"I thought he actually stopped at the Inis River, am I wrong?"</em></li>
<li>Deepgram: <em>"i thought he actually stopped at the ines river am i wrong"</em></li>
<li>Latency: Deepgram faster by <strong>1.9  secs</strong></li>
</ul>
<blockquote>
<p>Both recognizers transcribe the utterance accurately.</p>
</blockquote>
<h3 id="heading-turn-5">Turn 5</h3>
<ul>
<li>Assistant: <em>"Ah, close but not quite! Alexander reached the Hyphasis River—modern-day Beas River— and intended to push further into the Indian subcontinent toward the Ganges River. However, his army, weary and anxious about facing larger armies and more war elephants, refused to march further east. This forced Alexander to turn back, marking the easternmost extent of his conquests, rather than at the Indus River."</em></li>
<li>Me: <em>"In those days, I understand many people thought of Alexander as a god or the son of a god.  But what about his generals? How did they regard him?"</em></li>
<li>OpenAI: <em>"In those days, I understand many people thought of Alexander as the son of a god.  But what about his generals? How did they regard him?"</em></li>
<li>Deepgram: <em>"those days understand many people thought of alexander as god or a son of a god but what about his generals how did they regard him"</em></li>
<li>Latency: Deepgram faster by <strong>1.63  secs</strong></li>
</ul>
<blockquote>
<p>Conversation still on track and user utterances being recognized more or less accurately by both recognizers, though OpenAI missed the reference to "a god".</p>
</blockquote>
<h3 id="heading-turn-6">Turn 6</h3>
<blockquote>
<p>Here things start to get a little more interesting.</p>
<ul>
<li>Assistant: <em>"Alexander's relationship with his generals was complex. While many respected and admired him for his brilliance and battlefield successes, there was also tension. Alexander's adoption of Persian customs and his claim to divine status, presenting himself as the son of Zeus-Ammon, created friction. Some generals felt alienated by his shift away from traditional Macedonian ways, leading to conspiracies and tension within his ranks. Nonetheless, his unparalleled success in expanding the empire often kept them loyal, despite their personal misgivings. His ability to maintain control over such a vast and diverse group was a testament to his leadership and charisma."</em></li>
<li>Me: <em>"On a modern day map where is Macedonia?  I understand Macedonia is where he's from, is that right?"</em></li>
<li>OpenAI: <em>"On a modern day map, where is Macedonia? I understand Macedonia's where I'm from, is that right?"</em></li>
<li>Deepgram: <em>"on a modern day map where is macedonia i understand macedonian is where he's from is that right"</em></li>
<li>Latency: Deepgram faster by <strong>1.36  secs</strong></li>
</ul>
<p>Here OpenAI makes an interesting mistake. I said "he's from" but it transcribed "I'm from", even though those two utterances don't sound alike. I don't think this is a one-off error either, as I spotted other instances where I used a pronoun like "you" and OpenAI returned a transcript with "I" instead. We'll note in the next turn, though, that OpenAI's LLM does not get knocked off track by this error - an example of the LLM "covering up" for mistakes in the Whisper ASR.</p>
</blockquote>
<h3 id="heading-turn-7">Turn 7</h3>
<ul>
<li>Assistant: <em>"That's right, Alexander the Great was from Macedon, an ancient kingdom located in the northern part of Greece. On a modern map, the region that was ancient Macedon is mostly in the northern part of Greece, ..(interrupted)."</em></li>
<li>Me: <em>"How did he defeat Greece..in the first part..in the first place?"</em></li>
<li>OpenAI: <em>"How did he eat reef?  in the first place."</em></li>
<li>Deepgram: <em>"how did he defeat greece in the first part, the first place"</em></li>
<li>Latency: Deepgram faster by <strong>0.53  secs</strong></li>
</ul>
<blockquote>
<p>Whoops. Now OpenAI Whisper has failed, recognizing "defeat" as "eat" and "Greece" as "reef". Meanwhile, Deepgram recognizes the statement correctly. This will briefly derail the conversation in the next turn.</p>
</blockquote>
<h3 id="heading-turn-8">Turn 8</h3>
<ul>
<li>Assistant: <em>"Alexander the Great died in 323 BC in the palace of Nebuchadnezzar II in Babylon, which is in modern-day Iraq. The exact cause of his death remains a mystery and has been the subject of much ..(interruption)."</em></li>
<li>Me: <em>"How did he defeat mainland Greece was the question."</em></li>
<li>OpenAI: <em>"How do you eat mainland Greece? was the question."</em></li>
<li>Deepgram: <em>"how did he defeat mainland greece was the question"</em></li>
<li>Latency: Deepgram faster by <strong>0.80  secs</strong></li>
</ul>
<blockquote>
<p>Given another chance, OpenAI still does not accurately understand what I said while Deepgram does.  But now something really interesting happens.  Even though the Whisper ASR returns an incorrect, meaningless (how would one eat "mainland Greece"?), and misleading transcription, the LLM again manages to use its context to power through to a response that is on target.</p>
</blockquote>
<h3 id="heading-turn-9">Turn 9</h3>
<ul>
<li>Assistant: <em>"My apologies for the confusion. Alexander the Great never actually invaded mainland Greece. He was from Macedon, which was a kingdom located to the north of the Greek city-states. After the assassination of his father, King Philip II, in 336 BC, Alexander became king and continued his father's efforts to unite the Greek city-states under Macedonian hegemony. Alexander's influence over Greece was solidified by his role as the leader of the Corinthian League, a coalition of Greek states. From there, he launched his campaign against the Persian Empire, which is where most of his famous conquests took place. So, rather than invading mainland Greece, Alexander integrated it into his empire before embarking on his legendary campaigns across Asia."</em></li>
</ul>
<h1 id="heading-conclusions">Conclusions</h1>
<h2 id="heading-deepgram-is-more-accurate">Deepgram is more accurate</h2>
<p>Deepgram fairly consistently returned more accurate transcripts. OpenAI returned two transcripts that had significant errors (I define "significant" as an error that could potentially derail the conversation) while Deepgram returned none.</p>
<h3 id="heading-but-it-doesnt-always-matter">BUT, it doesn't always matter</h3>
<p>The errors in the Whisper ASR were sometimes handled or "covered up", if you will, by the OpenAI LLM model.  Presumably, the previous context of the conversation up to that point is used to inform the model in a way that lets it answer accurately even within some range of incorrect transcriptions.</p>
<h2 id="heading-deepgram-is-faster">Deepgram is faster</h2>
<p>Deepgram was faster in returning final transcripts in every turn of the conversation, by significant amounts.</p>
<h2 id="heading-but-the-conversation-latency-was-still-extremely-low">BUT, the conversation latency was still extremely low</h2>
<p>Even though Whisper had greater latency, the conversation flowed at near-human tempo. Why is this? Why does OpenAI not pay a noticeable penalty for the slower latency? Most likely because recognized tokens are streamed into the LLM as they are produced: even though the full transcript took longer, the ability to stream tokens well before the final transcript is complete allows a rapid conversational response.</p>
<h2 id="heading-context-matters-and-what-this-says-about-asr-apis-in-general">Context matters, and what this says about ASR APIs in general</h2>
<p>We can see here that speech-to-speech APIs enjoy a big advantage over more traditional "voice =&gt; ASR =&gt; text =&gt; NLU/Intent/Dialog =&gt; text =&gt; TTS =&gt; voice" pipelines, and that advantage is <strong>context</strong>. We see the context of the conversation being used here to improve results and rub out imperfections in transcriptions.</p>
<p>In the more traditional speech-to-text APIs provided by speech vendors this opportunity for context is lost, for several reasons:</p>
<ol>
<li>When we send speech to the ASR for transcription, the ASR has no knowledge of what question or prompt we just served up to the user.  In other words, the user's speech is in response to some prompt, but the ASR has no knowledge of what that prompt was.  This is a shortcoming I have cited for some time - see my earlier blog post: <a target="_blank" href="https://blog.jambonz.org/speech-companies-are-failing-at-conversational-ai">Speech companies are failing at Conversational AI</a> - but so far I have been unsuccessful in getting speech companies to recognize the opportunity to expand their APIs to take advantage of this.</li>
<li>Often, in traditional ASR, each distinct utterance sent to the speech vendor is treated as a separate, standalone utterance.  The ASR does not even have the ability to comprehend that 10 different utterances from the same user are all part of the same conversation, and likely on the same topic.</li>
</ol>
<p>This makes the traditional voice =&gt; text =&gt; voice pipeline much more brittle and easier to derail than speech-to-speech.  But it does not need to be this way, and I do not conclude that speech-to-speech will necessarily dominate in areas like CX/AI.</p>
<p>Why is this?  First, the only significant improvement (and make no mistake, it is hugely significant) that speech-to-speech offers is that it approaches human-level conversational flow.  But to the extent that this is due to more effective use of context, as I've just described, context is something that can also be radically improved in the traditional voice =&gt; text =&gt; voice pipeline.</p>
<p>Second, to be usable in CX/AI environments by global brands, guardrails are necessary, and guardrails inevitably add latency.  Traditional NLU/Intent/Dialog based CX/AI platforms already have guardrails built in, and if speech-to-speech gives back a significant amount of its latency advantage to guardrails, all you are left with is a more expensive solution without that compelling differentiator.</p>
<p>Visit us at jambonz.org or email us at support@jambonz.org to learn more, or to find out about installing your own jambonz voice platform for Voice/AI applications.</p>
]]></content:encoded></item><item><title><![CDATA[Microsoft Teams Direct Routing with jambonz]]></title><description><![CDATA[This blog comes to you thanks to Peter Mijnster at SoundOfData and Antony Jukes at Callable.io, who were gracious enough to capture and relay their experience in setting up jambonz as a Direct Routing SBC for Microsoft Teams. Thanks Peter and Ant...]]></description><link>https://blog.jambonz.org/microsoft-teams-direct-routing-with-jambonz</link><guid isPermaLink="true">https://blog.jambonz.org/microsoft-teams-direct-routing-with-jambonz</guid><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Thu, 15 Aug 2024 13:15:24 GMT</pubDate><content:encoded><![CDATA[<blockquote>
<p>This blog comes to you thanks to <a target="_blank" href="https://www.linkedin.com/in/petermijnster/">Peter Mijnster</a> at <a target="_blank" href="https://www.soundofdata.com/">SoundOfData</a> and <a target="_blank" href="https://www.linkedin.com/in/antony-jukes-7b139782/">Antony Jukes</a> at <a target="_blank" href="https://www.callable.io/">Callable.io</a>, who were gracious enough to capture and relay their experience in setting up jambonz as a <a target="_blank" href="https://learn.microsoft.com/en-us/microsoftteams/pstn-connectivity#teams-phone-with-direct-routing">Direct Routing SBC for Microsoft Teams</a>. Thanks Peter and Antony!</p>
</blockquote>
<p>Microsoft Teams Direct Routing is a feature of Microsoft Teams that allows organizations to connect their existing telephony infrastructure to Microsoft Teams, enabling users to make and receive calls through the Microsoft Teams interface. Essentially, it allows you (and your users) to use Microsoft Teams as a unified communication platform for both internal and external communications, including traditional phone calls.</p>
<p>When setting up Microsoft Teams Direct Routing, you need a Session Border Controller (SBC) to manage the connection between your on-premises telephony infrastructure and the Microsoft Phone System. The SBC acts as an intermediary, handling the translation of signaling and media between the two systems, ensuring seamless communication.</p>
<p><a target="_blank" href="https://www.jambonz.org/">jambonz</a> is an open-source voice platform that is widely deployed by CX/AI, Voice/AI, and CPaaS vendors.  It includes SBC functionality and  integrates your existing voice infrastructure, such as SIP trunks, with Microsoft Teams; enabling features like inbound and outbound calling, voicemail, and emergency calling within the Microsoft Teams environment.</p>
<p>In simpler terms, Microsoft Teams Direct Routing with jambonz allows you to leverage your existing phone infrastructure while still enjoying the collaborative and communication features of Microsoft Teams. It's a way to bring together the best of both worlds: the familiarity and reliability of traditional telephony with the modern capabilities of a unified communication platform like Microsoft Teams.</p>
<h2 id="heading-the-jambonz-setup">The jambonz Setup</h2>
<p>To ensure a seamless demonstration, we've established a jambonz environment using a single EC2 instance (a "jambonz mini" system) hosted on Amazon Web Services (AWS) in the eu-central-1 region within a single dedicated VPC.</p>
<blockquote>
<p>Note: We’ve used our custom AWS CloudFormation templates which have been designed to meet our specific requirements and code guidelines. While these templates differ from the standard jambonz repository, the end-product remains identical.</p>
<p>Note: In most of the verbiage below, we have replaced our actual domain name with 'contoso' in order to avoid attracting traffic from web crawlers.  You'll still see the actual domain name in some of the images, but hopefully you get the idea.</p>
</blockquote>
<p>We’ve set up our jambonz server with the following domain name: eu-central-jbz-teams.contoso.app. This domain is only used for accessing the jambonz web portal and plays no further role in getting Microsoft Teams Direct Routing up and running. The reason we’ve included eu-central in the name is that we like to know in which region our servers are running, and this will help when we introduce Microsoft Teams Direct Routing with jambonz in multiple regions as a global proposition.</p>
<h2 id="heading-certificates-and-dns">Certificates and DNS</h2>
<p>As mentioned above, we will be using a different domain for the Microsoft Teams Direct Routing functionality. This is true both for jambonz and for Microsoft Teams (Microsoft 365). We will be using the following domain name for this demo, <code>teams.contoso.com</code>, with a matching wildcard certificate so that we can create subdomains for individual customers ("tenants" in Teams lingo).</p>
<p>The first thing we need to do is add the FQDN of our jambonz server to our DNS server as an A-record. In our case that will be eu-central-sbc001a.teams.contoso.com.</p>
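<p>Once the record is in place, it’s worth a quick sanity check that it resolves before moving on (substituting your own FQDN, of course):</p>
<pre><code class="lang-bash">dig +short eu-central-sbc001a.teams.contoso.com
# should print the public IP of your jambonz server
</code></pre>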
<p><strong>Example of a possible (future) multi-region high-availability approach and FQDNs for the first tenant, operating in Europe and North America:</strong></p>
<ul>
<li><p>eu-central-sbc001a.teams.contoso.com (1st jambonz SIP-SBC for calls in Europe)</p>
</li>
<li><p>eu-central-sbc001b.teams.contoso.com (2nd jambonz SIP-SBC for calls in Europe)</p>
</li>
<li><p>us-west-sbc001a.teams.contoso.com (1st jambonz SIP-SBC for calls in North-America)</p>
</li>
<li><p>us-west-sbc001b.teams.contoso.com (2nd jambonz SIP-SBC for calls in North-America)</p>
</li>
</ul>
<p><strong>Example of a possible (future) multi-region high-availability approach and FQDNs for the second tenant, operating in Europe, North America and China</strong>:</p>
<ul>
<li><p>eu-central-sbc002a.teams.contoso.com (1st jambonz SIP-SBC for calls in Europe)</p>
</li>
<li><p>eu-central-sbc002b.teams.contoso.com (2nd jambonz SIP-SBC for calls in Europe)</p>
</li>
<li><p>us-west-sbc002a.teams.contoso.com (1st jambonz SIP-SBC for calls in North-America)</p>
</li>
<li><p>us-west-sbc002b.teams.contoso.com (2nd jambonz SIP-SBC for calls in North-America)</p>
</li>
<li><p>ap-east-sbc002a.teams.contoso.com (1st jambonz SIP-SBC for calls in China)</p>
</li>
<li><p>ap-east-sbc002b.teams.contoso.com (2nd jambonz SIP-SBC for calls in China)</p>
</li>
</ul>
<blockquote>
<p>It’s important to remember that the FQDNs need to be unique. An FQDN can only be used once across all Microsoft 365 tenants. That’s why it’s important to define an FQDN strategy beforehand and stick to it, because changing it afterwards will impact the tenants (customers). So, this naming approach might work for you, or you could use customer names - whatever works for you.</p>
<p>It’s also worth mentioning that tenants (customers) add these FQDNs to their own Microsoft 365 tenant, which you do not have access to, nor any control over. We strongly advise against reusing these FQDNs: you can remove them from your own DNS servers, but not from the customer’s Microsoft 365 domain list, and sooner or later this will cause issues.</p>
</blockquote>
<h2 id="heading-configure-drachtio-server">Configure drachtio server</h2>
<p>Before we can communicate properly with the Microsoft Teams Direct Routing SIP proxy we need to configure the <a target="_blank" href="https://drachtio.org">drachtio server</a> component of jambonz to accept SIP traffic over TLS from MS Teams.</p>
<p>You will need to complete two tasks:</p>
<ol>
<li>configure the drachtio server to listen on port 5061 for sip over tls, and</li>
<li>configure drachtio to know where on the server the TLS certificates are installed.</li>
</ol>
<p>Since we are running jambonz on AWS using EC2 instances, we'll edit the drachtio.service systemd file to add an additional contact for sip over tls.  While you're at it, if you want to support webrtc clients sending sip over websockets, you can add that additional contact as well.  An example drachtio service file configured to listen for both sip over tls and wss looks like this:</p>
<pre><code>[Unit]
Description=drachtio
After=syslog.target network.target local-fs.target

[Service]
; service
Type=forking
ExecStartPre=/bin/sh -c 'systemctl set-environment LOCAL_IP=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`'
ExecStartPre=/bin/sh -c 'systemctl set-environment PUBLIC_IP=`curl -s http://169.254.169.254/latest/meta-data/public-ipv4`'
ExecStart=/usr/local/bin/drachtio --daemon \
--contact sip:${LOCAL_IP};transport=udp --external-ip ${PUBLIC_IP} \
--contact sips:${LOCAL_IP};transport=tls --external-ip ${PUBLIC_IP} \
--contact sips:${LOCAL_IP}:8443;transport=wss --external-ip ${PUBLIC_IP} \
--contact sip:${LOCAL_IP};transport=tcp --external-ip ${PUBLIC_IP} \
--address 0.0.0.0 --port 9022 --homer 172.20.10.193:9060 --homer-id 10
..
</code></pre><p>For the second task, telling drachtio where to find the TLS certificates, edit the /etc/drachtio.conf.xml file and enter the <code>key-file</code>, <code>cert-file</code>, and <code>chain-file</code> properties under the <code>tls</code> tag, as described <a target="_blank" href="https://drachtio.org/docs/drachtio-server#tls">here</a>.</p>
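<p>As a rough sketch, the relevant section of /etc/drachtio.conf.xml might look something like the following (the certificate paths here assume a Let’s Encrypt wildcard certificate for teams.contoso.com; adjust them to wherever your certificates are actually installed):</p>
<pre><code class="lang-xml">&lt;drachtio&gt;
  &lt;sip&gt;
    ...
    &lt;tls&gt;
      &lt;key-file&gt;/etc/letsencrypt/live/teams.contoso.com/privkey.pem&lt;/key-file&gt;
      &lt;cert-file&gt;/etc/letsencrypt/live/teams.contoso.com/cert.pem&lt;/cert-file&gt;
      &lt;chain-file&gt;/etc/letsencrypt/live/teams.contoso.com/chain.pem&lt;/chain-file&gt;
    &lt;/tls&gt;
  &lt;/sip&gt;
&lt;/drachtio&gt;
</code></pre>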
<p><em>Checking drachtio with openssl</em></p>
<p>Before proceeding further it's a good idea to test that basic TLS connectivity is working.</p>
<pre><code class="lang-bash">openssl s_client -connect &lt;URL or IP&gt;:&lt;port&gt;
openssl s_client -connect eu-central-sbc001a.teams.contoso.com:5061
</code></pre>
<h2 id="heading-configure-jambonz">Configure jambonz</h2>
<p>We are using the default out-of-the-box settings for jambonz. This means our “Settings” configuration page looks like this:</p>
<ul>
<li><p><strong>Domain Name</strong>: eu-central-jbz-teams.contoso.app</p>
</li>
<li><p><strong>Sip Domain Name</strong>: sip.eu-central-jbz-teams.contoso.app</p>
</li>
<li><p><strong>Monitoring Domain Name</strong>: grafana.eu-central-jbz-teams.contoso.app</p>
</li>
</ul>
<p>On the <em>Service Provider</em> tab of the <em>Settings</em> configuration page we’ve ticked the box to enable Microsoft Teams Direct Routing and entered the following SBC domain name:</p>
<ul>
<li>teams.contoso.com</li>
</ul>
<p>After ticking the box and saving, we now see the <em>MS Teams Tenants</em> menu item on the lefthand side of the jambonz portal, under the <em>BYO Services</em> section.</p>
<p>On the <em>MS Teams Tenants</em> configuration page we’ve added a Microsoft Teams Tenant with the following settings:</p>
<ul>
<li><p><strong>Domain Name</strong>: eu-central-sbc001a.teams.contoso.com</p>
</li>
<li><p><strong>Account</strong>: default</p>
</li>
<li><p><strong>Application</strong>: hello world</p>
</li>
</ul>
<blockquote>
<p>These settings will work for our demo because we only have one tenant. However, if we wanted to scale to multiple tenants we would need to start defining accounts and applications - one jambonz account for every tenant (customer).  Each jambonz customer can have its own MS Teams tenant entry.</p>
</blockquote>
<h2 id="heading-the-carrier-microsoft-tenant-setup">The Carrier Microsoft Tenant Setup</h2>
<p>Now we can finally head off to the Microsoft 365 Admin Center of the “carrier tenant”. In almost all cases this is you, the owner of the jambonz environment, where you will configure Microsoft Teams Direct Routing. As owner of the Microsoft 365 tenant we need to add the “base domain” to our list of domain names.</p>
<blockquote>
<p>We do not need to set up any services like Microsoft Exchange or Microsoft Teams. In this screenshot you will see we also have contoso.com added to the list of domains; this is not required.</p>
<p>Note: The “base domain” is not the same thing as your Microsoft 365 tenants' default domain.</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723470713830/71b7526b-55ac-4881-a4ee-af9b3388293a.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-the-licensed-service-account">The Licensed Service Account</h3>
<p>Unfortunately, simply adding the domain to our carrier tenant won’t actually do anything useful. The domain needs to be “activated”. This basically comes down to creating a service account and assigning a license to that service account. Because we like things to be free, we’ve opted to use a <strong>Microsoft Teams Phone Resource Account</strong> license, which we acquired through the Microsoft 365 Admin Center.</p>
<p>The easiest way to correctly create the service account is directly from the Microsoft Teams Admin Center. It is important to note that the service account must be created in the “base domain”. In our case the UPN of the service account looks like this:</p>
<ul>
<li>sa-sbc@teams.contoso.com</li>
</ul>
<p>After successful creation of the service account we’ve assigned the license to the service account using the Microsoft 365 Admin Center.</p>
<blockquote>
<p>Keep in mind that acquiring and assigning licenses, and seeing their effects and outcomes in all of the Microsoft admin centers, may take considerable time. In rare cases this can take up to 24 hours.</p>
</blockquote>
<h3 id="heading-the-customer-microsoft-tenant-setup">The Customer Microsoft Tenant Setup</h3>
<p>At this point we can “instruct” the client to set up their Microsoft 365 tenant. To do this they must add the FQDN of our jambonz server, eu-central-sbc001a.teams.contoso.com, to their domain list. When they do so, we, the carrier, must add the required TXT-record to our DNS server to allow this domain to be added to the customer’s Microsoft 365 tenant.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723470745735/6ff84351-8c30-4459-92aa-aceebaf30a86.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723470776371/c3299f59-c68f-438a-920b-2a497650fdaf.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>Once again, we do not need to set up any services like Microsoft Exchange or Microsoft Teams to make Microsoft Teams Direct Routing work. We can safely remove the TXT-record from our DNS server after the FQDN of the jambonz server has been added to the customer’s domain list.</p>
</blockquote>
<h3 id="heading-the-licensed-service-account-again">The Licensed Service Account (Again)</h3>
<p>The domain needs to be “activated” in the Microsoft 365 customer tenant as well. This is exactly the same as for the Microsoft 365 carrier tenant. It’s up to the client how they want to do that, but we recommend doing what we did with the Microsoft 365 carrier tenant. In our case the UPN of the service account looks like this:</p>
<ul>
<li>sa-sbc@eu-central-sbc001a.teams.contoso.com</li>
</ul>
<p>After successful creation of the service account we’ve assigned the license to the service account using the Microsoft 365 Admin Center.</p>
<blockquote>
<p>Keep in mind that acquiring and assigning licenses, and seeing their effects and outcomes in all of the admin centers, may take considerable time. In rare cases this can take up to 24 hours.</p>
</blockquote>
<h3 id="heading-configure-microsoft-teams">Configure Microsoft Teams</h3>
<p>Before we can make calls using the Microsoft Teams client we need to license a regular user account with an eligible Microsoft Teams license. We’ve opted to go with a <strong>Microsoft Teams Phone Standard</strong> license; for this demo we’re using a trial license. We’ve assigned the license to the regular user account we’re going to use for this demo from the Microsoft 365 admin center.</p>
<p>To speed things up we’ve run a few PowerShell commands to enable Microsoft Teams Enterprise Voice for the user account and assigned a phone number that we, the carrier, own to the user account.</p>
<blockquote>
<p>Before we continue we must first create a PSTN Usage Record. You can do this from the Direct Routing section of the Microsoft Teams admin center. For this demo we’ve named it “Jambonz”.</p>
</blockquote>
<pre><code class="lang-text">[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

#Install and Import Module
Install-Module MicrosoftTeams
Import-Module MicrosoftTeams

#Connect to Microsoft Teams
Connect-MicrosoftTeams

#Set Microsoft Teams Enterprise Voice with Voice Routes
New-CsOnlineVoiceRoute -Name "Jambonz Voice Route" -OnlinePstnGatewayList @{add="eu-central-sbc001a.teams.contoso.com"} -Priority 1 -OnlinePstnUsages "Jambonz
New-CSOnlineVoiceRoutingPolicy "Jambonz Routing Policy" -OnlinePstnUsages "Jambonz"
Set-CsPhoneNumberAssignment -Identity "peter.mijnster@onetribe.nl" -EnterpriseVoiceEnabled $true
Set-CsPhoneNumberAssignment -Identity "peter.mijnster@onetribe.nl" -PhoneNumber "+31101234567" -PhoneNumberType DirectRouting
Grant-CsVoiceRoutingPolicy -Identity "peter.mijnster@onetribe.nl" -PolicyName "Jambonz Routing Policy"
</code></pre>
<p>This installs the appropriate Microsoft PowerShell modules, imports them, and connects to Microsoft Teams. The next section creates a Microsoft Teams Voice Route called “Jambonz Voice Route” with the aforementioned FQDN and a very relaxed number pattern, and adds the “Jambonz” PSTN Usage Record to it.</p>
<p>The final section enables Microsoft Teams Enterprise Voice for the user Peter Mijnster (hi there!) together with a phone number.</p>
<h2 id="heading-were-done">We're Done!</h2>
<p>So, we’re pretty much done. The only thing left is testing that voice calls actually get handled by jambonz. We can use “pm2 logs” and “/var/log/drachtio/drachtio.log” on the jambonz server to verify connectivity and also to troubleshoot if things don’t work as expected.</p>
<h2 id="heading-whats-left">What’s Left?</h2>
<p>To make a call to an actual phone number, and not just the hello world application, we need to develop some custom applications. We’ve got two example Node-RED flows (exported as JSON) that show what might work for you.</p>
<ul>
<li><p><a target="_blank" href="https://gist.github.com/davehorton/94b64b8bafcfdf8779d256b292e644bf">inbound</a></p>
</li>
<li><p><a target="_blank" href="https://gist.github.com/davehorton/b0c48873cd2fb13a55d8d6bfcbf38d1d">outbound</a></p>
</li>
</ul>
<p>If you would like more information on jambonz please email us at support@jambonz.org or <a target="_blank" href="https://joinslack.jambonz.org">join our Slack channel</a>.</p>
<p>And finally, please visit our valued customers if you are looking for solutions:</p>
<ul>
<li><a target="_blank" href="https://www.soundofdata.com/">soundofdata</a> - Your partner in customer service connectivity. We transform the accessibility of your customer service. Anywhere in the world, through any channel. Automated where possible, personalized where necessary.</li>
<li><a target="_blank" href="https://www.callable.io/">Callable.io</a> - Disuptive wholesale CPaaS: Callable's mission is to give resellers the tools to supercharge their communications portfolio enabling increased sales and customer retention.</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Using jambonz for telephony integration with Retell AI]]></title><description><![CDATA[Note: If you prefer learning visually, watch the youtube video.

Lately, we've noticed a few Retell AI users have been finding their way to the jambonz open source voice gateway project in order to reduce their telephony costs by using our Bring your...]]></description><link>https://blog.jambonz.org/using-jambonz-for-telephony-integration-with-retell-ai</link><guid isPermaLink="true">https://blog.jambonz.org/using-jambonz-for-telephony-integration-with-retell-ai</guid><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Sun, 21 Jul 2024 20:04:33 GMT</pubDate><content:encoded><![CDATA[<blockquote>
<p>Note: If you prefer learning visually, watch the <a target="_blank" href="https://youtu.be/QQxt6eVUDHU">youtube video</a>.</p>
</blockquote>
<p>Lately, we've noticed a few <a target="_blank" href="https://www.retellai.com/">Retell AI</a> users have been finding their way to the <a target="_blank" href="https://jambonz.org">jambonz open source voice gateway</a> project in order to reduce their telephony costs by using our Bring your own carrier (BYOC) feature.</p>
<p>Specifically, these folks were searching for a way to free themselves from Twilio "lock-in". Twilio's price model, which features <a target="_blank" href="https://help.twilio.com/articles/223132307">costly per-minute rounding of calls</a> <strong>plus</strong> added Voice API costs <strong>plus</strong> added Streaming costs results in a high bundled cost that makes the Voice-AI business proposition simply untenable for much of the served market. Luckily, there is a solution - jambonz to the rescue!</p>
<p>So, to all you Retell AI users, welcome aboard! In this blog post we get you up and running on jambonz for free, and eliminate those nasty costs. Try us out! With jambonz, you can:</p>
<ul>
<li><p>connect any sip trunking provider (or PBX, or SBC) to jambonz, bypassing twilio and those associated added costs</p>
</li>
<li><p>use jambonz's bi-directional streaming feature to integrate directly with Retell AI</p>
</li>
<li><p>use phone numbers that you have from your carriers and provision them directly into jambonz</p>
</li>
<li><p>enjoy features like barge-in and call transfer</p>
</li>
<li><p>run it all on <a target="_blank" href="https://jambonz.cloud/">our cloud</a> or <a target="_blank" href="https://blog.jambonz.org/installing-jambonz-using-aws-marketplace">self-host</a> your own jambonz instance on AWS for enhanced privacy and control.</p>
</li>
</ul>
<p>Let's get started!</p>
<h2 id="heading-sign-up-for-free-jambonz-account">Sign up for free jambonz account</h2>
<p>The easiest way to get started is to create a free jambonz trial account on <a target="_blank" href="https://jambonz.cloud/register">jambonz.cloud</a>. Try it for free for 3 weeks and if you like it you can either continue on a paid plan, or use the AWS marketplace offering to deploy your own jambonz instance in your AWS account. (Contact us for details about deploying a clustered solution if you need to support large call volumes).</p>
<h2 id="heading-configure-your-carrier-and-phone-number">Configure your carrier and phone number</h2>
<p>Once you've created a jambonz account, you can add one or more Carriers / SIP trunking providers in the jambonz portal. In the example below I actually use an elastic SIP trunk from Twilio, but you can use any SIP provider or PBX - we integrate with anything that can send us VoIP traffic using SIP.</p>
<p>When you first log in to your new account you won't yet have any carriers defined.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721518653741/8d5c7ca2-3dcc-440f-ad2a-3f3d294c4fd3.png" alt class="image--center mx-auto" /></p>
<p>Click on "Add carrier" to add a new carrier. This screen will appear:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721518726272/92adc12a-a009-4045-b42d-c268ba9a1b21.png" alt class="image--center mx-auto" /></p>
<p>The main thing you need to configure when adding a Carrier is the IP addresses of their SIP gateways that will be sending us calls. In addition, you will add the IP addresses or DNS names of their SIP gateways that we can send outbound calls to. You can also see on the screen above that we show you our SIP signaling address, because you will need to configure that in the carrier / SBC / PBX that will be sending us the calls.</p>
<p>You'll also note we have a dropdown of preconfigured carriers. If you are using one of these carriers, just select it from the dropdown and the IP addresses will be pre-populated for you. Otherwise, fill them in as shown below. In my example, I am actually using a Twilio elastic SIP trunk as my carrier.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721519011467/6b706640-efca-4147-aea5-0712fb525835.png" alt class="image--center mx-auto" /></p>
<p>There are quite a few advanced options you can play with when configuring a carrier, if you need them, but for now we will stick with the basic case.</p>
<p>Once you have configured your carrier, get a phone number / DID from your carrier and then add it under Phone numbers in the jambonz portal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721519356880/7bb8f4d6-9167-407b-893b-4b2cbc530f32.png" alt class="image--center mx-auto" /></p>
<p>For now, leave the application dropdown at "Choose application" and hit Save. We'll create the application that integrates with Retell AI in the next step, then come back and update the phone number to point to it.</p>
<h2 id="heading-run-a-jambonz-application-that-connects-to-retell-ai">Run a jambonz application that connects to Retell AI</h2>
<p>You control jambonz by either webhooks or websockets. In this case, we are going to run a webhook application that provides instructions to jambonz, but the app will also accept websocket connections from jambonz carrying the streaming audio (in jambonz, this is done using the <a target="_blank" href="https://www.jambonz.org/docs/webhooks/listen/">listen verb</a>).</p>
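<p>For orientation, here is a rough sketch of the kind of JSON payload such a webhook application returns to start bi-directional streaming with the listen verb (the values here are illustrative, and the actual application we install below sets additional options; see the listen verb docs linked above):</p>
<pre><code class="lang-json">[
  {
    "verb": "listen",
    "url": "wss://your_ngrok_or_other_domain_where_this_app_is_running",
    "sampleRate": 8000,
    "bidirectionalAudio": {
      "enabled": true,
      "streaming": true,
      "sampleRate": 8000
    }
  }
]
</code></pre>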
<p>For this example, I will run the application locally on my laptop and use ngrok to provide a tunnel that allows jambonz to connect.</p>
<p>First, check out and install the application.</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/jambonz/retellai-audio-socket.git
<span class="hljs-built_in">cd</span> retellai-audio-socket
npm ci
</code></pre>
<p>Now edit the ecosystem.config.js file to add information about your jambonz account and your retell agent. Before editing, the file looks like this:</p>
<pre><code class="lang-js"><span class="hljs-built_in">module</span>.exports = {
  <span class="hljs-attr">apps</span> : [{
    <span class="hljs-attr">name</span>: <span class="hljs-string">'retellai-shim'</span>,
    <span class="hljs-attr">script</span>: <span class="hljs-string">'app.js'</span>,
    <span class="hljs-attr">instance_var</span>: <span class="hljs-string">'INSTANCE_ID'</span>,
    <span class="hljs-attr">exec_mode</span>: <span class="hljs-string">'fork'</span>,
    <span class="hljs-attr">instances</span>: <span class="hljs-number">1</span>,
    <span class="hljs-attr">autorestart</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">watch</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-attr">max_memory_restart</span>: <span class="hljs-string">'1G'</span>,
    <span class="hljs-attr">env</span>: {
      <span class="hljs-attr">NODE_ENV</span>: <span class="hljs-string">'production'</span>,
      <span class="hljs-attr">LOGLEVEL</span>: <span class="hljs-string">'info'</span>,
      <span class="hljs-attr">HTTP_PORT</span>: <span class="hljs-number">3000</span>,
      <span class="hljs-attr">JAMBONZ_ACCOUNT_SID</span>: <span class="hljs-string">'your_account_sid'</span>,
      <span class="hljs-attr">JAMBONZ_API_KEY</span>: <span class="hljs-string">'your_api_key'</span>,
      <span class="hljs-attr">JAMBONZ_REST_API_BASE_URL</span>: <span class="hljs-string">'https://jambonz.cloud/api/v1'</span>, 
      <span class="hljs-attr">RETELL_API_KEY</span>: <span class="hljs-string">'your_retell_api_key'</span>,
      <span class="hljs-attr">RETELL_AGENT_ID</span>: <span class="hljs-string">'your_retell_agent_id'</span>,
      <span class="hljs-attr">WS_URL</span>: <span class="hljs-string">'wss://your_ngrok_or_other_domain_where_this_app_is_running'</span>,
    }
  }]
};
</code></pre>
<p>You can find your jambonz account SID and API key by clicking Account in the lefthand menu of the portal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721520074455/6983fc61-2723-4cf2-863c-95e96f3303bc.png" alt class="image--center mx-auto" /></p>
<p>Scroll down to find the API keys:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721520093963/68ca3f33-cc1e-4d71-bab0-77b029c45a31.png" alt class="image--center mx-auto" /></p>
<p>Update the ecosystem file with these values. The <strong>JAMBONZ_REST_API_BASE_URL</strong> variable is already correctly populated with the url for jambonz.cloud, so you can leave it as is.</p>
<p>The <strong>WS_URL</strong> environment variable should be populated with the URL for the endpoint that will serve this application. Since I am going to be using ngrok and have an ngrok domain of <a target="_blank" href="http://jambonz-apps.drachtio.org">jambonz-apps.drachtio.org</a> I set it as:</p>
<pre><code class="lang-bash">WS_URL: <span class="hljs-string">'wss://jambonz-apps.drachtio.org'</span>,
</code></pre>
<p>You, of course, will replace this with your own domain or other public URL that is reachable from jambonz.cloud.</p>
<p>The remaining environment variables are fairly self-explanatory.</p>
<h2 id="heading-your-retell-ai-agent">Your Retell AI agent</h2>
<p>I am using a basic Retell single-notification agent ("Anna from Retell Towing"). The prompt is as follows:</p>
<pre><code class="lang-text">## Identity
You are Anna, a phone agent who is responsible for informing the other person that their vehicle has been towed by Retell Towing Company.

## Background for Retell Towing Company
Location: 213 2nd Street
Business hour: Monday to Friday, 9am-5pm.
Towing fee: $600.
Vehicle parking fee: $50 per day.

## Style Guardrails
Be conversational. Use everyday language to create a cozy and friendly vibe in the conversation. 
Be empathetic. The customer might be frustrated when hearing the news, so use empathetic words when delivering the news. If the customer is frustrated or upset, be sure to speak with a soothing tone and offer comforting words.

## Steps
1. Ask if this is a good time to talk.
  - if not, call function end_call to hang up and say you will call back later.
2. Ask if the user is the owner of the white Tesla with plate number 12345 parked on Main Street.
  - if not, call function end_call to hang up and say sorry for the confusion.
3. Inform user that their Tesla has been towed, please come and collect ASAP to avoid any additional parking fee. Tell user about your business hour and location.
4. Ask if the user has any questions, and if so, answer them until there are no questions.
  - If the user asks something you do not know, let them know you don't have the answer. Ask them if they have any other questions.
  - If the user does not have any questions, call function end_call to hang up.

Finally, if the user asks to speak to an agent at any time, or becomes very angry, transfer them to an agent using the call_transfer tool. The agent phone number is 16176354500 and you should use the SIP REFER method to transfer the call.
</code></pre>
<h3 id="heading-call-transfer">Call transfer</h3>
<p>I've added a simple custom function to do call transfer, instructing Retell AI to send an HTTP POST to the application with details like the agent phone number when the user requests transfer to an agent. (For those of you familiar with SIP, we support either the SIP REFER method for transferring a call or a SIP INVITE.)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721520951502/b5f7255f-2177-4efa-a349-51a9663b918e.png" alt class="image--center mx-auto" /></p>
<p>Note again that the url is my ngrok domain where I am running the app, with the path "/transfer-requested". The JSON schema for the parameters is as follows:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"type"</span>: <span class="hljs-string">"object"</span>,
  <span class="hljs-attr">"properties"</span>: {
    <span class="hljs-attr">"agent_number"</span>: {
      <span class="hljs-attr">"type"</span>: <span class="hljs-string">"string"</span>,
      <span class="hljs-attr">"description"</span>: <span class="hljs-string">"The phone number of the agent"</span>
    },
    <span class="hljs-attr">"use_sip_refer"</span>: {
      <span class="hljs-attr">"type"</span>: <span class="hljs-string">"boolean"</span>,
      <span class="hljs-attr">"description"</span>: <span class="hljs-string">"if true use a SIP REFER to transfer the call; otherwise use a SIP INVITE"</span>
    },
    <span class="hljs-attr">"calling_number"</span>: {
      <span class="hljs-attr">"type"</span>: <span class="hljs-string">"string"</span>,
      <span class="hljs-attr">"description"</span>: <span class="hljs-string">"The calling party number to use on the call transfer; do no supply unless explicitly directed to"</span>
    }
  },
  <span class="hljs-attr">"required"</span>: [
    <span class="hljs-string">"agent_number"</span>,
    <span class="hljs-string">"use_sip_refer"</span>
  ]
}
</code></pre>
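<p>To illustrate the receiving side, here is a hypothetical sketch of the /transfer-requested endpoint (the argument names mirror the schema above, but the exact payload framing and the transfer logic are assumptions; the real implementation lives in the repo we cloned earlier):</p>
<pre><code class="lang-js">const express = require('express');
const app = express();
app.use(express.json());

// Retell AI POSTs here when the agent invokes the call_transfer custom function
app.post('/transfer-requested', function(req, res) {
  // payload framing is an assumption; Retell supplies the function arguments
  const args = req.body.args || req.body;
  console.log(`transfer requested to ${args.agent_number}, sip refer: ${args.use_sip_refer}`);
  // ...here the app would update the live call via the jambonz REST API,
  // issuing either a SIP REFER or a SIP INVITE to the agent number...
  res.sendStatus(200);
});

app.listen(3000);
</code></pre>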
<p>At this point I can run my application locally and expose a public endpoint using ngrok.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># in one terminal</span>
ngrok http --domain jambonz-apps.drachtio.org 3000
</code></pre>
<pre><code class="lang-bash"><span class="hljs-comment"># in second terminal</span>
pm2 start ecosystem.config.js
</code></pre>
<p>That's it! Now I can call my phone number and be connected to my Retell AI agent.</p>
<p>Feel free to try it out. If you want to learn more about jambonz or ask questions of the community, join our Slack channel by going to <a target="_blank" href="https://joinslack.jambonz.org">joinslack.jambonz.org</a> or email us at support@jambonz.org.</p>
<p>Enjoy!</p>
]]></content:encoded></item><item><title><![CDATA[Text-to-speech latency: the jambonz leaderboard]]></title><description><![CDATA[The emergence of AI and Large Language Models (LLMs) onto the tech landscape promises to reshape everything: how we work, how we play, and how we engage with others. Of course - let's be honest: not much of that has happened yet. Someday we'll surely...]]></description><link>https://blog.jambonz.org/text-to-speech-latency-the-jambonz-leaderboard</link><guid isPermaLink="true">https://blog.jambonz.org/text-to-speech-latency-the-jambonz-leaderboard</guid><category><![CDATA[rimelabs]]></category><category><![CDATA[deepgram]]></category><category><![CDATA[whisper]]></category><category><![CDATA[Google]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[tts]]></category><category><![CDATA[elevenlabs]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Mon, 15 Apr 2024 14:36:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1713184593452/9693e146-c295-46de-8640-b85af67ab2c7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The emergence of AI and Large Language Models (LLMs) onto the tech landscape promises to reshape everything: how we work, how we play, and how we engage with others. Of course - let's be honest: not much of that has happened yet. Someday we'll surely experience the "sonic boom" moment when the actual rate of progress catches up to the hype, but sorry folks, we're not there yet.</p>
<p>Instead, the most notable impact to date has been the refocusing of huge amounts of private and public capital into any and all product categories thought to either benefit from or drive AI technologies. Those of us laboring to make our daily bread in the CX/AI space find ourselves the lucky beneficiaries of this effect. We get to play with new speech technologies developed by startup companies newly flush with VC cash and eager to brag about how many NVIDIA GPUs they bought over the weekend. For those of us working adjacent to them and out of the VC spotlight, it's like eating at the high school table with the rich kids who suddenly and inexplicably want to share their nicely packaged lunches with us.</p>
<p>I'll be honest, the sprouting-up of new text-to-speech (TTS) vendors that we've seen over the past year or so was not something I expected because, quite frankly, I thought your dad's Google TTS and Microsoft TTS were pretty damn fine, not to mention that the investment theme is lost on me when a market already has close to commodity-level pricing. Oh well, that just goes to show what I know.</p>
<p>In our upcoming jambonz 0.9.0 release we've added support for TTS services from a bunch of these sassy newcomers that want to challenge the giants, and we thought it would be a good time to put them to the test. What age-old story are we going to see here: the new upstarts disrupting the failing dinosaurs? Or will it be the well-heeled Daddy Warbucks incumbents quashing the neophytes? Let's find out!</p>
<h1 id="heading-introducing-your-contestants">Introducing your contestants!</h1>
<p>The <a target="_blank" href="https://jambonz.org">jambonz</a> open source voice gateway for CX/AI providers has been widely adopted by many CX/AI providers, including the leading vendors in the space. Our "bring your own everything" design enables customers to connect their preferred carriers and speech vendors and so we have always made it our mission to give our customers the broadest selection of speech vendors for both text-to-speech and speech-to-text. </p>
<p>As well, we strive to give customers detailed insight into the behavior of their service through an OpenTelemetry-based observability framework that reports data such as time-to-first-byte for TTS requests.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713182862647/e6ad82c3-4356-4dc6-bd21-65cf22e298bb.png" alt class="image--center mx-auto" /></p>
<p>In the upcoming 0.9.0 release we've added several new vendors for text-to-speech, and we've also made an effort to support streaming APIs where available to reduce the latency experienced by users, so it seemed like a good time to do some benchmark testing and establish a leaderboard. In our testing we compared:</p>
<ul>
<li><p>Deepgram</p>
</li>
<li><p>Elevenlabs</p>
</li>
<li><p>Google <sup>*</sup></p>
</li>
<li><p>Microsoft</p>
</li>
<li><p>PlayHT</p>
</li>
<li><p>Rime Labs</p>
</li>
<li><p>Whisper</p>
</li>
</ul>
<blockquote>
<p><sup>*</sup>With all other vendors we measured time-to-first-byte; however with Google we were forced to measure time-to-last-byte as we have not implemented a proper streaming API integration for them (yet).</p>
</blockquote>
<h1 id="heading-the-testbed">The testbed</h1>
<p>We ran the tests using a jambonz server running in the AWS us-east-1 region on a single EC2 t2.medium instance. We ran against the hosted SaaS service for each of the vendors. We tested two different scenarios, both common to conversational AI:</p>
<ul>
<li><p>a very short user prompt (e.g., "Hello and thank you for calling. How can I assist you today?"),</p>
</li>
<li><p>and a slightly longer prompt that a caller might commonly encounter (e.g., "It seems like you're having trouble logging into your account. For security reasons, please provide the email address associated with your account. Once verified, I will send a password reset link directly to your email. Alternatively, say 'help' for more assistance.")</p>
</li>
</ul>
<p>We tested 5 variations of short and long prompts on each TTS engine, using English language:</p>
<p><em>short prompts</em></p>
<ul>
<li><p>Hello and thank you for calling. How can I assist you today?</p>
</li>
<li><p>Please hold while I transfer you to a customer service representative.</p>
</li>
<li><p>I'm sorry, I didn't catch that. Could you please repeat your request?</p>
</li>
<li><p>Your current balance is $347.92. Would you like to make a payment now?</p>
</li>
<li><p>Thank you for your patience. A representative will be with you shortly.'</p>
</li>
</ul>
<p><em>longer prompts</em></p>
<ul>
<li><p>Thank you for calling our customer support line. To better assist you, please state the reason for your call, such as 'billing', 'technical support', or 'account information'. You can also say 'more options' to hear additional services.</p>
</li>
<li><p>You have indicated that you are calling about a billing issue. If you would like to proceed with a payment, please say 'Make a payment'. If you need details about your last transaction or have a billing dispute, please say 'Billing details'</p>
</li>
<li><p>It seems like you're having trouble logging into your account. For security reasons, please provide the email address associated with your account. Once verified, I will send a password reset link directly to your email. Alternatively, say 'help' for more assistance.</p>
</li>
<li><p>Our records show that your warranty is due to expire in 30 days. To extend your warranty for another year, please say 'Extend warranty'. If you would like to know the benefits of extending your warranty, please say 'Explain benefits'.</p>
</li>
<li><p>If you are calling to update your personal information, such as address or phone number, please clearly state the new information after the beep. For any changes to sensitive data, such as your password or payment methods, please ensure you have your security pin ready.</p>
</li>
</ul>
<p>In all cases (except Google, as described above) we measured the time from sending the request to the service to receiving the first byte of audio. We give more details on the configuration of each TTS service later in this blog post.</p>
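<p>For illustration, here is a minimal sketch of that time-to-first-byte measurement (the endpoint and payload are hypothetical placeholders, not any specific vendor's API; the actual tests were driven through jambonz and its telemetry):</p>
<pre><code class="lang-js">// measure ms from sending a TTS request to receiving the first chunk of audio
async function timeToFirstByte(text) {
  const start = process.hrtime.bigint();
  // placeholder endpoint and payload - not any specific vendor's API
  const res = await fetch('https://tts.example.com/v1/synthesize', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({text})
  });
  // res.body yields chunks as they arrive (Node 18+ web streams)
  for await (const chunk of res.body) {
    return Number(process.hrtime.bigint() - start) / 1e6;
  }
}

timeToFirstByte('Hello and thank you for calling. How can I assist you today?')
  .then(function(ms) { console.log(`time to first byte: ${ms.toFixed(0)} ms`); });
</code></pre>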
<h1 id="heading-results">Results</h1>
<p>Before we review the results, there is one additional subtlety to be aware of when measuring latency. Here we are measuring time to first byte, which is an important metric. However, all providers send a small amount of silence at the beginning of the generated audio, and we found that amount differs by provider. The experience of the user will be the time to first byte <strong>plus</strong> the duration of leading silence. In our experience, the vendors fell into two categories:</p>
<ul>
<li>those providing very short duration of leading silence; this includes Deepgram (~150 ms), Elevenlabs (~100ms), Microsoft (~150ms), and RimeLabs (~200ms); and</li>
<li>those providing longer duration of leading silence: Google (~600ms), Whisper (~670ms), and PlayHT (~637ms).</li>
</ul>
<p>Keep these in mind as we review the results.</p>
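<p>(Measuring that leading silence is straightforward, by the way. A sketch like the following, assuming raw 16-bit linear PCM and treating any sample below a small amplitude threshold as silence, is roughly what's involved.)</p>
<pre><code class="lang-js">// return the duration (ms) of leading silence in a buffer of 16-bit mono PCM
function leadingSilenceMs(buf, sampleRate, threshold) {
  for (let i = 0; i + 1 &lt; buf.length; i += 2) {
    if (Math.abs(buf.readInt16LE(i)) &gt; threshold) {
      return (i / 2 / sampleRate) * 1000;  // first sample above the threshold
    }
  }
  return (buf.length / 2 / sampleRate) * 1000;  // the entire buffer was silence
}

// e.g. leadingSilenceMs(fs.readFileSync('tts-output.raw'), 8000, 256)
</code></pre>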
<p>Without further ado, here are the results from the tests using the short audio requests.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713120196606/19869269-736a-47b3-bee6-70c6d1f348fb.png" alt class="image--center mx-auto" /></p>
<p>and here are the results from testing the longer audio segments.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1713120217328/06ed49d5-17ff-4e44-a4f7-817d8e8e841d.png" alt class="image--center mx-auto" /></p>
<p>And here is the detailed data from the tests.</p>
<h4 id="heading-short-audio-tests-time-to-first-byte-ms">short audio tests - time to first byte (ms)</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Prompt</td><td>Deepgram</td><td>Elevenlabs</td><td>Google</td><td>Microsoft</td><td>PlayHT</td><td>RimeLabs</td><td>Whisper</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>324</td><td>481</td><td>183</td><td>127</td><td>121</td><td>326</td><td>405</td></tr>
<tr>
<td>2</td><td>348</td><td>451</td><td>173</td><td>345</td><td>61</td><td>248</td><td>649</td></tr>
<tr>
<td>3</td><td>355</td><td>613</td><td>185</td><td>293</td><td>75</td><td>250</td><td>601</td></tr>
<tr>
<td>4</td><td>324</td><td>645</td><td>274</td><td>427</td><td>59</td><td>201</td><td>472</td></tr>
<tr>
<td>5</td><td>355</td><td>471</td><td>192</td><td>318</td><td>50</td><td>187</td><td>470</td></tr>
<tr>
<td><strong>avg.</strong></td><td><strong>341</strong></td><td><strong>532</strong></td><td><strong>201</strong></td><td><strong>302</strong></td><td><strong>73</strong></td><td><strong>242</strong></td><td><strong>519</strong></td></tr>
</tbody>
</table>
</div><h4 id="heading-long-audio-tests-time-to-first-byte-ms">long audio tests - time to first byte (ms)</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Prompt</td><td>Deepgram</td><td>Elevenlabs</td><td>Google</td><td>Microsoft</td><td>PlayHT</td><td>RimeLabs</td><td>Whisper</td></tr>
</thead>
<tbody>
<tr>
<td>1</td><td>460</td><td>771</td><td>450</td><td>293</td><td>67</td><td>642</td><td>600</td></tr>
<tr>
<td>2</td><td>363</td><td>739</td><td>420</td><td>345</td><td>50</td><td>320</td><td>772</td></tr>
<tr>
<td>3</td><td>357</td><td>833</td><td>465</td><td>356</td><td>168</td><td>306</td><td>581</td></tr>
<tr>
<td>4</td><td>435</td><td>1404</td><td>338</td><td>364</td><td>65</td><td>322</td><td>781</td></tr>
<tr>
<td>5</td><td>472</td><td>783</td><td>367</td><td>409</td><td>111</td><td>340</td><td>830</td></tr>
<tr>
<td><strong>avg.</strong></td><td><strong>417</strong></td><td><strong>906</strong></td><td><strong>408</strong></td><td><strong>353</strong></td><td><strong>92</strong></td><td><strong>386</strong></td><td><strong>712</strong></td></tr>
</tbody>
</table>
</div><h1 id="heading-our-findings">Our findings</h1>
<p>Wow! We were not expecting this.</p>
<ul>
<li><p><strong>PlayHT</strong> (avg 73ms short audio/92ms long audio) was the winner by a mile, with blazingly fast results. Sub-100ms times to first byte (what!!??) are quite astonishing, given that we are including network round-trip time in that measurement. We truly did not expect this, and count us impressed. PlayHT's voices are regarded as high-quality, natural-sounding voices, and they offer a voice cloning feature as well. We did experience some minor audio defects in our testing: PlayHT incorrectly pronounced "$347.92" as "three hundred forty-seven dollars and ninety-two two", though all the longer prompts played perfectly. Keep in mind, however, that this very short time-to-first-byte needs to be weighed against the fact that the audio itself contains a bit more leading silence than some of the other vendors.</p>
</li>
<li><p>Our next biggest surprise was <strong>Google</strong> (201ms/408ms). We were surprised on two fronts: first, Google carried an extra burden in that we were measuring time to last byte instead of first, because we have not yet implemented streaming support for Google in jambonz; and second, we are historically used to seeing times in the neighborhood of 800ms+ for Google to synthesize audio. (Keep in mind that Google does return a fair amount of leading silence, so head-to-head in overall perceived latency they would probably fall slightly behind Microsoft, for instance.) Something must have changed recently over at Google to deliver these impressive numbers.</p>
</li>
<li><p>One of the new entrants, <strong>RimeLabs</strong> (242ms/386ms), turned in some very fast times as well. And if you factor in that they return much less leading silence than PlayHT, they provide arguably the fastest user experience. RimeLabs also has an optional feature to reduce latency even further by skipping the text normalization phase; when we enabled this feature the numbers got even better (217ms/274ms), edging out everyone except PlayHT. However, they recommend enabling this only on text "where there are no digits, abbreviations, or tricky punctuation" and we found this to be true: when we enabled it, our text containing the account balance did not play correctly. Additionally, in our testing, we noted some slight but detectable pauses during longer sentences where they did not belong. And finally, it might be personal preference, but most of the voices seem to lack emotion, as if they are doing a task they are not interested in. If I were calling into a contact center I'd feel like I was talking to a bored gen-X'er who was counting the minutes until they could go off shift.</p>
</li>
<li><p>Reinforcing that the dinosaurs are not dead, <strong>Microsoft</strong> (302ms/353ms) came in with very fast times as well, competitive with the new entrants and raising the question: why change? I guess it turns out that having buckets of money to throw at GPUs is still an advantage. Both Google and Microsoft deliver those perfectly crafted AI voices that are so good that, counter-intuitively, you immediately know that it's AI you're talking to and not a real person.</p>
</li>
<li><p><strong>Deepgram</strong>'s (341ms/417ms) new Aura offering was not left behind, turning in some very fast times of its own. There may still be a few issues to work out, as we experienced unnatural pauses once or twice during longer sentences where there was no comma, semi-colon, or other indication there should be one. Additionally, a spurious soft 'A' was inserted when synthesizing a phrase enclosed in single quotes; e.g. "say 'billing' if you have a billing question" is heard as "say ah billing if you have..".</p>
</li>
<li><p><strong>Whisper</strong> (519ms/712ms) from OpenAI impressed as well. Its times were slightly longer than the rest of the field, but the quality was outstanding: the voices sounded great, the speech cadence was perfect, and the pronunciations were spot on.</p>
</li>
<li><p><strong>Elevenlabs</strong> (532ms/906ms) has become well-known for its natural sounding voices and is rapidly becoming the choice of many for that reason. Its times in our test were slightly slower than the rest of the field, but still quite fast overall. We experienced no defects in the generated audio. </p>
</li>
</ul>
<h1 id="heading-summary">Summary</h1>
<p>Our main takeaway is how fast <strong>all</strong> of these vendors are. A year ago, we would have been happy with sub-second results - now we are hungering for, and in some cases getting, ttfb times of less than 100 milliseconds. All of these vendors provide great products that are worth evaluating for those planning their CX/AI rollout. We're looking forward to the vendors polishing things like speech cadence and the minor imperfections that we encountered.</p>
<p>We should note that we are also happy to work directly with any of these vendors to collaborate on testing or on fine-tuning our integrations if necessary to improve performance and overall user experience.  We will update our leaderboard from time to time, and we are always adding support for new vendors so reach out to us if you provide a TTS service and would like to be included in future reports.</p>
<p>Also, feel free to create a free account on the <a target="_blank" href="https://jambonz.cloud">jambonz cloud</a> to try out jambonz!</p>
<h3 id="heading-appendix-notes-on-our-configuration">Appendix: Notes on our configuration</h3>
<p>A few notes on how we configured each speech service.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>vendor</td><td>model</td><td>voice</td></tr>
</thead>
<tbody>
<tr>
<td>Deepgram</td><td>Aura</td><td>Asteria</td></tr>
<tr>
<td>Elevenlabs</td><td>turbo-v2</td><td>Serena</td></tr>
<tr>
<td>Google</td><td></td><td>Wavenet-C</td></tr>
<tr>
<td>Microsoft</td><td></td><td>Ava (multi-lingual)</td></tr>
<tr>
<td>PlayHT</td><td>PlayHT2.0-Turbo</td><td>Jennifer</td></tr>
<tr>
<td>RimeLabs</td><td>Mist</td><td>Abby</td></tr>
<tr>
<td>Whisper</td><td>tts-1</td><td>Alloy</td></tr>
</tbody>
</table>
</div>]]></content:encoded></item><item><title><![CDATA[Improving the voicebot experience]]></title><description><![CDATA[Yes, we can talk to AI. Connecting a speech driven interface to AI is easy. But crafting a conversational experience that approaches the ease and pleasure of a conversation with another human is not.
One of the challenges is that speech recognition s...]]></description><link>https://blog.jambonz.org/improving-the-voicebot-experience</link><guid isPermaLink="true">https://blog.jambonz.org/improving-the-voicebot-experience</guid><category><![CDATA[Cx]]></category><category><![CDATA[AI]]></category><category><![CDATA[conversational-ai]]></category><category><![CDATA[speech to text]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Thu, 08 Feb 2024 17:29:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1707346499937/bb216986-859b-4cf0-b8cd-8913d3198d6e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Yes, we can talk to AI. Connecting a speech driven interface to AI is easy. But crafting a conversational experience that approaches the ease and pleasure of a conversation with another human is not.</p>
<p>One of the challenges is that speech recognition systems are not that good at detecting the turn of a conversation. As humans, we're super good at this - we continually process all sorts of cues during a conversation to determine when our partner has finished speaking and we can jump in. For instance:</p>
<ul>
<li><p>s/he asked us a question so we know it's time for us to respond (and think about how easy it is for us to classify a question as rhetorical, in which case we understand it is <em>not</em> for us to respond to),</p>
</li>
<li><p>s/he starts by saying "it's a long story.." and we settle in to wait longer for our turn to speak, or</p>
</li>
<li><p>s/he pauses during her speech, but we know it is a sort of "thinking" or a "bridge" pause and we don't break in ("Yeeeah..(pause)...well, it's not that simple, my friend..")</p>
</li>
</ul>
<p>However, in a conversation with AI, speech recognition services will return a transcript any time the speaker pauses, regardless of whether this is a complete response. Quite often, conversations get derailed because the AI tried to process a partial statement from the speaker and returned a nonsensical response. If, on the other hand, we simply wait an extra-long time to be sure the user has finished speaking, we get a stilted conversation with lots of uncomfortable silence that is even worse.</p>
<h3 id="heading-using-ai-to-help-predict-the-turns-of-the-conversation">Using AI to help predict the turns of the conversation</h3>
<p>What if we used AI to predict the type of response a caller may make to a given statement or question from a voicebot? And what if we then used that prediction to tune the speech recognizer specifically for this turn of the conversation?</p>
<p>This is essentially a text classification operation, which is something that LLMs are really good at. Let's build a simple example using OpenAI's gpt-3.5-turbo model to test with.</p>
<h4 id="heading-streamlit-app">Streamlit app</h4>
<p>Below is a streamlit app that we can use to test out our idea. In this app we are using a few-shot prompt to configure the model to assess a statement from a voicebot and predict the type of response a caller might make. You'll need an OpenAI key to test with, and you can actually modify the prompt and examples to see how it impacts your results.</p>
<p>We ask the model to predict and classify the next response from the caller as one of four types:</p>
<ul>
<li><p>a single utterance</p>
</li>
<li><p>a single sentence</p>
</li>
<li><p>multiple sentences</p>
</li>
<li><p>identification data</p>
</li>
</ul>
<p>The categories are mostly self-explanatory, but the category of identification data needs some explanation. The purpose of this category is to predict when a caller is going to need to do something like give a credit card or customer number, spell their name or email address, etc. In these cases we need to make sure to give some extra time since people will often speak slowly, or pause while they refer to notes they are reading from.</p>
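<p>Under the hood this classification is a single chat completion request. Here is a minimal sketch using curl; the system prompt and few-shot examples are abbreviated versions of what the app below uses, so treat the exact wording as illustrative:</p>
<pre><code class="lang-bash">curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "temperature": 0,
    "messages": [
      {"role": "system", "content": "Classify the type of response a caller is likely to give to the following voicebot prompt. Answer with exactly one of: single utterance, single sentence, multiple sentences, identification data."},
      {"role": "user", "content": "Would you like to speak to an agent?"},
      {"role": "assistant", "content": "single utterance"},
      {"role": "user", "content": "Can you spell your last name for me, please?"},
      {"role": "assistant", "content": "identification data"},
      {"role": "user", "content": "Please tell us how we can help you today."}
    ]
  }'
</code></pre>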
<p>Read the instructions on the first page of the app below, review the prompts and examples and then click on the "Try it out" page and enter a statement or question and see what type of response the model predicts.  You can even change the prompt or the few-shot examples to see if you can improve the results!</p>
<iframe src="https://jambonz-prompt-classsification.streamlit.app//?embed=true" height="800" width="100%"></iframe>

<p>If you are unable to run the application yourself for some reason, here is a screenshot showing a sample query and response:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707354183774/a050664c-6ac3-488b-be1d-082a340ccb9d.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-implementing-in-jambonz">Implementing in jambonz</h4>
<p>For our experiment we'll make a simple change to our jambonz app to consult OpenAI at each turn of the conversation to get a classification of the next expected response. We'll then turn that into a specific configuration command sent to the speech recognizer.</p>
<p>We're using <a target="_blank" href="https://deepgram.com">Deepgram</a> and specifically we'll be tuning their endpointing and utteranceEndMs properties <a target="_blank" href="https://developers.deepgram.com/docs/understanding-end-of-speech-detection#using-utteranceend">as described here</a>.</p>
<ul>
<li><p>If the classification is <code>single utterance</code>, we'll leave the settings at their default, because the default settings have extremely low latency (i.e. transcripts returned very quickly after extremely short pauses)</p>
</li>
<li><p>If the classification is <code>single sentence</code>, we'll set the properties to <code>{endpointing: 500}</code> to set the endpointing to 500 ms,</p>
</li>
<li><p>If the classification is <code>multiple sentences</code>, we'll set the properties to <code>{endpointing: 500, utteranceEndMs: 2000}</code>, which sets the endpointing as above and additionally requires 2 seconds of non-speech before returning a final transcript</p>
</li>
<li><p>If the classification is <code>identification data</code>, we'll also set the properties to the same values as above, since we also need to allow the caller plenty of time to finish entering their customer number, spell their name, or what have you.</p>
</li>
</ul>
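<p>For illustration, here is roughly what the gather verb might look like for the <code>multiple sentences</code> case. Treat this as a sketch only: the actionHook path is made up, and you should check the recognizer schema in the jambonz docs for your version:</p>
<pre><code class="lang-bash">{
  "verb": "gather",
  "input": ["speech"],
  "actionHook": "/handle-response",
  "recognizer": {
    "vendor": "deepgram",
    "deepgramOptions": {
      "endpointing": 500,
      "utteranceEndMs": 2000
    }
  }
}
</code></pre>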
<h4 id="heading-results">Results</h4>
<p>The resulting conversation after these changes is much more natural and, as a result, much more effective. Check out the video below to see the difference before and after the AI tuning approach is implemented.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/DqvmUhE7NaE?si=gmgZuutuED4aCLEm"></iframe>

<h2 id="heading-conclusion">Conclusion</h2>
<p>The advances in AI over the past year have been breathtaking. However, the quality of speech interactions between humans and AI still lags. Until problems like detecting turns of conversation are solved, the promise of AI in our everyday lives will be stunted. This is a hard problem that will take an array of solutions to solve comprehensively, and in this post we have only experimented with a relatively simple approach of using text-based classification to improve the prediction of dialog turns. We look forward to doing further work on this topic in the future.</p>
]]></content:encoded></item><item><title><![CDATA[Priority queues in jambonz]]></title><description><![CDATA[Overview
Being able to prioritize incoming calls based on its “importance” is an essential feature in many business scenarios. For example, the most common scenario in the financial sector is to skip the whole queue in case a client is calling to blo...]]></description><link>https://blog.jambonz.org/priority-queues-in-jambonz</link><guid isPermaLink="true">https://blog.jambonz.org/priority-queues-in-jambonz</guid><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Tue, 04 Jul 2023 20:58:41 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-overview">Overview</h1>
<p>Being able to prioritize incoming calls based on their importance is an essential feature in many business scenarios. For example, a common scenario in the financial sector is letting a client skip the entire queue when they are calling to block a lost or stolen credit card.</p>
<p>Until now there was no way to do this in jambonz. Starting with version 0.8.4 you can prioritize incoming calls by adding a priority attribute to the queued call. Calls with the highest priority are connected first, even when other calls are already waiting in the queue. This release also adds the option to pull any call out of the queue: all you need to know is the callSid of the queued call. Isn't this awesome?</p>
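<p>As a sketch, queuing a call with a priority might look like the following in your application's verb document. The queue name is a placeholder, and you should consult the enqueue verb documentation for the exact semantics and allowed range of the priority value:</p>
<pre><code class="lang-bash">{
  "verb": "enqueue",
  "name": "support",
  "priority": 50
}
</code></pre>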
<h1 id="heading-node-red-example">Node-RED example</h1>
<p>Let’s cover a simple example using Node-RED. Node-RED nodes for jambonz are available in the <a target="_blank" href="https://flows.nodered.org/node/@jambonz/node-red-contrib-jambonz">@jambonz/node-red-contrib-jambonz</a> package. Just follow <a target="_blank" href="https://nodered.org/docs/user-guide/runtime/adding-nodes">this guide</a> to install them.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688503550340/6227c2e5-a92c-4b8f-9cc8-994310aa814e.png" alt class="image--center mx-auto" /></p>
<p>The basics of creating a jambonz application are described in <a target="_blank" href="https://blog.jambonz.org/building-your-first-jambonz-app-using-nodejs">this blog post</a>. In this example we are focused on how to build the Node-RED flow.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688503725919/92c8196b-6ee7-4852-a591-c7d861f78503.png" alt class="image--center mx-auto" /></p>
<p>In this flow we greet the customer when they join the queue and fetch a priority from an external application based on the called number. If the number is allocated to a VIP, the call is placed into the queue with the highest priority. If the external server is not reachable, the catch node continues the flow and places the call in the queue with the default priority.</p>
<p>While the call is queued, we announce that it is still in the queue and that we are waiting for an available agent to handle it. If the call waits for a free agent longer than the allowed time, it is disconnected with a 'goodbye' announcement.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>The ability to assign priorities to calls in a queue is a useful feature for many use cases.  It is now supported in jambonz, but is completely optional.  As before, calls can be queued with no priority if desired.</p>
<p>Thanks to @AVoylenko for driving this feature!</p>
]]></content:encoded></item><item><title><![CDATA[How to monitor jambonz on AWS using Voipmonitor and Traffic Mirroring]]></title><description><![CDATA[jambonz comes with a great set of observability tools out of the box:

a grafana dashboard displaying key performance metrics,

opentelemetry application traces, which are visible in the jambonz portal, and

the ability to download sip traces in the ...]]></description><link>https://blog.jambonz.org/how-to-monitor-jambonz-on-aws-using-voipmonitor-and-traffic-mirroring</link><guid isPermaLink="true">https://blog.jambonz.org/how-to-monitor-jambonz-on-aws-using-voipmonitor-and-traffic-mirroring</guid><category><![CDATA[#cpass #voicegateway #opensource #jambonz]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Thu, 18 May 2023 14:38:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1684420664944/125d4af0-6e58-4599-9491-9ab457e18dbd.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>jambonz comes with a great set of observability tools out of the box:</p>
<ul>
<li><p>a grafana dashboard displaying key performance metrics,</p>
</li>
<li><p>opentelemetry application traces, which are visible in the jambonz portal, and</p>
</li>
<li><p>the ability to download sip traces in the form of pcap files from the portal for recent calls.</p>
</li>
</ul>
<p>The gold standard in VoIP system monitoring, however, has always been <a target="_blank" href="https://www.voipmonitor.org/">Voipmonitor</a>. Voipmonitor provides a wealth of charts and detailed analytic tools for SIP and RTP that are powerful and yet accessible to tier-one support engineers. If you want to level up your jambonz support organization, implementing Voipmonitor is a great choice.</p>
<p>In this article we will show you how to deploy Voipmonitor on AWS using the Traffic Mirroring feature to mirror traffic from your jambonz SIP and RTP servers to a voipmonitor server.</p>
<blockquote>
<p>Note: This works both for EC2 as well as Kubernetes installs.</p>
</blockquote>
<p>Also, as a bonus, we will show you how to write the mirrored traffic to pcap files and upload them to S3 storage, separately from Voipmonitor. Having rolling raw pcap files like this can be useful for troubleshooting low level transport connection issues such as TLS connectivity with carriers.</p>
<p>Let's get started!</p>
<h2 id="heading-what-you-need">What you need</h2>
<h3 id="heading-nitro-based-ec2-instances">Nitro-based EC2 instances</h3>
<p>We'll be mirroring the traffic to the voipmonitor server using <a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/mirroring/traffic-mirroring-getting-started.html">AWS Traffic Mirroring</a>, and that feature is only available to <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html#ec2-nitro-instances">nitro-based instances</a>, so make sure that your jambonz servers are running on a nitro-based instance as well as the new server that you'll be spinning up to run voipmonitor.</p>
<h3 id="heading-voipmonitor">Voipmonitor</h3>
<p>And of course you'll need to install Voipmonitor as well (we'll show you how). Voipmonitor is a commercial product, but don't be dissuaded from trying it -- they provide a 30-day free trial and the <a target="_blank" href="https://www.voipmonitor.org/whmcs/cart.php?gid=1">pricing is very reasonable</a> when you are ready to upgrade to a paid license.</p>
<h2 id="heading-installing-voipmonitor">Installing Voipmonitor</h2>
<p>The first step is to spin up a new instance in the AWS VPC where your jambonz system is running (Traffic Mirroring works within a VPC, so Voipmonitor must be in the same VPC unless you choose to do more complicated VPC peering). Again, this must be a nitro-based instance; in my deployment I chose to use a t3.small. You'll want a lot of disk space, at least 50G (for my deployment I used a 100G disk).</p>
<p>So go ahead and spin up a nitro-based instance on Debian 11, configured as above, in your jambonz VPC. Create a new security group for the instance that allows the following traffic in:</p>
<ul>
<li><p>ssh (22/tcp) from anywhere,</p>
</li>
<li><p>http (80/tcp) and https (443/tcp) from anywhere, and</p>
</li>
<li><p>VXLAN (4789/udp) from the VPC</p>
</li>
</ul>
<p>Once the instance is up and running, follow <a target="_blank" href="https://www.voipmonitor.org/doc/Debian_11">these instructions</a> to install the voipmonitor GUI and sniffer. The sniffer will read the mirrored traffic, which arrives on the network interface on 4789/udp. The default voipmonitor config (in /etc/voipmonitor.conf) listens for VXLAN traffic on this port, so no changes are needed to the config file.</p>
<p>After installing the GUI, connect your browser to the public IP of the voipmonitor server using http (not https). You will be guided through the final stages of the install and redirected to the voipmonitor corporate site to generate a free license for the demo period of 30 days.</p>
<p>As a final step, if you want to enable HTTPS for the voipmonitor GUI, <a target="_blank" href="https://www.voipmonitor.org/doc/Enable_SSL/TLS_%2B_self_signed_certificate_for_http_server">follow these instructions</a>. As an example, to use <a target="_blank" href="https://letsencrypt.org">letsencrypt</a> to generate your certificate you would first create a DNS A record for the server in your DNS provider, and then simply do this:</p>
<pre><code class="lang-bash">apt-get update
apt install snapd
snap install core
snap install --classic certbot
ln -s /snap/bin/certbot /usr/bin/certbot
certbot --apache
systemctl restart apache2
</code></pre>
<p>Once voipmonitor is up and running, we now need to mirror the traffic from the jambonz SIP and RTP servers to voipmonitor.</p>
<h2 id="heading-configuring-aws-traffic-mirroring">Configuring AWS Traffic Mirroring</h2>
<p>There are three steps to configuring traffic mirroring:</p>
<ol>
<li><p>Create a mirror target</p>
</li>
<li><p>Create two mirror filters: one for SIP and one for RTP</p>
</li>
<li><p>Create mirror sessions for each jambonz SIP and RTP server. A mirror session will direct traffic from one elastic network interface (ENI) to the mirror target, using the mirror filter to determine which traffic to mirror.</p>
</li>
</ol>
<h3 id="heading-create-a-mirror-target">Create a mirror target</h3>
<p>First, retrieve and copy the Interface ID for the network interface that is attached to voipmonitor instance that you created. You can find it in the network panel of the instance details as shown below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1684377647168/0be45cc6-26ff-4ddb-86e3-42c9a3856a39.png" alt class="image--center mx-auto" /></p>
<p>Then go to Traffic mirroring / Mirror targets / Create mirror target. Leave Target type set to "Network Interface" and paste the Interface ID that you copied above. Add a Name tag and click Create. This configures the voipmonitor instance to receive the mirrored traffic.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1684377827038/f45a1c0d-0ba1-481f-a629-e59a9a6f553c.png" alt class="image--center mx-auto" /></p>
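<p>If you prefer to script this, the same target can be created with the AWS CLI (the ENI id below is a placeholder for the one you copied):</p>
<pre><code class="lang-bash">aws ec2 create-traffic-mirror-target \
  --network-interface-id eni-0123456789abcdef0 \
  --description "voipmonitor mirror target"
</code></pre>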
<h3 id="heading-create-mirror-filters">Create mirror filters</h3>
<p>Go to Traffic mirroring / Mirror filters / Create mirror filter. First, let's create a filter for sip traffic.</p>
<p>Create inbound rules that accept the following:</p>
<ul>
<li><p>port 5060/udp from anywhere (this is for sip over udp),</p>
</li>
<li><p>5060/tcp, 5061/tcp, 8443/tcp from anywhere (sip over tcp, tls, and websockets), and</p>
</li>
<li><p>icmp (can be useful to troubleshoot destination unreachable issues).</p>
</li>
</ul>
<p>Create outbound rules that accept the following:</p>
<ul>
<li><p>5060/udp sent from the VPC to anywhere</p>
</li>
<li><p>5060/tcp, 5061/tcp, 8443/tcp from the VPC sent anywhere, and</p>
</li>
<li><p>icmp</p>
</li>
</ul>
<p>Save the mirror filter. As an example, my sip inbound rules look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1684421674864/d8a2fe59-907c-41c6-9314-ac261084197e.png" alt class="image--center mx-auto" /></p>
<p>and my sip outbound rules look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1684421696319/df6ec5ee-6970-41f9-a60d-992708033073.png" alt class="image--center mx-auto" /></p>
<p>Now create a second filter for rtp (a CLI equivalent is sketched after these lists). Inbound rules this time will be:</p>
<ul>
<li><p>dst port 40000-60000/udp</p>
</li>
<li><p>icmp</p>
</li>
</ul>
<p>and outbound rules will be:</p>
<ul>
<li><p>src port: 40000-60000/udp</p>
</li>
<li><p>icmp</p>
</li>
</ul>
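<p>As mentioned, the rtp filter can also be created from the AWS CLI. A sketch is below; the filter id is a placeholder for the id returned by the first command, and the outbound rule would repeat the second command with <code>--traffic-direction egress</code> and a source port range instead:</p>
<pre><code class="lang-bash"># create the filter itself
aws ec2 create-traffic-mirror-filter --description "jambonz rtp"
# add the inbound rule: udp (protocol 17), destination ports 40000-60000
aws ec2 create-traffic-mirror-filter-rule \
  --traffic-mirror-filter-id tmf-0123456789abcdef0 \
  --traffic-direction ingress --rule-number 100 --rule-action accept \
  --protocol 17 \
  --destination-port-range FromPort=40000,ToPort=60000 \
  --source-cidr-block 0.0.0.0/0 --destination-cidr-block 0.0.0.0/0
</code></pre>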
<h3 id="heading-create-mirror-sessions">Create mirror sessions</h3>
<p>Finally, create a mirror session for every jambonz SIP server and RTP server. If using Kubernetes, you will create a mirror session for every node in the SIP and RTP node pools. The source node in each case will be the jambonz server and the destination will be the mirror target that you created earlier.</p>
<p>As a first step, for each source node gather the interface ID of the ENI for each EC2 instance. Then create a mirror session for each node; for sip nodes use the sip mirror filter and for rtp nodes use the rtp mirror filter. In all cases connect to the single mirror target that you've created.</p>
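<p>This too can be scripted; you would run something like the following once per source ENI (all ids below are placeholders):</p>
<pre><code class="lang-bash">aws ec2 create-traffic-mirror-session \
  --network-interface-id eni-0aaaabbbbccccdddd \
  --traffic-mirror-target-id tmt-0123456789abcdef0 \
  --traffic-mirror-filter-id tmf-0123456789abcdef0 \
  --session-number 1
</code></pre>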
<p>Once you've done this, mirrored traffic should be flowing to voipmonitor and you should see calls in the voipmonitor GUI.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1684379214380/043c506a-2a92-4ef6-b7f5-cbfc5e1cd869.png" alt class="image--center mx-auto" /></p>
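<p>If calls do not appear, a quick sanity check is to confirm that mirrored VXLAN traffic is actually reaching the voipmonitor server (the interface name may differ on your instance):</p>
<pre><code class="lang-bash">sudo tcpdump -ni ens5 'udp port 4789' -c 10
</code></pre>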
<h2 id="heading-bonus-section-upload-pcap-files-to-aws-s3">Bonus section: upload pcap files to AWS S3</h2>
<p>In addition to voipmonitor you may also want to upload pcaps of network traffic to AWS S3.</p>
<blockquote>
<p>Note: this section does not require the voipmonitor install, though it does require the traffic mirroring to be set up as above.</p>
</blockquote>
<h3 id="heading-create-an-s3-bucket">Create an S3 bucket</h3>
<p>Create a bucket in S3 to hold the pcap files.</p>
<h3 id="heading-install-the-aws-cli-on-the-mirror-target-server">Install the aws cli on the mirror target server</h3>
<pre><code class="lang-bash">sudo apt-get update
sudo apt-get upgrade
sudo apt-get install unzip
curl <span class="hljs-string">"https://d1vvhvl2y92vvt.cloudfront.net/awscli-exe-linux-x86_64.zip"</span> -o <span class="hljs-string">"awscliv2.zip"</span>
unzip awscliv2.zip
sudo ./aws/install
</code></pre>
<p>Configure the aws cli</p>
<pre><code class="lang-bash">aws configure
</code></pre>
<h3 id="heading-create-a-systemd-file-for-tcpdump">Create a systemd file for tcpdump</h3>
<p>We'll be using tcpdump to continually write the incoming mirrored traffic to pcap files on the server. To do that, create a file named /etc/systemd/system/tcpdump.service:</p>
<pre><code class="lang-bash">[Unit]
Description=Tcpdump

[Service]
ExecStart=/usr/bin/tcpdump -ni ens5 <span class="hljs-string">'port 4789'</span> -W 5 -C 300 -w /tmp/rolling.pcap
Restart=on-failure
User=root
LimitNOFILE=4096

[Install]
WantedBy=multi-user.target
</code></pre>
<p>This writes all the encapsulated traffic to 5 rotating pcap files, closing each file when it reaches 300 MB. Start the service:</p>
<pre><code class="lang-bash">sudo systemctl daemon-reload
sudo systemctl start tcpdump.service
</code></pre>
<h3 id="heading-monitor-pcap-files-and-upload-them-to-aws-s3">Monitor pcap files and upload them to AWS S3</h3>
<p>Next we need a job to monitor these files and upload them to your bucket. First, install support for inotify, which we will use to detect when a rolling pcap file has been closed.</p>
<pre><code class="lang-bash">sudo apt-get install inotify-tools
</code></pre>
<p>Create a file named /usr/local/bin/pcap-monitor.sh. Copy the contents below into the file:</p>
<pre><code>#!<span class="hljs-regexp">/bin/</span>bash

folder=<span class="hljs-string">"/tmp/"</span>
bucket=<span class="hljs-string">"&lt;your-bucket-name&gt;"</span>

inotifywait -m $folder -e close_write |
    <span class="hljs-keyword">while</span> read path action file; <span class="hljs-keyword">do</span>
        <span class="hljs-keyword">if</span> [[ <span class="hljs-string">"$file"</span> =~ .*pcap[<span class="hljs-number">0</span><span class="hljs-number">-9</span>]$ ]]; then
            echo <span class="hljs-string">"The file '$file' appeared in directory '$path' via '$action'"</span>
            # Format the current date and time
            current_date=$(date +%Y-%m-%d)
            current_time=$(date +%H-%M-%S)
            # Copy and compress the file
            cp ${path}${file} ${path}${file}.tmp &amp;&amp; gzip -q ${path}${file}.tmp
            # S3 path
            s3_path=<span class="hljs-string">"s3://$bucket/${current_date}/${current_time}.pcap.gz"</span>
            echo <span class="hljs-string">"Uploading to $s3_path"</span>
            # Upload to S3 and remove the compressed file
            aws s3 cp ${path}${file}.tmp.gz $s3_path --quiet &amp;&amp; rm ${path}${file}.tmp.gz
            echo <span class="hljs-string">"Upload to $s3_path completed"</span>
        fi
    done
</code></pre><p>This script watches the /tmp directory and every time a new pcap file is closed it will zip it and then upload the zip file to your S3 bucket, deleting the zip file afterwards. Make the file executable:</p>
<pre><code class="lang-bash">sudo chmod a+x /usr/<span class="hljs-built_in">local</span>/bin/pcap-monitor.sh
</code></pre>
<p>Next, we'll create a systemd daemon to run this.</p>
<p>Create a file named /etc/systemd/system/pcap-monitor.service and copy the contents below in:</p>
<pre><code class="lang-bash">[Unit]
Description=Monitor Script
After=network.target

[Service]
ExecStart=/bin/bash /usr/<span class="hljs-built_in">local</span>/bin/pcap-monitor.sh
WorkingDirectory=/tmp
StandardOutput=journal
StandardError=journal
SyslogIdentifier=pcap-monitor

[Install]
WantedBy=multi-user.target
</code></pre>
<p>Now start it:</p>
<pre><code class="lang-bash">sudo systemctl daemon-reload
sudo systemctl restart pcap-monitor
</code></pre>
<p>All set! At this point pcap files should be periodically uploaded to your bucket, into folders by date. You may want to configure the bucket to automatically delete pcap files after a certain number of days. Either way, you now have access to all network traffic from your production SBC SIP and RTP servers in order to troubleshoot any tricky SIP interop issues.</p>
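<p>For example, a lifecycle rule that expires objects after 14 days can be applied with the AWS CLI (the bucket name and retention period are placeholders):</p>
<pre><code class="lang-bash">aws s3api put-bucket-lifecycle-configuration \
  --bucket &lt;your-bucket-name&gt; \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-pcaps",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 14 }
    }]
  }'
</code></pre>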
<h3 id="heading-time-for-one-more-option">Time for one more option?</h3>
<p>If you like, with a few more steps you can strip the VXLAN headers from the pcaps, so that they will appear in wireshark exactly as they arrived over the wire. There is no problem with leaving the pcaps encapsulated in VXLAN headers, but they may be slightly less confusing to analyze in a tool like wireshark with the headers removed. If you want to do this, first you need to build a simple utility to strip the headers -- luckily, I have one for you on my github!</p>
<pre><code class="lang-bash">sudo apt-get install libpcap-dev build-essential git
git <span class="hljs-built_in">clone</span> https://github.com/drachtio/decap_vxlan.git
<span class="hljs-built_in">cd</span> decap_vxlan
make &amp;&amp; sudo make install
</code></pre>
<p>Now edit the /usr/local/bin/pcap-monitor.sh file to look like this:</p>
<pre><code>#!<span class="hljs-regexp">/bin/</span>bash

folder=<span class="hljs-string">"/tmp/"</span>
bucket=<span class="hljs-string">"&lt;your-bucket-name&gt;"</span>

inotifywait -m $folder -e close_write |
    <span class="hljs-keyword">while</span> read path action file; <span class="hljs-keyword">do</span>
        <span class="hljs-keyword">if</span> [[ <span class="hljs-string">"$file"</span> =~ .*pcap[<span class="hljs-number">0</span><span class="hljs-number">-9</span>]$ ]]; then
            echo <span class="hljs-string">"The file '$file' appeared in directory '$path' via '$action'"</span>
            # Format the current date and time
            current_date=$(date +%Y-%m-%d)
            current_time=$(date +%H-%M-%S)
            # Copy and compress the file
            cat ${path}${file} | decap_vxlan | gzip -q &gt; ${path}${file}.tmp.gz
            # S3 path
            s3_path=<span class="hljs-string">"s3://$bucket/${current_date}/${current_time}.pcap.gz"</span>
            echo <span class="hljs-string">"Uploading to $s3_path"</span>
            # Upload to S3 and remove the compressed file
            aws s3 cp ${path}${file}.tmp.gz $s3_path --quiet &amp;&amp; rm ${path}${file}.tmp.gz
            echo <span class="hljs-string">"Upload to $s3_path completed"</span>
        fi
    done
</code></pre><p>That's it!  Now you have rolling pcaps of your recent sip and rtp traffic available in a secured S3 bucket for your jambonz deployment.  Enjoy!</p>
]]></content:encoded></item><item><title><![CDATA[Speech companies are failing at conversational AI]]></title><description><![CDATA[It might seem that we're in a golden age of deploying speech technology into contact centers.  You'd be forgiven for thinking that, what with the large numbers of new companies in the space, most funded with planeloads of VC cash, exciting new applic...]]></description><link>https://blog.jambonz.org/speech-companies-are-failing-at-conversational-ai</link><guid isPermaLink="true">https://blog.jambonz.org/speech-companies-are-failing-at-conversational-ai</guid><category><![CDATA[speech to text]]></category><category><![CDATA[voicebot conversationalai]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Sat, 06 May 2023 15:29:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1683385624269/427aa339-6ee4-47b1-8471-5c06c24aefe3.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It might seem that we're in a golden age of deploying speech technology into contact centers.  You'd be forgiven for thinking that, what with the large numbers of new companies in the space, most funded with planeloads of VC cash, exciting new applications of AI to speech recognition, and the strategic imperative that enterprises see in finally making automated voice interactions...well, not suck.</p>
<p>The truth, however, is that speech technology providers are still failing at conversational AI for the simple reason behind most business failures: they aren't listening to and anticipating their customers' needs.</p>
<p>As the creator of <a target="_blank" href="https://jambonz.org">jambonz</a>, the open source voice gateway for conversational AI, I've spent the past three years working with most of the commercial speech vendors. Based on that experience, I'm suggesting three high-value (and blindingly obvious) features for conversational AI that speech vendors need to implement to improve the conversational AI experience.</p>
<p>But first, let's begin by enumerating the different requirements that conversational AI has from long-form speech-to-text transcription. There are some seemingly subtle yet very important distinctions:</p>
<ul>
<li><p>Every piece of audio from a caller we transcribe in a conversational AI use case is in response to a question or a prompt. There is always something that we <strong>just</strong> asked or said to the user that he or she is responding to <strong>now</strong>. In other words, conversational AI is highly contextual on a <em>short-term</em> (query-response) basis.</p>
</li>
<li><p>Conversational AI is about....(wait for it)...conversations. It's a two-way discussion, <strong>even if we're only transcribing one side of it</strong> (the caller). The conversation proceeds turn by turn. The deeper into the conversation we get, the more accurate and faster we ought to become at accurately transcribing what is being said. Conversational AI, therefore, is also highly contextual on a <em>medium-term</em> (conversation length) basis (and that context includes both sides of the conversation).</p>
</li>
<li><p>During a conversation, there are times when we don't want or need to transcribe a caller's speech. We're in essence not listening to them for certain periods. For instance, we may want to make them listen to a lengthy prompt in full (perhaps for legal purposes). We need to be listening/transcribing most of the time, but not all of the time.</p>
</li>
</ul>
<p>Those are simple properties of a conversational dialog that we can probably all agree on. So what? Well, from these properties we can draw the following critical features that we would require from the underlying speech recognition technology:</p>
<ul>
<li>fine-tuned control of endpointing,</li>
<li>an API interface that includes relevant prompt, and</li>
<li>suitable billing models.</li>
</ul>
<p>Here's what I mean:</p>
<h2 id="heading-fine-tuned-control-of-endpointing">Fine-tuned control of endpointing</h2>
<p>"endpointing" is a feature wherein the speech provider uses speech energy detection to determine the end of an utterance and then returns a transcript for that utterance.</p>
<p>In conversational AI, we prompt the user, and then we want to gather their response. This process is somewhat more an art than a science as you can imagine. We want to get the user's full thought (i.e. not have the recognizer return only the first half of a sentence). Yet we also want to minimize the latency in the conversation (i.e., not have the recognizer take so long to determine the user has finished speaking that the conversation becomes laden with periods of unnatural silence).</p>
<p>All the speech providers that we support in the open source <a target="_blank" href="https://jambonz.org">jambonz</a> conversational AI voice gateway support endpointing. But only one of them -- <a target="_blank" href="https://deepgram.com">Deepgram</a> -- exposes via its API the ability to control the endpointing behavior.</p>
<p>Controlling endpointing behavior would be (and in the case of Deepgram, <em>is</em>) a highly useful feature.  When I have a voicebot ask a user a question that implies a quick confirmation ("Would you like to speak to an agent?"), I'm expecting a yes/no answer.  As a result, I want the endpointing to be very quick (maybe 500 milliseconds). If I'm asking the customer a broader question ("Please tell us how can we help you today?"), however, I want endpointing to wait for maybe 2 seconds of silence to make sure I get everything they want to say, which may, in this case, be more than a single sentence.</p>
<p>All speech providers should expose via their API the ability to control endpointing behavior.</p>
<h2 id="heading-api-interface-that-includes-relevant-prompt">API interface that includes relevant prompt</h2>
<p>Today, most of us are enchanted by the power of ChatGPT, right? And like me, I bet you're really impressed with how generative language models can create such high quality responses in response to nothing but prompt text.</p>
<p>If so, you might also find it strange that while <strong>every time</strong> we connect to a speech recognizer during conversational AI we have <strong>just provided a prompt</strong> to the caller, the speech provider's API apparently has no interest in knowing what that prompt was.  Wouldn't that prompt help shape answers?  For that matter, wouldn't that prompt also help determine the most effective endpointing configuration to use for the current user response?</p>
<p>Today, speech providers allow for things like hints in their APIs -- an array of words or phrases that should be "boosted" in terms of making the recognizer more aware of them.  That's great, and we should have hints, but even more valuable than hints is the <strong>question I just asked the user which she is now responding to</strong>.  And guess what -- we have that exact question, in text form, because we probably just did text-to-speech to generate that question!</p>
<p>So please, speech vendors, augment your APIs to let me tell you the current prompt the user is responding to with their speech. Then, use that information to create more accurate responses.</p>
<p>Some examples may be helpful:</p>
<ul>
<li>I just prompted the user, "Could you please spell your last name?", so the recognizer should now expect some spoken letters (i.e. don't transcribe "T" as "tee").</li>
<li>I just prompted the user, "Could you tell me what is wrong with your medical equipment?", so the recognizer should automatically boost medical equipment words or phrases.</li>
<li>I just prompted the user, "Is this the best number to call you back on?", so the recognizer should be prepared to return quickly after "yes", "no," or other confirmatory/negatory phrases.</li>
</ul>
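<p>To make the ask concrete, a start-of-recognition message could carry the prompt alongside the usual hints. To be clear, this is entirely hypothetical -- it is the API I am asking for, not something any vendor offers today:</p>
<pre><code class="lang-bash"># hypothetical request body -- no vendor currently accepts a "prompt" field
{
  "language": "en-US",
  "hints": ["billing", "technical support"],
  "prompt": "Could you please spell your last name?"
}
</code></pre>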
<h2 id="heading-suitable-billing-model">Suitable billing model</h2>
<p>The billing model that most (all?) speech providers use is per-second billing for the time we are connected to the recognizer, sometimes rounded up to a threshold. We get charged this regardless of whether we actively want a transcript at any given point in the conversation. This leads to an implementation model where the voice gateway connects to the speech recognizer for each turn of the conversation, creates a transcript, and then drops the connection. In the next turn of the conversation, we will prompt the user, connect again to the recognizer, and get a new transcript.</p>
<p>Dropping and re-establishing the connection like this is done to save cost, but it isn't ideal for several reasons.  For one thing, there's a bit of overhead each time in establishing the connection, during which speech from the user might be lost (though in <a target="_blank" href="https://www.jambonz.org/">jambonz</a>, we queue incoming voice frames during connection to avoid this).  </p>
<p>More importantly, though, any chance for using longer term context to improve results is lost.  Consider again the ChatGPT experience: the longer the conversation that you have with it becomes, the better results you receive.  As the conversation proceeds, ChatGPT has more context to form its answers.</p>
<p>What speech providers should do is to provide an API that lets a <a target="_blank" href="https://www.jambonz.org/">voice gateway platform like jambonz</a> connect once, at the start of the call. Then at any time during the connection, they should allow the voice gateway to call an API to "pause" recognition. During the paused interval, they shouldn't charge me and should simply discard any voice packets that are sent over the connection. When I'm ready to gather a response from the user again, I should be able to call an API to "resume" recognition over this same connection. Billing again can start at this point.</p>
<p>Most importantly, speech providers should use the enhanced context gained from the ongoing conversation to give me more accurate and faster results the deeper into the conversation we go.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>I hope I'll look back at this blog post in a year and laugh at how outdated it has become. I hope that I'll see an array of speech companies that really "get" conversational AI and have invested time and attention to properly support our use case. Unfortunately, that's not the situation today.</p>
<p>Conversational AI has some unique requirements for speech recognition, and today's speech providers are not meeting them.  The result is that conversational AI experiences are by and large not matching the industry hype.  Speech providers need to stop looking at transcription as a one-size-fits all solution and build the services that we in the conversational AI space need to create the experiences that will truly delight customers.</p>
<p>If you like what you've read, check out the <a target="_blank" href="https://blog.jambonz.org/">jambonz blog</a> or <a target="_blank" href="https://jambonz.us6.list-manage.com/subscribe?u=2ab4d55936b7267b749491c84&amp;id=d75214535e">subscribe to our newsletter</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Deploying jambonz on Google Cloud]]></title><description><![CDATA[Overview
This article describes how to build a jambonz image on GCP using packer and then deploy a VM using terraform. To follow along the example here, you will need the following:

a google cloud account

packer and terraform and git installed on y...]]></description><link>https://blog.jambonz.org/deploying-jambonz-on-google-cloud</link><guid isPermaLink="true">https://blog.jambonz.org/deploying-jambonz-on-google-cloud</guid><category><![CDATA[Communications Platform as a Service (CPaaS) Market ]]></category><category><![CDATA[voip]]></category><category><![CDATA[Voice]]></category><category><![CDATA[voice assistants]]></category><category><![CDATA[conversational-ai]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Tue, 18 Apr 2023 21:38:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1683388413072/ef32bec1-ee41-438f-a83b-0016047d827c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-overview">Overview</h1>
<p>This article describes how to build a jambonz image on GCP using <a target="_blank" href="https://www.packer.io/">packer</a> and then deploy a VM using <a target="_blank" href="https://www.terraform.io/">terraform</a>. To follow along the example here, you will need the following:</p>
<ul>
<li><p>a <a target="_blank" href="https://console.cloud.google.com">google cloud account</a></p>
</li>
<li><p><a target="_blank" href="https://www.packer.io/">packer</a> and <a target="_blank" href="https://www.terraform.io/">terraform</a> and git installed on your laptop</p>
</li>
</ul>
<p>Let's get started!</p>
<h1 id="heading-create-a-gcp-project-and-service-account">Create a GCP project and service account</h1>
<p>In order to run packer and terraform locally on your laptop and create resources on Google Cloud Platform, we'll need to download some credentials. So log in to the <a target="_blank" href="https://console.cloud.google.com">GCP console</a>, create a project (or select an existing one) and then from the main menu select IAM &amp; Admin / Service Accounts. Click "Create Service Account", fill in the details and click "Create and Continue".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681056959154/cc9ffa22-3248-46dc-975e-e02ad3e30059.png" alt class="image--center mx-auto" /></p>
<p>Add the following roles to the service account and then click "Continue":</p>
<ul>
<li><p>Compute Admin</p>
</li>
<li><p>Compute Instance Admin (v1)</p>
</li>
<li><p>Compute Network Admin</p>
</li>
<li><p>Service Account Admin</p>
</li>
<li><p>Service Account User</p>
</li>
</ul>
<p>Now find the service account you just created in the list, select it and then click Add Key / Create Key and select JSON. Download the json key file to your laptop and save it to folder.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681057268984/b4c5ae25-ff4b-4776-9d43-e80199fd8424.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-check-out-jambonz-infrastructure-from-github">Check out jambonz-infrastructure from github</h1>
<p>In a terminal window on your laptop check out <a target="_blank" href="https://github.com/jambonz/jambonz-infrastructure">jambonz-infrastructure</a> and navigate to the packer folder for building the jambonz-mini.</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/jambonz/jambonz-infrastructure.git
<span class="hljs-built_in">cd</span> jambonz-infrastructure/packer/jambonz-mini/
</code></pre>
<p>Next, set the environment variable <code>GOOGLE_APPLICATION_CREDENTIALS</code> to point to the JSON service key file that you downloaded</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> GOOGLE_APPLICATION_CREDENTIALS=~/Downloads/drachtio-cpaas-4712713e4b95.json
</code></pre>
<h1 id="heading-building-a-image">Building a image</h1>
<p>Now you are ready to build a GCP image. First, review the settings in the file <code>gcp-template.json</code> and change:</p>
<ul>
<li><p><code>project_id</code> to your gcp project id,</p>
</li>
<li><p><code>image_zone</code> to the zone that you want to build the image in.<br />  You can also change the <code>disk_size</code> if you want to have a larger disk than the default 80G (it is recommended to have at least 80G to handle time series data like call detail records and logs).</p>
</li>
</ul>
<p>Once you have made any changes necessary, start the packer build. This will take 30-45 minutes to complete as it builds all of the supporting software projects as well as jambonz.</p>
<pre><code class="lang-bash"> packer build -color=<span class="hljs-literal">false</span> gcp-template.json
</code></pre>
<p>Once it completes, it will generate an image:</p>
<pre><code class="lang-bash">    googlecompute: Processing triggers <span class="hljs-keyword">for</span> man-db (2.9.4-2) ...
==&gt; googlecompute: Deleting instance...
    googlecompute: Instance has been deleted!
==&gt; googlecompute: Creating image...
==&gt; googlecompute: Deleting disk...
    googlecompute: Disk has been deleted!
Build <span class="hljs-string">'googlecompute'</span> finished after 40 minutes 57 seconds.
</code></pre>
<p>You will be able to see the image in the GCP console:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681067910394/ac29ac92-cb2d-49b6-b6cb-83c25bf35d8a.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-deploying-a-vm-instance">Deploying a VM instance</h1>
<p>Now change into the terraform folder</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> ../../terraform/gcp/jambonz-mini/
</code></pre>
<p>You can now prepare to run terraform by running the following commands:</p>
<pre><code class="lang-bash">terraform init
terraform plan
</code></pre>
<p>This will prompt you to provide some variables, most importantly the image id that you just created. If you prefer, you can also provide these settings by creating a file named <code>deployment.tfvars</code> in the same folder, for example in my case:</p>
<pre><code class="lang-bash">image = <span class="hljs-string">"packer-1681065267"</span>
region = <span class="hljs-string">"us-central1"</span>
zone = <span class="hljs-string">"us-central1-a"</span>
project = <span class="hljs-string">"drachtio-cpaas"</span>
dns_name = <span class="hljs-string">"jambonz.me"</span>
instance_type = <span class="hljs-string">"e2-medium"</span>
</code></pre>
<p>The <code>dns_name</code> should be a DNS name in a domain controlled by you; it will be the URL at which you access the jambonz portal. The <code>instance_type</code> is the GCP instance type that the jambonz VM instance will run on. We recommend an instance type with a minimum of 2 vCPUs and 8GB of RAM (more of each if this server is going to handle production traffic).</p>
<p>and then run the command as:</p>
<pre><code class="lang-bash">terraform plan -var-file=<span class="hljs-string">"deployment.tfvars"</span>
</code></pre>
<p>If all looks good, then run it:</p>
<pre><code class="lang-bash">terraform apply
</code></pre>
<p>or, if you've created a variables file:</p>
<pre><code class="lang-bash">terraform apply -var-file=<span class="hljs-string">"deployment.tfvars"</span>
</code></pre>
<p>This should run successfully with output like the following:</p>
<pre><code class="lang-bash">random_string.uuid: Creating...
random_string.uuid: Creation complete after 0s [id=ntzsiu]
google_compute_address.jambonz_static_ip: Creating...
google_compute_firewall.jambonz_mini_firewall_rule: Creating...
google_compute_address.jambonz_static_ip: Still creating... [10s elapsed]
google_compute_firewall.jambonz_mini_firewall_rule: Still creating... [10s elapsed]
google_compute_address.jambonz_static_ip: Creation complete after 11s [id=projects/drachtio-cpaas/regions/us-central1/addresses/jambonz-static-ip-ntzsiu]
google_compute_instance.jambonz_mini: Creating...
google_compute_firewall.jambonz_mini_firewall_rule: Creation complete after 11s [id=projects/drachtio-cpaas/global/firewalls/jambonz-firewall-rule-ntzsiu]
google_compute_instance.jambonz_mini: Still creating... [10s elapsed]
google_compute_instance.jambonz_mini: Still creating... [20s elapsed]
google_compute_instance.jambonz_mini: Still creating... [30s elapsed]
google_compute_instance.jambonz_mini: Still creating... [40s elapsed]
google_compute_instance.jambonz_mini: Creation complete after 42s [id=projects/drachtio-cpaas/zones/us-central1<span class="hljs-_">-a</span>/instances/jambonz-mini-ntzsiu]

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
</code></pre>
<p>Now you should be able to log into the GCP console and see the running instance. Make note of the external (static) IP because in the final step you will add DNS A records pointing to this IP.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681088586507/39a8f812-bff3-4c06-b28d-1092bea858b2.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-create-dns-records-for-the-server">Create DNS records for the server</h1>
<p>Now create DNS records in your DNS provider for the name that you provided when creating the instance. For example, in my case I chose to use the name jambonz.me to refer to the VM instance, and I will create the following two DNS A records, each pointing to the static IP of the server:</p>
<ul>
<li><p>jambonz.me</p>
</li>
<li><p>grafana.jambonz.me</p>
</li>
</ul>
<p>In my case, I am using <a target="_blank" href="http://dnsmadeeasy.com">dnsmadeeasy</a> as my DNS provider so my DNS records look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681088913077/ef70e89a-7659-4205-bf78-57e4daa7a152.png" alt class="image--center mx-auto" /></p>
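<p>Once the records have propagated, a quick sanity check with dig (substituting your own names) should return the static IP for both records:</p>
<pre><code class="lang-bash"># each should print the static IP of the VM
dig +short jambonz.me A
dig +short grafana.jambonz.me A
</code></pre>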
<h1 id="heading-log-into-the-portal-and-configure-the-system">Log into the portal and configure the system</h1>
<p>At this point you can log into the jambonz portal at the DNS name that you chose with the username/password "admin/admin". You will be forced to change the password on your first login.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681089040242/f1fcdfbb-a1df-4374-98d5-67a6dc85a49e.png" alt class="image--center mx-auto" /></p>
<p>You'll want to configure your Carriers (sip trunks) and speech credentials, but at this point you have a fully functional jambonz system running on GCP. Enjoy!</p>
]]></content:encoded></item><item><title><![CDATA[Tutorial: adding support for a custom speech provider]]></title><description><![CDATA[jambonz supports many speech providers out of the box, but what if you want to use a speech provider that is not currently supported? That is where the jambonz custom speech API comes in.

The custom speech api requires jambonz 0.8.2 or above

I...]]></description><link>https://blog.jambonz.org/tutorial-adding-support-for-a-custom-speech-provider</link><guid isPermaLink="true">https://blog.jambonz.org/tutorial-adding-support-for-a-custom-speech-provider</guid><category><![CDATA[Voicebot]]></category><category><![CDATA[conversational-ai]]></category><category><![CDATA[ASR]]></category><category><![CDATA[cpaas]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Fri, 31 Mar 2023 01:03:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1683388862843/98c5ea52-ee5e-40e0-9ad3-462ab97d3a62.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>jambonz supports many speech providers out of the box, but what if you want to use a speech provider for that is not currently supported? There is where the jambonz <a target="_blank" href="https://www.jambonz.org/docs/speech-api/overview/">custom speech API</a> comes in.</p>
<blockquote>
<p>The custom speech api requires <a target="_blank" href="https://www.jambonz.org/docs/release-notes/v0.8.2">jambonz 0.8.2</a> or above</p>
</blockquote>
<p>In this article, we will walk through an example of adding support for <a target="_blank" href="https://alphacephei.com/vosk/server">Vosk</a>, an open source speech recognition engine that you can run on your own infrastructure. Vosk is not natively supported by jambonz, but we shall see that we can easily add support for it using the custom speech API.</p>
<p>As described <a target="_blank" href="https://www.jambonz.org/docs/supporting-articles/custom-speech-stt/">in the api docs</a>, to add support for a speech recognition provider you need to build a websocket server that provides the integration with the speech provider. Your websocket server receives JSON messages and an audio stream from <a target="_blank" href="https://jambonz.org">jambonz</a> and returns transcripts.</p>
<p>The example code we'll be using to integrate Vosk STT can be found on github in the <a target="_blank" href="https://github.com/jambonz/custom-speech-example">custom-speech-example repo</a>. To run this example you need the following prerequisites:</p>
<ul>
<li><p>a jambonz system running 0.8.2 or above</p>
</li>
<li><p>a basic jambonz app that exercises speech recognition</p>
</li>
<li><p>a server on which you can run docker and the custom-speech-example Node.js app (these could also be two different servers).</p>
</li>
</ul>
<h2 id="heading-provisioning-a-custom-speech-provider">Provisioning a custom speech provider</h2>
<p>First, let's log into the portal and create a new speech service for Vosk.</p>
<p>To do so, select Speech and then click the + icon to add a new provider. Select "Custom" from the dropdown and give it a name -- we'll call it Vosk.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1680098524502/77d7f3e9-a4e0-430f-a361-6bb0a5a4b053.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Check "Use for speech-to-text" and enter the ws(s) URL that is the endpoint of your websocket server. In my case as you can see, I'll be running that server on the same jambonz instance, and it will be listening on port 3088.</p>
</li>
<li><p>Also add an authentication token value that your websocket server can use to authenticate the connections from jambonz.</p>
</li>
</ul>
<p>Now you can click on Applications, select the jambonz application you are going to use for testing and specify to use Vosk as the STT provider.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1680099038469/927b5359-6e5d-443d-954d-1b7f787328f2.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>Hint: A good app to test with is a simple echo voicebot that transcribes your voice and repeats it back to you using text-to-speech. You can quickly generate this app using the command "npx create-jambonz-ws-app -s echo my-echo-bot".</p>
</blockquote>
<h2 id="heading-running-vosk-server">Running Vosk server</h2>
<p>Now we need to run the Vosk server.  We are going to be using the gRPC API that Vosk supports to send audio and commands to the server.  The simplest way to run it is in Docker.  In my case I have the Vosk server running on a separate machine, as it is fairly demanding of system resources.</p>
<pre><code class="lang-bash">docker run --network host -ti alphacep/kaldi-grpc-en:latest
</code></pre>
<p>The Vosk server listens for incoming gRPC connections on port 5001 by default, so in our next step we will configure the custom-speech-example app accordingly.</p>
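<p>Before moving on, it's worth confirming that port 5001 is reachable from the host that will run the websocket server. Here's a quick check with netcat, using the same Vosk server address that appears in the configuration later in this article:</p>
<pre><code class="lang-bash"># -z: just scan for a listening daemon, -v: verbose output
nc -zv 54.167.7.129 5001
</code></pre>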
<h2 id="heading-running-the-websocket-server-serving-the-speech-api-endpoint">Running the websocket server serving the speech api endpoint</h2>
<p>As mentioned above, I have cloned the repo to the same server jambonz is running on, and I will configure it to listen on port 3088 for connections from jambonz.  </p>
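<p>If you're following along, getting the example code onto your server is just a matter of cloning the repo and installing its dependencies (this assumes git and a recent Node.js are already installed):</p>
<pre><code class="lang-bash">git clone https://github.com/jambonz/custom-speech-example.git
cd custom-speech-example
npm install
</code></pre>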
<p>It will also need to connect to Vosk on the remote server and port 5001, and it will authenticate connections from jambonz using the auth token I specified earlier when configuring the custom Vosk speech provider in the jambonz portal.
To handle all of this configuration I've decided to run the Node.js app using <a target="_blank" href="https://pm2.io/">pm2</a> and included the following configuration file (ecosystem.config.js):</p>
<pre><code class="lang-js"><span class="hljs-comment">/* eslint-disable max-len */</span>
<span class="hljs-built_in">module</span>.exports = {
  <span class="hljs-attr">apps</span> : [
    {
      <span class="hljs-attr">name</span>: <span class="hljs-string">'jambonz-custom-speech-vendors'</span>,
      <span class="hljs-attr">script</span>: <span class="hljs-string">'app.js'</span>,
      <span class="hljs-attr">instance_var</span>: <span class="hljs-string">'INSTANCE_ID'</span>,
      <span class="hljs-attr">exec_mode</span>: <span class="hljs-string">'fork'</span>,
      <span class="hljs-attr">instances</span>: <span class="hljs-number">1</span>,
      <span class="hljs-attr">autorestart</span>: <span class="hljs-literal">true</span>,
      <span class="hljs-attr">watch</span>: <span class="hljs-literal">false</span>,
      <span class="hljs-attr">max_memory_restart</span>: <span class="hljs-string">'1G'</span>,
      <span class="hljs-attr">env</span>: {
        <span class="hljs-attr">LOGLEVEL</span>: <span class="hljs-string">'debug'</span>,
        <span class="hljs-attr">HTTP_PORT</span>: <span class="hljs-number">3088</span>,
        <span class="hljs-attr">API_KEY</span>: <span class="hljs-string">'foobar'</span>,
        <span class="hljs-attr">VOSK_URL</span>: <span class="hljs-string">'54.167.7.129:5001'</span>
      }
    }
  ]
};
</code></pre>
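<p>With that configuration file in place, starting the websocket server under pm2 looks like this:</p>
<pre><code class="lang-bash">pm2 start ecosystem.config.js
pm2 log jambonz-custom-speech-vendors
</code></pre>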
<h2 id="heading-test-it-out">Test it out!</h2>
<p>Now the fun part -- let's route a phone number to our jambonz app and test it out!</p>
<p>When I make the call, Vosk transcribes my speech and it is played back to me.  Looking at the logs from custom-speech-example, I can see the transcripts being received from Vosk and sent back to jambonz:</p>
<pre><code class="lang-bash">$ pm2 <span class="hljs-built_in">log</span> jambonz-custom-speech-vendors
{<span class="hljs-string">"msg"</span>:<span class="hljs-string">"Example jambonz speech server listening at http://localhost:3088"</span>} {<span class="hljs-string">"url"</span>:<span class="hljs-string">"/transcribe/vosk"</span>,<span class="hljs-string">"headers"</span>:{<span class="hljs-string">"pragma"</span>:<span class="hljs-string">"no-cache"</span>,<span class="hljs-string">"cache-control"</span>:<span class="hljs-string">"no-cache"</span>,<span class="hljs-string">"host"</span>:<span class="hljs-string">"localhost"</span>,<span class="hljs-string">"origin"</span>:<span class="hljs-string">"http://localhost"</span>,<span class="hljs-string">"upgrade"</span>:<span class="hljs-string">"websocket"</span>,<span class="hljs-string">"connection"</span>:<span class="hljs-string">"Upgrade"</span>,<span class="hljs-string">"sec-websocket-key"</span>:<span class="hljs-string">"1CvOEywO14v0DoJGMUjo9Q=="</span>,<span class="hljs-string">"sec-websocket-version"</span>:<span class="hljs-string">"13"</span>,<span class="hljs-string">"authorization"</span>:<span class="hljs-string">"Bearer foobar"</span>},<span class="hljs-string">"msg"</span>:<span class="hljs-string">"received upgrade request"</span>}
{<span class="hljs-string">"msg"</span>:<span class="hljs-string">"upgraded to websocket, url: /transcribe/vosk"</span>} {<span class="hljs-string">"obj"</span>:{<span class="hljs-string">"type"</span>:<span class="hljs-string">"start"</span>,<span class="hljs-string">"language"</span>:<span class="hljs-string">"en-US"</span>,<span class="hljs-string">"format"</span>:<span class="hljs-string">"raw"</span>,<span class="hljs-string">"encoding"</span>:<span class="hljs-string">"LINEAR16"</span>,<span class="hljs-string">"interimResults"</span>:<span class="hljs-literal">true</span>,<span class="hljs-string">"sampleRateHz"</span>:8000,<span class="hljs-string">"options"</span>:{}},<span class="hljs-string">"msg"</span>:<span class="hljs-string">"received JSON message from jambonz"</span>}
{<span class="hljs-string">"data"</span>:{<span class="hljs-string">"chunksList"</span>:[{<span class="hljs-string">"alternativesList"</span>:[{<span class="hljs-string">"text"</span>:<span class="hljs-string">"this is a test using a custom speech provider"</span>,<span class="hljs-string">"confidence"</span>:1,<span class="hljs-string">"wordsList"</span>:[]}],<span class="hljs-string">"pb_final"</span>:<span class="hljs-literal">true</span>,<span class="hljs-string">"endOfUtterance"</span>:<span class="hljs-literal">false</span>}]},<span class="hljs-string">"msg"</span>:<span class="hljs-string">"received data from vosk"</span>}
{<span class="hljs-string">"data"</span>:{<span class="hljs-string">"chunksList"</span>:[{<span class="hljs-string">"alternativesList"</span>:[{<span class="hljs-string">"text"</span>:<span class="hljs-string">"this is a test using a custom speech provider"</span>,<span class="hljs-string">"confidence"</span>:1,<span class="hljs-string">"wordsList"</span>:[]}],<span class="hljs-string">"pb_final"</span>:<span class="hljs-literal">true</span>,<span class="hljs-string">"endOfUtterance"</span>:<span class="hljs-literal">false</span>}]},<span class="hljs-string">"msg"</span>:<span class="hljs-string">"sending transcription to jambonz"</span>}
{<span class="hljs-string">"obj"</span>:{<span class="hljs-string">"type"</span>:<span class="hljs-string">"stop"</span>},<span class="hljs-string">"msg"</span>:<span class="hljs-string">"received JSON message from jambonz"</span>}
</code></pre>
<p>The experience for the caller is no different than using one of the native jambonz speech providers.</p>
<h2 id="heading-summary">Summary</h2>
<p>While jambonz comes packed with native support for a large number of speech providers (Google, Microsoft, AWS, Nuance, Nvidia, IBM Watson, and Wellsaid at the time of writing), it's super easy to add your own speech providers using the <a target="_blank" href="https://www.jambonz.org/docs/speech-api/overview/">custom speech API</a>.  In this tutorial, we walked through adding support for the open source Vosk server.  In the example project that we shared, you will find other integrations as well, including <a target="_blank" href="https://www.assemblyai.com/">AssemblyAI</a> speech recognition, plus an example of implementing custom text-to-speech in addition to speech-to-text.</p>
<p>For more information about jambonz visit us at jambonz.org, join our <a target="_blank" href="https://joinslack.jambonz.org">community slack channel</a>, or email us at support@jambonz.org.</p>
]]></content:encoded></item><item><title><![CDATA[Installing jambonz using AWS Marketplace]]></title><description><![CDATA[This article will walk you through the process of deploying jambonz using the AWS Cloudformation offering. Additionally, it shows the post-install steps necessary to modify the jambonz portal to use https instead of plain http.

The AWS Marketplace o...]]></description><link>https://blog.jambonz.org/installing-jambonz-using-aws-marketplace</link><guid isPermaLink="true">https://blog.jambonz.org/installing-jambonz-using-aws-marketplace</guid><category><![CDATA[#cpass #voicegateway #opensource #jambonz]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Sun, 08 Jan 2023 21:26:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1683389187345/8ef206af-427f-4b12-94a9-e4bd5266ca85.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article will walk you through the process of deploying <a target="_blank" href="https://jambonz.org">jambonz</a> using the <a target="_blank" href="https://aws.amazon.com/marketplace/pp/prodview-55wp45fowbovo">AWS Cloudformation offering</a>. Additionally, it shows the post-install steps necessary to modify the jambonz portal to use https instead of plain http.</p>
<blockquote>
<p>The AWS Marketplace offering for a jambonz server is a flat fee of $18/month. This includes the ability to deploy an unlimited number of instances. Cost of the AWS instances themselves and associated infrastructure is separate.</p>
</blockquote>
<h1 id="heading-installation">Installation</h1>
<h2 id="heading-first-steps">First steps</h2>
<h3 id="heading-before-starting">Before starting</h3>
<p>Choose the AWS region you want to deploy into. If you have not previously generated an AWS keypair in that region, go and do so now. You will need it in the steps below.</p>
<h3 id="heading-main-page">Main page</h3>
<p>In your web browser, navigate to https://aws.amazon.com/marketplace/pp/prodview-55wp45fowbovo. This provides the basic details of the offering, including version numbers and pricing. Click on the button labeled "Continue to Subscribe".</p>
<h3 id="heading-terms-and-conditions-page">Terms and Conditions page</h3>
<p>Review the terms and conditions and click "Continue to Configuration".</p>
<h3 id="heading-configure-this-software-page">Configure this software page</h3>
<p>Leave the Fulfillment option and Software version at their default settings. Select your desired AWS region from the dropdown. Click "Continue to Launch".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673206543476/e49dacb3-ccc5-48ad-9af1-bf7cc0834142.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-launch-this-software-page">Launch this software page</h3>
<p>In the "Choose Action" dropdown select "Launch Cloudformation" and click "Launch</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673206673499/63df586e-10b8-4cf1-a3cf-2fce880f07c6.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-create-stack">Create Stack</h3>
<p>Now we've moved to the AWS Cloudformation stacks page, where we are creating a new stack using the template provided. On the next several pages we will configure the AWS environment that jambonz will be running in. All of the AWS infrastructure that will be created will be in a new VPC created by this stack, so it will not interfere with anything else you already have running in this region.</p>
<p>On this page, leave everything at the default selections, and simply click "Next".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673206934296/aa1fa473-6a93-496c-a935-46dba9df42e9.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-specify-stack-details">Specify stack details</h3>
<p>Enter a name for the stack and then fill out the Parameters section:</p>
<ul>
<li><p><code>AllowedHttpCidr</code>: This is a network mask that can restrict what networks can access the jambonz portal. Typically, you may want to leave this open to the internet if you don't know where your admin users will be logging in from; if so, enter "0.0.0.0/0".</p>
</li>
<li><p><code>AllowedRtpCidr</code>: Ditto to the above, for the source of caller media streams (RTP). Again, in most cases you will want this to be "0.0.0.0/0".</p>
</li>
<li><p><code>AllowedSipCidr</code>: Ditto for restricting the sources of SIP signaling. As above, for most deployments this will be "0.0.0.0/0" but if you wanted to restrict it for instance to one of your internal SBCs you could enter the network mask for that.</p>
</li>
<li><p><code>AllowedSshCidr</code>: And finally, the same for where you want to allow ssh access into the server and, again, "0.0.0.0/0" means access from anywhere.</p>
</li>
<li><p><code>InstanceType</code>: Here you choose the instance type you want to run on. For production VoIP systems, AWS recommends the c5n instance class, and a good choice for jambonz would be the c5n.xlarge, if that instance type is available in your region. This would handle about 15-20 arriving calls per second and 300-400 concurrent calls in progress. If you need something smaller, e.g. for testing, feel free to deploy a t2 or t3.medium. If you need something for large production loads, speak to us about building a jambonz horizontally-scalable cluster -- we have a cloudformation script for that as well, but it is not currently available on the AWS marketplace so you need to contact us (support@jambonz.org) and we will set you up.</p>
</li>
<li><p><code>KeyName</code>: Select one of the AWS keypairs that you have previously generated in this region.</p>
</li>
<li><p><code>URLPortal</code>: If you intend for the jambonz portal to be accessible from a browser using a DNS name (versus its IP address), then enter that name here. In my example, I will be adding 'jambonz.net' as the DNS. <em>(Note: you don't need to provision the DNS record for that name just yet, you will do so after the stack completes and you know the elastic IP of the EC2 instance created.)</em></p>
</li>
<li><p><code>VpcCidr</code>: The stack is going to create an AWS VPC that will contain everything that gets created -- things like subnets, internet gateways, the EC2 instance itself, etc. Here put a CIDR for the VPC itself. As an example, I typically use "10.0.0.0/16".</p>
<p>  Finally, click "Next".</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673207713417/3e617789-8eb9-495d-b2db-3bb3de53d910.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-configure-stack-options">Configure stack options</h3>
<p>Under the "Tags" section, its a good idea to add a "Name" tag with a meaningful name to identify and group all of the stuff we are going to create. This is optional, however.</p>
<p>Leave the rest of the fields at their default and click "Next" at the bottom righthand side of the page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673208181516/9653d8a0-5280-45ee-b4d7-b460dbe1c377.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-review-stack">Review stack</h3>
<p>Leave everything as presented and click "Submit" on the bottom of the page.</p>
<blockquote>
<p>Note: if prompted to confirm the creation of an IAM role, select the checkbox to ok that. As of release 0.7.8 and later, the stack automatically configures jambonz to send logs to Cloudwatch, which is very useful for troubleshooting. In order to do so, the stack must create an IAM role for this.</p>
</blockquote>
<h3 id="heading-done">Done!</h3>
<p>Now we wait a bit while AWS deploys our EC2 instance and the associated stuff in our new VPC. This should take roughly 2-3 minutes.</p>
<p>When complete, click on the Outputs panel. This will tell you the elastic IP of the server, along with the initial password to use to log into the portal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673208986783/de873a53-1ec1-44dc-8bb1-1a666cc01d1e.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-dns-configuration">DNS configuration</h3>
<p>If you entered a DNS name in the stack parameters above, then now is the time to create DNS records for the jambonz portals. (Skip this step if you left that parameter blank).</p>
<p>Copy the ServerIP from the Outputs panel and create DNS A records for all of the following, each pointing to that IP:</p>
<ul>
<li><p>&lt;dns&gt;</p>
</li>
<li><p>api.&lt;dns&gt;</p>
</li>
<li><p>grafana.&lt;dns&gt;</p>
</li>
<li><p>homer.&lt;dns&gt;</p>
</li>
<li><p>jaeger.&lt;dns&gt;</p>
</li>
</ul>
<p>For instance, I use <a target="_blank" href="http://dnsmadeeasy.com">dnsmadeeasy</a> as my DNS provider, so I go into their portal to set up these DNS records.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673209456219/2d085b3a-b8cc-41d4-b401-1bcf0abb72da.png" alt class="image--center mx-auto" /></p>
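<p>Once the records are in place, a quick way to confirm that all five names resolve to the ServerIP is a small shell loop (substituting your own domain):</p>
<pre><code class="lang-bash"># resolve each portal hostname; each should print the ServerIP
for sub in "" api. grafana. homer. jaeger.; do
  dig +short ${sub}jambonz.net A
done
</code></pre>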
<h3 id="heading-accessing-the-portal">Accessing the portal</h3>
<p>Now you should be able to log into the portal. In your browser URL window you will either enter the DNS name that you specified (http://jambonz.net, in my case) or just the Server IP if you did not specify a DNS name.</p>
<blockquote>
<p>Note: once the cloudformation stack is complete, the EC2 instance still requires a bit more time to initialize and come up the first time. If you try to log into the portal and get a 502 Bad Gateway just give it a minute or two and try again.</p>
</blockquote>
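<p>If you'd rather check from the command line, you can poll the portal with curl until the 502 turns into a 200 (use your DNS name or the ServerIP):</p>
<pre><code class="lang-bash"># expect 502 Bad Gateway while the instance is still initializing, then 200
curl -sI http://jambonz.net/ | head -1
</code></pre>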
<p>On the jambonz login screen use the username 'admin' and the first-time password that you see on the Output panel of the cloudformation window. You will be forced to set a new password. Once you set the new password you will be logged into the home page of the portal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673209815200/6f343e0e-988a-4b82-8d3d-0efd32738003.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673209840729/3a2e89f8-1f22-478e-9c87-e59cf85be2d0.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673209916841/39d30df0-648d-4b86-b5c2-36955307f801.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-additional-portals">Additional portals</h3>
<p>If you used a DNS name, the monitoring portal will be grafana.&lt;dns&gt; and you will log in with admin/admin. You will be forced to change the password on the initial login.</p>
<p>Navigate to Dashboards/Browse/jambonz/Jambonz Metrics to see the jambonz monitoring page. Right now, with no traffic, there won't be much activity, but this will be a useful page for monitoring your live system. It will tell you things like how many calls are on the system and the latency/response time for things like text-to-speech generation and application webhook responses.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673210851091/7e09f8b3-7876-42ad-8977-e3a8daf7bfa2.png" alt class="image--center mx-auto" /></p>
<p>SIP traces are sent to homer, which can be accessed at homer.&lt;dns&gt;. The initial login is admin/sipcapture.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673210822985/961a7c8c-5e44-4772-a702-2755e4fe127a.png" alt class="image--center mx-auto" /></p>
<p>Application traces are sent to jaeger, which can be accessed at jaeger.&lt;dns&gt;. There is currently no authentication associated with this page. Application traces can be very useful to see exactly what happened on a specific call. In the trace below, for example, we can see the following:</p>
<ul>
<li><p>retrieving account details from mysql took 15 milliseconds</p>
</li>
<li><p>the http webhook to retrieve the jambonz app took 308 milliseconds</p>
</li>
<li><p>synthesizing speech from text took 1.4 seconds (note: tts audio is cached by jambonz, so further requests for this text/voice/language/provider will be near-instantaneous)</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673210651171/6bd7c967-0244-444a-939e-f91d20008ba9.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-modifying-portals-to-use-https">Modifying portals to use HTTPS</h2>
<p>We've been using plain HTTP to access the portals so far. That's perfectly fine, but we can also change to use HTTPS if we like. To do so, we will need to install a TLS certificate on the server, and modify the NGINX and jambonz configs slightly. Let's walk through that.</p>
<h3 id="heading-check-nginx-config">Check nginx config</h3>
<p>In jambonz releases prior to 0.8, it may be necessary to fix something in the nginx config file before performing these steps. ssh into the server and as root user open <code>/etc/nginx/sites-available/default</code> in an editor. If the top lines look like this:</p>
<pre><code class="lang-bash">  server {
      server_name _;
      location /api/ {
</code></pre>
<p>change them to replace the underscore with your dns name (jambonz.net in my case)</p>
<pre><code class="lang-bash">server { 
    server_name jambonz.net; 
    location /api/ {
</code></pre>
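<p>After editing, validate the configuration and reload nginx so the change takes effect:</p>
<pre><code class="lang-bash"># test the configuration, then reload
nginx -t
systemctl reload nginx
</code></pre>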
<h3 id="heading-installing-a-tls-certificate">Installing a TLS certificate</h3>
<p>You can use any certificate provider you want, but in this case I am going to use <a target="_blank" href="https://letsencrypt.org/">letsencrypt</a> because they are free and easy to use.</p>
<p>Following the instructions on their page, I go <a target="_blank" href="https://certbot.eff.org/instructions?ws=nginx&amp;os=debianbuster">here</a> for the nginx-on-Debian instructions, which are pretty straightforward:</p>
<pre><code class="lang-bash">sudo -u root -s 
apt-get update 
apt install snapd 
snap install core 
snap install --classic certbot 
ln -s /snap/bin/certbot /usr/bin/certbot 
certbot --nginx
</code></pre>
<p>This will install a certificate and rewrite your nginx configuration file (the same one we looked at above) to support https. For instance, mine now looks like <a target="_blank" href="https://gist.github.com/davehorton/bd35eafe1e0ad6467e417630f020a533">this</a>.</p>
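<p>letsencrypt certificates are only valid for 90 days; the snap-installed certbot normally sets up automatic renewal for you, and you can verify that renewal will work with a dry run:</p>
<pre><code class="lang-bash">certbot renew --dry-run
</code></pre>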
<h3 id="heading-modifying-jambonz-config">Modifying jambonz config</h3>
<p>We're not done quite yet. Now, ssh into the jambonz server as the admin user and go to the <code>~/apps/jambonz-webapp</code> folder.</p>
<p>Modify the .env file in this folder. <strong>If you are running a 0.7.x version</strong> change this:</p>
<pre><code class="lang-bash">REACT_APP_API_BASE_URL=http://jambonz.net/api/v1
</code></pre>
<p>to this</p>
<pre><code class="lang-bash">REACT_APP_API_BASE_URL=https://jambonz.net/api/v1
</code></pre>
<p>(Of course, your dns name will be different, but the point is simply to change this to an https url).</p>
<p><strong>If you are running a 0.8 version or above</strong> change this:</p>
<pre><code class="lang-bash">VITE_API_BASE_URL=http://jambonz.net/api/v1
</code></pre>
<p>to this</p>
<pre><code class="lang-bash">VITE_API_BASE_URL=https://jambonz.net/api/v1
</code></pre>
<p>Then rebuild and restart:</p>
<pre><code class="lang-bash">npm run build
pm2 restart jambonz-webapp
</code></pre>
<p>All set! If you log out of the portal, refresh your browser and log back in you should now see your connection is over a secure https connection.</p>
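<p>If you want to confirm the certificate from the command line, this openssl one-liner (substituting your own DNS name) prints the subject and validity dates of the cert nginx is now serving:</p>
<pre><code class="lang-bash">echo | openssl s_client -connect jambonz.net:443 -servername jambonz.net 2&gt;/dev/null | openssl x509 -noout -subject -dates
</code></pre>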
<h1 id="heading-conclusion">Conclusion</h1>
<p>I hope this article has been useful in showing you how to quickly install and configure a jambonz system on AWS using the Marketplace offering. Please feel free to contact us by email at support@jambonz.org or by joining our community slack channel by going to https://joinslack.jambonz.org.</p>
]]></content:encoded></item><item><title><![CDATA[Supporting webrtc clients with jambonz]]></title><description><![CDATA[If you've done a standard install of jambonz using either the cloudformation template or the Kubernetes helm chart your jambonz system will be configured to receive SIP traffic from VoIP carriers and sip phones using UDP transport.
But what if you wa...]]></description><link>https://blog.jambonz.org/supporting-webrtc-clients-with-jambonz</link><guid isPermaLink="true">https://blog.jambonz.org/supporting-webrtc-clients-with-jambonz</guid><category><![CDATA[SIP]]></category><category><![CDATA[WebRTC]]></category><category><![CDATA[Communications Platform as a Service (CPaaS) Market ]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Thu, 05 Jan 2023 22:23:40 GMT</pubDate><content:encoded><![CDATA[<p>If you've done a standard install of jambonz using either the <a target="_blank" href="https://aws.amazon.com/marketplace/pp/prodview-55wp45fowbovo">cloudformation template</a> or the <a target="_blank" href="https://github.com/jambonz/helm-charts">Kubernetes helm chart</a> your jambonz system will be configured to receive SIP traffic from VoIP carriers and sip phones using UDP transport.</p>
<p>But what if you want to receive traffic from webrtc clients as well? No problem, this tutorial will show you the changes needed to make that happen.</p>
<blockquote>
<p>Note: the example that follows shows how to configure this on a jambonz system running on VMs, such as AWS EC2.</p>
</blockquote>
<h1 id="heading-overview">Overview</h1>
<p>Webrtc clients will be sending SIP over secure WebSockets (wss). Client-side libraries like <a target="_blank" href="https://jssip.net/">jsSip</a> (my favorite) and <a target="_blank" href="https://sipjs.com">SIP.js</a> are most often used to build the client webrtc application that runs in your browser. To support these types of clients, jambonz needs to support sip over wss for the SIP signaling, and <a target="_blank" href="https://www.rfc-editor.org/rfc/rfc3711">SRTP</a> for the encrypted media.</p>
<p>Note that insecure websocket connections (SIP over plain ws) are not allowed by the browser, so we are going to need to install a TLS certificate on the jambonz server. In the example below we'll configure our jambonz server to listen on port 8443/tcp for sip over wss traffic, and we'll create a TLS wildcard certificate for <code>*.sip.jambonz.me</code> since that is a domain that we own. (If you are following along, you should similarly choose a domain of your own that you control the DNS for).</p>
<h1 id="heading-generating-a-tls-certificate-for-sip-traffic">Generating a TLS certificate for SIP traffic</h1>
<p>We're going to use <a target="_blank" href="https://letsencrypt.org/">letsencrypt</a> to generate our certificate because it's free (and easy!).</p>
<p>After installing the certbot program on the jambonz debian server by <a target="_blank" href="https://certbot.eff.org/instructions?ws=other&amp;os=debianbuster">following their instructions</a>, we run the following command:</p>
<pre><code class="lang-bash">certbot certonly --manual --preferred-challenges=dns --email daveh@drachtio.org --server https://acme-v02.api.letsencrypt.org/directory --agree-tos -d *.sip.jambonz.me -d sip.jambonz.me
</code></pre>
<p>This gets us a TLS cert with the CN of <code>*.sip.jambonz.me</code> and Subject Alternative Names of both <code>sip.jambonz.me</code> and <code>*.sip.jambonz.me</code>. I like this because it gives me the option of assigning different jambonz accounts different SIP realm values to register against (e.g. one jambonz user joe can register phones under the realm <code>joe.sip.jambonz.me</code> while jane with a different jambonz account registers her devices under <code>jane.sip.jambonz.me</code>).</p>
<p>I'm using the DNS challenge method to verify I control those domains, so letsencrypt will prompt me to add some TXT records in my DNS provider. Once I've done that the TLS cert is generated to the server:</p>
<pre><code class="lang-bash">Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/sip.jambonz.me/fullchain.pem
Key is saved at:         /etc/letsencrypt/live/sip.jambonz.me/privkey.pem
</code></pre>
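<p>You can inspect the issued certificate to confirm the CN and SANs described below (the paths are those reported by certbot above; the -ext flag assumes a reasonably recent openssl):</p>
<pre><code class="lang-bash">openssl x509 -in /etc/letsencrypt/live/sip.jambonz.me/fullchain.pem -noout -subject -ext subjectAltName
</code></pre>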
<h1 id="heading-configuring-drachtio">Configuring drachtio</h1>
<p>Now that we have our TLS certificate, we need to configure drachtio to use it. This is a simple matter of adding the tls info to the /etc/drachtio.conf.xml config file. When done, it will look like this:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">sip</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">contacts</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">contacts</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">tls</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">key-file</span>&gt;</span>/etc/letsencrypt/live/sip.jambonz.me/privkey.pem<span class="hljs-tag">&lt;/<span class="hljs-name">key-file</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">cert-file</span>&gt;</span>/etc/letsencrypt/live/sip.jambonz.me/fullchain.pem<span class="hljs-tag">&lt;/<span class="hljs-name">cert-file</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">tls</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">udp-mtu</span>&gt;</span>4096<span class="hljs-tag">&lt;/<span class="hljs-name">udp-mtu</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">sip</span>&gt;</span>
</code></pre>
<p>Now, we need to tell drachtio to listen on port 8443 for sip traffic over wss. To do that we edit /etc/systemd/system/drachtio.service to add this new sip contact. When finished, that section of the file looks like this:</p>
<pre><code class="lang-bash">ExecStart=/usr/<span class="hljs-built_in">local</span>/bin/drachtio --daemon \
--contact sip:<span class="hljs-variable">${LOCAL_IP}</span>;transport=udp --external-ip <span class="hljs-variable">${PUBLIC_IP}</span> \
--contact sips:<span class="hljs-variable">${LOCAL_IP}</span>:8443;transport=wss --external-ip <span class="hljs-variable">${PUBLIC_IP}</span> \
--contact sip:<span class="hljs-variable">${LOCAL_IP}</span>;transport=tcp \
 --address 0.0.0.0 --port 9022 --homer 127.0.0.1:9060 --homer-id 10
</code></pre>
<p>We then restart drachtio..</p>
<pre><code class="lang-bash">systemctl daemon-reload
systemctl restart drachtio
</code></pre>
<p>We can verify that drachtio is now listening on tcp port 8443 by looking at the /var/log/drachtio.log file after the restart:</p>
<pre><code class="lang-bash">2023-01-05 20:53:30.794776 SipTransport::logTransports - there are : 3 transports
2023-01-05 20:53:30.794794 SipTransport::logTransports - tcp/10.0.188.191:5060 (sip:10.0.188.191;transport=tcp, external-ip: , local-net: 10.0.0.0/8)
2023-01-05 20:53:30.794802 SipTransport::logTransports - wss/10.0.188.191:8443 (sips:10.0.188.191:8443;transport=wss, external-ip: 35.176.86.236, local-net: )
2023-01-05 20:53:30.794812 SipTransport::logTransports - udp/10.0.188.191:5060 (sip:10.0.188.191;transport=udp, external-ip: 35.176.86.236, local-net: ), mtu size: 4096
</code></pre>
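<p>You can also confirm the listening socket directly from the shell:</p>
<pre><code class="lang-bash">ss -tln | grep 8443
</code></pre>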
<p>Of course, we also need to make sure network traffic is being allowed into 8443/tcp. On EC2, we do that by reviewing and, if necessary, editing the security group for the instance.</p>
<h1 id="heading-configuring-jambonz">Configuring jambonz</h1>
<p>We're not quite done yet. If we were to point a webrtc client at the server now and try to register, we would receive a 403 Forbidden back because we have not yet configured our authorization. We need to authenticate clients based on a sip username and password.</p>
<p>If you are not familiar with how jambonz handles SIP authentication, here is a video describing it.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/m0EvTFqTZXU">https://youtu.be/m0EvTFqTZXU</a></div>
<p> </p>
<p>Once you have created your webhook application for authentication, specify it in the account section in the jambonz portal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1672953176491/ec5c136e-d235-4a1a-ac65-d4335ceba307.png" alt class="image--center mx-auto" /></p>
<p>You will also need to specify the sip realm that you want devices owned by that jambonz account to use. In my example, I've chosen a subdomain named <code>daveh.sip.jambonz.me</code> to be used.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1672953227928/c6bb3bd4-fa61-4bb4-acf8-05b17173af4d.png" alt class="image--center mx-auto" /></p>
<p>Now you need to point your webrtc client at the jambonz server, and configure it with a sip username, password, and realm that match the information your webhook is using. In my webrtc client, which I've built using jssip, that client-side config looks like this:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">let</span> data = {
    <span class="hljs-attr">name</span>: <span class="hljs-string">'daveh'</span>,
    <span class="hljs-attr">sipUri</span>: <span class="hljs-string">'sip:daveh@sip.jambonz.me'</span>,
    <span class="hljs-attr">sipPassword</span>: <span class="hljs-string">'foobar'</span>,
    <span class="hljs-attr">wsUri</span>: <span class="hljs-string">'wss://sip.jambonz.me:8443'</span>,
    <span class="hljs-attr">pcConfig</span>: {
        <span class="hljs-attr">iceServers</span>: [
            {
                <span class="hljs-attr">urls</span>: [ <span class="hljs-string">'stun:stun.l.google.com:19302'</span> ]
            }
        ]
    },
    <span class="hljs-attr">initiallyMinimized</span>: <span class="hljs-literal">false</span>
};
</code></pre>
<p>Bingo! My webrtc client now registers successfully.</p>
<p>One last thing - let's make a test call. I'm going to route my incoming calls from sip and webrtc devices to the good old "hello world" application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1672953454877/c35a265c-bab7-4952-abf9-c077539e60a1.png" alt class="image--center mx-auto" /></p>
<p>I dial any number from my webrtc client and....it works!</p>
<h1 id="heading-so-long-for-now">So long for now..</h1>
<p>Thanks again for trying out jambonz! Feel free to learn more at <a target="_blank" href="https://jambonz.org">jambonz.org</a>, or join our slack channel by going to <a target="_blank" href="https://joinslack.jambonz.org">joinslack.jambonz.org</a>, or email me at daveh@jambonz.org.</p>
]]></content:encoded></item><item><title><![CDATA[Building your first jambonz app using Node.js]]></title><description><![CDATA[jambonz is the open-source CPaaS for service providers and developers alike that is no more difficult to install than a webserver.  It is a BYOE (bring your own everything) platform, which means that you bring your own carrier trunks and speech APIs ...]]></description><link>https://blog.jambonz.org/building-your-first-jambonz-app-using-nodejs</link><guid isPermaLink="true">https://blog.jambonz.org/building-your-first-jambonz-app-using-nodejs</guid><category><![CDATA[Voice]]></category><category><![CDATA[messaging]]></category><category><![CDATA[APIs]]></category><category><![CDATA[PaaS]]></category><category><![CDATA[WebRTC]]></category><dc:creator><![CDATA[Dave Horton]]></dc:creator><pubDate>Thu, 23 Sep 2021 13:59:57 GMT</pubDate><content:encoded><![CDATA[<p><a target="_blank" href="https://jambonz.org">jambonz</a> is the open-source CPaaS for service providers and developers alike that is no more difficult to install than a webserver.  It is a BYOE (bring your own everything) platform, which means that you bring your own carrier trunks and speech APIs (Google and AWS cloud speech both supported).</p>
<blockquote>
<p>Haven't got a carrier, but want to try out jambonz?  No problem, visit <a target="_blank" href="https://www.telecomsxchange.com/jambonz">TelecomsXchange</a> to create a free account and gain access to their worldwide network of hundreds of voice and SMS carriers!</p>
</blockquote>
<p>Your options for deploying a jambonz service include:</p>
<ul>
<li>downloading and installing for free on your own infrastructure (jambonz is published under the <a target="_blank" href="https://opensource.org/licenses/MIT">MIT open source license</a>), </li>
<li>deploying in your AWS account using a <a target="_blank" href="https://aws.amazon.com/marketplace/pp/prodview-55wp45fowbovo">cloudformation script</a>, or</li>
<li>creating a free account on our <a target="_blank" href="https://jambonz.us/register">hosted platform</a>.</li>
</ul>
<blockquote>
<p>If you are just getting started and want to try out jambonz for the first time, we recommend <a target="_blank" href="https://jambonz.us/register">creating an account on the hosted platform</a>, since it is simple, free, and gets you up and running instantly.</p>
</blockquote>
<p>Node.js is the preferred application environment for creating and running jambonz applications.  And whipping up a jambonz app couldn't be easier using <a target="_blank" href="https://nodejs.dev/learn/the-npx-nodejs-package-runner">npx</a> and the <code>create-jambonz-app</code> utility.  Let's run it with the <code>-h</code> option to see what it can do for us:</p>
<pre><code class="lang-bash"> $ npx create-jambonz-app -h
Usage: create-jambonz-app [options] project-name

Options:
  -v, --version              display the current version
  -s, --scenario &lt;scenario&gt;  generates sample webhooks <span class="hljs-keyword">for</span> specified scenarios, default is dial and tts (default: <span class="hljs-string">"tts, dial"</span>)
  -h, --<span class="hljs-built_in">help</span>                 display <span class="hljs-built_in">help</span> <span class="hljs-keyword">for</span> <span class="hljs-built_in">command</span>


Scenarios available:
- tts: answer call and play greeting using tts,
- dial: use the dial verb to outdial through your carrier,
- record: record the audio stream generated by the listen verb,
- auth: authenticate sip devices, or
- all: generate all of the above scenarios

Example:
  $ npx create-jambonz-app my-app
</code></pre>
<p>You can see that it will scaffold out an <a target="_blank" href="https://expressjs.com/">express</a>-based jambonz  <a target="_blank" href="https://www.jambonz.org/docs/webhooks/overview/">webhook</a> application that implements one or more scenarios.</p>
<p>Let's dive right in and create a simple app that answers a call and plays a greeting using text-to-speech:</p>
<pre><code>npx <span class="hljs-keyword">create</span>-jambonz-app -s tts my-app

Creating a <span class="hljs-keyword">new</span> jambonz app <span class="hljs-keyword">in</span> /<span class="hljs-keyword">Users</span>/dhorton/tmp/my-app

Installing packages...
</code></pre><p>Done!  Now let's see what we have:</p>
<pre><code><span class="hljs-string">$</span> <span class="hljs-string">cd</span> <span class="hljs-string">my-app/</span>
<span class="hljs-string">$</span> <span class="hljs-string">ls</span> <span class="hljs-string">-lrt</span>
<span class="hljs-string">total</span> <span class="hljs-number">432</span>
<span class="hljs-string">-rw-r--r--</span>    <span class="hljs-number">1</span> <span class="hljs-string">dhorton</span>  <span class="hljs-string">staff</span>     <span class="hljs-number">567</span> <span class="hljs-string">Sep</span> <span class="hljs-number">23</span> <span class="hljs-number">07</span><span class="hljs-string">:40</span> <span class="hljs-string">README.md</span>
<span class="hljs-string">-rw-r--r--</span>    <span class="hljs-number">1</span> <span class="hljs-string">dhorton</span>  <span class="hljs-string">staff</span>    <span class="hljs-number">1616 </span><span class="hljs-string">Sep</span> <span class="hljs-number">23</span> <span class="hljs-number">07</span><span class="hljs-string">:40</span> <span class="hljs-string">app.js</span>
<span class="hljs-string">drwxr-xr-x</span>    <span class="hljs-number">2</span> <span class="hljs-string">dhorton</span>  <span class="hljs-string">staff</span>      <span class="hljs-number">64</span> <span class="hljs-string">Sep</span> <span class="hljs-number">23</span> <span class="hljs-number">07</span><span class="hljs-string">:40</span> <span class="hljs-string">data</span>
<span class="hljs-string">-rw-r--r--</span>    <span class="hljs-number">1</span> <span class="hljs-string">dhorton</span>  <span class="hljs-string">staff</span>     <span class="hljs-number">491</span> <span class="hljs-string">Sep</span> <span class="hljs-number">23</span> <span class="hljs-number">07</span><span class="hljs-string">:40</span> <span class="hljs-string">ecosystem.config.js</span>
<span class="hljs-string">drwxr-xr-x</span>    <span class="hljs-number">3</span> <span class="hljs-string">dhorton</span>  <span class="hljs-string">staff</span>      <span class="hljs-number">96</span> <span class="hljs-string">Sep</span> <span class="hljs-number">23</span> <span class="hljs-number">07</span><span class="hljs-string">:40</span> <span class="hljs-string">lib</span>
<span class="hljs-string">drwxr-xr-x</span>  <span class="hljs-number">239</span> <span class="hljs-string">dhorton</span>  <span class="hljs-string">staff</span>    <span class="hljs-number">7648 </span><span class="hljs-string">Sep</span> <span class="hljs-number">23</span> <span class="hljs-number">07</span><span class="hljs-string">:40</span> <span class="hljs-string">node_modules</span>
<span class="hljs-string">-rw-r--r--</span>    <span class="hljs-number">1</span> <span class="hljs-string">dhorton</span>  <span class="hljs-string">staff</span>  <span class="hljs-number">203525</span> <span class="hljs-string">Sep</span> <span class="hljs-number">23</span> <span class="hljs-number">07</span><span class="hljs-string">:40</span> <span class="hljs-string">package-lock.json</span>
<span class="hljs-string">-rw-r--r--</span>    <span class="hljs-number">1</span> <span class="hljs-string">dhorton</span>  <span class="hljs-string">staff</span>     <span class="hljs-number">489</span> <span class="hljs-string">Sep</span> <span class="hljs-number">23</span> <span class="hljs-number">07</span><span class="hljs-string">:40</span> <span class="hljs-string">package.json</span>
</code></pre><p>A <a target="_blank" href="https://pm2.keymetrics.io/docs/usage/quick-start/">pm2</a> config file is generated to hold some environment variables that you must define -- things such as your jambonz account sid, the URL of your jambonz API server, etc:</p>
<pre><code class="lang-bash">$ cat ecosystem.config.js
module.exports = {
  apps : [{
    name: <span class="hljs-string">'my-app'</span>,
    script: <span class="hljs-string">'app.js'</span>,
    instance_var: <span class="hljs-string">'INSTANCE_ID'</span>,
    exec_mode: <span class="hljs-string">'fork'</span>,
    instances: 1,
    autorestart: <span class="hljs-literal">true</span>,
    watch: <span class="hljs-literal">false</span>,
    max_memory_restart: <span class="hljs-string">'1G'</span>,
    env: {
      NODE_ENV: <span class="hljs-string">'production'</span>,
      LOGLEVEL: <span class="hljs-string">'info'</span>,
      HTTP_PORT: 3000,
      JAMBONZ_ACCOUNT_SID: <span class="hljs-string">''</span>,
      JAMBONZ_API_KEY: <span class="hljs-string">''</span>,
      JAMBONZ_REST_API_BASE_URL: <span class="hljs-string">''</span>,
      WEBHOOK_SECRET: <span class="hljs-string">''</span>,
      HTTP_PASSWORD: <span class="hljs-string">''</span>,
      HTTP_USERNAME: <span class="hljs-string">''</span>,
    }
  }]
};
</code></pre>
<blockquote>
<p>Note: use of pm2 is of course optional; you can alternatively specify your environment variables in a .env file or on the command line if you prefer.</p>
</blockquote>
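<p>For example, running the app without pm2, with the required variables supplied inline on the command line, might look like this (placeholder values shown):</p>
<pre><code class="lang-bash"># substitute your own account sid, api key, and base url
JAMBONZ_ACCOUNT_SID=your-account-sid \
JAMBONZ_API_KEY=your-api-key \
JAMBONZ_REST_API_BASE_URL=https://api.jambonz.us/v1 \
node app.js
</code></pre>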
<p>The final two environment variables above (<code>HTTP_PASSWORD</code> and <code>HTTP_USERNAME</code>) are optional - you only need to supply them if you are using http basic auth to secure your webhook endpoint.  The others are required, and you can find your account_sid, api_key, and webhook secret in the jambonz portal.  If you are using the <a target="_blank" href="https://jambonz.us">jambonz.us</a> hosted platform, the <code>JAMBONZ_REST_API_BASE_URL</code> variable should be <code>https://api.jambonz.us/v1</code>; otherwise set it to the appropriate URL for your own server. </p>
<p>Update the <code>ecosystem.config.js</code> file with these values where indicated.</p>
<p>Let's have a look at the code now.  The <code>app.js</code> file is boilerplate stuff that you probably won't have to change, but let's have a look at it to understand what it does.</p>
<pre><code class="lang-js">$ cat app.js
<span class="hljs-keyword">const</span> assert = <span class="hljs-built_in">require</span>(<span class="hljs-string">'assert'</span>);
assert.ok(process.env.JAMBONZ_ACCOUNT_SID, <span class="hljs-string">'You must define the JAMBONZ_ACCOUNT_SID env variable'</span>);
assert.ok(process.env.JAMBONZ_API_KEY, <span class="hljs-string">'You must define the JAMBONZ_API_KEY env variable'</span>);
assert.ok(process.env.JAMBONZ_REST_API_BASE_URL, <span class="hljs-string">'You must define the JAMBONZ_REST_API_BASE_URL env variable'</span>);

<span class="hljs-keyword">const</span> express = <span class="hljs-built_in">require</span>(<span class="hljs-string">'express'</span>);
<span class="hljs-keyword">const</span> app = express();
<span class="hljs-keyword">const</span> {WebhookResponse} = <span class="hljs-built_in">require</span>(<span class="hljs-string">'@jambonz/node-client'</span>);
<span class="hljs-keyword">const</span> basicAuth = <span class="hljs-built_in">require</span>(<span class="hljs-string">'express-basic-auth'</span>);
<span class="hljs-keyword">const</span> opts = <span class="hljs-built_in">Object</span>.assign({
  <span class="hljs-attr">timestamp</span>: <span class="hljs-function">() =&gt;</span> <span class="hljs-string">`, "time": "<span class="hljs-subst">${<span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString()}</span>"`</span>,
  <span class="hljs-attr">level</span>: process.env.LOGLEVEL || <span class="hljs-string">'info'</span>
});
<span class="hljs-keyword">const</span> logger = <span class="hljs-built_in">require</span>(<span class="hljs-string">'pino'</span>)(opts);
<span class="hljs-keyword">const</span> port = process.env.HTTP_PORT || <span class="hljs-number">3000</span>;
<span class="hljs-keyword">const</span> routes = <span class="hljs-built_in">require</span>(<span class="hljs-string">'./lib/routes'</span>);
app.locals = {
  ...app.locals,
  logger,
  <span class="hljs-attr">client</span>: <span class="hljs-built_in">require</span>(<span class="hljs-string">'@jambonz/node-client'</span>)(process.env.JAMBONZ_ACCOUNT_SID, process.env.JAMBONZ_API_KEY, {
    <span class="hljs-attr">baseUrl</span>: process.env.JAMBONZ_REST_API_BASE_URL
  })
};

<span class="hljs-keyword">if</span> (process.env.HTTP_USERNAME &amp;&amp; process.env.HTTP_PASSWORD) {
  <span class="hljs-keyword">const</span> users = {};
  users[process.env.HTTP_USERNAME] = process.env.HTTP_PASSWORD;
  app.use(basicAuth({users}));
}
app.use(express.urlencoded({ <span class="hljs-attr">extended</span>: <span class="hljs-literal">true</span> }));
app.use(express.json());
<span class="hljs-keyword">if</span> (process.env.WEBHOOK_SECRET) {
  app.use(WebhookResponse.verifyJambonzSignature(process.env.WEBHOOK_SECRET));
}
app.use(<span class="hljs-string">'/'</span>, routes);
app.use(<span class="hljs-function">(<span class="hljs-params">err, req, res, next</span>) =&gt;</span> {
  logger.error(err, <span class="hljs-string">'burped error'</span>);
  res.status(err.status || <span class="hljs-number">500</span>).json({<span class="hljs-attr">msg</span>: err.message});
});

<span class="hljs-keyword">const</span> server = app.listen(port, <span class="hljs-function">() =&gt;</span> {
  logger.info(<span class="hljs-string">`Example jambonz app listening at http://localhost:<span class="hljs-subst">${port}</span>`</span>);
});
</code></pre>
<p>Pretty straightforward stuff, right?  It's creating an express app, using middleware to validate the signature of incoming webhook requests to be sure they came from your account, enforcing http basic auth if you've configured it, and then routing requests to the http endpoints you are exposing.</p>
<p>Oh, and it's also including the <code>@jambonz/node-client</code> <a target="_blank" href="https://www.npmjs.com/package/@jambonz/node-client">npm package</a>.  This little beauty will make it easy to respond to webhooks and to use the <a target="_blank" href="https://api.jambonz.org">jambonz REST api</a>.</p>
<p>Let's look at the code generated for the http endpoints next.  In this case, we asked it to generate a simple app to use tts to play a greeting.  We'll actually need two endpoints: one to respond to the webhook request for an application, and one to handle call status events.  Both of those are generated under <code>lib/routes/endpoints</code> as you can see:</p>
<pre><code class="lang-bash">$ ls -lrt lib/routes/endpoints/
total 24
-rw-r--r--  1 dhorton  staff  218 Sep 23 07:40 call-status.js
-rw-r--r--  1 dhorton  staff  183 Sep 23 07:40 index.js
-rw-r--r--  1 dhorton  staff  803 Sep 23 07:40 tts-hello-world.js

MBP-daveh:my-app dhorton$ cat lib/routes/endpoints/index.js
const router = require(<span class="hljs-string">'express'</span>).Router();

router.use(<span class="hljs-string">'/call-status'</span>, require(<span class="hljs-string">'./call-status'</span>));
router.use(<span class="hljs-string">'/hello-world'</span>, require(<span class="hljs-string">'./tts-hello-world'</span>));

module.exports = router;
</code></pre>
<p>Finally, let's have a look at the code generated to respond to the application webhook!</p>
<pre><code class="lang-js">$ cat lib/routes/endpoints/tts-hello-world.js
<span class="hljs-keyword">const</span> router = <span class="hljs-built_in">require</span>(<span class="hljs-string">'express'</span>).Router();
<span class="hljs-keyword">const</span> WebhookResponse = <span class="hljs-built_in">require</span>(<span class="hljs-string">'@jambonz/node-client'</span>).WebhookResponse;
<span class="hljs-keyword">const</span> text = <span class="hljs-string">`&lt;speak&gt;
&lt;prosody volume="loud"&gt;Hi there,&lt;/prosody&gt; and welcome to jambones!
jambones is the &lt;sub alias="seapass"&gt;CPaaS&lt;/sub&gt; designed with the needs
of communication service providers in mind.
This is an example of simple text-to-speech, but there is so much more you can do.
Try us out!
&lt;/speak&gt;`</span>;

router.post(<span class="hljs-string">'/'</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> {logger} = req.app.locals;
  logger.debug({<span class="hljs-attr">payload</span>: req.body}, <span class="hljs-string">'POST /hello-world'</span>);
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> app = <span class="hljs-keyword">new</span> WebhookResponse();
    app
      .pause({<span class="hljs-attr">length</span>: <span class="hljs-number">1.5</span>})
      .say({text});
    res.status(<span class="hljs-number">200</span>).json(app);
  } <span class="hljs-keyword">catch</span> (err) {
    logger.error({err}, <span class="hljs-string">'Error'</span>);
    res.sendStatus(<span class="hljs-number">503</span>);
  }
});

<span class="hljs-built_in">module</span>.exports = router;
</code></pre>
<p>The key here is the WebhookResponse instance we create and then call chained methods on, each corresponding to a <a target="_blank" href="https://www.jambonz.org/docs/webhooks/overview/">jambonz webhook verb</a>:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> WebhookResponse = <span class="hljs-built_in">require</span>(<span class="hljs-string">'@jambonz/node-client'</span>).WebhookResponse;

// ... then, inside the route handler ...

const app = <span class="hljs-keyword">new</span> WebhookResponse();
app
  .pause({<span class="hljs-attr">length</span>: <span class="hljs-number">1.5</span>})
  .say({text});
</code></pre>
<p>When we've added all our verbs, we simply return it as a json payload in our http response:</p>
<pre><code class="lang-js">res.status(<span class="hljs-number">200</span>).json(app);
</code></pre>
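<p>Under the covers, a jambonz application is just a json array of verb objects, and the WebhookResponse serializes itself accordingly.  If you're curious what jambonz actually receives, you can print it (the text is elided here for brevity):</p>
<pre><code class="lang-js">console.log(JSON.stringify(app, null, 2));
// roughly:
// [
//   { "verb": "pause", "length": 1.5 },
//   { "verb": "say", "text": "&lt;speak&gt;...&lt;/speak&gt;" }
// ]
</code></pre>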
<p>Let's run this puppy!</p>
<p>When testing on my laptop, I like to use <a target="_blank" href="https://ngrok.io">ngrok</a> to provide an externally-reachable URL for the app running behind my firewall:</p>
<pre><code class="lang-bash">$ ngrok http -region=us -hostname=jambonz-apps.drachtio.org 3000
</code></pre>
<blockquote>
<p>Note: I am using a custom domain here, which requires a paid plan on ngrok.  If you're on the free plan, just run <code>ngrok http 3000</code> and use the randomly-assigned URL it gives you.</p>
</blockquote>
<p>Next, I just start up my app:</p>
<pre><code class="lang-bash">$ pm2 start ecosystem.config.js
[PM2][WARN] Applications my-app not running, starting...
[PM2] App [my-app] launched (1 instances)
┌─────┬───────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id  │ name      │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├─────┼───────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0   │ my-app    │ default     │ 0.0.1   │ fork    │ 25162    │ 0s     │ 0    │ online    │ 0%       │ 9.7mb    │ dhorton  │ disabled │
└─────┴───────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘
</code></pre>
<p>The pm2 log shows me that the app is listening on the configured port for webhooks:</p>
<pre><code class="lang-bash">$ pm2 <span class="hljs-built_in">log</span>

0|my-app   | {<span class="hljs-string">"level"</span>:30, <span class="hljs-string">"time"</span>: <span class="hljs-string">"2021-09-23T12:53:11.336Z"</span>,<span class="hljs-string">"pid"</span>:25162,
<span class="hljs-string">"hostname"</span>:<span class="hljs-string">"MBP-daveh.local"</span>,
<span class="hljs-string">"msg"</span>:<span class="hljs-string">"Example jambonz app listening at http://localhost:3000"</span>}
</code></pre>
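<p>Before pointing real calls at it, you can smoke-test the endpoint locally.  Here's a quick sketch, assuming the endpoint routes are mounted at the root and you're on Node 18+ (where <code>fetch</code> is global); the payload fields are just placeholders:</p>
<pre><code class="lang-js">// smoke-test.js: POST a dummy webhook payload to the local app
(async () =&gt; {
  const res = await fetch('http://localhost:3000/hello-world', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({call_sid: 'test', direction: 'inbound'}) // placeholder fields
  });
  console.log(res.status, await res.json()); // expect 200 and the array of verbs
})();
</code></pre>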
<p>Now, bounce over to your jambonz portal and create a new application with this webhook URL and path..</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1632401983179/uYoa5mn8O.png" alt="my-new-app.png" /></p>
<p>And assign a phone number to route to this app..</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1632402078576/yeVeVQ0_P.png" alt="assifgn-phone-number.png" /></p>
<p>Bam!  Done!  </p>
<p>Incoming calls on this phone number now route to this webhook, which returns a simple app that plays a text-to-speech greeting using my preferred speech vendor and voice.</p>
<p>This has been a basic (and hopefully painless!) introduction to building apps using jambonz and the Node.js SDK.  If you prefer to learn by watching, <a target="_blank" href="https://youtu.be/42jcqyvCstU">here is a video</a> covering much of the same material in a bit more detail, including how to use Twilio as your carrier.</p>
<p>To create your own account for testing, just head over to <a target="_blank" href="https://jambonz.us/register">jambonz.us</a>, or watch <a target="_blank" href="https://youtu.be/BVOwpxIKOso">this video</a> showing how to get started.</p>
<p>There's much more you can do -- as a next step, consider using the <code>--scenario</code> option to scaffold an app that performs a <a target="_blank" href="https://www.jambonz.org/docs/webhooks/dial/">dial</a> operation or records a call using the <a target="_blank" href="https://www.jambonz.org/docs/webhooks/listen/">listen</a> verb.</p>
<p>In this article, we explored creating webhook applications but did not use the <a target="_blank" href="https://api.jambonz.org">REST api</a>.  We will cover that in a later article, but for a sneak peek at an example application that uses both webhooks and the REST api, have a look at the attended transfer application <a target="_blank" href="https://github.com/jambonz/jambonz-node-example-app/blob/main/lib/routes/endpoints/dial/attended-transfer.js">here</a> (and video <a target="_blank" href="https://youtu.be/IwrPmJdQYXE">here</a>), where we respond to dtmf events during a call and use the REST api to perform <a target="_blank" href="https://api.jambonz.org/#9c80ca99-4036-4a47-8823-4609e3fd4788">Live Call Control</a>.</p>
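<p>As a quick taste of the shape of that code, updating a live call through the node-client's REST wrapper might look roughly like the sketch below.  Treat it as illustrative only: the client factory arguments and the <code>calls.update</code> call are assumptions based on the package docs, so check the linked example app for the real thing.</p>
<pre><code class="lang-js">const jambonz = require('@jambonz/node-client');

// ASSUMPTION: the factory takes your account sid, api key, and the REST api base url
const client = jambonz(process.env.JAMBONZ_ACCOUNT_SID, process.env.JAMBONZ_API_KEY, {
  baseUrl: 'https://jambonz.us/api/v1'
});

// ASSUMPTION: calls.update() redirects an in-progress call to a new application;
// callSid would come from an earlier webhook payload
async function transfer(callSid) {
  await client.calls.update(callSid, {
    call_hook: 'https://your-server.example.com/transfer' // hypothetical webhook URL
  });
}
</code></pre>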
<p>Any questions?  Feel free to email us at <a href="mailto:support@jambonz.org">support@jambonz.org</a> or <a target="_blank" href="https://joinslack.jambonz.org">join our slack channel</a> to ping us in real-time!</p>
]]></content:encoded></item></channel></rss>