Intelligent Call System, Approach 1 (Using MRCP)

Background

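When building an intelligent call system on FreeSWITCH, the hard part is getting hold of the user's audio stream. There are broadly three approaches:

1. FreeSWITCH talks to unimrcp over the MRCP protocol, and unimrcp integrates with ASR/TTS. ESL remains available.
2. FreeSWITCH uses media_bug to fork the audio stream to a business service, which then forwards it to ASR. Most companies use this approach, differing only in implementation details. ESL remains available.
3. Use FreeSWITCH as a UA while the business service implements the UAC (outbound) or UAS (inbound) side. This requires implementing the SIP interaction yourself, and ESL cannot be used. It is more flexible, but also more complex, and some FreeSWITCH features become unavailable.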

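An intelligent call system typically supports barge-in (the user interrupting the bot, or the bot interrupting the user), DTMF key input, and transfer to a human agent.

This chapter covers the first approach and uses ESL to observe how these features behave in practice. See the official FreeSWITCH documentation.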

The FreeSWITCH version is:

Version 1.10.7-release+git~~20211024T163933Z~~883d2cb662~64bit

Test Method

FreeSWITCH inbound configuration

conf/dialplan/default.xml
<extension name="88990">
    <condition field="destination_number" expression="^88990$">
       <action application="answer"/>
       <action application="park"/>
    </condition>   
</extension>

Add your own MRCP profiles: conf/mrcp_profiles/sbc-asr.xml and sbc-tts.xml.
Add the grammar file grammar/ali.gram:

cat ali.gram 
#JSGF V1.0;
grammar example;
public <main> = [ <pre> ] ( <weather> {WEATHER} | <sports>  {SPORTS} | <stocks> {STOCKS} ) ;
<pre> = ( I would like [ to hear ] ) | ( hear ) | ( [ please ] get [ me ] ) | ( look up );
<weather> = [ the ] weather;
<sports> = sports [ news ];
<stocks> = ( [ a ] stock ( quote | quotes ) ) | stocks;

Connect to FreeSWITCH over ESL.
Dial 88990 from a softphone.

detect_speech

Recognizes the user's audio stream and returns the recognition result. The main usages are:

detect_speech <mod_name> <gram_name> <gram_path> [<addr>]
detect_speech grammar <gram_name> [<path>]
detect_speech grammaron <gram_name>
detect_speech grammaroff <gram_name>
detect_speech grammarsalloff
detect_speech nogrammar <gram_name>
detect_speech param <name> <value>
detect_speech pause
detect_speech resume
detect_speech start_input_timers
detect_speech stop

The command issued over ESL is: detect_speech uuid unimrcp:sbc-asr {recognition-timeout=1000}ali default (replace uuid with the actual channel UUID).
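
How the command above is sent depends on the ESL client in use. At the protocol level, one common way to run a dialplan application on a channel is the ESL sendmsg command with call-command: execute. The following is a minimal Python sketch that only builds the message text; the helper name build_execute is hypothetical, and connecting and authenticating to the event socket is left out.

```python
def build_execute(uuid: str, app: str, arg: str = "") -> str:
    """Build the ESL sendmsg text that executes a dialplan application.

    build_execute is a hypothetical helper for illustration; it only
    formats the message and does not talk to FreeSWITCH.
    """
    lines = [
        f"sendmsg {uuid}",
        "call-command: execute",
        f"execute-app-name: {app}",
    ]
    if arg:
        lines.append(f"execute-app-arg: {arg}")
    # An ESL command is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


# Example: the detect_speech call used in this section.
message = build_execute(
    "replace-with-channel-uuid",
    "detect_speech",
    "unimrcp:sbc-asr {recognition-timeout=1000}ali default",
)
```

The returned string can be written as-is to an authenticated event socket connection.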

The related log output:

EXECUTE [depth=1] sofia/internal/1000@172.16.4.111 detect_speech(unimrcp:sbc-asr {recognition-timeout=1000}ali default )
2025-10-31 16:50:51.556688 59.43% [INFO] mod_unimrcp.c:3128 asr_handle: name = unimrcp, codec = (null), rate = 8000, grammar = (null), param = sbc-asr
2025-10-31 16:50:51.556688 59.43% [INFO] mod_unimrcp.c:3130 codec = L16, rate = 8000, dest = (null)
2025-10-31 16:50:51.556688 59.43% [NOTICE] mrcp_application.c:96 (ASR-26) Create MRCP Handle 0x7fac94029610 [sbc-asr]
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_client_session.c:133 (ASR-26) Create Channel ASR-26 <new>
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_client_session.c:387 (ASR-26) Receive App Request ASR-26 <new> [2]
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_client.c:700 (ASR-26) Add MRCP Handle ASR-26 <new>
2025-10-31 16:50:51.556688 59.43% [NOTICE] mrcp_client_session.c:719 (ASR-26) Add Control Channel ASR-26 <new@speechrecog>
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_client_session.c:411 (ASR-26) Send Offer ASR-26 <new> [c:1 a:1 v:0] to 172.16.7.240:8060
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_sofiasip_client_agent.c:354 (ASR-26) Local SDP ASR-26 <new>
v=0
o=FreeSWITCH 0 0 IN IP4 172.16.4.111
s=-
c=IN IP4 172.16.4.111
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechrecog
a=cmid:1
m=audio 4004 RTP/AVP 0 8 96
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:96 L16/8000
a=sendonly
a=ptime:20
a=mid:1

My ASR had a problem during this test, so no result came back. Normally, reading the DETECTED_SPEECH event shows the recognition result.
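
As a rough illustration of reading such an event, here is a Python sketch that splits one plain-format ESL event into headers and body. It assumes the event was subscribed to in plain format, where header values are URL-encoded and the body follows an empty line. The sample text, including the Speech-Type header value and the body, is made up for the example and is not output from the test above.

```python
from urllib.parse import unquote


def parse_plain_event(text: str) -> tuple[dict[str, str], str]:
    """Split a plain-format ESL event into (headers, body)."""
    head, _, body = text.partition("\n\n")
    headers: dict[str, str] = {}
    for line in head.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            # Plain-format header values are URL-encoded.
            headers[name] = unquote(value)
    return headers, body


# Made-up sample; a real event carries many more headers.
SAMPLE = (
    "Event-Name: DETECTED_SPEECH\n"
    "Speech-Type: detected-speech\n"
    "Caller-Caller-ID-Name: test%20user\n"
    "\n"
    "<result>hypothetical recognition result</result>"
)

headers, body = parse_plain_event(SAMPLE)
if headers.get("Event-Name") == "DETECTED_SPEECH":
    print(body)
```

A real client would also honor the Content-Length header when reading the body from the socket; the sketch skips that step and works on an already complete event.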

detect_speech cannot interrupt TTS audio that speak is currently playing.

speak

Calls MRCP to synthesize text into speech and plays it to the user. The ESL commands are:

uuid_setvar uuid tts_engine unimrcp:sbc-tts (select the unimrcp TTS profile)
uuid_setvar uuid tts_voice EN (set the language, optional)
speak uuid "welcome to unimrcp, thank you" (synthesize the speech)

The resulting log:

EXECUTE [depth=1] sofia/internal/1000@172.16.4.111 speak(welcome to unimrcp, thank you)
2025-10-31 17:07:22.516557 59.87% [INFO] mod_unimrcp.c:1625 speech_handle: name = unimrcp, rate = 8000, speed = 0, samples = 160, voice = , engine = unimrcp, param = sbc-tts
2025-10-31 17:07:22.516557 59.87% [INFO] mod_unimrcp.c:1628 voice = EN, rate = 8000
2025-10-31 17:07:22.516557 59.87% [NOTICE] mrcp_application.c:96 (TTS-27) Create MRCP Handle 0x7facec054130 [sbc-tts]
2025-10-31 17:07:22.516557 59.87% [INFO] mrcp_client_session.c:133 (TTS-27) Create Channel TTS-27 <new>
2025-10-31 17:07:22.516557 59.87% [INFO] mrcp_client_session.c:387 (TTS-27) Receive App Request TTS-27 <new> [2]
2025-10-31 17:07:22.516557 59.87% [INFO] mrcp_client.c:700 (TTS-27) Add MRCP Handle TTS-27 <new>
2025-10-31 17:07:22.516557 59.87% [NOTICE] mrcp_client_session.c:719 (TTS-27) Add Control Channel TTS-27 <new@speechsynth>
2025-10-31 17:07:22.536560 59.87% [INFO] mrcp_client_session.c:411 (TTS-27) Send Offer TTS-27 <new> [c:1 a:1 v:0] to 172.16.7.240:8060
2025-10-31 17:07:22.536560 59.87% [INFO] mrcp_sofiasip_client_agent.c:354 (TTS-27) Local SDP TTS-27 <new>
v=0
o=FreeSWITCH 0 0 IN IP4 172.16.4.111
s=-
c=IN IP4 172.16.4.111
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechsynth
a=cmid:1
m=audio 4048 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=recvonly
a=mid:1

2025-10-31 17:07:22.536560 59.87% [INFO] mrcp_sofiasip_client_agent.c:609 () Receive SIP Event [nua_i_state] Status 0 INVITE sent [sbc-tts]

play_and_detect_speech

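Plays TTS audio while recognizing the user's audio stream at the same time. It can interrupt the TTS playback.

For the Lua approach, see the article "用play_and_detect_speech实现人机语音交互的示例" (an example of human-machine voice interaction built on play_and_detect_speech).

The ESL commands are: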

uuid_setvar uuid tts_engine unimrcp:sbc-tts (select the unimrcp TTS profile)
uuid_setvar uuid tts_voice EN (set the language, optional)
play_and_detect_speech uuid say:welcome to unimrcp, thank you. detect:unimrcp:sbc-tts {start-input-timers=true,recognition-timeout=10000,no-input-timeout=30000,speech-complete-timeout=20000}ali

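With this method, barge-in is triggered in one of two ways:

1. During ASR recognition, unimrcp returns a START-OF-INPUT event, meaning user speech has been detected.
2. During ASR recognition, unimrcp returns a RECOGNITION-COMPLETE event, meaning recognition has finished and a full utterance is returned.

Testing showed that most barge-ins were triggered by START-OF-INPUT, which is far too sensitive. The log: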

2025-10-31 17:17:16.756586 57.40% [INFO] mrcp_client_session.c:500 (TTS-30) Raise App MRCP Response TTS-30 <a7f677acb63a11f0>
2025-10-31 17:17:18.916563 57.60% [INFO] mrcp_client_connection.c:635 () Receive MRCPv2 Data 172.16.4.111:43794 <-> 172.16.7.240:1544 [94 bytes]
MRCP/2.0 94 START-OF-INPUT 2 IN-PROGRESS
Channel-Identifier: a7d6756ab63a11f0@speechrecog


2025-10-31 17:17:18.916563 57.60% [INFO] mrcp_client_session.c:516 (ASR-29) Raise App MRCP Event ASR-29 <a7d6756ab63a11f0>
2025-10-31 17:17:18.956559 57.60% [INFO] switch_ivr_async.c:4834 (sofia/internal/1000@172.16.4.111) START OF SPEECH
2025-10-31 17:17:18.956559 57.60% [INFO] mrcp_client_session.c:392 (TTS-30) Receive App MRCP Request TTS-30 <a7f677acb63a11f0>
2025-10-31 17:17:18.956559 57.60% [INFO] mrcp_client_session.c:622 (TTS-30) Send MRCP Request TTS-30 <a7f677acb63a11f0@speechsynth> [2]
2025-10-31 17:17:18.956559 57.60% [INFO] mrcp_client_connection.c:530 (TTS-30) Send MRCPv2 Data 172.16.4.111:43794 <-> 172.16.7.240:1544 [72 bytes]
MRCP/2.0 72 STOP 2
Channel-Identifier: a7f677acb63a11f0@speechsynth


2025-10-31 17:17:18.976561 57.60% [INFO] mrcp_client_connection.c:635 () Receive MRCPv2 Data 172.16.4.111:43794 <-> 172.16.7.240:1544 [108 bytes]
MRCP/2.0 108 2 200 COMPLETE
Channel-Identifier: a7f677acb63a11f0@speechsynth
Active-Request-Id-List: 1
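
The three MRCPv2 messages in this log (the START-OF-INPUT event, the STOP request, and the 200 COMPLETE response) can be told apart by their start lines alone. The Python sketch below classifies a start line based on the three layouts defined for MRCPv2 in RFC 6787; the names StartLine and parse_start_line are hypothetical and not part of unimrcp or FreeSWITCH.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLine:
    kind: str                 # "request", "response", or "event"
    name: Optional[str]       # method or event name; None for responses
    request_id: int
    status: Optional[int]     # status code, responses only
    state: Optional[str]      # request state, responses and events only


def parse_start_line(line: str) -> StartLine:
    """Classify an MRCPv2 start line (hypothetical helper)."""
    parts = line.split()
    if len(parts) < 4 or parts[0] != "MRCP/2.0":
        raise ValueError(f"not an MRCPv2 start line: {line!r}")
    fields = parts[2:]  # skip the version and the message length
    if fields[0].isdigit():
        # Response: request-id status-code request-state
        if len(fields) != 3:
            raise ValueError(f"malformed response line: {line!r}")
        return StartLine("response", None, int(fields[0]), int(fields[1]), fields[2])
    if len(fields) == 2:
        # Request: method-name request-id
        return StartLine("request", fields[0], int(fields[1]), None, None)
    # Event: event-name request-id request-state
    return StartLine("event", fields[0], int(fields[1]), None, fields[2])


event = parse_start_line("MRCP/2.0 94 START-OF-INPUT 2 IN-PROGRESS")
stop = parse_start_line("MRCP/2.0 72 STOP 2")
reply = parse_start_line("MRCP/2.0 108 2 200 COMPLETE")
```

Read this way, the log shows the recognizer channel raising START-OF-INPUT, after which FreeSWITCH sends STOP on the synthesizer channel and receives a 200 COMPLETE response.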

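There are two things to try in this situation:

1. Improve the voice activity detection algorithm in unimrcp. I tried WebRTC noise suppression, but the results were not satisfactory.
2. Remove START-OF-INPUT from unimrcp, so that barge-in only happens after a full user utterance has been recognized. This has its own drawback: barge-in feels slower, because the recognition result is only returned once the user has finished the sentence. For the method, see: https://blog.csdn.net/qq1779062842/article/details/106471665

What if the user presses keys at this point? The log is as follows: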

2025-10-31 17:36:34.816860 58.40% [DEBUG] apt_consumer_task.c:141 () Wait for Messages [MRCP Client]
2025-10-31 17:36:34.816860 58.40% [DEBUG] switch_ivr_play_say.c:2817 Speaking text: Websocket server may return JSON object containing base64 encoded audio to be played by the user.
2025-10-31 17:36:34.856554 58.40% [DEBUG] switch_rtp.c:7934 Correct audio ip/port confirmed.
2025-10-31 17:36:34.856554 58.40% [DEBUG] switch_core_io.c:448 Setting BUG Codec PCMA:8
2025-10-31 17:36:34.876612 58.40% [DEBUG] switch_rtp.c:1982 rtcp_stats_init: audio ssrc[1321737180] base_seq[19527]
2025-10-31 17:36:38.056592 57.03% [DEBUG] switch_rtp.c:8179 RTP RECV DTMF 1:1600
2025-10-31 17:36:38.056592 57.03% [DEBUG] mod_unimrcp.c:3457 (ASR-33) Queued DTMF: 1
2025-10-31 17:36:38.056592 57.03% [INFO] switch_channel.c:527 RECV DTMF 1:1600
2025-10-31 17:36:38.056592 57.03% [DEBUG] switch_ivr_async.c:4862 (sofia/internal/1000@172.16.4.111) IGNORE NON-TERMINATOR DIGIT 1
2025-10-31 17:36:38.456573 57.00% [DEBUG] switch_rtp.c:8179 RTP RECV DTMF 2:1600
2025-10-31 17:36:38.456573 57.00% [DEBUG] mod_unimrcp.c:3457 (ASR-33) Queued DTMF: 2
2025-10-31 17:36:38.456573 57.00% [INFO] switch_channel.c:527 RECV DTMF 2:1600
2025-10-31 17:36:38.456573 57.00% [DEBUG] switch_ivr_async.c:4862 (sofia/internal/1000@172.16.4.111) IGNORE NON-TERMINATOR DIGIT 2
2025-10-31 17:36:38.956563 57.00% [DEBUG] switch_rtp.c:8179 RTP RECV DTMF 1:1600
2025-10-31 17:36:38.956563 57.00% [DEBUG] mod_unimrcp.c:3457 (ASR-33) Queued DTMF: 1
2025-10-31 17:36:38.956563 57.00% [INFO] switch_channel.c:527 RECV DTMF 1:1600
2025-10-31 17:36:38.956563 57.00% [DEBUG] switch_ivr_async.c:4862 (sofia/internal/1000@172.16.4.111) IGNORE NON-TERMINATOR DIGIT 1
2025-10-31 17:36:39.476560 57.13% [DEBUG] switch_rtp.c:8179 RTP RECV DTMF 2:1600
2025-10-31 17:36:39.476560 57.13% [DEBUG] mod_unimrcp.c:3457 (ASR-33) Queued DTMF: 2
2025-10-31 17:36:39.476560 57.13% [INFO] switch_channel.c:527 RECV DTMF 2:1600
2025-10-31 17:36:39.476560 57.13% [DEBUG] switch_ivr_async.c:4862 (sofia/internal/1000@172.16.4.111) IGNORE NON-TERMINATOR DIGIT 2

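The key presses are received, but they do not interrupt playback. The log reports IGNORE NON-TERMINATOR DIGIT, meaning the digit was ignored because it is not a configured terminator. Setting a terminator as follows still did not solve the problem: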

play_and_detect_speech uuid say:welcome to unimrcp, thank you. detect:unimrcp:sbc-tts {start-input-timers=true,recognition-timeout=10000,no-input-timeout=30000,speech-complete-timeout=20000}builtin:dtmf/digits?length=4&terminators=#&interdigit-timeout=3000

play_and_get_digits

This application is dedicated to collecting DTMF digits. Its parameters are:

<min> <max> <tries> <timeout> <terminators> <file> <invalid_file> [<var_name> [<regexp> [<digit_timeout> [<transfer_on_failure>]]]]

This application can only play audio files; it cannot call MRCP to generate speech.
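
Because the parameters are purely positional, it is easy to misplace one. The Python sketch below assembles the data string from named arguments in the order of the usage line above; the function name pagd_data and its argument names are hypothetical, and the optional transfer_on_failure parameter is left out for brevity.

```python
def pagd_data(min_digits, max_digits, tries, timeout_ms, terminators,
              prompt_file, invalid_file,
              var_name=None, regexp=None, digit_timeout_ms=None) -> str:
    """Assemble the data string for play_and_get_digits (hypothetical helper)."""
    parts = [str(min_digits), str(max_digits), str(tries), str(timeout_ms),
             terminators, prompt_file, invalid_file]
    # Optional parameters are positional, so stop at the first one omitted.
    for optional in (var_name, regexp, digit_timeout_ms):
        if optional is None:
            break
        parts.append(str(optional))
    return " ".join(parts)


data = pagd_data(
    1, 4, 3, 5000, "#",
    "ivr/8000/ivr-enter_source_telephone_number.wav",
    "ivr/8000/ivr-invalid_extension_try_again.wav",
    var_name="user_input", regexp="^[0-9]+$", digit_timeout_ms=3000,
)
```

With these values the result reads: 1 to 4 digits, 3 tries, a 5000 ms timeout, # as terminator, the two prompt files, the result stored in user_input, digits only, and a 3000 ms inter-digit timeout.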

An example that plays audio files:

<extension name="collect_digits">
  <condition field="destination_number" expression="^5000$">
    <action application="answer"/>
    <action application="sleep" data="1000"/>
    <action application="play_and_get_digits"
            data="1 4 3 5000 # ivr/8000/ivr-enter_source_telephone_number.wav ivr/8000/ivr-invalid_extension_try_again.wav user_input ^[0-9]+$ 3000"/>
    <action application="log" data="INFO User entered: ${user_input}"/>

    <!-- Transfer to the extension that handles the entered digits -->
    <action application="transfer" data="${user_input} XML default"/>
  </condition>
</extension>

<!-- Define the branches -->
<extension name="ivr_option_1">
  <condition field="destination_number" expression="^1$">
    <action application="transfer" data="1002 XML default"/>
  </condition>
</extension>

<extension name="ivr_option_2">
  <condition field="destination_number" expression="^2$">
    <action application="playback" data="ivr/8000/ivr-your_caller_id_information_is.wav"/>
  </condition>
</extension>

In this dialplan, pressing 1 dials 1002, and pressing 2 plays ivr/8000/ivr-your_caller_id_information_is.wav.

The log from an actual test of pressing 1 to reach agent 1002:

EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 play_and_get_digits(1 4 3 5000 # ivr/8000/ivr-enter_source_telephone_number.wav ivr/8000/ivr-invalid_extension_try_again.wav user_input ^[0-9]+$ 3000)
2025-11-04 14:35:35.867339 55.97% [INFO] switch_channel.c:527 RECV DTMF 1:1600
2025-11-04 14:35:36.947328 55.83% [INFO] switch_channel.c:527 RECV DTMF #:1600
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 log(INFO User entered: 1)
2025-11-04 14:35:36.947328 55.83% [INFO] mod_dptools.c:1879 User entered: 1
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 transfer(1 XML default)
2025-11-04 14:35:36.947328 55.83% [NOTICE] switch_ivr.c:2296 Transfer sofia/internal/1000@172.16.4.111 to XML[1@default]
2025-11-04 14:35:36.947328 55.83% [INFO] mod_dialplan_xml.c:639 Processing 1000 <1000>->1 in context default
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 set(open=true)
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 transfer(1002 XML default)
2025-11-04 14:35:36.947328 55.83% [NOTICE] switch_ivr.c:2296 Transfer sofia/internal/1000@172.16.4.111 to XML[1002@default]
2025-11-04 14:35:36.947328 55.83% [INFO] mod_dialplan_xml.c:639 Processing 1000 <1000>->1002 in context default
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 set(open=true)
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 export(dialed_extension=1002)

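Multiple digits can also be collected, for example 1234. Collection ends as soon as the maximum of 4 digits is reached, so no # is needed. The log: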

EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 play_and_get_digits(1 4 3 5000 # ivr/8000/ivr-enter_source_telephone_number.wav ivr/8000/ivr-invalid_extension_try_again.wav user_input ^[0-9]+$ 3000)
2025-11-04 14:37:05.347863 56.50% [INFO] switch_channel.c:527 RECV DTMF 1:1600
2025-11-04 14:37:05.747320 56.50% [INFO] switch_channel.c:527 RECV DTMF 2:1600
2025-11-04 14:37:06.167313 56.53% [INFO] switch_channel.c:527 RECV DTMF 3:1600
2025-11-04 14:37:06.847320 56.27% [INFO] switch_channel.c:527 RECV DTMF 4:1600
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 log(INFO User entered: 1234)
2025-11-04 14:37:06.847320 56.27% [INFO] mod_dptools.c:1879 User entered: 1234
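
The two logs differ in how collection ended: in the first, # terminated the input after one digit; in the second, the fourth digit reached the configured maximum. The Python sketch below is a simplified model of that stopping rule only. It is not FreeSWITCH code, and it ignores timeouts, retries, and the regexp check; collect_digits is a hypothetical name.

```python
def collect_digits(keys: str, max_digits: int, terminators: str = "#") -> str:
    """Simplified model: stop at a terminator or once max_digits is reached."""
    collected = ""
    for key in keys:
        if key in terminators:
            # A terminator ends collection and is not part of the result.
            break
        collected += key
        if len(collected) >= max_digits:
            # The maximum was reached; no terminator is required.
            break
    return collected


first = collect_digits("1#", max_digits=4)     # the "press 1, then #" test
second = collect_digits("1234", max_digits=4)  # the four-digit test
```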

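Summary

This chapter covered barge-in and DTMF digit collection when using MRCP. Without changes to the source code, the real-world experience falls far short of what practical use requires.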