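Background
When building an intelligent calling system on FreeSWITCH, getting hold of the user's audio stream is the hard part. There are broadly three approaches:
- FreeSWITCH talks MRCP to unimrcp, and unimrcp integrates the ASR/TTS engines. ESL remains usable.
- FreeSWITCH uses a media_bug to fork the audio stream to a business service, which then sends it on to ASR. Most companies use this approach and differ only in the implementation details. ESL remains usable.
- FreeSWITCH is treated as a plain UA, and the business service implements the UAC (outbound) or UAS (inbound) side. This means implementing the SIP exchange yourself, and ESL cannot be used. It is more flexible, but also more complex, and some FreeSWITCH features are no longer available.
Intelligent calling systems usually provide barge-in (the user interrupting the bot, or the bot interrupting the user), key presses (DTMF), transfer to a human agent, and similar features.
This chapter covers the first approach and uses ESL to look at how these features behave in practice. Reference: the official FreeSWITCH documentation.
The FreeSWITCH version is:
Version 1.10.7-release+git20211024T163933Z883d2cb662~64bit
Test method
- FreeSWITCH inbound configuration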
conf/dialplan/default.xml
<extension name="88990">
<condition field="destination_number" expression="^88990$">
<action application="answer"/>
<action application="park"/>
</condition>
</extension>
- Add your own MRCP profiles: conf/mrcp_profiles/sbc-asr.xml and sbc-tts.xml
- Add the grammar file grammar/ali.gram
cat ali.gram
#JSGF V1.0;
grammar example;
public <main> = [ <pre> ] ( <weather> {WEATHER} | <sports> {SPORTS} | <stocks> {STOCKS} ) ;
<pre> = ( I would like [ to hear ] ) | ( hear ) | ( [ please ] get [ me ] ) | ( look up );
<weather> = [ the ] weather;
<sports> = sports [ news ];
<stocks> = ( [ a ] stock ( quote | quotes ) ) | stocks;
- Connect to FreeSWITCH over ESL.
- Dial 88990 from a softphone.
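To make it easier to see which utterances ali.gram is meant to accept, the grammar can be hand-translated into a regular expression. This is only a rough sketch for illustration: the name match_tag and the regex translation are my own assumptions, and the real matching is done by the ASR engine behind unimrcp, not by this code.

```python
import re

# Hand translation of the rules in ali.gram into regex fragments.
# Illustrative only: the ASR engine does not work this way internally.
PRE = r"(?:i would like(?: to hear)?|hear|(?:please )?get(?: me)?|look up)"
WEATHER = r"(?P<WEATHER>(?:the )?weather)"
SPORTS = r"(?P<SPORTS>sports(?: news)?)"
STOCKS = r"(?P<STOCKS>(?:a )?stock quotes?|stocks)"
MAIN = re.compile(rf"^(?:{PRE} )?(?:{WEATHER}|{SPORTS}|{STOCKS})$")


def match_tag(utterance):
    """Return WEATHER, SPORTS or STOCKS if the utterance fits the grammar, else None."""
    # Lowercase and collapse whitespace so spacing and case do not matter.
    normalized = " ".join(utterance.lower().split())
    match = MAIN.match(normalized)
    # Only the three topic groups are capturing, so lastgroup is the tag name.
    return match.lastgroup if match else None


for text in ("please get me the weather", "look up stocks", "hello"):
    print(text, "->", match_tag(text))
```

The optional <pre> rule is why both "the weather" and "I would like to hear the weather" map to the same tag.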
detect_speech
Recognizes the user's audio stream and returns the recognition result. The main usages are:
detect_speech <mod_name> <gram_name> <gram_path> [<addr>]
detect_speech grammar <gram_name> [<path>]
detect_speech grammaron <gram_name>
detect_speech grammaroff <gram_name>
detect_speech grammarsalloff
detect_speech nogrammar <gram_name>
detect_speech param <name> <value>
detect_speech pause
detect_speech resume
detect_speech start_input_timers
detect_speech stop
In ESL the command is detect_speech uuid unimrcp:sbc-asr {recognition-timeout=1000}ali default, where uuid is the UUID of the actual call.
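The text does not show how this call is framed on the ESL socket. Below is a minimal sketch, assuming the sendmsg execute form of the Event Socket protocol and a prior subscription to DETECTED_SPEECH. The helper names (esl_command, esl_execute, detect_speech_arg) and the "<uuid>" placeholder are made up for illustration, and no socket is opened here.

```python
def esl_command(line):
    """Frame a one-line ESL command; a blank line terminates it."""
    return (line + "\n\n").encode("utf-8")


def detect_speech_arg(mod_name, gram_name, gram_path, params=None):
    """Build '<mod_name> {k=v,...}<gram_name> <gram_path>' as in the command above."""
    prefix = ""
    if params:
        prefix = "{" + ",".join(f"{k}={v}" for k, v in params.items()) + "}"
    return f"{mod_name} {prefix}{gram_name} {gram_path}"


def esl_execute(uuid, app, arg):
    """Frame a sendmsg that executes a dialplan application on one channel."""
    lines = [
        f"sendmsg {uuid}",
        "call-command: execute",
        f"execute-app-name: {app}",
        f"execute-app-arg: {arg}",
    ]
    return ("\n".join(lines) + "\n\n").encode("utf-8")


CALL_UUID = "<uuid>"  # placeholder: fill in the UUID of the real call
arg = detect_speech_arg(
    "unimrcp:sbc-asr", "ali", "default", {"recognition-timeout": 1000}
)
frames = [
    esl_command("event plain DETECTED_SPEECH"),  # subscribe to the result event
    esl_execute(CALL_UUID, "detect_speech", arg),
]
for frame in frames:
    print(frame.decode("utf-8"), end="")
```

Each frame would be written to an already authenticated ESL connection; authentication and reading replies are left out to keep the sketch short.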
The related log output looks like this:
EXECUTE [depth=1] sofia/internal/1000@172.16.4.111 detect_speech(unimrcp:sbc-asr {recognition-timeout=1000}ali default )
2025-10-31 16:50:51.556688 59.43% [INFO] mod_unimrcp.c:3128 asr_handle: name = unimrcp, codec = (null), rate = 8000, grammar = (null), param = sbc-asr
2025-10-31 16:50:51.556688 59.43% [INFO] mod_unimrcp.c:3130 codec = L16, rate = 8000, dest = (null)
2025-10-31 16:50:51.556688 59.43% [NOTICE] mrcp_application.c:96 (ASR-26) Create MRCP Handle 0x7fac94029610 [sbc-asr]
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_client_session.c:133 (ASR-26) Create Channel ASR-26 <new>
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_client_session.c:387 (ASR-26) Receive App Request ASR-26 <new> [2]
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_client.c:700 (ASR-26) Add MRCP Handle ASR-26 <new>
2025-10-31 16:50:51.556688 59.43% [NOTICE] mrcp_client_session.c:719 (ASR-26) Add Control Channel ASR-26 <new@speechrecog>
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_client_session.c:411 (ASR-26) Send Offer ASR-26 <new> [c:1 a:1 v:0] to 172.16.7.240:8060
2025-10-31 16:50:51.556688 59.43% [INFO] mrcp_sofiasip_client_agent.c:354 (ASR-26) Local SDP ASR-26 <new>
v=0
o=FreeSWITCH 0 0 IN IP4 172.16.4.111
s=-
c=IN IP4 172.16.4.111
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechrecog
a=cmid:1
m=audio 4004 RTP/AVP 0 8 96
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:96 L16/8000
a=sendonly
a=ptime:20
a=mid:1
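My ASR had a problem during this test, so no result came back. Normally the recognition result can be read from the DETECTED_SPEECH event.
detect_speech cannot interrupt TTS audio that speak is currently playing.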
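Since my test produced no result, here is only a rough sketch of how a DETECTED_SPEECH event could be read from ESL in plain format. It assumes the event carries a Speech-Type header (begin-speaking or detected-speech) and puts the result in the body. The SAMPLE text is hypothetical, not captured from a real call, and parse_plain_event and speech_result are names made up for this example.

```python
from urllib.parse import unquote

# Hypothetical event text, NOT taken from a real call.
SAMPLE = (
    "Event-Name: DETECTED_SPEECH\n"
    "Speech-Type: detected-speech\n"
    "Unique-ID: 00000000-0000-0000-0000-000000000000\n"
    "\n"
    "<result>hypothetical body</result>"
)


def parse_plain_event(text):
    """Split a plain-format event into a header dict and a body string.

    Simplification: everything after the first blank line is treated as the
    body, instead of honouring the Content-Length header.
    """
    head, _, body = text.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            # Header values in plain events are URL-encoded.
            headers[key] = unquote(value)
    return headers, body


def speech_result(text):
    """Return the result body of a final recognition event, else None."""
    headers, body = parse_plain_event(text)
    if headers.get("Event-Name") != "DETECTED_SPEECH":
        return None
    if headers.get("Speech-Type") != "detected-speech":
        return None  # e.g. begin-speaking carries no final result
    return body.strip()


print(speech_result(SAMPLE))
```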
speak
Calls MRCP to synthesize text into speech and plays it to the user. The ESL commands are:
uuid_setvar uuid tts_engine unimrcp:sbc-tts (set the TTS engine to unimrcp)
uuid_setvar uuid tts_voice EN (set the language, optional)
speak uuid "welcome to unimrcp, thank you" (synthesize speech)
The resulting log:
EXECUTE [depth=1] sofia/internal/1000@172.16.4.111 speak(welcome to unimrcp, thank you)
2025-10-31 17:07:22.516557 59.87% [INFO] mod_unimrcp.c:1625 speech_handle: name = unimrcp, rate = 8000, speed = 0, samples = 160, voice = , engine = unimrcp, param = sbc-tts
2025-10-31 17:07:22.516557 59.87% [INFO] mod_unimrcp.c:1628 voice = EN, rate = 8000
2025-10-31 17:07:22.516557 59.87% [NOTICE] mrcp_application.c:96 (TTS-27) Create MRCP Handle 0x7facec054130 [sbc-tts]
2025-10-31 17:07:22.516557 59.87% [INFO] mrcp_client_session.c:133 (TTS-27) Create Channel TTS-27 <new>
2025-10-31 17:07:22.516557 59.87% [INFO] mrcp_client_session.c:387 (TTS-27) Receive App Request TTS-27 <new> [2]
2025-10-31 17:07:22.516557 59.87% [INFO] mrcp_client.c:700 (TTS-27) Add MRCP Handle TTS-27 <new>
2025-10-31 17:07:22.516557 59.87% [NOTICE] mrcp_client_session.c:719 (TTS-27) Add Control Channel TTS-27 <new@speechsynth>
2025-10-31 17:07:22.536560 59.87% [INFO] mrcp_client_session.c:411 (TTS-27) Send Offer TTS-27 <new> [c:1 a:1 v:0] to 172.16.7.240:8060
2025-10-31 17:07:22.536560 59.87% [INFO] mrcp_sofiasip_client_agent.c:354 (TTS-27) Local SDP TTS-27 <new>
v=0
o=FreeSWITCH 0 0 IN IP4 172.16.4.111
s=-
c=IN IP4 172.16.4.111
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechsynth
a=cmid:1
m=audio 4048 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=recvonly
a=mid:1
2025-10-31 17:07:22.536560 59.87% [INFO] mrcp_sofiasip_client_agent.c:609 () Receive SIP Event [nua_i_state] Status 0 INVITE sent [sbc-tts]
play_and_detect_speech
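Plays TTS audio while recognizing the user's audio stream at the same time. It can interrupt the TTS playback.
For the Lua approach, see the article "An example of human-machine voice interaction with play_and_detect_speech" (用play_and_detect_speech实现人机语音交互的示例).
In ESL the commands are: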
uuid_setvar uuid tts_engine unimrcp:sbc-tts (set the TTS engine to unimrcp)
uuid_setvar uuid tts_voice EN (set the language, optional)
play_and_detect_speech uuid say:welcome to unimrcp, thank you. detect:unimrcp:sbc-tts {start-input-timers=true,recognition-timeout=10000,no-input-timeout=30000,speech-complete-timeout=20000}ali
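With this approach there are two barge-in triggers:
- During ASR, unimrcp returns a START-OF-INPUT event: the user has been detected speaking.
- During ASR, unimrcp returns a RECOGNITION-COMPLETE event: recognition has finished and a full utterance is returned.
In testing, barge-in was almost always triggered by START-OF-INPUT, which is far too sensitive. The log: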
2025-10-31 17:17:16.756586 57.40% [INFO] mrcp_client_session.c:500 (TTS-30) Raise App MRCP Response TTS-30 <a7f677acb63a11f0>
2025-10-31 17:17:18.916563 57.60% [INFO] mrcp_client_connection.c:635 () Receive MRCPv2 Data 172.16.4.111:43794 <-> 172.16.7.240:1544 [94 bytes]
MRCP/2.0 94 START-OF-INPUT 2 IN-PROGRESS
Channel-Identifier: a7d6756ab63a11f0@speechrecog
2025-10-31 17:17:18.916563 57.60% [INFO] mrcp_client_session.c:516 (ASR-29) Raise App MRCP Event ASR-29 <a7d6756ab63a11f0>
2025-10-31 17:17:18.956559 57.60% [INFO] switch_ivr_async.c:4834 (sofia/internal/1000@172.16.4.111) START OF SPEECH
2025-10-31 17:17:18.956559 57.60% [INFO] mrcp_client_session.c:392 (TTS-30) Receive App MRCP Request TTS-30 <a7f677acb63a11f0>
2025-10-31 17:17:18.956559 57.60% [INFO] mrcp_client_session.c:622 (TTS-30) Send MRCP Request TTS-30 <a7f677acb63a11f0@speechsynth> [2]
2025-10-31 17:17:18.956559 57.60% [INFO] mrcp_client_connection.c:530 (TTS-30) Send MRCPv2 Data 172.16.4.111:43794 <-> 172.16.7.240:1544 [72 bytes]
MRCP/2.0 72 STOP 2
Channel-Identifier: a7f677acb63a11f0@speechsynth
2025-10-31 17:17:18.976561 57.60% [INFO] mrcp_client_connection.c:635 () Receive MRCPv2 Data 172.16.4.111:43794 <-> 172.16.7.240:1544 [108 bytes]
MRCP/2.0 108 2 200 COMPLETE
Channel-Identifier: a7f677acb63a11f0@speechsynth
Active-Request-Id-List: 1
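The three MRCP start-lines in the log above (an event, a request and a response) follow the MRCPv2 layout and can be told apart by field count and position. Here is a small sketch that classifies them and flags the events that trigger barge-in; RECOGNITION-COMPLETE is the MRCPv2 name of the recognition-finished event. The StartLine type and the function names are my own for illustration, not part of unimrcp.

```python
from typing import NamedTuple, Optional


class StartLine(NamedTuple):
    kind: str               # "request", "response" or "event"
    name: Optional[str]     # method or event name; None for responses
    request_id: int
    status: Optional[int]   # status code, responses only
    state: Optional[str]    # request state, responses and events only


def parse_start_line(line):
    """Classify an MRCPv2 start-line such as 'MRCP/2.0 72 STOP 2'."""
    parts = line.split()
    if len(parts) < 4 or parts[0] != "MRCP/2.0" or not parts[1].isdigit():
        raise ValueError(f"not an MRCPv2 start-line: {line!r}")
    fields = parts[2:]  # skip the version and the message length
    if len(fields) == 2:
        # method-name request-id
        return StartLine("request", fields[0], int(fields[1]), None, None)
    if len(fields) == 3 and fields[0].isdigit():
        # request-id status-code request-state
        return StartLine("response", None, int(fields[0]), int(fields[1]), fields[2])
    if len(fields) == 3:
        # event-name request-id request-state
        return StartLine("event", fields[0], int(fields[1]), None, fields[2])
    raise ValueError(f"unexpected field count: {line!r}")


BARGE_IN_EVENTS = {"START-OF-INPUT", "RECOGNITION-COMPLETE"}


def is_barge_in_trigger(line):
    """True if the start-line is a recognizer event that interrupts playback."""
    parsed = parse_start_line(line)
    return parsed.kind == "event" and parsed.name in BARGE_IN_EVENTS


for sample in (
    "MRCP/2.0 94 START-OF-INPUT 2 IN-PROGRESS",
    "MRCP/2.0 72 STOP 2",
    "MRCP/2.0 108 2 200 COMPLETE",
):
    print(parse_start_line(sample), is_barge_in_trigger(sample))
```

In the log, the START-OF-INPUT event on the speechrecog channel is what makes FreeSWITCH send STOP on the speechsynth channel about 40 ms later.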
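There are two things to try here:
- Improve the voice activity detection in unimrcp. I tried WebRTC noise suppression, but the result was not great.
- Remove START-OF-INPUT from unimrcp, so that barge-in only happens once a full utterance from the user has been recognized. This has a drawback too: barge-in feels slower, because the recognition result only comes back after the user has finished the sentence. For the method, see https://blog.csdn.net/qq1779062842/article/details/106471665
What if the user presses keys at this point? The log: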
2025-10-31 17:36:34.816860 58.40% [DEBUG] apt_consumer_task.c:141 () Wait for Messages [MRCP Client]
2025-10-31 17:36:34.816860 58.40% [DEBUG] switch_ivr_play_say.c:2817 Speaking text: Websocket server may return JSON object containing base64 encoded audio to be played by the user.
2025-10-31 17:36:34.856554 58.40% [DEBUG] switch_rtp.c:7934 Correct audio ip/port confirmed.
2025-10-31 17:36:34.856554 58.40% [DEBUG] switch_core_io.c:448 Setting BUG Codec PCMA:8
2025-10-31 17:36:34.876612 58.40% [DEBUG] switch_rtp.c:1982 rtcp_stats_init: audio ssrc[1321737180] base_seq[19527]
2025-10-31 17:36:38.056592 57.03% [DEBUG] switch_rtp.c:8179 RTP RECV DTMF 1:1600
2025-10-31 17:36:38.056592 57.03% [DEBUG] mod_unimrcp.c:3457 (ASR-33) Queued DTMF: 1
2025-10-31 17:36:38.056592 57.03% [INFO] switch_channel.c:527 RECV DTMF 1:1600
2025-10-31 17:36:38.056592 57.03% [DEBUG] switch_ivr_async.c:4862 (sofia/internal/1000@172.16.4.111) IGNORE NON-TERMINATOR DIGIT 1
2025-10-31 17:36:38.456573 57.00% [DEBUG] switch_rtp.c:8179 RTP RECV DTMF 2:1600
2025-10-31 17:36:38.456573 57.00% [DEBUG] mod_unimrcp.c:3457 (ASR-33) Queued DTMF: 2
2025-10-31 17:36:38.456573 57.00% [INFO] switch_channel.c:527 RECV DTMF 2:1600
2025-10-31 17:36:38.456573 57.00% [DEBUG] switch_ivr_async.c:4862 (sofia/internal/1000@172.16.4.111) IGNORE NON-TERMINATOR DIGIT 2
2025-10-31 17:36:38.956563 57.00% [DEBUG] switch_rtp.c:8179 RTP RECV DTMF 1:1600
2025-10-31 17:36:38.956563 57.00% [DEBUG] mod_unimrcp.c:3457 (ASR-33) Queued DTMF: 1
2025-10-31 17:36:38.956563 57.00% [INFO] switch_channel.c:527 RECV DTMF 1:1600
2025-10-31 17:36:38.956563 57.00% [DEBUG] switch_ivr_async.c:4862 (sofia/internal/1000@172.16.4.111) IGNORE NON-TERMINATOR DIGIT 1
2025-10-31 17:36:39.476560 57.13% [DEBUG] switch_rtp.c:8179 RTP RECV DTMF 2:1600
2025-10-31 17:36:39.476560 57.13% [DEBUG] mod_unimrcp.c:3457 (ASR-33) Queued DTMF: 2
2025-10-31 17:36:39.476560 57.13% [INFO] switch_channel.c:527 RECV DTMF 2:1600
2025-10-31 17:36:39.476560 57.13% [DEBUG] switch_ivr_async.c:4862 (sofia/internal/1000@172.16.4.111) IGNORE NON-TERMINATOR DIGIT 2
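The key presses are received, but they do not interrupt playback. The log reports IGNORE NON-TERMINATOR DIGIT, meaning the digit was ignored because no matching terminator is set. The following setting still did not solve the problem: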
play_and_detect_speech uuid say:welcome to unimrcp, thank you. detect:unimrcp:sbc-tts {start-input-timers=true,recognition-timeout=10000,no-input-timeout=30000,speech-complete-timeout=20000}builtin:dtmf/digits?length=4&terminators=#&interdigit-timeout=3000
play_and_get_digits
This application is dedicated to collecting key presses. Its arguments are:
<min> <max> <tries> <timeout> <terminators> <file> <invalid_file> [<var_name> [<regexp> [<digit_timeout> [<transfer_on_failure>]]]]
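Because the arguments are positional and space-separated, it is easy to get their order wrong. A small helper can assemble the data string from named parameters. This is a sketch only: play_and_get_digits_arg and its parameter names are my own, chosen to mirror the usage line above.

```python
def play_and_get_digits_arg(
    min_digits,
    max_digits,
    tries,
    timeout,
    terminators,
    prompt_file,
    invalid_file,
    var_name=None,
    regexp=None,
    digit_timeout=None,
    transfer_on_failure=None,
):
    """Join the arguments in the positional order play_and_get_digits expects."""
    parts = [
        str(min_digits),
        str(max_digits),
        str(tries),
        str(timeout),
        terminators,
        prompt_file,
        invalid_file,
    ]
    # Optional arguments are nested in the usage line, so stop at the first
    # one that is missing instead of leaving a gap.
    for optional in (var_name, regexp, digit_timeout, transfer_on_failure):
        if optional is None:
            break
        parts.append(str(optional))
    return " ".join(parts)


print(
    play_and_get_digits_arg(
        1, 4, 3, 5000, "#",
        "ivr/8000/ivr-enter_source_telephone_number.wav",
        "ivr/8000/ivr-invalid_extension_try_again.wav",
        var_name="user_input",
        regexp="^[0-9]+$",
        digit_timeout=3000,
    )
)
```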
This application can only play audio files; it cannot call MRCP to generate speech.
An example that plays a file:
<extension name="collect_digits">
<condition field="destination_number" expression="^5000$">
<action application="answer"/>
<action application="sleep" data="1000"/>
<action application="play_and_get_digits"
data="1 4 3 5000 # ivr/8000/ivr-enter_source_telephone_number.wav ivr/8000/ivr-invalid_extension_try_again.wav user_input ^[0-9]+$ 3000"/>
<action application="log" data="INFO User entered: ${user_input}"/>
<!-- Transfer to the extension that handles the input -->
<action application="transfer" data="${user_input} XML default"/>
</condition>
</extension>
<!-- Branch definitions -->
<extension name="ivr_option_1">
<condition field="destination_number" expression="^1$">
<action application="transfer" data="1002 XML default"/>
</condition>
</extension>
<extension name="ivr_option_2">
<condition field="destination_number" expression="^2$">
<action application="playback" data="ivr/8000/ivr-your_caller_id_information_is.wav"/>
</condition>
</extension>
With this dialplan, pressing 1 dials 1002 and pressing 2 plays ivr/8000/ivr-your_caller_id_information_is.wav.
The log from an actual test of pressing 1 to reach agent 1002:
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 play_and_get_digits(1 4 3 5000 # ivr/8000/ivr-enter_source_telephone_number.wav ivr/8000/ivr-invalid_extension_try_again.wav user_input ^[0-9]+$ 3000)
2025-11-04 14:35:35.867339 55.97% [INFO] switch_channel.c:527 RECV DTMF 1:1600
2025-11-04 14:35:36.947328 55.83% [INFO] switch_channel.c:527 RECV DTMF #:1600
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 log(INFO User entered: 1)
2025-11-04 14:35:36.947328 55.83% [INFO] mod_dptools.c:1879 User entered: 1
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 transfer(1 XML default)
2025-11-04 14:35:36.947328 55.83% [NOTICE] switch_ivr.c:2296 Transfer sofia/internal/1000@172.16.4.111 to XML[1@default]
2025-11-04 14:35:36.947328 55.83% [INFO] mod_dialplan_xml.c:639 Processing 1000 <1000>->1 in context default
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 set(open=true)
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 transfer(1002 XML default)
2025-11-04 14:35:36.947328 55.83% [NOTICE] switch_ivr.c:2296 Transfer sofia/internal/1000@172.16.4.111 to XML[1002@default]
2025-11-04 14:35:36.947328 55.83% [INFO] mod_dialplan_xml.c:639 Processing 1000 <1000>->1002 in context default
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 set(open=true)
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 export(dialed_extension=1002)
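Multiple digits can be collected as well, for example 1234; with a maximum of 4 digits, collection ends at the fourth digit without waiting for #. The log: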
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 play_and_get_digits(1 4 3 5000 # ivr/8000/ivr-enter_source_telephone_number.wav ivr/8000/ivr-invalid_extension_try_again.wav user_input ^[0-9]+$ 3000)
2025-11-04 14:37:05.347863 56.50% [INFO] switch_channel.c:527 RECV DTMF 1:1600
2025-11-04 14:37:05.747320 56.50% [INFO] switch_channel.c:527 RECV DTMF 2:1600
2025-11-04 14:37:06.167313 56.53% [INFO] switch_channel.c:527 RECV DTMF 3:1600
2025-11-04 14:37:06.847320 56.27% [INFO] switch_channel.c:527 RECV DTMF 4:1600
EXECUTE [depth=0] sofia/internal/1000@172.16.4.111 log(INFO User entered: 1234)
2025-11-04 14:37:06.847320 56.27% [INFO] mod_dptools.c:1879 User entered: 1234
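The two logs above are consistent with a simple collection rule: stop at a terminator or once the maximum length is reached, then validate the result against the regular expression. Below is a simplified model of that rule. It is not FreeSWITCH's implementation, it ignores timeouts and retries, and collect_digits is a name I made up for this example.

```python
import re


def collect_digits(keys, min_digits, max_digits, terminators, regexp):
    """Model one collection attempt over a sequence of pressed keys.

    Returns the collected digits, or None if the input is too short or does
    not match the regular expression.
    """
    digits = ""
    for key in keys:
        if key in terminators:
            break  # the terminator ends input and is not part of the result
        digits += key
        if len(digits) >= max_digits:
            break  # maximum reached, no terminator needed
    if len(digits) < min_digits:
        return None
    if re.search(regexp, digits) is None:
        return None
    return digits


# Same settings as the dialplan example: min 1, max 4, terminator "#".
print(collect_digits("1#", 1, 4, "#", r"^[0-9]+$"))
print(collect_digits("1234", 1, 4, "#", r"^[0-9]+$"))
```

With "1#" the terminator ends collection after one digit; with "1234" the maximum of 4 ends it, which matches the timestamps in the second log.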
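Summary
This chapter covered barge-in and key-press collection in an MRCP-based setup.
Without changes to the source code, the real-world experience falls far short of what practical use requires.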