
Research on a VoiceXML-Based Voice Value-Added Service Platform and Its Algorithms

(Original title: 基于VoiceXML的语音增值业务平台及其算法的研究)

【Author】 Wang Wenlin (王文林)

【Advisor】 Liao Jianxin (廖建新)

【Author information】 Beijing University of Posts and Telecommunications, Computer Application Technology, 2007, Ph.D.

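【Abstract, translated from the Chinese】 In current and next-generation networks, voice is the most important service category: it accounts for a very large share of traffic and is the operators' main source of profit. The rapid growth of voice value-added services brings operators more than direct profit. It also raises the utilization of existing equipment, gives users new voice experiences, strengthens their loyalty to the operator, attracts more users to the operator's network, and brings potential customers and profit to the operator's other services. However, existing voice value-added service platforms are closed, inflexible, hard to maintain, and hard to deploy services on, and these shortcomings are difficult to eliminate. As voice value-added services develop and the number of users grows, existing platforms can no longer meet demand and have become an obstacle to further development. New platforms have therefore emerged. VoiceXML (Voice eXtensible Markup Language), now the standard for voice browsers, offers flexible development and simple service deployment, which makes it one of the best choices for a new voice value-added service platform.

This thesis was supported by the National Science Fund for Distinguished Young Scholars (No. 60525110), the Program for New Century Excellent Talents in University (No. NCET-04-0111), and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20030013006). It studies a VoiceXML-based voice value-added service platform and several algorithms involved in it. A voice value-added service system built on the research results has been developed and deployed in several provinces, where it serves millions of value-added service users. The thesis describes in detail the main innovations obtained during the research, summarized as follows:

1) A VoiceXML voice value-added service platform fetches VoiceXML service scripts and service resources over the network, which inevitably introduces network delay, and telephone users are extremely sensitive to delay. To address this, the prefetching scheme for the platform is studied, and it is argued that the objects to prefetch should be the voice resources referenced by the VoiceXML service scripts. An adaptive multi-user shared Markov prediction algorithm is proposed. It exploits the fact that the platform knows whether each user is online, and computes in one pass the resources that all online users will need next together with their probabilities, which improves prediction accuracy. It is further proposed that prefetch tasks be scheduled with a preemptive priority scheduling algorithm, in which the probability that a resource will be accessed is mapped to the priority used for queueing. Simulations show that the adaptive multi-user shared Markov prediction algorithm predicts future resource usage probabilities more accurately than the existing single-user Markov prediction algorithm, and that the preemptive priority scheduling model yields greater prefetching benefit than the existing round-robin scheduling model. Combined, the two algorithms effectively reduce the impact of network delay, speed up the platform's response, and shorten users' waiting time.

2) To further reduce the adverse effect of network delay, cache replacement algorithms are studied in depth. After analyzing several existing cache replacement algorithms, the thesis points out that replacement should be split into two key problems: determining the utility function of a resource, and the algorithm of the replacement process. For the first problem, the LRU-K (K-Least Recently Used) algorithm is improved and a new utility function, PLRU-K (Perfect LRU-K), is proposed. For the second, based on the 0/1 knapsack principle, the 1-optimal Greedy Replacement Process (1-GRP) algorithm is proposed for selecting cache entries to replace. Simulation results show that the PLRU-K utility function reflects the future benefit of cached resources better than the LRU-K and P-LFU (Perfect Least Frequently Used) utility functions; that 1-GRP obtains greater cache benefit than P-GRP (Profit-based Greedy Replacement Process) and PD-GRP (Profit-Density-based Greedy Replacement Process); and that the cache replacement algorithm based on PLRU-K and 1-GRP outperforms the other replacement algorithms, especially when cache space is small.

3) To avoid serving stale data to users when cached VoiceXML documents and resources differ from the originals on the server, a cache consistency control algorithm is indispensable. The thesis discusses how to achieve better cache consistency control in a Web environment without any modification to HTTP (HyperText Transfer Protocol) or to Web servers. Based on an analysis of the strengths and weaknesses of current consistency control algorithms, and taking into account the characteristics of VoiceXML documents and the finding that the intervals between modifications of Web documents follow a negative exponential distribution, a stochastic distribution fitting and prediction algorithm is proposed. It uses parameter estimation to fit the distribution of the modification intervals of a VoiceXML document, then uses the fitted distribution to predict the probability that the document has changed, which guides the decision on whether to use the cached copy. Simulations show that the algorithm achieves a document stale ratio below 0.01%, which basically meets the call-loss-rate requirement of telecommunication systems, while delivering a substantial performance gain.

4) To address the fact that VoiceXML does not directly provide multi-party communication control, it is proposed to extend the functionality offered by the VoiceXML object element. After discussing the specific functions that multi-party communication control requires, two object extension schemes, blocking and non-blocking, are proposed, their respective strengths and weaknesses are analyzed, and examples show how to use these objects.

5) In teleconferencing applications, audio mixing is a key problem, and current mixing algorithms do not adequately overcome the fluctuation of volume after mixing. After analyzing current mixing algorithms, a non-uniform (asymmetrical) wave-shrinking mixing algorithm is proposed. It is based on the fact that low-intensity signals occur more often than high-intensity signals in speech, and it mixes with a constant weight that is independent of the number of input channels. The algorithm needs no multiplication or division and no floating-point arithmetic, so it is easy to implement in hardware. Experiments show that the mixing results are good: the mixed speech is natural and fluent, free of noise, and retains its quality with many input channels. It is also one of the fastest mixing algorithms currently available and fully meets the high-performance, high-concurrency mixing requirements of voice conferencing.

The results of this thesis can be used not only in VoiceXML-based voice value-added service platforms; they are also a useful reference for other voice value-added service platforms and for research in related fields.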
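The idea behind contribution 3 can be illustrated with a minimal sketch. It is not the thesis's algorithm, whose details the abstract does not give. It only assumes, as the abstract states, that modification intervals are exponentially distributed, fits the rate by maximum likelihood (the reciprocal of the mean observed interval), and uses the cached copy while the predicted change probability stays below a threshold. The function names and the threshold value are hypothetical; how a per-decision threshold relates to the reported 0.01% stale ratio is not stated in the source.

```python
import math


def fit_rate(intervals):
    """Maximum-likelihood rate of an exponential distribution.

    `intervals` are observed times between successive modifications of one
    document (any consistent time unit). The estimate is 1 / mean interval.
    """
    if not intervals or any(t <= 0 for t in intervals):
        raise ValueError("need at least one positive modification interval")
    return len(intervals) / sum(intervals)


def change_probability(rate, elapsed):
    """Probability that the document changed within `elapsed` time units.

    For an exponential distribution this is 1 - exp(-rate * elapsed).
    Because the distribution is memoryless, `elapsed` can be measured from
    the last moment the cached copy was known to be fresh.
    """
    return -math.expm1(-rate * elapsed)


def can_use_cache(intervals, elapsed, max_change_probability=1e-4):
    """Hypothetical decision rule: serve from cache while a change is unlikely.

    The default threshold is an illustrative assumption, not a value taken
    from the thesis.
    """
    return change_probability(fit_rate(intervals), elapsed) <= max_change_probability
```

For a document observed to change about once an hour, `can_use_cache([3600.0, 3600.0, 3600.0], elapsed=0.1)` accepts the cached copy, while `elapsed=60` triggers revalidation. A lower threshold trades more revalidation requests for a lower risk of serving stale data.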

【Abstract】 Voice services will be the most important services in the next generation network, as they are in current communication networks. Voice services account for the largest share of total revenue and are the main profit source of network operators. The rapid growth of voice value-added services (VVAS) not only profits operators directly but also improves the utilization of existing equipment, brings customers new voice experiences, increases their loyalty to the operators, and brings more potential customers and profits to the operators' other services.

However, present voice value-added service platforms (VSPs) have many shortcomings: they are closed, inflexible, and hard to maintain, and new services are hard to deploy on them. With the growth of VVAS and of the number of users, present VSPs can no longer meet demand and have begun to hinder the further development of VVAS. It is therefore time for a new VSP to emerge. VoiceXML (Voice eXtensible Markup Language), which has been specified for voice browsers, is flexible and easy to develop with, and is one of the best choices for a VSP.

This thesis is jointly supported by the National Science Fund for Distinguished Young Scholars (No. 60525110), the Program for New Century Excellent Talents in University (No. NCET-04-0111), and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20030013006), and studies the VoiceXML-based voice value-added platform. A voice value-added service system based on the research results has been developed; it has been deployed in several provinces of China and serves millions of users.

The principal contributions of the work presented in this thesis are:

1) A VoiceXML-based VSP fetches VoiceXML service scripts and resources via the Internet, so network delay is unavoidable, while phone users are very sensitive to delay. The prefetch scheme in the VoiceXML-based VSP is studied to resolve this problem, and it is proposed that the prefetched objects should be the resources referenced by the VoiceXML scripts. An adaptive multi-user shared Markov prediction algorithm is presented, which exploits the fact that the voice platform knows whether a user is online. The algorithm predicts the probabilities of the resources that all online users will require next, which helps improve prediction accuracy. A preemptive priority model is designed to schedule the prefetch tasks; it maps the resource access probability to the task priority. Simulation shows that the adaptive multi-user shared Markov prediction algorithm is more precise than the single-user Markov prediction algorithm, and that the preemptive priority scheduling model yields more profit than the round-robin scheduling model. The combination of the two new algorithms can considerably reduce network delay, accelerate the response, and decrease the user's waiting time.

2) To further reduce network delay, the cache replacement algorithm is studied. An analysis of present cache replacement algorithms identifies two key problems: how to establish the resource utility function, and the algorithm of the replacement process. For the first problem, the LRU-K (K-Least Recently Used) algorithm is improved by introducing a novel utility function named PLRU-K (Perfect LRU-K). For the second problem, following the 0/1 knapsack problem, the 1-optimal greedy replacement process (1-GRP) is proposed to select and replace resources in the cache. Simulation shows that the PLRU-K utility function estimates the future profit of a resource more accurately than those of LRU-K and P-LFU (Perfect Least Frequently Used), and that the 1-GRP algorithm obtains more profit than P-GRP (Profit-based Greedy Replacement Process) and PD-GRP (Profit-Density-based Greedy Replacement Process). The cache replacement algorithm based on PLRU-K and 1-GRP therefore outperforms the other algorithms, especially when the cache volume is far smaller than the total size of the resources.

3) A cache consistency control algorithm is indispensable to avoid giving stale data to the user. The thesis discusses how to implement consistency control with a method that entails no server modifications and no changes to HTTP (HyperText Transfer Protocol). Based on an analysis of present consistency algorithms, the characteristics of VoiceXML documents, and the finding that the modification intervals of Web pages follow an exponential distribution, the Fitting & Prediction algorithm is proposed. It estimates the validity of a cached document by fitting the parameters of a stochastic distribution and predicting the probability that the VoiceXML document has changed. Simulation indicates that the algorithm surpasses the Alex protocol and achieves a stale ratio lower than 0.01%, which meets the demand of the voice platform, while effectively enhancing system performance.

4) To resolve the problem that VoiceXML provides no multi-party communication function, it is proposed that the function be provided by extending the object element. After a discussion of the functions needed for multi-party communication, two object extension schemes, a blocking and a non-blocking approach, are proposed and their advantages and disadvantages are pointed out. Examples illustrate how to use these objects.

5) Audio mixing is an essential component of conferencing, and present audio mixing algorithms produce a fluctuating volume. After an analysis of those algorithms, an algorithm named asymmetrical wave-shrinking is proposed. Based on the fact that low volume appears more frequently than high volume in voice signals, it uses a fixed mixing weight, independent of the number of inputs, to mix the audio streams. Without multiplication or division operations, the algorithm is simple and fast enough to be easily implemented in hardware. Experiments show that the results are very good: the mixed audio sounds natural and fluent, without noise, even with many inputs. The algorithm is one of the fastest at present, so it can meet the demands of high performance and high concurrency in voice conferencing.

The contributions of this thesis are not only applicable to the VoiceXML-based voice value-added platform but are also useful for other voice platforms and related research fields.
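The prefetching idea in contribution 1 can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: it assumes a first-order Markov model whose transition counts are shared by all users, assumes users act independently when combining their per-user probabilities, and merely orders prefetch tasks by priority without modelling the preemption of a running download. The class and function names are hypothetical.

```python
import heapq
from collections import defaultdict


class SharedMarkovPredictor:
    """First-order Markov model whose transition counts are shared by all users."""

    def __init__(self):
        # counts[state][next_resource] = number of observed transitions
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, next_resource):
        """Record one observed transition; calling this online keeps the model adaptive."""
        self.counts[state][next_resource] += 1

    def next_probabilities(self, state):
        """Estimated probability of each resource being requested next from `state`."""
        row = self.counts.get(state, {})
        total = sum(row.values())
        return {res: n / total for res, n in row.items()} if total else {}

    def demand(self, online_states):
        """Probability that at least one online user requests each resource next.

        `online_states` holds the current state of every online user.
        Assumption: users behave independently, so the probability that
        nobody needs a resource is the product of the per-user misses.
        """
        nobody = defaultdict(lambda: 1.0)
        for state in online_states:
            for res, p in self.next_probabilities(state).items():
                nobody[res] *= 1.0 - p
        return {res: 1.0 - q for res, q in nobody.items()}


def prefetch_order(demand):
    """Return (resource, probability) pairs, most probable first.

    The access probability is mapped directly to the task priority.
    heapq is a min-heap, so the probability is negated.
    """
    heap = [(-p, res) for res, p in demand.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        neg_p, res = heapq.heappop(heap)
        order.append((res, -neg_p))
    return order
```

A usage example: after recording a few transitions with `observe`, calling `prefetch_order(predictor.demand(["menu", "news"]))` lists the audio resources to prefetch for two online users, one at a hypothetical `menu` state and one at `news`. Pooling demand across online users is what distinguishes this from predicting for each user in isolation.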
