Research on Neural Networks and Distributed Computing Based on Multidimensional Data Analysis

【Author】 Liu Tianzhen (刘天桢)

【Supervisor】 Tong Hengqing (童恒庆)

【Author Information】 Wuhan University of Technology, Computer Application Technology, 2008, Ph.D.

【Abstract】 Artificial neural networks have attracted wide attention from scientists in many fields because of their massively parallel processing, distributed storage, self-adaptability, and fault tolerance, and they have been widely applied in biology, electronics, computer science, mathematics, and other fields. With the rapid development of network communication technology and the Internet, distributed computing has become one of the key technologies shaping today's computer technology and is used ever more widely in modern social and economic development. Both technologies depend on data, much of which is multidimensional data stored in data warehouses, and both require data analysis involving multidimensional matrices. Research on neural networks and distributed computing based on multidimensional data analysis is therefore significant, and this work was supported by the National Natural Science Foundation of China. The dissertation is organized into four parts.

The first part studies multidimensional data analysis and multidimensional matrices. Given the importance of multidimensional data analysis in data warehouses, we introduce the concept of the multidimensional matrix and discuss the operational properties of the cubic matrix, its most widely used form, laying the groundwork for applications in neural networks and distributed computing.

The second part studies neural networks based on multidimensional data analysis. First, we construct an unsupervised-learning neural network model with convex constraints whose special structure performs data compression and reconstruction; after training, the network can represent the main features of the information. Second, we study a Bayesian neural network that handles continuous variables with generalized naive Bayes, construct an orthogonal-polynomial kernel function to estimate the density function of the prior distribution, and further investigate the optimality of kernel estimates of a density and its derivatives. Third, for research on total factor productivity (TFP), we construct a fork neural network that measures TFP with a stochastic frontier model. Finally, to compute the TFP contribution rate, we propose a semi-supervised heterogeneous neural network whose components interact so that their outputs become consistent, and we discuss its structure and algorithm in detail.

The third part studies distributed computing based on multidimensional data analysis. First, we improve the partial least squares (PLS) algorithm for structural equation models (SEM), obtaining a deterministic algorithm. Second, we study multi-group SEM: distributed computing is used to calculate the coefficients of each group, a unified model is built with a generalized linear model under convex constraints, and an algorithm for multi-group SEM is given. Third, we study a multivariate nonparametric regression curve drift model and use distributed computing to forecast sales curves with it. Finally, we present several concrete applications of distributed computing: Monte Carlo distributed computation of tables of general distribution functions, distributed computation for the construction of protein molecules (modeling the decomposition products of a protein), and bootstrap analysis of MOSFET life distributions using negative-order moment estimation, together with its distributed computation.

The final part is an integrated application of neural networks and distributed computing based on multidimensional data analysis. We describe the customer satisfaction index measurement and analysis system, a large application system developed by our team. It is built on a data warehouse and .NET technology, adopts the architecture of the unsupervised-learning neural network model with convex constraints, and implements distributed computing based on remote method invocation.
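The abstract lists Monte Carlo distributed computation of distribution function tables but does not describe the algorithm. The Python sketch below only illustrates the general idea under stated assumptions and is not the dissertation's method. It assumes the tabulated quantity is a chi-square statistic, chosen because its exact CDF for 2 degrees of freedom, 1 - exp(-x/2), makes the estimates checkable. The names `mc_cdf_table`, `count_chunk`, and `chi_square_draw`, the grid, and the seeds are all hypothetical. Each chunk of samples is an independent work unit with its own seed, and the per-chunk counts are summed at the end. Threads stand in for the remote workers that a real distributed setup would use.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def chi_square_draw(rng: random.Random, dof: int) -> float:
    """One draw of a chi-square(dof) variable: sum of squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))


def count_chunk(seed: int, n: int, grid: Sequence[float], dof: int) -> List[int]:
    """Work unit for one hypothetical worker.

    Draws n samples from its own seeded generator and returns, for each grid
    point x, how many samples satisfied s <= x. Chunks share no state, so
    they could run anywhere and be merged afterwards.
    """
    rng = random.Random(seed)
    counts = [0] * len(grid)
    for _ in range(n):
        s = chi_square_draw(rng, dof)
        for i, x in enumerate(grid):
            if s <= x:
                counts[i] += 1
    return counts


def mc_cdf_table(grid: Sequence[float], dof: int, n_per_chunk: int,
                 n_chunks: int, base_seed: int = 1) -> List[float]:
    """Estimate F(x) = P(S <= x) at every grid point by Monte Carlo."""
    # Map step: each chunk gets a distinct seed so the random streams differ.
    # Threads only stand in for remote workers here; they do not speed up
    # pure-Python arithmetic, they just show the decomposition.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        futures = [pool.submit(count_chunk, base_seed + k, n_per_chunk, grid, dof)
                   for k in range(n_chunks)]
        partial = [f.result() for f in futures]
    # Reduce step: sum the per-chunk counts and divide by the total sample size.
    total = n_per_chunk * n_chunks
    return [sum(c[i] for c in partial) / total for i in range(len(grid))]


if __name__ == "__main__":
    grid = [0.5, 1.0, 2.0, 4.0, 6.0]
    table = mc_cdf_table(grid, dof=2, n_per_chunk=25_000, n_chunks=4)
    for x, p in zip(grid, table):
        # For dof = 2 the exact CDF is 1 - exp(-x / 2), used here as a check.
        print(f"x={x:4.1f}  MC={p:.4f}  exact={1 - math.exp(-x / 2):.4f}")
```

The chunks exchange nothing until the final sum, so the same decomposition carries over to processes or remote machines and only the executor changes. The Monte Carlo standard error falls in proportion to 1/√n, so quadrupling the total sample size roughly halves it.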

  • 【CLC Number】 TP18; TP338.8
  • 【Times Cited】 6
  • 【Downloads】 1120