Financial Data Analysis Based on Parallel Statistical Computing

(Chinese title: 基于并行统计计算的金融数据分析)

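【Author】 Guo Guangbao (郭广报)

【Supervisor】 Zhao Weidong (赵卫东)

【Author information】 Shandong University, Probability Theory and Mathematical Statistics, 2012, Ph.D.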

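【Abstract (Chinese version)】 Modern computer systems have become powerful enough that many statistical computations can be completed instantly. In some important cases, however, a computation still takes days, especially statistical inference on massive large-sample data or on large, complex sampled data. The usual workaround is to use faster but less accurate methods, or to skip potentially important computations altogether. The development of parallel statistical computing is therefore very important. In this thesis we study speedup opportunities for statistical methods on wage data, bankruptcy data, and pension fund data, and find that parallel statistical computing achieves good speedup on large statistical inference problems. The thesis consists of five chapters, whose main contents are as follows:

Chapter One. Parallel statistical computing is a very interesting problem: many statistical computations are embarrassingly parallel, so research at the intersection of parallel and statistical computing is very important. This chapter focuses on regression problems, nonparametric inference, and stochastic processes. In particular, the methods we review include the parallel multisplitting method, parallel solution of least squares in linear regression, and parallel algorithms for nonlinear regression; the theoretical framework of the parallel bootstrap in nonparametric inference; and parallel solution methods for Markov chains and parallel Markov chain Monte Carlo. Importantly, we also survey non-graphics applications of parallel GPU processing. We conclude that further research on parallel statistical algorithms is needed, and describe some important open problems.

Chapter Two. In fitting multiple linear models, subset selection and running time are important issues. To address them, we introduce a new parallel estimator. We first give conditions under which this method is equivalent to the generalized least squares estimator, considering the ranks of projections and eigenvalues. We then give its error when a stable solution exists. Finally, we apply the proposed method to bankruptcy data, obtain an estimating equation for the data set, and report execution times on two simulated data sets.

Chapter Three. We study the convergence theory of multiplicative and damped additive Schwarz methods for solving large-sample equations. For generalized linear models and generalized additive models with large samples, we propose Schwarz methods for solving the quasi-likelihood and penalized quasi-likelihood equations. The Schwarz methods work with a sequence of sub-models, each corresponding to a subset of the elements of the parameter estimated in the two-step procedure; combined, the sub-models yield the solution for the full model. This technique can be used for model comparison, where the fitted values of a sub-model serve as starting values for a larger model.

Chapter Four. The parallel bootstrap is a very useful statistical method with outstanding timing performance, but its theory has not yet been studied. In this chapter we introduce a working correlation matrix for the method, called the parallel bootstrap matrix. We consider some properties of this resampling scheme and the related optimal subsample lengths in smooth function models. We present a study of the timing performance of parallel bootstrap estimators and, for financial time series data, some performance results on subsample length selection.

Chapter Five. We study computational methods for quasi-stationary distributions of Markov chains whose matrices are quasi-stochastic, i.e., every row sum is less than or equal to 1. We develop Schwarz methods for computing these distributions; in particular, we establish the semiconvergence of the additive and multiplicative Schwarz methods and of two-level Schwarz methods. Two examples of Markov chains with quasi-stationary distributions illustrate the proposed methods.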
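Chapters One and Two rest on the fact that least-squares fitting decomposes over the rows of the data. The sketch below illustrates that generic row-partitioning idea, not the thesis's own estimator: each worker forms the partial cross-products XᵢᵀXᵢ and Xᵢᵀyᵢ for its block of observations, the partial sums are added, and the normal equations are solved once. The function names (`partial_normal_equations`, `solve`, `parallel_ols`) and the block count are hypothetical. A thread pool is used only to show the decomposition; CPython threads do not speed up pure-Python arithmetic, so a process pool, cluster, or GPU would be needed for real speedup.

```python
# Generic row-partitioned ordinary least squares (hypothetical helper names;
# an illustration of the decomposition, not the thesis's parallel estimator).
from concurrent.futures import ThreadPoolExecutor


def partial_normal_equations(rows):
    """Return (X^T X, X^T y) for one block of (x, y) rows; x includes the intercept 1."""
    p = len(rows[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def parallel_ols(rows, n_blocks=4):
    """Split rows into blocks, reduce the partial cross-products, then solve once."""
    size = -(-len(rows) // n_blocks)  # ceiling division
    blocks = [rows[k:k + size] for k in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        parts = list(pool.map(partial_normal_equations, blocks))
    p = len(rows[0][0])
    # Reduction step: sum the per-block matrices and vectors element-wise.
    xtx = [[sum(part[0][i][j] for part in parts) for j in range(p)] for i in range(p)]
    xty = [sum(part[1][i] for part in parts) for i in range(p)]
    return solve(xtx, xty)
```

For example, fitting y = 1 + 2t for t = 0, 1, ..., 9 with three blocks recovers the coefficients 1 and 2 up to rounding, and the result does not depend on the number of blocks. Forming the normal equations squares the condition number of X; for ill-conditioned designs, a QR-based reduction of each block (e.g., tall-skinny QR) is the more stable choice.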

【Abstract】 Modern computer systems are now powerful enough to make many common statistical computations literally instantaneous. However, some important situations remain in which a result can take days to compute, especially statistical inference on massive large-sample data or on large, complex sampled data. As a result, either faster but less accurate methods are used, or potentially important computations are skipped entirely. The development of parallel statistical computing is therefore very important. In this thesis we report the results of our studies on speedup opportunities in financial data, including wage data, bankruptcy data and pension fund data. We find that good speedup and other performance gains are available for large statistical inference problems. This dissertation consists of five chapters, whose main contents are as follows:

Chapter One. Parallel statistical computing is a very interesting problem: many statistical calculations are embarrassingly parallel, so research at the interface between parallel and statistical computation plays an important role. This chapter is concerned with parallel statistical computing in regression problems, nonparametric inference, and stochastic processes. In particular, we review the parallel multisplitting method, parallel methods for least squares in linear regression, and parallel statistical computing in multiple linear regression; the theoretical framework of the parallel bootstrap in nonparametric inference; and parallel methods for Markov chains and parallel Markov chain Monte Carlo. Importantly, we also survey non-graphics applications of GPUs. We conclude that further research in parallel statistical computing is needed, and describe some of the important unsolved problems.

Chapter Two. In fitting multiple linear models, subset selection and running time are important questions. To address them, we introduce a new parallel maximum likelihood estimator for multiple linear models. We first give an equivalence condition between the method and the generalized least squares estimator, and consider the ranks of projections and eigenvalues. We then give the error of the estimator when a stable solution exists, together with several theorems on this error. In addition, we use the proposed method to fit bankruptcy data, obtain an estimating equation for the data sets, and report the execution time of the method on two simulated data sets.

Chapter Three. We explore the convergence theory of the multiplicative Schwarz method and the damped additive Schwarz method for solving generalized linear models (GLMs) and generalized additive models (GAMs) with large sample sizes. For such models, we suggest Schwarz methods for the quasi-likelihood (QL) and the penalized QL. The Schwarz methods use a sequence of sub-models, each corresponding to a subset of the components of the parameter vector δ; the sub-models are patched together to yield the solution for the full model. The technique may be useful for model comparison, where the fitted values from a sub-model are used as starting values for a larger model.

Chapter Four. The parallel bootstrap is an extremely useful statistical method with good timing performance. However, its theoretical properties have not yet been studied. In this chapter we introduce a working correlation matrix for the method, called the parallel bootstrap matrix. We consider some properties of the resampling scheme and the related optimal subsample lengths in smooth function models. We also present the timing performance of parallel bootstrap estimators and some performance results of subsample length selection on financial time series data.

Chapter Five. We study computational schemes for quasi-stationary distributions (QSDs) of Markov chains whose matrices are quasi-stochastic, i.e., all of their row sums are less than or equal to one. We develop Schwarz methods for these distributions. In particular, we establish the semiconvergence of the additive and multiplicative Schwarz methods, and of two-level Schwarz iterative methods, for the QSDs. We provide two examples of Markov chains with QSDs to illustrate our methods.
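Chapters Three and Five both rely on Schwarz-type splitting: a large system is solved through repeated sub-problems, each updating only a subset of the unknowns. The following minimal sketch assumes a symmetric positive definite linear system A x = b (for instance, a single iteratively reweighted least-squares step of a GLM fit) and non-overlapping blocks of unknowns. Under those assumptions the multiplicative Schwarz method reduces to block Gauss-Seidel and the damped additive method to damped block Jacobi. It does not reproduce the thesis's QL or QSD algorithms, which involve nonlinear or singular systems; the function names, the damping value 0.5, and the sweep counts are illustrative assumptions.

```python
# Illustrative Schwarz iterations with non-overlapping blocks (hypothetical names;
# assumes a symmetric positive definite A, not the thesis's QL or QSD setting).


def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def residual(a, x, b):
    """Return r = b - A x."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]


def block_correction(a, r, idx):
    """Solve the local problem A[idx, idx] e = r[idx] on one block of unknowns."""
    sub = [[a[i][j] for j in idx] for i in idx]
    return solve(sub, [r[i] for i in idx])


def multiplicative_schwarz(a, b, blocks, sweeps=50):
    """Visit blocks one after another, refreshing the residual after each update."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        for idx in blocks:
            e = block_correction(a, residual(a, x, b), idx)
            for k, i in enumerate(idx):
                x[i] += e[k]
    return x


def additive_schwarz(a, b, blocks, damping=0.5, sweeps=200):
    """Compute all block corrections from one residual, then apply their damped sum."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        r = residual(a, x, b)  # shared residual: the block solves are independent
        update = [0.0] * len(b)
        for idx in blocks:
            e = block_correction(a, r, idx)
            for k, i in enumerate(idx):
                update[i] += e[k]
        x = [xi + damping * ui for xi, ui in zip(x, update)]
    return x
```

The design difference lies in where the residual is computed. The multiplicative variant refreshes it after every block, which is inherently sequential but usually converges faster; the additive variant computes every block correction from one shared residual, so the blocks could be solved in parallel, at the cost of needing damping to keep the combined update from overshooting. For example, with the tridiagonal matrix A = [[4, 1, 0, 0], [1, 4, 1, 0], [0, 1, 4, 1], [0, 0, 1, 4]], b = [6, 12, 18, 19], and blocks [[0, 1], [2, 3]], both functions converge to the exact solution [1, 2, 3, 4].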

  • 【Online publication submitter】 Shandong University
  • 【Online publication year and issue】 2012, Issue 11