For the test of independence in s×r tables,
the power of each member of the family of power-divergence statistics
R_a can be approximated by the same noncentral
χ²-distribution for all the statistics of this family.
In this paper, we propose two approximations to the power of R_a.
These approximations are based on the approximation methods
for the multinomial goodness-of-fit test proposed by Broffitt & Randles
(1977) and Drost et al. (1989).
One approximation is constructed from a limiting normal distribution
of R_a; the other is constructed from the linear and quadratic terms of a Taylor series expansion of R_a.
The proposed approximations vary with the choice of R_a.
A numerical comparison shows that the latter approximation
performs well. At the end of the paper, we discuss the choice
of the statistic.
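For reference, R_a presumably denotes a member of the Cressie–Read power-divergence family; in the standard notation of Cressie & Read (1984), with observed counts n_ij and expected counts estimated under independence, it can be written as (the paper's exact indexing and normalization may differ):

```latex
R_a = \frac{2}{a(a+1)} \sum_{i=1}^{s} \sum_{j=1}^{r}
      n_{ij} \left[ \left( \frac{n_{ij}}{\hat{m}_{ij}} \right)^{a} - 1 \right],
\qquad
\hat{m}_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n},
```

where a = 1 gives Pearson's X² statistic and the limit a → 0 gives the log-likelihood ratio statistic G².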
The parallel processing of statistical data analysis has not been studied much so far. In this paper, we aim at high-speed execution of statistical data analysis on the parallel computer "QCDPAX". Parallelization is performed by allocating 1/16 of the data equally to each of the 16 processing units (PUs). In the matrix computations of multiple regression analysis, rows are distributed cyclically: the 1st row to PU[0], the 2nd row to PU[1], and so on; if the matrix has more than 16 rows, the 17th row is allocated to PU[0], the 18th row to PU[1], and so on, so that the load on each PU is equal. We evaluate the efficiency of parallel processing for basic statistics, multiple regression analysis and principal component analysis. As a result, in the calculation of basic statistics, the efficiency of parallel processing improves as the number of samples increases, independent of the number of variables; we obtain an efficiency of over 90% with more than 5000 samples. In multiple regression analysis, the efficiency improves as either the number of variables or the number of samples increases; an efficiency of 87.1% was obtained with 32 variables and 10000 samples. In principal component analysis, the efficiency did not improve as the number of variables increased, but it did improve as the number of samples increased; we obtained an efficiency of 66.5% with 16 variables and 10000 samples. We conclude that parallel processing of statistical data analysis is effective for massive data.
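The cyclic row distribution described above can be sketched as follows. This is an illustrative round-robin mapping, not the actual QCDPAX code; the function name and the assumption of 16 PUs indexed from 0 are ours.

```python
NUM_PUS = 16  # number of processing units, as in the paper

def allocate_rows(n_rows, num_pus=NUM_PUS):
    """Cyclically assign matrix rows to PUs: row k goes to PU[k mod num_pus].

    Returns a list whose p-th entry is the list of row indices owned by PU[p].
    With this scheme the per-PU loads differ by at most one row.
    """
    owned = [[] for _ in range(num_pus)]
    for row in range(n_rows):
        owned[row % num_pus].append(row)  # 17th row (index 16) -> PU[0], etc.
    return owned

# With 34 rows, PU[0] owns rows 0, 16, 32 and PU[1] owns rows 1, 17, 33.
allocation = allocate_rows(34)
print(allocation[0])  # [0, 16, 32]
print(allocation[1])  # [1, 17, 33]
```

Each PU then works only on the rows it owns, which is what balances the load for matrices with more than 16 rows.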
keywords: Parallel computer, QCDPAX, Efficiency of parallel processing, Scaling law
In a previous paper, we presented a technique for improving the quality of Japanese manuals based on quantitative criteria, which were obtained from classification and regression tree analyses. The basic data used in the analyses were collected from a specific set of documents and a specific group of readers. The clarity (high quality), the viewpoint of clarity evaluation, and the weight of each evaluation item may depend on the set of documents and the group of readers. To generalize the results of the regression tree analyses, it is necessary to investigate any biases due to the sample of documents and the sample of readers. Here we take up newspaper articles, articles, manuals, theses, patent documents and textbooks as the document samples. The subjects are graduate students of engineering and technical writers. We investigate the biases due to these samples.
keywords: Quality of Japanese manuals, Quantitative criteria of manuals, Clarity evaluation, Regression tree analyses, Segmentation results
In a clustering model, a cluster is defined
as a subset in which objects share a common property.
The similarity between a pair of objects
is then regarded as the degree to which they share properties.
This paper proposes a general class of clustering models
for ordinal similarity data,
in which aggregation operators are used to define
the degree of simultaneous belongingness of objects to a cluster.
We discuss some conditions required of the aggregation operators.
T-norms are concrete examples satisfying these conditions.
Moreover, the validity of the model is shown by investigating its
features and through numerical applications.
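As a concrete illustration of t-norms used as aggregation operators, the sketch below aggregates hypothetical membership degrees of several objects into a degree of simultaneous belongingness. The function names and the example degrees are our own assumptions, not the paper's model.

```python
from functools import reduce

def t_min(x, y):
    """Minimum t-norm: T(x, y) = min(x, y)."""
    return min(x, y)

def t_product(x, y):
    """Product t-norm: T(x, y) = x * y."""
    return x * y

def simultaneous_belongingness(memberships, t_norm):
    """Aggregate the membership degrees of objects in one cluster
    with a t-norm (associative, so reduce over the list is well defined)."""
    return reduce(t_norm, memberships)

degrees = [0.9, 0.8, 0.6]  # hypothetical membership degrees of three objects
print(simultaneous_belongingness(degrees, t_min))      # 0.6
print(simultaneous_belongingness(degrees, t_product))  # 0.432
```

Both operators are commutative, associative, monotone, and have 1 as the identity, which are the usual t-norm axioms; the two choices give different clustering models within the same general class.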
A Boltzmann machine is known as a stochastically extended model of the Hopfield neural network. It is not only a neural network model but is also related to various fields, such as statistics, statistical mechanics and information geometry. We review various aspects of the Boltzmann machine, such as its dynamics, learning rule, maximum entropy property and spatial Markovian property, from the viewpoint of the general theories of the Gibbs sampler, the exponential family and Markov random fields. Some recent studies on the application of Boltzmann machines are also reviewed.
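The Gibbs-sampler dynamics mentioned above can be sketched for a toy Boltzmann machine with 0/1 units: each unit is resampled from its conditional distribution given the others, which is a logistic function of its local field. This is a minimal illustration under assumed notation (symmetric weights W with zero diagonal, biases b), not code from any reviewed study.

```python
import math
import random

def gibbs_step(state, W, b, rng):
    """One full Gibbs sweep: resample each 0/1 unit from
    p(s_i = 1 | rest) = sigmoid(b_i + sum_j W_ij s_j)."""
    n = len(state)
    for i in range(n):
        field = b[i] + sum(W[i][j] * state[j] for j in range(n) if j != i)
        p_on = 1.0 / (1.0 + math.exp(-field))
        state[i] = 1 if rng.random() < p_on else 0
    return state

# Toy example: two mutually excitatory units tend to be on (or off) together.
rng = random.Random(0)
W = [[0.0, 2.0], [2.0, 0.0]]
b = [-1.0, -1.0]
state = [0, 0]
samples = []
for _ in range(2000):
    state = gibbs_step(state, W, b, rng)
    samples.append(tuple(state))

# Fraction of sweeps on which the two units agree; the positive coupling
# makes agreement the most probable outcome under the stationary distribution.
frac_agree = sum(s[0] == s[1] for s in samples) / len(samples)
print(round(frac_agree, 2))
```

Running the sweep repeatedly draws samples from the Boltzmann distribution over states, which is the sense in which the machine's dynamics are a Gibbs sampler.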
keywords: Gibbs sampler, Exponential family, Markov random field