We have developed the AGSAP (Automatic Generator of Statistical Analysis Programs) system. In AGSAP, statistical analysis programs are regarded as sequences of capsules, each of which computes statistics. These sequences include loops and branches that are controlled by the results of computation or by the intentions of the statistical analyst. The source code of the capsules is not disclosed to analysts, but the definitions, restrictions, and meanings of the statistics computed by the capsules are. AGSAP automatically generates statistical analysis programs from information on statistics and statistical methods supplied by statisticians. This information is treated as statistical knowledge that helps analysts respond to system messages, understand the flow of the generated programs, and interpret their results. In this paper we outline the AGSAP system and present, as an example, a program for multiple regression analysis generated automatically by AGSAP.
keywords: Statistical program, Automatic generation, Statistical expert system
ACE (Alternating Conditional Expectations) is a powerful tool in exploratory data analysis. It was originally designed to find optimal transformations of variables in the context of multiple regression and correlation. Two key features account for the success of ACE. The first is that ACE uses one-dimensional smoothing to estimate conditional expectations. The second is that it adopts a backfitting algorithm to overcome "the curse of dimensionality" in multi-dimensional smoothing. These features may give the impression that ACE can be applied only to continuous variables. In fact, smoothing is not essential to the method, and ACE can also be applied to discrete variables. We discuss various applications of ACE in categorical data analysis and compare its results with those of competing methods designed specifically for discrete variables. As an example of these applications, we explore similarities between ACE and canonical (correlation) analysis of a contingency table.
keywords: ACE, Canonical analysis, Categorical data, Contingency table
A new graphical method for multidimensional scaling is proposed in this paper. Graphical representation of residuals is one of the most important viewpoints in exploratory data analysis. For example, the result of a regression analysis can be examined with a scatter plot in which vertical segments join the data points to the regression line. In multidimensional scaling, the relation between the input data and the configuration is usually checked with a scatter plot of the input dissimilarities against the distances in the configuration. However, this scatter plot cannot show which objects are poorly fitted. We therefore propose a graphical method for multidimensional scaling that represents the configuration and the residuals at the same time. The method can also be used to represent asymmetric dissimilarity data graphically. The source program of the proposed method is given in the S language; Basic and C versions are also provided. The proposed method is illustrated with some data.
keywords: MDS, Graphical representation, Asymmetric relationships
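The residuals that such a display would draw can be sketched as follows. This is only an illustrative sketch, not the authors' S, Basic, or C program: the function names `mds_residuals` and `per_object_misfit` are hypothetical, and the residual is assumed here to be the configuration distance minus the input dissimilarity. Because the residual for the pair (i, j) uses the entry `dissim[i][j]`, an asymmetric dissimilarity matrix yields different residuals for (i, j) and (j, i).

```python
import math

def mds_residuals(dissim, config):
    """Pairwise residuals: configuration distance minus input dissimilarity.

    dissim is an n x n matrix (possibly asymmetric) and config is a list
    of n coordinate tuples. Both names are assumptions for illustration.
    """
    n = len(config)
    resid = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                resid[i][j] = math.dist(config[i], config[j]) - dissim[i][j]
    return resid

def per_object_misfit(resid):
    """Root mean squared residual of each object over its outgoing pairs."""
    n = len(resid)
    return [
        math.sqrt(sum(resid[i][j] ** 2 for j in range(n) if j != i) / (n - 1))
        for i in range(n)
    ]
```

Objects with a large per-object misfit are the ones that an ordinary dissimilarity-versus-distance scatter plot would not single out.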
In this paper, we present the linked lines chart, which can be used as a criterion for judging the smoothness of a histogram. We then propose a method that uses the chart to select the optimum number of classes for a histogram, and evaluate its performance relative to an ordinary formal method in a small simulation experiment. The simulation suggests that the number of classes selected visually with the linked lines chart gives a smooth and neat histogram.
Further, as applications of the linked lines chart, we consider the comparison between empirical and theoretical distributions and issues in comparing several histograms.
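For orientation, a formal rule for the number of classes can be sketched as below. The abstract does not say which formal method was used in the comparison; Sturges' rule is assumed here purely as a familiar example, and the helper names `sturges_classes` and `histogram_counts` are hypothetical. The linked lines chart itself is not reproduced, since the abstract does not define it.

```python
import math

def sturges_classes(n):
    """Number of classes suggested by Sturges' rule for a sample of size n."""
    return math.ceil(math.log2(n)) + 1

def histogram_counts(data, k):
    """Count observations in k equal-width classes spanning [min, max]."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The maximum lies on the right edge, so clamp it into the last class.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return counts
```

Counts computed for several candidate values of k could then be compared visually, which is the kind of judgement the chart is meant to support.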
Projection pursuit (PP) is a computer-intensive technique for the statistical analysis of multivariate data that projects the data onto a lower-dimensional subspace. Recent developments in PP are reviewed, and its applications are presented. A major purpose of PP is to find an interesting lower-dimensional projection of a high-dimensional point cloud by numerically maximizing an objective function called the projection index. Projection indices are designed to quantify how interesting a projection is. In PP, the less Gaussian a projection looks, the more interesting it is considered to be. One exciting feature of PP is that it may overcome the curse of dimensionality, which arises because high-dimensional data spaces are typically sparse.
keywords: Exploratory data analysis, Statistical graphics, Polynomial index, Non-normality
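The idea of maximizing a projection index can be sketched in a toy setting. The index below is the absolute excess kurtosis of the projected data, chosen only because it is a simple measure of non-normality; it is an assumption for illustration and is not the polynomial index named in the keywords. The function names `kurtosis_index` and `best_direction_2d` are likewise hypothetical, and a grid search over directions in the plane stands in for a proper numerical optimizer.

```python
import math

def kurtosis_index(values):
    """Absolute excess kurtosis of a 1-D sample (close to 0 for Gaussian-like data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0  # a constant projection carries no structure
    m4 = sum((v - mean) ** 4 for v in values) / n
    return abs(m4 / (m2 ** 2) - 3.0)

def best_direction_2d(points, steps=180):
    """Grid-search unit directions in the plane; return (angle, index) with the largest index."""
    best_theta, best_index = 0.0, -1.0
    for k in range(steps):
        theta = math.pi * k / steps
        a, b = math.cos(theta), math.sin(theta)
        projected = [a * x + b * y for x, y in points]
        index = kurtosis_index(projected)
        if index > best_index:
            best_theta, best_index = theta, index
    return best_theta, best_index
```

In higher dimensions the grid would be replaced by numerical maximization over directions, as the abstract describes.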