
Microsoft Linear Regression Algorithm

Author: Internet content. Date: 2007-04-10

The Microsoft Linear Regression algorithm is a variation of the Microsoft Decision Trees algorithm, where the MINIMUM_LEAF_CASES parameter is set to be greater than or equal to the total number of cases in the dataset that the algorithm uses to train the mining model. With the parameter set in this way, the algorithm will never create a split, and therefore performs a linear regression.
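Why such a setting rules out every split can be shown with a minimal sketch. It assumes, for illustration only, that MINIMUM_LEAF_CASES requires each side of a split to hold at least that many cases. The function name `valid_split_points` is hypothetical and is not part of any Microsoft API.

```python
def valid_split_points(n_cases: int, minimum_leaf_cases: int) -> list[int]:
    """Return the split positions that leave enough cases on both sides.

    A position k puts k cases in the left leaf and n_cases - k in the right.
    Assumption: both leaves must contain at least minimum_leaf_cases cases.
    """
    return [
        k
        for k in range(1, n_cases)
        if k >= minimum_leaf_cases and n_cases - k >= minimum_leaf_cases
    ]


if __name__ == "__main__":
    # With a small threshold, several split positions remain valid.
    print(valid_split_points(10, 3))   # [3, 4, 5, 6, 7]
    # With the threshold equal to the number of cases, none remain.
    print(valid_split_points(10, 10))  # []
```

Under this assumption, once the threshold reaches the total number of cases, no position can satisfy both leaves, so the tree stays a single node and only the regression formula in that node is fitted.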

You can use linear regression to determine a relationship between two continuous columns. The relationship takes the form of an equation for a line that best represents a series of data. For example, the line in the following diagram is the best possible linear representation of the data.

Figure: A line that models a set of data

The equation that represents the line in the diagram takes the general form y = ax + b, and is known as the regression equation. The variable y represents the output variable, x represents the input variable, and a and b are adjustable coefficients. Each data point in the diagram has an error associated with its distance from the regression line. The coefficient a sets the slope (angle) of the regression line and b sets its intercept (location). You obtain the regression equation by adjusting a and b until the total error across all points, commonly measured as the sum of squared distances, reaches its minimum.
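The fitting step can be sketched in a few lines of Python. This sketch assumes that the error of a point is its squared vertical distance from the line (ordinary least squares), which the text above does not state explicitly. The function names `fit_line` and `sum_squared_errors` are illustrative and do not describe the internal implementation of the Microsoft algorithm.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a*x + b using the closed-form least-squares solution."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx             # slope
    b = mean_y - a * mean_x   # intercept
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Total squared vertical distance between the points and the line."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [1, 3, 2, 5]
    a, b = fit_line(xs, ys)
    print(a, b, sum_squared_errors(xs, ys, a, b))
```

Moving a or b away from the fitted values in either direction increases the result of `sum_squared_errors`, which is what "reaches its minimum" means in practice.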

Using the Algorithm

Use the Microsoft Tree Viewer to explore a linear regression mining model.


A linear regression model must contain a key column, input columns, and at least one predictable column.

The Microsoft Linear Regression algorithm supports specific input column content types, predictable column content types, and modeling flags, which are listed in the following table.

Input column content types: Continuous, Cyclical, Key, Table, and Ordered

Predictable column content types: Continuous, Cyclical, and Ordered

Modeling flags: NOT NULL and REGRESSOR

All Microsoft algorithms support a common set of functions. However, the Microsoft Linear Regression algorithm supports additional functions, listed in the following table.

IsDescendant
IsInNode
PredictHistogram
PredictNodeId
PredictStdev
PredictSupport
PredictVariance

For a list of the functions that are common to all Microsoft algorithms, see Data Mining Algorithms. For more information about how to use these functions, see Data Mining Extensions (DMX) Function Reference.

The Microsoft Linear Regression algorithm supports several parameters that affect the performance and accuracy of the resulting mining model. The following table describes each parameter.

Parameter: Description

MAXIMUM_INPUT_ATTRIBUTES: Defines the number of input attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

MAXIMUM_OUTPUT_ATTRIBUTES: Defines the number of output attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

FORCED_REGRESSOR: Forces the algorithm to use the indicated columns as regressors, regardless of the importance of the columns as calculated by the algorithm.
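How these parameters might interact can be illustrated with a rough Python sketch. Everything in it is an assumption made for illustration: the per-column importance scores, the top-N ranking rule, and the treatment of forced regressors as columns that always survive selection. The source does not describe how the algorithm scores or selects attributes, and `select_inputs` is a hypothetical name, not a Microsoft API.

```python
def select_inputs(scores, maximum_input_attributes=255, forced_regressors=()):
    """Pick the input columns to keep.

    scores: dict mapping column name -> importance score (hypothetical measure).
    maximum_input_attributes: 0 turns feature selection off (as in the text above).
    forced_regressors: columns kept regardless of their score (assumed behavior).
    """
    columns = list(scores)
    # Feature selection is only invoked when the limit is exceeded.
    if maximum_input_attributes == 0 or len(columns) <= maximum_input_attributes:
        return columns
    forced = [c for c in columns if c in forced_regressors]
    # Rank the remaining columns from most to least important.
    ranked = sorted(
        (c for c in columns if c not in forced_regressors),
        key=lambda c: scores[c],
        reverse=True,
    )
    remaining = max(maximum_input_attributes - len(forced), 0)
    return forced + ranked[:remaining]


if __name__ == "__main__":
    scores = {"age": 0.9, "income": 0.5, "zip": 0.1, "height": 0.3}
    print(select_inputs(scores, maximum_input_attributes=2))
    print(select_inputs(scores, maximum_input_attributes=2, forced_regressors={"zip"}))
    print(select_inputs(scores, maximum_input_attributes=0))
```

With the default limit of 255, a model with only a handful of input columns would never trigger selection in this sketch, which matches the description of the limit as a threshold rather than a fixed column count.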
