The layers in an OLAP application

OLAP applications perform more complex processing than typical relational database applications. Also, a single result can depend on every item of data in the database (something very unlikely in an operational system), so the volume of processing can be much higher. This makes it both more important and more difficult to get the architecture right.

You should not assume that the vendors are expert at this. We found no product whose architecture is fully optimized for today’s hardware, and many vendors were not even aware of this. Every element of what we believe to be the ideal architecture is available, but not yet in a single product. Even products whose architecture is potentially optimal usually do little to make the application easy to tune.

We have classified the components of an OLAP application into five logically defined layers, shown in Figure 1. In most cases some of these may be merged, so that fewer physical layers can be distinguished, but this is a good way to classify generalized applications. One could, of course, show more layers for any particular solution, but doing so would not improve the analysis. The layers have to communicate with each other through a communications process that is a potential bottleneck, and how narrow this constriction is depends on the architecture. If two modules of the same program are running on the same machine, there is effectively no bottleneck. If two different programs (from different vendors) are communicating on the same computer, there will be additional overheads and translations, which constrict the flow of information. If the communication crosses a network, the constriction will be much greater.

The diagram in Figure 1 also gives an indication of the volumes of data that must pass between the layers. Clearly, an architecture that places a tight bottleneck between layers that have large volumes of data passing between them is more likely to suffer from performance problems than one that places the tightest bottlenecks at places with small data traffic requirements.
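To make the argument concrete, here is a minimal sketch that models each layer boundary as a link with a throughput that depends on the three cases described above (same program, different programs on one machine, across a network). All volumes and throughputs are hypothetical figures chosen only to show the shape of the argument, not measurements of any product. The sketch assumes, as the pipe thicknesses in Figure 1 suggest, that volumes shrink as data moves up toward the presentation layer.

```python
# Illustrative only: every number below is a hypothetical assumption.

# Assumed throughput of each kind of link, in MB per second.
LINK_THROUGHPUT_MB_S = {
    "same_program": 10_000.0,  # modules of one program on the same machine
    "inter_program": 1_000.0,  # different products on the same computer
    "network": 10.0,           # communication across a LAN
}

# Assumed data volume crossing each layer boundary for one query, in MB,
# listed from the database files up to the presentation layer.
BOUNDARY_VOLUME_MB = {
    "files->db_manager": 2_000.0,
    "db_manager->bulk_calc": 500.0,
    "bulk_calc->adhoc_calc": 5.0,
    "adhoc_calc->presentation": 1.0,
}


def transfer_seconds(placement: dict[str, str]) -> float:
    """Total time spent moving data across every boundary for one placement."""
    return sum(
        BOUNDARY_VOLUME_MB[boundary] / LINK_THROUGHPUT_MB_S[link]
        for boundary, link in placement.items()
    )


# Tightest bottleneck where traffic is heavy: LAN between the database
# manager and the bulk calculations.
PLACEMENT_A = {
    "files->db_manager": "same_program",
    "db_manager->bulk_calc": "network",
    "bulk_calc->adhoc_calc": "inter_program",
    "adhoc_calc->presentation": "same_program",
}

# Tightest bottleneck where traffic is light: LAN above the bulk calculations.
PLACEMENT_B = {
    "files->db_manager": "same_program",
    "db_manager->bulk_calc": "same_program",
    "bulk_calc->adhoc_calc": "network",
    "adhoc_calc->presentation": "same_program",
}

print(f"A: {transfer_seconds(PLACEMENT_A):.2f} s")  # A: 50.21 s
print(f"B: {transfer_seconds(PLACEMENT_B):.2f} s")  # B: 0.75 s
```

The same network link costs almost nothing in placement B, because it sits where little data flows.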

Presentation layer: In a thin-client architecture, the presentation layer is the only layer on the client, but usability and performance will be poor if no metadata is cached locally.

Ad hoc calculations: Ad hoc or on-the-fly calculations can be done on either the client or the server, and in a few cases are split between them. Ideally, they should be done wherever the data is cached.

Bulk calculations: Large calculations should be done on the server or a mid-tier, but shared-file products usually do them on the client.

Database management: The database manager is usually combined with the database server, but can also be part of a separate application server, or even run on the client in a shared-file system.

Database files: The database files are not directly accessible except through the database management system, which manages updates, security and some of the processing.

Figure 1: The logical layers of an OLAP client/server application. The thickness of the cyan pipes connecting the layers is an indication of the data volumes passing between them. The layers will not all be on physically separate machines in any single installation, but different products distribute them differently.

Starting from the lowest level, the database files are, literally, the physical disk file or files holding the data structures and values. We assume that data is not physically stored elsewhere, though metadata is sometimes duplicated on client PCs for performance reasons. This makes the product easier to use, but downloading the latest metadata can take time over a slow connection: the session may be slow to start up, but it will deliver better interactivity afterwards.
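The trade-off described above (one slow download at start-up, fast navigation afterwards) can be sketched as a client-side cache. This is a hypothetical design, not any vendor's implementation. The class name, the shape of the metadata (dimension names mapped to member lists) and the `fake_download` stand-in for the server call are all assumptions made for illustration.

```python
from typing import Callable

# Assumed metadata shape: dimension name -> list of member names.
Metadata = dict[str, list[str]]


class ClientMetadataCache:
    """Hypothetical client-side copy of the dimension metadata.

    The whole structure is downloaded once, when the session starts (the
    slow step over a slow connection); later navigation lookups are
    answered locally without contacting the server.
    """

    def __init__(self, download: Callable[[], Metadata]) -> None:
        self._download = download
        self._metadata = download()  # one round trip at session start-up

    def members(self, dimension: str) -> list[str]:
        # Served from the local copy: no network traffic.
        return self._metadata[dimension]

    def refresh(self) -> None:
        # Fetch the latest metadata again, e.g. after the server changes it.
        self._metadata = self._download()


# Usage with a stand-in for the server call that records each download.
downloads: list[int] = []


def fake_download() -> Metadata:
    downloads.append(1)
    return {"Region": ["North", "South"], "Product": ["A", "B", "C"]}


cache = ClientMetadataCache(fake_download)
for _ in range(100):
    cache.members("Region")
print(len(downloads))  # 1
```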

The database management layer may be a standard RDBMS or a proprietary multidimensional database engine. Because it will, from time to time, have to access all the data in the files, separating these two layers across a LAN is a recipe for trouble in large applications.

The bulk multidimensional calculations layer is where large-scale consolidations and other complex calculations that apply across large parts of the database are performed. These calculations are usually largely or entirely predefined, and may be performed either in advance or on demand. Doing them in advance improves run-time performance, but can consume large amounts of disk, as explained in the database explosion section. Because these bulk calculations can involve so much of the database, it is also highly desirable that they be done close to the data, without a network bottleneck in between.

Some of these calculations may also be defined at run time, without pre-programming. For example, a user might define a new ad hoc variance, then ask for the worst five examples from the whole database. Finding these five cases might involve calculating and filtering tens of thousands of variances, so it is most efficient for the calculation and sorting to be done near the data. This layer is usually included within MOLAP and hybrid OLAP products. In MOLAP products, it should be tightly bound to the integrated database management layer, with no bottleneck between them.
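The worst-five example above can be sketched as follows. This is a minimal illustration, not any product's implementation. The rows are randomly generated stand-ins for data the database management layer would read. "Variance" is taken in the budgeting sense (actual minus budget), and "worst" is assumed to mean most negative. Every variance is computed and filtered next to the data, so only five rows need to cross the network.

```python
import heapq
import random

# Hypothetical fact rows: (cell key, actual, budget).
random.seed(42)
rows: list[tuple[str, float, float]] = [
    (f"cell-{i}", random.uniform(50.0, 150.0), 100.0) for i in range(50_000)
]


def worst_variances(rows, n=5):
    """Compute an ad hoc variance (actual - budget) for every row and keep
    only the n most negative, as a server-side bulk calculation would."""
    variances = ((key, actual - budget) for key, actual, budget in rows)
    return heapq.nsmallest(n, variances, key=lambda kv: kv[1])


top5 = worst_variances(rows)
print(f"{len(rows)} variances calculated, {len(top5)} rows returned to the client")
```

`heapq.nsmallest` keeps only the current best n candidates while streaming through the rows. It avoids sorting the full set of tens of thousands of variances just to return five.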

The ad hoc multidimensional calculations layer performs the simpler calculations often done at reporting time, based on data that is already used in reports anyway (for example, calculating the difference between two columns in a report, or a new subtotal of rows already in the report). This functionality is usually provided on the client PC, and may be part of the OLAP tool itself or be performed in a spreadsheet or third-party client. If these calculations involve only data that would be sent to the client in any case, they are best done on the client machine, which reduces LAN traffic and potentially the server load as well. However, in some zero-footprint Web solutions, this work has to be done on the server, which increases server load and network traffic.
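A minimal sketch of the two reporting-time calculations mentioned above. Both work entirely on data the client already holds, so neither needs a round trip to the server. The report contents and the function names are hypothetical.

```python
# A report already delivered to the client: rows of (product, actual, budget).
# Names and figures are hypothetical.
report = [
    ("Widgets", 120.0, 100.0),
    ("Gadgets", 80.0, 95.0),
    ("Gizmos", 60.0, 50.0),
]


def add_difference_column(rows):
    """Append actual - budget to each row, using only data on the client."""
    return [(name, actual, budget, actual - budget) for name, actual, budget in rows]


def subtotal(rows, names):
    """Subtotal the actual and budget columns for a chosen subset of rows."""
    chosen = [row for row in rows if row[0] in names]
    return (sum(row[1] for row in chosen), sum(row[2] for row in chosen))


with_diff = add_difference_column(report)
print(with_diff[1])                              # ('Gadgets', 80.0, 95.0, -15.0)
print(subtotal(report, {"Widgets", "Gizmos"}))  # (180.0, 150.0)
```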

Finally, the GUI presentation layer is the human factors and data manipulation level (for example, the part of the product that allows ‘dicing and slicing’, color coding and charting). Again, this might be part of the OLAP product or some other connected product such as a spreadsheet, an EIS front-end or a Web browser using Java applets or other scripting methods. The thinnest client architectures split this layer between the browser and a mid-tier server. While this provides a clean architecture, it reduces functionality, performance and ease of use.

Although it may do little computational work, a responsive GUI environment can still consume significant processing resources. So that users can generate queries quickly, it is essential that this layer understands the dimensional structures of the application. This enables it both to present the structures to users for navigation and to generate efficient queries for the lower levels to process.
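To illustrate how knowledge of the dimensional structure helps the presentation layer generate efficient queries, the sketch below uses a cached hierarchy to turn a drill-down into a request for just the children of the expanded member, rather than the whole dimension. The hierarchy, the `Query` type and `drill_down` are hypothetical. Real products use their own query languages and interfaces.

```python
from dataclasses import dataclass

# Hypothetical Time hierarchy cached by the presentation layer:
# each member maps to its children.
TIME = {
    "2024": ["Q1", "Q2", "Q3", "Q4"],
    "Q1": ["Jan", "Feb", "Mar"],
}


@dataclass(frozen=True)
class Query:
    """A hypothetical query request passed to the lower layers."""
    dimension: str
    members: tuple[str, ...]
    measure: str


def drill_down(hierarchy: dict[str, list[str]], dimension: str,
               member: str, measure: str) -> Query:
    """Build a query for only the children of the member the user expanded."""
    children = hierarchy.get(member, [])
    if not children:
        raise ValueError(f"{member!r} is a leaf member and cannot be drilled into")
    return Query(dimension, tuple(children), measure)


q = drill_down(TIME, "Time", "Q1", "Sales")
print(q)  # Query(dimension='Time', members=('Jan', 'Feb', 'Mar'), measure='Sales')
```

Because the structure is known locally, the client can also reject an impossible drill-down, such as one on a leaf member, without sending anything to the server.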
