Case Study: The World Bank

“OLAP technology arms World Bank economists with compelling analytical capabilities for analyzing complicated economic information,” says Ronnie Hammad, Senior Economist at the World Bank Group. “They can query and model huge volumes of complex economic information and apply custom aggregation algorithms that interpolate values for incomplete or missing data. It provides them with a seamless analytical experience, so that the World Bank is now able to make better policy recommendations and resource allocation decisions.”

Background
Founded in 1944, the World Bank Group’s mission is to fight poverty and to help people help themselves and their environment. It does this by providing resources, sharing knowledge, building capacity and forging partnerships in the public and private sectors.

The Bank is the world’s largest source of development assistance, providing more than $15bn annually in loans to its client countries. It has over 8000 employees at its headquarters in Washington DC and over 2500 staff based in offices in 100 countries.

The problem
In order to carry out its mission, the Bank collects and analyzes huge amounts of economic, social and sector data on developed and developing countries. This is used as the basis for policy advice and as a means of measuring the outcome of its projects, programs and policy advice. The Bank is also involved in helping countries build their capacity to collect, analyze and disseminate their own statistical information.

The Bank was collecting thousands of economic indicators from over a hundred countries in a data warehouse, called the ‘Live Database (LDB)’. The data was stored in a Microsoft SQL Server 6.5 database and transferred into an Oracle pcExpress cube for performing aggregations and derivations. These were then transferred back into SQL Server.

PcExpress was maintained by a central statistical IT function and wasn’t accessed by users. “All the user got was pre-calculated aggregates,” says Hammad. “pcExpress was too complex for them to manipulate themselves.”

The database has a fairly simple structure, consisting of countries, time periods and indicators. However, economic analysts always wanted to take different cuts of the data.

They might want to aggregate a particular group of countries, for example to establish the growth rate in East Asia, in middle-income countries or in oil-producing countries, or to investigate whether landlocked African countries have a historically lower growth rate than those with access to a port. Alternatively, they might want to specify a particular time period, such as export performance for 1960-1969, 1970-1979, 1980-1989 and 1990-1999.


This involved calling the central information technology department to request the required report. “pcExpress requires people with a high level of expertise to manage it,” explains Jose Delcour, lead technical developer of the 2gLDB project. “Only a handful of people at the World Bank know how to use it and you had to call them. The turnaround time varied from days to weeks.”

The LDB only contained annual data, whilst users needed more frequent data. Another problem was that when the economists wanted to add a new country, all the data in the data warehouse had to be extracted into pcExpress to do the aggregations.

“We didn’t have control over the process,” says Hammad. “The central statistical department would upload the data at most a few times a year, which wasn’t good enough for us. If we uploaded new country data to the data warehouse it would not be reflected in the pre-calculated aggregates.”

Users wanted to be able to create their own geography, their own time period and even their own indicators and get the results immediately. Hammad found that it was too complicated to do this in a relational database.

“The challenge was to continually refresh the data and to make it accessible and intuitive to economists, researchers, and senior management at the Bank,” he says. “At a country level, we needed to help policy makers, analysts and even government ministers to access timely and useful data.”

The solution
In the year 2000 the Bank decided to implement a second generation Live Database (2gLDB) system. As well as implementing it internally, one of the objectives was to install the solution in many client countries around the world. These would include statistics offices, ministries of finance, central banks and regional institutions.

Because of the widespread anticipated use of the system, cost was a major consideration. The Bank evaluated almost all the products available. Cognos PowerPlay, Hyperion Essbase and Oracle Express were technically proficient, but were quickly dismissed on cost.

Using Microsoft Analysis Services would be much more cost-effective, as it was already included in the license for the SQL Server database. The Bank chose to use ProClarity 3.0 to interface with the Microsoft cubes, as it offers an intuitive, graphical representation of data.

“We chose it because it is completely integrated into Microsoft SQL Server and Microsoft Office 2000,” Delcour says. “The OLAP functionality helped a lot, because we could get a lot of benefit from slicing and dicing. Every time we update country data all the derived and aggregated indicators would be instantly recalculated in Microsoft Analysis Services and be automatically reflected. It is an open environment, so it would allow us to build our own custom add-ins. What clinched it was that it enables users to create their own geography.”

Implementation
A ‘Data and Tools for Economic Analysis’ (DTEA) group provided the framework, direction and initial funding for the project. A year was spent gathering user requirements, from both inside the Bank and from client countries, and interviewing various consulting firms and software vendors.

Creating the application was done in conjunction with ProClarity’s consultants. The main data warehouse is in Microsoft SQL Server 2000. From there cubes are created in Microsoft Analysis Services, which handles all the aggregations and derivations. Users then access the data through ProClarity. Some of the indicators are raw, some are pre-calculated and some are calculated on-the-fly.

Creating the application was very difficult because of the complex nature of the Bank’s custom functions library add-in. This has 85 different statistical functions, such as median, average, weighted average, least squares growth rate, forecasting, regressions and logarithms. These include specific formulae that the Bank uses for aggregating and interpolating missing values.
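To give a flavor of what one such function involves, a least-squares growth rate can be sketched in a few lines. This is a minimal Python sketch, not the Bank's Visual Basic code. It assumes the common definition in which ln(X_t) = a + b·t is fitted by ordinary least squares over evenly spaced periods and the average growth rate is exp(b) - 1. The function name and the sample series are hypothetical.

```python
import math

def least_squares_growth_rate(values):
    """Average growth rate per period from an OLS fit of ln(value) on time.

    Assumes evenly spaced periods and strictly positive values.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive to take logarithms")
    t = list(range(n))
    logs = [math.log(v) for v in values]
    t_mean = sum(t) / n
    log_mean = sum(logs) / n
    # OLS slope b of ln(X_t) = a + b*t
    num = sum((ti - t_mean) * (li - log_mean) for ti, li in zip(t, logs))
    den = sum((ti - t_mean) ** 2 for ti in t)
    b = num / den
    return math.exp(b) - 1.0

# Hypothetical series growing at a steady 5% per period
series = [100 * 1.05 ** k for k in range(10)]
rate = least_squares_growth_rate(series)
```

Unlike a simple end-point calculation, the fitted slope uses every observation in the period, so a single unusual first or last year has less influence on the result.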

“One function, for example, reiterates through time to get missing values based on a specific base year,” says Delcour. These were written by Delcour using Visual Basic and took eight months to complete. The ProClarity consultants built the interface between ProClarity, the custom functions library and the Microsoft cubes. This included writing a lot of VB and MDX statements to link ProClarity to the library’s dynamic link libraries (DLLs).

“It covered the period when they went from ProClarity 2.0 to ProClarity 3.0,” says Delcour. “This version required a whole new way of programming.”

In use
The application is currently used by 100 economists and analysts in the Bank’s head office. A single server runs the data warehouse and Microsoft Analysis Services, whilst ProClarity is downloaded to users’ personal computers using Microsoft’s Systems Management Server (SMS). The server is currently a Dell Precision 410 with 2 x 533 MHz Pentium III processors, 512 megabytes RAM and 20 gigabytes RAID 0 storage, running Microsoft Windows 2000 Server operating system. It is shortly to be upgraded to a machine with 2 x 1 GHz Pentium IV processors, 1 gigabyte RAM and 36 gigabytes RAID 3 storage.

Users can now create their own time periods and easily apply a statistical function from the library. “They just highlight the years and apply a function and get the result immediately,” says Delcour. “To do that in Microsoft Excel is very time consuming and this is really much more powerful.”
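The idea of picking a range of years and applying a function to it can be illustrated with a short Python sketch. The data, the period boundaries and the function name are hypothetical. In the 2gLDB the equivalent work is done by the custom library against the Analysis Services cube, not by code like this.

```python
from statistics import mean, median

# Hypothetical annual export values for one country, keyed by year
exports = {1995: 10.0, 1996: 12.0, 1997: 11.0, 1998: 15.0, 1999: 17.0}

def apply_over_period(series, start_year, end_year, func):
    """Apply func to the values whose year falls in [start_year, end_year]."""
    selected = [v for year, v in sorted(series.items())
                if start_year <= year <= end_year]
    if not selected:
        raise ValueError("no data in the requested period")
    return func(selected)

avg_96_98 = apply_over_period(exports, 1996, 1998, mean)
med_all = apply_over_period(exports, 1995, 1999, median)
```

Passing the function as an argument mirrors the "highlight, then apply" workflow: the selection and the statistic are chosen independently.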

Users can also create their own indicator or index and get the result immediately using the add-in. The application can create an index for an economic indicator that is actually a function of several others. For example, a macroeconomic risk rating might involve gross domestic product growth, export performance, exchange rate volatility, current account balance and fiscal balance, each with a different weighting.
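A composite index of this kind is essentially a weighted average of its component indicators. The Python sketch below is purely illustrative: the component scores, the weights and the function name are invented for the example and do not reflect any actual Bank risk rating. It also assumes each component has already been normalized to a common 0-100 scale.

```python
def composite_index(scores, weights):
    """Weighted average of component scores; weights need not sum to 1."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[k] * w for k, w in weights.items()) / total_weight

# Hypothetical normalized scores (0 = worst, 100 = best) for one country
scores = {
    "gdp_growth": 70.0,
    "export_performance": 60.0,
    "exchange_rate_volatility": 40.0,
    "current_account_balance": 50.0,
    "fiscal_balance": 30.0,
}
# Hypothetical weights
weights = {
    "gdp_growth": 0.30,
    "export_performance": 0.20,
    "exchange_rate_volatility": 0.20,
    "current_account_balance": 0.15,
    "fiscal_balance": 0.15,
}
risk_rating = composite_index(scores, weights)
```

Dividing by the total weight means an analyst can change one weight without having to rebalance the others by hand.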


The ability of the custom library to interpret a missing value is important. For example, an economist might want to calculate the growth rate for a group of countries in South Asia. The function retrieves the figures in local currency using a constant price series, converts them to US dollars at a specific exchange rate, takes the previous year’s values, calculates the percentage change and gives the user the results.

However, if some of the values are missing, then it goes back to a base year, where it might have the values. It then interpolates the value for the missing year through the ratio to the total in the earlier year. It is all done on-the-fly and the result is now available in seconds.
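The gap-filling step can be sketched as follows. This is one plausible reading of the description above, not the Bank's actual formula: it assumes that a country with a missing value keeps the same ratio to the total of the reporting countries that it had in the base year. Country codes, figures and function names are hypothetical, and the currency conversion is assumed to have been done already.

```python
def fill_missing(current, base):
    """Return a copy of `current` with None values estimated from `base`.

    A missing country's value is its base-year ratio to the reporting
    countries' base-year total, applied to their current-year total.
    """
    reporting = [c for c, v in current.items() if v is not None]
    if not reporting:
        raise ValueError("no reporting countries in the current year")
    base_total = sum(base[c] for c in reporting)
    current_total = sum(current[c] for c in reporting)
    filled = dict(current)
    for country, value in current.items():
        if value is None:
            filled[country] = base[country] / base_total * current_total
    return filled

def group_growth(previous, current, base):
    """Percentage change in the group total, after gap filling each year."""
    prev_total = sum(fill_missing(previous, base).values())
    curr_total = sum(fill_missing(current, base).values())
    return (curr_total / prev_total - 1.0) * 100.0

# Hypothetical GDP in constant US dollars; country C is missing in 2000
base_1995 = {"A": 100.0, "B": 50.0, "C": 50.0}
gdp_1999 = {"A": 120.0, "B": 60.0, "C": 60.0}
gdp_2000 = {"A": 126.0, "B": 63.0, "C": None}

filled_2000 = fill_missing(gdp_2000, base_1995)
growth = group_growth(gdp_1999, gdp_2000, base_1995)
```

Without the gap filling, the 2000 group total would simply omit country C and the group would appear to shrink, which is why the interpolation matters for regional aggregates.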

The cube was designed to accommodate multiple sources by creating ‘source’ as a dimension. Primary sources come directly from over 150 countries in the form of census data and hundreds of country surveys, such as demographic, household, agricultural, industrial, etc. Secondary sources include the International Monetary Fund, United Nations Agencies and the World Trade Organization.

All this information is held in a single cube that has 40 million cells and occupies 92 megabytes, equivalent to 638 megabytes of relational data. As more sources are added, the size of the cube is expected to rise significantly.

“Before, users had to come out of one database and log into another one,” says Hammad. “Having everything in one cube is very powerful because they can see what other organizations are saying and compare them in the same window. The size of the cube doesn’t affect the performance of the query.”

Hammad’s approach is based upon giving users a choice of tools, some being developed in-house. ProClarity is mainly used by ‘power users,’ for slicing and dicing, drilling down, etc. “Even though it is relatively easy to use and you can master it in 15 minutes, it is not for every user,” says Hammad. “It is overkill to have it on every desktop, particularly in national ministries of finance in small developing countries.”

The Bank has programmed special reports, called ‘Executive Briefs,’ in Microsoft Excel that are targeted mainly at senior management. Based on the judgment of Bank experts in different fields, these create certain sector profiles containing the most important indicators. For example, there are 70-90 indicators that most people look at in the World Bank. These are macro data, such as gross domestic product (GDP) growth per capita, investment and savings rates, primary education enrollment, infant mortality, income distribution and exports.

“Bank experts have provided certain judgments on what is better, weaker or medium performance against benchmarks,” says Hammad. “It is a very powerful tool and color coding the indicators is a very good way of telling a story in a particular country.”

This is somewhat like traffic lighting, which is easy when one indicator has many countries. However, ProClarity is unable to accommodate a country with a range of indicators, each of which has a separate benchmark. That has to be done using Microsoft Excel.


“I don’t know of any tool that can list different indicators, such as GDP, investment rate or fiscal deficit, and color code it with different benchmarks and do exception highlighting,” says Hammad. “In common with similar products, ProClarity allows you to have one benchmark for many members, which in our case is ‘countries.’”
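The Excel-based approach amounts to giving every indicator its own thresholds and its own direction, since for some indicators a higher value is better and for others a lower one. A minimal Python sketch of that logic follows. The thresholds, indicator names and country figures are hypothetical and are not the Bank's benchmarks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Benchmark:
    weak: float            # at or beyond this the indicator is rated weak
    strong: float          # at or beyond this the indicator is rated strong
    higher_is_better: bool = True

def rate(value, bm):
    """Return 'green', 'amber' or 'red' for one value against its benchmark."""
    if bm.higher_is_better:
        if value >= bm.strong:
            return "green"
        if value <= bm.weak:
            return "red"
    else:
        if value <= bm.strong:
            return "green"
        if value >= bm.weak:
            return "red"
    return "amber"

# Hypothetical benchmarks, one per indicator
benchmarks = {
    "gdp_growth_pct": Benchmark(weak=1.0, strong=4.0),
    "investment_rate_pct": Benchmark(weak=15.0, strong=25.0),
    "fiscal_deficit_pct_gdp": Benchmark(weak=6.0, strong=3.0,
                                        higher_is_better=False),
}
# Hypothetical figures for a single country
country = {
    "gdp_growth_pct": 5.2,
    "investment_rate_pct": 18.0,
    "fiscal_deficit_pct_gdp": 7.5,
}

profile = {name: rate(value, benchmarks[name])
           for name, value in country.items()}
```

Here one country is rated across many indicators, each against its own benchmark, which is the reverse of the one-benchmark-for-many-countries layout that Hammad describes.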

The cubes are now being used to provide data for some of the Bank’s publications. For example, ‘African Development Indicators’ is a Microsoft Excel-based statistical publication with over 250 tables. Research analysts now program these themselves, directly from the cube, rather than using a programmer, which halves the cost.

Use of the system is being expanded to more and more users at the Bank’s head office and it has been installed for about 50 users in other institutions, including the Development Bank of Southern Africa, the African Development Bank, the Statistics Office in Mozambique and the Ministry of Finance in Senegal.

The future
The Bank is currently rolling the application out to three regions: Eastern Europe, Asia/Pacific and Africa. The DTEA group is doing a lot of the work on the data and indicators that the users need. Their approach is to populate a cube with high frequency data, including daily updates, and generate various reports for monitoring financial volatility.

The cube is still relatively small because Hammad is waiting for the new server before populating the cube. He expects it to expand exponentially in size in the coming months. Once the regions have been implemented, he will seek an enterprise-wide license, allowing everybody at the Bank to use it.

The Bank is planning to move to ProClarity 4.0 and is in the final process of getting it certified on its network registration system. However, the Bank does not plan to move to the ProClarity Enterprise Server, as the add-in does not work in that environment. Also, the cost is prohibitive because not all users need the power of ProClarity to slice and dice the data. For all but power users, a simple Web interface to query data can suffice.

Future plans include developing a Web solution and packaging the application as a commercial product that can be sold to institutions worldwide, along with a subscription fee for the data. Plans are underway for the application to be implemented in at least 10 countries around the world. For this, Hammad is looking for a lower cost means of accessing the Microsoft cubes for just simple queries, such as country, by period, by indicator (see below).

“The ProClarity license is less expensive than other products, but is still a lot of money,” he says. “Excel 2000 is not particularly user-friendly and is very slow, because of the lack of flexibility in querying the cube data. We have seen a demonstration of Microsoft Excel XP, which would be a really good low cost solution for minimal users. There are also a number of ProClarity add-ins that cost about $100 a person that we might use, but not yet.”

Benefits
The 2gLDB gives users improved access to data and more powerful analysis tools. They can easily create any geography, time period or equation and get the result in seconds, without having to involve programmers.


“This saves the Bank a huge amount of time,” says Hammad. “Techniques like benchmarking and ranking help users to put data in a context. This shortens the time it takes to understand the data and allows for more in-depth analysis. As a result, we make better policy recommendations.”

Data is now uploaded overnight, so the information available is much more up-to-date than the predominantly annual data available before. For example, the Bank’s economists responsible for the 48 countries in the Africa region submit historical data and projections every quarter. In the past, the central department would only upload the data into pcExpress to produce aggregates when the 48th country was submitted. Now every time a country is received it is automatically uploaded and the aggregated indicators reflected.

The World Bank manages a lot of databases, such as finance, nutrition, health, education and regional. Often the same country in the same year might have an indicator with different values in each of the databases, because the denominator is not the same. For instance, GDP is often the denominator for a lot of other indicators, such as money supply as a percentage of GDP, expenditure as a percentage or exports as a percentage. If GDP was different in different databases, then users would get different ratios for the same derived indicators.

“The large single cube means that users can now easily access most of their data needs from one place, instead of having to retrieve information from various databases,” says Hammad. “By having it all in this huge cube with instant calculations everybody is working from the same base data and has a consistent set of indicators. At any one time all the data is completely consistent, which is a huge benefit.”
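The consistency argument can be made concrete with a small sketch. If every ratio is derived at query time from one shared GDP series, rather than stored separately in each database, the denominators cannot drift apart. The figures and names below are hypothetical, and the code only illustrates the principle, not how Analysis Services computes calculated members.

```python
# Hypothetical base data for one country and year, in the same currency units
base = {"gdp": 200.0, "money_supply": 80.0, "expenditure": 50.0, "exports": 30.0}

def percent_of_gdp(data, indicator):
    """Derive an indicator as a percentage of the single shared GDP figure."""
    gdp = data["gdp"]
    if gdp == 0:
        raise ZeroDivisionError("GDP is zero")
    return data[indicator] / gdp * 100.0

names = ("money_supply", "expenditure", "exports")
ratios = {name: percent_of_gdp(base, name) for name in names}

# Revising GDP once changes every derived ratio consistently
base["gdp"] = 250.0
revised = {name: percent_of_gdp(base, name) for name in names}
```

Because nothing derived is stored, a single revision to GDP flows through to every ratio the next time it is queried.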


In the countries where the 2gLDB is being installed, users are now able to organize, access, analyze and disseminate information much more powerfully. Another benefit of using ProClarity is that it is still a relatively small company.

“They are able to work directly with us,” says Hammad. “They are small enough that on a number of occasions they have incorporated certain specifications or features we wanted in future versions of the product. We have been told that all the customized work we have done will still work with future versions, which is important. ProClarity is a great company to work with.”

The Bank’s mission to fight poverty requires it to help countries to establish macroeconomic stability, develop policies on education and healthcare and create the necessary infrastructure. This requires analyzing huge amounts of information in order to make policy recommendations to the governments it is working with.

“Our unique and powerful solution gives us seamless access to the latest, most comprehensive and most accurate information,” concludes Hammad. “OLAP allows us to analyze it at ‘super speed,’ apply complex statistical functions and present it in the right context to the right people. This helps improve the quality of advice we give.”

The central statistical department is undergoing a major systems renewal program that involves a Web-based solution that can access any database. The solution uses Oracle Express Server in the background to enable the users to analyze the data from any server. This system is at its very early stages and is part of a three-year capital development project. The future challenge is how to merge the two programs together.

Summary
The World Bank has combined OLAP technology with its own custom statistical algorithms to create a powerful tool for analyzing vast amounts of economic and social data. It now has up-to-date information and automatic recalculation of the cube. With its intuitive interface, the application is helping economists to reduce world poverty.

Conclusions
The license for Microsoft Analysis Services is included in the cost of Microsoft’s SQL Server relational database, effectively making the cubes free. Most organizations also have Microsoft Office licenses, so Microsoft Excel pivot tables could be used as a front end to the cubes. However, the Bank chose to license ProClarity because of the increased ease of use and functionality it offered.
 
The Bank has chosen to handle performance indicators in three different ways. They can be input directly, pre-calculated or calculated on-the-fly, giving a high degree of flexibility. It also uses derived indicators, which are based on others, with the application handling the calculations.
 
Despite using a single cube, so far the Bank has not experienced any scalability or performance problems. However, it is about to increase the size of the cube quite dramatically, so this will be further tested.
 
In common with most wide scale deployments of OLAP technology, the Bank has found that different types of users prefer a different approach to interfacing with data. Casual users are finding a traditional ‘briefing book’ approach helpful, based upon pre-selection of views created by more expert users.
