I’m a fan of Michael Larsen’s blog and, in particular, his approach to reporting on the books that he reads. He breaks them down into multiple posts and goes deep. I’m going to take that approach to my reading of MSS2008ASU (Microsoft SQL Server 2008 Analysis Services Unleashed). Here is the first post, which covers the front matter and first three chapters of the book:
The introduction contains a short history of Analysis Services up to the 2008 release. Although the book is now approaching 4 years old, the functionality it covers hasn’t changed much, since for the past few years the Analysis Services team has been focusing on delivering managed self-service BI via the PowerPivot product.
The book starts off by describing the multidimensional model implemented by Analysis Services and the attributes of OLAP systems in comparison to OLTP systems: multidimensional structures, fast data access, intuitive user interface, and support for complex calculations. Three specific ways to look at the multidimensional approach are discussed:
- The Conceptual Model consists of the structures of the metadata that make up an Analysis Services database.
- The Application Model is the way an Analysis Services database looks to the applications that access it.
- The Physical Model deals with the way the Analysis Services database is stored on disk and in memory.
The Analysis Services UDM (Unified Dimensional Model) provides a consistent way to work with multidimensional data whether it is stored in a relational database, an Analysis Services multidimensional database, or a combination of the two. The building blocks of a multidimensional model, measures and dimensions, are briefly introduced.
In general, OLAP has a reputation for being difficult to understand. I think that’s true, especially if you sweat the details. It’s really hard to give a one-paragraph definition of a multidimensional model that encompasses all its aspects. Chapter 2 starts to describe multidimensional space. Primarily, the chapter defines the terms used to talk about multidimensional space, and it presents a few examples to clarify the definitions. The terms defined in this chapter are (in order of appearance):
- Dimension – some aspect of the data that someone (the book says “the company”) would like to analyze.
- Member – one point on the dimension.
- Value – a unique characteristic of a member.
- Attribute – a collection of similar members of a dimension.
- Size (Cardinality) – the number of members a dimension contains.
- Fact Space (Fact Data) – the collection of the data in the multidimensional space.
- Theoretical Space – the collection of all possible points in the multidimensional space. This space is finite and limited by the number of dimension members.
- Tuple – a coordinate in multidimensional space. Each element of the tuple corresponds to a dimension. Each element can correspond to one member of the dimension (or all members of the dimension, indicated by an asterisk, “*”).
- Slice – a section of multidimensional space that can be defined by a tuple.
- Attribute – one aspect of a dimension.
- Hierarchy – an ordered structure of dimension attributes.
- Logical Space – the collection of data aggregated from fact space values; data that doesn’t correspond one to one to a data value in external data sources.
- Cell – a point in the multidimensional space.
- Measure – the value in a cell.
- Cell Value – a measure value of a cell.
- Dimension of Measures – all the measures in the multidimensional space. Properties of each member of this dimension include data type, unit of measure, and calculation type of its data aggregation function.
- Aggregation Function – the calculation used to determine the values of a cell in the logical space.
- The “All” Member – a logical space cell that contains all members of a given dimension. Values are calculated using the dimension’s aggregation function.
- Subcube (Subspace) – a portion of the full multidimensional space. Subcubes can be normal (a coordinate that exists on one dimension must be present for every coordinate on the other dimensions of the subcube) or arbitrarily shaped (the previous parenthetical limitation is removed).
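To make a few of these terms concrete for myself, here is a minimal Python sketch. It is not from the book and has nothing to do with how Analysis Services actually stores data; the dimension names, members, numbers, and helper functions (`theoretical_size`, `slice_cells`, `cell_value`) are all made up for illustration. It treats fact space as a sparse dictionary, a tuple as one coordinate per dimension with "*" standing for all members, and uses a plain sum as the assumed aggregation function.

```python
from math import prod

# Hypothetical dimensions and their members (illustrative only).
dimensions = {
    "Product": ["Bike", "Helmet"],
    "Year": [2007, 2008],
    "Region": ["East", "West"],
}

# Fact space: only the points that actually have data.
# Keys are (Product, Year, Region); values are a single made-up measure.
facts = {
    ("Bike", 2007, "East"): 10,
    ("Bike", 2008, "East"): 12,
    ("Helmet", 2008, "West"): 5,
}

ALL = "*"  # wildcard element meaning "all members of this dimension"


def theoretical_size(dims):
    """Number of points in theoretical space: the product of the cardinalities."""
    return prod(len(members) for members in dims.values())


def slice_cells(fact_space, tup):
    """Return the fact-space cells that fall inside the slice defined by a tuple."""
    return {
        coord: value
        for coord, value in fact_space.items()
        if all(t == ALL or t == c for t, c in zip(tup, coord))
    }


def cell_value(fact_space, tup, aggregate=sum):
    """Logical-space value for a tuple: aggregate the fact cells in its slice."""
    return aggregate(slice_cells(fact_space, tup).values())


print(theoretical_size(dimensions))          # 8 possible points, only 3 have facts
print(cell_value(facts, ("Bike", ALL, ALL))) # all Bike data, across years and regions
print(cell_value(facts, (ALL, ALL, ALL)))    # the grand total
```

The point of the sketch is the split between fact space (the three stored values) and logical space (anything computed by aggregating them, such as the value for a tuple that uses "*").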
I’m not exactly a beginner with Analysis Services, yet I’m not comfortable with all these definitions as they are presented in Chapter 2. Provisionally, though, I’ll accept them and move on to the next chapter without comment. Hopefully they will get clarified and filled out later in the book.
Chapter 3 concludes Part 1 of the book and it contains a discussion of the various physical architectures that Analysis Services systems can be built with. There are n-tier systems, where n = 1 to 4 (in this chapter, tiers seem to refer to the number of physical machines that implement the system):
- 1 tier – all the components of the system reside on a single computer. The components of a 1 tier system include:
  - The data you want to analyze
  - A local cube that structures the multidimensional model
  - Various client interfaces that applications can use to talk to the cube
  - The application itself
- 2 tier – The structure of the system is similar to the 1 tier model but in this case, instead of a local cube on the same machine as all the other components, there is a separate server machine that holds the multidimensional data that multiple client machines can query. In this architecture, you now have to deal with a network connection between the clients and the server.
- 3 tier – The additional tier in this topology is an internet server between the client and the server. In this case, clients are typically web browsers.
- 4 tier – The tiers of this model include:
  - A relational database
  - An OLAP server
  - A Web server
An orthogonal approach to architecting your system when you want to scale the server component is to build a distributed system. Analysis Services has two features that help:
- Remote partitions – Some of the data in the database is stored on a remote server. This allows your database to contain massive quantities of data.
- Linked objects – A publisher server contains the multidimensional database and mirrored subscriber servers service requests from some of the clients. Subscriber servers will cache data from the publisher server allowing the system to handle requests from a massive number of clients.
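The caching idea behind linked objects, as I understand it from the description above, can be sketched as a simple read-through cache. This is a toy model only: the `Publisher` and `Subscriber` classes, the `query` method, and the sample key are hypothetical names I made up, not part of any Analysis Services API.

```python
class Publisher:
    """Toy stand-in for the server that owns the multidimensional data."""

    def __init__(self, data):
        self._data = data
        self.requests_served = 0  # counts how often the publisher is actually hit

    def query(self, key):
        self.requests_served += 1
        return self._data[key]


class Subscriber:
    """Toy stand-in for a server that answers clients and caches publisher results."""

    def __init__(self, publisher):
        self._publisher = publisher
        self._cache = {}

    def query(self, key):
        # Only go to the publisher the first time a given key is requested.
        if key not in self._cache:
            self._cache[key] = self._publisher.query(key)
        return self._cache[key]


publisher = Publisher({"sales_2008": 17})
subscriber = Subscriber(publisher)

# Many client requests hit the subscriber; the publisher sees only one.
results = [subscriber.query("sales_2008") for _ in range(1000)]
print(publisher.requests_served)
```

The sketch ignores everything hard about the real feature, such as keeping the cached data in sync when the publisher's data changes.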
The chapter concludes with a discussion of thin client vs. thick client. A major difference between Analysis Services 2000 and 2005 is that 2005 was re-architected to be more of a thin client architecture. The 2005 architecture is essentially unchanged in 2008.