Performance Optimization with Hardware and Software Requirements

NovaView Intelligence Server Performance Optimization and Hardware Requirements Guideline

 

Panorama NovaView 6.2 hardware and Software Requirements

This document provides Panorama view authors and Panorama administrators with background information on factors that can affect the performance level of the NovaView Intelligence Server. The document also provides information that can help implementers in the process of choosing the right hardware to use for a given BI project. Further information is provided on performance optimizations that can be achieved using NovaView implementation features.

 

General information provided by Microsoft about Analysis Services performance can be found at Microsoft SQL Server Performance guides:

Analysis Services 2000 Performance Guide.

Analysis Services 2005 Performance Guide

Analysis Services 2008 Performance Guide

 

 

General Performance Considerations

 

The NovaView Intelligence Server performs most of the calculations and caching associated with the OLAP queries sent through the Panorama client applications. Hence, the following important key factors are to be considered when deploying the Panorama NovaView Intelligence Server:

  1. Memory consumption of the NovaView Intelligence Server can be high (because of caching). As the NovaView Intelligence Server is configured to make full utilization of all available memory resources, it is recommended to have as much memory as possible in the NovaView Intelligence Server machine.
  2. As the machine running the NovaView Intelligence Server does the computationally heavier task of processing and calculating user queries (using the MS OLEDB for OLAP provider) it is usually recommend that the machine running the NovaView Intelligence Server should be more powerful than the machine running the AS Server. 

 

Factors Affecting Performance

Some factors that play an important role in the final performance of a BI system are as follows:

 

  1. CPU Numbers
  2. OLAP Database Size
  3. Complexity of the Views Selected in the Panorama Clients
  4. Various Performance Affecting Factors
  5. Load Balancing
  6. Checking Views for Performance Efficiency

 

CPU Numbers

Adding more processors can provide a linearly improved response time. Statistically, a two-processor machine can support 100 users with almost half the response time of a single CPU running 100 users. 

OLAP Database Size

The size of the OLAP database can be characterized by the following elements:

  1. The database size and the fact tables.
  2. The number of cubes and dimensions in each cube.
  3. The dimension size and number of levels.
  4. The percentage of aggregations used in the Analysis Server.

As levels with many members are not usually included in the aggregation by the aggregation wizard, retrieving values for these levels requires massive data transfer from the OLAP sever to the NovaView Intelligence Server.

Complexity of the Views Selected in the Panorama Clients

Panorama provides the user with a rich BI functionality that requires the administrator of the NovaView program to be aware that various operations can require heavy calculations on the server.

The following are some of the operations that may lead to heavy server side calculations:

  1. Nesting (Crossjoins) between Dimensions at Low Levels — Crossjoins can be very expensive and NovaView provides optimizations for this operation using the Optimize Huge Crossjoins option in the crosstab settings for each view.
  2. Filters — Filters, such as Topcount, when operated on large levels can affect performance. For example, requesting data pertaining to the top 10 customers (of 200,000 customers) for Q1 returns to the client all data required to calculate the top 10. The data retrieved is often more than these 200,000 values as the cube has other dimensions on higher levels. The data brought to the client (if aggregations as not built well) contains the raw data to be calculated on the fly to reach these 200,000 values. If the filter is applied to a calculated member (sum of 20 weeks), the data brought to the client can be at least 20*200,000 values.
  3. Formulas / Calculated Members — Formulas (calculated members) performing operations on low levels (such as ‘avg (customers.name.members)’) force bringing back values for all these members, hence affecting the performance.
  4. Free MDX — Free MDX can potentially contain any MDX statement, and therefore it is recommended that the administrator be aware of the MDX queries used. To gauge the performance of a free MDX statement, it is suggested to test the free MDX clauses in the MDX Sample Application from Analysis Services.
  5. Exceptions — Defining bubble up exception on very big levels can affect performance.

 

Various Performance Affecting Factors

The following factors affect performance:

  1. Network Speed — Network speed can considerably affect performance, as big amounts for data can be passed from Analysis Services 2000 and the NovaView Intelligence Server.
  2. Deploying Over https — The SSL encryption consumes CPU and encryption adds considerably to the network traffic.
  3. Security Model — The number of roles and intensive usage of Active Directory may also affect performance.

Load Balancing

Creating a farm of NovaView Intelligence Servers using network load balancing (software or hardware) can statistically improve performance linearly. Less powerful machines can be used in a farm instead of a single stronger machine. This works well if the queries are homogeneous (all return quickly – up to 20-40 seconds). For larger, heavier queries that take much longer and are used more frequently, it is generally better to have stronger machines in the farm.

Checking Views for Performance Efficiency

It is recommended that when designing views with the Panorama Desktop client to always check the performance of your views with the same hardware as your web server (this check can be done on the web server itself). If a view takes many seconds to load in the desktop, it will probably require the same time to load in a web client scenario, which means optimization should be done on such views before publishing them to the web. A similar check can be performed for memory consumption. If after loading a view, the NovaView process in the task manager goes up by 100s of MB, it will require the same amount of memory in the web client scenario. In such cases, you can improve performance by optimizing the view, optimizing cube aggregations, or adding more memory to your web server.

Choosing the Right Hardware Configuration

In general, it is recommended to run the NovaView Server on a dedicated machine. The following table shows the CPU and memory recommended for a dedicated server running the NovaView Intelligence Server. In the table CPU refers to a current accepted server CPU (today 2Ghz and up).

Concurrent Users

10-50 50-100 100-300 300+
Dimensions Size No ‘large’dimensions 2 CPU
x
4 - 6 GB RAM
2 CPU
x
4 - 6 GB RAM
6 - 8 CPU
x
8 GB RAM
Consult Panorama or SI
More than one ‘large’ dimension 2 CPU
x
4 - 6 GB RAM
4 CPU
x
6 - 8 GB RAM
6 - 8 CPU
x
8+ GB RAM
Consult Panorama or SI

Note: A dimension is considered ‘large’ if it contains more than 30,000 members.

Percentage of aggregation is assumed at least 30%

Note that these guidelines are general recommendations based on typical scenarios. Every implementation has specific characteristic that may require deviation from these typical configurations for improved performance. Before selecting the hardware for a BI project, it is recommended to consider all implementation specific scenarios and to test the performance in a testing environment prior to launch.

Optimizing Performance Settings

The following steps are suggested for optimizing performance settings:
  1.  
    1. Cache – The NovaView Intelligence server has its own cache mechanism. However, the cache is most effective only for the initial view. Any operation such as drill-down or slicing requires Panorama to produce a new query to Analysis Service, and hence, the initial cached view can be used. The cache mechanism can provide significant performance improvements when it is important to bring the initial view up fast and when many of the users are satisfied with the information presented in the initial view. For more information on the cache mechanism, see NovaView Administrator 4.0 Administrator’s Guide.
    2. Number of Data Sessions – The Number of Data Sessions is the number of permanent open data communication channels between the NovaView intelligence server and the OLAP server, and can play a key role in optimizing performance. In a multi-process configuration, every session is a separate Windows process. To enable multi-process mode (recommended) check the checkbox below the data sessions property. The sessions are pooled between all the web users. For example, if you have configured the system to ten data sessions, these ten data connections are pooled between the users.

      The more data sessions you set, the more memory is used. We recommend 25 data sessions for an average-to-strong server. There is usually no reason to set more than 25 sessions, as this number will often cover most usage scenarios.

      For memory efficiency, less data sessions can be used, however, the response time can be reduced. From the other side too many data sessions can reduce the response time when you are low on memory.

      The sessions are shared among users by the following parameters:
      1. The server/database/cube names
      2. The users data permissions/roles

      Users requesting information from the same cube with the same data permissions can be connected to the same channel and can benefit from shared OLAP cache.

      Try different values for this setting to locate the optimal value for your organization. Panorama service has to be restarted for changes to take effect.
    3. Queries Timeout – This setting defines the timeout for running MDX queries and its purpose is to protect the server against extremely heavy queries. MDX queries that take a longer time to execute than this property (in minutes) are automatically terminated and the end user receives a timeout message. Standard BI system allows up to 5-7 minutes response time for the web users. Administrators should consider the setting level appropriate for the specific implementation, i.e. it should be low enough to protect against server hanging queries and high enough to let users do their standard analysis.
    4. Idle Timeout – The timeout in hours for idle sessions. Idle sessions should be removed to free up the memory consumed by the OLAP client component (on behalf of the Panorama process). The lower this setting, the better for performance, however it should not be less than 0.2 (12 minutes). Note that killing an idle session also destroys its associated cache.
    5. Number of Threads per Process – This is the number of queries that can be processed simultaneously. A value of 2 (default) for a four CPU machine means that eight queries can run simultaneously. As a general rule, the lower the number, the faster the average response.
    6. Data Session Stack Size –2MB is the default value per session. This default should not be changed.
    7. Additional Connection String Parameters – This enables you to set all the supported Analysis Services connection string parameters for the NovaView Intelligence Server. In most cases, this is unnecessary, unless you wish to control the OLAP communication yourself.
    8. Members Number In Dimension Tree – This setting, which can be accessed in the General tab, is the maximum number of members Web users can see in a dimension slicer dialog. SeeNovaView Web Access v4.0 online help for more information. Preloading the dimension structure for very large dimensions is a time consuming operation, especially for slow networks, which is the primary reason an administrator may wish to reconfigure this limit.

Aggregations

The aggregation level is the most important OLAP side parameter affecting the entire system performance. An aggregation level of 30% is considered the optimal setting; however, sometimes it should be higher. A lower aggregation value can seriously affect some queries performance. Note that it worth trying a higher aggregation level and comparing the response time, of identical queries.

The FirstBlockRowsCount Setting

To improve performance, the NovaView Intelligence Server attempts to send to the end-user only relevant information. For example, there is no need to download thousands of rows if only 50 can be viewed on the screen. The desktop client accomplishes this optimization on the fly by bringing only specific portions of the data grid with respect to what can be seen. The rest is queried on the fly when scrolling the crosstab.

The web client works in a different manner. The initial query brings to the client a certain number of rows, and the rest can be downloaded after clicking on the More button, by demand. See NovaView WebAccess v4.0 online help for more information. The default value is 30 rows, but can be reconfigured with the FirstBlockRowsCount registry setting. Beware of performance degradation when setting to high numbers.

The Optimize Huge Crossjoins Option

This option takes advantage of the NonEmptyCrossJoin (NECJ) MDX function.

The NECJ function can sometimes perform hundreds of times faster than the usual crossjoin. The idea behind this algorithm is to predict empty combinations and to exclude them from the crossjoin even before calculating. There is no performance gain at all if the crossjoin of two or more dimensions is completely full (even slight degradation can be noticed), however, this is not a common case. Usually a crossjoin of any two or more dimensions is very sparse.

As an example, let us imagine standard product and customer dimensions. The NECJ optimization does not contribute if all the customers have non-zero values for all the products (for the current slicing). It is clear that in most organizations customers usually buy only a subset of their products. Another example would be a crossjoin between customers and customer gender, a crossjoin that is half-empty.

In NovaView this optimization can be set per view. By default, it is disabled and can be turned on by checking the Optimize Huge Crossjoins option in the advanced properties of the grid. Remember to save the view after the change. See NovaView Desktop 4.0 Users Guide Access for more information.

The feature improves performance when the following conditions are large dimensions (and not measures) and are nested in the crosstab.

Calculated members are removed from the NECJ result by design (Microsoft AS). To overcome this limitation you can select the Remove Calculated Members from Huge Crossjoins option in the advanced properties of the grid. When unchecked, Panorama removes dimensions with calculated members from the NECJ calculation and performs a regular crossjoin operation on these dimensions to retain their calculated members. SeeNovaView Desktop 4.0 Users Guide Access for more information.

If still experiencing difficulties with this optimization, you might need to set default real coordinates for special dimensions or perform more customizations specific to the view.

Optimize Sparse Dimensions

There are certain views that can be optimized with the Optimize Sparse Dimensions feature. The improvement is most noticeable when large sparse dimensions are in the grid (without nesting).

To enable this feature, move the dimension to the slicer bar, locate to its properties and enable the Optimize sparse dimension query in crosstab option. For more information, see NovaView Desktop 4.0 User’s Guide.

This feature can be applied per view and per dimension in a view. Remember to save the view after the change.

The mechanism of this optimization is similar to the above-described NECJ optimization. However, the Optimize Sparse Dimension feature does not require nesting. It “filters” the empty dimension members before running the actual view query. Thus, if only few dimension members appear in the resulting crosstab it is unnecessary to pre-calculate all the dimension coordinates, and the dimension can shrink down many times before the main query.

Note: Starting from NovaView 4.0 SP1 and HotFix5, this feature is upgraded with new special optimization mechanism. Contact Panorama support for more details.

Pre-calculating Calculated Members

This optimization method replaces the OLAP on-the-fly calculations with SQL pre-calculated data. Not all the calculations can be pre-calculated on the relational database level, as some require a ready multidimensional structure. However, many organizations do simple calculations such as adding measures or multiplying measures by some ratio. All these calculations are performed on the fly per each query execution. That means that every user requesting data for a calculated member asks the system to perform the calculations from scratch.

Calculated members can negatively affect performance in the following ways:

  1. The cache is not effective for calculated members.
  2. Calculated members are not retained when using the NECJ optimization.
  3. Calculated members require additional care from the developer such as controlling the solve order.

For these reasons in many cases adding some columns/rows to the relational database that replace OLAP calculated members, can significantly improve performance.

This optimization may be implemented differently for calculated measures than for calculated dimension members.

The following examples are provided:

  1. Measures - M1 and M2 are elements in the fact table. M3 is the sum of M1+M2. Add a new column to the fact table and store the pre-calculated sum there. Define a new measure in the cube.
  2. Members - A and B are two members in the DIM dimension. C is the sum A+B. Add a new row for C in the DIM dimension table and run a procedure to add pre-calculated data for all additional measures rows in the fact table. Note - The fact size table should be considered in this case and the performance gain compared to storage used and to the cube process time that will eventually increase.
    This method of optimization requires a considerable amount of work, so performance gain should be analyzed for some sample set of members and the return value considered before proceeding to converting all the calculated members.
     
    Free business joomla templates