Where is Your Data?

Over the years, I have noticed a rather disturbing trend in businesses both large and small: while sales and operational data is often stored in databases, forecast data is usually maintained in spreadsheets. Good forecasting is critical to an organization's ability to deliver goods or services, and the failure to forecast properly can be disastrous for an organization of any size. If we produce too much product, we waste money. If we produce too little, we run the risk of disappointing, and losing, our customers. We have to forecast demand as accurately as possible for every product or service that we offer.

Part of refining the forecasting process involves comparing forecasted results to actual results. If the forecast was significantly wrong, we need to find out why, and use that information to improve the process next time.

Nearly every organization that I have worked with has an interest in reporting actual results compared to forecasted results. That’s the good news. The bad news is that, with forecasted results stored in spreadsheets, comparing the two becomes challenging. Why is this so? Isn’t it pretty simple to build reports using data from databases, and data from spreadsheets, at the same time? Yes, it is. But let’s look at reality.

Let’s consider an organization that has ten different product lines, and a manager for each line. Each of the ten managers does their own forecasting, in their own spreadsheet, with their own formatting. The reporting team has to pull data from ten different spreadsheets into a single report, and compare the results with actuals retrieved from a database. However, each of the ten spreadsheets is formatted differently, and the data in them is not always in the tabular format that the report needs.

So the reporting team is tasked with reformatting the spreadsheets so the data can be brought into the reports. And each month, they go through the same process, reformatting the spreadsheets all over again. Even if they can convince each manager to use the same format, which would be a miracle, what’s to keep one manager from changing his or her mind down the road, and changing the format again?

I can tell you from experience, reporting from spreadsheets is a challenging task, at best.

Considering the importance of forecasting, I can’t help but wonder why it is still done with spreadsheets. Why isn’t it done with forecasting applications that store the data in a database? In the long run, the cost of a forecasting application would be far outweighed by the savings from eliminating the maintenance of all those spreadsheets.

Yes, there are plenty of organizations that store forecasted data in databases. These are the ones who have already seen the value of doing it right. It’s a pleasure to build reports for them. But they are few and far between.

Where does your organization store its forecast data? Tell me what you think.

Improving Query Performance (At the Database)

I was recently assigned to a project at a company with a rather large BOBJ deployment. Their production environments contained well in excess of 100,000 Web Intelligence documents, all running against a rather sizable data warehouse. My job was to reduce the burden on the data warehouse: tens of thousands of queries were hitting it on an average day. If we could reduce the CPU time consumed at the database, we could postpone the eventual need to expand the data warehouse hardware.

I spent a few weeks examining the environment, the universes, and some of the slower documents. I also spoke with some of the bright people there, and eventually came up with some ideas for improving query performance. Note that my focus was reducing CPU time on the database, so I wasn’t specifically looking to improve report performance, although some of these ideas will have that effect anyway. There may also be other ways to improve report performance that will have little or no effect on database performance.

Query Stripping

No, I’m not talking about having a bunch of naked queries. I’m talking about stripping out everything that isn’t needed in your queries. Oftentimes, we find that Web Intelligence documents are not developed from scratch. Existing documents are copied and reused: users will take an existing report, copy it, then add what they need to make a new document. The problem is that, while they add new objects and/or queries, they don’t always remove the old objects and queries that the new document doesn’t need.

The end result is a document that contains far more queries than are needed, as well as more granularity of data than is needed.

Web Intelligence XI 3.1 SP3 contains a new feature called Query Stripping. You’ll find it as a check box in the document properties. When enabled, this feature will automatically remove from the SQL sent to the database any queries and/or objects that do not directly contribute to the report(s) in the document. This is a cool feature. However, it is only available when using universes built against OLAP cubes. The vast majority of universes are built against relational databases, and this feature won’t work for those universes. 😦

BI 4.0 doesn’t have this limitation: Query Stripping should be available for all data sources in BI 4.0. Until you upgrade, though, you’ll have to strip your queries manually. This can be tedious, but it may be worth the effort. Since we have to do this by hand, let’s look at the things that will get us the biggest bang for our buck.

Remove any unneeded queries. The fewer queries we run, the less burden we place on the database. So, purge and delete any queries that are no longer required for the report(s).

Remove any unneeded measures. As you may know, if a query contains measures from multiple tables, Web Intelligence will split it into multiple SQL statements, one for each table that contributes a measure, before sending them to the database. This is done to avoid SQL traps, such as fan traps, that can yield exaggerated results.

Therefore, removing a single measure object from a query can eliminate an entire SQL statement sent to the database, provided that no other measures come from the same table. Reducing the number of statements sent to the database not only reduces the burden on the database, it also improves overall query time for the document.
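As an illustration, suppose a query returns revenue and returned quantity by year, with the two measures coming from different fact tables. The table and column names below are just placeholders, but the generated SQL would look something like this, with the results synchronized locally:

-- Query 1: the revenue measure comes from the SALE table
SELECT CALENDAR.YEAR, SUM(SALE.SALE_REVENUE)
FROM SALE, CALENDAR
WHERE SALE.CAL_ID = CALENDAR.CAL_ID
GROUP BY CALENDAR.YEAR

-- Query 2: the returned quantity measure comes from the RETURN_FACT table
SELECT CALENDAR.YEAR, SUM(RETURN_FACT.RETURN_QTY)
FROM RETURN_FACT, CALENDAR
WHERE RETURN_FACT.CAL_ID = CALENDAR.CAL_ID
GROUP BY CALENDAR.YEAR

Remove the returned quantity measure from the query panel and the second statement disappears entirely; only the first is sent to the database.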

Schedule Purging

Another problem comes with time: documents that are scheduled to run on a recurring basis often continue running long after they are no longer needed. Part of the cause is that, by default, scheduled documents will run for ten years before they expire. I can’t imagine very many documents remaining useful for that long.

You can use Auditor to find documents that are no longer being viewed. Scheduled documents that are no longer used are still running their queries, and, therefore, are adding overhead to the database. Removing these schedules will free up space in the File Store, as well as free up resources on the database server.
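As a starting point, a query against the audit database along these lines can surface candidates. The table and column names below are simplified placeholders, and the date arithmetic is Oracle-style, so adjust everything to match your own audit schema:

-- Documents that have been viewed at some point, but not in the last 12 months (placeholder schema)
SELECT EVT.OBJECT_NAME, MAX(EVT.START_TIMESTAMP) AS LAST_VIEWED
FROM AUDIT_EVENT EVT
WHERE EVT.EVENT_TYPE = 'Document Viewed'
GROUP BY EVT.OBJECT_NAME
HAVING MAX(EVT.START_TIMESTAMP) < ADD_MONTHS(SYSDATE, -12)

Cross-check the resulting list against your recurring schedules before removing anything.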

While reviewing scheduled documents, also take a look at Publications. These may be harder to track down, as Auditor won’t tell you whether someone opened the email they got with the document attached. In this case, you will have to survey the recipients and ask if they still use the documents they receive. If no one responds to say that they use it, feel free to remove the schedule. You can always add it back later if someone complains.

Index Awareness

This is a feature of the universe, and can be used to speed up queries significantly, depending on your database, and the types of queries being written. It can speed up queries in several ways:

  • It can remove joins from the SQL
  • It can remove tables from the SQL
  • It can search indexed, rather than non-indexed, columns

How does it work? Well, if a filter is added to a query that searches for a value in a non-indexed column, Index Awareness will edit the SQL to search for a matching value in an indexed column. In the example below, we’re searching for sales revenue for a specific client. Without Index Awareness, the SQL matches what is in the query panel.
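Using illustrative CLIENT and SALE tables (and an example client name), the generated SQL looks something like this:

-- Filter on Client Name, a concatenation of two columns, so the CLIENT table and its join are required
SELECT SUM(SALE.SALE_REVENUE)
FROM SALE, CLIENT
WHERE SALE.CLIENT_ID = CLIENT.CLIENT_ID
  AND CLIENT.FIRST_NAME || ' ' || CLIENT.LAST_NAME = 'Jane Smith'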

First of all, notice that Client Name is a transformation. It’s the concatenation of two columns in the database. Searching a transformation is never going to be fast, as it cannot be indexed in the database. Also, notice that the CLIENT table is in the SQL, and it is joined to the SALE table. With Index Awareness enabled, the same query looks like this.
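Again with illustrative names, the rewritten SQL looks roughly like this:

-- The filter now uses the indexed Client ID stored in the LOV, and the CLIENT table is gone
SELECT SUM(SALE.SALE_REVENUE)
FROM SALE
WHERE SALE.CLIENT_ID = 1234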

Instead of searching on Client Name, the SQL now searches on Client ID, which is an indexed column. In addition, since the Client ID exists in the SALE table, the SQL no longer needs the CLIENT table at all. It can remove the CLIENT table, as well as the join to it. This query will run much faster.

Index Awareness is enabled in the object, in the universe. In this case, Client Name is defined with Index Awareness. On the Keys tab of an object, a Primary Key can be defined, as well as one or more Foreign Keys.
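For the Client Name object in this example, the key settings would look something like this (again, the column names are illustrative):

Select (object):   CLIENT.FIRST_NAME || ' ' || CLIENT.LAST_NAME
Primary key:       CLIENT.CLIENT_ID   (Enabled)
Foreign key:       SALE.CLIENT_ID     (Enabled)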

The Key values are then stored in the LOV for the object. When a user selects a value from the LOV, the SQL will switch the value to the indexed column. Of course, if the user does not select from the LOV, but types in the value instead, Index Awareness is not used. So you will need to train your users to select from the LOV.

Index awareness works best when used on a transactional database, or a snowflake schema. Star schemas are usually already optimized for queries, so you may not get as much improvement with a star schema.

Pushing Transformations

Complex SQL transformations are often seen in universe objects; they are an easy way to get the data to appear the way we need it for reporting. In the previous example, Client Name was a simple transformation. Here’s an example of a complex one:

'FY' || substr(trim(( FISCAL_CALENDAR.FISCAL_YEAR*100+FISCAL_CALENDAR.FISCAL_QUARTER)),3,2) || 'Q' || substr(trim(( FISCAL_CALENDAR.FISCAL_YEAR*100+FISCAL_CALENDAR.FISCAL_QUARTER)),6,1)

This type of transformation can be quite costly in terms of query time. I recently ran a test with a similar transformation, running a query with and without the object that included it. Without the object, the query ran 14% faster. Of course, your mileage may vary, as there are many factors that come into play in this type of test. But be assured, if you can push this type of transformation down to the ETL process, your queries will run faster. Keep in mind that this transformation must be performed on every row of the results, and if the object is commonly used in reports, it could be hitting the database hundreds, or thousands, of times a day.
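For instance, the fiscal label above could be computed once as part of the load into a new column (the FISCAL_YEAR_QTR name here is just an example), and the universe object reduced to a plain column reference:

-- Populate the label once per row during the ETL load (example column name)
UPDATE FISCAL_CALENDAR
SET FISCAL_YEAR_QTR = 'FY' || substr(trim((FISCAL_YEAR*100 + FISCAL_QUARTER)),3,2)
                   || 'Q' || substr(trim((FISCAL_YEAR*100 + FISCAL_QUARTER)),6,1);

-- The object's Select then becomes simply: FISCAL_CALENDAR.FISCAL_YEAR_QTR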
So, if possible, let the ETL process handle those complex transformations.

Conclusion

These are just a few ideas for improving performance. We could come up with more ideas that lighten the load on the BOBJ server, or within the documents themselves. Improving performance will not only make your users happy, but will ultimately save money, as you can avoid expanding your database servers and BOBJ servers unnecessarily.

What other ideas have you implemented to improve performance? What results did you get?