Universe Design Best Practices

I’ve always avoided writing a blog on universe best practices, primarily because there have been many others who have written some excellent blogs on the topic. However, I have had quite a few requests to write about it, and have finally agreed to compile my own list of Best Practices for universe design. After all, the universe is one of the best advantages SAP has over other BI tools. So, a well thought out, well built universe is key to a successful deployment.

SAP BusinessObjects includes two tools for building and maintaining universes: Universe Design Tool (the legacy tool formerly known as Designer) and Information Design Tool (the newer tool, first released with version 4.0). These Best Practices will, for the most part, apply to both tools. Where they don’t, I’ll be sure to point that out.

I categorize all Best Practices as follows:

  • User Friendliness: These are practices that will make a universe easier for end users to use. After all, what we do is for their benefit. Anything we can do to make the universe easier for the users, we should do. Keep in mind that happy users will say nice things about you to your boss. 😉
  • Designer Friendliness: At some point, another developer will take over the universe(s) that you have built. Let’s build our universes in such a way that they won’t have to reverse engineer everything we did, just to figure it out.
  • Database Friendliness: We want our universes to return the correct results, as fast as possible. So, it’s best to design them so that they generate the most efficient SQL for the database.

User Friendliness

A universe is only as good as the users think it is. No matter how brilliantly I design a universe, if the users don’t like it, it’s not a good universe. In fact, good universe design is critical to the success of the entire BusinessObjects deployment. If users don’t like the universes, they will abandon the system, and get the data they need some other way. But the more they like the universes, the more they will use them, and the more they will promote them among their co-workers.

Who is the best person to build a universe? Perhaps a Database Administrator (DBA)? Actually, while a good DBA will be critical to the process, they don’t speak the same language as the users. So, while they can build a universe that is technically excellent, the users won’t be able to figure it out. I usually prefer to have a Business Analyst, who knows SQL, be the one to create the universe. The universe tends to be more user friendly this way. But, in order to do a good job, you must involve other people. So, with that in mind, here is a list of Best Practices for making your universes user friendly.

  • Before you start building the universe, gather a team of power users, perhaps 3 – 5 people, who will assist you in making the universe user friendly. Throughout the process, they can review your design and make sure that folders, objects, descriptions, etc., make sense to them. As an added benefit, these power users will be your marketing team when the universe goes into production.
  • Develop and use a naming convention for everything that the users see. Universe names should reflect the name of the system from which they draw data. Folder and object names should be user friendly. Never use underscores in anything the user sees. And never merge words together, such as PhoneNumber. Put a space between the words. And don’t use abbreviations except where it makes sense to users. For example, use Phone Number, rather than Phone No.
  • Objects should be organized in folders in such a way that it makes sense to the users. Design your folder structure ahead of time, and have your Power User team verify it for you. Do not create a folder for each table. Users don’t care about tables.
  • Within a folder, objects should be arranged alphabetically or hierarchically, depending on the needs of the users. You want it to be as easy as possible for the users to find the objects they need.
  • Never have more than three levels of folders. If users have to open more than three folders to get to the objects, they will get very frustrated. Use separator objects, rather than folders, to organize objects, especially when there are only a few objects. What is a separator object? It’s an object that doesn’t serve any purpose other than to provide visual separation between groups of objects.
    If you create separator objects, make sure that they cannot be used in Results, Conditions, or Sorts.
  • Never create a folder for one object. Find another place for it.
  • If the same object appears in multiple universes, use the same name for that object in all universes. For example, don’t call it Client in one universe, and Customer in another. If it’s the same thing, call it the same thing.
  • Never use the same name for multiple objects within the same universe. If, for example, you have Year in multiple folders, which is probably a bad idea to begin with, give them different names, like Fiscal Year and Calendar Year.
  • Make sure that the universe has a user friendly name and description. Check with your Power User team, to make sure that these make sense to them. And remember, no underscores.
  • For date objects, never, ever create Automatic Time Hierarchy objects. Take the extra five minutes to create them the correct way (see the sketch after this list). If you don’t know what Automatic Time Hierarchy objects are, don’t worry. You don’t need to know. These objects were a bad idea to begin with. In fact, IDT doesn’t even have this feature.
  • Only create objects that are needed for reporting. Don’t create an object just because there’s a column in a table. If it isn’t required, don’t create it. This will keep the universe smaller, and easier for the users to use. You can always add more objects later, as the need arises.
  • Make sure that your contexts have user friendly names and descriptions. And make sure that users aren’t prompted to select a context, unless you want them to have the option to select a context. You can add tables to objects to force them through a specific context. I find this very useful.
  • Lists of Values (LOVs) can be very helpful, or very painful, for the users. Make them as user friendly as possible. Build them from dimension tables, not fact tables. If it’s a large list, use Database Delegated, so the users can easily search for what they need without having to scroll through a long list. Use a static LOV for lists that never, or rarely, change.
  • Never use default hierarchies/navigation paths. They rarely make sense. Create at least one custom hierarchy. If you don’t, the default hierarchies will be made available.
  • Every object, without exception, should have a description. Make sure that the descriptions are approved by your Power User team. For measure objects, include the calculation in the description. Be thorough, but not verbose. Believe it or not, verbose object descriptions can affect query performance. Don’t ask why. I don’t know. But I’ve seen it with my own eyes, and confirmed it with SAP.
  • Format all numeric and date objects in the universe. Of course, users can change the format in the report, if they wish. But if you do it in the universe, users won’t have to reformat the data every time they build a new report.
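For example, instead of an Automatic Time Hierarchy, a handful of hand-built date-part objects will do the job. Here is a minimal sketch, assuming an Oracle-style database and a hypothetical SALE.SALE_DATE column (adjust the functions for your own database):

Year:    TO_CHAR(SALE.SALE_DATE, 'YYYY')
Quarter: 'Q' || TO_CHAR(SALE.SALE_DATE, 'Q')
Month:   TO_CHAR(SALE.SALE_DATE, 'YYYY/MM')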

Designer Friendliness

As a universe moves from one developer to another, we want to make sure that there is sufficient documentation within the universe, so that it’s as easy as possible for a developer to understand the universe. To that end, here is a list of things you can do to make your universes more Designer Friendly.

  • Keep the Structure Pane (UDT) or Data Foundation (IDT) well organized and clean. I like to place fact tables in the middle, and dimension tables to the left or right. I just find it’s easier to identify fact tables if I do that consistently. Don’t have tables overlapping each other. Try to keep tables that are joined together, close to each other. It just makes it easier to see the relationships.
  • If you create alias or derived tables, put comments next to them to explain their purpose.
  • Use a standard naming convention for Alias and Derived tables. I like to prefix alias tables with A_ and derived tables with DT_. This makes them easier to identify. And never use spaces in Alias and Derived table names, or columns. It can make SQL code tough to read.
  • When you create derived tables, write the SQL in such a way that it’s easy to read and understand.
  • In UDT, LOVs should be named the same as the object to which they belong. This makes troubleshooting much easier.
  • There is a comments box in the universe that only developers can see. Use it. It’s a great place to keep notes about the universe that other developers can use. You can use it to explain who built the universe, and why things were done the way they were. Be verbose. You can put a lot of information in it.
  • Create a folder in your universe called Universe Release Notes. In that folder, create an object for each release of the universe. The name of the object will be the release date. The description of the object will be the notes for that release. Make sure the entire folder is hidden, so users can’t see it.
  • Always set cardinalities on your joins. It makes it easier to see how tables relate to each other. It also makes the detection routines work more accurately.
  • Only use Derived Tables when necessary. In fact, use them as a last resort. Often times, there’s a better way to do it.

Database Friendliness

One of the most important things we can do for our users is to make sure that we are creating the universe in such a way that it generates the most efficient SQL for the database. The users will appreciate faster running queries, and so will the DBAs. So, keep these best practices in mind when building universes.

  • Don’t use Aliases to avoid creating contexts. Aliases and contexts have different purposes. Contexts, if properly built, won’t help your queries run faster, but they will cause less confusion for the users.
  • Every measure object must have an aggregate function in the Select. No exceptions. If it doesn’t get aggregated, don’t make it a measure.
  • Use Index Awareness wherever applicable. This will allow the queries to search indexed columns, and can sometimes reduce the number of joins and tables in the SQL.
  • Set the ANSI92 parameter to Yes. This will generate more efficient SQL for modern databases. It will also allow the use of full outer joins in the universe. (This and the following parameter settings are summarized in the sketch after this list.)
  • If your database supports it, set the FILTER_IN_FROM parameter to Yes. This will move filters from the Where clause to the From clause and may eliminate some of the processing time at the database.
  • You may also get improved performance by setting the JOIN_BY_SQL parameter to Yes. However, this needs to be tested with some of your largest queries, as it can consume large amounts of database resources in very large queries.
  • In the connection, set the Array Fetch Size to at least 250. But test it at higher levels as well. Depending on the database, and network bandwidth, you could go as high as 1000 and get better performance.
  • Enable Query Stripping in IDT and Web Intelligence. This will remove unused objects from the SQL as it is sent to the database.
  • In the SQL for objects, keep the transformations to a minimum. For example, only use CASE statements when they are really needed. The more you do, the slower the queries will run.
  • Avoid clusters of tables that aren’t joined together. If objects shouldn’t be used together in a query, they should be in separate universes. Otherwise, you risk Cartesian Products.
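To recap the settings above in one place, here is how they might look for a typical universe (ANSI92, FILTER_IN_FROM, and JOIN_BY_SQL are universe parameters, while Array Fetch Size is set on the connection; treat these as starting points and test FILTER_IN_FROM, JOIN_BY_SQL, and the fetch size against your own database):

ANSI92 = Yes
FILTER_IN_FROM = Yes        (only if your database supports it)
JOIN_BY_SQL = Yes           (test with your largest queries first)
Array Fetch Size = 250      (test higher values, up to 1000)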

Conclusion

When it comes to user friendliness, the bottom line is this: If there’s anything that I can do in a universe to make it easier for the users, I should do it. If it takes me an hour to do something that will save 5 minutes for a user, then I will do it. If it saves 5 minutes for one user, it will save 5 minutes for many users. It will be worth the effort to have happy users.

As for designer friendliness, anything I can do to help the next developer understand the universe, I should do. It will also help me when I come back to this universe. It’s good to keep notes in a universe that will refresh my memory as to why things are done the way they are.

And, of course, a universe that performs well will gain appreciation from the users and DBAs alike.

I hope you have found this post to be useful. Feel free to share additional tips in the comments. I love learning from my readers.

Review of Information Design Tool 4.1

It’s been a while since I wrote my initial review on Information Design Tool (IDT) in SAP BusinessObjects 4.0. If you’ve read that post, you know that I was frustrated with missing features in IDT, as compared to Universe Design Tool. Every time I found a missing feature, I opened an Idea on the SAP Idea Place, requesting that the missing feature be added. Well, as it turns out, I was the top contributor for new ideas for IDT 4.1. And nearly all of my requests were delivered in version 4.1. So, I’ve been using IDT 4.1 for quite some time now, and am ready to give my impressions.

The Good News

IDT has some really nice features that I find very helpful when doing universe development.

  • Families: Tables can be grouped into families, and each family can have its own color. I will typically have a color for each fact, and a color for conformed dimensions. Aggregate tables will also have a color. I find it helps to be able to visualize the tables in this way.
  • Multiple Layers: I really like the idea of creating a Data Foundation, then one or more Business Layers on top of it. It avoids the old way of recreating the table structure in multiple universes.
  • Business Layer Views: Once you create a Business Layer, you can create views, in which you choose which folders and objects to expose in each view. You can also create security profiles, allowing groups access to limited views.
  • Standalone Lists of Values and Prompts: Prompts and LOVs can be created once, as standalone objects, and then used in multiple objects throughout the universe. These can be created in the Data Foundation or the Business Layer.
  • Table associations: This may seem minor, but I love being able to see what tables are associated with an object without opening a separate window.
  • Queries: Queries can be built in the Business Layer to test the objects and joins in a universe. This eliminates the need to use another tool, such as Web Intelligence, to test as you go.
  • View Data: We’ve always had the ability to view the data in a table or column, but now we can view the data in multiple columns from different tables. This is a big help when troubleshooting the Data Foundation.
  • Calculated columns: These are columns that can be added to tables, using SQL calculations. This allows you to add a calculated column to a table, and then use it in multiple universes.
  • Bug Free: Well, not really. But compared to IDT 4.0, it’s a huge improvement.
  • Multi-Source Universes: We now have the ability to create a universe from multiple sources. We can create a multi-source universe (MSU) and include tables, for example, from Oracle and Teradata, and join them together.

The Bad News

There are still some things that I struggle with. Perhaps it’s because I’ve spent so much time in Designer, or perhaps it’s just poor design. You decide.

  • Contexts: These aren’t as simple as advertised. In Designer, each join has one of two states, relative to a context. It is either in, or out, of the context. In IDT, it has three possible states. It can be included, excluded, or neutral. My experience is that this makes contexts significantly more complex. I worked on one universe that had 65 aggregate tables, plus two transaction fact tables. Setting up the context became so complex and problematic, that we ended up rebuilding the universe in UDT.
  • Screen Layout: I haven’t gotten used to the way the screen is laid out. I got used to Designer, with its individual pop-up windows. IDT doesn’t have these, so you get less screen real estate. And, it doesn’t allow anything to be undocked and moved to a second monitor.
  • Custom Properties Tab: You’ll see this on folders, objects, and a few other things. It’s useless. It’s there for whenever we get an IDT SDK. So, in the meantime, there’s nothing you can do with it. So it should be hidden. Of course, that’s just my opinion.
  • Alias Table Names: Again, this may seem minor, but it’s a major inconvenience. When you have an alias of a table, the Alias name, and the original name, are next to each other in the table header, and don’t wrap. So the width of the table in the Data Foundation must be sufficient to fit both names. This can be very annoying, especially with long table names. In UDT, the original name is below the Alias name, so the table doesn’t have to be so wide.
  • Multi-Source Universes: Why is this listed here, as well as under Good News, above? Well, when you create a universe, you must select whether that universe will be a Single-Source, or Multi-Source, universe. Once you make that selection, you can never change it. And, if you choose Multi-Source, all queries will be routed through the Data Federation Engine, even if the universe actually has only one connection. This can slow down performance of that universe.

Should you convert your universes?

That’s a good question. I’m glad you asked. At this point, there is little benefit to the users in converting your old UNV universes to UNX universes. The only reasons you might convert them is for the benefit of universe developers who want to take advantage of some of the features, such as families, or standalone LOVs, or if you need to use these universes with products that don’t accept UNV universes, such as Lumira.

You might wonder if you should convert old universes, so you can turn them into Multi-Source universes. When a UNV universe is converted to a UNX universe, it is converted as a Single-Source universe. And remember, you cannot change a Single-Source universe into a Multi-Source universe. So, don’t convert UNV universes with the hope of making them Multi-Source. It can’t be done.

The last I heard, SAP had promised to support UNV universes until at least 2019. So, there’s little compelling reason to convert them. And if you’re comfortable working in UDT, feel free to continue. However, new universe development should be done in IDT, as much as possible. All BusinessObjects tools can access UNX universes.

Conclusion

Although IDT still has some issues, I like it. IDT 4.1 is a huge improvement over IDT 4.0. Yes, it still has bugs (So does UDT), and it still has room for improvement, but it’s getting better all the time. Let me know your opinion.

Impact of a Database Migration

For the past few years I’ve been involved in quite a few “forklift” projects. What is a forklift project? I’m glad you asked. When a company decides to move their data warehouse from one database platform to another by simply moving, or copying, all the existing tables into the new database, this is called a Forklift. The tables are simply lifted from one database, moved to the new database, and set down there. The data model isn’t changed. That’s a whole other project. But oftentimes, some things do change in a forklift project. Data types may be different between two different databases. Or one database may have different reserved names than another, causing table names, or column names, to be changed. But, for the most part, everything stays the same.

My role in the project is to convert the BusinessObjects universe(s) and reports to the new database platform. I’ve done this enough times, that I’ve developed some standard templates to help me make sure that I don’t miss anything. I don’t want to get to UAT (User Acceptance Testing) and find out something doesn’t work that I forgot to check.

So, what happens in a Forklift project? Well, from a BusinessObjects perspective, we would prefer that the database structure be complete before we start development. This includes the data access layer, if needed. It would be nice to have some data in the tables during development, but we can get started without data.

Documentation

Sometimes we like to treat documentation like it’s a bad word. After all, it’s so much faster to just do the work, and skip the documentation. But trust me on this: In the long run, documentation will be absolutely necessary. You need to know where you were, and what you’ve done, in case you ever need to revert, or fix, anything in a universe or report.

I always start with a workbook, with a spreadsheet for each part of the universe. I create one workbook for each universe that will be converted. I have a template that I use for each project. The workbook has eight sheets in it, as follows:

  • Parameters: It’s not uncommon to change universe parameters, in order to optimize the universe for the new database. On this tab, we document each parameter that is changed.
  • Tables: Most of the time, the database/owner name will change for each table. And sometimes, table names will change. On this tab, we document the old database/table names, and the new database/table names.
  • Derived Tables: The SQL in each Derived Table will need to be converted from the old database to the new database. On this tab, we document the old SQL and the new SQL. Note that, for very large, or complex, Derived Tables, we may have to use a Text application, such as Microsoft Word, to document the old and new SQL.
  • Objects: On this tab, we document the old and new SQL for each object that needs to have the SQL changed. We don’t document objects that don’t need to be changed.
  • Joins: Any joins that have database functions may need to be modified to work with the new database. On this tab, we document the old and new SQL only for the joins that are changed.
  • Parameters: This tab is only used with UNX universes, to document any changes that need to be made to parameters, if any.
  • Lists of Values: If there are Lists of Values (LOVs) with custom SQL in them, they may also need to be converted. On this tab, I only document the LOVs that have been modified.
  • Other: This is just a tab to document any changes that don’t fit into the other tabs. It might include things like connection and/or universe name changes.

I can’t emphasize enough the importance of documentation. When you start testing reports, if the values don’t match from the old universe to the new universe, you may need to re-convert the SQL from old to new. Having this documentation will make it easier to check your work. Also, once you have converted snippets of code, you may be able to re-use them with the next universe, without having to convert them again.

Converting the Universe

OK, so you have the documentation ready, so now it’s time to convert the first universe. It would be best if you can freeze development on the universe at this time, so you’re not trying to convert a moving target. A very simple universe can be converted within an hour. A very complex universe may take a week or two to convert. Be prepared, and set expectations accordingly.

I also like to set up a temporary sandbox environment where all universes and reports will be converted. If this isn’t possible, it can also be done in the Development environment. Before you start the steps below, make sure you create a connection to the new database, and make sure it works. Make sure that the account you are using to connect to the database has read permission for all the database objects that will be needed for the universe.

Following are the steps that I use to convert a universe:

  1. In the Sandbox/Development environment, create a copy of the universe. Give it a new name, that reflects the target database.
  2. Open the new universe in Universe Design Tool (UDT) or Information Design Tool (IDT), depending on the type of universe.
  3. Change the connection for this universe to the connection for the new database. Make sure that you test the connection.
  4. Change any parameters that need to be changed, in order to optimize the universe for the new database. Document your changes.
  5. For each table in the universe, change the owner and/or table name, as needed.
  6. Go through each Derived Table, and update the SQL for the new database. If needed, get help from a DBA, so the SQL is written for optimal performance for the database.
  7. Run an integrity check, parsing objects, joins, and conditions. This will tell you what objects may need to be updated with new SQL.
  8. Go through each object, join, and condition, that produced an error in the integrity check. Update the SQL, as needed. Again, make sure that the SQL you write is optimized for the database.
  9. Look for LOVs that might have custom SQL, and modify as needed. If an LOV is marked to “Export with universe”, it may have custom SQL.
  10. In a UNX universe, look for parameters that may need to be modified, and fix, as needed.
  11. Save and export the universe.

Reports that are associated with this universe will need to be converted from the old universe to this new universe. Once you have done that, test the reports to make sure they run successfully. If they throw errors, work to find out why, and fix those issues. It’s a good idea to test the same reports against the old and new universes. The data may not match at this point, depending on what database you are connected to (Development or Production), and the status of loading data into the new tables. I’ve had times when I tested a report against the new universe, only to get errors, then later found out that the same report got the same errors against the old universe. In that case, just leave it alone.

Converting Reports

Do you have reports with custom SQL in the queries? If so, you will need to convert those queries manually to the new database. Just like you did with the Derived Tables in the universe, you will want to get help from a DBA to convert the SQL, making sure that it’s optimized for the database. Large, complex SQL queries may be tricky to convert, so take your time, and make sure that it is converted correctly. If the query is very complex, I like to break it down into small pieces, and convert each piece, one at a time.
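For example, a query written for Oracle may use functions that don’t exist, or behave differently, on the target platform. A hypothetical Oracle-to-Teradata conversion of a small snippet might look like this (the table and column names are made up, and the exact functions available will depend on your database versions):

Original (Oracle):
Select C.CUSTOMER_ID,
       NVL(C.REGION, 'Unknown') As REGION,
       Add_Months(Trunc(SysDate, 'MM'), -1) As PRIOR_MONTH_START
From CUSTOMER C

Converted (Teradata):
Select C.CUSTOMER_ID,
       Coalesce(C.REGION, 'Unknown') As REGION,
       Add_Months(Current_Date - Extract(Day From Current_Date) + 1, -1) As PRIOR_MONTH_START
From CUSTOMER C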

Conclusion

Once you have all the universes and reports converted to the new database, you’re ready to start User Acceptance Testing (UAT). The primary goal of UAT is to make sure that reports return the same results against the old and new databases. If the results are different, don’t assume that the problem is with the universe. Make sure that you check the database, as well, making sure that the data matches between the old and new databases. If it does, then check the universe, and make sure everything used for that report was converted correctly. Make sure that you update your documentation each time you make a correction in the universe.

You’re all done. It’s time to relax in the sun with your favorite drink. That is, until the next project comes along.

Optimizing Universe Lists of Values

Lately, I’ve seen a lot of heartache over universe Lists of Values (LOV). Lists of Values can be a great benefit to users, as well as your worst nightmare for developers and DBAs. So, I thought it might be handy to document some of the best practices for keeping your LOVs user friendly and performing well. Let’s start by going over the various options for LOVs, just so we have a good understanding of what is available, and what makes sense.

There are two times when a user may see a list of values for an object. When they are building a filter in a query, based on a universe object, they may want to choose values for the filter from a List of Values. Or, when a prompt is applied to an object in a query filter, the user may see a LOV at run time when the prompt window appears. Each object in a SAP BusinessObjects universe has the below options available for LOVs. Make sure you put some thought into these options for every object in a universe.

In Information Design Tool (IDT) LOVs can be created in the Data Foundation or in the Business Layer, as standalone objects. If you create the LOV in the Data Foundation, it can be used with any object, in any Business Layer built on that Data Foundation. You can create a LOV using custom SQL, or by importing or creating a static list.

If you create the LOV in the Business Layer, you can use that LOV with any object in that Business Layer only. However, in addition to creating a LOV with custom SQL, or making a static list, you can also create a LOV using a query panel. We’ll talk about static lists later.

Associate a List of Values

This check box determines if the object will have a List of Values. If this box is not checked, the users will not be able to choose from a list when selecting values in a query filter or query prompt. Many dimension or detail objects will need a LOV. However, there are many objects that shouldn’t have a LOV. For example, measure objects don’t need LOVs, as it doesn’t make sense for them. Date/Time objects should not have LOVs. In fact, Date objects usually won’t need a LOV, as the user can select from the calendar widget, or type in the date they want.

What about objects with very large LOVs? For example, SKU Number could have millions of distinct values. So it may not make sense for that object to have a LOV. However, there may be a way to provide a LOV for SKU Number, while keeping the list manageable. Stay tuned.

Allow users to edit this list of values

This is an amusing one, as it tends to scare many developers. Fear not, this isn’t a bad option. It’s just a somewhat obsolete option. Let me explain. When a user of Desktop Intelligence (Deski) creates a query, the universe and associated LOV files are downloaded and stored locally on the client’s machine. Deski allows users to edit their own copy of the LOV for an object. For example, they can add columns, or filters, to the query panel that generates the LOV, so it only returns the values that interest them. This is a good thing.

Deski is currently the only product that allows users to edit their LOVs. I suspect that eventually Web Intelligence (Webi) will also have this feature. But for now, only Deski. And since Deski doesn’t exist in 4.x, this option is currently obsolete in 4.x. So, there’s no need to un-check this option.

Automatic refresh before use

This is one of those options that can be good, or bad. If you have a small list, that is fairly dynamic, such as Employee Name, it might be good to use this option. However, make sure that the list can be generated in two seconds or less. You don’t want users waiting for large lists to generate every time they use them. If the list is very large, this can cause significant pain for the users. Use this feature with caution.

On the other hand, if the list is fairly static, don’t use this option. For example, a LOV for State Name isn’t likely to change any time soon. So there’s no need to refresh it on every use.

Hierarchical Display

This is another feature that was designed for Deski. If you have a LOV that is based on data that is part of a hierarchy, you can add the other objects of the hierarchy to the query for the LOV, and have them displayed in hierarchical folders. For example, a LOV for City can be displayed in a hierarchy that starts with Country, then State or Region, then City. This feature only works well with Deski. So, if you’re not using Deski, don’t use this.

Export with universe

The LOV is not stored in the universe. It is stored in a separate file with a .lov extension. If you make any edits to the LOV, those edits are also stored in the LOV file. For example, if you have modified the query that generates the LOV, that query is stored in the LOV file. Or, if you have created a static LOV, that data is stored in the LOV file.

So, in order for the users to benefit from the changes you have made, you will need to export that LOV file along with the universe, into the repository. Also, if you have applied Index Awareness to an object (that’s a topic for another post), the indexes are stored in the LOV file. You will need to export that LOV with the universe.

Delegate search/Allows user to search values in the database

This is an option that can be beneficial for large lists. Looking at the earlier example of SKU Number, which could have millions of values, you can use this option to provide the users with a LOV, but without having to generate a huge list.

Here’s how it works. When you turn on this option, and a user attempts to access the list of values for an object, they will see a blank list, and a set of instructions telling them to enter a search string, including wild card characters. For example, a user could enter “XYZ%” and they will get a list of all values that start with XYZ.

This is called Delegate Search because you’re allowing the users to search for what they need, and you are delegating that search to the database, rather than searching the LOV file for specific values.

Note that, in IDT, there is a related option called Force users to filter values before use. This option allows users to enter search values to filter the existing LOV.

List Name

This is the name of the .lov file for this list of values. In earlier versions of BusinessObjects, we were limited to eight characters, and no spaces. But times have changed. You can now create a file name that has more than eight characters, and includes spaces. I like to give it a name that relates to the object, or the data in the LOV. That way, it’s easier to find the file if I ever have to do troubleshooting.

In the new universe tool, Information Design Tool (IDT), LOVs can be standalone objects that can be reused for multiple objects. But did you know that you can use one LOV for multiple objects in Designer as well? For example, let’s say you have an object that will return only the values of Yes, No, or Maybe. So, you generate a static LOV file, and name it YesNoMaybe. Make sure you check the “Export with universe” box. Now, if you have other objects that will return the same three values, enter the same name for those objects’ LOVs, but don’t check the “Export with universe” box for those. You only need to check it for one object. In this way, all objects with the same name for their LOV will share the same LOV file.

You will notice that the default file name is pretty cryptic, especially if the object was created by using the New Object button. If you ever want to return to the default name (I have no idea why you would want to), click the Restore Default button.

Edit

If you want to customize the list for an object, click the Edit button. This will open the Query Panel for this universe. You can use the Query Panel to customize the query, or you can click the SQL button on the Query Panel tool bar to write custom SQL for this LOV.

If you want to create a hierarchical list (discussed earlier), make sure that you leave the current object, as it is, in the Query Panel, and add the rest of the hierarchical objects after that, in hierarchical order. For example, if you want to create a hierarchical list for City, then City must be the first object in the Query Panel. Then, after that, add the rest of the hierarchy from top to bottom. So, the objects in the Query Panel might be as follows, in this order:

City – Country – State – County

I know, that’s counter intuitive, but trust me. It works. Also, make sure that you apply sorts to each of these objects in the query, in the order of the hierarchy.

You can also use custom SQL to create a list with values that don’t appear in the database. For example, maybe you want your City LOV to include an “All” value, in addition to the list of cities. So you might write custom SQL as follows:

Select Distinct City
From dbo.location
Union
Select ' All'
From Dual

Obviously, this is Oracle SQL, but it will work with any database with minor modifications. Notice that I put a space in front of the word “All”. This will make sure that All is at the top of the list. It also enables an old trick that lets users select All from the list to get all cities in their results.

When you edit the query for a LOV, the Export with universe button is automatically checked. Don’t un-check it. 😉

Display/Preview

This button is more important than it may seem. When you click this button, the list is generated, and displayed. In other words, the query for this LOV is run, then the results are displayed. Of course, the query is only run if you haven’t run the query before, or if you have the “Automatic refresh before use” option turned on. Otherwise, this button will simply display the existing list.

So why is this button so important? Well, if you have the “Export with universe” option checked, then you need to generate the LOV file in order to have it exported with the universe. If you don’t generate the file, then there is nothing to export.

Query Limits

In IDT, you have the option of setting limits on the query that generates the LOV. You can set a time limit, as well as a row limit. By default, neither of these is turned on. I would not suggest setting a row limit. There’s nothing more frustrating to a user than a partial LOV.

Static Lists of Values

A static LOV is a list that never changes, unless you take manual action to change it. The most common method of creating a static LOV is to import a file, such as an Excel spreadsheet, or a text file. Depending on which tool you’re using, the method, as well as the options, will vary.

So let’s start with Designer (a/k/a Universe Design Tool). If you want to create a static LOV, you start, of course, by creating a text file, or spreadsheet. Simply create a file with the needed values in a column. Make sure that the first row is the name of the object, or list.

Next, go to Tools – Lists of Values – Edit a list of values. Find the object for which you want to create the static list, and click on it. Then, at the bottom of the window, click the Personal Data radio button.  This will open the Access Personal Data window. Browse for the file. Make sure you select the “First row contains column names” option. Then click Run. This will generate the LOV file from the text file or spreadsheet.

In IDT, you have the additional option of entering the list manually, simply by adding rows and typing in the values. In the Data Foundation or Business Layer, go to the Parameters and Lists of Values section. Next, click the New button at the top of the Lists of Values window, and select Static list of values. If you want to create the list here, click the Add Column button on top to add additional columns, or click the + button on the right to add additional rows. You can change the name of the column by double-clicking on the column header.

If you want to import an existing file, click the Import button, and browse for the file. Complete all the relevant options, including “First row contains column names”. When you click OK, the list is imported.

Applying a LOV to an Object

In IDT, once you create a LOV, you must then apply it to one or more objects. To do this, go to the Business Layer, and select the object. Next, click the Advanced tab, and you will see the LOV options on the right. Make sure that Associate a List of Values is selected. Then, click the little ellipsis button to the right of the field. Select the appropriate LOV and click OK. Note that if you click the X button to the right of the LOV name, it will remove that LOV, and the object will simply use the default “Select Distinct” LOV.

Sourcing a List of Values

Where do you get the data for a LOV? Well, ideally, from a small dimension table. But what happens when you have a dimension object built against a large fact table, with millions, or billions of rows? Well, by default, the LOV will be generated by running a “Select Distinct” against the fact table. This can take a significant amount of time to populate.

So the first thing you want to do is to see if the same values are available in a dimension table. If so, re-point your LOV to the dimension table. Or, if possible, re-point the object to the dimension table. But, at the very least, get the data for the LOV from the dimension table. However, if the data is not available in a dimension table, talk to your DBA, and see if they can create a reference table with the distinct values you need for the LOV. If the list is fairly dynamic, the reference table can be updated with the nightly ETL process.
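As a rough sketch of that approach (the table and column names here are hypothetical, and the exact syntax will vary by database), the DBA might build and refresh a small reference table like this:

Create Table REF_SKU_NUMBER As
Select Distinct SKU_NUMBER
From SALES_FACT;

The LOV for the SKU Number object can then be pointed at this reference table with a simple custom query:

Select SKU_NUMBER
From REF_SKU_NUMBER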

The bottom line is that you want to avoid getting LOVs from fact tables. Users will get very frustrated waiting minutes for the LOV to populate.

Conclusion

As you can see, there’s a lot more to a List of Values than meets the eye. There are many options, and getting them right can make a significant difference for the users. Take the time to evaluate each object, and set the correct options that make sense for that object. Let me know if you have additional tips for Lists of Values.

Simplified Date Prompts

I was recently on a project that involved a lot of work on existing universes. One thing that I noticed with these universes was that they incorporated a lot of clever date prompts, inviting the user to select, or enter, a date range for the query. The List of Values for these date prompts included values such as Yesterday, PriorWeek, MTD, YTD, Prior Month, Prior Year, User Input, etc. If the user chose User Input, they would then enter the begin and end dates for the date range. On the back end, each of these selections was converted to the correct dates, and, using large CASE or DECODE statements, the required dates would be applied as a condition. It was very clever, but presented some problems.

The first problem was that, depending on which filter was used, the date calculations could be slightly different. Not all filters used all options, which is fine, but because the calculations weren’t the same from one object to another, the results could be different. Also, the options could have different names from one universe to the next. This can cause confusion among users.

The second problem was maintenance. If any options needed to be modified, or added, the universe developer would need to find every place in the universe where these date prompts appeared, and make the needed changes. There were often dozens of these prompts throughout a universe.

So, I spent a little time to develop a more elegant solution, that would provide a consistent experience for the users, as well as a much easier maintenance experience for the developers.

Step 1: Create a Derived Table

Ideally, this table would be created in the database, and updated every night through a scheduled script. However, if that isn’t possible, the table can easily be created in the universe as a Derived Table. I call the table DT_DATE_PROMPTS. The table has three columns:

  • USER_CHOICE – This is the value that the users will see in the LOV.
  • BEGIN_DATE – This is the beginning of the date range for the selected value.
  • END_DATE – This is the end of the date range for the selected value.

The table is simply a series of unions, one for each option. I have attached the complete SQL statements for the Derived Table, for both Oracle and Teradata. If you’re using a different database, you can easily modify this to suit your needs.

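As a rough illustration of the shape of this Derived Table (Oracle-style syntax; the option names and date calculations shown are just examples, so adjust them to your own standards and database):

Select 'Yesterday' As USER_CHOICE,
       Trunc(SysDate) - 1 As BEGIN_DATE,
       Trunc(SysDate) - 1 As END_DATE
From Dual
Union
Select 'Prior Month',
       Add_Months(Trunc(SysDate, 'MM'), -1),
       Trunc(SysDate, 'MM') - 1
From Dual
Union
Select 'User Input',
       @Prompt('Begin Date','D',,MONO,FREE),
       @Prompt('End Date','D',,MONO,FREE)
From Dual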

If you want to test this SQL in a tool such as Teradata SQL Assistant, or TOAD, make sure that you comment out the final union, as that won’t run outside of BusinessObjects. However, you can also create the Derived Table in the universe, and use the View Table Values option to test the code as well. It will prompt you for a Begin Date and an End Date. These are for the User Input option, and whatever dates you enter will show on that row.

Feel free to add, or modify, the options in the table, as needed. Note that, when using this table, users will always be prompted for a Begin Date and an End Date, even if they aren’t selecting the User Input option. For this reason, you might want to set a default for those as 11/11/1111. The users can then ignore those prompts unless they want to use that option.

Step 2: Create Aliases and Add Joins

Next, you will create an alias of the Derived Table for each date column in the universe for which you wish to provide date prompts. Then, join the alias to the date column, using a Theta Join. For example, let’s say that you want to create date prompts for the Sales Date column in the Sale table. Create an alias of the Derived Table. In this case, we’ll call the alias SALE_DATE_PROMPT. Then, join the alias to the Sale table, using a Theta Join as follows:

SALE.SALE_DATE Between SALE_DATE_PROMPT.BEGIN_DATE And SALE_DATE_PROMPT.END_DATE

Step 3: Create Filter Objects

The filter objects are now pretty simple to create. You simply need to prompt for which option to choose from the alias table. For example, if we now want to prompt for a date range for the Sale Date, we create a filter object called Sale Date Range with the following code:

SALE_DATE_PROMPT.USER_CHOICE = @Prompt('Select Frequency','A',{'Prior Week (Mon to Sun)', 'Prior Month','User Input'}, MONO,CONSTRAINED)

Notice that, in this case, I have limited the LOV for this prompt to Prior Week, Prior Month and User Input. That’s one of the nice features of this trick. You can prompt for a limited list of values, or all values from the Derived Table. When a user uses this filter, they will be prompted for three values:

  • Begin Date
  • End Date
  • User Choice

Again, they only need to enter actual dates for the Begin Date and End Date prompts if they use the User Input option for the User Choice prompt. Otherwise, they enter dummy dates, such as 11/11/1111 for both.

There you go. You’re now ready to try it out. Drop your new filter into a query and see how it works.

Conclusion

One thing to keep in mind is that Theta joins always have a negative impact on performance. This impact may, or may not, be significant. Work with your DBA to tune the database for optimal performance.

You can create the same Derived Table in multiple universes, thus giving the users a consistent experience across all universes. If you ever need to modify the options available to the users, simply change the code in the Derived Table, and copy that change to all universes. All of the aliases will pick up the same change, of course. So you only have to make one change per universe.

Simple, and elegant. Let me know what you think.

Solving Aggregate Aggravation

The problem with Aggregate Awareness in a universe is that, when setting it up, you can’t skip any steps. Wouldn’t it be great if we could simplify it by skipping a step here and there? Well, it just doesn’t work that way. So, in this post, we’re going to go carefully through all the required steps (There are five of them) in setting up Aggregate Awareness in a universe. We’ll also explore some unconventional uses for the feature. By the time we’re done, my hope is that you’ll have a quick reference of Aggregate Awareness that you will find useful over and over again.

What are the Five Steps to Setting up Aggregate Awareness?

I’ll start by listing the five steps to setting up Aggregate Awareness in a SAP BusinessObjects universe. We’ll dig deeper into each step in the next section.

  1. Insert the aggregate table into the universe and add the appropriate joins.
  2. Create a context for the aggregate table.
  3. Redefine existing objects, using the Aggregate_Aware() function.
  4. Define Aggregate Navigation.
  5. Test.

Don’t even think about skipping a step. It won’t work.

NOTE: These steps are the same regardless of which tool you use.

Digging into the Five Steps

Step 1: Insert the aggregate table into the universe and add the appropriate joins

First of all, what is an aggregate table? An aggregate table is a table that has measure values already aggregated according to the needs for reporting. For example, you may have a SALE table that contains every sales transaction. But for reporting purposes, you need sales summed up by day for each branch. So, your DBA, or ETL developer, creates a script that extracts the transactional data, sums it up by day and branch, and inserts it into another table. This is a Daily Aggregate table.
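For example, the nightly ETL step that builds such a daily aggregate might look something like this (hypothetical table and column names; Trunc() is Oracle-style and simply strips the time portion so the data sums up by day):

Insert Into AGG_DAILY_SALES (SALE_DATE, BRANCH_ID, REVENUE)
Select Trunc(SALE_DATE), BRANCH_ID, Sum(REVENUE)
From SALE
Group By Trunc(SALE_DATE), BRANCH_ID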

Even though you may have multiple aggregate tables in the database, set them up in the universe, one at a time, with Aggregate Awareness. This will make it easier to troubleshoot any issues that may arise in the process. So, insert the table into your Structure Pane, or Data Foundation, and add any joins that may be appropriate for that table. If it is a completely self-contained table, then you may not need any joins. In reality, that is extremely rare. Don’t forget to set the appropriate cardinalities for the joins. As a rule, the aggregate table should be at the “many” end of all of its joins.

Step 2: Create a Context for the Aggregate Table

Keep in mind that an aggregate table is a fact table. And, like all fact tables, should have its own context. So, create a context for the aggregate table. Include all joins that are attached to the aggregate table, as well as any other joins that may be needed for that context. If the aggregate table is a self-contained table, with no joins, then a context will not be needed. But again, this is extremely rare. As a rule, you should run an integrity check at this point, just to make sure you haven’t missed any joins.

Step 3: Redefine existing objects, using the Aggregate_Aware() function

The Aggregate_Aware() function has a minimum of 2 arguments, and no maximum. Each argument is a complete Select statement for that object, in order of preference. For example, if we are setting this up for Sales Revenue, the first choice would be to get the data from the aggregate table. The second choice would be to get the data from the transaction table. So, the select for your Sales Revenue object might look something like this:

@Aggregate_Aware(SUM(AGG_TABLE.REVENUE),SUM(TRANS_TABLE.REVENUE))

Since this is a measure object, we must include the aggregate function, SUM, for each argument. If this is a new object, you may need to set the object type and projection aggregate on the properties tab of the object’s dialog box.

Keep in mind that there may also be dimension objects that could access this aggregate table. If your aggregate table has dimension columns, related dimension objects will also need to be redefined.
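For example, a Branch Code dimension object might be redefined along these lines (hypothetical table and column names; note that dimension objects don’t get an aggregate function):

@Aggregate_Aware(AGG_TABLE.BRANCH_CODE, BRANCH.BRANCH_CODE)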

Step 4: Define Aggregate Navigation

Aggregate Navigation is a tool that allows us to decide which objects in a universe will be incompatible with the aggregate table. For example, if your aggregate table has data aggregated by Branch and Month, then you cannot use the Week Number object with this table. Therefore, if the Week Number object is used in a query, you cannot use the aggregate table for that query. So, we need to tell the universe that the Week Number object is incompatible with this table.

To set the Aggregate Navigation, in Designer, go to Tools – Aggregate Navigation. In Information Design Tool, go to Actions – Set Aggregate Navigation (Make sure you are in the Business Layer). This opens the Aggregate Navigation window. On the left, select the aggregate table. Then, on the right, put a check mark on each object that is incompatible with the aggregate table. DO NOT select objects that are compatible with the aggregate table.

NOTE: If you have set your contexts and cardinalities correctly, you can click the “Detect Incompatibility” button at the bottom. My experience is that this is about 95% accurate, so you will need to select the aggregate table on the left again, and go through and verify that the correct objects are checked.

Step 5: Test

That’s right. You have to test it to make sure you got it right. Try creating some queries, at various levels of aggregation, to make sure you get the correct SQL. Do not skip this step, even if you think you never make mistakes. 😉 And, after you test it, test it some more. Make sure you’re getting the same results at various levels of aggregation. If you get different totals from the aggregate table and the transaction table, there might be a problem with the ETL process that populates the aggregate table. So make sure you test that, as well. If all goes well, you are ready for user acceptance testing (UAT).

Other uses for Aggregate Awareness

Although this functionality was originally developed to simplify the use of aggregate tables in a universe, it can also be used for other purposes within the universe. Let’s explore a few of these.

Building a universe on a Data Marsh

What is a Data Marsh? I’m glad you asked. I coined the term years ago, when I had to build a universe on this type of data source. The client had been running SQL queries against various systems within the company and loading the results into individual tables in a database. Although the data was often related between the various systems, no cleansing of the data had taken place, and no dimensional modeling had been done. There was simply no way to reliably join the various tables.

So, using Aggregate Awareness, I created a single set of dimension objects in the universe, using the Aggregate_Aware() function to select the different tables from which the data was stored. For example, several of these tables had a column for Sales Rep. So, I created a Sales Rep object, using aggregate awareness to select the needed columns from the various tables.
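The Select for that Sales Rep object might have looked something like this (the table names here are placeholders):

@Aggregate_Aware(TABLE_A.SALES_REP, TABLE_B.SALES_REP, TABLE_C.SALES_REP)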

Next, I used Aggregate Navigation to make sure that the correct table was used for the Sales Rep value, depending on which measure was chosen in the query. Measure A, for example was in Table A. So, Measure A was marked as incompatible with Table B and Table C. Therefore, if Measure A was used in a query, the Sales Rep object would use the Sales Rep column from Table A.

Of course, I would prefer to never build a universe on a Data Marsh. But sometimes, you have to work with whatever you are given. A little creativity can go a long way.

Resolving a Fan Trap with Aggregate Awareness

In reality, Fan Traps are fairly rare in data models. But, when it does happen, there are multiple ways to resolve the Fan Trap. What is a Fan Trap? It happens when you get a measure from one table, and a dimension from another table in a one-to-many relationship. The measure, in essence, gets “fanned out”, or multiplied by the number of dimension values retrieved.

(Figure: Fan Trap example)

In the example, above, the Sales Revenue will be multiplied by the number of rows retrieved from the INVOICE_DETAIL table. This is what happens in a Fan Trap. The classic way to resolve this is to create an alias of INVOICE_HEADER table, and join the alias to the CUSTOMER table, putting that join in a separate context. Then, get the Sales Revenue from the alias, instead of the original table. Quite honestly, this is the method I usually use to solve a fan trap.

However, a Fan Trap can also be resolved using Aggregate Awareness. As in the classic resolution, create an Alias of the INVOICE_HEADER table, and join it to the CUSTOMER table.

(Figure: Fan Trap resolved)

Next, define the Sales Revenue as follows:

@Aggregate_Aware(SUM(INVOICE_HEADER.REVENUE),SUM(ALIAS_INVOICE_HEADER.REVENUE))

Next, use Aggregate Navigation to make any objects that come from the INVOICE_DETAIL table, incompatible with the INVOICE_HEADER table. So, Sales Revenue will come from the INVOICE_HEADER table, unless an object is chosen from the INVOICE_DETAIL table.

Conclusion

Is this an exhaustive list of uses for the Aggregate_Aware() function? Probably not. But that wasn’t my intention in writing this article. My intention was to document the steps in setting it up, as well as provide the most common uses. If you have other creative uses for this function, feel free to share them in the comments. Thanks for reading.

Impressions of Information Design Tool

It’s been well over two years since I got my first look at the SAP BusinessObjects Information Design Tool (IDT). However, that was pre-Beta, and the tool has undergone some changes since then. The version that went GA in September 2011 wasn’t the same version that I originally saw in early 2010. Since the GA release, I’ve spent quite a bit of time with IDT, and would like to record my current impressions.

It’s important to keep in mind that Universe Design Tool (formerly known as Designer) is a very mature product, and quite stable. IDT, on the other hand, is a relatively young tool, and, as such, has some quirks and bugs. I’ll address some of those here.

Let’s Start With The Good News

If you’ve read my earlier post on What’s New in Universe Design, you’ll know that the universe has a new structure, consisting of three different layers:

  • Connection Layer (.cnx)
  • Data Foundation Layer (.dfx)
  • Business Layer (.blx)

A Data Foundation can consume one or more connections. So you can bring tables from multiple databases into a single Data Foundation. The Data Foundation contains the tables, joins, and contexts. Optionally, you can build multiple Business Layers on a Data Foundation. The following diagram summarizes the new universe structure.

This structure adds some significant benefits over the old method of building universes. The ability to build a universe with tables from multiple databases has been a long time coming. And, the ability to build the Data Foundation once, and reuse it for multiple universes, means less work in the long run.

These layers are built in a local, or shared, project. One of the benefits of using shared projects is that multiple people can work on the same project at the same time. For example, one person can be working on the Data Foundation, while another is working on a Business Layer.

One of the new features that I really like is the ability to add calculated columns to tables. You can add a column to a table in the Data Foundation, based on a calculation using columns from that table, or constant values. Once created, the column appears as just another column in the table.
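
For example, a calculated column for line revenue could be defined in the Data Foundation with an expression like the sketch below (the table and column names are hypothetical), and it would then show up alongside the physical columns of the table:

INVOICE_LINE.QUANTITY * INVOICE_LINE.UNIT_PRICE /* hypothetical columns from the same table */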

You can also create time hierarchy columns automatically. This is so much better than the old Automatic Time Hierarchy feature.

I also like the new Families feature. A Family is a collection of related tables. Families can be identified by colored backgrounds and fonts. You can create as many Families as you want, and add as many tables as you want to each Family. A table can belong to one, and only one, Family. You can then search for tables by Family, Name, Columns or Contexts. You can search by wildcards as well.

Another feature that has good potential is that Lists of Values, as well as Prompts, can be built as standalone objects in the Data Foundation. This means that they can be reused in multiple objects in a universe. But they can also be reused in multiple universes. However, the process of creating them is cumbersome and quirky. The various options aren’t well organized, and require too much experimentation to get them right. But once built, they work quite nicely.

In the Universe Design Tool (UDT), we’ve always had an issue with testing as we build. We could test in Desktop Intelligence, or, more recently, in Web Intelligence Rich Client. There is a Query Panel in UDT, but it’s mostly useless. However, in IDT, we can fully test as we go, without relying on another tool. We can build, and execute, queries, within the Business Layer. These queries can be saved in the universe so they can be used again. Once we run these queries, we can analyze the results with sorts, filters, and changing the order of the columns.

A feature that completely eliminates the need for Linked Universes is Business Layer Views. When you create a Business Layer, that becomes the Master layer. You can then create Views of that Business Layer, choosing which folders and objects to expose in that view. You can then grant rights to those views for various user groups.

Not All News is Good News

As I mentioned, IDT is a fairly young tool, and, as such, has some bugs and missing features. In fact, some of these missing features can cause huge problems.

I always recommend applying formatting to every numeric object in a universe. After all, we don’t want to force the users to apply formatting every time they use these objects. In UDT, we could select multiple numeric objects at once, and format them all at the same time. However, in IDT, we have to format objects one at a time. And the process of creating a format is very weird. For example, here is what the format for a typical currency value might look like:

That’s right. You can’t simply type in the format. You have to use the objects on the left, and add the items to your format. And, to make matters worse, once you create a custom format, you can’t save it. If you want to format another object the same way, you have to create the same custom format again. It’s very time consuming, and will likely discourage many people from formatting objects in the universe.

In the UDT, when we are using the @Select or @Where functions, we can check a box to “Show object SQL” to see the full SQL of the object. It’s very helpful for troubleshooting. Unfortunately, this option is missing from IDT. There is no way to see the full SQL when using these functions.

Another missing feature that is, in my opinion, a serious issue, is the ability to change a dimension to an attribute (formerly known as a detail), or vice versa. For example, if you create an object as an attribute, and later decide that it needs to be converted to a dimension, so it can be included in a Navigation Path (formerly known as a Hierarchy), you can’t do it. You have to recreate it as a dimension. This, of course, would cause existing reports that use that object to fail. To me, this is a huge problem.

The process of converting UNV universes to the new UNX format is very buggy. I have seen parameters get changed, and universe settings also get changed. As my friend, Dallas Marks, discovered, even the eFashion universe cannot be converted.

Now, to be fair, SAP seems to be doing a good job of listening to our concerns with the new tool. I have submitted quite a few ideas on the SAP Idea Place, and several of them have already been accepted. That impresses me. Now we’ll see how quickly these issues are addressed. In the meantime, I suggest keeping your current UNV universes in their current format, and creating new universes in IDT.

If you’ve worked with IDT, what is your experience so far?

Performing a Universe Review

Over the past decade, I have built many universes, taught the universe design class, and performed many universe reviews for various clients. Through it all, I’ve developed a standard methodology that I use for reviewing universes. In fact, I highly recommend developing a governance process for universes (as well as documents), that includes this review before universes are put into production. Remember, the users see the universe as their primary entry point into the systems that contain the data they need. If the users have a bad experience, they will have a bad impression of BusinessObjects, as well as any system attached to BusinessObjects.

When I perform a review of a universe, I document my findings into three categories:

Performance: These are issues that will affect the performance of queries built on this universe. We want to implement techniques that will maximize performance.

User Friendliness: We want to give the users a positive, professional experience. We don’t want them seeing the universe as poorly developed, or unprofessional.

Designer Friendliness: Be nice to your fellow designers. Eventually, someone else will need to work on the universes that you’ve built. So make sure that they can see quickly what you did, and why you did it.

Let’s start with performance related items. Then we’ll move on to user friendly topics, and finally, designer friendly topics.

Performance

Performance tuning varies depending on the database you’re using, so get familiar with it. Your DBA can help with this.

So the first thing I go through is the settings in the universe Parameters box, starting with the universe connection. In the connection settings, there are a few items that can impact performance:

  • Connection pool mode: This should always be set to Disconnect after each transaction. This prevents users from tying up sessions at the database. If the database is maxed out on sessions, new connections will have to wait for a session to free up.
  • Array fetch size: This determines the number of rows returned in each fetch (or packet) from the database. The higher you set this, the fewer packets will need to be sent across the network. Depending on the database, I usually find somewhere between 300 and 1000 works well. Your mileage may vary.

Next, I look at the universe limits. These are not strictly related to performance, but can be an indicator of how the universe will perform. The universe should have a reasonable row limit and time limit. If either is set too high, I ask why. For example, if a universe has a row limit of 1 million rows, something is seriously wrong somewhere.

The next thing I look at are the Parameters on the Parameter tab of the Parameters box (there’s a tongue twister). Some of these settings can affect performance. Here are some:

  • ANSI92: This will force the universe to generate ANSI 92 standard SQL, with the joins expressed in the FROM clause, which most database optimizers handle well (see the sketch after this list).
  • FILTER_IN_FROM: This will put the joins and filters in the FROM clause of the SQL. This is particularly helpful if you have partitioned tables in the database, or are using a Massively Parallel Processing (MPP) database, such as Teradata. Note: FILTER_IN_FROM cannot be used unless ANSI92 is set to Yes.
  • JOIN_BY_SQL: This can be useful with universes that have contexts. When set to Yes, it will merge the “Joined” queries across multiple contexts into a single SQL statement with multiple SELECTs and a FULL OUTER JOIN between them. This can reduce the volume of data returned to the report, as it causes the database to merge the results, rather than having the report merge the results.
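
To illustrate the ANSI92 setting, here is a minimal sketch of the kind of SQL the universe might generate for a simple query, assuming hypothetical CUSTOMER and SALE tables joined on CUST_ID:

-- ANSI92 = No: the join is generated in the WHERE clause
SELECT CUSTOMER.NAME, SUM(SALE.REVENUE)
FROM CUSTOMER, SALE
WHERE CUSTOMER.CUST_ID = SALE.CUST_ID
GROUP BY CUSTOMER.NAME

-- ANSI92 = Yes: the join is generated in the FROM clause
SELECT CUSTOMER.NAME, SUM(SALE.REVENUE)
FROM CUSTOMER INNER JOIN SALE ON CUSTOMER.CUST_ID = SALE.CUST_ID
GROUP BY CUSTOMER.NAME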

Next, I start looking through the universe for transformations. A transformation is any SQL expression that reshapes the data at query time; CASE statements and substring functions are common examples. Transformations can exist in Derived Tables, Joins, Filters or Objects. You can’t avoid having transformations in universes. However, the more you have, the slower your queries will be. So I look for transformations that are costly, and could be pushed down to the ETL process. That way, the data is transformed in advance, rather than on the fly.
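
For example, an object defined with a CASE transformation like this sketch (the status codes and columns are hypothetical) has to be evaluated for every row, on every query; if a pre-decoded STATUS_DESC column were added during the ETL, the object could simply select that column instead:

CASE
  WHEN SALE.STATUS_CODE = 'A' THEN 'Active'
  WHEN SALE.STATUS_CODE = 'C' THEN 'Cancelled'
  ELSE 'Unknown'
END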

I also look for usage of Index Awareness in the universe. This feature can reduce the number of tables being accessed, thereby reducing the number of joins in the SQL. It can also make better use of indexes in the database.

If there are any isolated tables, I check each one to see if any visible objects are built from them. Needless to say, such objects, when used in a query, could cause a Cartesian product.

I look through every measure object to see if it has an aggregate function. Every measure object should have an aggregate function in the Select. If you don’t want the data to aggregate, make it a dimension, not a measure. I have seen many universes with measure objects that only have a projection aggregate, not a Select aggregate. That can cause huge performance issues.
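
In other words, a measure such as Sales Revenue should look like the first definition below, not the second (the table and column names are hypothetical):

SUM(SALE.REVENUE)   /* aggregation happens at the database */
SALE.REVENUE        /* no Select aggregate: every detail row is returned, and only the projection aggregates it in the report */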

There may be other things I notice along the way that affect performance, but these are the most obvious ones.

User Friendliness

Whenever I’m working on a universe, the most important thing for me to remember is that I am doing this for the users. They are my customers, and it’s important for them to be happy. If they’re not happy with the universe, they’ll go elsewhere to get the information they need. Keeping them happy is key to the success of the entire system, which means the universes must be user friendly. So, when doing a universe review, these are some of the things I look for.

Let’s start with one of my basic rules of user friendliness: Nothing that the user sees should have underscores. Why? Because it tells the user that “this is going to be technical”. IT people love using underscores, and for very good reasons. But the rest of the world doesn’t. So keep it friendly. Don’t use underscores in the universe name, context names, class names, or object names. They should all be in proper case.

So, I start with the name of the universe, and make sure it is user friendly. Then I look at the description. The universe description is our chance to tell the user what kind of information they can get when using this universe, and how up to date the data is. Again, it should be in user friendly terms.

In the universe Parameters, make sure that Cartesian Products is set to Prevent, not Warn. Most users won’t know what a Cartesian product is, so don’t let it happen.

If the universe has multiple fact tables, I take a look at the contexts. They should have a user friendly name and description. If the user is prompted for a context, the name and description should be friendly enough to help them make the right choice.

Next I look at the names of the classes and objects. Again, the name should be user friendly. Examples of object names that are NOT user friendly include the following:

  • CUSTOMER_ID
  • Customer Id
  • Customer Addr
  • Phone No
  • Start Dt

User friendly names would include the following:

  • Customer ID
  • Customer Address
  • Phone Number
  • Start Date

You get the idea. Make the names simple, but friendly. If more information is needed, include that in the object’s description.

Speaking of descriptions, every object should have a description. Don’t assume that the name is obvious, and a description isn’t needed. So I look at every object to see if it has a description, and whether the description is user friendly. If you really want to get fancy, you can even color code your object descriptions. Also, I check to see if the descriptions have misspelled words or poor grammar. We want the users to see the universe as professionally built.

If I see any objects with a Where clause defined, I raise a red flag on that. It may be OK, but, as a rule of thumb, we want to avoid using the Where clause of an object, as it could cause conflicts between multiple objects.
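
As a quick illustration of the kind of conflict I mean, imagine two hypothetical objects, US Revenue and Canada Revenue, each with its own Where clause; used together in one query, the conditions are combined with AND, and the query returns nothing:

-- Object "US Revenue":      Where clause  CUSTOMER.COUNTRY = 'US'
-- Object "Canada Revenue":  Where clause  CUSTOMER.COUNTRY = 'Canada'
-- Both objects in the same query produce:
WHERE CUSTOMER.COUNTRY = 'US' AND CUSTOMER.COUNTRY = 'Canada'   -- no row can satisfy this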

I also check to make sure there aren’t too many classes, and they aren’t more than 3 levels deep. We don’t want the users to have to go on a treasure hunt just to find the objects that they need. Rather than too many classes, we can use Separator Objects, as described in this previous post.

While I’m checking the objects, I also check the List of Values (LOV) for each object. Measure objects, of course, should not have a LOV. LOVs should also be avoided for date objects, as they can be very large, and are not needed. For all other objects, there are ways to make the LOVs user friendly, such as adding columns to them, or using database delegation for large lists.

I also look for objects with the same name. Every object should have a unique name. This avoids confusion with the users.

One thing that often is overlooked in universes is object formatting. Every numeric object, whether detail, dimension, or measure, should be formatted in the universe.

Finally, I look at the hierarchies in the universe. Default hierarchies should never be used, as most of them won’t make sense for drilling. So make sure that hierarchies are set to custom only.

Designer Friendliness

We all like to think that we will be the only one to work on our universes, but the reality is that, at some point in time, someone else will need to work on a universe that we created. So, as part of reviewing a universe, I make sure that best practices are implemented to make the universe easier for the next person to work on.

Let’s start with something real simple. Every LOV should have a name that relates it to the object to which it belongs. So I look to see if the LOV has the default name (bad), or a meaningful name (good). Sometimes we need to track down the LOV file on a user’s PC. So meaningful names make it easier.

If the cardinalities are not set on the joins, it makes no difference to the users, or the database. But they should all be set anyway, for several reasons:

  • It makes the structure pane easier to “read”, as you can more easily identify fact tables.
  • It makes detection routines, such as loop detection, work correctly.

If I see any isolated tables, I expect to see some verbiage next to them explaining why they are there. Perhaps they have been aliased, and the aliases are all that is being used. In fact, it can also be helpful to add some text next to alias tables explaining what they are used for.

Documentation can be very helpful, not just for other designers, but even for the original developer. You can add notes in the structure pane. You can add information in the Comments field of the universe Parameters box. You can also add comments in the Select of your objects. But make sure you use comment tags, so the database knows to ignore them.
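
For example, a note in the Select of an object can be wrapped in standard SQL comment tags so the database ignores it (the note itself is just a hypothetical example):

SUM(SALE.REVENUE) /* switched from AMOUNT_USD to REVENUE after the 2012 schema change */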

Conclusion

Before a new universe goes into production, you can perform this universe review as part of the governance process. By doing this, you can make sure that the universe will make a good first impression on the users. And that’s very important.

Do you have a formal governance process for universe development? Does it include such a review? Feel free to comment, below.

Universe Contexts in a Nutshell

Way back in the second millennium, when I started doing universe development, one of the things that scared me most about the process, was Contexts. I did anything, and everything, to avoid using contexts in a universe. My first production universe was developed without contexts, even though contexts were really needed, and would have made the universe much more user friendly. Instead, I created aliases of nearly every dimension table, so as to avoid any loops. Here’s a frightening picture of that universe.

Of course, I look back at that universe and think, “If only I knew then what I know now.” You see, once I truly understood universe contexts, and their proper use, I found them easy to implement, and easy to maintain. So, let’s start with the basics.

When do I use Contexts in a Universe?

It’s really quite simple. If your universe has multiple fact (transaction) tables, you will need one context for each fact. And just to be clear, for this purpose, I am defining a fact table as the lowest level of transactional data. There may be multiple tables that belong to one fact, such as Invoice Header and Invoice Detail. In this case, the Invoice Detail table would be considered the lowest level of transactional data.

So, if your universe has only one fact table, you don’t need any contexts. But if you have multiple facts, you will need contexts.

OK, So what is a Context?

A context is a list of joins that define a logical path through a universe. Contexts are used to resolve loops that are caused by multiple fact tables accessing common dimension tables. The joins that belong to a context are the joins from the fact table, leading to any and all dimension tables that might be needed for that fact. In the picture below, the SALE_MODEL table is the fact. The joins highlighted in blue, are all members of the Sales Context.

As you can see, the highlighted joins include not only the joins that touch the SALE_MODEL table, but also joins that extend to distant dimension tables, such as REGION, that contain data related to the fact table.

IMPORTANT: Once you add contexts to a universe, all joins must be a member of at least one context (Shortcut joins are an exception to this rule). Joins that are not members of a context are called isolated joins. Isolated joins will not be recognized by the reporting tools. (Note: This rule will go away in BI 4.0.)

As you can imagine, joins that do not connect directly to a fact table, can be in multiple contexts. For example, in the image above, the join between the CLIENT and REGION tables would be used for both Sales and Rentals, so it would be a member of both contexts.

When properly implemented, contexts will prevent Fan Traps and Chasm Traps from returning incorrect results. If a user runs a query which includes objects from two different contexts, Web Intelligence will generate two separate SQL statements, and join the results on the common dimensions after the results have been retrieved.

If you are using universes with Crystal Reports, you will need to change the value of the JOIN_BY_SQL parameter to Yes. This is because Crystal Reports doesn’t support the generation of separate SQL statements the way Web Intelligence does. So the JOIN_BY_SQL parameter will cause Crystal Reports to generate multiple SELECT statements within a single SQL statement, with a FULL OUTER JOIN between them.
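
As a rough sketch, assuming a Sales context and a Rentals context built on hypothetical SALE and RENTAL fact tables that share a CLIENT dimension, JOIN_BY_SQL would produce something along these lines instead of two separate statements:

SELECT COALESCE(s.CLIENT_NAME, r.CLIENT_NAME), s.SALE_REVENUE, r.RENTAL_REVENUE
FROM
  (SELECT CLIENT.NAME AS CLIENT_NAME, SUM(SALE.REVENUE) AS SALE_REVENUE
   FROM CLIENT INNER JOIN SALE ON CLIENT.CLIENT_ID = SALE.CLIENT_ID
   GROUP BY CLIENT.NAME) s
FULL OUTER JOIN
  (SELECT CLIENT.NAME AS CLIENT_NAME, SUM(RENTAL.REVENUE) AS RENTAL_REVENUE
   FROM CLIENT INNER JOIN RENTAL ON CLIENT.CLIENT_ID = RENTAL.CLIENT_ID
   GROUP BY CLIENT.NAME) r
ON s.CLIENT_NAME = r.CLIENT_NAME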

How do I create a Context?

There are several methods for creating contexts. They can be created manually, or automatically. The most accurate method is to create them manually. If you choose to create them automatically, you will still need to manually check to make sure that the context is correct.

To create the context manually, Ctrl-Click on the joins connected to the fact table, as well as all joins to dimension tables that contain data related to that fact. Once you have them all selected, click on the Insert Context button.

This opens the New Context box.

Enter a user friendly name for the context, and a user friendly description. More on this later. Click OK, and you have a new context created.

If you wish to create your contexts automatically, make sure that all joins have the correct cardinalities set. The detection tool uses the cardinalities to determine which joins to include in a context. There are several ways to create a context automatically, but they all work basically the same way, so we’ll just talk about one way.

Once you have verified that your cardinalities are set correctly on all joins, click the Detect Context button.

First, this button will look for fact tables. It recognizes a fact table as a table that is at the many end of all of its joins. Next, it checks to see if there is an existing context by the same name as that fact table. If not, it formulates the context, based on the cardinality of the joins in the universe, and recommends the context. You can then choose to accept, or reject, the proposed context. If you accept it, you can rename it with a more user friendly name. Once you have added the context, you can edit the context by double clicking on it in List View.

Editing an Existing Context

There are several things that may need to be edited in a context. These include the name, description, and which joins are included. To edit a context, open List View, and double click on the context in the right pane. In the Edit Context box, you can change the name, add or edit the description, or add or remove joins. All joins will be shown in this box, but only the ones highlighted in blue are members of the context. To add a join, find it in the list, and click on it. To remove a join, find the highlighted join in the list, and click on it. (NOTE: You don’t have to hold down the Ctrl key when clicking on the joins in this box.)

What about the User Experience?

There has been an eternal debate regarding the user experience with contexts. When you add contexts to a universe, users may occasionally be prompted to select a context when running a query. There are those who believe that this should never happen. I disagree. I believe that it should be very rare that a user is prompted to select a context. But I wouldn’t say it should never happen. A context prompt, in my mind, is no different than any other prompt. It’s a way to allow the user to select query options at run time.

Having said that, I find that many universes prompt for contexts unnecessarily. We want to avoid this. Most of the time, a user is prompted for a context for one of two reasons:

1. The query includes no objects from a fact table. In this case, the prompt may be legitimate.

2. The universe isn’t built properly.

If users are being legitimately prompted for a context, that’s fine. Make sure that the contexts have user friendly names (Proper case and no underscores), and a user friendly description, that helps the user make the right choice. Note that, in Web Intelligence, if the user is prompted for a context, the prompt will appear each time the query is refreshed. This is the default behavior, but can be changed in the query properties.

If the users are being prompted for a context, but shouldn’t be, then you may need to take steps to minimize this. For example, if an object should always use one context, you can force it through that context by adding the fact table from that context into the Tables button of that object. This will prevent that object from prompting for a context. This is an excellent method for keeping those prompts to a minimum.

In Conclusion

When properly implemented, Contexts make universes much simpler for the users, as well as the developers. If you take the time to understand the correct use of Contexts, you will find them to be quite simple to implement and maintain. And if you don’t get it right the first time, fear not. Contexts are easy to edit.

Improving Query Performance (At the database)

I recently was assigned to a project at a company with a rather large BOBJ deployment. Their production environments contained well in excess of 100,000 Web Intelligence documents. These documents are running against a rather sizable data warehouse. My job was to reduce the burden on the data warehouse. Tens of thousands of queries were hitting the data warehouse on an average day. If we could reduce the CPU time at the database, we could postpone the eventual expansion of hardware in the data warehouse.

I spent a few weeks examining the environment, the universes, and some of the slower documents. I also spoke with some of the bright people there, and eventually came up with some ideas for improving query performance. Note that my focus was reducing CPU time on the database. So I wasn’t looking to improve report performance, but some of these ideas will have that effect anyway. There may also be other ways to improve report performance that will have little or no effect on database performance.

Query Stripping

No, I’m not talking about having a bunch of naked queries. I’m talking about stripping out everything that isn’t needed in your queries. Often, we find that Web Intelligence documents are not developed from scratch. Existing documents are copied, and re-used. Users will take existing reports, copy them, then add what they need to make a new document. The problem is that, while they add new objects and/or queries, they don’t always remove old objects and queries that are not needed for this new document.

The end result is a document that contains far more queries than are needed, as well as more granularity of data than is needed.

Web Intelligence XI3.1 SP3 contains a new feature called Query Stripping. You’ll find it as a check box in the document properties. When enabled, this feature will automatically remove any queries and/or objects from the SQL sent to the database that do not directly contribute to the report(s) in the document. This is a cool feature. However, it is a feature that is only available when using universes built against OLAP cubes. The vast majority of universes are built against relational databases, and this feature won’t work for those universes. 😦

BI 4.0 doesn’t have this limitation. Query Stripping should be available for all data sources in BI 4.0. So, until you upgrade to BI 4.0, you’ll have to manually strip your queries. This can be tedious, but may be worth the effort. So, since we have to do this manually, let’s look at the things that will get us the biggest bang for our buck.

Remove any unneeded queries. The fewer queries we run, the less burden we place on the database. So, purge and delete any queries that are no longer required for the report(s).

Remove any unneeded measures. As you may know, if a query has measures from multiple tables, Web Intelligence will split them into multiple queries, one for each measure from a different table, before sending the queries to the database. This is to avoid SQL traps that can yield exaggerated results.

Therefore, removing a single measure object from a query can eliminate a query sent to the database, provided that no other measures are coming from the same table. Reducing the number of queries sent to the database not only reduces the burden on the database, but it improves overall query time in the document.
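
For example, a query containing Customer Name plus measures from two hypothetical fact tables, SALE and RETURN_FACT, is sent to the database as two statements; remove the second measure from the query, and the second statement disappears entirely:

-- Statement 1 (Sales Revenue):
SELECT CUSTOMER.NAME, SUM(SALE.REVENUE)
FROM CUSTOMER INNER JOIN SALE ON CUSTOMER.CUST_ID = SALE.CUST_ID
GROUP BY CUSTOMER.NAME

-- Statement 2 (Return Amount):
SELECT CUSTOMER.NAME, SUM(RETURN_FACT.AMOUNT)
FROM CUSTOMER INNER JOIN RETURN_FACT ON CUSTOMER.CUST_ID = RETURN_FACT.CUST_ID
GROUP BY CUSTOMER.NAME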

Schedule Purging

Another problem that comes with time: documents that are scheduled to run on a recurring basis often continue running long after they are no longer needed. Part of the cause of this problem is that, by default, scheduled documents will run for ten years before they expire. I can’t imagine very many documents remaining useful for that long.

You can use Auditor to find documents that are no longer being viewed. Scheduled documents that are no longer used are still running their queries, and, therefore, are adding overhead to the database. Removing these schedules will free up space in the File Store, as well as free up resources on the database server.

While reviewing scheduled documents, also take a look at Publications. These may be harder to track down, as Auditor won’t tell you if someone opened the email they got with the document attached. In this case, you will have to survey the recipients and ask if they still use the documents they receive. If no one responds telling you that they use it, feel free to remove the schedule. You can always add it back later if someone complains.

Index Awareness

This is a feature of the universe, and can be used to speed up queries significantly, depending on your database, and the types of queries being written. It can speed up queries in several ways:

  • It can remove joins from the SQL
  • It can remove tables from the SQL
  • It can search indexed, rather than non-indexed, columns

How does it work? Well, if a filter is added to a query that searches for a value in a non-indexed column, Index Awareness will edit the SQL to search for a matching value in an indexed column. In the example below, we’re searching for sales revenue for a specific client. Without Index Awareness, the SQL matches what is in the query panel.

First of all, notice that Client Name is a transformation. It’s the concatenation of two columns in the database. Searching a transformation is never going to be fast, as it cannot be indexed in the database. Also, notice that the CLIENT table is in the SQL, and it is joined to the SALE table. With Index Awareness enabled, the same query looks like this.

Instead of searching a Client Name, the SQL is now searching the Client ID, which is an indexed column. In addition, since the Client ID exists in the SALE table, the SQL no longer needs the CLIENT table at all. It can remove the CLIENT table, as well as the join to the CLIENT table. This query will run much faster.
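
To make the difference concrete, here is a rough sketch of the two versions of the SQL, using hypothetical table names, column names, and key values:

-- Without Index Awareness: search a concatenation, and join to CLIENT
SELECT SUM(SALE.REVENUE)
FROM SALE INNER JOIN CLIENT ON CLIENT.CLIENT_ID = SALE.CLIENT_ID
WHERE CLIENT.FIRST_NAME || ' ' || CLIENT.LAST_NAME = 'Jane Smith'

-- With Index Awareness: the LOV supplies the key, so the CLIENT table and its join drop out
SELECT SUM(SALE.REVENUE)
FROM SALE
WHERE SALE.CLIENT_ID = 1234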

Index Awareness is enabled in the object, in the universe. In this case, Client Name is defined with Index Awareness. On the Keys tab of an object, a Primary Key can be defined, as well as one or more Foreign Keys.

The Key values are then stored in the LOV for the object. When a user selects a value from the LOV, the SQL will switch the value to the indexed column. Of course, if the user does not select from the LOV, but types in the value instead, Index Awareness is not used. So you will need to train your users to select from the LOV.

Index awareness works best when used on a transactional database, or a snowflake schema. Star schemas are usually already optimized for queries, so you may not get as much improvement with a star schema.

Pushing Transformations

Complex SQL transformations are often seen in universe objects, and are often an easy way to get the data to appear the way we need it for reporting. In the previous example, Client Name was a simple transformation. Here’s an example of a complex transformation:

'FY' || substr(trim((FISCAL_CALENDAR.FISCAL_YEAR*100+FISCAL_CALENDAR.FISCAL_QUARTER)),3,2) || 'Q' || substr(trim((FISCAL_CALENDAR.FISCAL_YEAR*100+FISCAL_CALENDAR.FISCAL_QUARTER)),6,1)

This type of transformation can be quite costly in terms of query time. I recently ran a test with a similar transformation: I ran a query with, and without, the object that included it. Without it, the query ran 14% faster. Of course, your mileage may vary, as there are many factors that come into play with this type of test. But be assured, if you can push this type of transformation down to the ETL process, your queries will run faster. Keep in mind that this transformation must be performed on every row of the results. And, if this object is commonly used in reports, it could be hitting the database hundreds, or thousands, of times a day.

So, if possible, let the ETL process handle those complex transformations.

Conclusion

These are just a few ideas for improving performance. We could come up with more ideas that will lighten the load on the BOBJ server, or in the document. Improving performance will not only make your users happy, but will ultimately save money, as you can avoid expanding your database servers and BOBJ servers unnecessarily.

What other ideas have you implemented to improve performance? What results did you get?