Friday, April 4, 2008

Database Concepts or Database Keywords

Data Mart - A single star schema, i.e. dimensions and a fact table that stores data at the detail level. A Data Mart covers a specific area of the business: billing, inventory, transactions, claims, etc.

Data Modeling - Performing analysis on the business processes and data to discover attributes of, and relationships between, data elements. An Entity Relationship Diagram (ERD) is the implementation of a data model. Another way to think of it is to discover the business rules of the data elements.

Data Warehouse - The data warehouse to me is more of a logical concept. All of your data marts (with, hopefully conformed dimensions) plus any ODS or active staging areas. By active, I mean updated in real-time by operational systems primarily in support of the warehouse. To me the warehouse is all of the systems that are required to answer the end users' decision support questions across an enterprise. This is different than the Enterprise Data Warehouse (EDW) that is the granddaddy data source for all decision support. A data warehouse can reside in a single data or in many databases.

Dimension - A textual table in a star schema. The textual data describes a fact. Dimensions are the context of a measurement (fact). The dimensions included in the primary key of a fact table implement the grain of the fact table.

Dimensional Modeling - Data Modeling with the goal of extracting facts and defining descriptions of those facts from business systems. The relationships between data elements should be known before Dimensional Modeling starts (by previously performing Data Modeling to, at least, the entity level).

DML - Data Manipulation Language. Language used to add (insert), update, or delete records in a database table.

Fact - Measurement. In general, it is the numeric value that is acted upon by a user using a data mart.

Foreign Key - A foreign key (FK) is a link between two tables, i.e. a child or parent record. A foreign key is usually link to a primary key in a different table.

Index - A narrow table, usually containing a subset of a table's columns, to speed access to a particular record or set of records. An example would be an index on a name field if you frequently search on the name column.

Natural Key - A natural key is a key that has meaning. An employee's social security number might be used to uniquely identify employees in an HR system (I would NOT recommend this but I have seen it). That would be a natural key.

ODS - Operational Data Store. An ODS exists primarily when data quality is less than optimal. When an operational system cannot provide clean data, it can feed an ODS where the data is cleansed and/or transformed. An ODS is optional and is not a focal point of the warehouse. An ODS can have many sources of data and there can be many ODSs.

OLAP - Online Analytical Processing. Now called Business Intelligence (BI) for the most part. This includes data mining. OLAP used to be the queries that took hours to run but more and more OLAP is happening in real-time. OLAP is extracting historical, statistical, etc. information from the raw data.

OLTP - Online Transaction Processing. This used to be synonymous with real-time processing but now everything needs to happen in real-time. OLTP tends to be single, small statements from end point applications, i.e. POS, Entry Screens, embedded devices, log files, etc.

Oracle Designer - CASE tool developed by oracle. Designer is an end-to-end database design tool. I personally use it for its ERD functionality. At times I feel like even that is way too cumbersome.

Primary Key - A primary Key (PK) uniquely identifies a single row in a table.

Relational Database - RDBMS. A data store designed to store data in tables and columns with pre-defined relationships between tables. Primary and foreign keys define relationships.

Replication - Copying data from one database to another (Publisher and Subscriber Concept).

Schema - A set of related database objects. An HR schema would store personnel and payroll information; a GL schema would store the chart of accounts, etc.

Surrogate Key - A surrogate key is a sequential number that is meaningless. In oracle it's usually created by a sequence.

2 comments:

  1. Great list of database an data modeling terms. I hope to be blogging about it soon.

    Some other terms you could add:

    DDL
    XML Schema
    Alternate Key
    Entity
    Logical Data Model
    Physical Data Model

    Keep up the good work,

    Karen

    ReplyDelete
  2. Surrogate Keys are not meaningless and do not have to be sequential. They are unique identifiers. That would be the same thing as saying social security numbers are meaningless.

    ReplyDelete