Saturday, March 15, 2008

Database Concepts

 


• What is database or database management systems (DBMS)?
A collection of programs that enables you to store, modify, and extract information from a database. There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes.
The following are examples of database applications:
* computerized library systems
* automated teller machines
* flight reservation systems
* computerized parts inventory systems

• What is difference between DBMS and RDBMS?
A DBMS has to be persistent, that is it should be accessible when the program created the data ceases to exist or even the application that created the data restarted. A DBMS also has to provide some uniform methods independent of a specific application for accessing the information that is stored.

RDBMS is a Relational Data Base Management System Relational DBMS. This adds the additional condition that the system supports a tabular structure for the data, with enforced relationships between the tables. This excludes the databases that don't support a tabular structure or don't enforce relationships between tables.

Many DBA's think that RDBMS is a Client Server Database system but thats not the case with RDBMS.

Yes you can say DBMS does not impose any constraints or security with regard to data manipulation it is user or the programmer responsibility to ensure the ACID PROPERTY of the database whereas the rdbms is more with this regard bcz rdbms difine the integrity constraint for the purpose of holding ACID PROPERTY.

• What are CODD rules?
A relational DBMS must use its relational facilities exclusively to manage and interact with the database.
The rules:

These rules were defined by Codd in a paper published in 1985. They specify what a relational database must support in order to be relational. These rules have been considerably extended in reference [1].
1. Information rule

* Data are represented only one way: as values within columns within rows.
* Simple, consistent and versatile.
* The basic requirement of the relational model.

2. Guaranteed access rule

* Every value can be accessed by providing table name, column name and key.
* All data are uniquely identified and accessible via this identity.

3. Systematic treatment of null values

* Separate handling of missing and/or non applicable data.
* This is distinct to zero or empty strings
* Codd would further like several types of null to be handled.

4. Relational online catalog

* Catalog (data dictionary) can be queried by authorized users as part of the database.
* The catalog is part of the database.

5. Comprehensive data sublanguage

* Used interactively and embedded within programs
* Supports data definition, data manipulation, security, integrity constraints and transaction processing
* Today means: must support SQL.

6. View updating rule

* All theoretically possible view updates should be possible.
* Views are virtual tables. They appear to behave as conventional tables except that they are built dynamically when the query is run. This means that a view is always up to date. It is not always theoretically possible to update views. Codd himself, did not completely understand this. One problem exists when a view relates to part of a table not including a candidate key. This means that potential updates would violate the entity integrity rule.

7. High-level insert, update and delete

* Must support set-at-a-time updates.
* ie. Transactions
* eg: UPDATE mytable SET mycol = value WHERE condition;
Many rows may be updated with this single statement.

8. Physical data independence

* Physical layer of the architecture is mapped onto the logical layer.
* Users and programs are not dependent on the physical structure of the database.
* (Physical layer implementation is dependent on the DBMS.)

9. Logical data independence

* Users and programs are independent of the logical structure of the database.
* i.e.: the logical structure of the data can evolve with minimal impact on the programs.

10. Integrity independence

* Integrity constraints are to be stored in the catalog not the programs.
* Alterations to integrity constraints should not affect application programs.
* This simplifies the programs.
* It is not always possible to do this.

11. Distribution independence

* Applications should still work in a distributed database (DDB).

12. Nonsubversion rule

* If there is a record-at-a-time interface (eg via 3GL), security and integrity of the database must not be violated.
* There should be no backdoor to bypass the security imposed by the DBMS.


• Is access database a RDBMS?

Yes Access is RDBMS

• What are page splits?

A page is 8Kbytes of data which can be index related, data related, large object binary (lob’s) etc...

When you insert rows into a table they go on a page, into ‘slots’, your row will have a row length and you can get only so many rows on the 8Kbyte page. What happens when that row’s length increases because you entered a bigger product name in your varchar column for instance, well, SQL Server needs to move the other rows along in order to make room for your modification, if the combined new length of all the rows on the page will no longer fit on that page then SQL Server grabs a new page and moves rows to the right or left of your modification onto it – that is called a ‘page split’.

• What are E-R diagrams?
Definition: An entity-relationship (ER) diagram is a specialized graphic that illustrates the interrelationships between entities in a database. ER diagrams often use symbols to represent three different types of information. Boxes are commonly used to represent entities. Diamonds are normally used to represent relationships and ovals are used to represent attributes.
Also Known As: ER Diagram, E-R Diagram, entity-relationship model

What is collation?
Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana character types and character width.

Case sensitivity
If A and a, B and b, etc. are treated in the same way then it is case-insensitive. A computer treats A and a differently because it uses ASCII code to differentiate the input. The ASCII value of A is 65, while a is 97. The ASCII value of B is 66 and b is 98.

Accent sensitivity
If a and á, o and ó are treated in the same way, then it is accent-insensitive. A computer treats a and á differently because it uses ASCII code for differentiating the input. The ASCII value of a is 97 and áis 225. The ASCII value of o is 111 and ó is 243.

Kana Sensitivity
When Japanese kana characters Hiragana and Katakana are treated differently, it is called Kana sensitive.

Width sensitivity
When a single-byte character (half-width) and the same character when represented as a double-byte character (full-width) are treated differently then it is width sensitive.
Database, Tables and columns with different collation

SQL Server 2000 allows the users to create databases, tables and columns in different collations.

• What is Extent and Page?
The fundamental unit of data storage in Microsoft SQL Server™ is the page. In SQL Server 2000, the page size is 8 KB. This means SQL Server 2000 databases have 128 pages per megabyte.

The start of each page is a 96-byte header used to store system information, such as the type of page, the amount of free space on the page, and the object ID of the object owning the page.

Types of pages in SQL Server
Data
Index
Text/Image
Global Allocation Map
Secondary Global Allocation Map
Index Allocation Map
Bulk Changed Map
Differential Changed Map

Extent is a collection of 8 pages. There are two types of extents. 1. Uniform Extents 2. Mix Extents

• What is normalization? What are different types of normalization?
In relational database design, the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

There are three main normal forms, each with increasing levels of normalization:
# First Normal Form (1NF): Each field in a table contains different information. For example, in an employee list, each table would contain only one birthdate field.
# Second Normal Form (2NF): Each field in a table that is not a determiner of the contents of another field must itself be a function of the other fields in the table.
# Third Normal Form (3NF): No duplicate information is permitted. So, for example, if two tables both require a birthdate field, the birthdate information would be separated into a separate table, and the two other tables would then access the birthdate information via an index field in the birthdate table. Any change to a birthdate would automatically be reflect in all tables that link to the birthdate table.

There are additional normalization levels, such as Boyce Codd Normal Form (BCNF), fourth normal form (4NF) and fifth normal form (5NF). While normalization makes databases more efficient to maintain, they can also make them more complex because data is separated into so many different tables.

• What is denormalization?
As the name indicates, denormalization is the reverse process of normalization. It's the controlled introduction of redundancy in to the database design. It helps improve the query performance as the number of joins could be reduced.

2 comments:

  1. Excellent resource.

    You might want to put in a link to Ted Codd's actual paper, which is available via the ACM:

    http://portal.acm.org/citation.cfm?doid=362384.362685

    Karen

    ReplyDelete
  2. THe info. provide by u is useful

    ReplyDelete