To Previous Chapter To Table of Contents To Bottom of Page To Next Chapter

Chapter 5 - The Relational Model and Normalization

  1. The Relational Model
    • relation - 2d-table (file)
    • tuple (row) - data that pertains to some thing (record)
    • attribute (column) - field

    1. FUNCTIONAL DEPENDENCIES - relationship between/among attributes; ex.
      1. SID -> GPA
      2. ComputerSerialNumber -> MemorySize
      3. (SID, ClassName) -> Grade

      attribute on the left side is the determinant

    2. KEYS - group of one or more attributes that uniquely identify a tuple

      Activity Relation
      SID Activity Fee
      100 Skiing 200
      150 Swimming 50
      175 Squash 50
      200 Swimming 50
       
      Relation with a Two-Attribute Key
      SID Activity Fee
      100 Skiing 200
      100 Golf 65
      150 Swimming 50
      175 Squash 50
      175 Swimming 50
      200 Swimming 50
      200 Golf 65

    3. FUNCTIONAL DEPENDENCIES, KEYS, AND UNIQUENESS
      • determinant of a functional dependency is not necessarily unique
      • a key is unique

  2. Normalization
    1. MODIFICATION ANOMALIES - data changes that have undesirable effects
      in figure above, if we delete the tuple 100 - Skiing, we lose the cost of skiing - deletion anomaly
      can not enter the cost of scuba diving without entering a student - insertion anomaly

      The Division of ACTIVITY into Two Relations
      STU-ACT (SID, Activity)
      Key:SID
      SID Activity
      100 Skiing
      150 Swimming
      175 Squash
      200 Swimming
       
      ACT-COST(Activity, Fee)
      Key: Activity
      Activity Fee
      Skiing 200
      Swimming 50
      Squash 50

      referential integrity constraints - values of attributes in one relation must exist in another relation

    2. ESSENCE OF NORMALIZATION - redefining relations
      ex. - Problems exist when a relation contains facts about two (or more) different themes [may result in referential integrity constraints]

    3. CLASSES OF RELATIONS
      • normal forms - classes of relations and the techniques for preventing anomalies
      • 1NF -> 2NF -> 3NF -> Boyce-CoddNF -> 4NF -> 5NF
      • domain/key normal form (DKNF) a relation in DKNF is free of all modification anomalies

  3. First Through Fifth Normal Form
    1. FIRST NORMAL FORM
      1. cells in a table must be single values
      2. neither repeating groups nor arrays are allowed as values
      3. all entries in a column must be of the same type
      4. each column must have a unique name
      5. no two rows may be identical
    2. SECOND NORMAL FORM
      1. A relation is in 2NF if all of its nonkey attributes are dependent on all of the key [ACTIVITY Relation is not 2NF]
      2. if only single attribute key -> automatically 2NF (no partial dependencies)
    3. THIRD NORMAL FORM
      A relation is in 3NF if it is in 2NF and has no transitive dependencies ex.
      HOUSING (SID, Building, Fee)
      Key: SID
      Functional dependencies: Building->Fee, SID->Building->Fee
      SID Building Fee
      100 Randolph 1200
      150 Ingersoll 1100
      200 Randolph 1200
      250 Pitkin 1100
      300 Randolph 1200

    4. BOYCE-CODD NORMAL FORM
      A relation is in BCNF if every determinant is a candidate key (candidate key - two or more attributes or attribute collections that can be a key)
      ADVISER (SID, Major, Fname)
      Key (primary):(SID, Major)
      Key(candidate): (SID, Fname)
      Functional dependencies: Fname->Major
      SID Major Fname
      100 Math Cauchy
      150 Psychology Jung
      200 Math Riemann
      250 Math Cauchy
      300 Psychology Perls
      300 Math Riemann

      even though it is 1NF,2NF & 3NF, it still has modification anomalies
    5. FOURTH NORMAL FORM
      A relation is in 4NF if it is in BCNF and has no multivalued dependencies
      STUDENT(SID, Major, Activity)
      Key: (SID, Major, Activity)
      Multivalued dependencies: SID->->Major
      SID->->Activity
      SID Major Activity
      100 Music Swimming
      100 Accounting Swimming
      100 Music Tennis
      100 Accounting Tennis
      150 Math Jogging
    6. FIFTH NORMAL FORM - no clear intuitive meaning

  4. Domain/Key Normal Form
    1. DEFINITIONS
      • constraint - any rule governing static values of attributes that is precise enough that we can ascertain whether it is true or not
      • key - unique identifier of a tuple
      • domain - description of an attribute's allowed values (physical and logical)
      • a relation is in DKNF if enforcing key and domain restrictions causes all of the constraints to be met
    2. EXAMPLE 1 OF DKNF
      STUDENT(SID, GradeLevel, Building, Fee)

      Key:SID

      Constraints: Building->Fee
      SID must not begin with digit 1

      Domain Definitions

      SID  in  CDDD, where C is [1..9] and D is [0..9]
      GradeLevel  in  {'FR','SO','JR',SR'}
      Building  in  CHAR(4)
      Fee  in  DEC(4)

      Relation and Key Definitions

      STUDENT(SID, GradeLevel, Building)
      Key:SID

      BLDG-FEE(Building, Fee)
      Key:Building

    3. EXAMPLE 2 OF DKNF
      PROFESSOR(FID, Fname, Class, SID, Sname)

      Key:(FID,Class,SID)

      Constraints: FID->Fname
      Fname->FID
      FID->->Class|SID
      Fname->->Class|SID
      SID->FID
      SID->Fname
      SID->Sname
      FID must start with 1;SID must not begin with digit 1

      Domain Definitions

      FID  in  CDDD, C=1; D=decimal digit
      Fname   in  CHAR(30)
      Class  in  CHAR(10)
      SID  in  CDDD, where C is [1..9] and D is [0..9]
      Sname  in  CHAR(30)

      Relation and Key Definitions

      FACULTY (FID, Fname)
      KEY (primary): FID;
      Key (candidate): Fname

      PREPARATION (Fname, Class)
      Key:Fname, Class

      STUDENT(SID, Sname, Fname)
      Key:SID

    4. EXAMPLE 3 OF DKNF
      STU-ADVISER(SID, Sname, FID, Fname, GradFacultyStatus)

      Key:SID

      Constraints: FID->Fname
      Fname->FID
      FID and Fname->GradFacultyStatus
      Only graduate faculty can advise graduate students
      FID begins with 1
      SID must not begin with digit 1
      SID of graduate student begins with 9
      GradFacultyStatus = 0 (undergraduate faculty) or 1 (graduate faculty)

      Domain Definitions

      FID  in  CDDD, where C is 1 and D is [0..9]
      Fname  in  CHAR(30)
      Grad-faculty-status  in  [0,1]
      GSID  in  CDDD, where C is 9 and D is [0..9]
      UGSID  in  CDDD, where C <> 1 and C <> 9 and D is [0..9]
      Sname  in  CHAR(30)

      Additional Domain Definitions
      Gfname  in  (Fname of FACULTY, where GradFacultyStatus=1)

      Relation and Key Definitions

      FACULTY(FID, Fname, GradFacultyStatus)
      Key:FID or Fname

      G-ADV(GSID, Sname, Gfname)
      Key:GSID

      UG-ADV(UGSID, Sname, Fname)
      Key:UGSID

    Form Defining Characteristic
    1NF any relation
    2NF all nonkey attributes are dependent on all of the keys
    3NF there are no transitive dependencies
    BCNF every determinant is a candidate key
    4NF there are no multivalued dependencies
    5NF not described in this discussion
    DK/NF all constraints on relations are logical consequences of domains and keys

  5. The Synthesis of Relations
    1. ONE-TO-ONE ATTRIBUTE RELATIONSHIPS - A->B and B->A
      • if 2 attributes functionally determine each other, the relationship is 1-1
      • if 2 attributes uniquely identify the same thing (entity or object), the relationship is 1-1
      • if 2 attributes have a 1-1 relationship, they functionally determine each other
    2. MANY-TO-ONE ATTRIBUTE RELATIONSHIPS - A->B, B not -> A
    3. MANY-TO-MANY ATTRIBUTE RELATIONSHIPS - A not -> B, B not -> A
    Type of Attribute Relationship
    One-to-One Many-to-One Many-to-Many
    Relation Definition R(A,B) S(C,D) T(E,F)
    Dependencies A->B
    B->A
    C->D
    C not ->D
    E not->F
    F not->E
    Key Either A or B C (E,F)
    Rule for Adding
    Another Attribute
    Either A or B->C C->E (E,F)->G

  6. Multivalued Dependencies, Iteration 2 - ex. STUDENT(SID, Major, Activity)

  7. Trade-Offs
    • DE-NORMALIZATION EX.
      CUSTOMER(CustNumber, CustName, City, State, Zip) [Zip determines City and State]
    • OPTIMIZATION
      COLLEGE(CollegeName, Dean, AssistantDean) with multiple Assistant Deans can be normalized to
      DEAN(CollegeName, Dean) and ASSISTANT-DEAN(CollegeName, AssistantDean) or
      COLLEGE1(CollegeName, Dean, AssistantDean1, AssistantDean2, AssistantDean3)

To Previous Chapter To Table of Contents To top of page To Next Chapter