July 31, 2002  DATA-CLUSTERING



 The T-formed ERD (Entity-Relationship Diagram) uses two symbols:
  (1) a square to denote an "entity"
  (2) a line to denote a "relationship"

 The T-formed ERD uses a square to denote a data grouping, with a vertical line drawn within the square.
 The format is therefore "T-formed", the same format as the journal in which accountants record transaction data:
 the left side is where the "identifier" is stored, and the right side is where the "attributes" are stored.



 
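 +---------------------------+
 | the name of entity        |
 +------------+--------------+
 | identifier | attributes   |
 +------------+--------------+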
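
 As a rough illustration of the T-formed box, the following Python sketch models one entity and draws it as text. The class name TFormEntity, its field names, and the render method are assumptions made for this example; they are not part of the T-formed ER method itself.

```python
from dataclasses import dataclass, field


@dataclass
class TFormEntity:
    """One T-formed box: identifier on the left, attributes on the right."""

    name: str
    identifier: str
    attributes: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Draw the entity as a plain-text T-formed box."""
        left = f" {self.identifier} "
        right = f" {', '.join(self.attributes)} "
        title = f" {self.name} "
        # Widen the right-hand cell if the entity name needs more room.
        inner = max(len(left) + 1 + len(right), len(title))
        right = right.ljust(inner - len(left) - 1)
        top = "+" + "-" * inner + "+"
        split = "+" + "-" * len(left) + "+" + "-" * len(right) + "+"
        return "\n".join([
            top,
            "|" + title.ljust(inner) + "|",
            split,
            "|" + left + "|" + right + "|",
            split,
        ])


# Hypothetical example entity, using names that appear in this text.
employee = TFormEntity("employee", "employee-NO", ["employee-NAME"])
print(employee.render())
```

 Keeping the identifier in its own field, apart from the list of attributes, mirrors the vertical line of the "T".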
 


 The T-formed ER method recommends five basic processes for preparing diagrams:
  (1) Data clustering
  (2) Data classification
  (3) Relationship creation
  (4) Data verification
  (5) Polysemy (multiplicity of data-meaning) elimination

 
Data clustering

 A single entity (occurrence) must have an identifier. Most computer systems assign an arbitrary number or code as the identifier: employee-number, division-code, order-number etc.
 At this stage the identifier does NOT have to uniquely identify a single occurrence: the uniqueness needed for accessing data through an index on the (master) key should not be confused with the identifier. Uniqueness is dealt with later, in the process of creating indexes.
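
 One way to picture this separation is the Python sketch below, in which occurrences are first collected with their identifier and uniqueness is checked only when an index is built. The function build_index and the sample employee data are hypothetical and serve only as an illustration; the method does not prescribe any particular implementation.

```python
def build_index(occurrences: list[dict[str, str]], key: str) -> dict[str, dict[str, str]]:
    """Create a unique index on `key`.

    Uniqueness is enforced here, at index-creation time, and not when
    the identifier was first assigned to the entity.
    """
    index: dict[str, dict[str, str]] = {}
    for occurrence in occurrences:
        value = occurrence[key]
        if value in index:
            raise ValueError(f"duplicate {key} value {value!r}")
        index[value] = occurrence
    return index


# Made-up occurrences of the "employee" entity (sample data only).
employees = [
    {"employee-NO": "E001", "employee-NAME": "Sato"},
    {"employee-NO": "E002", "employee-NAME": "Suzuki"},
]

# The index is a later, separate step layered on top of the entity.
by_number = build_index(employees, "employee-NO")
print(by_number["E002"]["employee-NAME"])
```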

 An attribute is a data element containing a single piece of information about an entity: e.g.,
   name, date, amount

 Attributes are, therefore, named under the rule of "entity.attribute": e.g.,
   employee.name    order.date
   customer.name    delivery.date
   product.name     billing.date

 Take an attribute, and simply ask "of which?": e.g.,
 for the attribute "name", ask "whose name?": the employee's? the customer's? the product's?
 for the attribute "date", ask "the date of which?": the order? the delivery? the billing?
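
 The "of which?" question can be sketched in Python as a grouping step: every attribute is written as "entity.attribute" and then collected under its entity. The function name cluster and the input list are assumptions made for this example only.

```python
from collections import defaultdict


def cluster(qualified_names: list[str]) -> dict[str, list[str]]:
    """Group attributes under the entity that answers "of which?"."""
    entities: dict[str, list[str]] = defaultdict(list)
    for qualified in qualified_names:
        entity, _, attribute = qualified.partition(".")
        entity, attribute = entity.strip(), attribute.strip()
        if not entity or not attribute:
            # A bare "name" or "date" has not yet been asked "of which?".
            raise ValueError(f"cannot tell which entity owns {qualified!r}")
        if attribute not in entities[entity]:
            entities[entity].append(attribute)
    return dict(entities)


# Qualified names taken from the examples in this text.
clusters = cluster(["employee.name", "order.date", "customer.name", "delivery.date"])
for entity, attributes in clusters.items():
    print(entity, attributes)
```

 An unqualified attribute such as a bare "name" is rejected on purpose, since it cannot be placed until its owning entity is known.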



 
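 +-----------------------------+
 | employee                    |
 +-------------+---------------+
 | employee-NO | employee-NAME |
 +-------------+---------------+

 +-----------------------+
 | order                 |
 +----------+------------+
 | order-NO | order-DATE |
 +----------+------------+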
 

 Each attribute is placed in exactly one entity under the rule of "entity.attribute", which assures "one fact, one place."
 Data clustering is thus the process of eliminating data redundancy, that is, of normalizing the data.
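
 A minimal Python sketch of a "one fact, one place" check follows: it reports any attribute that has been placed in more than one entity. The function name redundant_attributes and the deliberately flawed draft model are hypothetical.

```python
def redundant_attributes(entities: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each attribute that appears in more than one place."""
    places: dict[str, list[str]] = {}
    for entity, attributes in entities.items():
        for attribute in attributes:
            places.setdefault(attribute, []).append(entity)
    return {attr: owners for attr, owners in places.items() if len(owners) > 1}


# A draft model in which employee-NAME was wrongly copied into "order".
draft = {
    "employee": ["employee-NAME"],
    "order": ["order-DATE", "employee-NAME"],
}
print(redundant_attributes(draft))
```

 An empty result means that every attribute in the draft lives in exactly one entity.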

NOTE:
 Most COBOL programs suffer from a proliferation of homonyms, aliases, and synonyms. Studies have indicated that there are 20 "other" names for each real data element.

 
