Saturday, October 22, 2005

OpenOffice Base 2.0 vs. Microsoft Access 2003 - Part II HSQLDB Tables

As I mentioned in Part I of this series, OOo Base is a desktop-database application that can perform CRUD (create, retrieve, update, and delete) operations on tables from almost any commercial or open-source RDBMS (relational database management system) that has ODBC, JDBC, or OLE DB drivers. OOo Base supports Jet (Microsoft Office Access 2003 and earlier), MySQL, PostgreSQL, Oracle, DB2, SQL Server, SQL Server Express, and MSDE databases, as well as many others.

Note: This and succeeding parts of the series were written based on the OOo 2.0 release version of 10/20/2005.

OOo/SO users and developers who have existing production databases will find that registering the database with—a.k.a, attaching database tables to—a new DatabaseName.odb file is considerably easier than adding and populating new, embedded HSQLDB tables. However, OOo 2.0/SO 8 will be many users' first introduction to PC databases. OOo considers HSQLDB to be "targeted at unexperienced users" or for the small-office/home-office (SOHO) market. Most of the targe market won't have existing databases, relational or otherwise. Thus, the capbility to easily create a useful database from scratch is an important feature of any desktop database application.

Why Use the Embedded HSQLDB Database?

The justifications proposed for moving from proprietary (vendor-specific binary or XML) file formats to a single standards-based schema—Relax NG instead of W3C-standard XML Schema 1.0, unfortunately—clearly apply to relational databases. Vendor lock-in by proprietary RDBMS file storage, metadata, and extensions to ANSI-standard SQL has an exclusionary effect that's identical to the use of Microsoft Word, Excel, and PowerPoint binary files. OOo 2.0/SO 8 with HSQLDB eliminate platform (i.e., Windows) lock-in; OOo 2.0/SO 8 versions are available for Windows, Linux (x86), Solaris (x86), and Solaris (SPARC). OOo 2.0 offers versions for Linux (PPC), FreeBSD, and Macintosh OS X also. (Some operations described in this post might not work with the OOo X11 release for the Mac).

The primary market-related difference between office software suites and RDBMSs is that no RDBMS (including Access) has a market share that's even close to that of Microsoft Office Standard edition. The Yankee Group says Microsoft Office has 90% of the market, according to Computer Reseller News. eWeek's Lisa Vaas reported in March 2005 that Oracle had 41.3% of the RDBMS market, followed by IBM with 30.6% and Microsoft SQL Server with 13.4%. Oracle's RDBMS market share is less than half Microsoft Office's share of the overall office suite market. However, it's generally accepted that Access—included with Microsoft Office Professional and higher—has a virtual lock on the Windows desktop database category. The cost differential between Office 2003 Standard and Professional purchased from Internet merchants ranges from US$21 to US$166 for the standard SKU and US$58 to US$90 for the upgrade SKU.

Filemaker Pro 8 Advanced is Access's only serious desktop-database competition on the PC, but costs substantially more than the Office Standard-to-Pro increment. (Plain Filemaker Pro 8 abandons many Access developer features to achieve simplicity.)

Currently, Microsoft Office Access 2003 is the standard of comparison for all desktop database applications, as noted in the lead paragraph of Lisa Vaas's October 21, 2005 eWeek article. Access's default Jet (*.mdb) binary format for storing all database objects—tables, query definitions, form and report designs, and VBA code modules—is proprietary and, of course, Windows-only. Many organizations upsize Access applications to take advantage of more robust client/server databases—such as Microsoft SQL Server 2000 or the no-charge MSDE 2000—by linking (attaching) tables to a Jet file or connecting directly to the database with Access data projects (*.adp).

Jet databases are suited for multi-user applications; HSQLDB isn't capable of concurrent multi-user access. Jet has ODBC and OLE DB drivers, but HSQLDB has no ODBC or JDBC drivers—the current JDBC implementation doesn't support *.odb files. Disregarding OOo Base's form, report, and programming limitations, HSQLDB's single-user restriction, questions about data security and user permissions, and lack of drivers for data stored in OpenDocument files are issues that are likely to dissuade most potential adopters from using the built-in HSQLSDB database. Almost all commercial OOo Base implementations will undoubtedly use a "commercial" open-source database—such as MySQL (with the Oracle-owned InnoBase engine) or PostgreSQL. If you must adopt the OASIS Open Document Format for Office Applications (OpenDocument) 1.0 specification for your single-user database projects, as well as for word-processing, spreadsheet, and presentation applications, the embedded HSQLDB RDBMS appears to be your only choice.

Only OOo 2.0/SO 8 with HSQLDB—to my knowledge—claim to store local table data in OpenDocument 1.0's all-encompassing Package format. The OpenDocument 1.0 specification's Section 6.5 defines Database Fields, section 8.6 specifies Database Ranges, section 11 decrees Form Content, and section 17 describes the ZIP-compressed Package format. Database Fields and Database Ranges refer to databases in Calc spreadsheets, not Base databases.

Files in a Base Package (*.odb File) for an HSQLDB Embedded Database

A DatabaseName.odb file for a Base project with an embedded HSQLDB file contains the following files:

  • manifest.xml provides a list of all files the in the package; the media type of the files, if applicable; and the information to decrypt the file, if it's encrypted.
  • context.xml contains data source settings and related database information, including forms and reports (if applicable).
  • settings.xml contains application configuration data.
  • mimetype is an uncompressed text file that contains application/vnd.sun.xml.base.
  • data contains all cached table data in compressed binary format.
  • properties contains is a compressed list of default property values for the database, such as hsqldb.default_table_type=cached.
  • script contains compressed, readable SQL DDL statements that define the database and its tables. The script file is executed when starting the database.
  • backup is a compressed snapshot of the data created when starting the database.
  • log is a redo log that consists of an SQL script for all database changes (DDL statements) executed since updating the backup file; the log file is deleted after re-creating the backup file.
Here's WinZip's window for an HSQLDB clone of the Northwind.mdb sample database:

You can use WinZip or any other Windows-based ZIP utility to view the XML files in Internet Explorer. You can view the other files in Notepad or the utility's ASCII viewer. The data, properties, script, backup, and log files are specific to HSQLDB and thus should be considered proprietary. However, section 17 of the OpenDocument 1.0 specification doesn't appear to impose any limitation on the content or structure of binary files.

Open Issue: If *.odb files that contain HSQLDB files with proprietary binary formats qualify as OpenDocument 1.0-compliant, embedding other proprietary, binary-formatted files should receive equal treatment. Replacing HSQLDB with an embedded, compressed Jet *.mdb file might provide many missing features. Adding compressed Excel, Word, or PowerPoint binaries to *.ods files might provide better performance and interesting new features.

HSQLDB Does NOT Store Data in XML Files

Several members of the trade press, including bloggers, have reported that HSQLDB stores table data as XML files. The latest instance is by Lisa Voss in eWeek's "OpenOffice Packs a Powerful New Database Punch" article (October 21, 2005, page 2): "The full version of the HSQL database engine included in OOo Base stores data in XML files." HSQLDB stores data in a proprietary binary format, as noted in the preceding section's description of the data file that's stored in a compressed (ZIP) file.

Here's a capture of the first few records of the Customers table from the 1.5-MB decompressed data file:

Stating that HSQLDB stores data in XML files seems to me to be a gratuitous attempt to represent HSQLDB storage as similar to the OpenDocument 1.0 XML format for other OOo documents. There is no way that anyone who had investigated HSQLDB's storage format could confuse the binary format with XML.

Generating HSQLDB Tables with the Wizard After you create and register a new HSQLDB database, click the Tables and Use Wizard to Create Table icons to start the Table Wizard. This Wizard—a knock-off of Access's Table Wizard—offers 22 business-oriented and 15 personal-style table designs from which to select.

Make sure to set the order of the Selected fields before clicking Next. It's almost impossible to change the field sequence in Table Edit mode.

The next step is to change data types and other field properties as required, and add or delete fields:

HSQLDB supports most SQL-92 data types, as shown in this list:

HSQLDB strings and [VAR]CHAR data types are Unicode by default.

The next Wizard dialog lets you add a primary key or select one of the existing fields as the primary key:

The final step offers the option of taking you directly to the data entry grid or editing the table design. Unfortunately, existing column properties are read-only, so your only option is to add new columns and delete the old ones. Another Access feature that's missing in the HSQLDB table design grid is the ability to alter the ordinal columns position by dragging column icons.

After you've added the tables you need for an applicaton, choose Tools, Relations to establish relationships between primary and foreign keys in the tables. The later "Establishing Relationships between Tables" section describes the Relations window.

Creating and Populating HSQLDB Tables with DDL Batches

You can define and add tables to an HSQLDB database with conventional ANSI SQL DDL statements. If you want to include lower-case letters or special characters in table and field names, you must use quoted identifiers. Open the HSQL Database Engine Documentation pages and click the SQL Syntax (Chapter 9) link to review HSQLDB's SQL-9x implementation. Choose Tools, SQL to open the Execute SQL Statement dialog to execute non-SELECT (a.k.a. "action") queries or commands. Here's an example of an SQL-92 batch to clone the Northwind Orders table:

CREATE CACHED TABLE "Orders"(
  "OrderID" INTEGER IDENTITY PRIMARY KEY,
  "CustomerID" CHAR(5) NULL,
  "EmployeeID" INTEGER NULL,
  "OrderDate" DATE NULL,
  "RequiredDate" DATE NULL,
  "ShippedDate" DATE NULL,
  "ShipVia" INTEGER NULL,
  "Freight" DECIMAL NULL,
  "ShipName" VARCHAR(40) NULL,
  "ShipAddress" VARCHAR(60) NULL,
  "ShipCity" VARCHAR(15) NULL,
  "ShipRegion" VARCHAR(15) NULL,
  "ShipPostalCode" VARCHAR(10) NULL,
  "ShipCountry" VARCHAR(15) NULL,
CONSTRAINT "PK_Orders" UNIQUE ("OrderID"))
You then write conventional INSERT statements to add rows to the table. HSQLDB expects DATE or TIMESTAMP literals in the 'YYYY-MM-DD [HH:MM:MM.######]' format. Obviously, the preceding approach won't appeal to "unexperienced" database users who, by definition, aren't likely to know how to write SQL DDL/DML commands. If you replace the optional (default) CACHED modifier with TEXT, you (theoretically) can specify a TableName.csv file in the same folder as the data source to populate the table when the database loads with the following instruction:
SET TABLE "Orders" SOURCE "Orders;ignore_first=true"

Chapter 6, "Text Tables" describes the syntax for the CREATE TABLE and SET TABLE instructions. The preceding syntax doesn't add rows to the table, despite the appearance of a "Command successfully executed" message. You must manually modify most source *.csv files, use an undocumented SET TABLE syntax, and abandon UTF-8 or Unicode for text fields. The following grid demonstrates how text tables replace accented characters with a symbol (diamond and embedded ?):

Creating and Populating HSQLDB Tables from CSV Files It's a common business task to replace multiple, large spreadsheets with desktop database applications that consist of relational tables, data-entry forms and reports. You can import CSV files into an OOo Calc spreadsheet or open a *.ods file. In either case, the first row of the spreadsheet must contain the column names. You create the table structure and populate it by dragging a Database range from the spreadsheet to an empty area of the Base Tables window. This process is much simpler than the SQL batch approach described in the preceding section.

The drag-and-drop technique is straightforward, if your CSV files don't contain dates. Importing to HSQLDB tables date from CSV files is a can of worms.

Here's the drill for tables with no DATE, TIME, or TIMESTAMP fields:

1. Create and register a new HSQLDB database, if you haven't already done so, and click the Tables icon.

2. Open With OpenOpenOffice.org the *.csv or *.ods file in Windows Explorer. If you open a *.csv file, click OK to accept the default CSV separator, text delimiter, and cell data types:

3. Select the upper-left cell of the Database range, usually A1:

4. Choose Data, Define Range to open the Define Database Range dialog. Type a name for the range, verify the upper-left and lower-right cell coordinates, and click OK.

5. With the entire Database range selected in the spreadsheet, drag the selection to the Base window's Tables pane to open the Copy Table Wizard's first dialog.

6. Change the Table Name from Table1 to the new table name, accept the remaining defaults, and click to open the Apply Columns dialog. (Do not add a new primary key column.)

7. Click the >> buttton to add all columns from the Existing Columns list, and click Next.

8. Set the columns' data types and other properties in the Column Information controls. Note that accepting the default IDENTITY modifier as the Auto-Increment Statement doesn't define ProductID as an AutoField column.

9. Click Create to add the table and data to the current database. If the table doesn't appear in the Tables list, choose View, Refresh tables.

Setting the Primary Key and Specifying an Auto-Increment (Auto-Value) Column

I found that setting a primary key and specifying the column as an identity (Auto-Increment or AutoField) column requires executing two DDL statements, such as these for the Products table, in the Execute SQL Statement dialog:

ALTER TABLE "Products" ADD PRIMARY KEY("ProductID"); ALTER TABLE "Products" ALTER COLUMN "ProductID" INTEGER IDENTITY PRIMARY KEY;

To verify the primary key, choose View, Refresh Tables, right-click the table name in the Tables list, choose Edit, and verify that the row for the primary key column contains a miniature key icon. To verify that the column is an IDENTITY column, open the table in datasheet view, click the new row button and look for the <AutoValue> entry in the otherwise-empty row.

Importing CSV Dates, Times, or Both

OOo 2.0 appears to have problems importing CSV date, time, or date/time values to HSQLDB DATE, TIME, or TIMESTAMP columns. Excel and OOo/SO Spreadsheets store dates as integers (1 = 1/1/1900), time as a decimal fraction (of 24 hours), and date/time data as a decimal number. As an example, 1.5 represents 1/1/1900 12:00:00 PM. Regardless of the date format you apply in Excel or OOo/SO, the value stored in the CSV file is the unformatted numeric value that represents the date and/or time.

HSQLDB expects DATE values to be supplied in 'yyyy-mm-dd' (java.sql.Date) format and TIME values as 'hh:mm:ss' (java.sql.Time) strings. TIMESTAMP fields require 'yyyy-mm-dd hh:mm:ss' format or 'yyyy-mm-dd hh:mm:ss.fffffffff' for microsecond resolution(java.sql.Timestamp). Leading zeros for hour (hh) and nine fractional digits (f) for seconds are required.

Generating Date-Compatible CSV files with Access

Access's Export Text Wizard's Export Specification dialog lets you specify the Date Order and Date Delimiter for CSV or fixed-width text file:

The preceding settings (YMD as the Date Order and - as the Date Delimiter) result in the following partial CSV file:

As far as I've been able to determine, neither Excel nor OOo/SO Calc will export a CSV in the preceding format. However, even this format won't import correctly into an OOo spreadsheet. You must add text delimiters ("s) to the date[time] fields. The yyyy-mm-dd [h]h:mm:ss format makes search and replace to add the "s to the fields.

To ready the CSV file for import to an HSQLDB table, open the CSV file in Notepad, replace ,19 with ,"19 and :00, with :00", to change the CSV field to text. If you don't need time values, replace [space]0:00:00 with nothing. If you need a date/time (TIMESTAMP) value, replace [space]0: with [space]00: and :00" with :00.000000000", as shown here:

Follow the "Creating and Populating HSQLDB Tables from CSV Files" sections' steps 2 through 9, except as follows:

1. In step 2, be sure to change the Fields Column Type from Standard to Text for each of the date[time] fields you've specified.

2. In step 6, substitute TableName1 for Table1 as the table name. You must copy this table to a modified final version.

3. In step 9, accept the default VARCHAR type for date fields.

4. Proceed with the "Make the Final HSQLDB Table Copy" section.

Generating Date-Compatible Fields with Excel or OOo Spreadsheet

An alternative to Access's Text Export Wizard is to create the CSV file with any spreadsheet or database application that offers export-to-CSV capability, and then open the CSV file in OOo Calc, add a text field for each date or date/time column, and then use the TEXT(CellValue, "date format") expression to populate the text column's cells. This method is limited to processing a maximum of 64K rows per table.

[Details on this method and following topics to come later.]

Make the Final HSQLDB Table Copy

Establishing Relationships between Tables

Adding Indexes to Tables

Exporting Table Data

If you use the embedded HSQLDB database and want to export its data to another RDBMS, you must copy each table to an individual Calc worksheet and save the worksheet as a CSV file. Then import the CSV file to the new database. Whether you need to jump through the same hoops required to import CSV files with date or date/time fields to HSQLDB depends on the RDBMS you select.

Technorati:

1 comments:

Anonymous said...

With TEXT tables (CSV and other delimited formats) you can specify the text encoding for your files using the SET TABLE SOURCE "..." syntax. e.g. SET TABLE MYTABLE SOURCE "myfile;encoding=UTF-8". Other supported parameters for this command allow the separator and quoting to be specified.