Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to specify a column specific character set #206

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ralfbecker
Copy link
Contributor

Limiting index size is important for MySQL performance. This pull request allows eg. to create columns with ascii charset (1 byte instead of 3 for utf8), if content only requires ascii characters.

The patch (mis-)uses constrain parsing to specify the character set, eg.

C(64) CONSTRAIN "CHARACTER SET ascii"

Original goal was to keep ADOdb modifications to a minimum, but I could also change parser to understand something like C(64) CHARACTER SET ascii and pass it via an extra parameter to _CreateSuffix for data dictionaries supporting it.

Ralf

…scii" via constrain parameter, required to be first part of suffix
@ralfbecker
Copy link
Contributor Author

Do you want me to modify this pull request to use
C(64) CHARACTER SET ascii
instead of currently used
C(64) CONSTRAIN "CHARACTER SET ascii"

Ralf

@mnewnham
Copy link
Contributor

I don't know whether you gave this some consideration, but I wonder if there might be a way we could add something that would allow us to add any user defined options that might be required, for example in your case:

C(64) OPTIONS "CHARACTER SET ASCII"

or perhaps a slightly more sophisticated example option using SQL SERVER:

C(64) OPTIONS "SPARSE,COLLATE 'UTF-8',ENABLE TRIGGER 'C64_TRIGGER'"

This would allow users to define any column definitions they want without us having to shoehorn even more sophisticated parsing into what is already a difficult to maintain piece of code. @dregad , I wonder if you have any opinion about this, and whether this would be an appropriate time to achieve such a feature.

@dregad
Copy link
Member

dregad commented Feb 29, 2016

add something that would allow us to add any user defined options that might be required

As a matter of fact, I was actually thinking along the same lines. In my opinion, having a generic option that is passed on to the driver as is, is probably the most flexible way to achieve this, while requiring the least maintenance on ADOdb end.

@mnewnham
Copy link
Contributor

mnewnham commented Mar 1, 2016

So I'm stealing an idea here from AXMLS. What if the $flds argument to createTableSql etc became an optional array. We could easily intercept that in the _genFields() method. We would then declare $flds to look something like this:

$flds = array('COL1'=>array('type'=>'C(32)',
                        'portableOptions'=>array("NOTNULL",
                                                   "DEFAULT 'abc'"
                                           ),
                        'driverOptions'=>array('mysqli'=>array("CHARACTER SET 'UTF8'"),
                                        'mssqlnative'=>array("COLLATE 'UTF8'",
                                                        "SPARSE")
                                                     ),
                       'providerOptions'=>array(),
            ),
            'COL2'=>array('type'=>'I',
                        'portableOptions'=>array("DEFAULT 0")
                        )
            );

The minimum required entries would be the key (field name) the 'type'.
optionally there would be:

  • 'portable options' which would link to the existing parser
  • 'driver options' which would only be included if the driver matched
  • 'provider options' which would only be included if the provider matched

@dregad
Copy link
Member

dregad commented Mar 1, 2016

The approach is interesting, and has the advantage of providing good flexibility.

How would you propose we handle backwards compatibility ?

@ralfbecker
Copy link
Contributor Author

I don't see a column specific character set as something driver specific, even if some databases might not offer it. ADOdb drivers should be used to abstract the db specific syntax for column specific charsets. As driver specific I see eg. certain type of indexes.
About BC: I would pass a string argument straight to current parser, which creates the array representation. Array argument could then be passed on to the driver.

@mnewnham
Copy link
Contributor

mnewnham commented Mar 1, 2016

I was just using the character set as an example, but my point about stealing the idea from AXMLS still stands. For example, here is part of an AXMLS table from my system:

<?xml version="1.0"?>
<schema version="0.3">
<table name="words" >
        <opt platform="mysql">TYPE=MyISAM</opt>
        <opt platform="db2">TABLESPACE="APP-LARGE-SPACE"</opt>
.....

Based on the provider, the options are passed into createtableSql()

I understand your point about ADOdb being an abstraction device, and I believe that my concept of 'portableOptions' addresses that. If an option such as CHARACTER SET is worthy of becoming a portable option, I think this redesign would allow us to achieve that without disturbing the core code too much. Perhaps each portable option could become a standardized API method that could be extended as necessary on a driver by driver basis. We should also recognize that a lot of people use ADOdb as an abstraction device, but are not interested in portability, just that the software provides easy access to database functionality, and our data dictionary should accommodate it with driver-specific functionality.

/*
* We pass in our field definitions
*/
$flds = array('COL1'=>array('type'=>'C(32)',
                        'portableOptions'=>array("NOTNULL",
                                                   "DEFAULT'=>'abc',
                                                   'CHARACTERSET'=>'UTF-8'.
                                           ),
/*
* For each of our portable options we have a method
*/
protected function metaOption_NOTNULL()
{
    return 'NOT NULL';
}

protected function metaOption_CHARACTERSET()
{
    $a = func_get_args();
    $cSet = "CHARACTER SET '" . $a[0] . "'";
    return $cSet;
}

/*
* In our SQL Server driver, we would then overload
*/
final protected function metaOption_CHARACTERSET()
{
    $a = func_get_args();
    $cSet = "COLLATE ' " . $a[0] . "'";
    return $cSet;
}

In the above example, the passed arguments are flexible that we could pass a string, an empty string or even another array depending on the complexity of the option. If needed, extra portable options could then be implemented by simply adding the method to the file.

We would then iterate through our portable options

foreach ($flds as $fieldName=>$items)
{
    foreach ($items['portableOptions'] as $k=>$v)
    {
        $f = 'metaOption_' . $k;
        $text .= $this->$f($v);
    }
}

Indeed, we could take the functionality further by making each method a self-enclosed class, and each option becomes a plugin. The main methods such as createIndexSql(), addColumnSql() would no longer need to be modified if a new portable option needed to be created.

From a backwards compatibility perspective, my suggestion about testing for an array instead of a string is still a valid test, I think. We would branch off there and return without entering the main body of the genfield() code.

To summarize, my main points are this:

  • The main functionality of this code has really not been modified in 10+ years
  • Whilst the core portable functions can still achieve 95% of user requirements, there are a continually increasing number of sophisticated options/attributes that we cannot address with the current code design. Indeed, other recent requests for modifications to this method (ENUM, NOT-NULL BLOB FIELDS) have been tricky to implement, and have ultimately been done "hack" method.
  • The current design, with its definition as a flat string is, to me, annoying in its self and easily subject to errors, especially when dynamically generating columns/indexes etc.
  • The fact that there is driver-specific code in the core functions is another annoyance
  • This would be a valuable enhancement to the functionality of the data dictionary.

I do however, recognize that what you are trying to do is get your own enhancements into the core product as quickly as possible, and that this is quite a substantial project, so maybe it might be possible to do that on the basis that your request would be a one-off, and no further requests on the subject would be accepted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants