Category Archives: MySQL

MySQL: Get the tail of a string after the last delimiter

MySQL has several String functions to search, replace, extract information from a string.

We had a list of string values in the format “[something1]_R[something2]”. We wanted to convert the values on this column to only store the [something2] part of the string after the last ‘_R’.
The restriction was that we couldn’t ensure that [something1] has not any “_R” sequence of characters.

Some examples:

  • For the string “12345_R27″ we want only to store “27”
  • For the string “12_Rat_R87″ we want only to store “87”

MySQL has SUBSTRING function but to solve our problem we would need to locate the last occurrence of “_R” to know where we should start the substring. Fortunately, MySQL has another function SUBSTRING_INDEX that “Returns the substring from string str before count occurrences of the delimiter delim

With this function we can say that want the substring starting on 1st, 2nd, 3rd, … nth occurrence of the delimiter.

But we can also invoke with count=-1 which will return the substring of the original str after the last occurrence of the delim (delimiter).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
mysql> select substring_index('12345_R27', '_R', -1)
    -> ;
+----------------------------------------+
| substring_index('12345_R27', '_R', -1) |
+----------------------------------------+
| 27                                     |
+----------------------------------------+
1 row in set (0.00 sec)

mysql> select substring_index('12_Rat_R87', '_R', -1);
+-----------------------------------------+
| substring_index('12_Rat_R87', '_R', -1) |
+-----------------------------------------+
| 87                                      |
+-----------------------------------------+
1 row in set (0.00 sec)

MySQL database size

1
2
3
4
5
6
SELECT table_schema "Data Base Name",
SUM( data_length + index_length ) / 1024 /
1024 "Data Base Size in MB",
SUM( data_free )/ 1024 / 1024 "Free Space in MB"
FROM information_schema.TABLES
GROUP BY table_schema ;

MySQL delete duplicate lines (MyISAM and InnoDB)

Sometimes this task can be a real pain in the ass.. Heres a quick & dirty way to remove.

Let’s say you have the following table:

+-------+------------------+------+-----+---------+----------------+
| Field | Type             | Null | Key | Default | Extra          |
+-------+------------------+------+-----+---------+----------------+
| id    | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| col_a | varchar(10)      | YES  |     | NULL    |                |
| col_b | varchar(10)      | YES  |     | NULL    |                |
+-------+------------------+------+-----+---------+----------------+

With the following data:

+----+-------+-------+
| id | col_a | col_b |
+----+-------+-------+
|  1 | YY    | ZZ    |
|  2 | YY    | XX    |
|  3 | YY    | ZZ    |
|  4 | YY    | XX    |
|  5 | XX    | XX    |
+----+-------+-------+

Note: On this example I’ll assume a unique row is identified by the pair “col_a” + “col_b”.

If you want to delete duplicate rows and you don’t care for the PRIMARY KEY or which of the rows gets selected, you can add a UNIQUE INDEX.

For MyISAM engine tables

1
ALTER IGNORE TABLE `example_table` ADD UNIQUE INDEX `idx_uq` (`col_a`, `col_b`);

Doing this will automatically purge the duplicate rows instead of shouting an error because we’re using IGNORE.

Et voila:

+----+-------+-------+
| id | col_a | col_b |
+----+-------+-------+
|  1 | YY    | ZZ    |
|  2 | YY    | XX    |
|  5 | XX    | XX    |
+----+-------+-------+

Important Note on InnoDB engine tables

If your under MySQL 5.5/InnoDB this will not work due to a known “InnoDB index limitation“.

Luckily, someone at stackoverflow.com came up with a simple solution:

1
2
SET SESSION old_alter_table=1;
ALTER IGNORE TABLE `example_table` ADD UNIQUE INDEX `idx_uq` (`col_a`, `col_b`);

MySQL kill all queries by user

Here’s a dirty command-line utility that kills all MySQL queries originated by a given user:

1
mysql -u'YOUR_MYSQL_USER' -p'YOUR_MYSQL_PASSWORD' -h'YOUR_MYSQL_HOST' -e "select ID from information_schema.processlist where user='YOUR_MYSQL_USER';" | sed '1d' | awk '{print "kill ", $1, ";"}' | xargs -i mysql -u'YOUR_MYSQL_USER' -p'YOUR_MYSQL_PASSWORD' -h'YOUR_MYSQL_HOST' -e "{}"

MySQL count(1) count(*) Myth

Sometimes people say that you should use select count(1) from table; instead select count(*) from table; because is faster, if you read some good books on MySQL they never refer this optimisation (strange).

So a little testing:

mysql> explain extended select sql_no_cache count(*) from lc_contact_detail \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: lc_contact_detail
type: index
possible_keys: NULL
key: contact_id
key_len: 4
ref: NULL
rows: 13393816
filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.00 sec)

mysql> explain extended select sql_no_cache count(1) from lc_contact_detail \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: lc_contact_detail
type: index
possible_keys: NULL
key: contact_id
key_len: 4
ref: NULL
rows: 13393849
filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.00 sec)

Looks MySQL will execute the  query in the same way in either case, if didn’t convince you, profile:

mysql> select sql_no_cache count(*) from lc_contact_detail;
+———-+
| count(*) |
+———-+
| 13457066 |
+———-+
1 row in set (5.64 sec)

mysql> show profile for query 1;
+——————–+———-+
| Status             | Duration |
+——————–+———-+
| starting           | 0.033511 |
| Opening tables     | 0.000019 |
| System lock        | 0.000007 |
| Table lock         | 0.000012 |
| init               | 0.000016 |
| optimizing         | 0.000008 |
| statistics         | 0.000015 |
| preparing          | 0.000012 |
| executing          | 0.000007 |
| Sending data       | 5.606788 |
| end                | 0.000018 |
| query end          | 0.000005 |
| freeing items      | 0.000059 |
| logging slow query | 0.000004 |
| cleaning up        | 0.000006 |
+——————–+———-+
15 rows in set (0.02 sec)

mysql> select sql_no_cache count(1) from lc_contact_detail;
+———-+
| count(1) |
+———-+
| 13457088 |
+———-+
1 row in set (5.54 sec)

mysql> show profile for query 2;
+——————–+———-+
| Status             | Duration |
+——————–+———-+
| starting           | 0.000077 |
| Opening tables     | 0.000020 |
| System lock        | 0.000007 |
| Table lock         | 0.000011 |
| init               | 0.000017 |
| optimizing         | 0.000007 |
| statistics         | 0.000013 |
| preparing          | 0.000011 |
| executing          | 0.000008 |
| Sending data       | 5.545308 |
| end                | 0.000019 |
| query end          | 0.000006 |
| freeing items      | 0.000058 |
| logging slow query | 0.000004 |
| cleaning up        | 0.000006 |
+——————–+———-+
15 rows in set (0.00 sec)

Is actually the same. So feel free to use one form or another. count(*) looks better IMO

The missing php function: mysql_fetch_all

The function:

1
2
3
4
5
6
7
8
function mysql_fetch_all($query, $kind = 'assoc')
{
    $result = array();
    $kind = $kind === 'assoc' ? $kind : 'row';
    eval('while(@$r = mysql_fetch_' . $kind . '($query)) array_push($result, $r);');
    return $result;

}

Usage:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
$query = "SELECT * FROM my_table;"
$res = mysql_query($query, $mysqll);
$res_array = mysql_fetch_all($res);

print_r($res_array);

/*
Output:
Array
(
    [0] => Array
        (
            [col_A] => val_A1
            [col_b] => val_B1
        )
    [1] => Array
        (
            [col_A] => val_A2
            [col_b] => val_B2
        )
* /

Show current database on mysql console

I’m sure everyone has at least once deleted something on the wrong database…

In an attempt to avoid that you can rewrite mysql’s PS1 value in order to display the current database you’re on.

Just edit your ~/.bashrc:

1
export MYSQL_PS1="mysql@\d > "

Don’t forget to relaunch it:

1
source ~/.bashrc

Et voila, the old “mysql>” is now replaced with “mysql@current_database>

Sweet :)

MySQL Error: Illegal mix of collations

Say you wanted to do a simple join such as:

1
SELECT count(1) FROM table_a a LEFT OUTER JOIN table_b b ON (a.varchar_column = b.varchar_column) WHERE b.varchar_column IS NULL;

And you got:

1
ERROR 1267 (HY000): Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_unicode_ci,IMPLICIT) for operation '='

All you need to do is update your table’s charset AND collation:

1
ALTER TABLE table_b CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

PS – table_b or table_a, whichever you decide that should match the other’s charset and collation.