Jaslabs: High performance Software

Archive for January, 2006

using memcached and php

What is memcached?

memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Danga Interactive developed memcached to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. memcached dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.

How it Works

First, you start up the memcached daemon on as many spare machines as you have. The daemon has no configuration file, just a few command line options, only 3 or 4 of which you’ll likely use:

# ./memcached -d -m 2048 -l 10.0.0.40 -p 11211

This starts memcached up as a daemon, using 2GB of memory, and listening on IP 10.0.0.40, port 11211. Because a 32-bit process can only address 4GB of virtual memory (usually significantly less, depending on your operating system), if you have a 32-bit server with 4-64GB of memory using PAE you can just run multiple processes on the machine, each using 2 or 3GB of memory.

Shouldn’t the database do this?

Regardless of what database you use (MS-SQL, Oracle, Postgres, MySQL-InnoDB, etc.), there’s a lot of overhead in implementing ACID properties in an RDBMS, especially when disks are involved, which means queries are going to block. For databases that aren’t ACID-compliant (like MySQL-MyISAM), that overhead doesn’t exist, but reading threads block on the writing threads.

What about shared memory?

The first thing people generally do is cache things within their web processes. But this means your cache is duplicated multiple times, once for each mod_perl/PHP/etc. thread. This is a waste of memory and you’ll get low cache hit rates. If you’re using a multi-threaded language or a shared memory API (IPC::Shareable, etc.), you can have a global cache for all threads, but it’s per-machine. It doesn’t scale to multiple machines. Once you have 20 webservers, those 20 independent caches start to look just as silly as when you had 20 threads with their own caches on a single box. (Plus, shared memory is typically laden with limitations.)

The memcached server and clients work together to implement one global cache across as many machines as you have. In fact, it’s recommended you run both web nodes (which are typically memory-lite and CPU-hungry) and memcached processes (which are memory-hungry and CPU-lite) on the same machines. This way you’ll save network ports.
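
One way to picture how the clients form a single global cache: every client hashes the key and picks a server from the same shared list, so all web nodes agree on where a given key lives. The following is a minimal sketch under stated assumptions. The function name pick_server, the key, and the server addresses are hypothetical, and the plain crc32-modulo scheme is an illustration only; real client libraries may distribute keys differently.

```php
<?php
// Hypothetical helper: map a cache key to one server in a shared pool.
// Assumes simple crc32-modulo hashing; real memcached clients may differ.
function pick_server(string $key, array $servers): string
{
    // Mask off the sign bit so the hash is non-negative on 32-bit builds too.
    $hash = crc32($key) & 0x7fffffff;
    return $servers[$hash % count($servers)];
}

// Example pool (addresses are illustrative only).
$servers = ['10.0.0.40:11211', '10.0.0.41:11211', '10.0.0.42:11211'];

// Every web node using the same list picks the same server for this key.
echo pick_server('user:1234:profile', $servers), "\n";
```

Because the choice depends only on the key and the server list, no coordination between web nodes is needed. The trade-off of a plain modulo scheme is that adding or removing a server remaps most keys.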

What about MySQL 4.x query caching?

MySQL query caching is less than ideal, for a number of reasons:

MySQL’s query cache destroys the entire cache for a given table whenever that table is changed. On a high-traffic site with updates happening many times per second, this makes the cache practically worthless. In fact, it’s often harmful to have it on, since there’s an overhead to maintain the cache.

On 32-bit architectures, the entire server (including the query cache) is limited to a 4 GB virtual address space. memcached lets you run as many processes as you want, so you have no limit on memory cache size.

MySQL has a query cache, not an object cache. If your objects require extra expensive construction after the data retrieval step, MySQL’s query cache can’t help you there.

If the data you need to cache is small and you do infrequent updates, MySQL’s query caching should work for you. If not, use memcached.

What about database replication?

You can spread your reads with replication, and that helps a lot, but you can’t spread writes (they have to be processed on all machines) and they’ll eventually consume all your resources. You’ll find yourself adding replicated slaves at an ever-increasing rate to make up for the diminishing returns each additional slave provides.

The next logical step is to horizontally partition your dataset onto different master/slave clusters so you can spread your writes, and then teach your application to connect to the correct cluster depending on the data it needs.

While this strategy works, and is recommended, more databases (each with a bunch of disks) statistically lead to more frequent hardware failures, which are annoying.

With memcached you can reduce your database reads to a mere fraction, leaving the databases to mainly do infrequent writes, and end up getting much more bang for your buck, since your databases won’t be blocking themselves doing ACID bookkeeping or waiting on writing threads.

Is memcached fast?

Very fast. It uses libevent to scale to any number of open connections (using epoll on Linux, if available at runtime), uses non-blocking network I/O, refcounts internal objects (so objects can be in multiple states to multiple clients), and uses its own slab allocator and hash table so virtual memory never gets externally fragmented and allocations are guaranteed O(1).

What about race conditions?

You might wonder: “What if the get_foo() function adds a stale version of the Foo object to the cache right as/after the user updates their Foo object via update_foo()?”

While the server and API only have one way to get data from the cache, there are three ways to put data in:

set — unconditionally sets a given key with a given value (update_foo() should use this)
add — adds to the cache, only if it doesn’t already exist (get_foo() should use this)
replace — sets in the cache only if the key already exists (not as useful, only for completeness)

Additionally, all three support an expiration time.
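
To see why get_foo() should use add while update_foo() uses set, it can help to replay the race by hand. The sketch below is an assumption-laden illustration: TinyCache is a hypothetical in-memory stand-in that only mimics the set/add/replace rules described above. It is not the memcached client API, it ignores expiration times, and the key name foo:1 is made up.

```php
<?php
// Hypothetical in-memory stand-in with memcached-like set/add/replace rules.
// Not the real client API; expiration is omitted to keep the sketch short.
class TinyCache
{
    private array $data = [];

    // Return the stored value, or false when the key is missing.
    public function get(string $key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    // set: store unconditionally.
    public function set(string $key, $value): bool
    {
        $this->data[$key] = $value;
        return true;
    }

    // add: store only if the key is not present yet.
    public function add(string $key, $value): bool
    {
        if (array_key_exists($key, $this->data)) {
            return false;
        }
        $this->data[$key] = $value;
        return true;
    }

    // replace: store only if the key is already present.
    public function replace(string $key, $value): bool
    {
        if (!array_key_exists($key, $this->data)) {
            return false;
        }
        $this->data[$key] = $value;
        return true;
    }
}

$cache = new TinyCache();

// A reader fetched an old Foo from the database, but before it could cache
// it, the update path stored the fresh value with set().
$cache->set('foo:1', 'fresh');            // what update_foo() would do
$stored = $cache->add('foo:1', 'stale');  // what a late get_foo() would do

var_dump($stored);                 // whether add() was allowed to store
echo $cache->get('foo:1'), "\n";   // the value that remains in the cache
```

In this replay the late add is refused, so the value written by the update path stays in the cache. Had the reader used set instead, the stale value would have overwritten the fresh one.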

The server can be downloaded here: http://www.danga.com/memcached/dist/memcached-1.1.12.tar.gz

The PHP module can be downloaded here: http://pecl.php.net/package/memcache


Using PEAR cache

The PEAR Cache is a set of caching classes that allows you to cache multiple types of data, including HTML and images.

The most common use of the PEAR Cache is to cache HTML text. To do this, we use the Output buffering class which caches all text printed or echoed between the start() and end() functions:

<?php

require_once("Cache/Output.php");

// Use the "file" storage driver and keep cached files in cache/
$cache = new Cache_Output("file", array("cache_dir" => "cache/"));

if ($contents = $cache->start(md5("this is a unique key!"))) {

    # aha, cached data returned
    print $contents;
    print "<p>Cache Hit</p>";

} else {

    # no cached data, or cache expired
    print "<p>Don't leave home without it…</p>"; # place in cache
    print "<p>Stand and deliver</p>"; # place in cache
    print $cache->end(10); # stop buffering, cache the output for 10 seconds

}

?>

The Cache constructor takes the storage driver to use as the first parameter. File, database and shared memory storage drivers are available; see the pear/Cache/Container directory. Benchmarks by Ulf Wendel suggest that the “file” storage driver offers the best performance. The second parameter is the storage driver options. The options are “cache_dir”, the location of the caching directory, and “filename_prefix”, which is the prefix to use for all cached files. Strangely enough, cache expiry times are not set in the options parameter.

To cache some data, you generate a unique id for the cached data using a key. In the above example, we used md5("this is a unique key!").

The start() function uses the key to find a cached copy of the contents. If the contents are not cached, an empty string is returned by start(), and all future echo() and print() statements will be buffered in the output cache, until end() is called.

The end() function returns the contents of the buffer, and ends output buffering. The end() function takes as its first parameter the expiry time of the cache. This parameter can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to default to 24 hours.

Another way to use the PEAR cache is to store variables or other data. To do so, you can use the base Cache class:

<?php

require_once("Cache.php");

$cache = new Cache("file", array("cache_dir" => "cache/"));
$id = $cache->generateID("this is a unique key");

if ($data = $cache->get($id)) {

    print "Cache hit.<br>Data: $data";

} else {

    $data = "The quality of mercy is not strained…";
    $cache->save($id, $data, $expires = 60);
    print "Cache miss.<br>";

}

?>

To save the data we use save(). If your unique key is already a legal file name, you can bypass the generateID() step. Objects and arrays can be saved because save() will serialize the data for you. The last parameter controls when the data expires; this can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to use the default of 24 hours. To retrieve the cached data we use get().

You can delete a cached data item using $cache->delete($id) and remove all cached items using $cache->flush().

New: A faster Caching class is Cache-Lite. Highly recommended.


How to install php 4.4.1 on iis 6.0 - updated

Earlier this month, I wrote a howto on how to install php 4.4.1 on iis 6.0. I have a small change in those instructions.

As I have recently discovered (I’m not sure why I never saw this before), if you set the doc_root=your web directory, IIS will not be able to see a php file in any of your subdirectories.

This value doesn’t even need to be set at all.

So, rather than setting doc_root to your web root directory, don’t bother setting it at all unless you know what you are doing.


The php zend engine

The Zend Engine is the internal compiler and runtime engine used by PHP4. It was developed by Zeev Suraski and Andi Gutmans, and the name Zend is a combination of their first names. In the early days of PHP4, it worked as follows:

The PHP script was loaded by the Zend Engine and compiled into Zend opcode. Opcodes, short for operation codes, are low level binary instructions. Then the opcode was executed and the HTML generated sent to the client. The opcode was flushed from memory after execution.

Today, there are a multitude of products and techniques to help you speed up this process. In the following diagram, we show how modern PHP scripts work; all the shaded boxes are optional.

PHP Scripts are loaded into memory and compiled into Zend opcodes.


How to play a movie on your website

The following HTML code will allow you to play Flash, QuickTime, RealMedia, or Microsoft media files from your webpage.

Flash:


Quicktime:


REAL:




Launch in external player

Microsoft media:


Launch in external player