How to remove non-UTF8 characters from strings with PHP

Short post, this one. I was having trouble generating an XML feed from a database of over 10,000 listings and needed to remove non-UTF-8 characters from the feed. Well, PHP to the rescue. There are many ways of doing this, but below are some regexes that I have tried and tested and that work pretty well.

// Blunt option: strip every byte outside the ASCII range \x20-\x7F (this also removes valid multi-byte UTF-8 characters, tabs and newlines)
$string = preg_replace('/[^\x20-\x7F]+/', '', $string);
// Reject overlong 2-byte sequences, as well as characters above U+10000, and replace them with ?
$string = preg_replace('/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]|[\x00-\x7F][\x80-\xBF]+|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S', '?', $string);
// Reject overlong 3-byte sequences and UTF-16 surrogates, and replace them with ?
$string = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]|\xED[\xA0-\xBF][\x80-\xBF]/S', '?', $string);
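To see what the two byte-level patterns do in practice, here is a minimal sketch that wraps them in one function and checks the result. The function names replace_invalid_utf8() and is_valid_utf8() are made up for this example, and the validity check relies on the PCRE /u modifier rejecting subjects that are not valid UTF-8. Treat it as an illustration rather than a guaranteed sanitizer: a few malformed inputs (for example a stray continuation byte at the very start of the string) can slip past the patterns, which is why the sketch verifies the output separately.

```php
<?php
// Hypothetical helper: applies the two patterns in order and
// replaces every match with "?".
function replace_invalid_utf8(string $string): string
{
    $patterns = [
        // Some control characters, overlong 2-byte sequences, lead bytes
        // 0xF0-0xFF (so every 4-byte character) and malformed sequences.
        '/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'
            . '|[\x00-\x7F][\x80-\xBF]+'
            . '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'
            . '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'
            . '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S',
        // Overlong 3-byte sequences and UTF-16 surrogates.
        '/\xE0[\x80-\x9F][\x80-\xBF]|\xED[\xA0-\xBF][\x80-\xBF]/S',
    ];

    // preg_replace() returns null on error; fall back to the input then.
    $clean = preg_replace($patterns, '?', $string);

    return $clean ?? $string;
}

// Hypothetical helper: with /u, PCRE refuses subjects that are not valid
// UTF-8, so an empty pattern doubles as a validity check.
function is_valid_utf8(string $string): bool
{
    return preg_match('//u', $string) === 1;
}

echo replace_invalid_utf8("caf\xC3\xA9"), "\n";        // valid 2-byte character is kept
echo replace_invalid_utf8("a\xC0\xAFb"), "\n";         // overlong 2-byte sequence becomes ?
echo replace_invalid_utf8("x\xED\xA0\x80y"), "\n";     // UTF-16 surrogate becomes ?
echo replace_invalid_utf8("\xF0\x9F\x98\x80"), "\n";   // 4-byte character (an emoji) becomes ?
```

Note that the last call shows a real limitation: valid four-byte characters are replaced as well, so this approach is only suitable when the feed does not need anything outside the Basic Multilingual Plane.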

Should these not work for you, comment!


Block or Allow access to PHP script based on remote IP and CIDR list

Spam: we all hate it. It is nice when things like reCAPTCHA and Akismet work, but spammers have a way of bypassing those filters and then filling your blog or website with useless comments and fake registrations. So, how about a way to just block all IP addresses coming from a specific region from viewing your registration page or comment box? Well, find below some steps to follow to do exactly that.



PHP Script to create Add-on Domain in cPanel via XML-API

I know, I have been away for a while, and the point of this blog being the place where I document challenges I face seems somewhat lost, but alas, recovery of that point is nigh.

Now, this may not be a problem that many people face, but getting good sample code from the cPanel XML-API documentation is a bust. They do have a nice forum, though, where it is possible to get the answers one may seek, but forums involve a lot of reading, which I try to avoid when looking for what should be quick solutions.

I have therefore decided to document something that I figured would only take me a couple of minutes to figure out, only for it to kill four precious hours. So here goes…

