Generic Syntax Highlighting with Regular Expressions in pure PHP

Because Google AMP (Accelerated Mobile Pages) restricts custom JavaScript, I have been looking for a way to do syntax highlighting effectively in pure PHP, without any JavaScript.

I was about to write my own when I found an older article from phoboslab. Thanks Dominic for saving me some time ;) It's not perfect, but close enough.

It is a simple syntax highlighting class that does just that. The original class no longer works on current PHP versions, as it uses preg_replace() with the /e modifier, which was deprecated in PHP 5.5 and removed in PHP 7.0.
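For context, here is a minimal sketch of what moving from the /e modifier to a callback looks like. The function name wrapNumber and the sample input are made up for illustration and are not part of the original class.

```php
<?php
// Old style, shown only as a comment because it no longer runs on PHP 7+:
// the replacement string was evaluated as PHP code for every match.
// $out = preg_replace('/\d+/e', '"<span class=\"N\">$0</span>"', $in);

// Callback style: each match is passed to a function as an array.
// wrapNumber() is a hypothetical helper used only for this example.
function wrapNumber(array $match): string
{
    // $match[0] holds the full text matched by the pattern
    return '<span class="N">' . $match[0] . '</span>';
}

$in  = 'width 100 height 20';
$out = preg_replace_callback('/\d+/', 'wrapNumber', $in);

echo $out, "\n";
// width <span class="N">100</span> height <span class="N">20</span>
```

The callback receives the same groups that $0, $1 and so on referred to in the old replacement string, so most /e replacements can be ported line by line.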

It will not cover every language construct, but it's better than nothing :) I will also add a section to my AMP tweaks article to showcase the integration of GeSHi.

Here is an updated version using the preg_replace_callback() function.

THE SYNTAX HIGHLIGHT CLASS

    class SyntaxHighlight {

        // Filled by the regexp callback: placeholder id => highlighted markup
        public static $tokens = array();

        public static function process($s) {
            // Start with an empty token list on every call
            self::$tokens = array();

            // ENT_COMPAT keeps single quotes unescaped on every PHP version
            // (since PHP 8.1 the default flags would also convert them),
            // which the single-quoted string pattern below relies on
            $s = htmlspecialchars($s, ENT_COMPAT);

            // Workaround for escaped backslashes
            $s = str_replace('\\\\', '\\\\<e>', $s);

            $regexp = array(

                // Punctuation
                '/([\-\!\%\^\*\(\)\+\|\~\=\`\{\}\[\]\:\"\'<>\?\,\.\/]+)/'
                => '<span class="P">$1</span>',

                // Numbers (also look for Hex)
                '/(?<!\w)(
                    (0x|\#)[\da-f]+|
                    \d+|
                    \d+(px|em|cm|mm|rem|s|\%)
                )(?!\w)/ix'
                => '<span class="N">$1</span>',

                // Make the bold assumption that an
                // all uppercase word has a special meaning
                '/(?<!\w|>|\#)(
                    [A-Z_0-9]{2,}
                )(?!\w)/x'
                => '<span class="D">$1</span>',

                // Keywords
                '/(?<!\w|\$|\%|\@|>)(
                    and|or|xor|for|do|while|foreach|as|return|die|exit|if|then|else|
                    elseif|new|delete|try|throw|catch|finally|class|function|string|
                    array|object|resource|var|bool|boolean|int|integer|float|double|
                    real|global|const|static|public|private|protected|
                    published|extends|switch|true|false|null|void|this|self|struct|
                    char|signed|unsigned|short|long
                )(?!\w|=")/ix'
                => '<span class="K">$1</span>',

                // PHP/Perl-Style Vars: $var, %var, @var
                '/(?<!\w)(
                    (\$|\%|\@)(\->|\w)+
                )(?!\w)/ix'
                => '<span class="V">$1</span>'

            );

            // Swap comments and strings for placeholders first.
            // The input is already escaped, so HTML comments start with "&lt;!--".
            $s = preg_replace_callback('/(
                    \/\*.*?\*\/|
                    \/\/.*?\n|
                    \#.[^a-fA-F0-9]+?\n|
                    &lt;\!\-\-[\s\S]+?\-\-&gt;|
                    (?<!\\\\)&quot;.*?(?<!\\\\)&quot;|
                    (?<!\\\\)\'(.*?)(?<!\\\\)\'
                )/isx', array('SyntaxHighlight', 'replaceId'), $s);

            $s = preg_replace(array_keys($regexp), array_values($regexp), $s);

            // Paste the comments and strings back in again
            $s = str_replace(array_keys(self::$tokens), array_values(self::$tokens), $s);

            // Delete the "Escaped Backslash Workaround Token" (TM)
            // and replace tabs with four spaces.
            $s = str_replace(array('<e>', "\t"), array('', '    '), $s);

            return '<pre>' . $s . '</pre>';
        }

        // Regexp callback: replaces every comment or string with a unique id and
        // saves the matched text in an array.
        // This way, strings and comments are stripped out and won't be processed
        // by the other expressions searching for keywords etc.
        public static function replaceId($match) {
            $id = '##r' . uniqid() . '##';
            $text = $match[1];

            // String or comment?
            $isComment = strpos($text, '//') === 0
                || strpos($text, '/*') === 0
                || strpos($text, '#') === 0
                || strpos($text, '&lt;!--') === 0;

            self::$tokens[$id] = '<span class="' . ($isComment ? 'C' : 'S') . '">' . $text . '</span>';

            return $id;
        }
    }
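The core trick in the class is the placeholder step: strings are lifted out before the keyword pass and pasted back afterwards. Here is a stripped-down sketch of just that idea. The function highlightSketch is hypothetical, it only handles double-quoted strings and three keywords, and it uses a simple counter for the ids where the class above uses uniqid().

```php
<?php
// Hypothetical helper for illustration only, not part of the class above.
function highlightSketch(string $code): string
{
    $stash = array();

    // Escape first, so double quotes show up as &quot; from here on
    $code = htmlspecialchars($code, ENT_COMPAT);

    // 1. Swap every double-quoted string for a placeholder and remember its markup
    $code = preg_replace_callback('/&quot;.*?&quot;/s', function (array $m) use (&$stash) {
        $id = '##r' . count($stash) . '##';
        $stash[$id] = '<span class="S">' . $m[0] . '</span>';
        return $id;
    }, $code);

    // 2. Highlight keywords; the string contents are out of reach at this point
    $code = preg_replace('/\b(return|if|else)\b/', '<span class="K">$1</span>', $code);

    // 3. Paste the stashed strings back in
    return str_replace(array_keys($stash), array_values($stash), $code);
}

echo highlightSketch('return "if in doubt";'), "\n";
// <span class="K">return</span> <span class="S">&quot;if in doubt&quot;</span>;
```

Note that the "if" inside the string is left alone, which is exactly why the full class hides strings and comments before running its other expressions.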

THE CSS

    pre {
        font-family: 'Courier New', 'Bitstream Vera Sans Mono', monospace;
        font-size: 9pt;
        border-top: 1px solid #333;
        border-bottom: 1px solid #333;
        padding: 0.4em;
        color: #fff;
    }
    pre span.N{ color:#f2c47f; } /* Numbers */
    pre span.S{ color:#42ff00; } /* Strings */
    pre span.C{ color:#838383; } /* Comments */
    pre span.K{ color:#ff0078; } /* Keywords */
    pre span.V{ color:#70d6ff; } /* Vars */
    pre span.D{ color:#ff9a5d; } /* Defines */

USAGE

    echo SyntaxHighlight::process( $your_code );

@GitHub portalzine/UtilityBelt/SyntaxHighlight

Enjoy coding …

Alex
