Blank Line at the Top of Web Pages: Cause of the &#65279; Character and a Complete Solution

Posted by roze on Mon, 07 Oct 2019 04:15:02 +0200

From Personal Blog: http://hurbai.com

Sometimes a blank line appears at the top of a web page. Inspecting the source shows an unexpected character, &#65279; (U+FEFF), at the beginning of the body. The cause is that the page was saved as UTF-8 with a BOM (byte order mark).

UTF-8 with BOM typically comes from Windows software such as the built-in Notepad. When saving a file as UTF-8, these programs insert three invisible bytes at the beginning of the file (0xEF 0xBB 0xBF, the BOM). Editors use this hidden signature to recognize that the file is encoded in UTF-8. For ordinary documents this causes no trouble, but for PHP the BOM is a real problem. PHP does not ignore the BOM, so when a file is read, included, or otherwise referenced, the BOM is treated as part of the content at the start of the file. Because PHP is an embedded language, anything outside the PHP tags is sent to the output as is, which produces the &#65279; character we see.
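
To make the byte sequence concrete, here is a minimal sketch that checks whether a string begins with the three BOM bytes. The function name hasUtf8Bom is hypothetical and chosen only for illustration; it is not a PHP built-in.

```php
<?php
// Hypothetical helper: true when $bytes starts with the UTF-8 BOM (EF BB BF).
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

$withBom    = "\xEF\xBB\xBF<p>page</p>";
$withoutBom = "<p>page</p>";

var_dump(hasUtf8Bom($withBom));             // bool(true)
var_dump(hasUtf8Bom($withoutBom));          // bool(false)
echo bin2hex(substr($withBom, 0, 3)), "\n"; // efbbbf
```

Comparing raw bytes avoids any dependency on the multibyte string extension, since the BOM is always the same three bytes in UTF-8.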

Solution

Find the files behind the pages that show the &#65279; character (PHP, HTML, CSS, JS, and so on) and check their encoding. If a file is saved as UTF-8 with BOM, use Notepad++ or a similar tool to save it again as UTF-8 without BOM.
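
If only one or two files are affected and no suitable editor is at hand, the same fix can be sketched in a few lines of PHP. The helper name stripBomFromFile is an assumption made for this example, and the temporary file only demonstrates the call.

```php
<?php
// Hypothetical helper: rewrites $path without a leading UTF-8 BOM.
// Returns true if a BOM was found and removed, false otherwise.
function stripBomFromFile(string $path): bool
{
    $content = file_get_contents($path);
    if ($content === false || strncmp($content, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // unreadable, or no BOM present
    }
    return file_put_contents($path, substr($content, 3)) !== false;
}

// Usage on a throwaway file in the system temp directory
$tmp = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($tmp, "\xEF\xBB\xBFhello");
var_dump(stripBomFromFile($tmp));  // bool(true)
var_dump(file_get_contents($tmp)); // string(5) "hello"
unlink($tmp);
```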

If there are many files and you don't know where to start, the following script can do the work for you.

Save the following code as a.php (any name will do) in the root directory, then run it. It automatically strips the BOM from every file it finds.

**Supplement:** If you run the cleanup on a server, back up the files first for safety, and make sure the files are writable; otherwise the BOM cannot be removed.

<?php
// Root directory to clean (all subdirectories and files are scanned automatically)
$HOME = dirname(__FILE__);
// If it's a Windows system, change it to: $WIN = 1;
$WIN = 0;
?>
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>UTF8 BOM Scavenger</title>
    <style>
        body { font-size: 10px; font-family: Arial, Helvetica, sans-serif; background: #FFF; color: #000; }
        .FOUND { color: #F30; font-size: 14px; font-weight: bold; }
    </style>
</head>
<body>
<?php
$BOMBED = array();
RecursiveFolder($HOME);
echo '<h2>These files contained a UTF-8 BOM and have been cleaned:</h2><p class="FOUND">';
foreach ($BOMBED as $utf) { echo $utf ."<br />\n"; }
echo '</p>';
// Recursive scanning
function RecursiveFolder($sHOME) {
    global $BOMBED, $WIN;
    $win32 = ($WIN == 1) ? "\\" : "/";
    $folder = dir($sHOME);
    $foundfolders = array();
    // Compare against false explicitly so an entry named "0" does not end the loop early
    while (($file = $folder->read()) !== false) {
        if($file != "." and $file != "..") {
            if(filetype($sHOME . $win32 . $file) == "dir"){
                $foundfolders[count($foundfolders)] = $sHOME . $win32 . $file;
            } else {
                $content = file_get_contents($sHOME . $win32 . $file);
                $BOM = SearchBOM($content);
                if ($BOM) {
                    $BOMBED[count($BOMBED)] = $sHOME . $win32 . $file;
                    // Remove BOM information
                    $content = substr($content,3);
                    // Write back to the original file
                    file_put_contents($sHOME . $win32 . $file, $content);
                }
            }
        }
    }
    $folder->close();
    // Descend into each subdirectory found above
    foreach ($foundfolders as $subfolder) {
        RecursiveFolder($subfolder);
    }
}
// Search for BOM in the current file
function SearchBOM($string) {
    if(substr($string,0,3) == pack("CCC",0xef,0xbb,0xbf)) return true;
    return false;
}
?>
</body>
</html>
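
The supplement above recommends backing up files and checking write permission before cleaning. The script itself does neither, so the following is a minimal sketch of how one file could be handled with both safeguards. The function name cleanWithBackup, the ".bak" suffix, and the three status strings are assumptions made for this example.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM from $path and keeps a
// copy of the original at "$path.bak" (the suffix is an assumption).
// Returns "cleaned", "no-bom", or "skipped" (unreadable, not writable,
// or the backup or write failed).
function cleanWithBackup(string $path): string
{
    if (!is_readable($path) || !is_writable($path)) {
        return 'skipped';
    }
    $content = file_get_contents($path);
    if ($content === false) {
        return 'skipped';
    }
    if (strncmp($content, "\xEF\xBB\xBF", 3) !== 0) {
        return 'no-bom';
    }
    if (!copy($path, $path . '.bak')) {
        return 'skipped'; // never overwrite without a backup
    }
    return file_put_contents($path, substr($content, 3)) !== false
        ? 'cleaned'
        : 'skipped';
}

// Usage on a throwaway file in the system temp directory
$tmp = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($tmp, "\xEF\xBB\xBF<p>page</p>");
echo cleanWithBackup($tmp), "\n"; // cleaned
echo cleanWithBackup($tmp), "\n"; // no-bom
unlink($tmp);
unlink($tmp . '.bak');
```

Returning a status instead of a boolean lets a caller report skipped files separately, so a file that could not be written is not listed as cleaned.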


Topics: PHP encoding Windows