Shuffling text in HTML (Solved)

Jointjeff - 09/10/2014 12:37

HTML interesse

Hello everyone, FangorN,

I have a piece of HTML in which the text inside the HTML tags has to be shuffled. To do that I use preg_match_all to put all HTML elements into an array:

preg_match_all( '/<([^\s>]+)(.*?)>((.*?)<\/\1>)?|(?<=^|>)(.+?)(?=$|<)/i', $content , $elements );

I then loop over the elements with foreach and use the shuffle() function to shuffle the words. However, some elements must not be shuffled, for example img tags, ul, ol and blockquote (though their contents should be).

In short: HTML in which all the words have to be shuffled while the HTML tags stay intact. What would be the best way to approach this?

9 replies

Thomas - 09/10/2014 13:25

Moderator

Um, what are you ultimately trying to achieve with this?

So you want to use the regexp above to jumble the words of $elements[4]? And to do that per tag? Suppose you have this HTML string:

<p>A B C</p>
<p>D E F</p>

Then it may become:

<p>B C A</p>
<p>E D F</p>

But A, B or C may not move to the second paragraph, and D, E or F may not move to the first? So the shuffling happens per tag?

I think I would set up a kind of whitelist of tags to shuffle. If all of those tags are elements you never nest, it is simple. Otherwise, not so much. You will probably be forced to remember the positions of the tags and how they relate to each other, at which point you are probably already thinking about some kind of parser... but the solution depends heavily on what you ultimately want to achieve.
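
A minimal sketch of the whitelist idea for the simple case, where the whitelisted tags never contain other tags. The function name shuffleWhitelistedTags() and its parameters are hypothetical and do not come from the thread; elements that do contain nested tags are simply left untouched here.

```php
<?php
// Hypothetical helper: shuffle the words inside whitelisted tags,
// but only when the element contains plain text and no nested tags.
function shuffleWhitelistedTags($html, array $whitelist)
{
    // Build an alternation such as "p|li" from the whitelist.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '#');
    }, $whitelist));

    // <tag attributes>text</tag>, where the text holds no further "<".
    $pattern = '#<(' . $names . ')(\s[^>]*)?>([^<]*)</\1>#i';

    return preg_replace_callback($pattern, function ($m) {
        // Split on any whitespace and drop empty pieces.
        $words = preg_split('/\s+/', $m[3], -1, PREG_SPLIT_NO_EMPTY);
        shuffle($words);
        return '<' . $m[1] . $m[2] . '>' . implode(' ', $words) . '</' . $m[1] . '>';
    }, $html);
}

// The img tag is not whitelisted, so its attributes stay as they are.
echo shuffleWhitelistedTags(
    '<p>A B C</p> <p>D E F</p> <img src="x.png" alt="A B" />',
    array('p', 'li')
);
```

Because the words never leave the element that was matched, A, B and C cannot end up in the second paragraph. As soon as whitelisted elements are nested, this pattern no longer suffices.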

Jointjeff - 09/10/2014 13:36 (last edited 09/10/2014 14:00)

HTML interesse

I am trying to recreate a technique that can also be seen on MyJour. They show the text shuffled until you are logged in. https://myjour....m-van-isis

Your example is correct. But the following:

<li>A B C</li>
<li>D E F</li>

should likewise become:

<li>B A C</li>
<li>F E D</li>

while img tags, for example, have to be skipped, or better: the HTML has to remain untouched.

The code I have now: Plaatscode: 142381

But I think it can be done better.

Thomas - 09/10/2014 14:08

Moderator

Lol, what the hell. What is the idea behind that? Is it meant to let search engines still find relevant results for search terms in (semi-protected) content?

By the way, when I refresh that page the content is in the same shuffled order. So it is probably cached, which seems sensible to me: you do not want to recompute this every time, but store it in a separate table column next to the original or something like that.

For lack of a better idea I would think of building a kind of parser. It reads your content, marks the position of every HTML tag and divides the content into passages (possibly with recursion if you have nested tags). For each passage you then have to keep track of how many words it contains, after which you throw all the words into one bin. When that is done you use the structure you built earlier to assemble a new text with the same structure and jumbled content...

That is quite a lot of manual work if you ask me. And dissecting HTML is a hellish job (one I do not have much experience with, by the way; I usually gave up halfway, lol).

Have you considered alternatives yet?
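
The "one bin" idea described above can be sketched without a full parser, assuming the input is well-formed enough that every tag matches <[^>]+>. The function name shuffleAcrossPassages() is hypothetical. Note that this variant moves words between elements, which is the whole-text reading of the problem rather than the per-tag one.

```php
<?php
// Hypothetical helper: shuffle all words across the whole text while
// keeping every tag, and the number of words per passage, in place.
function shuffleAcrossPassages($html)
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even indexes hold text (possibly empty), odd indexes hold tags.
    $tokens = preg_split('#(<[^>]+>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $pool = array();    // every word of every passage
    $counts = array();  // token index => number of words in that passage
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 1) {
            continue; // a tag, leave it alone
        }
        $words = preg_split('/\s+/', $token, -1, PREG_SPLIT_NO_EMPTY);
        $counts[$i] = count($words);
        $pool = array_merge($pool, $words);
    }

    shuffle($pool);

    // Hand the shuffled words back, passage by passage.
    // Passages without words (whitespace between tags) stay untouched;
    // whitespace around the words of a passage is normalised in this sketch.
    foreach ($counts as $i => $count) {
        if ($count > 0) {
            $tokens[$i] = implode(' ', array_splice($pool, 0, $count));
        }
    }
    return implode('', $tokens);
}

echo shuffleAcrossPassages('<p>A B C</p><ul><li>D E</li></ul>');
```

The per-passage word counts play the role of the "structure built earlier": the output has exactly as many words in each passage as the input had, only drawn from the shared pool.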

Jointjeff - 09/10/2014 15:38

HTML interesse

Yes, that is indeed the idea behind it. I now store the shuffled text in the database as well, so you avoid having to run that whole function every time.

How would you approach that parser, technically? I am already fairly proud that I got it working this far, but I have the feeling it can be done better. To be honest I would not know of an alternative off the top of my head. Do you?

Thomas - 09/10/2014 16:38 (last edited 09/10/2014 16:40)

Moderator

If you want to keep it simple you could choose to show only an introductory paragraph (in the correct order). An added benefit is that, if a search engine picks it up, the passage consists of one or more readable sentences. I do not know whether search engines always process the full text anyway; perhaps our SEO guru can say more about that? I also do not know how relevant meta tags (description, keywords) still are.

The other solutions that came to mind are easy for handy users to circumvent: shuffle the text with jQuery (switch off JavaScript), or check the user agent and then decide how to show the content (pretend to be a search engine). With the latter you are quickly on thin ice, because it leans towards "cloaking" (I believe that is what it is called; in any case, you offer a search engine different content than a regular visitor, and in the worst case you can get blacklisted).

@parser: I do not have a good technical solution for this right away. The idea is that you split your text into HTML (your tags) and content. The structure of the HTML has to be preserved and the content has to be jumbled according to certain rules. So you have to divide your complete HTML text into blocks ("tokens") with a specific meaning.

EDIT: perhaps an XML library (SimpleXML, XPath, ... or whatever it is called?) can help here?

What you could also do to make things simpler is throw out the nesting, in the sense that you strip nested tags, but that ruins your text in terms of search engine optimisation. Consider the following:

<p>This is a running sentence with an <em>important passage</em>.</p>

If that <em> is thrown out, your text has probably already lost value. And if you do not do that, there may well be complete rubbish inside those <em>...</em> tags. You are actually doing that anyway: you are tearing the original (con)text out of its context. Looked at that way, this whole jumbling of words is not really such a solid plan. I think I would go for a simple(r) solution.
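
As a rough illustration of the XML-library suggestion, the sketch below uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than SimpleXML. The function name shuffleTextNodes(), the wrapper id "shuffle-root" and the $ignoreTags parameter are assumptions made for this example. It shuffles the words of each text node separately, so nested markup such as <em> keeps its position and the parser takes care of the nesting.

```php
<?php
// Hypothetical helper: shuffle the words of every text node, except
// text that sits inside one of the elements named in $ignoreTags.
function shuffleTextNodes($html, array $ignoreTags = array())
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The leading XML declaration is a common hint that the input is UTF-8;
    // the wrapper div lets us find our fragment again after parsing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="shuffle-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//div[@id="shuffle-root"]//text()') as $node) {
        // Walk up the ancestors; skip this text node if one of them is ignored.
        for ($p = $node->parentNode; $p !== null; $p = $p->parentNode) {
            if (in_array($p->nodeName, $ignoreTags)) {
                continue 2;
            }
        }
        $words = preg_split('/\s+/', $node->nodeValue, -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) > 1) {
            shuffle($words);
            // Surrounding whitespace of the text node is dropped in this sketch.
            $node->nodeValue = implode(' ', $words);
        }
    }

    // Serialise only the children of the wrapper div.
    $root = $xpath->query('//div[@id="shuffle-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo shuffleTextNodes('<p>A B C</p><pre>D E F</pre>', array('pre'));
```

The serialiser may normalise the markup slightly (for example line breaks between block elements or the way void elements are written), so the output is equivalent HTML rather than a byte-for-byte copy of the input with the words moved.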

Jointjeff - 09/10/2014 16:46

HTML interesse

I have already run a test with a shuffled article and it was indexed quite well (better than I had expected). Showing different content to the user agent of, for example, Google is indeed cloaking, hence this solution. Perhaps not neat either, but not against the rules.

I am now working with the 'PHP Simple HTML DOM Parser Manual'. It is going quite well so far. Thanks for the info. I will post my progress here as well.

Thomas - 09/10/2014 17:56

Moderator

This is what I quickly put together over the course of this thread. It is not really usable yet, but perhaps it offers some inspiration:

<?php
$content = '<h1>Hello world</h1>
<p>This is a paragraph text.</p>
<ul class="test" style="">
<li>This</li>
<li>is</li>
<li>an</li>
<li>unordered<ul>
    <li>nested</li>
</ul></li>
<li>list.</li>
</ul>
<p>This is yet another paragraph. With an image <img src="" alt="" /></p>';

class HtmlTokens
{
    protected $input;
    protected $structure;

    public function __construct($input) {
        $this->input = $input;
        $this->structure = array();
    }

    public function parse() {
        $stack = array(); // structure index => tag type
        $matches = array();
        $structureIndex = 0;

        preg_match_all('#<([^>]+)>#si', $this->input, $matches, PREG_OFFSET_CAPTURE);
        // use $matches[0] for positions and $matches[1] to read tag types
        foreach ($matches[0] as $matchKey => $matchData) {
            $tagContent         = explode(' ', trim($matches[1][$matchKey][0]));
            $tagPositionStart   = $matchData[1];
            $tagPositionEnd     = $matchData[1] + strlen($matchData[0]) - 1;
            $tag                = $tagContent[0];
            $matchingIndex      = false; // structure index of matching opening tag
            $parent             = false;

            if ($tag[0] == '/') {
                // closing tag
                $tag = substr($tag, 1);
                $type = 'closing';
                // there should be something on the stack, otherwise: malformed HTML?
                if (count($stack) == 0) {
                    die('malformed HTML?');
                }
                $stackData = array_pop($stack);
                // the tags should match as well
                if ($stackData['tag'] == $tag) {
                    $matchingIndex = $stackData['index'];
                    // also update the opening tag
                    $this->structure[$stackData['index']]['match'] = $structureIndex;
                } else {
                    die('malformed HTML?');
                }
            } else {
                if (count($stack) > 0) {
                    $parent = $stack[count($stack)-1]['index'];
                }
                // opening tag... or self closing tag
                if (array_pop($tagContent) == '/') {
                    $type = 'self closing';
                } else {
                    $type = 'opening';
                    $stack[] = array(
                        'index' => $structureIndex,
                        'tag'   => $tag,
                    );
                }
            }
            $this->structure[$structureIndex] = array(
                'tag'       => $tag,
                'type'      => $type,
                'start'     => $tagPositionStart,
                'end'       => $tagPositionEnd,
                'match'     => $matchingIndex,
                'parent'    => ($type == 'opening' || $type == 'self closing' ? $parent : false),
            );
            $structureIndex++;
        } // foreach

        // if the stack still contains unclosed tags...
        if (count($stack)) {
            die('malformed HTML?');
        }

        var_dump($this->structure);
        return $this;
    } // parse
} // class

$test = new HtmlTokens($content);
$test->parse();
?>

Jointjeff - 10/10/2014 20:06 (last edited 10/10/2014 20:11)

HTML interesse

With the PHP Simple HTML DOM Parser I have written a script that broadly does what I want, see also: http://plaatscode.be/142383/

The only remaining problem is tables. They are a bit trickier because of the nested elements. I will also try out your method.

Thomas - 11/10/2014 12:58

Moderator

Oh, I thought you wanted to jumble the words of the whole text across the whole text, with the exception of a few elements, but what happens in the code above, if I understand it correctly, is that the content of specific elements is shuffled within the element. I suspected as much earlier:

FangorN wrote:

But A, B or C may not move to the second paragraph, and D, E or F may not move to the first? So the shuffling happens per tag?

What is wrong with implode and explode, by the way (you use a preg_replace and a loop for this)?

Anyway, my latest version:

<?php
class HtmlTokens
{
    protected $input;
    protected $structure;

    public function __construct($input) {
        $this->input = $input;
        $this->structure = array(
            array(
                'tag'       => 'root',
                'type'      => 'root',
                'start'     => 0,
                'end'       => 0,
                'match'     => false,
                'parent'    => false,
                'children'  => array(),
            ),
        );
    }

    public function parse() {
        $stack = array(); // structure index => tag type
        $matches = array();
        $structureIndex = 1;
        // http://stackoverflow.com/questions/3558119/are-self-closing-tags-valid-in-html5
        $voidElements = array(
            'area',
            'base',
            'br',
            'col',
            'command',
            'embed',
            'hr',
            'img',
            'input',
            'keygen',
            'link',
            'meta',
            'param',
            'source',
            'track',
            'wbr',
        );

        preg_match_all('#<([^>]+)>#si', $this->input, $matches, PREG_OFFSET_CAPTURE);
        // use $matches[0] for positions and $matches[1] to read tag types
        foreach ($matches[0] as $matchKey => $matchData) {
            $tagContent         = explode(' ', trim($matches[1][$matchKey][0]));
            $tagPositionStart   = $matchData[1];
            $tagPositionEnd     = $matchData[1] + strlen($matchData[0]) - 1;
            $tag                = $tagContent[0];
            $matchingIndex      = false; // structure index of matching opening tag
            $parent             = false;

            if ($tag[0] == '/') {
                // closing tag
                $tag = substr($tag, 1);
                $type = 'closing';
                // there should be something on the stack, otherwise: malformed HTML?
                if (count($stack) == 0) {
                    die('malformed HTML?');
                }
                $stackData = array_pop($stack);
                // the tags should match as well
                if ($stackData['tag'] == $tag) {
                    $matchingIndex = $stackData['index'];
                    // also update the opening tag
                    $this->structure[$stackData['index']]['match'] = $structureIndex;
                } else {
                    die('malformed HTML?');
                }
            } else {
                if (count($stack) > 0) {
                    $parent = $stack[count($stack)-1]['index'];
                } else {
                    $parent = 0;
                }

                // opening tag... or self closing tag - the slash is optional apparently
                if (array_pop($tagContent) == '/') {
                    if (!in_array($tag, $voidElements)) {
                        die('a non-void element may not be self-closing');
                    }
                    $type = 'self closing';
                } elseif (in_array($tag, $voidElements)) {
                    $type = 'self closing';
                } else {
                    $type = 'opening';
                    $stack[] = array(
                        'index' => $structureIndex,
                        'tag'   => $tag,
                    );
                }
            }
            $this->structure[$structureIndex] = array(
                'tag'       => $tag,
                'type'      => $type,
                'start'     => $tagPositionStart,
                'end'       => $tagPositionEnd,
                'match'     => $matchingIndex,
                'parent'    => ($type == 'opening' || $type == 'self closing' ? $parent : false),
                'children'  => array(),
            );
            // store item as child of parent
            if ($type == 'opening' || $type == 'self closing') {
                $this->structure[$parent]['children'][] = $structureIndex;
            }
            $structureIndex++;
        } // foreach

        // if the stack still contains unclosed tags...
        if (count($stack)) {
            die('malformed HTML?');
        }

        // var_dump($this->structure);
        return $this;
    } // parse

    // $ignoreTags contains tags in which the order of words should not be altered
    public function getNonTagIntervals($ignoreTags=array()) {
        $positions = array();
        $inputLength = strlen($this->input); // intervals are half-open: [start, end)

        $index = 0;
        $positions[$index][0] = 0;
        $positions[$index][2] = false; // initially, we do not ignore text
        $ignoreStack = array();

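        foreach ($this->structure as $data) {
            // the synthetic root entry has no position in the input, so skip it
            if ($data['type'] == 'root') {
                continue;
            }
            // here you could add rules that determine whether content should be shuffled inside specific tags, or not
            // are we opening a tag?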
            if ($data['type'] == 'opening') {
                // is it an element we want to ignore?
                if (in_array($data['tag'], $ignoreTags)) {
                    // add it to the stack
                    $ignoreStack[] = $data['tag'];
                }
            } elseif ($data['type'] == 'closing') {
                // if it is an element we want to ignore, it implicitly matches the opening tag (parse() guarantees this)
                // @todo double check this :)
                if (in_array($data['tag'], $ignoreTags)) {
                    array_pop($ignoreStack);
                }
            }
            $positions[$index][1] = ($data['start'] - 1 < 0 ? 0 : $data['start']) ;
            $index++;
            $positions[$index][0] = ($data['end'] + 1 > $inputLength ? $inputLength : $data['end'] + 1);
            $positions[$index][2] = count($ignoreStack) > 0; // whether to ignore
        }
        $positions[$index][1] = $inputLength;

        // traverse $positions in reverse order
        $test = $this->input;
        foreach (array_reverse($positions) as $position) {
            // only process nonempty intervals
            if ($position[1] > $position[0]) {
                $length = $position[1] - $position[0];
                $text = substr($test, $position[0], $length);
                // do not handle text if it contains nothing but spaces and linebreaks
                if (trim($text) != '') {
                    // ignore?
                    if ($position[2] === false) {
                        // debug to see what parts are selected
                        // $text = '[c]'.$text.'[/c]';
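                        // shuffle the words but keep the surrounding whitespace,
                        // so the text does not get glued to the adjacent tags
                        preg_match('/^(\s*)(.*?)(\s*)$/s', $text, $parts);
                        $words = explode(' ', $parts[2]);
                        shuffle($words);
                        $text = $parts[1].implode(' ', $words).$parts[3];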
                    }
                }
                $test = substr($test, 0, $position[0]).$text.substr($test, $position[1]);
            }
        }
        return $test;
    }
} // class

$content = '<h1>Hello world</h1>
<p>This is a paragraph text.</p>
<ul class="test" style="">
<li>This</li>
<li>is</li>
<li>an</li>
<li>unordered<ul>
    <li>nested</li>
</ul></li>
<li>list.</li>
</ul>
<p>This is yet another paragraph. With an image <img src="" alt="" /></p>';

$test = new HtmlTokens($content);
$test->parse();
// You can pass getNonTagIntervals() an array of HTML elements whose text you do not want to shuffle.
echo '<pre>'.htmlspecialchars($test->getNonTagIntervals(), ENT_QUOTES, 'UTF-8').'</pre>';
?>

Thanked by: Jointjeff
