
Module:Flatten

From Vocawiki
Module documentation

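This module compresses multi-line wikitext into a single line.

Purpose

This module was originally written for {{Hid}}. Because of a limitation in MediaWiki's wikitext parser, placing multi-line wikitext inside a list item (* #) or an indentation (: ;) produces incorrect output.

A well-known example is that {{Hide}} cannot be combined with lists or indentation (see Special:滥用过滤器/30):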

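* {{Hide}}
* Text
Text

renders as:

    • Text

    Text

Because {{Hide}} expands to multi-line wikitext, combining it with a list or indentation causes all subsequent content to be indented. This module compresses multi-line wikitext into a single line beforehand, which avoids the problem.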

* {{#invoke:Flatten|main| {{Hide}} }}
* Text
Text

renders as:

• Text

Text

This module also works for other templates that suffer from the same problem, such as {{VersionHistory}} and {{Clade}}.
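
The root cause can be illustrated with a minimal sketch in plain Lua. It assumes, as this module's own block-level pass does, that list markers are recognized only at the start of each line. The `listPrefixes` helper and the sample expansion are hypothetical and exist only for illustration.

```lua
-- Hypothetical helper: returns the run of list/indent markers (* # : ;)
-- found at the start of every line of the given text.
local function listPrefixes(text)
    local prefixes = {}
    for line in (text .. '\n'):gmatch('(.-)\n') do
        prefixes[#prefixes + 1] = line:match('^[*#:;]*')
    end
    return prefixes
end

-- Made-up stand-in for a template that expands to several lines.
local multi = listPrefixes('* <div>\nhidden content\n</div>')

-- The same content flattened to one line.
local single = listPrefixes('* <div>hidden content</div>')

-- Only the first of the three lines carries the '*' marker, so a
-- line-based parser ends the list item there.
print(#multi, multi[1], multi[2] == '', multi[3] == '')
-- After flattening there is a single line, and it carries the marker.
print(#single, single[1])
```

Once the expansion occupies a single line, no continuation lines without a marker are left to end the list item early.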

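Technical details

See also: 敌我同源

This module partially reimplements MediaWiki's built-in wikitext parser in Lua. It first parses tables, lists, and paragraphs into HTML and then compresses the result into a single line.

Note that this module has not yet been thoroughly tested, so its output may differ from what MediaWiki itself would produce.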
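
The parse-then-flatten idea can be sketched in a few lines of standalone Lua. This is a deliberately simplified illustration, not the module's implementation: `flattenBullets` is a hypothetical name, and it handles only a single level of `*` items, with no tables, paragraphs, or nesting.

```lua
-- Hypothetical sketch: turn a one-level bullet list into single-line HTML.
local function flattenBullets(text)
    local out = {}
    local inList = false
    for line in (text .. '\n'):gmatch('(.-)\n') do
        local item = line:match('^%*%s*(.*)$')
        if item then
            if inList then
                out[#out + 1] = '</li><li>'   -- next item of the open list
            else
                out[#out + 1] = '<ul><li>'    -- first item opens the list
                inList = true
            end
            out[#out + 1] = item
        else
            if inList then
                out[#out + 1] = '</li></ul>'  -- a non-list line closes the list
                inList = false
            end
            out[#out + 1] = line
        end
    end
    if inList then
        out[#out + 1] = '</li></ul>'
    end
    -- Joining without separators yields one line with no '\n' left in it.
    return table.concat(out)
end

print(flattenBullets('* a\n* b\ntext'))
```

Because the list structure is already expressed as HTML tags, dropping the line breaks no longer changes how the content is interpreted.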

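MediaWiki's native parser tags (such as <ref>) and extension tags (such as <poem>) are replaced with strip markers before they reach the module, so the module cannot see what those tags contain. As a result, it cannot compress them into a single line. The exception is <nowiki>, because this is the only strip marker that Scribunto provides a method to expand.[1]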
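
For illustration, the following standalone sketch shows why a strip marker is opaque to Lua code. The marker layout is taken from the pattern this module itself uses when checking for unsupported tags. The sample marker and the `findStripMarkerTags` helper are made up, and the sketch assumes that tag names consist of letters only.

```lua
-- Hypothetical helper: lists the tag names of all strip markers in text.
local function findStripMarkerTags(text)
    local tags = {}
    local pattern = '\'"`UNIQ%-%-(%a+)%-%w%w%w%w%w%w%w%w%-QINU`"\''
    for tag in text:gmatch(pattern) do
        tags[#tags + 1] = tag
    end
    return tags
end

-- A fabricated marker standing in for a <gallery>...</gallery> block.
-- Only the tag name and an index are visible, never the tag's content.
local sample = 'before \127\'"`UNIQ--gallery-0000000A-QINU`"\'\127 after'
local found = findStripMarkerTags(sample)

print(#found, found[1])
```

The marker records which tag was stripped, but the content itself stays with the parser, which is why the module can only detect such tags and report an error.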

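Because <poem> is used fairly often and works on a simple principle, this module implements part of its behavior, available through the [poem] tag as a substitute:

{{#tag:pre|{{#invoke:Flatten|main|1=
[poem style="color:red;"]
First line of text
Second line of text
[/poem]
}}}}

<div class="poem" style="color:red;">First line of text<br/>Second line of text</div>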
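
A simplified standalone sketch of the [poem] conversion follows. It assumes only what the example above shows: the attributes are copied onto a <div class="poem"> and line breaks become <br/>. `poemToHtml` is a hypothetical name, and merging of a user-supplied class attribute is left out.

```lua
-- Hypothetical sketch: convert [poem ...]...[/poem] into single-line HTML.
local function poemToHtml(text)
    local result = text:gsub('%[poem(.-)%](.-)%[/poem%]', function(attr, content)
        -- Drop the newlines directly after [poem ...] and before [/poem].
        content = content:gsub('^\n+', ''):gsub('\n+$', '')
        -- Trim whitespace around the attribute string.
        local attrs = attr:match('^%s*(.-)%s*$')
        -- Every remaining line break becomes an explicit <br/>.
        local body = content:gsub('\n', '<br/>')
        return '<div class="poem" ' .. attrs .. '>' .. body .. '</div>'
    end)
    return result
end

print(poemToHtml('[poem style="color:red;"]\nFirst line\nSecond line\n[/poem]'))
```

Square brackets are used instead of angle brackets so that the tag reaches the module as ordinary text rather than as a strip marker.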

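Currently, this module checks for six tags: <categorytree>, <choose>, <dynamicpagelist>, <gallery>, <poem>, and <poll>. If a strip marker for any of them is present, the module reports an error.

Notes

1. In fact, older versions of Scribunto could expand any strip marker, but newer versions removed this capability.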
local match = string.match
local find = string.find
local len = string.len
local sub = string.sub
local rep = string.rep
local gsub = string.gsub
local insert = table.insert
local remove = table.remove
local concat = table.concat
local min = math.min
local ipairs = ipairs

-- Mimics PHP's preg_match with the PREG_OFFSET_CAPTURE flag.
-- Only the offset of the whole match is returned, not the offset of each capture group.
local function match_with_offset(pattern, str, result, offset)
    offset = offset or 1
    result[1] = {match(str, '('..pattern..')', offset)}
    result[2] = find(str, pattern, offset)
    local ret = not (result[1][1] == nil)
    if #result[1] == 1 then
        result[1] = result[1][1]
    end
    return ret
end

local function explode(delim, str, limit)
    local result = {}
    local m = {0, 0}
    local offset = 1
    local count = 1
    while (limit == nil or count < limit) and match_with_offset(delim, str, m, offset) do
        insert(result, sub(str, offset, m[2]-1))
        offset = m[2] + len(delim)
        count = count + 1
    end
    insert(result, sub(str, offset))
    return result
end

local function trim(str)
    return match(str, '^ *(.-) *$')
end

local function rtrim(str)
    return match(str, '^(.-) *$')
end

local function strspn(str, charlist)
    return len(match(str, '^[' .. charlist .. ']+') or '')
end

-- Adapted from Parser.php
local function parseTables(lines)
    local out = {}
    local td_history = {};
    local last_tag_history = {};
    local tr_history = {};
    local tr_attributes = {};
    local has_opened_tr = {};
    local indent_level = 0;
    
    for _, outLine in ipairs(lines) do
        local line = trim(outLine)
        
        if line == '' then
            insert(out, outLine)
        else--CONTINUE
        local first_character = sub(line, 1, 1)
        local first_two = sub(line, 1, 2)
        local matches = {}
        
        matches[1], matches[3], matches[2] = match(line, '^(:*)%s*({|)(.*)$')
        if matches[3] ~= nil then
            indent_level = len(matches[1] or '')
            
            local attributes = matches[2] or ''
            -- unstripBoth & fixTagAttributes
            
            outLine = rep('<dl><dd>', indent_level) .. '<table ' .. attributes .. '>'
            insert(td_history, false)
            insert(last_tag_history, '')
            insert(tr_history, false)
            insert(tr_attributes, '')
            insert(has_opened_tr, false)
        elseif #td_history == 0 then
            
        elseif first_two == '|}' then
            line = '</table>' .. sub(line, 3)
            local last_tag = remove(last_tag_history)
            
            if not remove(has_opened_tr) then
                line = '<tr><td></td></tr>' .. line
            end
            
            if remove(tr_history) then
                line = '</tr>' .. line
            end
            
            if remove(td_history) then
                line = '</' .. last_tag .. '>' .. line
            end
            remove(tr_attributes)
            if indent_level > 0 then
                outLine = rtrim(line) .. rep('</dd></dl>', indent_level)
            else
                outLine = line
            end
        elseif first_two == '|-' then
            line = gsub(line, '^|%-+', '')
            
            local attributes = line
            -- unstripBoth & fixTagAttributes
            remove(tr_attributes)
            insert(tr_attributes, attributes)
            
            line = ''
            local last_tag = remove(last_tag_history)
            remove(has_opened_tr)
            insert(has_opened_tr, true)
            
            if remove(tr_history) then
                line = '</tr>'
            end
            
            if remove(td_history) then
                line = '</' .. last_tag .. '>' .. line
            end
            
            outLine = line
            insert(tr_history, false)
            insert(td_history, false)
            insert(last_tag_history, '')
        elseif first_character == '|'
            or first_character == '!'
            or first_two == '|+' then

            if first_two == '|+' then
                first_character = '+'
                line = sub(line, 3)
            else
                line = sub(line, 2)
            end
            
            if first_character == '!' then
                -- replaceMarkup
                line = gsub(line, '!!', '||')
            end

            local cells = explode('||', line)
            
            outLine = ''
            
            for _, cell in ipairs(cells) do
                local previous = ''
                if first_character ~= '+' then
                    local tr_after = remove(tr_attributes)
                    if not remove(tr_history) then
                        previous = '<tr ' .. tr_after .. '>'
                    end
                    insert(tr_history, true)
                    insert(tr_attributes, '')
                    remove(has_opened_tr)
                    insert(has_opened_tr, true)
                end
                
                local last_tag = remove(last_tag_history)
                
                if remove(td_history) then
                    previous = '</' .. last_tag .. '>' .. previous
                end
                
                if first_character == '|' then
                    last_tag = 'td'
                elseif first_character == '!' then
                    last_tag = 'th'
                elseif first_character == '+' then
                    last_tag = 'caption'
                else
                    last_tag = ''
                end
                
                insert(last_tag_history, last_tag)
                
                local cell_data_iter = explode('|', cell, 2)
                local cell_data = {}
                for _, item in ipairs(cell_data_iter) do
                    insert(cell_data, item)
                end
                
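                -- Plain-text search for '[[' or '-{' in the part before the first '|'.
                -- (The subject and pattern arguments were previously swapped, which
                -- raised a pattern error for cells such as '[[a|b]]'.)
                if find(cell_data[1], '[[', 1, true) or find(cell_data[1], '-{', 1, true) then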
                    cell = previous .. '<' .. last_tag .. '>' .. trim(cell)
                elseif #cell_data == 1 then
                    cell = previous .. '<' .. last_tag .. '>' .. trim(cell_data[1])
                else
                    local attributes = cell_data[1]
                    -- unstripBoth & fixTagAttributes
                    cell = previous .. '<' .. last_tag .. ' ' .. attributes .. '>' .. trim(cell_data[2])
                end
                
                outLine = outLine .. cell
                insert(td_history, true)
            end
        end
        insert(out, outLine)
        end--CONTINUE
    end
    
    while #td_history > 0 do
        if remove(td_history) then
            insert(out, '</td>')
        end
        if remove(tr_history) then
            insert(out, '</tr>')
        end
        if not remove(has_opened_tr) then
            insert(out, '<tr><td></td></tr>')
        end
        
        insert(out, '</table>')
    end
    
    if out[#out] == '\n' then
        remove(out)
    end
    
    --if out == '<table><tr><td></td></tr></table>' then
    --    out = ''
    --end
    
    return out
end


-- Adapted from BlockLevelPass.php
local DTopen = false
local lastParagraph = ''

--[[
local COLON_STATE = {
    ['TEXT']=0,
    ['TAG']=1,
    ['TAGSTART']=2,
    ['CLOSETAG']=3,
    ['TAGSLASH']=4,
    ['COMMENT']=5,
    ['COMMENTDASH']=6,
    ['COMMENTDASHDASH']=7,
    ['LC']=8
}
]]

local function hasOpenParagraph()
    return lastParagraph ~= ''
end

local function closeParagraph(atTheEnd)
    atTheEnd = atTheEnd or false
    local result = ''
    
    if hasOpenParagraph() then
        result = '</' .. lastParagraph .. '>'
        if not atTheEnd then
            result = result .. '\n'
        end
    end
    lastParagraph = ''
    return result
end
    
local function getCommon(st1, st2)
    local shorter = min(len(st1), len(st2))
    local count = 1
    
    while count <= shorter do
        if sub(st1, count, count) ~= sub(st2, count, count) then
            break
        end
        count = count + 1
    end
    return count - 1
end

local function openList(char)
    local result = closeParagraph()
    
    if char == '*' then
        result = result .. '<ul><li>'
    elseif char == '#' then
        result = result .. '<ol><li>'
    elseif char == ':' then
        result = result .. '<dl><dd>'
    elseif char == ';' then
        result = result .. '<dl><dt>'
        DTopen = true
    end
        
    return result
end

local function nextItem(char)
    if char == '*' or char == '#' then
        return '</li>\n<li>'
    elseif char == ':' or char == ';' then
        local close = '</dd>\n'
        if DTopen then
            close = '</dt>\n'
        end
        if char == ';' then
            DTopen = true
            return close .. '<dt>'
        else
            DTopen = false
            return close .. '<dd>'
        end
    end
    return ''
end

local function closeList(char)
    local text = ''
    if char == '*' then
        text = '</li></ul>'
    elseif char == '#' then
        text = '</li></ol>'
    elseif char == ':' then
        if DTopen then
            DTopen = false
            text = '</dt></dl>'
        else
            text = '</dd></dl>'
        end
    end
    return text
end

--[[
local function findColonNoLinks(str, before_after)
    local m = {0, 0}
    if not (match_with_offset(':', str, m) or match_with_offset('<', str, m) or match_with_offset('%-{', str, m)) then
        return false
    end
    
    if m[1] == ':' then
        local colonPos = m[2]
        before_after[1] = sub(str, 1, colonPos+1)
        before_after[2] = sub(str, colonPos+2)
        return colonPos
    end
    
    local state = COLON_STATE.TEXT
    local ltLevel = 0
    local lcLevel = 0
    local length = len(str)
    local i = m[2]
    while i < length do
        local c = sub(str, i, i)
        if state == COLON_STATE.TEXT then
            if c == '<' then
                state = COLON_STATE.TAGSTART
            elseif c == ':' then
                if ltLevel == 0 then
                    before_after[1] = sub(str, 1, i+1)
                    before_after[2] = sub(str, i+2)
                    return i
                end
            else
                if not (match_with_offset(':', str, m) or match_with_offset('<', str, m) or match_with_offset('%-{', str, m)) then
                    return false
                end
                if m[1] == '-{' then
                    state = COLON_STATE.LC
                    lcLevel = lcLevel + 1
                    i = m[2] + 1
                else
                    i = m[2] - 1
                end
            end
        elseif state == COLON_STATE.LC then
            if not (match_with_offset('%-{', str, m, i+1) or match_with_offset('}%-', str, m, i+1)) then
                break
            end
            if m[1] == '-{' then
                i = m[2] + 1
                lcLevel = lcLevel + 1
            elseif m[1] == '}-' then
                i = m[2] + 1
                lcLevel = lcLevel - 1
                if lcLevel == 0 then
                    state = COLON_STATE.TEXT
                end
            end
        elseif state == COLON_STATE.TAG then
            if c == '>' then
                ltLevel = ltLevel + 1
                state = COLON_STATE.TEXT
            elseif c == '/' then
                state = COLON_STATE.TAGSLASH
            end
        elseif state == COLON_STATE.TAGSTART then
            if c == '/' then
                state = COLON_STATE.CLOSETAG
            elseif c == '!' then
                state = COLON_STATE.COMMENT
            elseif c == '>' then
                state = COLON_STATE.TEXT
            else
                state = COLON_STATE.TAG
            end
        elseif state == COLON_STATE.CLOSETAG then
            if c == '>' then
                if ltLevel > 0 then
                    ltLevel = ltLevel - 1
                end
                state = COLON_STATE.TEXT
            end
        elseif state == COLON_STATE.TAGSLASH then
            if c == '-' then
                state = COLON_STATE.COMMENTDASH
            else
                state = COLON_STATE.COMMENT
            end
        elseif state == COLON_STATE.COMMENTDASH then
            if c == '>' then
                state = COLON_STATE.TEXT
            else
                state = COLON_STATE.COMMENT
            end
        end
        i = i + 1
    end
    return false
end
]]

-- Excludes h1
local open_blockElems = {
    '<table%A', '<h2%A', '<h3%A', '<h4%A', '<h5%A', 
    '<h6%A', '<pre%A', '<p%A', '<ul%A', '<ol%A', 
    '<dl%A'
}
local close_blockElems = {
    '</table%A', '</h2%A', '</h3%A', '</h4%A', '</h5%A', 
    '</h6%A', '</pre%A', '</p%A', '</ul%A', '</ol%A', 
    '</dl%A'
}
-- Complete list
local open_antiBlockElems = {'</td%A', '</th%A'}
local close_antiBlockElems = {'<td%A', '<th%A'}
-- Complete list
local open_others = {'</?tr%A', '</?caption%A', '</?dt%A', '</?dd%A', '</?li%A'}
-- The first row comes from BlockLevelPass.php, minus mw:, aside, and figure
-- The second row was added to handle cases seen in practice
local close_others = {
    '</?center%A', '</?blockquote%A', '</?div%A', '</?hr%A',
    '%[%[File:', '%[%[Image:'
}
local function parseBlockLevel(textLines)
    local lastPrefix = ''
    local output = {}
    DTopen = false
    local inBlockElem = false
    local prefixLength = 0
    local pendingPTag = false
    local inBlockquote = false

    local prefix2 = ''
    
    for _, inputLine in ipairs(textLines) do
        local lastPrefixLength = len(lastPrefix)
        prefixLength = strspn(inputLine, '*#:;')
        local prefix = sub(inputLine, 1, prefixLength)
        prefix2 = gsub(prefix, ';', ':')
        local t = sub(inputLine, prefixLength+1)
        
        if prefixLength ~= 0 and lastPrefix == prefix2 then
            insert(output, nextItem(sub(prefix, -1, -1)))
            pendingPTag = false
            
            --[[
            if sub(prefix, -1, -1) == ';' then
                local term_t2 = {'', ''}
                if findColonNoLinks(t, term_t2) ~= false then
                    t = term_t2[1]
                    insert(output, trim(term_t2[1]) .. nextItem(':'))
                end
            end
            ]]
        elseif prefixLength ~= 0 or lastPrefixLength ~= 0 then
            local commonPrefixLength = getCommon(prefix, lastPrefix)
            pendingPTag = false
            
            while commonPrefixLength < lastPrefixLength do
                insert(output, closeList(sub(lastPrefix, lastPrefixLength, lastPrefixLength)))
                lastPrefixLength = lastPrefixLength - 1
            end
            
            if prefixLength <= commonPrefixLength and commonPrefixLength > 0 then
                insert(output, nextItem(sub(prefix, commonPrefixLength, commonPrefixLength)))
            end
            
            if DTopen and commonPrefixLength > 0 and sub(prefix, commonPrefixLength, commonPrefixLength) == ':' then
                insert(output, nextItem(':'))
            end
            
            if lastPrefix ~= '' and prefixLength > commonPrefixLength then
                insert(output, '\n')
            end
            while prefixLength > commonPrefixLength do
                local char = sub(prefix, commonPrefixLength + 1, commonPrefixLength + 1)
                insert(output, openList(char))
                
                --[[
                if char == ';' then
                    local term_t2 = {'', ''}
                    if findColonNoLinks(t, term_t2) ~= false then
                        t = term_t2[1]
                        insert(output, trim(term_t2[1]) .. nextItem(':'))
                    end
                end
                ]]
                commonPrefixLength = commonPrefixLength + 1
            end
            if prefixLength == 0 and lastPrefix ~= '' then
                insert(output, '\n')
            end
            lastPrefix = prefix2
        end
        
        if prefixLength == 0 then
            -- blockElems and antiBlockElems are defined above
            
            local openMatch = false
            for _, elem in ipairs(open_antiBlockElems) do
                if match(t, elem) then
                    openMatch = true
                    break
                end
            end
            if not openMatch then
                for _, elem in ipairs(open_blockElems) do
                    if match(t, elem) then
                        openMatch = true
                        break
                    end
                end
            end
            if not openMatch then
                for _, elem in ipairs(open_others) do
                    if match(t, elem) then
                        openMatch = true
                        break
                    end
                end
            end
            local closeMatch = false
            for _, elem in ipairs(close_antiBlockElems) do
                if match(t, elem) then
                    closeMatch = true
                    break
                end
            end
            if not closeMatch then
                for _, elem in ipairs(close_blockElems) do
                    if match(t, elem) then
                        closeMatch = true
                        break
                    end
                end
            end
            if not closeMatch then
                for _, elem in ipairs(close_others) do
                    if match(t, elem) then
                        closeMatch = true
                        break
                    end
                end
            end
            
            if openMatch or closeMatch then
                pendingPTag = false
                insert(output, closeParagraph())
                local bqOffset = 1
                local bqMatch = {0, 0}
                while match_with_offset('<(/?)blockquote[%s>]', t, bqMatch, bqOffset) do
                    -- An empty string is truthy in Lua, so compare the '/' capture explicitly
                    inBlockquote = bqMatch[1][2] == ''
                    bqOffset = bqMatch[2] + len(bqMatch[1][1])
                end
                inBlockElem = not closeMatch
            elseif not inBlockElem then
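                -- NOTE: sub(t, 1, 2) yields two characters, so it equals ' ' only when
                -- t is a single space, which trim(t) ~= '' already rules out. This branch
                -- therefore never runs, and lines with a leading space fall through to
                -- the paragraph handling below.
                if trim(t) ~= ''
                    and sub(t, 1, 2) == ' '
                    and not inBlockquote then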
                    t = sub(t, 2)
                elseif match(t, '^<style%A[^>]*>.-</style>$')
                    or match(t, '<link%A[^>]*>%s*') then
                    if pendingPTag ~= '' and pendingPTag ~= false then
                        insert(output, closeParagraph())
                        pendingPTag = false
                    end
                else
                    if trim(t) == '' then
                        if pendingPTag ~= '' and pendingPTag ~= false then
                            insert(output, pendingPTag .. '<br />')
                            pendingPTag = false
                            lastParagraph = 'p'
                        elseif lastParagraph ~= 'p' then
                            insert(output, closeParagraph())
                            pendingPTag = '<p>'
                        else
                            pendingPTag = '</p><p>'
                        end
                    elseif pendingPTag ~= '' and pendingPTag ~= false then
                        insert(output, pendingPTag)
                        pendingPTag = false
                        lastParagraph = 'p'
                    elseif lastParagraph ~= 'p' then
                        insert(output, closeParagraph() .. '<p>')
                        lastParagraph = 'p'
                    end
                end
            end
        end
        if pendingPTag == false then
            if prefixLength == 0 then
                insert(output, t)
                if hasOpenParagraph() then
                    insert(output, '\n')
                end
            else
                insert(output, trim(t))
            end
        end
    end
    while prefixLength > 0 do
        insert(output, closeList(sub(prefix2, prefixLength, prefixLength)))
        prefixLength = prefixLength - 1
        if prefixLength ~= 0 and hasOpenParagraph() then
            insert(output, '\n')
        end
    end
    insert(output, closeParagraph(true))

    return output
end


-- Adapted from Poem.php
local function parsePoems(text)
    return gsub(text,
        '%[poem(.-)%](.-)%[/poem%]',
        function(attr, content)
            local poemClass = ''
            content = gsub(content, '^\n*(.-)\n*$', '%1')
            attr = gsub(attr, '(.-)class *="(.-)" *(.-)',
                function(prefix, class, suffix)
                    poemClass = ' ' .. class
                    return prefix .. suffix
                end
            )
            return '<div class="poem' .. poemClass .. '" ' .. trim(attr) .. '>' .. gsub(content, '\n', '<br/>') .. '</div>'
        end
    )
end


-- Validate the incoming wikitext
local unsupportedTags = {
    'categorytree', 'choose', 'dynamicpagelist', 'gallery', 'poll'
}
local function sanitize(text)
    -- Check for unsupported parser extension tags
    for _, tag in ipairs(unsupportedTags) do
        if find(text, '\'"`UNIQ%-%-' .. tag .. '%-%w%w%w%w%w%w%w%w%-QINU`"\'') ~= nil then
            return true, '<b class="error">[[模块:Flatten]] error: for technical reasons, the <code><' .. tag .. '></code> tag is not yet supported.</b>'
        end
    end
    -- Check for <poem>
    if find(text, '\'"`UNIQ%-%-poem%-%w%w%w%w%w%w%w%w%-QINU`"\'') ~= nil then
        return true, [=[<div class="error">
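'''[[模块:Flatten]] error: for technical reasons, the '''<code><poem></code>''' tag is not yet supported.'''<br/>
<ul><li>
However, you can use <code>[poem]</code> to achieve a similar effect. For example:
<pre class="prettyprint linenums lang-wiki">
[poem style="color:red;"]
First line of text
Second line of text
[/poem]
</pre>
</li>
<li>Note that the <code>[poem]</code> tag implements only part of the Poem extension's functionality, so the rendering may differ from the original.</li>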
</ul>
</div>]=]
    end
    return false, text
end


-- Module body
local p = {}

function p.main(frame)
    local text = frame.args[1] or ''
    text = mw.text.unstripNoWiki(text)
    text = mw.text.decode(text)
    text = frame:preprocess(text)

    local hasFatalError
    hasFatalError, text = sanitize(text)
    if hasFatalError then
        return text
    end

    text = parsePoems(text)
    local lines = explode('\n', text)
    lines = parseTables(lines)
    lines = parseBlockLevel(lines)
    text = gsub(concat(lines), '\n', '')
    return text
end

return p