GitLab Wiki MarkDown Support iframe

Abstract: This article explains how to configure GitLab so that GitLab Wiki Markdown supports the iframe HTML element.

GitLab is great software for hosting Git repositories and managing issues. Its wiki system is also excellent.

The GitLab wiki supports Markdown, AsciiDoc and RDoc, which together cover nearly all the syntax and presentation a document needs, even a formal thesis or paper.

Most developers currently use Markdown as their main documentation language. GitLab supports not only standard Markdown but also additional features; this dialect is called GitLab Flavored Markdown (GFM).

According to the official documentation, GFM supports inline HTML, which means we can write HTML source code directly in a wiki page. This feature makes our documents more expressive.

However, there are some limitations when using HTML. For security reasons, only a whitelist of HTML elements and attributes is rendered. If you use an HTML tag that is not in the whitelist, that element is stripped from the rendered page.

This whitelist is based on Sanitize, a whitelist-based Ruby HTML and CSS sanitizer. The details are documented in HTML::Pipeline’s SanitizationFilter class.
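
As a rough illustration of the whitelist idea, the sketch below drops every tag whose name is not in an allowed list while keeping the text inside it. It is a toy regex-based stand-in written for this article, not the Sanitize library itself; the method name strip_unlisted_tags and the sample input are assumptions made for illustration only.

```ruby
# Toy whitelist filter (NOT the Sanitize gem): remove tags whose name is not
# in the allowed list, but keep the text they contain.
def strip_unlisted_tags(html, allowed)
  html.gsub(%r{</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>}) do |tag|
    allowed.include?(Regexp.last_match(1).downcase) ? tag : ''
  end
end

input = '<p>Hi <iframe src="x">video</iframe> <b>there</b></p>'
puts strip_unlisted_tags(input, %w[p b])
# prints: <p>Hi video <b>there</b></p>
```

The iframe tags disappear because iframe is not in the list, which is the behaviour seen in the wiki before any customisation.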

The question now is whether it is possible to allow other HTML elements in the GitLab wiki (on-premise, not cloud). The answer is yes: we can customise the SanitizationFilter ourselves.

Sanitize official example

The official documentation contains an example that allows the iframe element, but only for YouTube videos.

youtube_transformer = lambda do |env|
  node      = env[:node]
  node_name = env[:node_name]

  # Don't continue if this node is already whitelisted or is not an element.
  return if env[:is_whitelisted] || !node.element?

  # Don't continue unless the node is an iframe.
  return unless node_name == 'iframe'

  # Verify that the video URL is actually a valid YouTube video URL.
  return unless node['src'] =~ %r|\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/|

  # We're now certain that this is a YouTube embed, but we still need to run
  # it through a special Sanitize step to ensure that no unwanted elements or
  # attributes that don't belong in a YouTube embed can sneak in.
  Sanitize.node!(node, {
    :elements => %w[iframe],

    :attributes => {
      'iframe' => %w[allowfullscreen frameborder height src width]
    }
  })

  # Now that we're sure that this is a valid YouTube embed and that there are
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
  # to whitelist the current node.
  {:node_whitelist => [node]}
end
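
The URL check is the part of this transformer that decides which iframes get through, so it is worth trying the pattern on a few inputs. The sketch below reuses the regular expression from the example unchanged; the helper name youtube_src? and the sample URLs are made up for illustration.

```ruby
# Pattern copied from the transformer above.
YOUTUBE_SRC = %r|\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/|

# Hypothetical helper: true only when src is a string matching the pattern.
def youtube_src?(src)
  src.is_a?(String) && src.match?(YOUTUBE_SRC)
end

samples = [
  'https://www.youtube.com/embed/abc',              # plain embed URL
  '//www.youtube-nocookie.com/embed/abc',           # protocol-relative, privacy domain
  'https://youtube.com.evil.example/embed/abc',     # look-alike host
  'https://evil.example/?u=https://www.youtube.com/', # YouTube URL only in the query
  'javascript:alert(1)'                             # not an http(s) URL at all
]

samples.each { |src| puts "#{youtube_src?(src)}  #{src}" }
```

Only the first two samples match. The \A anchor and the trailing slash after .com are what reject the look-alike host and the URL that merely mentions YouTube in its query string.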

GitLab-related sanitization_filter file

The first and most important step in letting the GitLab wiki support iframe is finding the corresponding configuration file: sanitization_filter.rb

Searching from the Linux shell with sudo find / -name "sanitization_filter.rb" shows that there are two files with this name.

For reference, the GitLab version used here is 11.1.4-33.

/opt/gitlab/embedded/service/gitlab-rails/lib/banzai/filter/sanitization_filter.rb
/opt/gitlab/embedded/lib/ruby/gems/2.4.0/gems/html-pipeline-2.8.3/lib/html/pipeline/sanitization_filter.rb

The second one, which lives in the embedded html-pipeline library, is the file we need to customise.

sanitization_filter update

The original file is as follows:

HTML::Pipeline.require_dependency('sanitize', 'SanitizationFilter')

module HTML
  class Pipeline
    # HTML filter with sanization routines and whitelists. This module defines
    # what HTML is allowed in user provided content and fixes up issues with
    # unbalanced tags and whatnot.
    #
    # See the Sanitize docs for more information on the underlying library:
    #
    # https://github.com/rgrove/sanitize/#readme
    #
    # Context options:
    #   :whitelist      - The sanitizer whitelist configuration to use. This
    #                     can be one of the options constants defined in this
    #                     class or a custom sanitize options hash.
    #   :anchor_schemes - The URL schemes to allow in <a href> attributes. The
    #                     default set is provided in the ANCHOR_SCHEMES
    #                     constant in this class. If passed, this overrides any
    #                     schemes specified in the whitelist configuration.
    #
    # This filter does not write additional information to the context.
    class SanitizationFilter < Filter
      LISTS = Set.new(%w[ul ol].freeze)
      LIST_ITEM = 'li'.freeze

      # List of table child elements. These must be contained by a <table> element
      # or they are not allowed through. Otherwise they can be used to break out
      # of places we're using tables to contain formatted user content (like pull
      # request review comments).
      TABLE_ITEMS = Set.new(%w[tr td th].freeze)
      TABLE = 'table'.freeze
      TABLE_SECTIONS = Set.new(%w[thead tbody tfoot].freeze)

      # These schemes are the only ones allowed in <a href> attributes by default.
      ANCHOR_SCHEMES = ['http', 'https', 'mailto', :relative, 'github-windows', 'github-mac'].freeze

      # The main sanitization whitelist. Only these elements and attributes are
      # allowed through by default.
      WHITELIST = {
        elements: %w[
          h1 h2 h3 h4 h5 h6 h7 h8 br b i strong em a pre code img tt
          div ins del sup sub p ol ul table thead tbody tfoot blockquote
          dl dt dd kbd q samp var hr ruby rt rp li tr td th s strike summary
          details caption figure figcaption
        ],
        remove_contents: ['script'],
        attributes: {
          'a' => ['href'],
          'img' => %w[src longdesc],
          'div' => %w[itemscope itemtype],
          'blockquote' => ['cite'],
          'del' => ['cite'],
          'ins' => ['cite'],
          'q' => ['cite'],
          all: %w[abbr accept accept-charset
                  accesskey action align alt
                  aria-describedby aria-hidden aria-label aria-labelledby
                  axis border cellpadding cellspacing char
                  charoff charset checked
                  clear cols colspan color
                  compact coords datetime dir
                  disabled enctype for frame
                  headers height hreflang
                  hspace ismap label lang
                  maxlength media method
                  multiple name nohref noshade
                  nowrap open prompt readonly rel rev
                  rows rowspan rules scope
                  selected shape size span
                  start summary tabindex target
                  title type usemap valign value
                  vspace width itemprop]
        },
        protocols: {
          'a' => { 'href' => ANCHOR_SCHEMES },
          'blockquote' => { 'cite' => ['http', 'https', :relative] },
          'del' => { 'cite' => ['http', 'https', :relative] },
          'ins' => { 'cite' => ['http', 'https', :relative] },
          'q' => { 'cite' => ['http', 'https', :relative] },
          'img' => {
            'src' => ['http', 'https', :relative],
            'longdesc' => ['http', 'https', :relative]
          }
        },
        transformers: [
          # Top-level <li> elements are removed because they can break out of
          # containing markup.
          lambda { |env|
            name = env[:node_name]
            node = env[:node]
            if name == LIST_ITEM && node.ancestors.none? { |n| LISTS.include?(n.name) }
              node.replace(node.children)
            end
          },

          # Table child elements that are not contained by a <table> are removed.
          lambda { |env|
            name = env[:node_name]
            node = env[:node]
            if (TABLE_SECTIONS.include?(name) || TABLE_ITEMS.include?(name)) && node.ancestors.none? { |n| n.name == TABLE }
              node.replace(node.children)
            end
          }
        ]
      }.freeze

      # A more limited sanitization whitelist. This includes all attributes,
      # protocols, and transformers from WHITELIST but with a more locked down
      # set of allowed elements.
      LIMITED = WHITELIST.merge(
        elements: %w[b i strong em a pre code img ins del sup sub p ol ul li]
      )

      # Strip all HTML tags from the document.
      FULL = { elements: [] }.freeze

      # Sanitize markup using the Sanitize library.
      def call
        Sanitize.clean_node!(doc, whitelist)
      end

      # The whitelist to use when sanitizing. This can be passed in the context
      # hash to the filter but defaults to WHITELIST constant value above.
      def whitelist
        whitelist = context[:whitelist] || WHITELIST
        anchor_schemes = context[:anchor_schemes]
        return whitelist unless anchor_schemes
        whitelist = whitelist.dup
        whitelist[:protocols] = (whitelist[:protocols] || {}).dup
        whitelist[:protocols]['a'] = (whitelist[:protocols]['a'] || {}).merge('href' => anchor_schemes)
        whitelist
      end
    end
  end
end

So what we need to do is add a new lambda to the transformers array.

# YouTube Special
lambda { |env|
  node = env[:node]
  node_name = env[:node_name]

  # Don't continue if this node is already whitelisted or is not an element.
  return if env[:is_whitelisted] || !node.element?

  # Don't continue unless the node is an iframe.
  return unless node_name == 'iframe'

  # Verify that the video URL is actually a valid YouTube video URL.
  return unless node['src'] =~ %r|\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/|

  # We're now certain that this is a YouTube embed, but we still need to run
  # it through a special Sanitize step to ensure that no unwanted elements or
  # attributes that don't belong in a YouTube embed can sneak in.
  Sanitize.node!(node, {
    :elements => %w[iframe],

    :attributes => {
      'iframe' => %w[allowfullscreen frameborder height src width]
    }
  })
  # Now that we're sure that this is a valid YouTube embed and that there are
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
  # to whitelist the current node.
  {:node_whitelist => [node]}
},
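
To see why returning {:node_whitelist => [node]} is what lets the iframe survive, here is a simplified model of the transformer contract. It is a sketch under stated assumptions, not Sanitize's real implementation: FakeNode and run_transformers are hypothetical stand-ins (real transformers receive Nokogiri nodes), and the second Sanitize.node! clean-up step is left out.

```ruby
require 'set'

# Hypothetical stand-in for a parsed HTML element.
FakeNode = Struct.new(:name, :attrs) do
  def element?
    true
  end

  def [](key)
    attrs[key]
  end
end

# Simplified model: call each transformer with an env hash and collect any
# nodes it returns under :node_whitelist. Returns true if the node ends up
# whitelisted.
def run_transformers(node, transformers)
  whitelisted = Set.new
  transformers.each do |transformer|
    env = { node: node, node_name: node.name, is_whitelisted: whitelisted.include?(node) }
    result = transformer.call(env)
    whitelisted.merge(result[:node_whitelist]) if result.is_a?(Hash) && result[:node_whitelist]
  end
  whitelisted.include?(node)
end

# Same checks as the lambda above, minus the Sanitize.node! step.
allow_youtube = lambda do |env|
  node = env[:node]
  return if env[:is_whitelisted] || !node.element?
  return unless env[:node_name] == 'iframe'
  return unless node['src'] =~ %r|\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/|
  { node_whitelist: [node] }
end

video = FakeNode.new('iframe', { 'src' => 'https://www.youtube.com/embed/abc' })
other = FakeNode.new('iframe', { 'src' => 'https://example.com/page' })

puts run_transformers(video, [allow_youtube]) # prints: true
puts run_transformers(other, [allow_youtube]) # prints: false
```

A transformer that returns nil leaves the decision to the normal whitelist, so any iframe that fails one of the checks is still removed.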

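The final configuration then looks as follows. It also includes a second lambda, built the same way, that allows iframes for Google Drive shared files: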

HTML::Pipeline.require_dependency('sanitize', 'SanitizationFilter')

module HTML
  class Pipeline
    # HTML filter with sanization routines and whitelists. This module defines
    # what HTML is allowed in user provided content and fixes up issues with
    # unbalanced tags and whatnot.
    #
    # See the Sanitize docs for more information on the underlying library:
    #
    # https://github.com/rgrove/sanitize/#readme
    #
    # Context options:
    #   :whitelist      - The sanitizer whitelist configuration to use. This
    #                     can be one of the options constants defined in this
    #                     class or a custom sanitize options hash.
    #   :anchor_schemes - The URL schemes to allow in <a href> attributes. The
    #                     default set is provided in the ANCHOR_SCHEMES
    #                     constant in this class. If passed, this overrides any
    #                     schemes specified in the whitelist configuration.
    #
    # This filter does not write additional information to the context.
    class SanitizationFilter < Filter
      LISTS = Set.new(%w[ul ol].freeze)
      LIST_ITEM = 'li'.freeze

      # List of table child elements. These must be contained by a <table> element
      # or they are not allowed through. Otherwise they can be used to break out
      # of places we're using tables to contain formatted user content (like pull
      # request review comments).
      TABLE_ITEMS = Set.new(%w[tr td th].freeze)
      TABLE = 'table'.freeze
      TABLE_SECTIONS = Set.new(%w[thead tbody tfoot].freeze)

      # These schemes are the only ones allowed in <a href> attributes by default.
      ANCHOR_SCHEMES = ['http', 'https', 'mailto', :relative, 'github-windows', 'github-mac'].freeze

      # The main sanitization whitelist. Only these elements and attributes are
      # allowed through by default.
      WHITELIST = {
        elements: %w[
          h1 h2 h3 h4 h5 h6 h7 h8 br b i strong em a pre code img tt
          div ins del sup sub p ol ul table thead tbody tfoot blockquote
          dl dt dd kbd q samp var hr ruby rt rp li tr td th s strike summary
          details caption figure figcaption
        ],
        remove_contents: ['script'],
        attributes: {
          'a' => ['href'],
          'img' => %w[src longdesc],
          'div' => %w[itemscope itemtype],
          'blockquote' => ['cite'],
          'del' => ['cite'],
          'ins' => ['cite'],
          'q' => ['cite'],
          all: %w[abbr accept accept-charset
                  accesskey action align alt
                  aria-describedby aria-hidden aria-label aria-labelledby
                  axis border cellpadding cellspacing char
                  charoff charset checked
                  clear cols colspan color
                  compact coords datetime dir
                  disabled enctype for frame
                  headers height hreflang
                  hspace ismap label lang
                  maxlength media method
                  multiple name nohref noshade
                  nowrap open prompt readonly rel rev
                  rows rowspan rules scope
                  selected shape size span
                  start summary tabindex target
                  title type usemap valign value
                  vspace width itemprop]
        },
        protocols: {
          'a' => { 'href' => ANCHOR_SCHEMES },
          'blockquote' => { 'cite' => ['http', 'https', :relative] },
          'del' => { 'cite' => ['http', 'https', :relative] },
          'ins' => { 'cite' => ['http', 'https', :relative] },
          'q' => { 'cite' => ['http', 'https', :relative] },
          'img' => {
            'src' => ['http', 'https', :relative],
            'longdesc' => ['http', 'https', :relative]
          }
        },
        transformers: [
          # Top-level <li> elements are removed because they can break out of
          # containing markup.
          lambda { |env|
            name = env[:node_name]
            node = env[:node]
            if name == LIST_ITEM && node.ancestors.none? { |n| LISTS.include?(n.name) }
              node.replace(node.children)
            end
          },

          # Table child elements that are not contained by a <table> are removed.
          lambda { |env|
            name = env[:node_name]
            node = env[:node]
            if (TABLE_SECTIONS.include?(name) || TABLE_ITEMS.include?(name)) && node.ancestors.none? { |n| n.name == TABLE }
              node.replace(node.children)
            end
          },

          # YouTube Special
          lambda { |env|
            node = env[:node]
            node_name = env[:node_name]

            # Don't continue if this node is already whitelisted or is not an element.
            return if env[:is_whitelisted] || !node.element?

            # Don't continue unless the node is an iframe.
            return unless node_name == 'iframe'

            # Verify that the video URL is actually a valid YouTube video URL.
            return unless node['src'] =~ %r|\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/|

            # We're now certain that this is a YouTube embed, but we still need to run
            # it through a special Sanitize step to ensure that no unwanted elements or
            # attributes that don't belong in a YouTube embed can sneak in.
            Sanitize.node!(node, {
              :elements => %w[iframe],

              :attributes => {
                'iframe' => %w[allowfullscreen frameborder height src width]
              }
            })
            # Now that we're sure that this is a valid YouTube embed and that there are
            # no unwanted elements or attributes hidden inside it, we can tell Sanitize
            # to whitelist the current node.
            {:node_whitelist => [node]}
          },

          # Google Drive Shared file Special
          lambda { |env|
            node = env[:node]
            node_name = env[:node_name]

            # Don't continue if this node is already whitelisted or is not an element.
            return if env[:is_whitelisted] || !node.element?

            # Don't continue unless the node is an iframe.
            return unless node_name == 'iframe'

            # Verify that the URL is actually a valid Google Drive document URL.
            return unless node['src'] =~ %r|\A(?:https?:)?//(?:drive\.)?google\.com/|

            # We're now certain that this is a Google Drive embed, but we still need
            # to run it through a special Sanitize step to ensure that no unwanted
            # elements or attributes that don't belong in the embed can sneak in.
            Sanitize.node!(node, {
              :elements => %w[iframe],

              :attributes => {
                'iframe' => %w[allowfullscreen frameborder height src width]
              }
            })
            # Now that we're sure that this is a valid Google Drive embed and that
            # there are no unwanted elements or attributes hidden inside it, we can
            # tell Sanitize to whitelist the current node.
            {:node_whitelist => [node]}
          }
        ]
      }.freeze

      # A more limited sanitization whitelist. This includes all attributes,
      # protocols, and transformers from WHITELIST but with a more locked down
      # set of allowed elements.
      LIMITED = WHITELIST.merge(
        elements: %w[b i strong em a pre code img ins del sup sub p ol ul li]
      )

      # Strip all HTML tags from the document.
      FULL = { elements: [] }.freeze

      # Sanitize markup using the Sanitize library.
      def call
        Sanitize.clean_node!(doc, whitelist)
      end

      # The whitelist to use when sanitizing. This can be passed in the context
      # hash to the filter but defaults to WHITELIST constant value above.
      def whitelist
        whitelist = context[:whitelist] || WHITELIST
        anchor_schemes = context[:anchor_schemes]
        return whitelist unless anchor_schemes
        whitelist = whitelist.dup
        whitelist[:protocols] = (whitelist[:protocols] || {}).dup
        whitelist[:protocols]['a'] = (whitelist[:protocols]['a'] || {}).merge('href' => anchor_schemes)
        whitelist
      end
    end
  end
end
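
The two iframe lambdas differ only in the URL pattern, so if more sites need to be allowed, one option is to replace the per-site regular expressions with a single host check. The following is a minimal sketch of that idea, assuming a hand-maintained host list; ALLOWED_IFRAME_HOSTS and allowed_iframe_src? are hypothetical names, not part of html-pipeline or GitLab, and the list is slightly stricter than the patterns above (it does not accept a bare google.com, for example).

```ruby
require 'uri'

# Hypothetical allow-list; adjust to the sites you trust.
ALLOWED_IFRAME_HOSTS = %w[
  www.youtube.com
  youtube.com
  www.youtube-nocookie.com
  drive.google.com
].freeze

# True only for http(s) or protocol-relative URLs whose host is in the list.
def allowed_iframe_src?(src)
  return false unless src.is_a?(String)

  # Protocol-relative URLs ("//host/path") get a scheme so URI can parse the host.
  uri = URI.parse(src.start_with?('//') ? "https:#{src}" : src)
  %w[http https].include?(uri.scheme) && ALLOWED_IFRAME_HOSTS.include?(uri.host)
rescue URI::InvalidURIError
  false
end

puts allowed_iframe_src?('https://drive.google.com/file/d/abc/preview') # prints: true
puts allowed_iframe_src?('https://drive.google.com.evil.example/x')     # prints: false
```

Inside a transformer, the line that tests node['src'] against a regular expression could then become return unless allowed_iframe_src?(node['src']), with the rest of the lambda unchanged.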

Apply the configuration

To apply the change, we need to restart GitLab:

sudo gitlab-ctl restart

Other configuration

If you want to allow more HTML elements, you can refer to the official documentation and update the sanitization_filter.rb file accordingly.