How to fix square boxes in a PDF? [closed]

While going through my PDF on regular expressions, I see in many places that some characters are replaced by square boxes, which seem to be some kind of ASCII code. Is there any way I can fix this? I have checked these links:

http://www.tableausoftware.com/support/knowledge-base/square-boxes http://acrobatusers.com/tutorials/text-matching-regular-expressions 


and others, but did not find any solution. Attached is how the square boxes look.

asked Sep 21, 2011 at 15:39 by krisdigitx

By the way, I have never heard the term "closure" for the special characters * and +; they are normally called "quantifiers".

Commented Sep 21, 2011 at 19:04

What's the input like, and how do you generate the PDF from it?

Commented Sep 21, 2011 at 20:04

I did not generate the PDF; I am looking for any way I could edit this PDF and fix the square boxes.

Commented Sep 23, 2011 at 9:27

@krisdigitx: which PDF viewer do you use here? Is it the same if you use Acrobat Reader?

Commented Dec 5, 2011 at 19:44

4 Answers

As stema said, this has nothing to do with regular expressions.

Nor is it about some "PDF escape sequences", as PDF uses binary-safe text encodings.

These square blocks are usually shown in place of characters that don't have a representation in the chosen font. Often the typesetting software replaces some quotes or other characters with a 'nicer' Unicode alternative, but the font doesn't have those characters.

You could try to copy/paste the text from the PDF into some other document and replace the font, or use a PDF editing tool (Enfocus PitStop is one of the most popular; it's cheap but not free) to replace the font with a more complete one.
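If you go the copy/paste route, one option is to swap the 'nicer' Unicode characters back to plain ASCII before setting the text in a new font, so that even a font with limited coverage can render it. This is a minimal Python sketch, not a complete fix: the table `TYPOGRAPHIC_TO_ASCII` and the function `to_ascii_friendly` are illustrative names, and the table covers only a few common characters as an assumption, so extend it with whatever actually shows up as boxes in your document.

```python
# Illustrative (hypothetical) table of typographic characters that some
# typesetting software substitutes for plain ASCII, mapped back to ASCII.
TYPOGRAPHIC_TO_ASCII = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2014": "--",   # EM DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u00a0": " ",    # NO-BREAK SPACE
}

# str.maketrans accepts a dict mapping single characters to replacement strings.
_TABLE = str.maketrans(TYPOGRAPHIC_TO_ASCII)


def to_ascii_friendly(text: str) -> str:
    """Replace the typographic characters listed above with ASCII stand-ins."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(to_ascii_friendly("\u201cquoted\u201d \u2013 text\u2026"))
```

Using `str.translate` with a precomputed table does all replacements in one pass, and characters not in the table pass through unchanged.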

answered Sep 21, 2011 at 18:54

I am really not proficient in the PDF creation process. The reason I came up with this idea is this answer, and the fact that what he wants printed in his document is normally only a backslash followed by a character. I can't imagine a font that has no representation of a backslash, or am I wrong?

Commented Sep 21, 2011 at 19:09

At first, I thought it might be a 'typographical quote', as some word processors automatically replace ASCII quotes with those; but now I think it's a space, mostly because of the last paragraph. Maybe it's one of the many other space characters in Unicode, or something that represents 'type a space here', like §, Ø, or þ.

Commented Sep 22, 2011 at 4:50

Maybe the OP will tell us what the problem was; anyway, +1 to you, I think you are right.

Commented Sep 22, 2011 at 5:51
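If you can copy the affected text out of the PDF, one way to test the theory that the box stands for an unusual Unicode space is to list every space-like character that isn't a plain ASCII space. This is a minimal Python sketch using only the standard `unicodedata` module; the function name `find_unusual_spaces` is hypothetical, and treating category Zs plus three zero-width characters as "space-like" is an assumption of this sketch.

```python
import unicodedata

# Zero-width characters that are not in category Zs but often behave like spaces.
ZERO_WIDTH = {"\u200b", "\u2060", "\ufeff"}


def find_unusual_spaces(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each space separator
    (category Zs) or zero-width character in `text` other than U+0020."""
    found = []
    for i, ch in enumerate(text):
        space_like = unicodedata.category(ch) == "Zs" or ch in ZERO_WIDTH
        if space_like and ch != " ":
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


if __name__ == "__main__":
    for hit in find_unusual_spaces("a\u00a0b\u2009c d"):
        print(hit)
```

Knowing the exact code point (for example U+2009 THIN SPACE) tells you which character the font is missing, which is what you need before replacing either the character or the font.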

These square blocks are actually an 'officially' required glyph in every PostScript font. Its name is .notdef, and as Javier says, the PDF reader is required to render this glyph at every place where the 'real' character's glyph cannot be found in the embedded font.

Commented Dec 5, 2011 at 19:41
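The fallback this comment describes can be illustrated with a toy model: the viewer looks up each character in the embedded font and, when there is no glyph, draws .notdef instead. This Python sketch is not how a real PDF renderer works internally (it ignores encodings, CMaps and glyph names); the function name `simulate_rendering`, the character set passed in, and the use of U+25A1 WHITE SQUARE to stand in for .notdef are all assumptions for illustration.

```python
import string

# U+25A1 WHITE SQUARE stands in for the font's .notdef glyph in this toy model.
NOTDEF = "\u25a1"


def simulate_rendering(text: str, font_charset: set[str]) -> str:
    """Draw each character of `text` with a font covering only `font_charset`;
    anything the font lacks is drawn as the .notdef stand-in."""
    return "".join(ch if ch in font_charset else NOTDEF for ch in text)


if __name__ == "__main__":
    # Hypothetical embedded font that only covers printable ASCII.
    ascii_font = set(string.printable)
    print(simulate_rendering("\u201cclosure\u201d a\u2009b", ascii_font))
```

With an ASCII-only font, the curly quotes and the thin space each come out as a box, which matches the symptom in the question.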

First of all, this has nothing to do with regex, except that the document you are writing is about regular expressions.

I assume the sequence that is replaced by a square is \s, isn't it?

I think the problem here is that some regular expression shortcuts are interpreted as escape sequences in the PDF creation process and are therefore not printed literally.

You don't say how you create your PDF, but I would assume it will be OK if you escape the backslashes you want printed literally.

So when you want to see a \s in the PDF, type \\s in your source format. (If you have an escaped backslash that you want to print, like \\, then write \\\\.)
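As a minimal sketch of that advice, assuming the source text is held in a Python string before it is handed to whatever generates the PDF, doubling every backslash is a single `str.replace`. The name `escape_backslashes` is hypothetical, and whether a doubled backslash is the right escape at all depends on the source format, which the question does not say.

```python
def escape_backslashes(text: str) -> str:
    """Double every backslash so that a backslash-escaping format prints it literally."""
    return text.replace("\\", "\\\\")


if __name__ == "__main__":
    # Raw strings keep the backslashes exactly as typed.
    print(escape_backslashes(r"\s+"))   # \\s+
    print(escape_backslashes(r"\\"))    # \\\\
```

Doing the replacement once over the whole source avoids forgetting individual occurrences such as \d or \w.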

answered Sep 21, 2011 at 18:33

Javier's answer is nearly complete. But let me add this:

You have a small chance of getting Acrobat Reader to display the characters behind the square boxes using a "substitute" font, by toggling a certain setting in its application preferences.

IIRC, the setting is called 'Use local fonts'. You can usually find it in the Page Display section of the preferences, but across releases Adobe has kept adding, removing, or relocating settings.

Background info: If you have NOT enabled 'Use local fonts', you are telling the Reader to use only the PDF-embedded fonts for displaying all text. If a font is embedded but is missing some required glyphs, enabling that setting may let the Reader find the required font on your system to render the text, or the Reader may use its built-in Multiple Master fonts, which try to fake the look of the original glyphs, more or less.
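The lookup order in this background note can be sketched as a toy model: embedded font first, then (only when 'Use local fonts' is on) the fonts installed on your system, and otherwise .notdef. This Python sketch is a simplification under stated assumptions, not Acrobat's actual algorithm; it does not model the Multiple Master substitution, and the function name `resolve_font` and the font name `SomeLocalFont` are made up for illustration.

```python
def resolve_font(
    ch: str,
    embedded: set[str],
    local_fonts: dict[str, set[str]],
    use_local_fonts: bool,
) -> str:
    """Return which font would draw `ch`, or '.notdef' if none can.

    Toy model of the lookup order only; not Acrobat's real implementation.
    """
    if ch in embedded:
        return "embedded"
    if use_local_fonts:
        # Dicts preserve insertion order, so earlier entries are tried first.
        for name, charset in local_fonts.items():
            if ch in charset:
                return name
    return ".notdef"


if __name__ == "__main__":
    embedded = set("abc")                    # hypothetical subset-embedded font
    local = {"SomeLocalFont": {"\u2009"}}    # hypothetical installed font
    print(resolve_font("\u2009", embedded, local, use_local_fonts=False))  # .notdef
    print(resolve_font("\u2009", embedded, local, use_local_fonts=True))   # SomeLocalFont
```

The point of the model is that toggling the setting only helps if some installed font actually contains the missing character; otherwise the box remains.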