Re: converting PDF and retaining paragraph format
But it appears that your procedure deletes all hard returns, and that is not what is wanted. Paragraph divisions should stay. The problem is that the conversion turns every soft return into a hard return. The point is not to delete all hard returns, but to have soft returns everywhere except where paragraphs begin.
If you convert any document from a word processor or html format to a txt format, all soft returns will be converted into hard returns. A txt file has no soft return character: every line break in it is a hard one. But if you convert the file to a word processor format and the same thing occurs, that is, all soft returns are still converted to hard returns, then the subject should be discussed further. If a word processor format retains soft returns, that should solve the problem.
----- Original Message -----
One might also want to handle right parentheses, apostrophes, and some other punctuation marks just preceding hard returns in the same way, in step 5 of the scheme below. For example, replace )zzz with )^n; ’zzz with ’^n; !zzz with !^n; ?zzz with ?^n.
This is what I did, and it seemed to work:
1. Converted the pdf to text.
2. Select All in the text file, copy to clipboard.
3. Open a blank document in Microsoft Word.
3A: Paste the clipboard contents into the Word document.
4. Now we’re going to do Finding and Replacing. Open the Find and Replace dialogue and click the More... button to get at the Special Characters. Find: ^n . Replace with: zzz . Click the REPLACE ALL button, or press ENTER.
5. Find: .zzz . Replace With: .^n . Replace All.
5A: (If necessary, also Find and Replace All . zzz with . ^n, to catch a period followed by a space.)
6. Now, Find and Replace ALL zzz with nothing, leaving the replacement field empty.
7. You may need more finding and replacing after this; use it judiciously.
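For anyone who would rather script this than click through Word, here is a rough Python sketch of the same placeholder idea, applied straight to the converted text. It is only an illustration: the function name rejoin_paragraphs and its enders parameter are made up for this example. It also differs from step 6 in one respect: it replaces the leftover placeholders with a single space rather than nothing, on the assumption that the wrapped lines would otherwise run together.

```python
PLACEHOLDER = "zzz"  # same marker as in the steps above


def rejoin_paragraphs(text, enders=".", placeholder=PLACEHOLDER):
    """Keep line breaks that follow a character in `enders`; join all other lines."""
    if placeholder in text:
        # The trick only works if the marker never occurs in the real text.
        raise ValueError("placeholder already occurs in the text")

    # Step 4: turn every line break into the placeholder.
    text = text.replace("\r\n", "\n").replace("\n", placeholder)

    # Steps 5 and 5A: put the break back after each ending mark,
    # with or without a space before the break (the space is dropped).
    for mark in enders:
        text = text.replace(mark + " " + placeholder, mark + "\n")
        text = text.replace(mark + placeholder, mark + "\n")

    # Step 6 (modified): turn the remaining placeholders into one space each.
    text = text.replace(" " + placeholder, " ")
    return text.replace(placeholder, " ")


if __name__ == "__main__":
    sample = "This is a line\nthat wraps.\nNew paragraph here\nand continues."
    print(rejoin_paragraphs(sample))
```

Passing enders=".!?)" covers the extra punctuation marks as well. Like the manual steps, this treats every line that happens to end in one of those marks as the end of a paragraph, so the result still needs checking by eye.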
HTH, and Good Luck!
I like to convert PDF documents to other formats, usually text but could be any of several others. When I use the conversion routine built into Adobe Reader, all the line breaks are converted into hard line breaks, making it hard to find the actual paragraph boundaries. Does anybody have a solution to this?