# Text normalizer, anyone?

When I write text documents (using LaTeX or Docbook), I like to wrap lines, as it makes them easier to edit (fewer things moving on the screen) and gives easy-to-read diffs.

However, I always hesitate before rewrapping paragraphs (using vim’s gqap): this means that I will add noise to my git history. So I only do that from time to time, making “rewrapping-only commits”. But that sucks, since in the meantime, I sometimes make a lot of changes, and my lines grow long again. Of course, I could rewrap my paragraphs before each commit, but if I simply add a word to a paragraph, it might cause all the following lines of that paragraph to be rewrapped.

So what I would need is some kind of “text normalizer” that will:

• split lines at The Right Place. After ‘.’, ‘,’, ‘:’, ‘;’, etc. So rewrapping won’t propagate changes too far away.
• understand the basics of LaTeX, so it won’t rewrap
\begin{tabular}{|l|l|}\hline
x & y \\
1 & 2 \\\hline
\end{tabular}

or

\begin{figure}
\centerline{\includegraphics{fig}}
\caption{Cool stuff}
\label{coolstuff}
\end{figure}

(vim does rewrap those examples.)

• be editor-agnostic. So other committers could use it as well.
• support for other document formats (docbook XML) would be nice too.
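
To make the wish list concrete, here is a minimal sketch in Python of what the first two points could look like. It is not an existing tool: `normalize` and `PROTECTED_ENVS` are names made up for illustration, the list of protected environments is an assumption, and the punctuation rule is deliberately naive (it will also break after “e.g.”, and it ignores LaTeX comments and nested environments of the same name).

```python
import re

# Environments copied through untouched (an assumed, incomplete list).
PROTECTED_ENVS = {"tabular", "figure", "verbatim"}

BEGIN = re.compile(r"\\begin\{(\w+\*?)\}")
END = re.compile(r"\\end\{(\w+\*?)\}")
# Break after punctuation that is followed by whitespace.
BREAK = re.compile(r"(?<=[.,:;!?])\s+")


def normalize(text):
    """Re-break paragraphs after punctuation; leave protected environments alone."""
    out = []          # finished output lines
    para = []         # lines of the paragraph currently being collected
    protected = None  # name of the protected environment we are inside, if any

    def flush():
        # Join the collected paragraph, collapse whitespace, split at punctuation.
        if para:
            joined = " ".join(" ".join(para).split())
            out.extend(BREAK.split(joined))
            para.clear()

    for line in text.splitlines():
        if protected is None:
            m = BEGIN.match(line.strip())
            if m and m.group(1) in PROTECTED_ENVS:
                flush()
                protected = m.group(1)
        if protected is not None:
            out.append(line)  # copied verbatim
            m = END.search(line)
            if m and m.group(1) == protected:
                protected = None
            continue
        if line.strip():
            para.append(line)
        else:
            flush()
            out.append("")  # keep paragraph breaks
    flush()
    return "\n".join(out)
```

Because every line ends at a punctuation mark, adding a word changes only the line that contains it, and running the function on its own output changes nothing. Since it works on plain text, it could be run from any editor or from a script before committing.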

I’ve looked at plasTeX: I could use it to parse a LaTeX document, and export it as LaTeX. But then it would be a LaTeX-only solution. Does anyone have a better solution?

## 9 thoughts on “Text normalizer, anyone?”

1. I have roughly the same problem: colleagues mostly use TeXshop and just rely on the default soft wrapping, ending up with long lines or random line breaks.

I personally try to start a new line for each sentence, but then the text is not as nicely wrapped in the editor. Maybe just replace period-space-space by period-linebreak before commits, and do the reverse at updates? Is it possible to tell svn to use period-space-space as an additional end-of-line marker?

2. I start every sentence on a new line and use a fill-sentence macro from Luca de Alfaro. (For AucTeX, replace fill-region-as-paragraph with LaTeX-fill-region-as-paragraph.)

3. Matthew W. S. Bell says:

Why not just turn on vim’s wrap mode? This only affects the display of the file.

4. Matthew: the problem with softwrap is that it produces really long lines, which makes conflicts more likely and differences hard to review (e.g. fixing a typo in a paragraph when the whole paragraph is one hard line).

5. > However, I always hesitate before rewrapping paragraphs (using vim’s
> gqap): this mean that I will add noise to my git history.

I don’t think that wrapping lines adds noise to git’s history if you
re-wrap lines while changing a file.

In other words, suppose you edit file a.tex and in the process also
make some whitespace changes. You then commit using
git commit a.tex -m "Some changes to a.tex"
Now you want to see the changes you made but ignore whitespace
changes. So you run
git diff -b
The -b switch asks that changes in the amount of whitespace be ignored.

This approach is only problematic if you want to generate patches
which are to be applied outside git.

6. Lucas says:

Kapil: git diff -b doesn’t solve my problem.

If I have a line:
a c d e f g h i j k l m
I edit the line to add a “bbbbbbbbbb”, but that causes the line to go past the 80-char limit. so, when rewrapping:
a bbbbbbbbbbbbbb c d e
f g h i j k l m
I haven’t checked, but git diff -b won’t help here (if git diff -b behaves like diff -b).

7. That was my mistake. I thought this option made “diff” behave like “wdiff”.

I have used “wdiff” in the past when my collaborator sent me a
para-reformatted TeX file.

Maybe one should create something like “git-wdiff”.
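
As a rough illustration of the word-level comparison that wdiff-style tools perform, here is a small sketch using Python’s standard difflib module. The function name `word_diff` is made up for this example; it is not how wdiff or git is implemented, only the same idea applied to the rewrapped line from the comment above.

```python
import difflib


def word_diff(old, new):
    """Return the non-equal opcodes between two texts, compared word by word."""
    a, b = old.split(), new.split()  # line breaks count as ordinary whitespace
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = "a c d e f g h i j k l m\n"
new = "a bbbbbbbbbbbbbb c d e\nf g h i j k l m\n"

print(word_diff(old, new))
# [('insert', [], ['bbbbbbbbbbbbbb'])]
```

A line-based diff reports both lines as changed here, while the word-level comparison reports only the inserted word.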

8. While I think you are slightly overoptimizing, using git you _can_ add a filter to convert between checkin and checkout. However, perhaps git diff --color-words can help your reviewing eyes! (That’s one of the few git features made only for actual plain text processing!)

9. ulrik: heh, now I really wish I could push people to use git at work :)