.\" .\" The following Copyright notice, license and disclaimer .\" applies to all files of this presentation except the images .\" and is not repeated in each individual file. .\" .\" Copyright (c) 2016 Ingo Schwarze .\" .\" Permission to use, copy, modify, and distribute this presentation for any .\" purpose with or without fee is hereby granted, provided that the above .\" copyright notice and this permission notice appear in all copies. .\" .\" THE PRESENTATION IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL .\" WARRANTIES WITH REGARD TO THIS PRESENTATION INCLUDING ALL IMPLIED .\" WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE .\" AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL .\" DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA .\" OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER .\" TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR .\" PERFORMANCE OF THIS PRESENTATION. .\" .\" -------------------------------------------------------------------- .\" .\" These slides use the mm and gpresent groff macros. .\" For example, on OpenBSD, install these ports: .\" groff, gpresent, ghostscript. .\" Then run: .\" groff -P-p7.74i,11.5i -mm -mpresent \ .\" eurobsdcon2016-utf8.roff > eurobsdcon2016-utf8.ps .\" ps2pdf eurobsdcon2016-utf8.ps .\" .\" -------------------------------------------------------------------- .\" .\" --- global mm configuration settings ------------------------------- .nr Pi 3 .MARGIN 0i .\" --- global gpresent configuration settings ------------------------- .DEFCOLOR Kea1 0 0.8 0.48 .DEFCOLOR Kea2 0 0.5 0.3 .TITLECOLOR Kea1 .SUBTITLEFORMAT C .SUBTITLECOLOR Kea2 .FOOTERSIZE 2 .\" We don't want a header line for the title page, .\" so we have to start it before setting up headers. .SUBTITLE "Why and how you ought to" .TITLE "Keep multibyte character support simple" .\" === gpresent header setup ========================================== .\" --- define gpresent extension registers ---------------------------- .nr gpe_page_tot 1 .nr gpe_page_sec 0 .af gpe_page_sec I .nr gpe_time_tsec 14*60+3*60 .nr gpe_time_hour 14 .nr gpe_time_min 03 .af gpe_time_min 02 .nr gpe_time_sec 0 .af gpe_time_sec 02 . .\" --- macro to start a new section ----------------------------------- .de GPE_SECTION .ds gpe_title_sec \\$1 .nr gpe_page_sec 0 .. .\" --- macro to prepare a new page ------------------------------------ .de GPE_NEXT .ds gpe_next \\$1 .SK .. .\" --- gpresent page header callback ---------------------------------- .de HEADER .nr gpe_page_tot +1 .nr gpe_page_sec +1 .sp 0.5v .ds gpe_middle page \\n[gpe_page_tot]: \\*[gpe_title_sec] \\n[gpe_page_sec] .tl 'Ingo Schwarze: Keep multibyte character support simple'\ \h'9m'\\*[gpe_middle]'\ Beograd, September 25, 2016' .sp -0.5v .\" horizontal line below the page header \l'\\n(.lu'\h'-\\n(.lu' .br .. .\" --- initialize the first section before completing the title page -- .GPE_SECTION INTRO .\" === define some gpresent extension macros ========================== .\" --- two-column mode (for images) ----------------------------------- .\" 1st arg: width of first column .\" 2nd arg: move second column up by this amout (default 0.5v) .\" switch column with normal .MULN, end with normal .MULE .de GPE_MULB .nr gpe_colwr \\n(.l-\\$1-1n .ie \\n[.$]>1 .ds gpe_vsp \\$2 .el .ds gpe_vsp 0.5v .sp -\\*[gpe_vsp] .MULB \\$1 1n \\n[gpe_colwr]u .sp \\*[gpe_vsp] .. .\" --- emphasis ------------------------------------------------------- .\" arg: text .de GPE_EM .COLOR red \\$1 .COLOR P .. .\" --- small text ----------------------------------------------------- .\" arg: text .de GPE_SM .S -4 .ce \\$1 .S P .. .\" --- small text with one emphasised word ---------------------------- .\" 1st arg: text before emphasis .\" 2nd arg: emphasised text .\" 3rd arg: text after emphasis .de GPE_SMEM .GPE_SM "\\$1 \m[red]\\$2\m[] \\$3" .. .\" --- title page ----------------------------------------------------- .\" The main title line has already been printed. .SUBTITLE "EuroBSDCon, Beograd, September 25, 2016" .SUBTITLE "Ingo Schwarze " .MULB 6i 0i 4i .PSPIC Images/CanyonCampground.eps .GPE_SM "Canyon Campground below Lower Kananaskis Lake" .GPE_SM "and the Opal Range (2900-3000m)" .MULN .PSPIC Images/KananaskisWelcome.eps .GPE_SM "Alberta Highway 66 near" .GPE_SM "Bragg Creek, Elbow Valley" .MULE .\" === gpresent footer setup ========================================== .\" We dont want a footer line for the title page, .\" so we have to set it up after completing the title page. .SK .\" --- macros to start a new page ------------------------------------- .\" arg: time for this page in seconds .de GPE_TIME .nr gpe_time_tsec +\\$1 .nr gpe_time_hour \\n[gpe_time_tsec]/3600 .nr gpe_time_min \\n[gpe_time_tsec]%3600/60 .nr gpe_time_sec \\n[gpe_time_tsec]%60 .. .\" --- gpresent page footer callback ---------------------------------- .de FOOTER .ps 18 .vs 20 .sp -2v \l'\\n(.lu'\h'-\\n(.lu' .br .tl '\s-6\\n[gpe_time_hour]:\\n[gpe_time_min]:\s-2\\n[gpe_time_sec]\s+8''\ \\m[Kea2]\\*[gpe_next]\ \ \(->\\m[]' .ps .vs .. .\" The INTRO section was already started in header.roff. .TITLE "Topic of this talk" .SUBTITLE "Multibyte character support in the base system" .BL .LI Multibyte character support in basic infrastructure .br like ksh(1), xterm(1), OpenSSH, man(1), libedit... .LI LC_CTYPE support in BSD and POSIX utility programs, .br in particular the simplest ones like ls(1), ps(1), cut(1), wc(1) .LI POSIX multibyte and wide character functions in the C library .LI Multibyte character support in the kernel terminal driver? .LE .PSPIC Images/NestorBuller16.eps .GPE_SM "Mount Nestor (2975m), the southern pillar of the Goat Range" .GPE_TIME 45 .GPE_NEXT "What won't be covered?" .TITLE "Out of scope for this talk" .GPE_MULB 7i .BL .LI Internationalization in general .br That's a vast field and could only be covered in an overview talk. .br This talk explores specific technical details \(-> limited scope. .LI Not about general locale support \(em only about LC_CTYPE. .LI Not about character encoding conversions. .br To convert files from one encoding to another, .br simply install the GNU iconv package. .LE .MULN .PSPIC Images/LowerKananaskisLake.eps .GPE_SM "Lower Kananaskis Lake (1680m)" .MULE .BL .LI Not about typesetting. .br That cannot be done with C library utilities, and even Unicode is utterly inadequate to to express any non-trivial arrangement of glyphs, for example for non-trivial mathematical formulae or anything similar. Typesetting requires specialized software like TeX or groff, and in such contexts, character set handling is one of the points to be considered, but a relatively minor one, and in any case, for such software, it is completely irrelevant whether or not the base system offers any kind of multibyte character support or not. .LE .sp .S -4 It is not my intention to dismiss any real-world tasks as irrelevant, in particular not the handling of legacy non-UTF-8 multibyte encodings, which is a legitimate concern; i am merely studying what can reasonably be done with C library support in the base system without compromizing other goals like security, reliability, and usability, and what may better be left to specialized add-on software. .br .S P .GPE_TIME 90 .GPE_NEXT "Can all be done?" .TITLE "The myth of feasibility" .BL .LI I suspect that many people think that as long as you implement carefully and use all standard facilities and interfaces as designed, complete multibyte character support can be done. .LI At least i thought so before i set out on the quest i'm talking about. .LI But it turned out that is not true. If you build support for arbitrary character set locales into the base system, there are several aspects with respect to which making things secure, reliable, and usable becomes outright impossible. .LI As a first step, i will show some of these unsolvable issues. .LE .PSPIC Images/CalgaryCenterStreet.eps .GPE_SM "A thunderstorm approaching Calgary, Center Street Bridge" .GPE_TIME 45 .GPE_NEXT "Table of contents" \& .GPE_MULB 5i .TITLE "Table of contents" .BL .LI Examples of problems .br unsolvable with arbitrary encodings .LI Benefits of supporting UTF-8 only .LI Implementation techniques .BL .LI isu8cont() for simplicity .LI mblen(3) for validation .LI mbtowc(3) for property inspection .LI utf8.c for modularization .LI fgetwc(3) for unusually complex cases .LI Techniques to avoid, if possible .LE .LI Examples of bugs in libraries and tools .LI Conclusions and outlook .LE .MULN .PSPIC Images/KananaskisMap.eps .GPE_SM "Kananaskis Country, Alberta, Canada" .MULE .GPE_TIME 90 .GPE_SECTION "UNSOLVABLE PROBLEMS" .GPE_NEXT "Examples of problems" .TITLE "An extreme example of breakage in the standard: write(1)" .sp -1v .SUBTITLE "POSIX requires:" .B1 \(lqThe following environment variable shall affect the execution of write: LC_CTYPE: Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multi-byte characters in arguments and input files). If the recipient's locale does not use an LC_CTYPE equivalent to the sender's, the results are undefined.\(rq .B2 .SUBTITLE "When the locales agree, POSIX requires:" .B1 \(lqTyping characters from LC_CTYPE classifications \(oqprint\(cq or \(oqspace\(cq shall cause those characters to be sent to the recipient's terminal.\(rq .B2 .GPE_MULB 7i .P Now as a matter of fact, for the sending program, there is no way to find out the recipient's locale. By its basic design, the locale is part of the environment of each program, and it is essential for system security that without elevated privileges, no program can inspect the environment of other user's programs. .P So to satisfy the requirement for the case of matching locales, the standard effectively requires the write(1) program to unconditionally \m[red]write all printable characters using the sender's locale\m[], no matter what the recipient's locale may be. .MULN .PSPIC Images/MountWarspite.eps .GPE_SM "Mount Warspite (2850m)" .GPE_SM "across the Kananaskis Lakes" .MULE .GPE_TIME 120 .GPE_NEXT "What can we do?" .TITLE "write(1) implementation cannot be fixed" .SUBTITLE "Locales mismatch \(-> print garbage" .BL .LI Even if the senders are well-intentioned and only send byte sequences they consider as printable characters in their own locale, .LI the recipient's locale might interpret some of them as terminal control sequences .LI may screw up the recipient's terminal state .LI may display wrong and misleading information .LI may even put the terminal into a state where it interpretes user input in a way different from what the recipient wants and reasonably expects. .LE .GPE_MULB 6i .SUBTITLE "Impossible to reduce functionality to make it safe" For each and every byte sequence, there can be a locale in which it might represent a potentially dangerous control sequence, so \m[red]no byte or byte sequence at all is safe\m[] to print to a terminal if you do not know the encoding. .MULN .PSPIC Images/MistLineham16.eps .GPE_SM "Mist Mountain (3138m) from Lineham Creek" .MULE .GPE_TIME 60 .GPE_NEXT "Any way out?" .MULB 7i 0i 3i .TITLE "write(1) standard cannot be fixed" .SUBTITLE "Standard effectively requires utterly unsafe behaviour" On a system providing arbitrary locales, there is no way how the standard could be improved, short of completely deleting the entire write(1) program. .SUBTITLE "Who uses write(1) anyway?" .BL .LI People moved on to WhatsApp, didn't they? .LI But wait: wall(1) has the same problem. .LI And \m[red]shutdown(8)\m[] uses wall(1)! .LI Worst time to screw up people's terminals: .LI Right before shutdown when they hurry to save their work... .LE .sp That teaches us that even if many users use traditional low-level tools much less nowadays, they may still be more relevant in subtle ways than one might naively think. .MULN .PSPIC Images/CowboyTrail.eps .GPE_SM "Highway 22 near Bragg Creek," .GPE_SM "Elbow River Valley" .MULE .GPE_TIME 50 .GPE_NEXT "Are important tools affected?" .TITLE "Tough problems in basic tools: ssh(1)" .SUBTITLE "An ssh(1) connection involves two locales" .GPE_MULB 8i .AL .LI The locale set in the original shell on the client machine (client locale) .br Determines what can safely be displayed and how it must be encoded. .br It is already defined before the ssh(1) client program is even started. .LI The locale set in the remote shell on the server machine (server locale) .br Only this can influence what may get printed on the client .br to the terminal in which ssh(1) is run. .LE .MULN .PSPIC Images/GoatSouth.eps .GPE_SM "Goat Range (2700-2800m)" .GPE_SM "from Goat Pond" .MULE .sp -0.5v .SUBTITLE "How is the server locale selected?" Lots of competing mechanisms for setting environment variables: .BL .LI Operating system defaults when starting new processes .LI Variables set or unset by sshd(8) on the server when forking the login shell .LI SSH initialization files, for example ~/.ssh/environment .LI System wide and user specific shell initialization files, and so on .LI \&... .LE None of that host of possibilities depends on the locale used on the client side, or can even inspect the locale on the client side. .GPE_TIME 80 .GPE_NEXT "Any way out?" .TITLE "ssh(1) cannot be fixed either" .BL .LI So we end up with a problem similar to the write(1) case: .LI If the client side is using an arbitrary locale, the server cannot safely send any string in any encoding, not even plain US-ASCII. .LI OpenSSH provides no way for the client to communicate the required locale to the server. .LI And even if it could, there is no guarantee that that particular locale is available on the server. .LI And even if there were a locale of the same name on the server, there is no guarantee that it is compatible with the client locale, because neither locale names nor the semantic of any locale except C and POSIX is standardized by POSIX. .LE .SUBTITLE "Generic problem for any kind of inter-process communication" .sp -0.5v .MULB 3i 0.5i 3i 0.5i 3i .PSPIC Images/GoatPond1.eps .MULN .PSPIC Images/GoatPond2.eps .MULN .PSPIC Images/GoatPond3.eps .MULE .GPE_SM "At the Goat Pond dam (1670m), Spray Lakes area" .GPE_TIME 50 .GPE_NEXT "Any way out?" .TITLE "Partial mitigations for the ssh(1) problem" .GPE_MULB 7i .BL .LI If you happen to know the default locale of the remote account you want to connect to and the same locale happens to be available on your client system, you can start a terminal using that locale on the client system before typing the ssh(1) command, and you are safe. But that's a special case. .LI Besides, let me ask a question to the audience: Who has done that at least once in the past? Considering what the server locale was going to be, and start a matching terminal before typing the ssh command? .LI Note that opening the connection first, then setting LC_CTYPE in the remote shell to whatever you need locally is not safe - the remote system may already print to your local terminal before you ever get to the shell prompt: .BL .LI A banner even before authentication, .LI the motd(5), .LI and then the shell prompt itself... .LE .LI All that might already screw up your terminal, and in the worst case cause your terminal to misinterpret the input you type. .LE .MULN .PSPIC Images/ChapmanBridge.eps .GPE_SM "Chapman Bridge (ca. 1600m)" .GPE_SM "Elbow River Campground" .MULE .GPE_TIME 90 .GPE_SECTION "THE OPENBSD WAY" .GPE_NEXT "What can we do?" .TITLE "The OpenBSD way" .GPE_MULB 5i .SUBTITLE "We made a drastic decision!" The OpenBSD base system supports .br exactly two LC_CTYPE locales: .AL .LI UTF-8 .LI C = POSIX = US-ASCII .LE .P We don't even support ISO-LATIN-1 .br any longer in the base system. .sp 2v .SUBTITLE "Isn't that seriously inconvenient?" .BL .LI Usability is not as bad as it may seem at first. .LE .MULN .PSPIC Images/OldGoat.eps .GPE_SM "Old Goat Mountain (3109m), Spray Lakes Reservoir" .MULE .BL .LI If you get text in different encodings, it is very easy to install conversion tools from ports .br and simply convert the data once before using it. .LI Besides, Unicode and UTF-8 support all languages. .LI So even without relying on ports, the base system is still able to support all languages. .LE .GPE_TIME 60 .GPE_NEXT "How does that help write(1)?" .GPE_MULB 7i 0i .TITLE "Benefits for write(1)" .MULN .PSPIC Images/WhitemansPond.eps .GPE_SM "Looking across Whiteman's Pond" .GPE_SM "to the Goat Range (ca. 2700m)" .MULE .sp -8v .AL .LI ASCII printable characters always safe to print on OpenBSD .br (UTF-8 ASCII-compatible: both encode ASCII the same way) .br That allows partial write(1) functionality: .br Allow passing ASCII only no matter what the two locales are. .LI It allows filtering out ASCII control bytes (C0 characters). .br Important because the escape character is dangerous. .br Possible because no UTF-8 sequence contains such a byte. .LI If the sender's terminal is set to UTF-8 and non-ASCII characters are actually typed, they can safely be filtered out (just in case the receiver's terminal is set to US-ASCII, which we cannot know). That's possible because UTF-8 is stateless, that is, codepoints can safely be deleted from the stream and it still remains a valid stream of characters (which would not be true for arbitrary locales). .LI If invalid bytes not forming UTF-8 occur in the input stream, they can safely be filtered out, allowing to recover from encoding errors. That's possible because after an encoding error, UTF-8 allows to find the beginning of the next character by simply looking for the next byte not having the most significant bit set or having the two most significant bits set. Consequently, the sending terminal can never become unusable, which might well happen when allowing arbitrary encodings. .LE .GPE_TIME 90 .GPE_NEXT "Are there any downsides?" .TITLE "Prices to pay in write(1)" .SUBTITLE "OpenBSD write(1) now violates POSIX" Yet it does implement the maximal safe and useful and reasonable part of POSIX: .br All printable ASCII characters, space characters, and the BEL are sent as typed. .BL .LI Locales are ignored. .LI UTF-8 continuation bytes are silently ignored. .LI ASCII control characters and UTF-8 start bytes are replaced with question marks. .LI Consequently, if the sender uses UTF-8 or control characters, the recipient sees .br that something got lost and can ask the sender about the missing parts. .LE .SUBTITLE "The implementation is extremely simple and robust" It doesn't even need or , neither setlocale(3) nor getwchar(3) nor mbtowc(3) nor anything like that. It gets away with elementary single-byte character handling functions like fgets(3), isprint(3), and putchar(3), which makes code review and maintenance a lot easier. .MULB 6i 0.2i 3.8i .PSPIC -R Images/SpraySouth.eps .MULN .S -4 .sp 3v Spray Mountains (2700-2900m) .br seen from the Kananaskis Lakes .S P .MULE .GPE_TIME 80 .ig usr.bin/write : write.1 write.c 5 Feb 2016 12:00:40 -0700 (MST) usr.bin/wall : wall.c wall.1 8 May 2016 10:19:36 -0600 (MDT) .. .GPE_NEXT "What about ssh(1)?" .TITLE "Best practice for ssh(1)" .BL .LI It is obvious that, if the server runs OpenBSD, which only supports the C/POSIX and UTF-8 locales, connecting to it becomes safe if you follow the simple rule to always use a UTF-8 enabled terminal to run ssh(1). .LI Of course, that does not yet secure connections FROM OpenBSD systems: Connecting from OpenBSD to other operating systems is still dangerous because they might send text in arbitrary locales. .LI So, the best practice i recommend is to: .AL .LI On all your servers that you may ever want to SSH into, no matter on which operating system, make sure the \m[red]default system locale and all login locales\m[] are set to either C/POSIX or to UTF-8. .LI Only ever run ssh(1) from \m[red]UTF-8\m[] enabled terminals. .LE .LE .sp 2v .GPE_MULB 7i 3.5v While much of the stuff discussed here is subtle, .br this is one simple pair of rules .br i recommend that you remember. .MULN .PSPIC Images/ElpocaFlowers.eps .GPE_SM "Elpoca Mountain (3029m)" .GPE_SM "from the Smith-Dorrien Trail" .MULE .GPE_TIME 120 .GPE_NEXT "What about xterm(1)?" .GPE_MULB 7i 0i .TITLE "Benefits for xterm(1)" .SUBTITLE "Upstream xterm(1) runs in ASCII mode by default" .BL .LI Traditionally, that default was also used on OpenBSD. .LI Bad idea: Some common UTF-8 characters are interpreted as control codes. For example, a stray German \(oq\(ss\(cq will lock up your ASCII xterm(1). .LE .MULN .PSPIC Images/MountainSheep.eps .GPE_SM "Kananaskis Trail at Pocaterra Creek" .MULE .sp -1v .SUBTITLE "OpenBSD 6.0 runs xterm(1) in UTF-8 mode by default" If you use a C/POSIX locale, even if you don't intend to ever use UTF-8, .br that's OK because a UTF-8 terminal handles ASCII output just fine. .P In addition to that, the UTF-8 enabled terminal is obviously more resilient to UTF-8 accidentally sneaking in, in particular, but not only, for the case of running ssh(1) as explained above. Actually, even when fed garbage or unsupported encodings, a UTF-8 xterm(1) is more robust than an ASCII xterm(1) because the UTF-8 xterm(1) honours *fewer* terminal escape codes than the ASCII xterm(1). That may seem surprising at first because Unicode defines *more* control characters than ASCII does. But as explained on http://invisible-island.net/xterm/ctlseqs/ctlseqs.html xterm(1) never treats decoded multibyte characters as terminal control codes, so the ISO 6429 C1 control codes do not take effect in UTF-8 mode; but they do take effect in ASCII mode, even though they fall outside the scope of ASCII. .GPE_TIME 90 .ig app/xterm : XTerm.ad 8 Mar 2016 10:26:30 -0700 (MST) .. .GPE_NEXT "Any downsides?" .TITLE "Caveats for xterm(1)" Do not use this non-standard default setting on any other system except OpenBSD. .GPE_MULB 6i .sp .BL .LI It only works because OpenBSD deliberately does not support any locales except UTF-8 and C/POSIX/ASCII. .LI Terrible things will happen if you force the default to UTF-8 in this way on a system where people can opt into arbitrary locales that differ from UTF-8. .LI On other operating systems except OpenBSD, there is no way in hell to make the interaction of locales with terminal controls truly safe. .LE .MULN .PSPIC Images/FairholmeRange.eps .GPE_SM "Fairholme Range from Whiteman's Pond" .MULE .sp The main goal of having UTF-8 xterms by default on OpenBSD is improving robustness. .br But it also improves usability. If you usually run your shells inside xterm(1) in C/POSIX mode, there should be few visible changes for you. .br But if you ever stumble upon a directory containing UTF-8 filenames, you can simply say .VERBON 7 16 $ LC_CTYPE=en_US.UTF-8 ls .VERBOFF which would have given you garbage output in the past, and which just works in OpenBSD 6.0. .GPE_TIME 80 .GPE_NEXT "Another example?" .TITLE "Benefits for pod2man(1) manuals" .BL .LI Many perl manuals contain UTF-8. .LI So do several ports manuals using perlpod(1) format. .\" 36 on my system right now .LI A few ports manuals contain ISO-LATIN-1: latex2man(1), a2ping(1), ... .br OpenBSD man(1), which is the mandoc implementation, silently converts that to UTF-8. .LI So we enabled UTF-8 by default for pod2man(1) in OpenBSD, .br improving output for both UTF-8 and C/POSIX/ASCII users. .LI Problem unsolvable on any system trying to support arbitrary locales, .br because man(1) must not print UTF-8 for users using a different locale. .LE .PSPIC Images/YellowBelliedMarmot.eps .GPE_SM "Yellow Bellied Marmot near Lower Kananaskis Lake" .GPE_TIME 90 .ig In pod2man(1), enable UTF-8 output by default and provide a --no-utf8 command line option to disable it. The new default improves the formatting of Perl manuals using UTF-8 characters (for example perlunicook(1)) with man(1) and mandoc(1) no matter which locale the user has set. Issue discovered by and fix OK by afresh1@. Trying to push this change upstream would make no sense. It's the right thing to do only because we decided to not support any other locales except ASCII and UTF-8. A system trying to provide arbitrary locales simply cannot handle manuals containing UTF-8 characters at build time, so the change would produce wrong results. gnu/usr.bin/perl/cpan/podlators/scripts: pod2man.PL 19 Apr 2016 02:06:52 -0600 (MDT) .. .GPE_SECTION "IMPLEMENTATION TECHNIQUES" .GPE_NEXT "What about implementations?" .sp -0.5v .TITLE "Overview of implementation techniques" .sp -1v .MULB 11c 0c 10.6c \" total width is 21.6c .SUBTITLE "for small base system utilities" .VERBON 1 12 command deci parse ins sani eval iddo cvitgw spw inau cwsp rev lc-- c----- --- ---- ---- ksh .--- c----- --- ---- ---- tty(4) ---- c----- --- ---- -c-- write ---- c----- --- -?p? ---- ypldap ---- c----- --- -cc? c--- cut -fd l-ld -v---- --- ---- --s- cut -cn l-lc --i--- --- ---- c--- uniq -s l-lc --i--- --- ---- c--- uniq -f l-tf ---t-- s-- ---- ---- wc -m le-c ---t-- s-- ---- c--- colrm l-t- ---t-- -pw ---- -w-- fold lctb c--t-- -pw ---- -w-- column l-t- ---tG- spw ---- -wS- fmt l-t- ---t-- spw ?ppp -w-- ls l-t- ---t-w -pw ??pp -w-- rs le-- ---t-w -pw ??cc -w-- ps l-t- ---t-w -pw vrpp -w-- ssh l-t- ---t-w -pw vvcc -w-- ul l-g- ----g- -pw sspp cw-p man l--l p----w --w ??pp -w-p .VERBOFF .SUBTITLE "utility functions" .VERBON 1 10 ls int mbsprint(const char *mbs, int print) rs int mbsavis(char** outp, const char *mbs) ps int mbswprint(const char *, maxw, trail) ssh int vasnmprintf(c **, size_t, int *, fmt, va) man int preconv_encode(...) .VERBOFF .MULN .sp 0.5v .nr VerbinSave \n[Verbin] .nr Verbin 12c .VERBON 16 9 decision making: i - initialization l - setlocale(3) called . - setlocale(3) called but essentially unused d - decision c - MB_CUR_MAX inspected in isu8cont() e - MB_CUR_MAX inspected before calling multibyte functions l - implicit in mblen(3) t - implicit in mbtowc(3) g - implicit in fgetwc(3) o - options deciding whether multibyte functions are used at all d - option to specify delimiter (may be UTF-8) b - option to count bytes (alternative is to count characters) c - option to count characters (alternative is to count bytes) f - option to count fields l - option to call setlocale(LC_CTYPE, "") parsing: c - \m[red]direct inspection with isu8cont()\m[] p - dedicated UTF-8 parser v - validate multibyte character with mblen(3) i - iterate multibyte characters with mblen(3) t - \m[red]iterate multibyte characters with mbtowc(3)\m[] g - get wide characters with fgetwc(3) G - get command line argument with mbstowcs(3) w - wrapper to isolate UTF-8 handling from the main code inspection: s - check for whitespace with iswspace(3) / iswblank(3) p - check printability with wcwidth(3) w - \m[red]get display width with wcwidth(3)\m[] sanitation: classes: i - invalid byte n - non-printable character a - printable ASCII character u - printable Unicode character actions: s - skip ? - replace with question mark v - encode with vis(3) r - replace with Unicode replacement character c - copy character p - print character evaluation: c - count characters w - count display width for columnation and/or tabulation s - split strings with strstr(3) S - split strings with wcschr(3) p - print with putwchar(3) .VERBOFF .nr Verbin \n[VerbinSave] .MULE .GPE_TIME 210 .GPE_NEXT "Why this table?" .SUBTITLE "Why am i showing this table?" .S -6 .BL .LI Each line provides information about one program; some programs have very different modes, so some have two lines. .LI Each column represents one particular implementation technique. .LI My expectation, when showing this table, is not that anybody might understand the whole table during this talk. That is impossible, it encodes a huge amount of information in an extremely terse format. But there are several reasons why i do show the table anyway. .LI First reason: It shows that the number of utilities we have to deal with is surprisingly small. You might expect that almost everything might need multibyte character handling. But this table only lists about three handfuls of utilities. Admittedly, we didn't fix all utilities yet, but most are done by now. .LI Second reason: The table shows that the number of implementation techniques is surprisingly large. You might expect that there might basically be just one technique: Read a stream, assemble wide characters, process the characters, and write out the result. But it turns out that would be a bad approach almost everywhere, and different utilities have very different needs. .LI Third reason: The table shows that so far, we did not not find a single pair of utilities that could be handled in exactly the same way. No line in the table agrees with any other line. Well, with one exception: cut -c and uniq -s can be implemented using exactly identical techniques - but the main technique used in that case appears literally nowhere else, and both utilities need quite different techniques when called with other options. So, basically, everything is different and nothing can be done schematically. .LI Fourth reason, and i'll come back to that after explaining some of the techniques: Those techniques that people would probably have expected to be ubiquitous barely appear at all. For example, fgetwc(3) and fputwc(3) appear in one out of 16 cases, and fgetws(3), fputws(3), *wprintf(3), *wscanf(3), mbrtowc(3), wcrtomb(3), and wmem*(3) don't occur at all. .LI At this point, some of you will probably wonder whether i screwed up the title of my talk. "Why and how you ought to keep multibyte character support simple" - but now the guy is saying there is a large number of different techniques and everything differs from everything else? That doesn't sound simple at all! .LI The point is: While the number of techniques is indeed not all that small, every single technique is very elementary, so the code of every individual utility does remain very short and simple. Much simpler, in fact, than it would become if you tried to apply one and the same general-purpose coding scheme everywhere. .LE .S P .GPE_TIME 5 .GPE_NEXT "The simplest approach" .TITLE "Technique 1: pure isu8cont()" .SUBTITLE "rev(1) - reverse characters in each line" .VL 6m .LI requirements: no need for character properties, only need to know where chars start .br hence, no need for character decoding .br not even any need for character validation .LI solution: isu8cont() one-liner by Ted Unangst : .br \m[red]Does this byte continue a character?\m[] .nr VerbinSave \n[Verbin] .nr Verbin 6m .VERBON 23 14 int isu8cont(unsigned char c) { return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80; } .VERBOFF .nr Verbin \n[VerbinSave] setlocale(3) to set up MB_CUR_MAX .LI "algorithm:" read lines into a char[] buffer, skip newlines .br loop over characters by looping over bytes and skipping on isu8cont() .br then just copy the multibyte sequences without ever decoding them .LE .MULB 4.5i 0i 2.5i 0i 3i .sp This technique is very robust and never fails. .br Of course, .br it only works for UTF-8 only systems; .br no similar technique is possible .br for arbitrary encodings. .MULN .PSPIC Images/GoatCreek.eps .MULN .S -4 .sp 5v Smith-Dorrien Trail (1650m) .br with East End of Rundle (2550m) .S P .MULE .GPE_TIME 80 .ig martijn@ rev.c 10 Apr 2016 11:06:52 -0600 (MDT) OK schwarze@ commit msg: Enable UTF-8 support in rev. Some minor cleanups while here. .. .GPE_NEXT "What's the alternative?" .TITLE "Technique 1: pure isu8cont()" .SUBTITLE "The alternative: rev(1) on FreeBSD and NetBSD" .GPE_MULB 6i .BL .LI setlocale(3) used in the same way .br loop over lines with fgetwln(3) .br store wide character strings .br then print wide characters in reverse order .LI very fragile, \m[red]dies on the first encoding error\m[] .LI even worse: all known implementations of fgetwln(3) were buggy and return success on encoding errors, effectively converting all encoding errors into newline characters \(-> silently gave wrong results .LE .MULN .PSPIC Images/MountSparrowhawk.eps .GPE_SM "Dust on the Smith-Dorrien Trail" .GPE_SM "with Mount Sparrowhawk (3121m)" .MULE .SUBTITLE "Summary regarding isu8cont()" .BL .LI pure isu8cont() is the right move when you need \m[red]no validation and no character properties\m[] .LI another example of pure isu8cont() is write(1) mentioned above .LE .GPE_TIME 70 .ig martijn@ rev.c 10 Apr 2016 11:06:52 -0600 (MDT) OK schwarze@ commit msg: Enable UTF-8 support in rev. Some minor cleanups while here. .. .GPE_NEXT "Another application?" .TITLE "Technique 1: pure isu8cont(1) for ksh(1)" .SUBTITLE "It can also be used as a quick and dirty stopgap\ for complex programs" .SUBTITLE "for example OpenBSD ksh(1) emacs input mode" .GPE_MULB 5i .BL .LI Make sure that moving left and right .br can only move by whole characters, .br not into the middle of a character. .LI Make sure that deleting characters .br can only delete characters whole, .br not individual bytes out of characters. .LI Improve all functions involving words .br by allowing non-ASCII characters .br to be part of words. .LI Allow insertion of non-ASCII characters .br without screwing up the display, .br by backing up to the start byte .br after inserting a continuation byte, .br and starting to re-print there. .LE .MULN .PSPIC Images/ElbowFalls.eps .GPE_SM "Elbow Falls" .MULE .GPE_TIME 60 .ig Based on parts of a patch by Frederic Nowak , tweaked by me. OK tedu@ semarie@ mpi@ bin/ksh : emacs.c 10 Dec 2015 03:00:14 -0700 (MST) .. .ig 3. Fix forward movement which i didn't get quite right in my previous commit: Always advance to a start byte, never to a final continuation byte, or the next insertion would split the character in the middle. OK mpi@ bin/ksh : emacs.c 8 Jan 2016 06:17:57 -0700 (MST) .. .GPE_NEXT "On a yet lower level?" .TITLE "Technique 1: pure isu8cont(1) in the kernel" .SUBTITLE "A quick and dirty stopgap for the tty driver" .GPE_MULB 6.5i .BL .LI For OXTABS and sysctl(3) KERN_TTY_INFO, .br sys/kern/tty.c wants to calculate display widths. .LI We don't want wcwidth(1) tables in the kernel, .br so double- and zero-width characters will remain wrong. .LI But let's at least handle the common case .br of single-width multibyte characters correctly: .LE .sp .VERBON 1 14 int ttyoutput(int c, struct tty *tp) { int col = 0; /* lots of code deleted */ switch (CCLASS(c)) { case ORDINARY: if (!isu8cont(c)) ++col; break; } } .VERBOFF .MULN .PSPIC Images/BowTowerUp.eps .GPE_SM "Calgary, Bow Tower" .MULE .GPE_TIME 60 .GPE_NEXT "Another technique?" .TITLE "Technique 2: pure mblen(3)" .SUBTITLE "cut -d \(em select delimited fields out of lines" .GPE_MULB 5.5i .VL 6m .LI "requirements:" no need for character properties, .br \m[red]but need validation\m[] .LI "solution:" .LE .VERBON 7 14 case 'd': dlen = mblen(optarg, MB_CUR_MAX); if (dlen == -1) usage(); memcpy(dchar, optarg, dlen); dchar[dlen] = '\0'; .VERBOFF .MULN .PSPIC Images/TsuuTina.eps .GPE_SM "Tsuu T'ina land near Calgary" .MULE .SUBTITLE "Alternatives" .VL 6m .LI FreeBSD: uses mbrtowc(3), which is also possible .LI NetBSD: no multibyte support .LE .GPE_TIME 90 .ig schwarze@ cut.c 1 Dec 2015 17:56:46 -0700 (MST) OK tedu@ czarkoff@ zhuk@ commit msg: UTF-8 support: Implement -c and -n and let -d accept a multibyte delimiter character. While here, simplify the code by switching from fgetln(3) to getline(3) and from hand-crafted string parsing to strstr(3) and strchr(3). .. .GPE_NEXT "A less simple example" .TITLE "Technique 3: iteration with pure mblen(3)" .SUBTITLE "cut(1) -c \(em select by character count" Again no need for character properties, but need validation: .VERBON 7 14 while(*cp != '\0') { len = mblen(cp, MB_CUR_MAX); if (len == -1) /* Handle encoding error, at least set len. */; /* Do something with the character. */ cp += len; } .VERBOFF Decision in OpenBSD: treat each invalid byte as one character and keep going. .GPE_MULB 4i .sp 2v .SUBTITLE "Alternative" FreeBSD and NetBSD: .P Use getwc(3) .br and error out of the file .br on the first encoding error. .MULN .PSPIC Images/LinehamRidge.eps .GPE_SM "Lineham Ridge (2700m) from the Highwood Valley (1880m)" .MULE .GPE_TIME 90 .ig OpenBSD: same for uniq(1) -s \(em ignore a certain number of initial characters .. .GPE_NEXT "Alternatives?" .TITLE "cut(1) in FreeBSD" .BL .LI FreeBSD: uses fgetln(3) and mbrlen(3)/mbrtowc(3). .LI That is slightly inconsistent. Even though POSIX 2016-TC2 now requires that L'\en' be encoded as 0x000a in wchar_t, that doesn't imply that an arbitrary locale must encode it as the single byte 0x0a in a multibyte char * string, or as any single byte at all. .LI In any case, an encoding error causes the rest of the file to be lost. .LI mbstate is never reset, encoding errors in earlier files may compromise decoding of later files. .LE .SUBTITLE "Looks like a pack of well-fed dragons..." .GPE_MULB 7i .BL .LI This teaches that it's very easy to write code that looks perfectly general on first sight, but turns out to actually be full of subtle issues on closer inspection. .LI As OpenBSD prefers correctness, security, and usability over featurism, we believe that it's advantageous to sacrifice full generality up front and allowing a simpler, more powerful, and less fragile implementation for UTF-8 and ASCII only. .LE .MULN .PSPIC Images/AlbertaHighway532.eps .GPE_SM "Johnson Trail (Hwy. 532)" .MULE .GPE_TIME 140 .GPE_NEXT "What about character properties?" .TITLE "Technique 4: iteration with mbtowc(3)" .sp -1.5v .GPE_MULB 7i 0i .SUBTITLE "This one is most often needed in practice." .VERBON .COLOR red char *mbs; /* Multibyte string (input). */ int len; /* Encoded length in bytes. */ .sp 0.5v wchar_t wc; /* Wide character (decoded). */ .COLOR P .COLOR blue int width; /* Display width in terminal columns. */ .sp 0.5v .COLOR P .COLOR red for (mbs = INPUT; *mbs != '\e0'; mbs += len) { .COLOR P HANDLE_SPECIFIC_BYTES(*mbs); /* Optional. */ .COLOR red len = mbtowc(&wc, mbs, MB_CUR_MAX); if (len == -1) { /* Encoding error, reset state: */ mbtowc(NULL, NULL, MB_CUR_MAX); .COLOR P .VERBOFF .MULN .PSPIC Images/SprayNorth.eps .GPE_SM "Many summits: Spray Mountains" .GPE_SM "from the Smith-Dorrien Trail" .MULE .VERBON .sp -1v .COLOR red /* After handling an invalid byte, retry with the next one. */ len = 1; .COLOR P HANDLE_INVALID_BYTE(*mbs); /* Optional. */ wc = L'?'; /* e.g. fmt, ls, rs, uniq */ wc = L' '; /* e.g. wc */ width = 1; /* e.g. column, colrm, fmt, ls, rs */ width = -1; /* e.g. ssh */ .COLOR blue } else { width = wcwidth(wc); if (width == -1) { .COLOR P HANDLE_NONPRINTABLE_CHARACTER(wc); /* Optional. */ width = 1; /* Usually. */ .COLOR blue } .COLOR P .COLOR red } .COLOR P HANDLE_CHARACTER(wc, width); .COLOR red } .COLOR P .VERBOFF .GPE_TIME 120 .ig - ls Support UTF-8: use wcwidth(3) for column adjustment and replace non-printable Unicode codepoints and invalid bytes with ASCII question marks. No change for the SMALL version. Using ideas developed by tedu@, phessler@, bentley@ and feedback from many OK yasuoka@ czarkoff@ sthen@. bin/ls : Makefile extern.h ls.1 ls.c print.c util.c bin/ls : utf8.c 1 Dec 2015 11:36:13 -0700 (MST) Fix a regression (and POSIX violation) introduced with UTF-8 support: When neither running on a terminal nor with -q, names must be passed through as they are, nothing must be replaced with question marks. Effectively, -q was always in effect. SMALL was not affected. bin/ls : utf8.c 18 Jan 2016 12:06:37 -0700 (MST) - rs UTF-8 support: In a UTF-8 locale, properly align columns in the presence of zero-width and double-width characters and replace non-printable codepoints and invalid bytes with ASCII question marks. No change in the C/POSIX locale. As a side effect, get rid of all pointer to pointer variables and simplify some of the code. Partially based on ideas from tedu@. Feedback and OK czarkoff@, OK tedu@. usr.bin/rs : Makefile rs.c usr.bin/rs : utf8.c 3 Dec 2015 05:23:15 -0700 (MST) - wc UTF-8 support: implement -m for character counting and use iswspace(3) for word counting. Requires using getline(3) rather than read(2) to make sure that characters aren't chopped to pieces. Using feedback from millert@ on an earlier version. Feedback and OK tedu@. usr.bin/wc : wc.1 wc.c 7 Dec 2015 18:00:45 -0700 (MST) - fmt UTF-8 support; does not yet handle the -c option. No longer expand tabs up front in get_line(), their width depends on the width of characters earlier on the line. Always NUL-terminate the input buffer for easier and safer handling. Get rid of the hand-rolled output buffer, just let stdio do its work. OK tedu@ usr.bin/fmt : fmt.1 fmt.c 15 Dec 2015 09:26:17 -0700 (MST) UTF-8 support for fmt -c. This implies two small changes in behaviour: 1. Let fmt -c replace invalid bytes with ASCII question marks just like when called without -c. 2. On lines to be centered, replace each tab with a single blank, simply because there is no useful way to define the meaning of a tab on such a line. Having the width of a tab depend on what is to the right of it would be completely crazy (and complicate the code a lot), and otherwise, tabs on adjacent lines of different length wouldn't align anyway. OK millert@ usr.bin/fmt : fmt.c 7 Jan 2016 11:02:43 -0700 (MST) - uniq UTF-8 support: Let -f recognize non-ASCII blank characters and let -s count characters rather than bytes. OK zhuk@ bentley@ usr.bin/uniq : uniq.1 uniq.c 19 Dec 2015 03:21:01 -0700 (MST) - ps UTF-8 support: In a UTF-8 locale, columnate correctly and replace valid, but non- printable characters with the Unicode replacement character U+FFFD. No change in the C/POSIX locale, and no change for invalid bytes. Grand total, the code becomes shorter by almost 30 lines. Feedback from czarkoff@, OK millert@. bin/ps : Makefile extern.h print.c ps.c bin/ps : utf8.c 10 Jan 2016 07:04:16 -0700 (MST) - colrm UTF-8 support: Cut by display columns rather than by character positions because the latter would be useless in the presence of combining zero-width characters. Similarly, let tab advance to display columns rather than to character positions. For compatibility with nroff and man(1) output, let backspace back up one character rather than on display column. But for compatibility with POSIX fold(1), *if* two backspaces follow a double-width character, ignore the second one. Fix some bugs while here: Delete backspaces that immediately follow deleted characters. Expand tabs intersecting deletions, such that part of the blanks can be removed. Expand tabs following deletions, or they would no longer align with adjacent lines without tabs. OK jmc@ on a previous version of the manual. No opposition when shown on tech@. usr.bin/colrm : colrm.1 colrm.c 18 Jan 2016 13:31:36 -0700 (MST) - fold UTF-8 support. Using feedback about bugs in earlier versions from Matthew Martin and from tsg@ who tested it with afl(1). OK czarkoff@ tsg@ usr.bin/fold : fold.1 fold.c 23 May 2016 04:31:42 -0600 (MDT) .. .GPE_NEXT "Modularize?" .TITLE "Technique 5: utf8.c utility files" .BL .LI Advantage: isolate all multibyte = UTF-8 handling in one file .LI Avoid encumbering the main code .LI Not always possible, sometimes not even desirable, in particular if the main code doesn't do much except character handling in the first place, like in cut(1) or fmt(1) .LI Typical tasks: \m[red]parsing, validation, sanitation, output\m[] .LI Typically uses technique 4, iteration with mbtowc(3) .LI Sanition concerns invalid byte sequences and non-printable characters .LI Sanitation options are passthrough, skip, replace with question marks or UTF-8 replacement characters, or vis(3) .LE .GPE_MULB 5i .BL .LI Examples: ls(1), ps(1), rs(1), OpenSSH .LI All have subtly different requirements, in particular regarding sanitation, width measurement, width limitation, and output disposition .LI Hence, it was not yet possible to design a set of standard functions, but we still hope more experience might allow to do so in the future. .LE .MULN .PSPIC Images/ThreeSistersDam.eps .GPE_SM "Looking from the Three Sisters Dam (1710m)" .GPE_SM "to the northern Goat Range (2730m)" .MULE .GPE_TIME 90 .GPE_NEXT "POSIX standard functions?" .TITLE "Technique 6: iteration with fgetwc(3)" .sp -1v .GPE_MULB 7i .BL .LI Only program so far too complex for these techniques: ul(1) .LI I implemented and tested a version using techniques 4 and 5, .br iteration with mbtowc(3) and wcwidth(3) in utf8.c, .br but it wasn't simple at all, so it was never committed. .LI The reason why it wasn't simple: It does all kinds .br of string manipulation, almost like an editor: .br splitting, joining; deleting, inserting, transforming characters .LI What i did commit was a version doing the full .br char * to wchar_t * to char * double conversion. .LI In part inspired by the FreeBSD version which is in turn .br based on Bruno Haible's work in util-linux, .br but not sharing any UTF-8 code with either version. .LE .MULN .PSPIC Images/StormPass.eps .GPE_SM "Storm Mountain (3095m)" .GPE_SM "from Highwood Pass (2206m)" .MULE .BL .LE .sp -0.5v .SUBTITLE "Examples of problems with the FreeBSD version of ul(1)" .BL .LI Errors out on the first encoding error \(em can't be helped when surporting arbitrary encodings. .LI Backspace backs up one column position, but should backup one character. .LI Always treats _\eb_ as underlined, never as bold. .LI Fails to move the rest of the buffer right when _ later gets overlaid with a double-width char. .LE .GPE_TIME 90 .ig As a bonus reimplement overstrike() and iattr() almost from scratch, getting rid of useless malloc(3)ed local buffers. Add lots of missing information to the manual. No opposition when shown on tech@. usr.bin/ul : ul.1 ul.c 18 Jan 2016 10:34:26 -0700 (MST) .. .GPE_NEXT "What is not recommended?" .TITLE "Techniques to avoid" .BL .LI *r*() functions like mb\m[red]\s+8R\s-8\m[]towc(3) .br Don't use them unless you really need multithreading. .br They are considerably harder to use correctly than mbtowc(3). .LI fgetws(3) or fgetwln(3) .br Don't use them unless you must support arbitrary locales. .br getline(3) allows much better error handling and isn't harder to use. .br .S -4 (And even for arbitrary locales, read(2) + mbtowc(3) is an option.) .S P .LI *towcs() functions like mbstowcs(3) .br Iterating with mbtowc(3) allows much better error handling .br and needs only marginally more code (typically a dozen lines of code). .LE .GPE_MULB 6i .sp 0.5v Some people recommend .br to always use the most complicated functions .br because they work in all circumstances. .SUBTITLE "I don't." Where simpler functions suffice, they are easier to use and cause less bugs. And the simpler functions themselves may be less buggy, too. .MULN .PSPIC Images/CrowchildTrail.eps .GPE_SM "Biking the Crowchild Trail, Calgary" .MULE .GPE_TIME 120 .GPE_SECTION "BUGS' PARADISE" .GPE_NEXT "What about library quality?" .TITLE "Library quality" .BL .LI In general, the BSD C libraries are of good quality: .br solid code that has been scoured for bugs for decades. .LI Multibyte and wide character code is no longer exactly young, .br but younger than much other code, and much more buggy. .LI In the following, i'm showing various examples from OpenBSD. .br Other BSD implementations sometimes differ in detail, .br but my impression is that quality is similar in all three systems. .LI All bugs found by chance, no complete audit yet! .LE .PSPIC Images/KingCreek.eps .GPE_SM "Going from the Kananaskis Lakes toward Highwoood Pass" .GPE_TIME 50 .GPE_NEXT "What about the most basic functions?" .TITLE "Examples of bugs in conversion functions" .BL .LI mbtowc(3) neglected to set errno(2) to EILSEQ .br when given an incomplete character (fixed Feb 2016) .LI mbrtowc(3) accepted some invalid UTF-8 sequences .br and silently produced invalid code points above U+10FFFF (fixed Sep 2015) .LI wcrtomb(3) accepted code points above U+10FFFF .br and silently produced invalid multibyte sequences (fixed Sep 2015) .LI wcrtomb(3) accepted UTF-16 surrogates in UTF-8 mode .br and silently produced invalid multibyte sequences (fixed Oct 2015) .LE .PSPIC Images/GlasgowElbow.eps .GPE_SM "Little Elbow River (ca. 1600m) washed out by the 2013 flood,\ with Mount Glasgow (2935m)" .GPE_TIME 60 .ig - mbrtowc(3) - accepted > U+10FFFF src/lib/libc/citrus/citrus_utf8.c#rev1.9 2015/09/05 15:22:04; author: semarie OK stsp - If an incomplete character is passed to mbtowc(3), set errno to EILSEQ. lib/libc/locale: mbtowc.c 27 Feb 2016 07:02:13 -0700 (MST) - wcrtomb(3) - accepted > U+10FFFF src/lib/libc/citrus/citrus_utf8.c#rev1.11 2015/09/26 14:22:40; author: semarie OK bentley stsp - accepted surrogates src/lib/libc/citrus/citrus_utf8.c#rev1.14 2015/10/13 02:17:46; author: bentley OK stsp .. .GPE_NEXT "What about standard I/O?" .TITLE "Bugs in libraries: examples in standard I/O" .BL .LI fgetwc(3) didn't set the error indicator for encoding errors (fixed Dec 2015) .LI fputwc(3) didn't set the error indicator for invalid characters (fixed Jan 2016) .br The Austin Group thinks that even the C standard itself is buggy here. .LI fgetws(3) discarded any characters read and reported bogus EOF when errno happened to be EILSEQ upon entry and the file ended without a terminating L'\en' character (fixed Jan 2016) .LI fgetwln(3) ignored most encoding errors and .br sometimes returned partial lines truncated at random places (fixed Aug 2016) .LI printf(3) %ls destroyed all file flags on encoding errors, .br making the file permanently unreadable and unwriteable (fixed Jan 2016) .LI printf(3) silently treated encoding errors in the format string .br as the end of the format string (fixed Jan 2016) .LI printf(3) accessed a NULL pointer when out of memory or on encoding errors (fixed Jan 2016) .LE .MULB 5i 0.2i 4.8i .PSPIC -R Images/ElpocaRock16.eps .MULN .S -4 .sp 4v The scree of the Rock Glacier (ca. 2100m), Pocaterra Valley, .br and the Elpoca Mountain (3029m) .S P .MULE .GPE_TIME 60 .ig - fgetwc(3) - set the error indicator when an encoding error occurs lib/libc/stdio : fgetwc.c 24 Dec 2015 12:55:39 -0700 (MST) - fputwc(3) - When encoding fails in fputwc(3), set the error indicator as required by POSIX and as FreeBSD, SunOS 10/11, and glibc also do it. Note that an enquiry to the Austin Group led to the conclusion that this change probably violates the C standard: C and POSIX unintentionally conflict. But the POSIX behaviour makes more sense (easier to write correct error handling code for it, and a lower risk that programs miss errors) and is much more widespread, and the Austin Group intends to approach the C committee in order to adjust the C standard. See: http://austingroupbugs.net/view.php?id=1022 While here, do not set errno a second time, wcrtomb(3) already did that, and it is required to do it by the standard. OK millert@ and tedu@, and jca@ no longer objects lib/libc/stdio : fputwc.c 26 Jan 2016 06:57:02 -0700 (MST) - fgetws(3) - When errno happens to be EILSEQ upon entry to fgetws(3), and when the file ends without a terminating L'\n' character, fgetws(3) discarded any characters read and reported bogus EOF. Never inspect errno(2) unless right after an error occurred! OK millert@ 4 Jan 2016 09:14:19 -0700 (MST) - fgetwln(3) - ignores most encoding errors - printf(3) - Fix lots of bugs. 1. When fprintf(fp, "...%ls...", ...) encounters an encoding error, do not destroy all the fp->_flags, which made the file permanently unreadable and unwriteable. 2. Do not change fp->_flags at all in case of encoding errors. Neither the manual nor POSIX ask for it, no other conversions set the error indicator, and it isn't needed because the return value reports failure and must be checked anyway. 3. Detect failure in mbrtowc(3), do not silently treat invalid bytes in the format string as the end of the format string. 4. Detect failure of __find_arguments(), no matter whether due to out of memory conditions or encoding errors, and gracefully fail rather than accessing an invalid pointer. 5. Remove the pointless and slightly dangerous errno = EILSEQ overrides after functions that already do that and are required by the standard to do so. OK jca@ on items 1, 2, and 5. OK millert@ on the complete diff. "Completely brutal mix of bugs." deraadt@ lib/libc/stdio : vfprintf.c 4 Jan 2016 08:47:48 -0700 (MST) .. .GPE_NEXT "More examples?" .TITLE "Various other errors in the C library" .BL .LI The character property tables contained no data whatsoever .br for characters in the range U+FF00 to U+10FFFF. (fixed Oct 2015) .LI Due to a bug in mklocale(1), character type and width data was wrong .br for many exotic characters designating numbers. (fixed May 2016) .LI The C library code parsing character property tables contained .br out of boundary memory access for corrupt input files. (fixed Oct 2015) .LE .PSPIC Images/ElkSouth.eps .GPE_SM "Repair of the Highwood River (1870m)\ after the 2013 flood below the Elk Range (2750m)" .GPE_TIME 45 .ig - properties - update all of en_US.UTF-8.src share/locale/ctype: en_US.UTF-8.src 31 Oct 2015 14:56:19 -0600 (MDT) - locale description file handling - Validate input files to prevent out of boundary accesses. lib/libc/locale: rune.c 6 Dec 2015 04:54:59 -0700 (MST) - mklocale(1) - Delete encoding code for the unused TODIGIT information. I'm not aware of plans to add any TODIGIT support, and when shown on tech@, people were more or less indifferent and showed confusion about what this code even did. But the encoding code was buggy, in particular lacking validity checks, and hence clobbered other important data, in particular character type and character width data, with consequences that are hard to judge. usr.bin/mklocale: mklocale.1 yacc.y 8 May 2016 09:25:44 -0600 (MDT) .. .GPE_NEXT "Another library?" .TITLE "Examples of bugs in libedit" .sp -0.5v .BL .LI el_wgetc(3), el_wgets(3) etc. sometimes discarded valid bytes after reading invalid bytes .LI el_getc(3) silently converted non-ASCII Unicode characters into bogus bytes .LI el_getc(3) didn't set errno(2) for out-of-range errors .LI el_getc(3) didn't set the return argument to the NUL byte on read errors .LI Several functions reading characters broken on systems where wchar_t doesn't use UCS-4. .LE .P All these bugs were found during a partial audit and fixed in March 2016. .PSPIC Images/StormSouth.eps .GPE_SM "Crumbling rock on the south face of Storm Mountain (3095m)" .GPE_TIME 50 .ig - Fix the CHARSET_IS_UTF8 case in read_char(). 1. After reading an invalid byte sequence, do not throw away additional valid bytes; fix by me using mbrtowc(3), obsoleting utf8_islead(). lib/libedit : chartype.h read.c 20 Mar 2016 11:19:48 -0600 (MDT) - Fix read_char() for the non-UTF-8 case, in particular for systems supporting other multibyte locales or having an internal representation of wchar_t that doesn't match UCS-4. No functional change on OpenBSD, but it makes the code less confusing. OK czarkoff@. lib/libedit : read.c Sun, 20 Mar 2016 12:20:10 -0600 (MDT) - Fix the public interface function el_getc(3). On OpenBSD, the effects are to set the return argument to the NUL byte in case of a read failure (for robustness) and to properly set errno when the character is out of range and cannot be stored in a byte. Once we enable UTF-8, this will be needed to avoid returning bogus bytes for valid Unicode characters. On systems where the internal representation of wchar_t doesn't match UCS-4, breakage was potentially even worse. OK czarkoff@. lib/libedit : chartype.h eln.c 20 Mar 2016 13:14:30 -0600 (MDT) .. .GPE_NEXT "Examples in stand-alone programs?" .TITLE "Examples of bugs in various programs" .BL .LI mandoc(1) violated ISO C99 by mixing putchar(3) and putwchar(3) on the same stream, .br resulting in corrupt output on glibc (fixed July and September 2016) .LI mandoc(1) accepted UTF-16 surrogates in \e[uXXXX] escapes .br and silently produced invalid UTF-8 output (fixed Oct 2015) .LI mandoc(1) failed to apply bold and italic markup to non-ASCII characters (fixed Oct 2015) .LI tmux(1) contained wrong display widths for various characters .br in its internal width tables (fixed Nov 2015) .LI ypldap(8) contained buggy hand-rolled UTF-8 validation code that failed to actually validate .br but instead caused buffer overruns and loss of input data for invalid input (fixed Apr 2016) .LE .MULB 5.8i 0.2i 4i .PSPIC -R Images/SmutsCreek.eps .MULN .S -4 .sp 8v Even the tiny Smuts Creek (ca. 1900m) .br was washed out and needed repair. .P Background: Spray Mountains (up to 3400m) .S P .MULE .GPE_TIME 60 .ig - man(1) - reject surrogates in \e[uXXXX] escapes usr.bin/mandoc : mandoc.c 13 Oct 2015 17:30:42 -0600 (MDT) - apply bold and italic to all non-ASCII Unicode codepoints usr.bin/mandoc : term.c 23 Oct 2015 08:49:13 -0600 (MDT) - ISO C99 7.19.2.5 doesn't like mixing putchar(3) and putwchar(3) on the same stream, and actually, it fails spectacularly on glibc. Portability issue pointed out by Svyatoslav Mishyn after testing on Void Linux. mdocml: main.c main.h term_ascii.c 8 Jul 2016 17:29:35 -0500 (EST) - tmux(1) - update the internal wcwidth(3) table usr.bin/tmux : utf8.c 5 Nov 2015 09:44:25 -0700 (MST) - ypldap(8) - Simplify overengineered and buggy code that looked like as if it did some kind of UTF-8 validation, but actually didn't, but instead, for malformed UTF-8 input, caused buffer overruns in some cases and caused skipping of valid ASCII characters in other cases. Problem originally discovered and fix OK by stsp@. eric@ agrees with the direction. usr.sbin/ypldap: aldap.c 27 Apr 2016 04:53:27 -0600 (MDT) .. .GPE_NEXT "Something really important?" .TITLE "The situation in OpenSSH" .nr LspSave \n[Lsp] .nr Lsp \n[Lsp]/2 .sp -1v .BL .LI Input and output streams are treated as narrow throughout. .LI In most places, incoming text data is treated as opaque byte strings, .br simply passing it through, not even trying to validate or decode. .LI Where text data is interpreted, the code does not restrict itself to UTF-8, .br but attempts to support arbitrary encodings, though not always in full generality. .LI Informational and diagnostic messages are written in ASCII throughout, .br which is compatible with many, but not with all encodings. .LI In scp(1) and sftp(1), \m[red]untrusted text data sent from the server\m[] - for example file and directory names - could silently screw up terminal settings on the client host in addition to wrong data being displayed. Part of that was fixed in May 2016 using validation and sanitation techniques. Some fixes could not yet be committed because part of the output is produced in signal handlers. .LI For exotic encodings, several unknown bugs likely exist. .LE .GPE_MULB 5i .BL .LI For maximum security, make sur .br both endpoints run OpenBSD .br (to avoid exposure to arbitrary locales) .br and set LC_CTYPE to UTF-8 on both sides. .LI sftp(1) libedit usage needs review with .br respect to multibyte character handling. .LI Overall, \m[red]auditing has barely begun\m[]. .LE .MULN .PSPIC Images/SheepValleyTrails.eps .MULE .nr Lsp \n[LspSave] .GPE_TIME 120 .ig - usability and security issues with non-UTF-8 locales in base - ssh, scp, sftp To prevent screwing up terminal settings when printing to the terminal, for ASCII and UTF-8, escape bytes not forming characters and bytes forming non-printable characters with vis(3) VIS_OCTAL. For other character sets, abort printing of the current string in these cases. In particular, * let scp(1) respect the local user's LC_CTYPE locale(1); * sanitize data received from the remote host; * sanitize filenames, usernames, and similar data even locally; * take character display widths into account for the progressmeter. This is believed to be sufficient to keep the local terminal safe on OpenBSD, but bad things can still happen on other systems with state-dependent locales because many places in the code print unencoded ASCII characters into the output stream. Using feedback from djm@ and martijn@, various aspects discussed with many others. deraadt@ says it should go in now, i probably already hesitated too long usr.bin/ssh : progressmeter.c scp.c sftp-client.c sftp.c usr.bin/ssh/lib: Makefile usr.bin/ssh : utf8.c utf8.h 25 May 2016 17:48:45 -0600 (MDT) Fix two rare edge cases: 1. If vasprintf() returns < 0, do not access a NULL pointer in snmprintf() , and do not free() the pointer returned from vasprintf() because on some systems other than OpenBSD, it might be a bogus pointer. 2. If vasprintf() returns == 0, return 0 and "" rather than -1 and NULL. Besides, free(dst) is pointless after failure (not a bug). One half OK martijn@, the other half OK deraadt@; committing quickly before people get hurt. usr.bin/ssh : utf8.c 30 May 2016 06:05:56 -0600 (MDT) Even when only writing an unescaped character, the dst buffer may need to grow, or it would be overrun; issue found by tb@ with malloc.conf(5) 'C'. While here, reserve an additional byte for the terminating NUL up front such that we don't have to realloc() later just for that. OK tb@ usr.bin/ssh : utf8.c 30 May 2016 06:57:21 -0600 (MDT) Backout rev. 1.43 for now. The function update_progress_meter() calls refresh_progress_meter() which calls snmprintf() which calls malloc(); but update_progress_meter() acts as the SIGALRM signal handler. "malloc(): error: recursive call" reported by sobrado@. usr.bin/ssh : progressmeter.c 30 May 2016 12:34:41 -0600 (MDT) support UTF-8 characters in ssh(1) banners using schwarze@'s safe fmprintf printer; bz#2058 feedback schwarze@ ok dtucker@ usr.bin/ssh : ssh.c sshconnect2.c 16 Jul 2016 22:20:16 -0600 (MDT) .. .GPE_SECTION CONCLUSION .GPE_NEXT "Conclusions" .TITLE "Conclusions" .BL .LI Multibyte character handling code in full generality is huge, complicated, buggy, must error out on the slightest problem, and using arbitrary encodings is never fully secure. .LI Consequently, providing full multibyte support in a general-purpose POSIX C library is bad for usability, correctness, and security. .LI Instead, better leave full generality where it belongs: In dedicated conversion and text processing software. .LE .GPE_MULB 5i .BL .LI \m[red]Supporting UTF-8 only allows good usability, simplicity, and hence a better chance for correctness, reliability, and security.\m[] .LI No silver bullet found yet to solve all implementation tasks in one unified way, but every practical task encoutered so far allowed a simple solution. .LI A full toolbox may require up to ten different simple implementation techniques, each one adapted to a specific set of requirements. .LI Look at the source code of the OpenBSD utilities mentioned here for guidance how to solve similar tasks. .LE .MULN .PSPIC Images/MountRundle.eps .GPE_SM "East End of Rundle (2550m) seen from Canmore" .MULE .GPE_TIME 90 .GPE_NEXT "Future directions" .TITLE "Work to do in OpenBSD" .BL .LI Small utilities almost done, but a few remain: lam(1), pr(1), talk(1), tr(1), ... .LI Continue audit: libc, libedit .LI POSIX regular expression library, fnmatch(3), glob(3) .LI Work was barely started: ksh(1), OpenSSH .LI Work was not yet started: vi(1), mg(1) .LI Unknown status, maybe not much to do: libcurses, less(1), ... .LE .MULB 5.8i 0.2i 4i .PSPIC -R Images/JohnsonTrail.eps .MULN .S -4 .sp 10v Outlook from The Hump (2020m) .br along Johnson Trail (Hwy. 532) .br across the foothills towards the prairies .S P .MULE .GPE_TIME 60 .GPE_NEXT "Who contributed?" .GPE_MULB 7i 0.05i .TITLE "Thanks!" .BL .LI The OpenBSD foundation provided financial support for part of my time. No way to make so much progress without that \(em but without the contributions by lots and lots of other developers, nothing would have been achieved at all: .LI \fBMarc Espie\fP: initial import of partial multibyte handling code .LE .MULN .PSPIC Images/PrincesIsland.eps .GPE_SM "Bridge to Prince's Island, Calgary" .MULE .sp -1v .nr LspSave \n[Lsp] .nr Lsp \n[Lsp]*6/10 .BL .LI \m[red]Stefan Sperling\m[]: import of partial Citrus code; contributed to many of the ideas explained here; joint work, patch reviews, bug reports, many useful discussions .\" citrus [mult], ypldap(8), man(1) .LI \m[red]Ted Unangst\m[]: developed many of the ideas explained here; many patch reviews \" stdio, cut(1), fmt(1), ksh(1), rs(1), wc(1) .LI \fBAnthony Bentley\fP: contributed to many of the ideas explained here; joint work, bugfixes, patch reviews, bug reports, many useful discussions .\" citrus [mult], man(1) [2], uniq(1) .LI \fBAndrew Fresh\fP: gen_ctype_utf8.pl author and maintainer; .br bug reports and feedback on some patches \" pod2man(1), mklocale(1) .LI Martijn van Duren: fix rev(1), wall(1), write(1); many code reviews for libedit; some initial ideas and lots of feedback for scp(1) and sftp(1) .\" and patch reviews for stdio and xterm(1) .LI S\('ebastien Marie: joint work, some bugfixes, patch reviews, useful discussions .\" citrus [mult], ksh(1), write(1), man(1) .LI Vadim Zhukov: review and meticulously refine the TODO list; .br feedback concerning some of the ideas explained here; some patch reviews \" cut(1), uniq(1) .LI Todd Miller: large numbers of patch reviews in libc and utilities .\" libc/stdio [6], libc/locale, libedit, .\" fmt(1), ps(1), mklocale(1), wall(1), wc(1) .sp -1v .LE .nr Lsp \n[LspSave] . .GPE_TIME 80 .GPE_NEXT "Who contributed?" .MULB 8.3i 0.2i 1.5i .TITLE "Thanks! (2)" .BL .LI Theo de Raadt: support for the OpenSSH, stdio, and citrus patches; support for the general direction with small utilities; a few patches and patch reviews \" ftpd(8), rs(1), w(1) .LI Dmitrij Czarkoff: many code reviews for libedit and patch reviews for utilities \" cut(1), fold(1), ls(1), ps(1), rs(1) .LI Damien Miller: patch to allow UTF-8 in ssh(1) banners .br and much support working on OpenSSH code in general .LI Nicholas Marriott: UTF-8 work in tmux(1); .br feedback concerning some of the ideas explained here .LI Christian Weisgerber: feedback concerning some of the ideas explained here; important suggestion and patch review for xterm(1) .LI Peter Hessler: important contributions to some of the ideas explained here .LE .SUBTITLE "Additional patches, patch reviews, help and feedback from:" Darren Tucker, \" feedback on OpenSSH patches Eric Faurot, \" review of ypldap(8) patch Giannis Tsaraias, \" review fold(1) patch, tested with afl(1) Jason McIntyre, \" help with manual pages J\('er\('emie Courr\(`eges-Anglas, \" review of four patches for stdio Jonathan Gray, \" bug fix in locale/rune.c Martin Natano, \" feedback regarding ls(1) Martin Pieuchot, \" review citrus and ksh(1) patches Masahiko YASUOKA, \" review ls(1) patch Masao UEBAYASHI, \" feedback concerning some of the basic ideas Matthieu Herrb, \" review xterm(1) patch Philip Guenther, \" review of stdio patch Stuart Henderson, \" review ls(1) patch Theo B\(:uhler, \" review OpenSSH patch Tobias St\(:ockmann (OpenBSD), \" bug fix and patch reviews in locale/rune.c Andrey Chernov (FreeBSD), \" important help with fgetwln(3) Christos Zoulas (NetBSD), \" maintaining libedit, checking patches Svyatoslav Mishyn (Void Linux), \" bug report: mixing put[w]char in man(1) Christian Heckendorf, \" some code reviews for libedit Frederic Nowak, \" some initial ideas for ksh(1) emacs mode Matthew Martin, ... \" bug reports for fold(1) .MULN .sp 1i .PSPIC Images/ElbowFlower1.eps .sp 1i .PSPIC Images/ElbowFlower2.eps .MULE .GPE_TIME 10 .ds gpe_next The end. .\"so fin/images.roff ¡®Yes, sir. I felt sure you understood that. She said she had told you.¡¯ "Why, eh,--I--I don't know that my movements need have anything to do with his. Yours, of course,--" "Ah, but if it saved your life!" "No, I'm not," grumbled the Doctor, "I've had enough of this wild-goose chase. And besides, it's nearly dinner time." "I am coming to that," Lawrence said, lighting a fresh cigarette. "As soon as Bruce was in trouble and the plot began to reel off I saw that it was mine. Of course there were large varyings in the details, but the scheme was mine. It was even laid on the same spot as my skeleton story. When I grasped that, I knew quite well that somebody must have stolen my plot." Judy In a coach-house, through which we passed on our way to see the prince's favourite horses with the state carriages¡ªquite commonplace and comfortable, and made at Palitana¡ªwas a chigram,[Pg 68] off which its silk cover was lifted; it was painted bright red and spangled with twinkling copper nails. This carriage, which is hermetically closed when the Ranee goes out in it, was lined with cloth-of-gold patterned with Gohel Sheri's initials within a horseshoe: a little hand-glass on one of the cushions, two boxes of chased silver, the curtains and hangings redolent of otto of roses. "Are you certain of it? You have seen so very little of him, and you may be mistaken." "And your wife?" "I drawed on my man's bundle o' wood," said Gid, "and then dropped a little, so's to git him where he was biggest and make sure o' him." HoME²¨¶àÒ°½áÒÂ×óÏßÊÓÆµ ENTER NUMBET 0016longidc.com.cn
www.jdzrctc.org.cn
www.glchain.com.cn
gdhyzdh.org.cn
www.hebiao2.com.cn
pf9jf.com.cn
uxbwpb.com.cn
www.szcct0755.com.cn
tsbxrm.com.cn
nijmsu.com.cn
欧美西方美女艺术图片 美女掰b图片 外国操逼成人片 肏屄蓝魔mp5官网 骚穴骚影网址 藤冲有关的电视剧 偷拍包厢内少男少女激情狂欢 大中华色露脸对话 黄片俄罗斯大学校园 久久炮图 类似狠狠鲁的网站 台湾大佬自拍偷拍网 人于兽性爱小说 美女上厕所刚出炉图 来插妹妹的小穴 美红换交 丝袜女女舔足 性爱换妻乱伦故事 女同性恋热吻优酷视频 苍井空裸阴毛 日本美女百濑图片 成人毛片快播高清影视 宫藤新一的性福生活 成人小妹 少女金频双艳 人与shuo 武汉教室做爱吉吉 人体艺术黑木耳西西 射极品空姐超碰 骚女自慰照片 江西公安局政委用枪逼儿媳妇通奸 浅川花里子 粉嫩逼贴图 一进大门观四方 诱奷表妹 和大姨子肏屄之续 日本 大胆人体艺术图片 松岛枫 中国女裸名单 美女露体没有马赛克 国产强奸乱伦加电影yingyinxianfeng 熟女人妻15p 诱奸 少妇 小说 天使的眼泪黄色群 伦理 偷拍 皮皮 骚女浪妇乱伦爱爱 幼幼性交比赛 魔王av亚洲无码bt 就要在线撸电影站 快播欧美荡妇聚会 影音先锋韩国女主 亚洲 性交图 看黄片不用要播放器 猪猪影院百度影音艺术片 人性交电影 妹妹看黄色片 浙江真实乱伦迅雷 一个最真实 的我 一个 的我 的我 歌曲 成人爱爱在线观看 色妹妹天天撸 插妹妹成人电影院在线观看 风间由美熟女人妻黑丝商务交会种子 男性大鸡巴被操的故事 人之初性本善 致橡树原文 神经系统体格检查 狠狠撸操干妈图片大全 中国成人电影小说 日姥姥性爱淫乱三从四德群交30p 爱爱jj发综合网 黑木香一经典番号 黄蓉与刘老汉 大黑鸡大吧操妈妈 印尼大胆人体艺术 草别人媳妇 人体艺体肉欲 互联网jiqingwuyue 白虎穴尿尿艺术图片 操操淫乱穴 寡妇自拍 三级色逼一片 林心如两腿分开 人与动物三客优 金伟哥vga说明书 日本av女优馒头逼图片 十九摸 儿子和母亲激情性交 有快播可以上的色网址 幼幼色色 ribenchengrenxiaoshuo 外国性爱网站 欧美美女人穴 性爱小说撸啊撸 呕美操逼图 女人的屄照片 人体艺体阴部插图片 百度云照烧铁板 我的八年性情史 淫香淫色插逼图 有黄的的qq号 欧美色亲片 zuixinmuziluanlunxiaoshuohetupian 杜比影院最佳座位 白雪公主摸乳干 美女露屄毛的图片 绫川まどか在线先锋 影音先锋操大奶妹 手机动漫成人电影网 强奸女护士 亚洲陈丽佳人体艺术 欧美经典七夕电影 WWWANGOULECOM 我和村嫂乱伦 母子通奸网 亚州色图就射 www777影音先锋 邓紫棋图片亚洲色图偷拍自拍 白嫩美女做爱爽图36p 瑶瑶的淫叫声 大树梨纱 人体私处大胆艺术图 母子插逼视频 母子淫水 狠狠干五色天 换妻17p 性之图吧偷拍自拍亚洲色图 激情性爱动漫欧美 WWW53AVCOM 启东操逼 艳母峰臀 欧美激情裸体艺术图片 激情明星狠狠碰 欧美强奸少女图 孟十朵 妻子玲儿 黑大炮群交内射白虎视频 西西人体艺术大胆人体艺术亚洲 白领穿丝被干视频 亚洲电影第10 归乡义母种子 骚丝袜老师 性爱亚洲色图大鸡巴 长谷川美红母 2014uuucom 神马影院未来电影在线 强奸少妇图片乱伦小说 我与妈妈做爱抽插操逼 ss5678 美丽的丝袜老师妈妈4 樱井知香裸体照 摸女人大胸图片 丁香社操逼 操美女av 久久热无码在线视频 去去去电影 女性尿道口真人版图片无衣 liaozaiyantan成人免费电影网站 超碰视频网友自拍第一页 色波波影院 迅雷下载 诱色天使香香裤高腰款 熟妇张柏芝丝袜 2015最新操 酒鬼电影院会员 秋露伦理Av 咪咪色色老师小说 i嘿片网 AV换妻电影 风间由美club 奶子大骚Pmsewuyecom 自拍爱色综合社区 手机看片操女优 av丝袜教师手机天堂 16668伪C0m 花俏小姨子mp4 久久撸久久肏开心五月 搜索sexinsexnet 厨房干同学妈妈的屁眼wwwwanren8netmwanren8net 日本小金井 美国十次啦老熟女 www色色网络com 天堂网妞赶 插蝴蝶逼 青青3p luanlun爱 小姐的的性感生活作爱片 激情短篇小学 牛牛超碰免费公开在线视频 日夜色先锋资源站 在线网站AV黑人wwwgzyunhecom 手机av受美国法律保护 性感老师校园春色 天天啪夜夜操www9eyycom 洪爷小说幼女交换 美女洞毛电影 雪白的古典武侠 三级片三级片的坏的视频播放 小说专区乱伦小说目录99 色第四色wwwav565com色就是色 舔岳母的小穴 肥佬能用的三级片网站 淫荡的肉弹美女教师 熟女香蕉无a在线视频 美女操逼黄色大片 www1111r80s 欧美色图迷人的骚女15P 哥哥干av成人社区男人天堂 武藤蓝av 搜索炼狱岛 精液降头 偷拍女厕所自慰图片 xxoo成人影院熟女 freefronvideos人母 卡通图片另类变态 春药潮吹痉挛视频 2017最新www超碰com 最新kkbokk 欧美露大乳图 激情电影乱伦小说 邻家丰满少妇 绑架调教性感丝袜性奴女 风流情哥哥网站 激情小说加多撸 欧美性虐哥哥射 变态淫荡 seyuyue 人体穴鲍图 伊人在线视频变身6 中国十大禁书黄小说 国产自拍秀mp4 性交口交天天撸 裸体摄影优果网 少妇人妻自拍图区 久草在线新时代3wwwczyzxcnzxzy8com 最大胆一千美女 大几把骚逼 全国最大三级图文 干了校长的骚穴 与妻侄女 户田惠梨香av片 家庭乱伦之人妻 淫贱孝姨 干美女狗趴 变态孕妇母乳片 sss480 偷拍穴毛 yazhouxingjiaosetu 明日花绮罗京都不伦妻 se牛牛视频网址 婶婶的原味内内 偷拍自拍做爱操逼 天天瑟瑟天天撸 蕾丝兔宝宝图片 华为网盘饭岛爱 天天鲁鲁天天在线 岳母与女婿性爱故事 2016年的小萝莉网站 狐狸色成人AV网站 18岁女主播直播自慰 qinglouscom 农夫激情基地 免费小萝莉自慰在线直播 wwwmmtt11me 性交透明内衣美女乳头无遮挡 亚州av潮吹视频 高跟熟女性奴 淫秽黄色啪啪手机视频 幼同志 怡红院更新前的主页 青青草肛交灌肠视频在线播放 搜索100avcom 老熟女下一篇7p 国语对白AV在线观看wwwyaob111com 成人动漫在线免费 11ppdd页面访问升级 189df、com wwwsesoucom 舞会电影台湾 www550ai30in 伊人情人网综合wwwggsao58info 都市淫秽 朋友妻子穿着丝袜让我舔 激情网址五月天 奶大妞女 兽皇英国 老熟女喷水小说 淫淫撸 偷拍影音先锋电影网 网上聊操逼 www路ppp7cnm 色色成人Cy视频 性吧春暖花开性吧有你旧版 丝袜制服人妻交换 鬼父第十七集番号 映像av加勒比先锋影音 wwwtutu10cnM 欧美乱伦18p 大香蕉超 国产看片点我 日本女穴 阿v手机天堂 稻田淫 wwwyouijzzsmcnm 我的办公室老婆三邦年 郑重声明我们立足于52avav xingnuchuanqi 黄色电影露阴毛 黄色录像番金莲 AV91 影音先锋强奸影片 亚洲在线a手机 曰本女优在线www5sdlcom 帮帮鲁色老汉 在线成人小视频下载 黄色网站都有那些 大胆人体露阴艺术 浅田美姬 草榴免费视频 捷克??机 丝袜美女被性侵 日日性 丝袜人体艺术偷拍 凹凸视频在线av 情欲超市小说 西西里大胆艺 黄片免费网址 东方a∨在线亚州色图狠狠撸 苍井空超短裙丝袜诱惑图片 奇米成人影视色和尚、色尼姑 ckck爱情电影 恋爱记录短片分享 局长成长史686 AV成人播放器免费的 纯洁的玛利亚邪恶漫画 五月激情综合狠狠色 wwwbbb552cm 性插网页 美女丝袜撸撸五月天 色色泡泡影院 一部女生被插jj的完整黄片 wwwadcrrr222con 秋霞伦理片在线播放 教室调教老师 亚洲色图日本AV 很很艹 日骚妇内射在线视频观看 ya亚洲麒麟色影影院 qqb66666 亚洲色图欧美色图美腿丝袜 曰一日 农村少妇电影magnet 移动成人你射精 20158韩国女演员激情视频合集 酒店小姐裸体艺术照 足浴小姐做爱过程 av女教师自慰动态图 AV影音先锋影院 123红色播放 女友自拍偷拍刺激 wwwseebimei 123CTCTCOM 婷婷xx youjuzz小说专区 一级黄色wangzhan 男人体摄影 裸体两性 欧美精品超碰 强奸小村花千骨 幼少女肏屄视频18 五月婷婷婷婷五月丁香 色魔在线 国产父女乱伦小说 KTKX089 www454HUcom 在线手机播放器 偷拍自拍32 成人激情图片,电影mmmnn7777 ww777rvvom WWW9itKcom 美国名模啪啪啪 用力的操狠狠的干 小浪穴妹妹亚洲色图 525zzzcom 乱伦妈妈15p 唐伯虎点秋香不是三级片 最大色牛牛 欧美色kuaibo 手机在线tokyo 欧美幼女网mp4 760yycom 少妇干净迷人鲍优优 绳艺magnet 身穿民族服饰的中国少数民族漂亮美女大胆人体艺术7国内 www47escomxz34 Www2222magnet 色偷怕自拍视频 丁香五月天拍拍播放` 青苹果影院噜噜妈妈 公然妄想露出在线 图片区偷窥自拍亚洲色图欧美色图动漫图片美腿丝袜清纯唯美乱伦图区电 黄色三级片77天天撸 美国女孩成人网站 东京热亚洲色色 超碰av大帝在线视频 西瓜成人资源网 一级片城年5 孝姨大阴唇 pp529com 青青色草在线 504hu迅雷下载 殴州1314 母子乱论视频 微信自拍成人视频在线观看 www44cim 东方av官方 297Pmp4 骚av老师 小明看看成人永久免费视频在钱imgcctuocom 越南人体艺术露鲍 两个女人用道具做爱 影音先锋制服丝袜偷拍 爱搞搞爱撸撸爱色堂 闹洞房就去干 男根的诱惑系列 樱井亚莉偷拍自拍 色dogcom 里中结衣在线观看 特大鸡巴碰上大波霸 躶体狂插相片 熟女撒尿视频 国产成人在线视频网站 武汉18中教师门 刘亦菲阴道毛多吗 操b网址大全 欧美性生活色图 母子尾交 图 操山村老大妈 淫母之穴 00后人体图片少女无毛掰开图片 6655人体艺术果果人体艺术波谷人体艺术 刘嘉大胆人体艺术 欧美色图 成人动漫第一页 淋浴做爱av 骚逼yaoyao 9115视频在线资源sss 黑屌做爱爽片 女优性交免费电影 欧美熟肥女图片 韩国女主播朴妮唛的黄色小说 撸哇哽播 在线自拍干幼女 高清晰自拍偷拍图色色网 去哦v大 欧美父女性爱 淫秽网站肥女视频 666亚洲无码 最新日韩乱伦小说网站 操少妇游戏 日本少妇11p图片 什么片好黄 多人合集9部 河合优衣ed2k 狠狠撸美女手掰穴图片 女儿交换乱轮 欧美妈妈和她的大屌儿子 富婆和年轻帅男性交 夫妻居家性爱自拍 色色偶性爱自拍 强奸模特小说 少年与熟妇爱图 体操美女之性生活 色骚逼在线高清播放 www1314xxx 韩国三级片一对男女在大学教师xxoo后来女的怀孕了男女结婚后女的跑了男的和 夫妻坐爱一级片 成人男女做爱视频 人体艺术图片绘狗网 入江辉美在线电影 鲁av影院 动漫同志片 风间由美爱爱网站 乱伦3p生活 最美妙的骚逼 王梦溪迅雷种子下载 我姐尻屁片 亚洲色图人妻p 日本人体艺术波之轩 偷拍黄色照片 汤加l丽裸照 伦理片日本家庭教师 操逼撸撸撸吧影院 WWW_BB152_COM 保险知识 左边杨丞琳 女人淫乱图 穿挺屄裤子的图片 色色公公与好儿媳 苍井空完全服从 色姐姐乱依 人妻乱伦星野绫香 熟女的角色扮演性爱快播 快播奸少女阴道电影 都市激情撸情 女人小穴很很日 狠狠肏老婆 亚欧人体摄影 中华医药艾叶作用在线视频 美女粉乳头10p 俩根大吊插一个美眉 品色堂俺去也 l少女luluse 模逼图片 山下智久柚木提娜种子ed2k 桃色播播激情五月天 沙绪里与狗 橹二哥影院影视先锋 小说快播综合网 12306影院第一页 巨乳无码xfplay 萩原舞电影 学生逼逼电影 欧美群交欧美色图 操少妇的逼短片 韩国主播吉吉影音 影音先锋厕所偷拍片 欧美熟女系列 鹿鼎淫记 3p性感尤物内射她的小骚穴 国产强奸幼幼逼 屌配屄毛 美女做热爱性交口交 女主人的人体厕所 顶级黄色图片可看到阴道口 无码少妇在线色 美女吹萧爽吗 黄色网站是多少翱 妹妹视 人体模特毛婷人体艺术 佐藤爱理番号 开心宝贝色播网 大屌巨乳系列 重温陈冠希图片做爱 美女豪乳50p 黑网袜性爱 操熟女老师 日韩av逍遥社区 安装香港恋夜场秀 美女制服诱惑男女 亚洲人妻岛国线播放 色小说色图色电影 日韩美女映画网 成人动漫转帖区 www狠狠射c 露脸绝对领域 妻子地铁失身 淫老婆电影第一页 中文字幕都市激情家庭乱伦亚洲色图 日本美女特大胆裸体露逼 口工教师av 色色999偷拍自拍日韩美女 www968ddcnmagnet jjjhhh1com日本 老王社区lw78cc ed2kyounv 美女口交舔逼小说 类似古丽阁网站 帅男同的鸡鸡ed2k 少妇的小骚玩 av黄鳝自慰小说 经点乱系列 百度久久做爱视频 a狼电影网成人 干小嫩b10Pwwwneihan8com 原国产母子做爱乱伦 美国十四拉人体艺术 老公我要插深点快点啊 涩涩乱伦小说 波霸暴露 994yyccm 美国女人和动物zzzwww 后入式微拍 中国国模03150p 色色色乱伦熟女图片 欧美大胆嫩肉穴爽大片 这个经典的给你吧rr123win 老婆乱伦片 另类少妇AV 巨乳俏女医漫画 WWWSWWWMITAOCOM 糖糖激情操逼 男屁眼被曰小说 yinsezonghewang 爆操人妻熟女15P 福利脱衣麻将 久久热偷偷撸黑丝袜免费经典视频 电影三级mp4黄色电影 超级黑人巨屌操白妇 人妻偷拍自拍强奸 男男激情小说肉文 韩日女优大奶视频 杀神有声小说 有声性小说mp3mp3 天春色图片 春色龙 樱井莉亚口才 日本成人 求h网导航 h网游游戏 有没有免费的黄网 设置加www 开心网 五月天 的五月天 qvod东京热图片 东京热n0383.rmvb 皮皮看黄片 沙发看黄片 黄色小说短篇 大色鱼情迷 风月阁论坛 日韩色姐姐 色兔子成人 师傅搞综合 18人体艺术 amod在线 大M成人综合 我爱弟弟影院 给力QVOD色 最纯洁少女做爱 無双帝國谁有E谁有G adultbig影院 日日顺 久久爱 歪歪小说网 九型人格分析 分分操导航 2017 一本道va手机在线 youjizzⅹⅹx 婷影院 一楼一凤影院欧美首页 猫蛋蛋很黄 shenyifuli ssni 049bt下载 另类图片-色爱 汤姆影院atom55 五月丁香深爱基地 皮特成人影院 gay pornhub video chitu x77223 色色ev 客车偷拍图片大全 动漫av ftp 青青草免费线观综合网 兄嫁はいじっぱり认证补丁 人妻少妇视频系列 小林瞳电车痴汉代码 秋霞短片福利 青娱乐视频盛会 日韩成人午夜视屏 影音先锋 熟女系列 在线观看皇片你的懂的 原千岁全集协和影视 日本一级性交视频 日本人性交视频 又黄又色的影剧院 大型AV 好屌色狼 早乙女由依 最佳专辑 尹人影院大香蕉禄现 吉泽明步私处流出资源大 伦理战场 情欲影院云播 bestfornmc0m 搭讪 mgnet 玩农村小姑娘裂缝电子书 日本AⅤ无码在线观看 伦理聚合高请无码在线播放 b里香视频在线2白色爽 有个妮可舍宾怎么找她视频 妇女磁力下载 凹凸視頻免賛在線 caosaobi在线观看免费 东方影库300 wwwxyc123con ipz862在线观看 武则天一级全黄视频 迅雷下载 亚洲在线一区 丰满女人多毛 思思热在线色视频 噜死你们资源站 mum骑兵在线播放 欧美色图狠狠插 邪恶插阴口动态图 风月影院黄视频 黄片VX 厕所偷窥视频 ay电影院 wwwkp99 木村都那厕所无码 good电影神马小丢 黑人和中国人群交视频 色色网址在线 九月婷婷在线 久草在线首页老司机 500dh com福利 杀戮都市里番在线 电影天堂在线福利 淫色视频网 苍木玛娜教师链接 97人妻C○m piczz漫画 橘梨纱作品Bd播放 激情按摩胸部无码 yy6080最稳定的资源 shtv123 日本成人影片 magnet 里番姐弟的关系3 超级碰av公开在线 thzvip 草帽国产综合网 日本色木木 dv1456 老色哥第四色 亚洲av欧美av电影av视频 日日夜夜插天天插 桃花影院今日新鲜事 A片网主名字 大屁股啪啪无码高清视频 东方阿v视频在线最新 xxoo淫交视频 电击女神asia fox在线 东北可爱小骚妻又一次3p娇小身材力战大屌-9 扒开双腿拳交 给我来个娘们操逼的黄片黄片 日本黄色枧频 j鸡巴小視频 叛客与雷鬼高清 迅雷 下载 美女自慰流淫水 在线 人妻小悠福利在线 巨乳王瑞儿在线视频 啪啪肉捧往哪叉?视频 劲暴欧美 一本道 东京热 自拍 国产 无码 黑丝自拍做爱 538prom久久日逼 www骚逼 XXXWWWUUU 求个一本道的资源 杨幂醉酒视完正版视频在线观看 日本女用胸为客人性按摩影院 仲村里绪 影音先锋女主播视频网站 vtt944 国产自拍福利社视频 18v韩国主播 在线翘臀福利 校园多女一男类番号 微福利吃精 香坂里奈 热舞福利120 538porm在线插放视频 wwwtaosetvcom 日本三级天堂网无码 韩国歪歪漫画官网进入 欧美图片亚洲色理论电影 欧美AVmm625 爱色影分类 56popo体验区 www奔驰宝马成人网站 射丝袜足 在线影院 kanmitao1视频 巨乳女教师の诱惑电影 欧洲老女人肏屄 ygyg66怼粉逼 热带夜中文字幕mp4 洁泽明步 m3u8人妻 蛇精脸网红主播小兔兔现场啪啪大秀 mmavtv少妇 萝莉小妹妹av 免费做爱视频网站免费 AISS模特索菲 在线福利视频 多毛龟 66ffff视频 佐伯雪菜在线av先锋 51zhiyuan 播播影院免费A片 av5685 真人下体抽插下阴喷水视频 色喜 王丹 美女动态图张又黄又色 www,O3ⅩXX,Cm 国内在线自拍人人澡人人看 BT 冲田杏梨巨乳女教师の诱惑 神马影院午夜伦dy888mmm 大香蕉亚洲人妻小说 丝袜自慰视频在线观看 涩播音频 popo福利网盘2018 性污秽小视频 想上你日你视频 享悦国产在线 校园禁断介护 校花在浴池被强视频 香港十大禁片黄e 小苹果性交影院 老湿影院未成年人 Xxxxx161116骆驼祥子 熟女papa视频 亚州影院午夜-一 97jibayingyuan 色欲直播 处女中出视频 爱爱福利区 鸡巴操逼的黄片 500夜趣福利免费 亚州香焦视频 porn在线播放制服丝袜 91c仔内裤哥在线观看 wwwz9k6com 偷拍美女浴室伦理电影 4438x3全国最大情人网站 速看网在线观看 新妈妈 ftp 亚洲国际成人综合 4455qu类似的网站 水菜丽无码粪便av连接 wwwabc300wom 唐朝TV360 国产AV,亚洲AV 采桑洗浴中心 波多结野衣与无码观看 比比琼斯作品合集磁力 山谷两日 thunder 日本高清凌辱免费三级片 magnet 欧美虐性 av无码 中文字幕 迅雷 樱井步禁断介护在线 四房色播av 织田真子113在线观看 罗马狼窝影院 插进去拔出来综合网 台湾R级在线 SOD时间禁止器 s跳蛋调教视频 WANQUEYINGYUAN 超级碰夜色猫视频 成人日日夜拍拍 RCTD mp4 v ip eeusssvv 求手机能看哪种视频的网站 欧美激情第一页在线观看 俺去也看免费视频 视频小姐中出 三七影视成人福利播放器 伦理无料 日韩SM高清 赫敏被强奸视频 泄欲哥导航为什么看不了了 1啪啪啪视频app 色无极亚洲影院东京热 黄色福利1000 要要橹福利 乱伦1视频 岛田阳子拍了多少AV片 正在播放肛交视频 萝莉无码小视频 本田英里子MP4迅雷种子 厕所偷拍无修。 北京彪哥真皮大床激战学院派 苍井空在线毛钱 波波兔 磁力 操奶子视频 波多野结衣与狗激情直接视频 久久干视频 ftp 橹先生成人影片 色妻视频观看 暴力插美女屁眼视频 48号缚师绑美女 草榴影院女同 空姐不愿意拍视频被男友强干到高潮的视频 sskanzyz新资源 小岛南无码链接magnet 小草草大黄瓜在线观看 renrencaome 虐阴漫画 国产第一页天天拍 wwwsesw 优酷 解放军在巴黎 18禁自拍偷拍 韩国肛交视频播放 美女做爱磁力连接 捆绑天海翼 5xsq四虎 美国十次啦福利视频 射精丝袜的视频 美女被人舔阴帝视频 乱伦香蕉色视频 色狐免费无码电影 美国成人性爱电影 六月丁香手机在线观看 教练和学生作爱视频 加勒比在线视频网 强奸乱伦1 骑士美女AV视频 去火涩 成人av视 ed2k 婷婷基地色色网 五月婷综合官网 撸必撸 足交鸣人漫画 gvg567 黑丝FJ 菲菲去网站 如果电话亭 avi rhj-073 黄片日屄视频 shiwaizhipaizhao 偷偷撸电影院 手机看片永久免费在线观看国产频道 国产另类自拍亚洲 天堂国产手机a 自拍视频在线观看 teen萝莉 不打码木叶性处理医院 风月海棠空姐 3d hy工房 在线观看 femmina在线观看 陈冠希艳照1300张阿娇 黄色网站在线视频 黄色综合网站 好看的中文字幕色拍拍噜 哦哦弟弟 黄色A片直播 后人动态图 qplayer在线播放网址 www第四色 野战情线国产视频在线观看 偷拍自拍在线赌博 护士无码视频 鬼佬 在线 云播 桃乃木香奈在线三上悠亚在线 ipx247在线观看 天降艳福不是福绝版在线看 一本道0588视频 日本无码光谍区 尻屄AV 老太太h类在线视频 国产3P自拍偷拍 伦理视频黄片大全 91小青蛙红杏出墙3p 内藤幸恵 你懂的 国产在线观啪啪啪网站 国产自拍、欧美 呦呦禁处 福利国产成人强奸少女50集 玉狼影院 国产偷拍自拍中文对白在线 阴道内窥镜凌辱女友 嘿嘿嘿影院永久 可知子 无码 成人在线黄色电影 快播八戒网 我要干色和尚 大胸美女一级黄色毛片 日本绣惑电影 yingyinxianfeng ziyuang 露脸操白妹国语 日本女性爱视频在线观看 成人狂欢福利网 91偷拍视频在线观看 操同事的小女友爱剪辑 粗暴轮奸视频 国产自自拍永久免费 天天干b天天插 国产25P 凹凸视频线观看免费 不雅视频磁力链接下载 香蕉影院超频在线视频 新加坡人美国艳星口交视频 肛塞自慰视频 8338磁力链接 成人色色网美国av幼女 把鸡巴插入妈妈的阴道 24美图野战 我和老师做爱漫画 淫乱肛胶美女 日撸神 清纯漂亮的嫩妹女孩与男友在家激情做爱流出高清视频mp4 阿姨熟女丝袜 黑丝袜电影院 色图网大全 丝袜阿姨的淫荡生活 日本乱伦变态 熟妇性交视频女人色网站 1234qec0m 六年级学生屄 波谷桐原 爆操巨乳妈妈 阿v天堂2012关于苍井空的视频 淫秽的我 官场艳情纪莜竹 少妇性交图25p